join syntax / style performance considerationWhy does the location of a join change performance?Syntax of...

How to remove lines through the legend markers in ListPlot?

Early credit roll before the end of the film

Find some digits of factorial 17

How to prevent users from executing commands through browser URL

Why would the Pakistan airspace closure cancel flights not headed to Pakistan itself?

How to deal with an incendiary email that was recalled

Is it a fallacy if someone claims they need an explanation for every word of your argument to the point where they don't understand common terms?

Pronunciation of umlaut vowels in the history of German

Why is mind meld hard for T'pol in Star Trek: Enterprise?

Am I a Rude Number?

Would the Vulcan nerve pinch work on a Borg drone?

Can a person refuse a presidential pardon?

One Half of Ten; A Riddle

Avoiding morning and evening handshakes

How long is the D&D Starter Set campaign?

Eww, those bytes are gross

How much mayhem could I cause as a sentient fish?

Why does the Spectator have the Create Food and Water trait, instead of simply not requiring food and water?

How to count the characters of jar files by wc

Table formatting top left corner caption

Difference between i++ and (i)++ in C

Why are the books in the Game of Thrones citadel library shelved spine inwards?

Finding a mistake using Mayer-Vietoris

If I deleted a game I lost the disc for, can I reinstall it digitally?



join syntax / style performance consideration


Why does the location of a join change performance?Syntax of INNER JOIN nested inside OUTER JOIN vs. query resultsImproving SQL Query joins performanceWhy does join syntax with stacked on clauses work?Syntax of INNER JOIN nested inside OUTER JOIN vs. query resultsView designer strange join syntaxConverting old SQL Server outer join syntaxEnable TRY style error handlingConsideration for a little bit heavy databaseCheck for existing matches to find the field their grouped-byNeed Help Refining A Queryjoin performanceStrange query plan when using OR in JOIN clause - Constant scan for every row in table













5















We recently found on one of our stored procedures that we get significant performance improvement by changing the join syntax/style of our query from this...



SELECT b.bla, c.foo, d.bar
FROM dbo.TableB b
JOIN dbo.TableC c
JOIN dbo.TableD d -- <-- Nested join syntax
ON d.yyy = c.yyy
ON c.xxx = b.xxx


To this...



SELECT b.bla, c.foo, d.bar
FROM dbo.TableB b
JOIN dbo.TableC c
ON c.xxx = b.xxx
JOIN dbo.TableD d -- <-- Regular way
ON d.yyy = c.yyy


Note: in the real query, there were 10 joined tables including inner and outer joins. The tables were not huge in terms of sql data. No aggregates. There was a DISTINCT on the output. All joins were to a primary key, but the foreign key was not necessarily indexed.



We will certainly change our ways, but I'm still curious as to the proper 'guidance' on such style. I've often used the 'indented' style to indicate a 'more readable' join for such things as lookup tables.










share|improve this question

























  • Were there outer joins involved or only inner joins?

    – ypercubeᵀᴹ
    Oct 28 '13 at 22:20






  • 1





    I don't think I would ever write the first form on purpose. It just doesn't look right.

    – Aaron Bertrand
    Oct 28 '13 at 22:26











  • @AaronBertrand I think that join style comes from Access. It was a revelation for me when I realized I didn't HAVE to write queries that way!

    – Max Vernon
    Oct 28 '13 at 22:30






  • 2





    I'd be really surprised to see these perform differently, if they're semantically the same. Did you compare the actual execution plans of the two versions of the query? Were they different? Perhaps you think you got better performance because of a syntax change, but it actually happened because the old query was stuck on a bad plan, and changing the procedure forced a recompile.

    – Aaron Bertrand
    Oct 28 '13 at 22:32











  • I was just going to add I'd love to see the two plans! If there really is a performance difference, it would be helpful to understand why. Perhaps the plans are the same, just the data was cached for the 2nd run?

    – Max Vernon
    Oct 28 '13 at 22:44


















5















We recently found on one of our stored procedures that we get significant performance improvement by changing the join syntax/style of our query from this...



SELECT b.bla, c.foo, d.bar
FROM dbo.TableB b
JOIN dbo.TableC c
JOIN dbo.TableD d -- <-- Nested join syntax
ON d.yyy = c.yyy
ON c.xxx = b.xxx


To this...



SELECT b.bla, c.foo, d.bar
FROM dbo.TableB b
JOIN dbo.TableC c
ON c.xxx = b.xxx
JOIN dbo.TableD d -- <-- Regular way
ON d.yyy = c.yyy


Note: in the real query, there were 10 joined tables including inner and outer joins. The tables were not huge in terms of sql data. No aggregates. There was a DISTINCT on the output. All joins were to a primary key, but the foreign key was not necessarily indexed.



We will certainly change our ways, but I'm still curious as to the proper 'guidance' on such style. I've often used the 'indented' style to indicate a 'more readable' join for such things as lookup tables.










share|improve this question

























  • Were there outer joins involved or only inner joins?

    – ypercubeᵀᴹ
    Oct 28 '13 at 22:20






  • 1





    I don't think I would ever write the first form on purpose. It just doesn't look right.

    – Aaron Bertrand
    Oct 28 '13 at 22:26











  • @AaronBertrand I think that join style comes from Access. It was a revelation for me when I realized I didn't HAVE to write queries that way!

    – Max Vernon
    Oct 28 '13 at 22:30






  • 2





    I'd be really surprised to see these perform differently, if they're semantically the same. Did you compare the actual execution plans of the two versions of the query? Were they different? Perhaps you think you got better performance because of a syntax change, but it actually happened because the old query was stuck on a bad plan, and changing the procedure forced a recompile.

    – Aaron Bertrand
    Oct 28 '13 at 22:32











  • I was just going to add I'd love to see the two plans! If there really is a performance difference, it would be helpful to understand why. Perhaps the plans are the same, just the data was cached for the 2nd run?

    – Max Vernon
    Oct 28 '13 at 22:44
















5












5








5








We recently found on one of our stored procedures that we get significant performance improvement by changing the join syntax/style of our query from this...



SELECT b.bla, c.foo, d.bar
FROM dbo.TableB b
JOIN dbo.TableC c
JOIN dbo.TableD d -- <-- Nested join syntax
ON d.yyy = c.yyy
ON c.xxx = b.xxx


To this...



SELECT b.bla, c.foo, d.bar
FROM dbo.TableB b
JOIN dbo.TableC c
ON c.xxx = b.xxx
JOIN dbo.TableD d -- <-- Regular way
ON d.yyy = c.yyy


Note: in the real query, there were 10 joined tables including inner and outer joins. The tables were not huge in terms of sql data. No aggregates. There was a DISTINCT on the output. All joins were to a primary key, but the foreign key was not necessarily indexed.



We will certainly change our ways, but I'm still curious as to the proper 'guidance' on such style. I've often used the 'indented' style to indicate a 'more readable' join for such things as lookup tables.










share|improve this question
















We recently found on one of our stored procedures that we get significant performance improvement by changing the join syntax/style of our query from this...



SELECT b.bla, c.foo, d.bar
FROM dbo.TableB b
JOIN dbo.TableC c
JOIN dbo.TableD d -- <-- Nested join syntax
ON d.yyy = c.yyy
ON c.xxx = b.xxx


To this...



SELECT b.bla, c.foo, d.bar
FROM dbo.TableB b
JOIN dbo.TableC c
ON c.xxx = b.xxx
JOIN dbo.TableD d -- <-- Regular way
ON d.yyy = c.yyy


Note: in the real query, there were 10 joined tables including inner and outer joins. The tables were not huge in terms of sql data. No aggregates. There was a DISTINCT on the output. All joins were to a primary key, but the foreign key was not necessarily indexed.



We will certainly change our ways, but I'm still curious as to the proper 'guidance' on such style. I've often used the 'indented' style to indicate a 'more readable' join for such things as lookup tables.







sql-server sql-server-2012






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited 11 mins ago









jadarnel27

5,94311938




5,94311938










asked Oct 28 '13 at 22:13









JeffInCOJeffInCO

170115




170115













  • Were there outer joins involved or only inner joins?

    – ypercubeᵀᴹ
    Oct 28 '13 at 22:20






  • 1





    I don't think I would ever write the first form on purpose. It just doesn't look right.

    – Aaron Bertrand
    Oct 28 '13 at 22:26











  • @AaronBertrand I think that join style comes from Access. It was a revelation for me when I realized I didn't HAVE to write queries that way!

    – Max Vernon
    Oct 28 '13 at 22:30






  • 2





    I'd be really surprised to see these perform differently, if they're semantically the same. Did you compare the actual execution plans of the two versions of the query? Were they different? Perhaps you think you got better performance because of a syntax change, but it actually happened because the old query was stuck on a bad plan, and changing the procedure forced a recompile.

    – Aaron Bertrand
    Oct 28 '13 at 22:32











  • I was just going to add I'd love to see the two plans! If there really is a performance difference, it would be helpful to understand why. Perhaps the plans are the same, just the data was cached for the 2nd run?

    – Max Vernon
    Oct 28 '13 at 22:44





















  • Were there outer joins involved or only inner joins?

    – ypercubeᵀᴹ
    Oct 28 '13 at 22:20






  • 1





    I don't think I would ever write the first form on purpose. It just doesn't look right.

    – Aaron Bertrand
    Oct 28 '13 at 22:26











  • @AaronBertrand I think that join style comes from Access. It was a revelation for me when I realized I didn't HAVE to write queries that way!

    – Max Vernon
    Oct 28 '13 at 22:30






  • 2





    I'd be really surprised to see these perform differently, if they're semantically the same. Did you compare the actual execution plans of the two versions of the query? Were they different? Perhaps you think you got better performance because of a syntax change, but it actually happened because the old query was stuck on a bad plan, and changing the procedure forced a recompile.

    – Aaron Bertrand
    Oct 28 '13 at 22:32











  • I was just going to add I'd love to see the two plans! If there really is a performance difference, it would be helpful to understand why. Perhaps the plans are the same, just the data was cached for the 2nd run?

    – Max Vernon
    Oct 28 '13 at 22:44



















Were there outer joins involved or only inner joins?

– ypercubeᵀᴹ
Oct 28 '13 at 22:20





Were there outer joins involved or only inner joins?

– ypercubeᵀᴹ
Oct 28 '13 at 22:20




1




1





I don't think I would ever write the first form on purpose. It just doesn't look right.

– Aaron Bertrand
Oct 28 '13 at 22:26





I don't think I would ever write the first form on purpose. It just doesn't look right.

– Aaron Bertrand
Oct 28 '13 at 22:26













@AaronBertrand I think that join style comes from Access. It was a revelation for me when I realized I didn't HAVE to write queries that way!

– Max Vernon
Oct 28 '13 at 22:30





@AaronBertrand I think that join style comes from Access. It was a revelation for me when I realized I didn't HAVE to write queries that way!

– Max Vernon
Oct 28 '13 at 22:30




2




2





I'd be really surprised to see these perform differently, if they're semantically the same. Did you compare the actual execution plans of the two versions of the query? Were they different? Perhaps you think you got better performance because of a syntax change, but it actually happened because the old query was stuck on a bad plan, and changing the procedure forced a recompile.

– Aaron Bertrand
Oct 28 '13 at 22:32





I'd be really surprised to see these perform differently, if they're semantically the same. Did you compare the actual execution plans of the two versions of the query? Were they different? Perhaps you think you got better performance because of a syntax change, but it actually happened because the old query was stuck on a bad plan, and changing the procedure forced a recompile.

– Aaron Bertrand
Oct 28 '13 at 22:32













I was just going to add I'd love to see the two plans! If there really is a performance difference, it would be helpful to understand why. Perhaps the plans are the same, just the data was cached for the 2nd run?

– Max Vernon
Oct 28 '13 at 22:44







I was just going to add I'd love to see the two plans! If there really is a performance difference, it would be helpful to understand why. Perhaps the plans are the same, just the data was cached for the 2nd run?

– Max Vernon
Oct 28 '13 at 22:44












1 Answer
1






active

oldest

votes


















6














In a world where the query optimizer considered all possible join orders, and contained all possible logical transformations, the syntax we use for our queries would not matter at all.



As it is, the optimizer generally uses heuristics to pick an initial join order and explores a number of join order rewrites from there. It does this to avoid excessive compilation time and resource usage. It doesn't take all that many joins for the number of possible combinations to become unreasonable to explore exhaustively.



To take an extreme example, 42 joins are enough to generate more alternatives than there are atoms in the observable universe. More realistically, even 7 tables are enough to produce 665,280 alternatives. Although this is not a mind-boggling number, it would still take very significant time (and memory) to explore those alternatives completely.



Although the heuristics are largely based on the type of join (inner, outer, cross...) and cardinality estimates, the textual order of the query can also have an impact. Sometimes, this is an optimizer limitation - NOT EXISTS clauses are not reordered, and outer join reordering is very limited. Even with simple inner joins, the interaction between textual order, initial join order heuristics, and optimizer internals can be difficult to predict with certainty.



To take an example using the AdventureWorks sample database, I can write a query using the a common syntax form as:



SELECT
P.Name,
PS.Name,
SUM(TH.Quantity),
SUM(INV.Quantity)
FROM Production.Product AS P
JOIN Production.ProductSubcategory AS PS
ON PS.ProductSubcategoryID = P.ProductSubcategoryID
JOIN Production.TransactionHistory AS TH
ON TH.ProductID = P.ProductID
JOIN Production.ProductInventory AS INV
ON INV.ProductID = P.ProductID
GROUP BY
P.ProductID,
P.Name,
PS.ProductSubcategoryID,
PS.Name;


Before cost-based optimization, the logical query tree looks like this (note the join order is not the same as the written order):



Logical tree



I can (carefully) rewrite the query to use 'nested' syntax:



SELECT
P.Name,
PS.Name,
SUM(TH.Quantity),
SUM(INV.Quantity)
FROM Production.ProductSubcategory AS PS
JOIN Production.Product AS P
JOIN Production.TransactionHistory AS TH
JOIN Production.ProductInventory AS INV
ON INV.ProductID = TH.ProductID
ON TH.ProductID = P.ProductID
ON P.ProductSubcategoryID = PS.ProductSubcategoryID
GROUP BY
P.ProductID,
P.Name,
PS.ProductSubcategoryID,
PS.Name;


In which case the logical tree at the same point is:



Input tree 2



The two different syntaxes produce a different initial join order in this case. After cost-based optimization, both produce the same output plan shape:



Plan shape



There are detailed differences between the two plans, with the 'nested' syntax producing a plan with a somewhat lower estimated cost:



Plan 2



The two inputs took a slightly different path through the optimizer, so it isn't all that surprising there are slight differences.



In general, using different syntax will sometimes (definitely not always!) produce different plan results. There is no broad correlation between one syntax and better plans. Most people write and maintain queries using something like the non-nested join syntax, so it often makes practical sense to use that.



To summarize, my advice is to write queries using whichever syntax seems most natural (and maintainable!) to you and your peers. If you get a better plan for a specific query using a particular syntax, by all means use it - but be sure to test that you still get the better plan whenever you patch or upgrade SQL Server :)






share|improve this answer
























  • Per the other comments, it is difficult to provide adequate detail to answer my question (privacy); however, I think Paul has captured the gist... You may or may not get a different (better) plan based on syntax, and especially if your query has many joins as my real query did.

    – JeffInCO
    Oct 29 '13 at 14:05











  • @JeffInCO just a comment in case Paul hasn't noticed your later comment with: "There were 10 joined tables including inner and outer joins." With outer joins, it's easier to have a rewrite that it's not actually semantically equivalent (and maybe easier to hit an optimizer's blind spot on what queries it checks and considers for equivalence.)

    – ypercubeᵀᴹ
    Oct 29 '13 at 14:46











Your Answer








StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f52371%2fjoin-syntax-style-performance-consideration%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









6














In a world where the query optimizer considered all possible join orders, and contained all possible logical transformations, the syntax we use for our queries would not matter at all.



As it is, the optimizer generally uses heuristics to pick an initial join order and explores a number of join order rewrites from there. It does this to avoid excessive compilation time and resource usage. It doesn't take all that many joins for the number of possible combinations to become unreasonable to explore exhaustively.



To take an extreme example, 42 joins are enough to generate more alternatives than there are atoms in the observable universe. More realistically, even 7 tables are enough to produce 665,280 alternatives. Although this is not a mind-boggling number, it would still take very significant time (and memory) to explore those alternatives completely.



Although the heuristics are largely based on the type of join (inner, outer, cross...) and cardinality estimates, the textual order of the query can also have an impact. Sometimes, this is an optimizer limitation - NOT EXISTS clauses are not reordered, and outer join reordering is very limited. Even with simple inner joins, the interaction between textual order, initial join order heuristics, and optimizer internals can be difficult to predict with certainty.



To take an example using the AdventureWorks sample database, I can write a query using the a common syntax form as:



SELECT
P.Name,
PS.Name,
SUM(TH.Quantity),
SUM(INV.Quantity)
FROM Production.Product AS P
JOIN Production.ProductSubcategory AS PS
ON PS.ProductSubcategoryID = P.ProductSubcategoryID
JOIN Production.TransactionHistory AS TH
ON TH.ProductID = P.ProductID
JOIN Production.ProductInventory AS INV
ON INV.ProductID = P.ProductID
GROUP BY
P.ProductID,
P.Name,
PS.ProductSubcategoryID,
PS.Name;


Before cost-based optimization, the logical query tree looks like this (note the join order is not the same as the written order):



Logical tree



I can (carefully) rewrite the query to use 'nested' syntax:



SELECT
P.Name,
PS.Name,
SUM(TH.Quantity),
SUM(INV.Quantity)
FROM Production.ProductSubcategory AS PS
JOIN Production.Product AS P
JOIN Production.TransactionHistory AS TH
JOIN Production.ProductInventory AS INV
ON INV.ProductID = TH.ProductID
ON TH.ProductID = P.ProductID
ON P.ProductSubcategoryID = PS.ProductSubcategoryID
GROUP BY
P.ProductID,
P.Name,
PS.ProductSubcategoryID,
PS.Name;


In which case the logical tree at the same point is:



Input tree 2



The two different syntaxes produce a different initial join order in this case. After cost-based optimization, both produce the same output plan shape:



Plan shape



There are detailed differences between the two plans, with the 'nested' syntax producing a plan with a somewhat lower estimated cost:



Plan 2



The two inputs took a slightly different path through the optimizer, so it isn't all that surprising there are slight differences.



In general, using different syntax will sometimes (definitely not always!) produce different plan results. There is no broad correlation between one syntax and better plans. Most people write and maintain queries using something like the non-nested join syntax, so it often makes practical sense to use that.



To summarize, my advice is to write queries using whichever syntax seems most natural (and maintainable!) to you and your peers. If you get a better plan for a specific query using a particular syntax, by all means use it - but be sure to test that you still get the better plan whenever you patch or upgrade SQL Server :)






share|improve this answer
























  • Per the other comments, it is difficult to provide adequate detail to answer my question (privacy); however, I think Paul has captured the gist... You may or may not get a different (better) plan based on syntax, and especially if your query has many joins as my real query did.

    – JeffInCO
    Oct 29 '13 at 14:05











  • @JeffInCO just a comment in case Paul hasn't noticed your later comment with: "There were 10 joined tables including inner and outer joins." With outer joins, it's easier to have a rewrite that it's not actually semantically equivalent (and maybe easier to hit an optimizer's blind spot on what queries it checks and considers for equivalence.)

    – ypercubeᵀᴹ
    Oct 29 '13 at 14:46
















6














In a world where the query optimizer considered all possible join orders, and contained all possible logical transformations, the syntax we use for our queries would not matter at all.



As it is, the optimizer generally uses heuristics to pick an initial join order and explores a number of join order rewrites from there. It does this to avoid excessive compilation time and resource usage. It doesn't take all that many joins for the number of possible combinations to become unreasonable to explore exhaustively.



To take an extreme example, 42 joins are enough to generate more alternatives than there are atoms in the observable universe. More realistically, even 7 tables are enough to produce 665,280 alternatives. Although this is not a mind-boggling number, it would still take very significant time (and memory) to explore those alternatives completely.



Although the heuristics are largely based on the type of join (inner, outer, cross...) and cardinality estimates, the textual order of the query can also have an impact. Sometimes, this is an optimizer limitation - NOT EXISTS clauses are not reordered, and outer join reordering is very limited. Even with simple inner joins, the interaction between textual order, initial join order heuristics, and optimizer internals can be difficult to predict with certainty.



To take an example using the AdventureWorks sample database, I can write a query using the a common syntax form as:



SELECT
P.Name,
PS.Name,
SUM(TH.Quantity),
SUM(INV.Quantity)
FROM Production.Product AS P
JOIN Production.ProductSubcategory AS PS
ON PS.ProductSubcategoryID = P.ProductSubcategoryID
JOIN Production.TransactionHistory AS TH
ON TH.ProductID = P.ProductID
JOIN Production.ProductInventory AS INV
ON INV.ProductID = P.ProductID
GROUP BY
P.ProductID,
P.Name,
PS.ProductSubcategoryID,
PS.Name;


Before cost-based optimization, the logical query tree looks like this (note the join order is not the same as the written order):



Logical tree



I can (carefully) rewrite the query to use 'nested' syntax:



SELECT
P.Name,
PS.Name,
SUM(TH.Quantity),
SUM(INV.Quantity)
FROM Production.ProductSubcategory AS PS
JOIN Production.Product AS P
JOIN Production.TransactionHistory AS TH
JOIN Production.ProductInventory AS INV
ON INV.ProductID = TH.ProductID
ON TH.ProductID = P.ProductID
ON P.ProductSubcategoryID = PS.ProductSubcategoryID
GROUP BY
P.ProductID,
P.Name,
PS.ProductSubcategoryID,
PS.Name;


In which case the logical tree at the same point is:



Input tree 2



The two different syntaxes produce a different initial join order in this case. After cost-based optimization, both produce the same output plan shape:



Plan shape



There are detailed differences between the two plans, with the 'nested' syntax producing a plan with a somewhat lower estimated cost:



Plan 2



The two inputs took a slightly different path through the optimizer, so it isn't all that surprising there are slight differences.



In general, using different syntax will sometimes (definitely not always!) produce different plan results. There is no broad correlation between one syntax and better plans. Most people write and maintain queries using something like the non-nested join syntax, so it often makes practical sense to use that.



To summarize, my advice is to write queries using whichever syntax seems most natural (and maintainable!) to you and your peers. If you get a better plan for a specific query using a particular syntax, by all means use it - but be sure to test that you still get the better plan whenever you patch or upgrade SQL Server :)






share|improve this answer
























  • Per the other comments, it is difficult to provide adequate detail to answer my question (privacy); however, I think Paul has captured the gist... You may or may not get a different (better) plan based on syntax, and especially if your query has many joins as my real query did.

    – JeffInCO
    Oct 29 '13 at 14:05











  • @JeffInCO just a comment in case Paul hasn't noticed your later comment with: "There were 10 joined tables including inner and outer joins." With outer joins, it's easier to have a rewrite that it's not actually semantically equivalent (and maybe easier to hit an optimizer's blind spot on what queries it checks and considers for equivalence.)

    – ypercubeᵀᴹ
    Oct 29 '13 at 14:46














6












6








6







In a world where the query optimizer considered all possible join orders, and contained all possible logical transformations, the syntax we use for our queries would not matter at all.



As it is, the optimizer generally uses heuristics to pick an initial join order and explores a number of join order rewrites from there. It does this to avoid excessive compilation time and resource usage. It doesn't take all that many joins for the number of possible combinations to become unreasonable to explore exhaustively.



To take an extreme example, 42 joins are enough to generate more alternatives than there are atoms in the observable universe. More realistically, even 7 tables are enough to produce 665,280 alternatives. Although this is not a mind-boggling number, it would still take very significant time (and memory) to explore those alternatives completely.



Although the heuristics are largely based on the type of join (inner, outer, cross...) and cardinality estimates, the textual order of the query can also have an impact. Sometimes, this is an optimizer limitation - NOT EXISTS clauses are not reordered, and outer join reordering is very limited. Even with simple inner joins, the interaction between textual order, initial join order heuristics, and optimizer internals can be difficult to predict with certainty.



To take an example using the AdventureWorks sample database, I can write a query using the a common syntax form as:



SELECT
P.Name,
PS.Name,
SUM(TH.Quantity),
SUM(INV.Quantity)
FROM Production.Product AS P
JOIN Production.ProductSubcategory AS PS
ON PS.ProductSubcategoryID = P.ProductSubcategoryID
JOIN Production.TransactionHistory AS TH
ON TH.ProductID = P.ProductID
JOIN Production.ProductInventory AS INV
ON INV.ProductID = P.ProductID
GROUP BY
P.ProductID,
P.Name,
PS.ProductSubcategoryID,
PS.Name;


Before cost-based optimization, the logical query tree looks like this (note the join order is not the same as the written order):



Logical tree



I can (carefully) rewrite the query to use 'nested' syntax:



SELECT
P.Name,
PS.Name,
SUM(TH.Quantity),
SUM(INV.Quantity)
FROM Production.ProductSubcategory AS PS
JOIN Production.Product AS P
JOIN Production.TransactionHistory AS TH
JOIN Production.ProductInventory AS INV
ON INV.ProductID = TH.ProductID
ON TH.ProductID = P.ProductID
ON P.ProductSubcategoryID = PS.ProductSubcategoryID
GROUP BY
P.ProductID,
P.Name,
PS.ProductSubcategoryID,
PS.Name;


In which case the logical tree at the same point is:



Input tree 2



The two different syntaxes produce a different initial join order in this case. After cost-based optimization, both produce the same output plan shape:



Plan shape



There are detailed differences between the two plans, with the 'nested' syntax producing a plan with a somewhat lower estimated cost:



Plan 2



The two inputs took a slightly different path through the optimizer, so it isn't all that surprising there are slight differences.



In general, using different syntax will sometimes (definitely not always!) produce different plan results. There is no broad correlation between one syntax and better plans. Most people write and maintain queries using something like the non-nested join syntax, so it often makes practical sense to use that.



To summarize, my advice is to write queries using whichever syntax seems most natural (and maintainable!) to you and your peers. If you get a better plan for a specific query using a particular syntax, by all means use it - but be sure to test that you still get the better plan whenever you patch or upgrade SQL Server :)






share|improve this answer













In a world where the query optimizer considered all possible join orders, and contained all possible logical transformations, the syntax we use for our queries would not matter at all.



As it is, the optimizer generally uses heuristics to pick an initial join order and explores a number of join order rewrites from there. It does this to avoid excessive compilation time and resource usage. It doesn't take all that many joins for the number of possible combinations to become unreasonable to explore exhaustively.



To take an extreme example, 42 joins are enough to generate more alternatives than there are atoms in the observable universe. More realistically, even 7 tables are enough to produce 665,280 alternatives. Although this is not a mind-boggling number, it would still take very significant time (and memory) to explore those alternatives completely.



Although the heuristics are largely based on the type of join (inner, outer, cross...) and cardinality estimates, the textual order of the query can also have an impact. Sometimes, this is an optimizer limitation - NOT EXISTS clauses are not reordered, and outer join reordering is very limited. Even with simple inner joins, the interaction between textual order, initial join order heuristics, and optimizer internals can be difficult to predict with certainty.



To take an example using the AdventureWorks sample database, I can write a query using the a common syntax form as:



SELECT
P.Name,
PS.Name,
SUM(TH.Quantity),
SUM(INV.Quantity)
FROM Production.Product AS P
JOIN Production.ProductSubcategory AS PS
ON PS.ProductSubcategoryID = P.ProductSubcategoryID
JOIN Production.TransactionHistory AS TH
ON TH.ProductID = P.ProductID
JOIN Production.ProductInventory AS INV
ON INV.ProductID = P.ProductID
GROUP BY
P.ProductID,
P.Name,
PS.ProductSubcategoryID,
PS.Name;


Before cost-based optimization, the logical query tree looks like this (note the join order is not the same as the written order):



Logical tree



I can (carefully) rewrite the query to use 'nested' syntax:



SELECT
P.Name,
PS.Name,
SUM(TH.Quantity),
SUM(INV.Quantity)
FROM Production.ProductSubcategory AS PS
JOIN Production.Product AS P
JOIN Production.TransactionHistory AS TH
JOIN Production.ProductInventory AS INV
ON INV.ProductID = TH.ProductID
ON TH.ProductID = P.ProductID
ON P.ProductSubcategoryID = PS.ProductSubcategoryID
GROUP BY
P.ProductID,
P.Name,
PS.ProductSubcategoryID,
PS.Name;


In which case the logical tree at the same point is:



Input tree 2



The two different syntaxes produce a different initial join order in this case. After cost-based optimization, both produce the same output plan shape:



Plan shape



There are detailed differences between the two plans, with the 'nested' syntax producing a plan with a somewhat lower estimated cost:



Plan 2



The two inputs took a slightly different path through the optimizer, so it isn't all that surprising there are slight differences.



In general, using different syntax will sometimes (definitely not always!) produce different plan results. There is no broad correlation between one syntax and better plans. Most people write and maintain queries using something like the non-nested join syntax, so it often makes practical sense to use that.



To summarize, my advice is to write queries using whichever syntax seems most natural (and maintainable!) to you and your peers. If you get a better plan for a specific query using a particular syntax, by all means use it - but be sure to test that you still get the better plan whenever you patch or upgrade SQL Server :)







share|improve this answer












share|improve this answer



share|improve this answer










answered Oct 29 '13 at 2:11









Paul WhitePaul White

52.7k14281456




52.7k14281456













  • Per the other comments, it is difficult to provide adequate detail to answer my question (privacy); however, I think Paul has captured the gist... You may or may not get a different (better) plan based on syntax, and especially if your query has many joins as my real query did.

    – JeffInCO
    Oct 29 '13 at 14:05











  • @JeffInCO just a comment in case Paul hasn't noticed your later comment with: "There were 10 joined tables including inner and outer joins." With outer joins, it's easier to have a rewrite that it's not actually semantically equivalent (and maybe easier to hit an optimizer's blind spot on what queries it checks and considers for equivalence.)

    – ypercubeᵀᴹ
    Oct 29 '13 at 14:46



















  • Per the other comments, it is difficult to provide adequate detail to answer my question (privacy); however, I think Paul has captured the gist... You may or may not get a different (better) plan based on syntax, and especially if your query has many joins as my real query did.

    – JeffInCO
    Oct 29 '13 at 14:05











  • @JeffInCO just a comment in case Paul hasn't noticed your later comment with: "There were 10 joined tables including inner and outer joins." With outer joins, it's easier to have a rewrite that it's not actually semantically equivalent (and maybe easier to hit an optimizer's blind spot on what queries it checks and considers for equivalence.)

    – ypercubeᵀᴹ
    Oct 29 '13 at 14:46

















Per the other comments, it is difficult to provide adequate detail to answer my question (privacy); however, I think Paul has captured the gist... You may or may not get a different (better) plan based on syntax, and especially if your query has many joins as my real query did.

– JeffInCO
Oct 29 '13 at 14:05





Per the other comments, it is difficult to provide adequate detail to answer my question (privacy); however, I think Paul has captured the gist... You may or may not get a different (better) plan based on syntax, and especially if your query has many joins as my real query did.

– JeffInCO
Oct 29 '13 at 14:05













@JeffInCO just a comment in case Paul hasn't noticed your later comment with: "There were 10 joined tables including inner and outer joins." With outer joins, it's easier to have a rewrite that it's not actually semantically equivalent (and maybe easier to hit an optimizer's blind spot on what queries it checks and considers for equivalence.)

– ypercubeᵀᴹ
Oct 29 '13 at 14:46





@JeffInCO just a comment in case Paul hasn't noticed your later comment with: "There were 10 joined tables including inner and outer joins." With outer joins, it's easier to have a rewrite that it's not actually semantically equivalent (and maybe easier to hit an optimizer's blind spot on what queries it checks and considers for equivalence.)

– ypercubeᵀᴹ
Oct 29 '13 at 14:46


















draft saved

draft discarded




















































Thanks for contributing an answer to Database Administrators Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f52371%2fjoin-syntax-style-performance-consideration%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

ORA-01691 (unable to extend lob segment) even though my tablespace has AUTOEXTEND onORA-01692: unable to...

Always On Availability groups resolving state after failover - Remote harden of transaction...

Circunscripción electoral de Guipúzcoa Referencias Menú de navegaciónLas claves del sistema electoral en...