join syntax / style performance considerationWhy does the location of a join change performance?Syntax of...
How to remove lines through the legend markers in ListPlot?
Early credit roll before the end of the film
Find some digits of factorial 17
How to prevent users from executing commands through browser URL
Why would the Pakistan airspace closure cancel flights not headed to Pakistan itself?
How to deal with an incendiary email that was recalled
Is it a fallacy if someone claims they need an explanation for every word of your argument to the point where they don't understand common terms?
Pronunciation of umlaut vowels in the history of German
Why is mind meld hard for T'pol in Star Trek: Enterprise?
Am I a Rude Number?
Would the Vulcan nerve pinch work on a Borg drone?
Can a person refuse a presidential pardon?
One Half of Ten; A Riddle
Avoiding morning and evening handshakes
How long is the D&D Starter Set campaign?
Eww, those bytes are gross
How much mayhem could I cause as a sentient fish?
Why does the Spectator have the Create Food and Water trait, instead of simply not requiring food and water?
How to count the characters of jar files by wc
Table formatting top left corner caption
Difference between i++ and (i)++ in C
Why are the books in the Game of Thrones citadel library shelved spine inwards?
Finding a mistake using Mayer-Vietoris
If I deleted a game I lost the disc for, can I reinstall it digitally?
join syntax / style performance consideration
Why does the location of a join change performance?Syntax of INNER JOIN nested inside OUTER JOIN vs. query resultsImproving SQL Query joins performanceWhy does join syntax with stacked on clauses work?Syntax of INNER JOIN nested inside OUTER JOIN vs. query resultsView designer strange join syntaxConverting old SQL Server outer join syntaxEnable TRY style error handlingConsideration for a little bit heavy databaseCheck for existing matches to find the field their grouped-byNeed Help Refining A Queryjoin performanceStrange query plan when using OR in JOIN clause - Constant scan for every row in table
We recently found on one of our stored procedures that we get significant performance improvement by changing the join syntax/style of our query from this...
SELECT b.bla, c.foo, d.bar
FROM dbo.TableB b
JOIN dbo.TableC c
JOIN dbo.TableD d -- <-- Nested join syntax
ON d.yyy = c.yyy
ON c.xxx = b.xxx
To this...
SELECT b.bla, c.foo, d.bar
FROM dbo.TableB b
JOIN dbo.TableC c
ON c.xxx = b.xxx
JOIN dbo.TableD d -- <-- Regular way
ON d.yyy = c.yyy
Note: in the real query, there were 10 joined tables including inner and outer joins. The tables were not huge in terms of sql data. No aggregates. There was a DISTINCT on the output. All joins were to a primary key, but the foreign key was not necessarily indexed.
We will certainly change our ways, but I'm still curious as to the proper 'guidance' on such style. I've often used the 'indented' style to indicate a 'more readable' join for such things as lookup tables.
sql-server sql-server-2012
|
show 2 more comments
We recently found on one of our stored procedures that we get significant performance improvement by changing the join syntax/style of our query from this...
SELECT b.bla, c.foo, d.bar
FROM dbo.TableB b
JOIN dbo.TableC c
JOIN dbo.TableD d -- <-- Nested join syntax
ON d.yyy = c.yyy
ON c.xxx = b.xxx
To this...
SELECT b.bla, c.foo, d.bar
FROM dbo.TableB b
JOIN dbo.TableC c
ON c.xxx = b.xxx
JOIN dbo.TableD d -- <-- Regular way
ON d.yyy = c.yyy
Note: in the real query, there were 10 joined tables including inner and outer joins. The tables were not huge in terms of sql data. No aggregates. There was a DISTINCT on the output. All joins were to a primary key, but the foreign key was not necessarily indexed.
We will certainly change our ways, but I'm still curious as to the proper 'guidance' on such style. I've often used the 'indented' style to indicate a 'more readable' join for such things as lookup tables.
sql-server sql-server-2012
Were there outer joins involved or only inner joins?
– ypercubeᵀᴹ
Oct 28 '13 at 22:20
1
I don't think I would ever write the first form on purpose. It just doesn't look right.
– Aaron Bertrand♦
Oct 28 '13 at 22:26
@AaronBertrand I think that join style comes from Access. It was a revelation for me when I realized I didn't HAVE to write queries that way!
– Max Vernon
Oct 28 '13 at 22:30
2
I'd be really surprised to see these perform differently, if they're semantically the same. Did you compare the actual execution plans of the two versions of the query? Were they different? Perhaps you think you got better performance because of a syntax change, but it actually happened because the old query was stuck on a bad plan, and changing the procedure forced a recompile.
– Aaron Bertrand♦
Oct 28 '13 at 22:32
I was just going to add I'd love to see the two plans! If there really is a performance difference, it would be helpful to understand why. Perhaps the plans are the same, just the data was cached for the 2nd run?
– Max Vernon
Oct 28 '13 at 22:44
|
show 2 more comments
We recently found on one of our stored procedures that we get significant performance improvement by changing the join syntax/style of our query from this...
SELECT b.bla, c.foo, d.bar
FROM dbo.TableB b
JOIN dbo.TableC c
JOIN dbo.TableD d -- <-- Nested join syntax
ON d.yyy = c.yyy
ON c.xxx = b.xxx
To this...
SELECT b.bla, c.foo, d.bar
FROM dbo.TableB b
JOIN dbo.TableC c
ON c.xxx = b.xxx
JOIN dbo.TableD d -- <-- Regular way
ON d.yyy = c.yyy
Note: in the real query, there were 10 joined tables including inner and outer joins. The tables were not huge in terms of sql data. No aggregates. There was a DISTINCT on the output. All joins were to a primary key, but the foreign key was not necessarily indexed.
We will certainly change our ways, but I'm still curious as to the proper 'guidance' on such style. I've often used the 'indented' style to indicate a 'more readable' join for such things as lookup tables.
sql-server sql-server-2012
We recently found on one of our stored procedures that we get significant performance improvement by changing the join syntax/style of our query from this...
SELECT b.bla, c.foo, d.bar
FROM dbo.TableB b
JOIN dbo.TableC c
JOIN dbo.TableD d -- <-- Nested join syntax
ON d.yyy = c.yyy
ON c.xxx = b.xxx
To this...
SELECT b.bla, c.foo, d.bar
FROM dbo.TableB b
JOIN dbo.TableC c
ON c.xxx = b.xxx
JOIN dbo.TableD d -- <-- Regular way
ON d.yyy = c.yyy
Note: in the real query, there were 10 joined tables including inner and outer joins. The tables were not huge in terms of sql data. No aggregates. There was a DISTINCT on the output. All joins were to a primary key, but the foreign key was not necessarily indexed.
We will certainly change our ways, but I'm still curious as to the proper 'guidance' on such style. I've often used the 'indented' style to indicate a 'more readable' join for such things as lookup tables.
sql-server sql-server-2012
sql-server sql-server-2012
edited 11 mins ago
jadarnel27
5,94311938
5,94311938
asked Oct 28 '13 at 22:13
JeffInCOJeffInCO
170115
170115
Were there outer joins involved or only inner joins?
– ypercubeᵀᴹ
Oct 28 '13 at 22:20
1
I don't think I would ever write the first form on purpose. It just doesn't look right.
– Aaron Bertrand♦
Oct 28 '13 at 22:26
@AaronBertrand I think that join style comes from Access. It was a revelation for me when I realized I didn't HAVE to write queries that way!
– Max Vernon
Oct 28 '13 at 22:30
2
I'd be really surprised to see these perform differently, if they're semantically the same. Did you compare the actual execution plans of the two versions of the query? Were they different? Perhaps you think you got better performance because of a syntax change, but it actually happened because the old query was stuck on a bad plan, and changing the procedure forced a recompile.
– Aaron Bertrand♦
Oct 28 '13 at 22:32
I was just going to add I'd love to see the two plans! If there really is a performance difference, it would be helpful to understand why. Perhaps the plans are the same, just the data was cached for the 2nd run?
– Max Vernon
Oct 28 '13 at 22:44
|
show 2 more comments
Were there outer joins involved or only inner joins?
– ypercubeᵀᴹ
Oct 28 '13 at 22:20
1
I don't think I would ever write the first form on purpose. It just doesn't look right.
– Aaron Bertrand♦
Oct 28 '13 at 22:26
@AaronBertrand I think that join style comes from Access. It was a revelation for me when I realized I didn't HAVE to write queries that way!
– Max Vernon
Oct 28 '13 at 22:30
2
I'd be really surprised to see these perform differently, if they're semantically the same. Did you compare the actual execution plans of the two versions of the query? Were they different? Perhaps you think you got better performance because of a syntax change, but it actually happened because the old query was stuck on a bad plan, and changing the procedure forced a recompile.
– Aaron Bertrand♦
Oct 28 '13 at 22:32
I was just going to add I'd love to see the two plans! If there really is a performance difference, it would be helpful to understand why. Perhaps the plans are the same, just the data was cached for the 2nd run?
– Max Vernon
Oct 28 '13 at 22:44
Were there outer joins involved or only inner joins?
– ypercubeᵀᴹ
Oct 28 '13 at 22:20
Were there outer joins involved or only inner joins?
– ypercubeᵀᴹ
Oct 28 '13 at 22:20
1
1
I don't think I would ever write the first form on purpose. It just doesn't look right.
– Aaron Bertrand♦
Oct 28 '13 at 22:26
I don't think I would ever write the first form on purpose. It just doesn't look right.
– Aaron Bertrand♦
Oct 28 '13 at 22:26
@AaronBertrand I think that join style comes from Access. It was a revelation for me when I realized I didn't HAVE to write queries that way!
– Max Vernon
Oct 28 '13 at 22:30
@AaronBertrand I think that join style comes from Access. It was a revelation for me when I realized I didn't HAVE to write queries that way!
– Max Vernon
Oct 28 '13 at 22:30
2
2
I'd be really surprised to see these perform differently, if they're semantically the same. Did you compare the actual execution plans of the two versions of the query? Were they different? Perhaps you think you got better performance because of a syntax change, but it actually happened because the old query was stuck on a bad plan, and changing the procedure forced a recompile.
– Aaron Bertrand♦
Oct 28 '13 at 22:32
I'd be really surprised to see these perform differently, if they're semantically the same. Did you compare the actual execution plans of the two versions of the query? Were they different? Perhaps you think you got better performance because of a syntax change, but it actually happened because the old query was stuck on a bad plan, and changing the procedure forced a recompile.
– Aaron Bertrand♦
Oct 28 '13 at 22:32
I was just going to add I'd love to see the two plans! If there really is a performance difference, it would be helpful to understand why. Perhaps the plans are the same, just the data was cached for the 2nd run?
– Max Vernon
Oct 28 '13 at 22:44
I was just going to add I'd love to see the two plans! If there really is a performance difference, it would be helpful to understand why. Perhaps the plans are the same, just the data was cached for the 2nd run?
– Max Vernon
Oct 28 '13 at 22:44
|
show 2 more comments
1 Answer
1
active
oldest
votes
In a world where the query optimizer considered all possible join orders, and contained all possible logical transformations, the syntax we use for our queries would not matter at all.
As it is, the optimizer generally uses heuristics to pick an initial join order and explores a number of join order rewrites from there. It does this to avoid excessive compilation time and resource usage. It doesn't take all that many joins for the number of possible combinations to become unreasonable to explore exhaustively.
To take an extreme example, 42 joins are enough to generate more alternatives than there are atoms in the observable universe. More realistically, even 7 tables are enough to produce 665,280 alternatives. Although this is not a mind-boggling number, it would still take very significant time (and memory) to explore those alternatives completely.
Although the heuristics are largely based on the type of join (inner, outer, cross...) and cardinality estimates, the textual order of the query can also have an impact. Sometimes, this is an optimizer limitation - NOT EXISTS
clauses are not reordered, and outer join reordering is very limited. Even with simple inner joins, the interaction between textual order, initial join order heuristics, and optimizer internals can be difficult to predict with certainty.
To take an example using the AdventureWorks sample database, I can write a query using the a common syntax form as:
SELECT
P.Name,
PS.Name,
SUM(TH.Quantity),
SUM(INV.Quantity)
FROM Production.Product AS P
JOIN Production.ProductSubcategory AS PS
ON PS.ProductSubcategoryID = P.ProductSubcategoryID
JOIN Production.TransactionHistory AS TH
ON TH.ProductID = P.ProductID
JOIN Production.ProductInventory AS INV
ON INV.ProductID = P.ProductID
GROUP BY
P.ProductID,
P.Name,
PS.ProductSubcategoryID,
PS.Name;
Before cost-based optimization, the logical query tree looks like this (note the join order is not the same as the written order):
I can (carefully) rewrite the query to use 'nested' syntax:
SELECT
P.Name,
PS.Name,
SUM(TH.Quantity),
SUM(INV.Quantity)
FROM Production.ProductSubcategory AS PS
JOIN Production.Product AS P
JOIN Production.TransactionHistory AS TH
JOIN Production.ProductInventory AS INV
ON INV.ProductID = TH.ProductID
ON TH.ProductID = P.ProductID
ON P.ProductSubcategoryID = PS.ProductSubcategoryID
GROUP BY
P.ProductID,
P.Name,
PS.ProductSubcategoryID,
PS.Name;
In which case the logical tree at the same point is:
The two different syntaxes produce a different initial join order in this case. After cost-based optimization, both produce the same output plan shape:
There are detailed differences between the two plans, with the 'nested' syntax producing a plan with a somewhat lower estimated cost:
The two inputs took a slightly different path through the optimizer, so it isn't all that surprising there are slight differences.
In general, using different syntax will sometimes (definitely not always!) produce different plan results. There is no broad correlation between one syntax and better plans. Most people write and maintain queries using something like the non-nested join syntax, so it often makes practical sense to use that.
To summarize, my advice is to write queries using whichever syntax seems most natural (and maintainable!) to you and your peers. If you get a better plan for a specific query using a particular syntax, by all means use it - but be sure to test that you still get the better plan whenever you patch or upgrade SQL Server :)
Per the other comments, it is difficult to provide adequate detail to answer my question (privacy); however, I think Paul has captured the gist... You may or may not get a different (better) plan based on syntax, and especially if your query has many joins as my real query did.
– JeffInCO
Oct 29 '13 at 14:05
@JeffInCO just a comment in case Paul hasn't noticed your later comment with: "There were 10 joined tables including inner and outer joins." With outer joins, it's easier to have a rewrite that it's not actually semantically equivalent (and maybe easier to hit an optimizer's blind spot on what queries it checks and considers for equivalence.)
– ypercubeᵀᴹ
Oct 29 '13 at 14:46
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f52371%2fjoin-syntax-style-performance-consideration%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
In a world where the query optimizer considered all possible join orders, and contained all possible logical transformations, the syntax we use for our queries would not matter at all.
As it is, the optimizer generally uses heuristics to pick an initial join order and explores a number of join order rewrites from there. It does this to avoid excessive compilation time and resource usage. It doesn't take all that many joins for the number of possible combinations to become unreasonable to explore exhaustively.
To take an extreme example, 42 joins are enough to generate more alternatives than there are atoms in the observable universe. More realistically, even 7 tables are enough to produce 665,280 alternatives. Although this is not a mind-boggling number, it would still take very significant time (and memory) to explore those alternatives completely.
Although the heuristics are largely based on the type of join (inner, outer, cross...) and cardinality estimates, the textual order of the query can also have an impact. Sometimes, this is an optimizer limitation - NOT EXISTS
clauses are not reordered, and outer join reordering is very limited. Even with simple inner joins, the interaction between textual order, initial join order heuristics, and optimizer internals can be difficult to predict with certainty.
To take an example using the AdventureWorks sample database, I can write a query using the a common syntax form as:
SELECT
P.Name,
PS.Name,
SUM(TH.Quantity),
SUM(INV.Quantity)
FROM Production.Product AS P
JOIN Production.ProductSubcategory AS PS
ON PS.ProductSubcategoryID = P.ProductSubcategoryID
JOIN Production.TransactionHistory AS TH
ON TH.ProductID = P.ProductID
JOIN Production.ProductInventory AS INV
ON INV.ProductID = P.ProductID
GROUP BY
P.ProductID,
P.Name,
PS.ProductSubcategoryID,
PS.Name;
Before cost-based optimization, the logical query tree looks like this (note the join order is not the same as the written order):
I can (carefully) rewrite the query to use 'nested' syntax:
SELECT
P.Name,
PS.Name,
SUM(TH.Quantity),
SUM(INV.Quantity)
FROM Production.ProductSubcategory AS PS
JOIN Production.Product AS P
JOIN Production.TransactionHistory AS TH
JOIN Production.ProductInventory AS INV
ON INV.ProductID = TH.ProductID
ON TH.ProductID = P.ProductID
ON P.ProductSubcategoryID = PS.ProductSubcategoryID
GROUP BY
P.ProductID,
P.Name,
PS.ProductSubcategoryID,
PS.Name;
In which case the logical tree at the same point is:
The two different syntaxes produce a different initial join order in this case. After cost-based optimization, both produce the same output plan shape:
There are detailed differences between the two plans, with the 'nested' syntax producing a plan with a somewhat lower estimated cost:
The two inputs took a slightly different path through the optimizer, so it isn't all that surprising there are slight differences.
In general, using different syntax will sometimes (definitely not always!) produce different plan results. There is no broad correlation between one syntax and better plans. Most people write and maintain queries using something like the non-nested join syntax, so it often makes practical sense to use that.
To summarize, my advice is to write queries using whichever syntax seems most natural (and maintainable!) to you and your peers. If you get a better plan for a specific query using a particular syntax, by all means use it - but be sure to test that you still get the better plan whenever you patch or upgrade SQL Server :)
Per the other comments, it is difficult to provide adequate detail to answer my question (privacy); however, I think Paul has captured the gist... You may or may not get a different (better) plan based on syntax, and especially if your query has many joins as my real query did.
– JeffInCO
Oct 29 '13 at 14:05
@JeffInCO just a comment in case Paul hasn't noticed your later comment with: "There were 10 joined tables including inner and outer joins." With outer joins, it's easier to have a rewrite that it's not actually semantically equivalent (and maybe easier to hit an optimizer's blind spot on what queries it checks and considers for equivalence.)
– ypercubeᵀᴹ
Oct 29 '13 at 14:46
add a comment |
In a world where the query optimizer considered all possible join orders, and contained all possible logical transformations, the syntax we use for our queries would not matter at all.
As it is, the optimizer generally uses heuristics to pick an initial join order and explores a number of join order rewrites from there. It does this to avoid excessive compilation time and resource usage. It doesn't take all that many joins for the number of possible combinations to become unreasonable to explore exhaustively.
To take an extreme example, 42 joins are enough to generate more alternatives than there are atoms in the observable universe. More realistically, even 7 tables are enough to produce 665,280 alternatives. Although this is not a mind-boggling number, it would still take very significant time (and memory) to explore those alternatives completely.
Although the heuristics are largely based on the type of join (inner, outer, cross...) and cardinality estimates, the textual order of the query can also have an impact. Sometimes, this is an optimizer limitation - NOT EXISTS
clauses are not reordered, and outer join reordering is very limited. Even with simple inner joins, the interaction between textual order, initial join order heuristics, and optimizer internals can be difficult to predict with certainty.
To take an example using the AdventureWorks sample database, I can write a query using the a common syntax form as:
SELECT
P.Name,
PS.Name,
SUM(TH.Quantity),
SUM(INV.Quantity)
FROM Production.Product AS P
JOIN Production.ProductSubcategory AS PS
ON PS.ProductSubcategoryID = P.ProductSubcategoryID
JOIN Production.TransactionHistory AS TH
ON TH.ProductID = P.ProductID
JOIN Production.ProductInventory AS INV
ON INV.ProductID = P.ProductID
GROUP BY
P.ProductID,
P.Name,
PS.ProductSubcategoryID,
PS.Name;
Before cost-based optimization, the logical query tree looks like this (note the join order is not the same as the written order):
I can (carefully) rewrite the query to use 'nested' syntax:
SELECT
P.Name,
PS.Name,
SUM(TH.Quantity),
SUM(INV.Quantity)
FROM Production.ProductSubcategory AS PS
JOIN Production.Product AS P
JOIN Production.TransactionHistory AS TH
JOIN Production.ProductInventory AS INV
ON INV.ProductID = TH.ProductID
ON TH.ProductID = P.ProductID
ON P.ProductSubcategoryID = PS.ProductSubcategoryID
GROUP BY
P.ProductID,
P.Name,
PS.ProductSubcategoryID,
PS.Name;
In which case the logical tree at the same point is:
The two different syntaxes produce a different initial join order in this case. After cost-based optimization, both produce the same output plan shape:
There are detailed differences between the two plans, with the 'nested' syntax producing a plan with a somewhat lower estimated cost:
The two inputs took a slightly different path through the optimizer, so it isn't all that surprising there are slight differences.
In general, using different syntax will sometimes (definitely not always!) produce different plan results. There is no broad correlation between one syntax and better plans. Most people write and maintain queries using something like the non-nested join syntax, so it often makes practical sense to use that.
To summarize, my advice is to write queries using whichever syntax seems most natural (and maintainable!) to you and your peers. If you get a better plan for a specific query using a particular syntax, by all means use it - but be sure to test that you still get the better plan whenever you patch or upgrade SQL Server :)
Per the other comments, it is difficult to provide adequate detail to answer my question (privacy); however, I think Paul has captured the gist... You may or may not get a different (better) plan based on syntax, and especially if your query has many joins as my real query did.
– JeffInCO
Oct 29 '13 at 14:05
@JeffInCO just a comment in case Paul hasn't noticed your later comment with: "There were 10 joined tables including inner and outer joins." With outer joins, it's easier to have a rewrite that it's not actually semantically equivalent (and maybe easier to hit an optimizer's blind spot on what queries it checks and considers for equivalence.)
– ypercubeᵀᴹ
Oct 29 '13 at 14:46
add a comment |
In a world where the query optimizer considered all possible join orders, and contained all possible logical transformations, the syntax we use for our queries would not matter at all.
As it is, the optimizer generally uses heuristics to pick an initial join order and explores a number of join order rewrites from there. It does this to avoid excessive compilation time and resource usage. It doesn't take all that many joins for the number of possible combinations to become unreasonable to explore exhaustively.
To take an extreme example, 42 joins are enough to generate more alternatives than there are atoms in the observable universe. More realistically, even 7 tables are enough to produce 665,280 alternatives. Although this is not a mind-boggling number, it would still take very significant time (and memory) to explore those alternatives completely.
Although the heuristics are largely based on the type of join (inner, outer, cross...) and cardinality estimates, the textual order of the query can also have an impact. Sometimes, this is an optimizer limitation - NOT EXISTS
clauses are not reordered, and outer join reordering is very limited. Even with simple inner joins, the interaction between textual order, initial join order heuristics, and optimizer internals can be difficult to predict with certainty.
To take an example using the AdventureWorks sample database, I can write a query using the a common syntax form as:
SELECT
P.Name,
PS.Name,
SUM(TH.Quantity),
SUM(INV.Quantity)
FROM Production.Product AS P
JOIN Production.ProductSubcategory AS PS
ON PS.ProductSubcategoryID = P.ProductSubcategoryID
JOIN Production.TransactionHistory AS TH
ON TH.ProductID = P.ProductID
JOIN Production.ProductInventory AS INV
ON INV.ProductID = P.ProductID
GROUP BY
P.ProductID,
P.Name,
PS.ProductSubcategoryID,
PS.Name;
Before cost-based optimization, the logical query tree looks like this (note the join order is not the same as the written order):
I can (carefully) rewrite the query to use 'nested' syntax:
SELECT
P.Name,
PS.Name,
SUM(TH.Quantity),
SUM(INV.Quantity)
FROM Production.ProductSubcategory AS PS
JOIN Production.Product AS P
JOIN Production.TransactionHistory AS TH
JOIN Production.ProductInventory AS INV
ON INV.ProductID = TH.ProductID
ON TH.ProductID = P.ProductID
ON P.ProductSubcategoryID = PS.ProductSubcategoryID
GROUP BY
P.ProductID,
P.Name,
PS.ProductSubcategoryID,
PS.Name;
In which case the logical tree at the same point is:
The two different syntaxes produce a different initial join order in this case. After cost-based optimization, both produce the same output plan shape:
There are detailed differences between the two plans, with the 'nested' syntax producing a plan with a somewhat lower estimated cost:
The two inputs took a slightly different path through the optimizer, so it isn't all that surprising there are slight differences.
In general, using different syntax will sometimes (definitely not always!) produce different plan results. There is no broad correlation between one syntax and better plans. Most people write and maintain queries using something like the non-nested join syntax, so it often makes practical sense to use that.
To summarize, my advice is to write queries using whichever syntax seems most natural (and maintainable!) to you and your peers. If you get a better plan for a specific query using a particular syntax, by all means use it - but be sure to test that you still get the better plan whenever you patch or upgrade SQL Server :)
In a world where the query optimizer considered all possible join orders, and contained all possible logical transformations, the syntax we use for our queries would not matter at all.
As it is, the optimizer generally uses heuristics to pick an initial join order and explores a number of join order rewrites from there. It does this to avoid excessive compilation time and resource usage. It doesn't take all that many joins for the number of possible combinations to become unreasonable to explore exhaustively.
To take an extreme example, 42 joins are enough to generate more alternatives than there are atoms in the observable universe. More realistically, even 7 tables are enough to produce 665,280 alternatives. Although this is not a mind-boggling number, it would still take very significant time (and memory) to explore those alternatives completely.
Although the heuristics are largely based on the type of join (inner, outer, cross...) and cardinality estimates, the textual order of the query can also have an impact. Sometimes, this is an optimizer limitation - NOT EXISTS
clauses are not reordered, and outer join reordering is very limited. Even with simple inner joins, the interaction between textual order, initial join order heuristics, and optimizer internals can be difficult to predict with certainty.
To take an example using the AdventureWorks sample database, I can write a query using the a common syntax form as:
SELECT
P.Name,
PS.Name,
SUM(TH.Quantity),
SUM(INV.Quantity)
FROM Production.Product AS P
JOIN Production.ProductSubcategory AS PS
ON PS.ProductSubcategoryID = P.ProductSubcategoryID
JOIN Production.TransactionHistory AS TH
ON TH.ProductID = P.ProductID
JOIN Production.ProductInventory AS INV
ON INV.ProductID = P.ProductID
GROUP BY
P.ProductID,
P.Name,
PS.ProductSubcategoryID,
PS.Name;
Before cost-based optimization, the logical query tree looks like this (note the join order is not the same as the written order):
I can (carefully) rewrite the query to use 'nested' syntax:
SELECT
P.Name,
PS.Name,
SUM(TH.Quantity),
SUM(INV.Quantity)
FROM Production.ProductSubcategory AS PS
JOIN Production.Product AS P
JOIN Production.TransactionHistory AS TH
JOIN Production.ProductInventory AS INV
ON INV.ProductID = TH.ProductID
ON TH.ProductID = P.ProductID
ON P.ProductSubcategoryID = PS.ProductSubcategoryID
GROUP BY
P.ProductID,
P.Name,
PS.ProductSubcategoryID,
PS.Name;
In which case the logical tree at the same point is:
The two different syntaxes produce a different initial join order in this case. After cost-based optimization, both produce the same output plan shape:
There are detailed differences between the two plans, with the 'nested' syntax producing a plan with a somewhat lower estimated cost:
The two inputs took a slightly different path through the optimizer, so it isn't all that surprising there are slight differences.
In general, using different syntax will sometimes (definitely not always!) produce different plan results. There is no broad correlation between one syntax and better plans. Most people write and maintain queries using something like the non-nested join syntax, so it often makes practical sense to use that.
To summarize, my advice is to write queries using whichever syntax seems most natural (and maintainable!) to you and your peers. If you get a better plan for a specific query using a particular syntax, by all means use it - but be sure to test that you still get the better plan whenever you patch or upgrade SQL Server :)
answered Oct 29 '13 at 2:11
Paul White♦Paul White
52.7k14281456
52.7k14281456
Per the other comments, it is difficult to provide adequate detail to answer my question (privacy); however, I think Paul has captured the gist... You may or may not get a different (better) plan based on syntax, and especially if your query has many joins as my real query did.
– JeffInCO
Oct 29 '13 at 14:05
@JeffInCO just a comment in case Paul hasn't noticed your later comment with: "There were 10 joined tables including inner and outer joins." With outer joins, it's easier to have a rewrite that it's not actually semantically equivalent (and maybe easier to hit an optimizer's blind spot on what queries it checks and considers for equivalence.)
– ypercubeᵀᴹ
Oct 29 '13 at 14:46
add a comment |
Per the other comments, it is difficult to provide adequate detail to answer my question (privacy); however, I think Paul has captured the gist... You may or may not get a different (better) plan based on syntax, and especially if your query has many joins as my real query did.
– JeffInCO
Oct 29 '13 at 14:05
@JeffInCO just a comment in case Paul hasn't noticed your later comment with: "There were 10 joined tables including inner and outer joins." With outer joins, it's easier to have a rewrite that it's not actually semantically equivalent (and maybe easier to hit an optimizer's blind spot on what queries it checks and considers for equivalence.)
– ypercubeᵀᴹ
Oct 29 '13 at 14:46
Per the other comments, it is difficult to provide adequate detail to answer my question (privacy); however, I think Paul has captured the gist... You may or may not get a different (better) plan based on syntax, and especially if your query has many joins as my real query did.
– JeffInCO
Oct 29 '13 at 14:05
Per the other comments, it is difficult to provide adequate detail to answer my question (privacy); however, I think Paul has captured the gist... You may or may not get a different (better) plan based on syntax, and especially if your query has many joins as my real query did.
– JeffInCO
Oct 29 '13 at 14:05
@JeffInCO just a comment in case Paul hasn't noticed your later comment with: "There were 10 joined tables including inner and outer joins." With outer joins, it's easier to have a rewrite that it's not actually semantically equivalent (and maybe easier to hit an optimizer's blind spot on what queries it checks and considers for equivalence.)
– ypercubeᵀᴹ
Oct 29 '13 at 14:46
@JeffInCO just a comment in case Paul hasn't noticed your later comment with: "There were 10 joined tables including inner and outer joins." With outer joins, it's easier to have a rewrite that it's not actually semantically equivalent (and maybe easier to hit an optimizer's blind spot on what queries it checks and considers for equivalence.)
– ypercubeᵀᴹ
Oct 29 '13 at 14:46
add a comment |
Thanks for contributing an answer to Database Administrators Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f52371%2fjoin-syntax-style-performance-consideration%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Were there outer joins involved or only inner joins?
– ypercubeᵀᴹ
Oct 28 '13 at 22:20
1
I don't think I would ever write the first form on purpose. It just doesn't look right.
– Aaron Bertrand♦
Oct 28 '13 at 22:26
@AaronBertrand I think that join style comes from Access. It was a revelation for me when I realized I didn't HAVE to write queries that way!
– Max Vernon
Oct 28 '13 at 22:30
2
I'd be really surprised to see these perform differently, if they're semantically the same. Did you compare the actual execution plans of the two versions of the query? Were they different? Perhaps you think you got better performance because of a syntax change, but it actually happened because the old query was stuck on a bad plan, and changing the procedure forced a recompile.
– Aaron Bertrand♦
Oct 28 '13 at 22:32
I was just going to add I'd love to see the two plans! If there really is a performance difference, it would be helpful to understand why. Perhaps the plans are the same, just the data was cached for the 2nd run?
– Max Vernon
Oct 28 '13 at 22:44