query performance gains by removing operator hash match inner joinExecution Plan Basics — Hash Match...

Is there a page on which I can view all Sitecore jobs running?

Hashing password to increase entropy

Can a Knock spell open the door to Mordenkainen's Magnificent Mansion?

How can an organ that provides biological immortality be unable to regenerate?

What is the tangent at a sharp point on a curve?

Error in master's thesis, I do not know what to do

Magnifying glass in hyperbolic space

Has the laser at Magurele, Romania reached the tenth of the Sun power?

is this saw blade faulty?

Asserting that Atheism and Theism are both faith based positions

Can you take a "free object interaction" while incapacitated?

1 John in Luther’s Bibel

How to get directions in deep space?

Reason why a kingside attack is not justified

Is there a distance limit for minecart tracks?

C++ lambda syntax

Calculate Pi using Monte Carlo

Travelling in US for more than 90 days

What is the purpose of using a decision tree?

Why is "la Gestapo" feminine?

Did I make a mistake by ccing email to boss to others?

How do you justify more code being written by following clean code practices?

What should be the ideal length of sentences in a blog post for ease of reading?

python displays `n` instead of breaking a line



query performance gains by removing operator hash match inner join


Execution Plan Basics — Hash Match ConfusionHow do I read Query Cost, and is it always a percentage?Can you explain this execution plan?How to optimise T-SQL query using Execution PlanCan I get SSMS to show me the Actual query costs in the Execution plan pane?Execution Plans Differ for the Same SQL Statementcan sql generate a good plan for this procedure?Reading a SQL Server Execution planHow to manipulate the query plan so that I can have more control on memory grants?Query tuning - performanceSHOWPLAN does not display a warning but “Include Execution Plan” does for the same queryQuery Optimization required - Subquery using Inner JoinHash Match inner join in simple query with in statementMerge Join Performance TuningDoes the hash join operator always pull from the build side?Simple Inner join suggesting a Include indexSql query performance measure IO vs TIMEHow to determine index needed to remove hash match from exec planCostly HASH MATCH Aggregate













7















While trying to apply the contents of this question below, to my own situation, I am a bit confused as how I could get rid of the operator Hash Match (Inner Join) if any way possible.



SQL Server query performance - removing need for Hash Match (Inner Join)



I noticed the cost 10% and was wondering if I could reduce it.
see the query plan below.



enter image description here



this work comes from a query had had to tune today:



   SELECT c.AccountCode, MIN(d.CustomerSID) 
FROM Stage.Customer c
INNER JOIN Dimensions.Customer d ON c.Email = d.Email
OR (
c.HomePostCode = d.HomePostCode
AND c.StrSurname = d.strSurname
)
GROUP BY c.AccountCode


and after adding these indexes:



---------------------------------------------------------------------
-- Create the indexes
---------------------------------------------------------------------

CREATE NONCLUSTERED INDEX IDX_Stage_Customer_HOME_SURNAME_INCL
ON Stage.Customer(HomePostCode ,strSurname)
INCLUDE (AccountCode)
--WHERE HASEMAIL = 0
--WITH (ONLINE=ON, DROP_EXISTING = ON)
go


CREATE NONCLUSTERED INDEX IDX_Dimensions_Customer_HOME_SURNAME_INCL
ON Dimensions.Customer(HomePostCode ,strSurname)
INCLUDE (AccountCode,CustomerSID)
--WHERE HASEMAIL = 0
--WITH (ONLINE=ON, DROP_EXISTING = ON)
go



CREATE NONCLUSTERED INDEX IDX_Stage_Customer_EMAIL_INCL
ON Stage.Customer(EMAIL)
INCLUDE (AccountCode)
--WHERE HASEMAIL = 1
--WITH (ONLINE=ON, DROP_EXISTING = ON)
go


CREATE NONCLUSTERED INDEX IDX_Dimensions_Customer_EMAIL_INCL
ON Dimensions.Customer(EMAIL)
INCLUDE (AccountCode,CustomerSID)
--WHERE HASEMAIL = 1
--WITH (ONLINE=ON, DROP_EXISTING = ON)
go


this is the new query:



----------------------------------------------------------------------------
-- new query
----------------------------------------------------------------------------

SELECT *
FROM (
SELECT AccountCode
,RO=ROW_NUMBER () OVER (PARTITION BY AccountCode ORDER BY CustomerSID)
--,CustomerSID=MIN(CustomerSID) OVER (PARTITION BY AccountCode ORDER BY AccountCode)
,CustomerSID
FROM (
SELECT c.AccountCode, D.CustomerSID
FROM Stage.Customer c
INNER JOIN Dimensions.Customer d ON c.Email = d.Email

UNION ALL

SELECT c.AccountCode, D.CustomerSID
FROM Stage.Customer c
INNER JOIN Dimensions.Customer d ON c.HomePostCode = d.HomePostCode
AND c.StrSurname = d.strSurname
) RADHE
) R1
WHERE RO = 1


this has reduced the query execution time from 8 minutes to 1 second.
everybody is happy, but still I would like to know if I could get more done,
I.e. by somehow removing the hash match operator.



why is it there at the first place, I am matching all the fields, why hash?










share|improve this question





























    7















    While trying to apply the contents of this question below, to my own situation, I am a bit confused as how I could get rid of the operator Hash Match (Inner Join) if any way possible.



    SQL Server query performance - removing need for Hash Match (Inner Join)



    I noticed the cost 10% and was wondering if I could reduce it.
    see the query plan below.



    enter image description here



    this work comes from a query had had to tune today:



       SELECT c.AccountCode, MIN(d.CustomerSID) 
    FROM Stage.Customer c
    INNER JOIN Dimensions.Customer d ON c.Email = d.Email
    OR (
    c.HomePostCode = d.HomePostCode
    AND c.StrSurname = d.strSurname
    )
    GROUP BY c.AccountCode


    and after adding these indexes:



    ---------------------------------------------------------------------
    -- Create the indexes
    ---------------------------------------------------------------------

    CREATE NONCLUSTERED INDEX IDX_Stage_Customer_HOME_SURNAME_INCL
    ON Stage.Customer(HomePostCode ,strSurname)
    INCLUDE (AccountCode)
    --WHERE HASEMAIL = 0
    --WITH (ONLINE=ON, DROP_EXISTING = ON)
    go


    CREATE NONCLUSTERED INDEX IDX_Dimensions_Customer_HOME_SURNAME_INCL
    ON Dimensions.Customer(HomePostCode ,strSurname)
    INCLUDE (AccountCode,CustomerSID)
    --WHERE HASEMAIL = 0
    --WITH (ONLINE=ON, DROP_EXISTING = ON)
    go



    CREATE NONCLUSTERED INDEX IDX_Stage_Customer_EMAIL_INCL
    ON Stage.Customer(EMAIL)
    INCLUDE (AccountCode)
    --WHERE HASEMAIL = 1
    --WITH (ONLINE=ON, DROP_EXISTING = ON)
    go


    CREATE NONCLUSTERED INDEX IDX_Dimensions_Customer_EMAIL_INCL
    ON Dimensions.Customer(EMAIL)
    INCLUDE (AccountCode,CustomerSID)
    --WHERE HASEMAIL = 1
    --WITH (ONLINE=ON, DROP_EXISTING = ON)
    go


    this is the new query:



    ----------------------------------------------------------------------------
    -- new query
    ----------------------------------------------------------------------------

    SELECT *
    FROM (
    SELECT AccountCode
    ,RO=ROW_NUMBER () OVER (PARTITION BY AccountCode ORDER BY CustomerSID)
    --,CustomerSID=MIN(CustomerSID) OVER (PARTITION BY AccountCode ORDER BY AccountCode)
    ,CustomerSID
    FROM (
    SELECT c.AccountCode, D.CustomerSID
    FROM Stage.Customer c
    INNER JOIN Dimensions.Customer d ON c.Email = d.Email

    UNION ALL

    SELECT c.AccountCode, D.CustomerSID
    FROM Stage.Customer c
    INNER JOIN Dimensions.Customer d ON c.HomePostCode = d.HomePostCode
    AND c.StrSurname = d.strSurname
    ) RADHE
    ) R1
    WHERE RO = 1


    this has reduced the query execution time from 8 minutes to 1 second.
    everybody is happy, but still I would like to know if I could get more done,
    I.e. by somehow removing the hash match operator.



    why is it there at the first place, I am matching all the fields, why hash?










    share|improve this question



























      7












      7








      7


      2






      While trying to apply the contents of this question below, to my own situation, I am a bit confused as how I could get rid of the operator Hash Match (Inner Join) if any way possible.



      SQL Server query performance - removing need for Hash Match (Inner Join)



      I noticed the cost 10% and was wondering if I could reduce it.
      see the query plan below.



      enter image description here



      this work comes from a query had had to tune today:



         SELECT c.AccountCode, MIN(d.CustomerSID) 
      FROM Stage.Customer c
      INNER JOIN Dimensions.Customer d ON c.Email = d.Email
      OR (
      c.HomePostCode = d.HomePostCode
      AND c.StrSurname = d.strSurname
      )
      GROUP BY c.AccountCode


      and after adding these indexes:



      ---------------------------------------------------------------------
      -- Create the indexes
      ---------------------------------------------------------------------

      CREATE NONCLUSTERED INDEX IDX_Stage_Customer_HOME_SURNAME_INCL
      ON Stage.Customer(HomePostCode ,strSurname)
      INCLUDE (AccountCode)
      --WHERE HASEMAIL = 0
      --WITH (ONLINE=ON, DROP_EXISTING = ON)
      go


      CREATE NONCLUSTERED INDEX IDX_Dimensions_Customer_HOME_SURNAME_INCL
      ON Dimensions.Customer(HomePostCode ,strSurname)
      INCLUDE (AccountCode,CustomerSID)
      --WHERE HASEMAIL = 0
      --WITH (ONLINE=ON, DROP_EXISTING = ON)
      go



      CREATE NONCLUSTERED INDEX IDX_Stage_Customer_EMAIL_INCL
      ON Stage.Customer(EMAIL)
      INCLUDE (AccountCode)
      --WHERE HASEMAIL = 1
      --WITH (ONLINE=ON, DROP_EXISTING = ON)
      go


      CREATE NONCLUSTERED INDEX IDX_Dimensions_Customer_EMAIL_INCL
      ON Dimensions.Customer(EMAIL)
      INCLUDE (AccountCode,CustomerSID)
      --WHERE HASEMAIL = 1
      --WITH (ONLINE=ON, DROP_EXISTING = ON)
      go


      this is the new query:



      ----------------------------------------------------------------------------
      -- new query
      ----------------------------------------------------------------------------

      SELECT *
      FROM (
      SELECT AccountCode
      ,RO=ROW_NUMBER () OVER (PARTITION BY AccountCode ORDER BY CustomerSID)
      --,CustomerSID=MIN(CustomerSID) OVER (PARTITION BY AccountCode ORDER BY AccountCode)
      ,CustomerSID
      FROM (
      SELECT c.AccountCode, D.CustomerSID
      FROM Stage.Customer c
      INNER JOIN Dimensions.Customer d ON c.Email = d.Email

      UNION ALL

      SELECT c.AccountCode, D.CustomerSID
      FROM Stage.Customer c
      INNER JOIN Dimensions.Customer d ON c.HomePostCode = d.HomePostCode
      AND c.StrSurname = d.strSurname
      ) RADHE
      ) R1
      WHERE RO = 1


      this has reduced the query execution time from 8 minutes to 1 second.
      everybody is happy, but still I would like to know if I could get more done,
      I.e. by somehow removing the hash match operator.



      why is it there at the first place, I am matching all the fields, why hash?










      share|improve this question
















      While trying to apply the contents of this question below, to my own situation, I am a bit confused as how I could get rid of the operator Hash Match (Inner Join) if any way possible.



      SQL Server query performance - removing need for Hash Match (Inner Join)



      I noticed the cost 10% and was wondering if I could reduce it.
      see the query plan below.



      enter image description here



      this work comes from a query had had to tune today:



         SELECT c.AccountCode, MIN(d.CustomerSID) 
      FROM Stage.Customer c
      INNER JOIN Dimensions.Customer d ON c.Email = d.Email
      OR (
      c.HomePostCode = d.HomePostCode
      AND c.StrSurname = d.strSurname
      )
      GROUP BY c.AccountCode


      and after adding these indexes:



      ---------------------------------------------------------------------
      -- Create the indexes
      ---------------------------------------------------------------------

      CREATE NONCLUSTERED INDEX IDX_Stage_Customer_HOME_SURNAME_INCL
      ON Stage.Customer(HomePostCode ,strSurname)
      INCLUDE (AccountCode)
      --WHERE HASEMAIL = 0
      --WITH (ONLINE=ON, DROP_EXISTING = ON)
      go


      CREATE NONCLUSTERED INDEX IDX_Dimensions_Customer_HOME_SURNAME_INCL
      ON Dimensions.Customer(HomePostCode ,strSurname)
      INCLUDE (AccountCode,CustomerSID)
      --WHERE HASEMAIL = 0
      --WITH (ONLINE=ON, DROP_EXISTING = ON)
      go



      CREATE NONCLUSTERED INDEX IDX_Stage_Customer_EMAIL_INCL
      ON Stage.Customer(EMAIL)
      INCLUDE (AccountCode)
      --WHERE HASEMAIL = 1
      --WITH (ONLINE=ON, DROP_EXISTING = ON)
      go


      CREATE NONCLUSTERED INDEX IDX_Dimensions_Customer_EMAIL_INCL
      ON Dimensions.Customer(EMAIL)
      INCLUDE (AccountCode,CustomerSID)
      --WHERE HASEMAIL = 1
      --WITH (ONLINE=ON, DROP_EXISTING = ON)
      go


      this is the new query:



      ----------------------------------------------------------------------------
      -- new query
      ----------------------------------------------------------------------------

      SELECT *
      FROM (
      SELECT AccountCode
      ,RO=ROW_NUMBER () OVER (PARTITION BY AccountCode ORDER BY CustomerSID)
      --,CustomerSID=MIN(CustomerSID) OVER (PARTITION BY AccountCode ORDER BY AccountCode)
      ,CustomerSID
      FROM (
      SELECT c.AccountCode, D.CustomerSID
      FROM Stage.Customer c
      INNER JOIN Dimensions.Customer d ON c.Email = d.Email

      UNION ALL

      SELECT c.AccountCode, D.CustomerSID
      FROM Stage.Customer c
      INNER JOIN Dimensions.Customer d ON c.HomePostCode = d.HomePostCode
      AND c.StrSurname = d.strSurname
      ) RADHE
      ) R1
      WHERE RO = 1


      this has reduced the query execution time from 8 minutes to 1 second.
      everybody is happy, but still I would like to know if I could get more done,
      I.e. by somehow removing the hash match operator.



      why is it there at the first place, I am matching all the fields, why hash?







      sql-server sql-server-2014 index query-performance execution-plan






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited 8 mins ago







      marcello miorelli

















      asked Sep 16 '15 at 17:32









      marcello miorellimarcello miorelli

      5,9162062141




      5,9162062141






















          1 Answer
          1






          active

          oldest

          votes


















          10














          the following links will provide a good source of knowledge regarding execution plans.



          From Execution Plan Basics — Hash Match Confusion I found:




          From
          http://sqlinthewild.co.za/index.php/2007/12/30/execution-plan-operations-joins/



          "The hash join is one of the more expensive join operations, as it
          requires the creation of a hash table to do the join. That said, it’s
          the join that’s best for large, unsorted inputs. It is the most
          memory-intensive of any of the joins



          The hash join first reads one of the inputs and hashes the join column
          and puts the resulting hash and the column values into a hash table
          built up in memory. Then it reads all the rows in the second input,
          hashes those and checks the rows in the resulting hash bucket for the
          joining rows."



          which links to this post:



          http://blogs.msdn.com/b/craigfr/archive/2006/08/10/687630.aspx




          Can you explain this execution plan? provides good insights about the execution plan with, not specific to hash match but relevant.




          The constant scans are a way for SQL Server to create a bucket into
          which it's going to place something later in the execution plan. I've
          posted a more thorough explanation of it here. To understand what the
          constant scan is for, you have to look further into the plan. In this
          case, it's the Compute Scalar operators that are being used to
          populate the space created by the constant scan.



          The Compute Scalar operators are being loaded up with NULL and the
          value 1045876, so they're clearly going to be used with the Loop Join
          in an effort to filter the data.



          The really cool part is that this plan is Trivial. It means that it
          went through a minimal optimization process. All the operations are
          leading up to the Merge Interval. This is used to create a minimal set
          of comparison operators for an index seek (details on that here).




          In this question:
          Can I get SSMS to show me the Actual query costs in the Execution plan pane?
          I'm fixing performance issues on a multistatement stored procedure in SQL Server. I want to know which part(s) I should spend time on.



          I understand from How do I read Query Cost, and is it always a percentage? that even when SSMS is told to Include Actual Execution Plan, the "Query cost (relative to the batch)" figures is still based on cost estimates, which can be far off actuals



          Measuring Query Performance : “Execution Plan Query Cost” vs “Time Taken”
          gives good info for when you need to compare the performance of 2 different queries.



          In Reading a SQL Server Execution plan you can find great tips for reading the execution plan.



          Other questions/answers that I really liked because they are relevant to this subject, and for my personal reference I would like to quote are:



          How to optimise T-SQL query using Execution Plan



          can sql generate a good plan for this procedure?



          Execution Plans Differ for the Same SQL Statement






          share|improve this answer

























            Your Answer








            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "182"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: false,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: null,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f115276%2fquery-performance-gains-by-removing-operator-hash-match-inner-join%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            10














            the following links will provide a good source of knowledge regarding execution plans.



            From Execution Plan Basics — Hash Match Confusion I found:




            From
            http://sqlinthewild.co.za/index.php/2007/12/30/execution-plan-operations-joins/



            "The hash join is one of the more expensive join operations, as it
            requires the creation of a hash table to do the join. That said, it’s
            the join that’s best for large, unsorted inputs. It is the most
            memory-intensive of any of the joins



            The hash join first reads one of the inputs and hashes the join column
            and puts the resulting hash and the column values into a hash table
            built up in memory. Then it reads all the rows in the second input,
            hashes those and checks the rows in the resulting hash bucket for the
            joining rows."



            which links to this post:



            http://blogs.msdn.com/b/craigfr/archive/2006/08/10/687630.aspx




            Can you explain this execution plan? provides good insights about the execution plan with, not specific to hash match but relevant.




            The constant scans are a way for SQL Server to create a bucket into
            which it's going to place something later in the execution plan. I've
            posted a more thorough explanation of it here. To understand what the
            constant scan is for, you have to look further into the plan. In this
            case, it's the Compute Scalar operators that are being used to
            populate the space created by the constant scan.



            The Compute Scalar operators are being loaded up with NULL and the
            value 1045876, so they're clearly going to be used with the Loop Join
            in an effort to filter the data.



            The really cool part is that this plan is Trivial. It means that it
            went through a minimal optimization process. All the operations are
            leading up to the Merge Interval. This is used to create a minimal set
            of comparison operators for an index seek (details on that here).




            In this question:
            Can I get SSMS to show me the Actual query costs in the Execution plan pane?
            I'm fixing performance issues on a multistatement stored procedure in SQL Server. I want to know which part(s) I should spend time on.



            I understand from How do I read Query Cost, and is it always a percentage? that even when SSMS is told to Include Actual Execution Plan, the "Query cost (relative to the batch)" figures is still based on cost estimates, which can be far off actuals



            Measuring Query Performance : “Execution Plan Query Cost” vs “Time Taken”
            gives good info for when you need to compare the performance of 2 different queries.



            In Reading a SQL Server Execution plan you can find great tips for reading the execution plan.



            Other questions/answers that I really liked because they are relevant to this subject, and for my personal reference I would like to quote are:



            How to optimise T-SQL query using Execution Plan



            can sql generate a good plan for this procedure?



            Execution Plans Differ for the Same SQL Statement






            share|improve this answer






























              10














              the following links will provide a good source of knowledge regarding execution plans.



              From Execution Plan Basics — Hash Match Confusion I found:




              From
              http://sqlinthewild.co.za/index.php/2007/12/30/execution-plan-operations-joins/



              "The hash join is one of the more expensive join operations, as it
              requires the creation of a hash table to do the join. That said, it’s
              the join that’s best for large, unsorted inputs. It is the most
              memory-intensive of any of the joins



              The hash join first reads one of the inputs and hashes the join column
              and puts the resulting hash and the column values into a hash table
              built up in memory. Then it reads all the rows in the second input,
              hashes those and checks the rows in the resulting hash bucket for the
              joining rows."



              which links to this post:



              http://blogs.msdn.com/b/craigfr/archive/2006/08/10/687630.aspx




              Can you explain this execution plan? provides good insights about the execution plan with, not specific to hash match but relevant.




              The constant scans are a way for SQL Server to create a bucket into
              which it's going to place something later in the execution plan. I've
              posted a more thorough explanation of it here. To understand what the
              constant scan is for, you have to look further into the plan. In this
              case, it's the Compute Scalar operators that are being used to
              populate the space created by the constant scan.



              The Compute Scalar operators are being loaded up with NULL and the
              value 1045876, so they're clearly going to be used with the Loop Join
              in an effort to filter the data.



              The really cool part is that this plan is Trivial. It means that it
              went through a minimal optimization process. All the operations are
              leading up to the Merge Interval. This is used to create a minimal set
              of comparison operators for an index seek (details on that here).




              In this question:
              Can I get SSMS to show me the Actual query costs in the Execution plan pane?
              I'm fixing performance issues on a multistatement stored procedure in SQL Server. I want to know which part(s) I should spend time on.



              I understand from How do I read Query Cost, and is it always a percentage? that even when SSMS is told to Include Actual Execution Plan, the "Query cost (relative to the batch)" figures is still based on cost estimates, which can be far off actuals



              Measuring Query Performance : “Execution Plan Query Cost” vs “Time Taken”
              gives good info for when you need to compare the performance of 2 different queries.



              In Reading a SQL Server Execution plan you can find great tips for reading the execution plan.



              Other questions/answers that I really liked because they are relevant to this subject, and for my personal reference I would like to quote are:



              How to optimise T-SQL query using Execution Plan



              can sql generate a good plan for this procedure?



              Execution Plans Differ for the Same SQL Statement






              share|improve this answer




























                10












                10








                10







                the following links will provide a good source of knowledge regarding execution plans.



                From Execution Plan Basics — Hash Match Confusion I found:




                From
                http://sqlinthewild.co.za/index.php/2007/12/30/execution-plan-operations-joins/



                "The hash join is one of the more expensive join operations, as it
                requires the creation of a hash table to do the join. That said, it’s
                the join that’s best for large, unsorted inputs. It is the most
                memory-intensive of any of the joins



                The hash join first reads one of the inputs and hashes the join column
                and puts the resulting hash and the column values into a hash table
                built up in memory. Then it reads all the rows in the second input,
                hashes those and checks the rows in the resulting hash bucket for the
                joining rows."



                which links to this post:



                http://blogs.msdn.com/b/craigfr/archive/2006/08/10/687630.aspx




                Can you explain this execution plan? provides good insights about the execution plan with, not specific to hash match but relevant.




                The constant scans are a way for SQL Server to create a bucket into
                which it's going to place something later in the execution plan. I've
                posted a more thorough explanation of it here. To understand what the
                constant scan is for, you have to look further into the plan. In this
                case, it's the Compute Scalar operators that are being used to
                populate the space created by the constant scan.



                The Compute Scalar operators are being loaded up with NULL and the
                value 1045876, so they're clearly going to be used with the Loop Join
                in an effort to filter the data.



                The really cool part is that this plan is Trivial. It means that it
                went through a minimal optimization process. All the operations are
                leading up to the Merge Interval. This is used to create a minimal set
                of comparison operators for an index seek (details on that here).




                In this question:
                Can I get SSMS to show me the Actual query costs in the Execution plan pane?
                I'm fixing performance issues on a multistatement stored procedure in SQL Server. I want to know which part(s) I should spend time on.



                I understand from How do I read Query Cost, and is it always a percentage? that even when SSMS is told to Include Actual Execution Plan, the "Query cost (relative to the batch)" figures is still based on cost estimates, which can be far off actuals



                Measuring Query Performance : “Execution Plan Query Cost” vs “Time Taken”
                gives good info for when you need to compare the performance of 2 different queries.



                In Reading a SQL Server Execution plan you can find great tips for reading the execution plan.



                Other questions/answers that I really liked because they are relevant to this subject, and for my personal reference I would like to quote are:



                How to optimise T-SQL query using Execution Plan



                can sql generate a good plan for this procedure?



                Execution Plans Differ for the Same SQL Statement






                share|improve this answer















                the following links will provide a good source of knowledge regarding execution plans.



                From Execution Plan Basics — Hash Match Confusion I found:




                From
                http://sqlinthewild.co.za/index.php/2007/12/30/execution-plan-operations-joins/



                "The hash join is one of the more expensive join operations, as it
                requires the creation of a hash table to do the join. That said, it’s
                the join that’s best for large, unsorted inputs. It is the most
                memory-intensive of any of the joins



                The hash join first reads one of the inputs and hashes the join column
                and puts the resulting hash and the column values into a hash table
                built up in memory. Then it reads all the rows in the second input,
                hashes those and checks the rows in the resulting hash bucket for the
                joining rows."



                which links to this post:



                http://blogs.msdn.com/b/craigfr/archive/2006/08/10/687630.aspx




                Can you explain this execution plan? provides good insights about the execution plan with, not specific to hash match but relevant.




                The constant scans are a way for SQL Server to create a bucket into
                which it's going to place something later in the execution plan. I've
                posted a more thorough explanation of it here. To understand what the
                constant scan is for, you have to look further into the plan. In this
                case, it's the Compute Scalar operators that are being used to
                populate the space created by the constant scan.



                The Compute Scalar operators are being loaded up with NULL and the
                value 1045876, so they're clearly going to be used with the Loop Join
                in an effort to filter the data.



                The really cool part is that this plan is Trivial. It means that it
                went through a minimal optimization process. All the operations are
                leading up to the Merge Interval. This is used to create a minimal set
                of comparison operators for an index seek (details on that here).




                In this question:
                Can I get SSMS to show me the Actual query costs in the Execution plan pane?
                I'm fixing performance issues on a multistatement stored procedure in SQL Server. I want to know which part(s) I should spend time on.



                I understand from How do I read Query Cost, and is it always a percentage? that even when SSMS is told to Include Actual Execution Plan, the "Query cost (relative to the batch)" figures is still based on cost estimates, which can be far off actuals



                Measuring Query Performance : “Execution Plan Query Cost” vs “Time Taken”
                gives good info for when you need to compare the performance of 2 different queries.



                In Reading a SQL Server Execution plan you can find great tips for reading the execution plan.



                Other questions/answers that I really liked because they are relevant to this subject, and for my personal reference I would like to quote are:



                How to optimise T-SQL query using Execution Plan



                can sql generate a good plan for this procedure?



                Execution Plans Differ for the Same SQL Statement







                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited May 23 '17 at 12:40









                Community

                1




                1










                answered Sep 23 '15 at 23:25









                marcello miorellimarcello miorelli

                5,9162062141




                5,9162062141






























                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Database Administrators Stack Exchange!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f115276%2fquery-performance-gains-by-removing-operator-hash-match-inner-join%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    Parapolítica Índice Antecedentes El escándalo Proceso judicial Consecuencias Véase...

                    How to remove border from elements in the last row?Targeting flex items on the last rowHow to vertically wrap...

                    Tecnologías entrañables Índice Antecedentes Desarrollo Tecnologías Entrañables en la...