Performance difference between Clustered and Non Clustered IndexUnderstanding multiple indexes with MySQL...

Codimension of non-flat locus

How to determine what difficulty is right for the game?

Paid for article while in US on F-1 visa?

What's the output of a record needle playing an out-of-speed record

Why is consensus so controversial in Britain?

Are the number of citations and number of published articles the most important criteria for a tenure promotion?

When a company launches a new product do they "come out" with a new product or do they "come up" with a new product?

Is it inappropriate for a student to attend their mentor's dissertation defense?

Approximately how much travel time was saved by the opening of the Suez Canal in 1869?

Can an x86 CPU running in real mode be considered to be basically an 8086 CPU?

How can I prevent hyper evolved versions of regular creatures from wiping out their cousins?

Convert two switches to a dual stack, and add outlet - possible here?

How to draw a waving flag in TikZ

Is it possible to run Internet Explorer on OS X El Capitan?

Why doesn't H₄O²⁺ exist?

How to efficiently unroll a matrix by value with numpy?

Malcev's paper "On a class of homogeneous spaces" in English

Do infinite dimensional systems make sense?

Important Resources for Dark Age Civilizations?

Definite integral giving negative value as a result?

How to format long polynomial?

Is it unprofessional to ask if a job posting on GlassDoor is real?

What is a clear way to write a bar that has an extra beat?

What is the word for reserving something for yourself before others do?

Performance difference between Clustered and Non Clustered Index

Understanding multiple indexes with MySQL (and performance)difference between having many non clustered index with single columns and with combination of many columnsNeed for reaching data through clustered index with a non-clustered indexAren't two writes required to update a clustered index recordFinding exact row in Clustered index leaf pageQuestion about non-clustered index storage in SQL Serverwhat if all the non clustered indexes on my table were filtered indexes?Understanding non clustered index locking during an updateComparison between A non-clustered primary key and a covering index in terms of performancePerformance differences between RID Lookup vs Key Lookup?Missing Non Clustered Index already part of Clustered Index

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ margin-bottom:0;
}

I was reading Clustered and Non Clustered Indexes.

Clustered Index - It contains Data Pages. That means the complete row
information will be present in the Clustered Index Column.

Non Clustered Index - It only contains the Row Locator information in the
form of Clustered Index column(if availabe) or the File Indentifier +
Page Number + Total Rows in a Page. This means that the query engine must take an additional step in order to locate the actual data.

Query - How can I check the performance difference with a help of a practical example as we know that the table can have only one Clustered Index and provides sorting at the Clustered Index Column and Non Clustered Index don't provide sorting and can support 999 Non Clustered Indexes in SQL Server 2008 and 249 in SQL Server 2005.

asked May 29 '12 at 17:32

Pankaj Garg

migrated from stackoverflow.com May 29 '12 at 19:01

This question came from our site for professional and enthusiast programmers.

2

Performance difference when you do what?, what kind of work you want to do with that table?, there is not a single solution that suits every need

– Lamak
May 29 '12 at 19:05

2

Some tangible discussion here perhaps. stackoverflow.com/questions/91688/… stackoverflow.com/questions/5070529/… stackoverflow.com/questions/1251636/… We could write a dissertation about the differences between clustered and non-clustered indexes, but I don't think we would say anything that isn't already out there available for you to read.

– Aaron Bertrand♦
May 29 '12 at 19:36

4

You wrote: "This means that the query engine must take an additional step in order to locate the actual data." Actually, if all you need are columns covered in the index, you don't need to take any additional steps after you find your target rows in the nonclustered index. Only when you need columns not covered by the nonclustered index does SQL Server need to perform a bookmark lookup.

– Nick Chammas
May 29 '12 at 19:39

add a comment |

I was reading Clustered and Non Clustered Indexes.

Clustered Index - It contains Data Pages. That means the complete row
information will be present in the Clustered Index Column.

asked May 29 '12 at 17:32

Pankaj Garg

migrated from stackoverflow.com May 29 '12 at 19:01

This question came from our site for professional and enthusiast programmers.

2

Performance difference when you do what?, what kind of work you want to do with that table?, there is not a single solution that suits every need

– Lamak
May 29 '12 at 19:05

2

Some tangible discussion here perhaps. stackoverflow.com/questions/91688/… stackoverflow.com/questions/5070529/… stackoverflow.com/questions/1251636/… We could write a dissertation about the differences between clustered and non-clustered indexes, but I don't think we would say anything that isn't already out there available for you to read.

– Aaron Bertrand♦
May 29 '12 at 19:36

4

You wrote: "This means that the query engine must take an additional step in order to locate the actual data." Actually, if all you need are columns covered in the index, you don't need to take any additional steps after you find your target rows in the nonclustered index. Only when you need columns not covered by the nonclustered index does SQL Server need to perform a bookmark lookup.

– Nick Chammas
May 29 '12 at 19:39

add a comment |

I was reading Clustered and Non Clustered Indexes.

Clustered Index - It contains Data Pages. That means the complete row
information will be present in the Clustered Index Column.

asked May 29 '12 at 17:32

Pankaj Garg

I was reading Clustered and Non Clustered Indexes.

Clustered Index - It contains Data Pages. That means the complete row
information will be present in the Clustered Index Column.

sql-server sql-server-2008 sql-server-2005

asked May 29 '12 at 17:32

Pankaj Garg

asked May 29 '12 at 17:32

Pankaj Garg

asked May 29 '12 at 17:32

Pankaj Garg

asked May 29 '12 at 17:32

Pankaj Garg

asked May 29 '12 at 17:32

Pankaj Garg

migrated from stackoverflow.com May 29 '12 at 19:01

This question came from our site for professional and enthusiast programmers.

migrated from stackoverflow.com May 29 '12 at 19:01

This question came from our site for professional and enthusiast programmers.

2

Performance difference when you do what?, what kind of work you want to do with that table?, there is not a single solution that suits every need

– Lamak
May 29 '12 at 19:05

2

Some tangible discussion here perhaps. stackoverflow.com/questions/91688/… stackoverflow.com/questions/5070529/… stackoverflow.com/questions/1251636/… We could write a dissertation about the differences between clustered and non-clustered indexes, but I don't think we would say anything that isn't already out there available for you to read.

– Aaron Bertrand♦
May 29 '12 at 19:36

4

You wrote: "This means that the query engine must take an additional step in order to locate the actual data." Actually, if all you need are columns covered in the index, you don't need to take any additional steps after you find your target rows in the nonclustered index. Only when you need columns not covered by the nonclustered index does SQL Server need to perform a bookmark lookup.

– Nick Chammas
May 29 '12 at 19:39

add a comment |

2

Performance difference when you do what?, what kind of work you want to do with that table?, there is not a single solution that suits every need

– Lamak
May 29 '12 at 19:05

2

Some tangible discussion here perhaps. stackoverflow.com/questions/91688/… stackoverflow.com/questions/5070529/… stackoverflow.com/questions/1251636/… We could write a dissertation about the differences between clustered and non-clustered indexes, but I don't think we would say anything that isn't already out there available for you to read.

– Aaron Bertrand♦
May 29 '12 at 19:36

4

You wrote: "This means that the query engine must take an additional step in order to locate the actual data." Actually, if all you need are columns covered in the index, you don't need to take any additional steps after you find your target rows in the nonclustered index. Only when you need columns not covered by the nonclustered index does SQL Server need to perform a bookmark lookup.

– Nick Chammas
May 29 '12 at 19:39

Performance difference when you do what?, what kind of work you want to do with that table?, there is not a single solution that suits every need

– Lamak
May 29 '12 at 19:05

Some tangible discussion here perhaps. stackoverflow.com/questions/91688/… stackoverflow.com/questions/5070529/… stackoverflow.com/questions/1251636/… We could write a dissertation about the differences between clustered and non-clustered indexes, but I don't think we would say anything that isn't already out there available for you to read.

– Aaron Bertrand♦
May 29 '12 at 19:36

You wrote: "This means that the query engine must take an additional step in order to locate the actual data." Actually, if all you need are columns covered in the index, you don't need to take any additional steps after you find your target rows in the nonclustered index. Only when you need columns not covered by the nonclustered index does SQL Server need to perform a bookmark lookup.

– Nick Chammas
May 29 '12 at 19:39

add a comment |

2 Answers
2

active

oldest

votes

Very good question as it is such a important concept. This is a big topic though and what I am going to show you is a simplification so you can understand the base concepts.

Firstly when you see clustered index think table. In SQL server if a table does not contain a clustered index it is a heap. Creating a clustered index on the table actually transforms the table into a b-tree type structure. Your clustered index IS your table it is not separate from the table

Ever wondered why you can only have one clustered index? Well if we had two clustered indexes we would need two copies of the table. It contains the data after all.

I am going to try and explain this by using a simple example.

NOTE: I created the table in this example and filled it with over 3 million random entries. Then ran the actual queries and pasted the execution plans here.

What you really need to grasp is O notation or operational efficiency. Let's assume you have the following table.

CREATE TABLE [dbo].[Customer](

[CustomerID] [int] IDENTITY(1,1) NOT NULL,

[CustomerName] [varchar](100) NOT NULL,

[CustomerSurname] [varchar](100) NOT NULL,

CONSTRAINT [PK_Customer] PRIMARY KEY CLUSTERED 

(

[CustomerID] ASC

)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF

  , IGNORE_DUP_KEY = OFF,ALLOW_ROW_LOCKS  = ON

  , ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

) ON [PRIMARY]

So here we have basic table with a clustered key on CustomerID (Primary key is clustered by default). Thus the table is arranged/ordered based on the primary key CustomerID. The intermediate levels will contain the CustomerID values. The data pages will contain the whole row thus it is the table row.

We will also create a non-clustered index on the CustomerName field. The following code will do it.

CREATE NONCLUSTERED INDEX [ix_Customer_CustomerName] ON [dbo].[Customer] 

 (

[CustomerName] ASC

 )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF

  , SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF

  , DROP_EXISTING = OFF, ONLINE = OFF

  , ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

So in this index you would find on the data pages/leaf level nodes a pointer to the intermediate levels in the clustered index. The index is arranged/ordered around the CustomerName field. Thus the intermediate level contains the CustomerName values and the leaf level will contain the pointer (these pointer values are actually the primary key values or CustomerID column).

Right so if we execute the following query:

SELECT * FROM Customer WHERE CustomerID = 1

SQL will probably read the clustered index via a seek operation. A seek operation is a binary search which is much more efficient than a scan which is a sequential search. So in our above example the index is read and by using a binary search SQL can eliminate the data that don't match the criteria we are looking for. See attached screen shot for the query plan.

enter image description here

So the number of operations or O Notation for the seek operation is as follows:

Do binary search on clustered index by comparing the value searched for to the values in the intermediate level.

Return the values that match( remember since the clustered index has all the data in it can return all the columns from the index as it is the row data)

So it is two operations. However if we executed the following query:

SELECT * FROM Customer WHERE CustomerName ='John'

SQL will now use the non-clustered index on the CustomerName to do the search. However since this is a non-clustered index it does not contain the all of the data in the row.

So SQL will do the search on the intermediate levels to find the records that match then do a lookup using the values returned to do another search on the clustered index(aka the table) to retrieve the actual data. This sounds confusing I know but read on and all will become clear.

Since our non-clustered index only contains the CustomerName field(the indexed field values stored in the intermediate nodes) and the pointer to the data which is the CustomerID, the index has no record of the CustomerSurname. The CustomerSurname has to be fetched from the clustered index or table.

When running this query I get the following execution plan:

enter image description here

There are two important things for you to notice in the screen shot above

SQL is saying I have a missing index(the text in green). SQL is suggesting I create a index on CustomerName which includes CustomerID and CustomerSurname.

You will also see that 99% of the time of the query is spent on doing a key lookup on the primary key index/clustered index.

Why is SQL suggesting the index on CustomerName again? Well since the index contains only the CustomerID and the CustomerName SQL still has to find the CustomerSurname from the table/clustered indexes.

If we created the index and we included the CustomerSurname column in the index SQL would be able to satisfy the entire query by just reading the non-clustered index. This is why SQL is suggesting I change my non-clustered index.

Here you can see the extra operation SQL needs to do to get the CustomerSurname column from the clustered key

Thus the number of operations are as follows:

Do binary search on non-clustered index by comparing the value searched for to the values in the intermediate level

For nodes that match read the leaf level node which will contain the pointer for the data in the clustered index (the leaf level nodes will contain the primary key values by the way).

For each value returned do a read on the clustered index(the table) to get the row values out here we would read the CustomerSurname.

Return matching rows

That is 4 operations to get the values out. Twice the amount of operations needed compared to reading the clustered index. The show you that your clustered index is your most powerful index as it contains all the data.

So just to clarify one last point. Why do I say that the pointer in the non-clustered index is the primary key value? Well to demonstrate that the leaf level nodes of the non-clustered index contains the primary key value I change my query to:

SELECT CustomerID

FROM Customer

WHERE CustomerName='Jane'

In this query SQL can read the CustomerID from the non-clustered index. It does not need to do a lookup on the clustered index. This you can see by the execution plan which looks like this.

enter image description here

Notice the difference between this query and the previous query. There is no lookup. SQL can find all the data in the non-clustered index

Hopefully you can begin to understand that clustered index is the table and non-clustered indexes DON'T contain all the data. Indexing will speed up selects due to the fact that binary searches can be done but only clustered indexes contain all the data. So a search on a non-clustered index will almost always result in values being loaded from the clustered index. These extra operations make non-clustered indexes less efficient than a clustered index.

Hope this clears things up. If anything does not make sense please post a comment and I will try clarify. It is rather late here and my brain is feeling a wee bit flat. Time for a red bull.

edited 3 mins ago

Glorfindel

1,0371816

answered May 29 '12 at 20:08

Namphibian

68149

I have a question. WHy is the search an index seek on the non clustered index on CustomerName for this query SELECT * FROM Customer WHERE CustomerName ='John'. Since it is a non clustered index the customername will not be sorted. So shouldnt an index scan be done.

– ckv
Sep 24 '13 at 5:24

BTW Great answer totally understood except the above question.

– ckv
Sep 24 '13 at 5:25

1

A index is sorted in the order of the data. For example it would be sorted on Customer name since it is the indexed value. So it is sorted. Remember it still has to scan the leaf level or pages.

– Namphibian
Sep 24 '13 at 11:07

add a comment |

"This means that the query engine must take an additional step in order to locate the actual data."

Not necessarily - if the index is covering for a given query, no trip has to be made to the data pages. Also, with included columns, additional columns can be added to a non-clustered index to make it covering without altering the key size.

So the ultimate answer is - It Depends (on a lot more information than you can really cover in a single question) - you need to understand all the capabilities of the indexes and the execution plan for a given query may diverge from your expectations.

A general rule of thumb I have is that a table always has a clustered index (and usually on an identity or sequential GUID), but non-clustered indexes are added for performance. But there are always exceptions - heap tables have a place, wider clustered indexes have a place. Seemingly redundant indexes which are narrower to fit more rows per page have a place. etc. etc.

And I wouldn't worry about the limits on the various indexes allowed - that's almost certainly not going to come into play in many real-world examples.

answered May 29 '12 at 20:07

Cade Roux

5,1752042

2

+1 for there are always exceptions - too many people omit this and think every clustered index should be an int identity no matter what.

– JNK
May 29 '12 at 20:29

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f18528%2fperformance-difference-between-clustered-and-non-clustered-index%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

Very good question as it is such a important concept. This is a big topic though and what I am going to show you is a simplification so you can understand the base concepts.

Ever wondered why you can only have one clustered index? Well if we had two clustered indexes we would need two copies of the table. It contains the data after all.

I am going to try and explain this by using a simple example.

NOTE: I created the table in this example and filled it with over 3 million random entries. Then ran the actual queries and pasted the execution plans here.

What you really need to grasp is O notation or operational efficiency. Let's assume you have the following table.

CREATE TABLE [dbo].[Customer](

[CustomerID] [int] IDENTITY(1,1) NOT NULL,

[CustomerName] [varchar](100) NOT NULL,

[CustomerSurname] [varchar](100) NOT NULL,

CONSTRAINT [PK_Customer] PRIMARY KEY CLUSTERED 

(

[CustomerID] ASC

)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF

  , IGNORE_DUP_KEY = OFF,ALLOW_ROW_LOCKS  = ON

  , ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

) ON [PRIMARY]

We will also create a non-clustered index on the CustomerName field. The following code will do it.

CREATE NONCLUSTERED INDEX [ix_Customer_CustomerName] ON [dbo].[Customer] 

 (

[CustomerName] ASC

 )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF

  , SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF

  , DROP_EXISTING = OFF, ONLINE = OFF

  , ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

Right so if we execute the following query:

SELECT * FROM Customer WHERE CustomerID = 1

enter image description here

So the number of operations or O Notation for the seek operation is as follows:

Do binary search on clustered index by comparing the value searched for to the values in the intermediate level.

Return the values that match( remember since the clustered index has all the data in it can return all the columns from the index as it is the row data)

So it is two operations. However if we executed the following query:

SELECT * FROM Customer WHERE CustomerName ='John'

SQL will now use the non-clustered index on the CustomerName to do the search. However since this is a non-clustered index it does not contain the all of the data in the row.

When running this query I get the following execution plan:

enter image description here

There are two important things for you to notice in the screen shot above

SQL is saying I have a missing index(the text in green). SQL is suggesting I create a index on CustomerName which includes CustomerID and CustomerSurname.

You will also see that 99% of the time of the query is spent on doing a key lookup on the primary key index/clustered index.

Here you can see the extra operation SQL needs to do to get the CustomerSurname column from the clustered key

Thus the number of operations are as follows:

Do binary search on non-clustered index by comparing the value searched for to the values in the intermediate level

For nodes that match read the leaf level node which will contain the pointer for the data in the clustered index (the leaf level nodes will contain the primary key values by the way).

For each value returned do a read on the clustered index(the table) to get the row values out here we would read the CustomerSurname.

Return matching rows

SELECT CustomerID

FROM Customer

WHERE CustomerName='Jane'

In this query SQL can read the CustomerID from the non-clustered index. It does not need to do a lookup on the clustered index. This you can see by the execution plan which looks like this.

enter image description here

Notice the difference between this query and the previous query. There is no lookup. SQL can find all the data in the non-clustered index

Hope this clears things up. If anything does not make sense please post a comment and I will try clarify. It is rather late here and my brain is feeling a wee bit flat. Time for a red bull.

edited 3 mins ago

Glorfindel

1,0371816

answered May 29 '12 at 20:08

Namphibian

68149

I have a question. WHy is the search an index seek on the non clustered index on CustomerName for this query SELECT * FROM Customer WHERE CustomerName ='John'. Since it is a non clustered index the customername will not be sorted. So shouldnt an index scan be done.

– ckv
Sep 24 '13 at 5:24

BTW Great answer totally understood except the above question.

– ckv
Sep 24 '13 at 5:25

1

A index is sorted in the order of the data. For example it would be sorted on Customer name since it is the indexed value. So it is sorted. Remember it still has to scan the leaf level or pages.

– Namphibian
Sep 24 '13 at 11:07

add a comment |

Very good question as it is such a important concept. This is a big topic though and what I am going to show you is a simplification so you can understand the base concepts.

Ever wondered why you can only have one clustered index? Well if we had two clustered indexes we would need two copies of the table. It contains the data after all.

I am going to try and explain this by using a simple example.

NOTE: I created the table in this example and filled it with over 3 million random entries. Then ran the actual queries and pasted the execution plans here.

What you really need to grasp is O notation or operational efficiency. Let's assume you have the following table.

CREATE TABLE [dbo].[Customer](

[CustomerID] [int] IDENTITY(1,1) NOT NULL,

[CustomerName] [varchar](100) NOT NULL,

[CustomerSurname] [varchar](100) NOT NULL,

CONSTRAINT [PK_Customer] PRIMARY KEY CLUSTERED 

(

[CustomerID] ASC

)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF

  , IGNORE_DUP_KEY = OFF,ALLOW_ROW_LOCKS  = ON

  , ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

) ON [PRIMARY]

We will also create a non-clustered index on the CustomerName field. The following code will do it.

CREATE NONCLUSTERED INDEX [ix_Customer_CustomerName] ON [dbo].[Customer] 

 (

[CustomerName] ASC

 )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF

  , SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF

  , DROP_EXISTING = OFF, ONLINE = OFF

  , ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

Right so if we execute the following query:

SELECT * FROM Customer WHERE CustomerID = 1

enter image description here

So the number of operations or O Notation for the seek operation is as follows:

Do binary search on clustered index by comparing the value searched for to the values in the intermediate level.

Return the values that match( remember since the clustered index has all the data in it can return all the columns from the index as it is the row data)

So it is two operations. However if we executed the following query:

SELECT * FROM Customer WHERE CustomerName ='John'

SQL will now use the non-clustered index on the CustomerName to do the search. However since this is a non-clustered index it does not contain the all of the data in the row.

When running this query I get the following execution plan:

enter image description here

There are two important things for you to notice in the screen shot above

SQL is saying I have a missing index(the text in green). SQL is suggesting I create a index on CustomerName which includes CustomerID and CustomerSurname.

You will also see that 99% of the time of the query is spent on doing a key lookup on the primary key index/clustered index.

Here you can see the extra operation SQL needs to do to get the CustomerSurname column from the clustered key

Thus the number of operations are as follows:

Do binary search on non-clustered index by comparing the value searched for to the values in the intermediate level

For nodes that match read the leaf level node which will contain the pointer for the data in the clustered index (the leaf level nodes will contain the primary key values by the way).

For each value returned do a read on the clustered index(the table) to get the row values out here we would read the CustomerSurname.

Return matching rows

SELECT CustomerID

FROM Customer

WHERE CustomerName='Jane'

In this query SQL can read the CustomerID from the non-clustered index. It does not need to do a lookup on the clustered index. This you can see by the execution plan which looks like this.

enter image description here

Notice the difference between this query and the previous query. There is no lookup. SQL can find all the data in the non-clustered index

Hope this clears things up. If anything does not make sense please post a comment and I will try clarify. It is rather late here and my brain is feeling a wee bit flat. Time for a red bull.

edited 3 mins ago

Glorfindel

1,0371816

answered May 29 '12 at 20:08

Namphibian

68149

I have a question. WHy is the search an index seek on the non clustered index on CustomerName for this query SELECT * FROM Customer WHERE CustomerName ='John'. Since it is a non clustered index the customername will not be sorted. So shouldnt an index scan be done.

– ckv
Sep 24 '13 at 5:24

BTW Great answer totally understood except the above question.

– ckv
Sep 24 '13 at 5:25

1

A index is sorted in the order of the data. For example it would be sorted on Customer name since it is the indexed value. So it is sorted. Remember it still has to scan the leaf level or pages.

– Namphibian
Sep 24 '13 at 11:07

add a comment |

Very good question as it is such a important concept. This is a big topic though and what I am going to show you is a simplification so you can understand the base concepts.

Ever wondered why you can only have one clustered index? Well if we had two clustered indexes we would need two copies of the table. It contains the data after all.

I am going to try and explain this by using a simple example.

NOTE: I created the table in this example and filled it with over 3 million random entries. Then ran the actual queries and pasted the execution plans here.

What you really need to grasp is O notation or operational efficiency. Let's assume you have the following table.

CREATE TABLE [dbo].[Customer](

[CustomerID] [int] IDENTITY(1,1) NOT NULL,

[CustomerName] [varchar](100) NOT NULL,

[CustomerSurname] [varchar](100) NOT NULL,

CONSTRAINT [PK_Customer] PRIMARY KEY CLUSTERED 

(

[CustomerID] ASC

)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF

  , IGNORE_DUP_KEY = OFF,ALLOW_ROW_LOCKS  = ON

  , ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

) ON [PRIMARY]

We will also create a non-clustered index on the CustomerName field. The following code will do it.

CREATE NONCLUSTERED INDEX [ix_Customer_CustomerName] ON [dbo].[Customer] 

 (

[CustomerName] ASC

 )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF

  , SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF

  , DROP_EXISTING = OFF, ONLINE = OFF

  , ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

Right so if we execute the following query:

SELECT * FROM Customer WHERE CustomerID = 1

enter image description here

So the number of operations or O Notation for the seek operation is as follows:

Do binary search on clustered index by comparing the value searched for to the values in the intermediate level.

Return the values that match( remember since the clustered index has all the data in it can return all the columns from the index as it is the row data)

So it is two operations. However if we executed the following query:

SELECT * FROM Customer WHERE CustomerName ='John'

SQL will now use the non-clustered index on the CustomerName to do the search. However since this is a non-clustered index it does not contain the all of the data in the row.

When running this query I get the following execution plan:

enter image description here

There are two important things for you to notice in the screen shot above

SQL is saying I have a missing index(the text in green). SQL is suggesting I create a index on CustomerName which includes CustomerID and CustomerSurname.

You will also see that 99% of the time of the query is spent on doing a key lookup on the primary key index/clustered index.

Here you can see the extra operation SQL needs to do to get the CustomerSurname column from the clustered key

Thus the number of operations are as follows:

Do binary search on non-clustered index by comparing the value searched for to the values in the intermediate level

For nodes that match read the leaf level node which will contain the pointer for the data in the clustered index (the leaf level nodes will contain the primary key values by the way).

For each value returned do a read on the clustered index(the table) to get the row values out here we would read the CustomerSurname.

Return matching rows

SELECT CustomerID

FROM Customer

WHERE CustomerName='Jane'

In this query SQL can read the CustomerID from the non-clustered index. It does not need to do a lookup on the clustered index. This you can see by the execution plan which looks like this.

enter image description here

Notice the difference between this query and the previous query. There is no lookup. SQL can find all the data in the non-clustered index

Hope this clears things up. If anything does not make sense please post a comment and I will try clarify. It is rather late here and my brain is feeling a wee bit flat. Time for a red bull.

edited 3 mins ago

Glorfindel

1,0371816

answered May 29 '12 at 20:08

Namphibian

68149

Very good question as it is such a important concept. This is a big topic though and what I am going to show you is a simplification so you can understand the base concepts.

Ever wondered why you can only have one clustered index? Well if we had two clustered indexes we would need two copies of the table. It contains the data after all.

I am going to try and explain this by using a simple example.

NOTE: I created the table in this example and filled it with over 3 million random entries. Then ran the actual queries and pasted the execution plans here.

What you really need to grasp is O notation or operational efficiency. Let's assume you have the following table.

CREATE TABLE [dbo].[Customer](

[CustomerID] [int] IDENTITY(1,1) NOT NULL,

[CustomerName] [varchar](100) NOT NULL,

[CustomerSurname] [varchar](100) NOT NULL,

CONSTRAINT [PK_Customer] PRIMARY KEY CLUSTERED 

(

[CustomerID] ASC

)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF

  , IGNORE_DUP_KEY = OFF,ALLOW_ROW_LOCKS  = ON

  , ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

) ON [PRIMARY]

We will also create a non-clustered index on the CustomerName field. The following code will do it.

CREATE NONCLUSTERED INDEX [ix_Customer_CustomerName] ON [dbo].[Customer] 

 (

[CustomerName] ASC

 )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF

  , SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF

  , DROP_EXISTING = OFF, ONLINE = OFF

  , ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

Right so if we execute the following query:

SELECT * FROM Customer WHERE CustomerID = 1

enter image description here

So the number of operations or O Notation for the seek operation is as follows:

Do binary search on clustered index by comparing the value searched for to the values in the intermediate level.

Return the values that match( remember since the clustered index has all the data in it can return all the columns from the index as it is the row data)

So it is two operations. However if we executed the following query:

SELECT * FROM Customer WHERE CustomerName ='John'

SQL will now use the non-clustered index on the CustomerName to do the search. However since this is a non-clustered index it does not contain the all of the data in the row.

When running this query I get the following execution plan:

enter image description here

There are two important things for you to notice in the screen shot above

SQL is saying I have a missing index(the text in green). SQL is suggesting I create a index on CustomerName which includes CustomerID and CustomerSurname.

You will also see that 99% of the time of the query is spent on doing a key lookup on the primary key index/clustered index.

Here you can see the extra operation SQL needs to do to get the CustomerSurname column from the clustered key

Thus the number of operations are as follows:

Do binary search on non-clustered index by comparing the value searched for to the values in the intermediate level

For nodes that match read the leaf level node which will contain the pointer for the data in the clustered index (the leaf level nodes will contain the primary key values by the way).

For each value returned do a read on the clustered index(the table) to get the row values out here we would read the CustomerSurname.

Return matching rows

SELECT CustomerID

FROM Customer

WHERE CustomerName='Jane'

In this query SQL can read the CustomerID from the non-clustered index. It does not need to do a lookup on the clustered index. This you can see by the execution plan which looks like this.

enter image description here

Notice the difference between this query and the previous query. There is no lookup. SQL can find all the data in the non-clustered index

Hope this clears things up. If anything does not make sense please post a comment and I will try clarify. It is rather late here and my brain is feeling a wee bit flat. Time for a red bull.

edited 3 mins ago

Glorfindel

1,0371816

answered May 29 '12 at 20:08

Namphibian

68149

edited 3 mins ago

Glorfindel

1,0371816

edited 3 mins ago

Glorfindel

1,0371816

edited 3 mins ago

Glorfindel

1,0371816

answered May 29 '12 at 20:08

Namphibian

68149

answered May 29 '12 at 20:08

Namphibian

68149

answered May 29 '12 at 20:08

Namphibian

68149

I have a question. WHy is the search an index seek on the non clustered index on CustomerName for this query SELECT * FROM Customer WHERE CustomerName ='John'. Since it is a non clustered index the customername will not be sorted. So shouldnt an index scan be done.

– ckv
Sep 24 '13 at 5:24

BTW Great answer totally understood except the above question.

– ckv
Sep 24 '13 at 5:25

1

A index is sorted in the order of the data. For example it would be sorted on Customer name since it is the indexed value. So it is sorted. Remember it still has to scan the leaf level or pages.

– Namphibian
Sep 24 '13 at 11:07

add a comment |

I have a question. WHy is the search an index seek on the non clustered index on CustomerName for this query SELECT * FROM Customer WHERE CustomerName ='John'. Since it is a non clustered index the customername will not be sorted. So shouldnt an index scan be done.

– ckv
Sep 24 '13 at 5:24

BTW Great answer totally understood except the above question.

– ckv
Sep 24 '13 at 5:25

1

A index is sorted in the order of the data. For example it would be sorted on Customer name since it is the indexed value. So it is sorted. Remember it still has to scan the leaf level or pages.

– Namphibian
Sep 24 '13 at 11:07

I have a question. WHy is the search an index seek on the non clustered index on CustomerName for this query SELECT * FROM Customer WHERE CustomerName ='John'. Since it is a non clustered index the customername will not be sorted. So shouldnt an index scan be done.

– ckv
Sep 24 '13 at 5:24

BTW Great answer totally understood except the above question.

– ckv
Sep 24 '13 at 5:25

A index is sorted in the order of the data. For example it would be sorted on Customer name since it is the indexed value. So it is sorted. Remember it still has to scan the leaf level or pages.

– Namphibian
Sep 24 '13 at 11:07

add a comment |

"This means that the query engine must take an additional step in order to locate the actual data."

And I wouldn't worry about the limits on the various indexes allowed - that's almost certainly not going to come into play in many real-world examples.

answered May 29 '12 at 20:07

Cade Roux

5,1752042

2

+1 for there are always exceptions - too many people omit this and think every clustered index should be an int identity no matter what.

– JNK
May 29 '12 at 20:29

add a comment |

"This means that the query engine must take an additional step in order to locate the actual data."

And I wouldn't worry about the limits on the various indexes allowed - that's almost certainly not going to come into play in many real-world examples.

answered May 29 '12 at 20:07

Cade Roux

5,1752042

2

+1 for there are always exceptions - too many people omit this and think every clustered index should be an int identity no matter what.

– JNK
May 29 '12 at 20:29

add a comment |

"This means that the query engine must take an additional step in order to locate the actual data."

And I wouldn't worry about the limits on the various indexes allowed - that's almost certainly not going to come into play in many real-world examples.

answered May 29 '12 at 20:07

Cade Roux

5,1752042

"This means that the query engine must take an additional step in order to locate the actual data."

And I wouldn't worry about the limits on the various indexes allowed - that's almost certainly not going to come into play in many real-world examples.

answered May 29 '12 at 20:07

Cade Roux

5,1752042

answered May 29 '12 at 20:07

Cade Roux

5,1752042

answered May 29 '12 at 20:07

Cade Roux

5,1752042

answered May 29 '12 at 20:07

Cade Roux

5,1752042

2

+1 for there are always exceptions - too many people omit this and think every clustered index should be an int identity no matter what.

– JNK
May 29 '12 at 20:29

add a comment |

2

+1 for there are always exceptions - too many people omit this and think every clustered index should be an int identity no matter what.

– JNK
May 29 '12 at 20:29

+1 for there are always exceptions - too many people omit this and think every clustered index should be an int identity no matter what.

– JNK
May 29 '12 at 20:29

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Database Administrators Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Sfrgttk