Non Clustered Covering Index Column Selection Rule

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2008 Forums

Transact-SQL (2008)

Non Clustered Covering Index Column Selection Rule

Author

Topic

Soumya Kar
Starting Member

7 Posts

Posted - 2012-01-11 : 05:17:01

I have a table with around 19 columns which contains reasonable large amount of data and is primarily being queried to retrieve data using select statements based on different where clause. Since this table is primarily queried to get data, I thought about creating Non Clustered indexes based on the different where clauses getting used in the queries. Also, all the get queries returns all the columns in the table as part of the select list. Based on the information above, I have two questions for selecting the indexes:
1. let us assume that we have the following SPs which queries as:

where [col_a] = {value} and [col_b] = {value}
[col_b] = {value} and [col_a] = {value}
[col_a] = {value} and [col_c] = {value} and [col_d] = {value}
[col_a] = {value} and [col_c] = {value}

I have created the following Non Clustered indexes on the table as
[col_a] and [col_b] --> Would the first SP still use this index as the orders are reversed
[col_a] and [col_c] and [col_d] --> Would the last SP use this index as the first two columns match with order

Also, should we go ahead and try to define Non Clustered indexes based on the filter/join clauses for the get SPs on a table?
2. Since the select list in all the SPs return the entire list of columns, I added all the columns of the table as included columns in the Non Clustered indexes(covering index) to avoid bookmark lookups. Is this approach correct? What are the space implications in this case since we are storing all the table columns as part of the index definition?

Transact Charlie
Master Smack Fu Yak Hacker

3451 Posts

Posted - 2012-01-11 : 07:42:52

The order of the predicates in the where clause doesn't matter. SQL SERVER orders them as it pleases based on statistics on the objects involved..
(as a side note this means you can't reliably short circuit predicates -- i.e if you have a predicate that says: (a = b OR c = d) then you can't guarantee that c = d won't be evaluated if a = b is true)

Some points:

quote:
Also, all the get queries returns all the columns in the table as part of the select list.

Why? Do you *really* need to. You obviously already understand the consequences (Clustered Index lookups)

The value of an index on a column corresponds to the selectivity of the column/query and the cardinality of the column. Roughly what that means is:
if your column has a lot of distinct values in it then an index is better than a column with few distinct values.

Consider a BIT column. The only values it can have are 1 and 0 (and possible NULL). This means that an index on that column can only help you so far. If there is a roughly equal distribution of these values then the index is all but useless. However if the vast majority are 0 and you want to fine the few values that are 1 then an index could be an option (though you'd want a FILTERED index rather).

For columns with a lot of distinct values (say email addresses) then an index is generally a good choice.

see here for a bit of a primer: http://en.wikipedia.org/wiki/Cardinality_(SQL_statements)
http://msdn.microsoft.com/en-us/library/ms179613.aspx
http://www.simple-talk.com/sql/learn-sql-server/sql-server-index-basics/
http://www.google.co.uk/search?sourceid=chrome&ie=UTF-8&q=sql+server+understanding+indexes

With regards to INCLUDING all the values. This will take up a lot of space because all the information is duplicated for each leaf node in your indexes.

If you are not SELECTING many rows in the result then the CI lookup shouldn't be a concern.

So. short answer from long. IT DEPENDS.

if you expand on your environment and your performance envelope then we'll be able to advise more.

Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION

Kristen
Test

22859 Posts

Posted - 2012-01-11 : 07:49:19

quote:
Originally posted by Transact Charlie

quote:
Also, all the get queries returns all the columns in the table as part of the select list.

Why? Do you *really* need to. You obviously already understand the consequences (Clustered Index lookups)

Also it takes some "bandwidth" to return all those columns, if the User doesn't need them all.

Another issue is that if an Application is currently using "all" the columns, will it use new columns that are added later?

I've seen plenty of SELECT * applications where performance has collapsed when someone added a VARCHAR(MAX) column for "Notes" and all the users thought it so brilliant because they could put their life-stories in the NOTES column!

Transact Charlie
Master Smack Fu Yak Hacker

3451 Posts

Posted - 2012-01-11 : 07:49:30

addendum. If you *are* selecting a lot of rows from the table then you may find that your indexes are ignored by the engine / they are actually hurtful if used.

If you are retrieving a large percentage of the data from a table then a table scan / CI scan is generally the most efficient way to do it.

Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION

Transact Charlie
Master Smack Fu Yak Hacker

3451 Posts

Posted - 2012-01-11 : 07:51:01

quote:

I've seen plenty of SELECT * applications where performance has collapsed when someone added a VARCHAR(MAX) column for "Notes" and all the users thought it so brilliant because they could put their life-stories in the NOTES column!

++ Yeah, don't do that. SELECT * is a disease.

Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION

Kristen
Test

22859 Posts

Posted - 2012-01-11 : 07:56:45

"[col_a] = {value} and [col_b] = {value}
[col_b] = {value} and [col_a] = {value}"

Any index on Col_A, Col_B, Col_A, Col_B or Col_B, Col_A would be used for both these (provided that the index was sufficiently selective etc etc etc)

I work on the basis of putting the most selective column FIRST. But if I have

[col_A] = {value}
[col_A] = {value} and [col_B] = {value}

then I put Col_A first to see if the composite index will be used for both.

"What are the space implications in this case since we are storing all the table columns as part of the index definition?"

Not just that, every UPDATE will have to "adjust" the index as well as the underlying data record. We try to avoid including anything in indexes that changes "often". If we can read it 10x as often as it changes, or more

, then I'm happy.

You can query DMV to find unused indexes, e.g.
http://sqlserverpedia.com/wiki/Find_Indexes_Not_In_Use

Queries in the Plan Cache That Are Missing an Index
http://sqlserverpedia.com/wiki/Find_Queries_in_the_Plan_Cache_That_Are_Missing_an_Index

Soumya Kar
Starting Member

7 Posts

Posted - 2012-01-11 : 08:27:25

quote:
Originally posted by Transact Charlie

The order of the predicates in the where clause doesn't matter. SQL SERVER orders them as it pleases based on statistics on the objects involved..
(as a side note this means you can't reliably short circuit predicates -- i.e if you have a predicate that says: (a = b OR c = d) then you can't guarantee that c = d won't be evaluated if a = b is true)

Some points:
quote:
Also, all the get queries returns all the columns in the table as part of the select list.

Why? Do you *really* need to. You obviously already understand the consequences (Clustered Index lookups)

quote:
I do understand the consequences of a select *. Actually this is an existing SP and I was just trying to validate my understanding that using a lot of columns in the included list of an index actually multiplies the space used.

The value of an index on a column corresponds to the selectivity of the column/query and the cardinality of the column. Roughly what that means is:
if your column has a lot of distinct values in it then an index is better than a column with few distinct values.

Consider a BIT column. The only values it can have are 1 and 0 (and possible NULL). This means that an index on that column can only help you so far. If there is a roughly equal distribution of these values then the index is all but useless. However if the vast majority are 0 and you want to fine the few values that are 1 then an index could be an option (though you'd want a FILTERED index rather).

For columns with a lot of distinct values (say email addresses) then an index is generally a good choice.

see here for a bit of a primer: http://en.wikipedia.org/wiki/Cardinality_(SQL_statements)
http://msdn.microsoft.com/en-us/library/ms179613.aspx
http://www.simple-talk.com/sql/learn-sql-server/sql-server-index-basics/
http://www.google.co.uk/search?sourceid=chrome&ie=UTF-8&q=sql+server+understanding+indexes

With regards to INCLUDING all the values. This will take up a lot of space because all the information is duplicated for each leaf node in your indexes.

If you are not SELECTING many rows in the result then the CI lookup shouldn't be a concern.

So. short answer from long. IT DEPENDS.

if you expand on your environment and your performance envelope then we'll be able to advise more.

Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION

Transact Charlie
Master Smack Fu Yak Hacker

3451 Posts

Posted - 2012-01-11 : 08:27:51

[Fistbump]

Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION

Soumya Kar
Starting Member

7 Posts

Posted - 2012-01-11 : 08:29:15

quote:
Originally posted by Kristen

"[col_a] = {value} and [col_b] = {value}
[col_b] = {value} and [col_a] = {value}"

Any index on Col_A, Col_B, Col_A, Col_B or Col_B, Col_A would be used for both these (provided that the index was sufficiently selective etc etc etc)

I work on the basis of putting the most selective column FIRST. But if I have

[col_A] = {value}
[col_A] = {value} and [col_B] = {value}

then I put Col_A first to see if the composite index will be used for both.

"What are the space implications in this case since we are storing all the table columns as part of the index definition?"

Not just that, every UPDATE will have to "adjust" the index as well as the underlying data record. We try to avoid including anything in indexes that changes "often". If we can read it 10x as often as it changes, or more , then I'm happy.

You can query DMV to find unused indexes, e.g.
http://sqlserverpedia.com/wiki/Find_Indexes_Not_In_Use

Queries in the Plan Cache That Are Missing an Index
http://sqlserverpedia.com/wiki/Find_Queries_in_the_Plan_Cache_That_Are_Missing_an_Index

quote:
Thanks a lot Kirsten for pointing out the anything that changes often point regarding the selection of included columns

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources