Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 PK slower than Non-Clustered index?!?!

Author  Topic 

malachi151
Posting Yak Master

152 Posts

Posted - 2012-04-06 : 07:36:42
This is baffling. I had this question on "an exam" and it confused the heck out of me, so I've followed up with some testing and now I'm even more confused.

First let's just start with the DDL (This is my own version of this, not directly from "the exam"):


CREATE TABLE [dbo].[ExamTest](
[Col1] [int] NOT NULL,
[Col2] [int] NOT NULL,
CONSTRAINT [PK_ExamTest] PRIMARY KEY CLUSTERED
(
[Col1] ASC,
[Col2] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX [IX_ExamTest] ON [dbo].[ExamTest]
(
[Col2] ASC,
[Col1] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO


Now populate the table:


INSERT INTO dbo.ExamTest
(Col1, Col2)
VALUES
(CAST(RAND()*1000 as int), CAST(RAND()*1000 as int))
GO 20000


Notice that the IX is simply a reverse order of the PK.

Now turn on Show Actual Execution Plans in the query options.

The first off thing that I run into is the fact that if you simply do a select * from this table, it uses the IX index, not the PK. The results will be ordered by the IX.

Why? Why doesn't the optimizer use the PK by default?

Let's go further. Run the following as a single batch:


SELECT
COUNT(*), Col1, Col2
FROM
dbo.ExamTest WITH(INDEX(PK_ExamTest))
GROUP BY
Col1, Col2;

SELECT
COUNT(*), Col2, Col1
FROM
dbo.ExamTest
GROUP BY
Col2, Col1;


What you should see is that the relative cost of the first query is more than the second, in my case its 53% to 47%.

How can this be? Also, if you don't force it to use the PK, it will use the IX, why would it use the IX to do a group by on Col1, Col2, when the IX is on Col2, Col1? How can that possibly be more efficient than using the PK on Col1, Col2?

The question on the test (which was slightly different than this) was, how would you improve the performance of both queries? (Except on the test they didn't force the use of the PK, but the query with the PK showed in the plans to have a relative cost of 60% of the batch)

I was totally baffled, I couldn't figure out why the query using the PK would be slower than the one using the IX in the first place and assumed the whole question was just totally messed up and wrong.


--------------------------------------------------
My blog: http://practicaltsql.net/
Twitter: @RGPSoftware

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2012-04-08 : 10:54:43
Two things you may not be considering:
1) set statistics io on for both and you may see that the clustered index requires more reads (for the same amount of data).
2) the non-clustered index covers all needed data. Sql server does not require the table to perform the query. All it needs to scan is the nonclustered index.

Be One with the Optimizer
TG
Go to Top of Page

Sachin.Nand

2937 Posts

Posted - 2012-04-08 : 14:51:35
For this part of your question I need to know whether you have any knowledge on IAM pages ?

quote:
The first off thing that I run into is the fact that if you simply do a select * from this table, it uses the IX index, not the PK. The results will be ordered by the IX.

Why? Why doesn't the optimizer use the PK by default?


If no then I think I will be wasting my time explaining why the optimizer went for a unordered scan and not an ordered scan.


After Monday and Tuesday even the calendar says W T F ....
Go to Top of Page

namman
Constraint Violating Yak Guru

285 Posts

Posted - 2012-04-08 : 22:00:13
Could you post the full test question, malachi151?
Go to Top of Page

malachi151
Posting Yak Master

152 Posts

Posted - 2012-04-09 : 09:35:39
quote:
Originally posted by TG

Two things you may not be considering:
1) set statistics io on for both and you may see that the clustered index requires more reads (for the same amount of data).
2) the non-clustered index covers all needed data. Sql server does not require the table to perform the query. All it needs to scan is the nonclustered index.

Be One with the Optimizer
TG



1) I believe I noticed then at the time, yes. At least the IO cost was higher on the clustered index.

2) True, but the table has only two columns, so shouldn't the leafs for the clustered and non-clustered indexes be the same? And if so, then when grouping by Col1, Col2, wouldn't using the clustered index for Col1, Col2 be faster than using the non-clustered index for Col2, Col1?

--------------------------------------------------
My blog: http://practicaltsql.net/
Twitter: @RGPSoftware
Go to Top of Page

malachi151
Posting Yak Master

152 Posts

Posted - 2012-04-09 : 09:48:32
quote:
Originally posted by Sachin.Nand

For this part of your question I need to know whether you have any knowledge on IAM pages ?

If no then I think I will be wasting my time explaining why the optimizer went for a unordered scan and not an ordered scan.


Index Allocation Maps, yeah. Actually I'm reading the Query Tuning chapter in Inside SQL Server: T-SQL Querying by Itzik Ben-Gan right now, which covers the topic.

--------------------------------------------------
My blog: http://practicaltsql.net/
Twitter: @RGPSoftware
Go to Top of Page

malachi151
Posting Yak Master

152 Posts

Posted - 2012-04-09 : 10:13:51
quote:
Originally posted by namman

Could you post the full test question, malachi151?



#1 I don't recall the exact question and #2 even if I did I wouldn't post it :p

But as far as I can tell, it seems to me that the correct answer of how to improve performance in this specific scenario would be to remove the PK (clustered index) and leave the table in a heap with a non-clustered index on it.

--------------------------------------------------
My blog: http://practicaltsql.net/
Twitter: @RGPSoftware
Go to Top of Page

Sachin.Nand

2937 Posts

Posted - 2012-04-09 : 11:09:59
quote:
Originally posted by malachi151

quote:
Originally posted by Sachin.Nand

For this part of your question I need to know whether you have any knowledge on IAM pages ?

If no then I think I will be wasting my time explaining why the optimizer went for a unordered scan and not an ordered scan.


Index Allocation Maps, yeah. Actually I'm reading the Query Tuning chapter in Inside SQL Server: T-SQL Querying by Itzik Ben-Gan right now, which covers the topic.

--------------------------------------------------
My blog: http://practicaltsql.net/
Twitter: @RGPSoftware



Great...

So IAM pages basically tracks the extents which are allocated to indexes at leaf level to be precise.The order of the extents in the IAM page is same as that exists in a data file.

So now with a heap allocation order scan(IAM scans)is the only way to scan the table but with a clustered table there are options available linked list(as data is physically sorted in the index with the each page having a pointer to the next and previous page in the index) or allocation order scan.

It maybe highly possible that the pages on the disk are sorted by its physical order.So you might get the records actually sorted though the optimizer goes for a allocation order scan.

Optimizer will always go for a allocation order scan as it is going to be faster as pages will be read in order that they are written on the disk with almost zero seeks.

After Monday and Tuesday even the calendar says W T F ....
Go to Top of Page

namman
Constraint Violating Yak Guru

285 Posts

Posted - 2012-04-09 : 21:59:11

#1 I don't recall the exact question and #2 even if I did I wouldn't post it :p

But as far as I can tell, it seems to me that the correct answer of how to improve performance in this specific scenario would be to remove the PK (clustered index) and leave the table in a heap with a non-clustered index on it.


I ask you to post the full test question because I think the test wants to check you knowledge on a certain area that I did not see (or not clear) on your post. You don't remember. That's fine.
Go to Top of Page
   

- Advertisement -