Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
malachi151
Posting Yak Master
152 Posts |
Posted - 2012-04-06 : 07:36:42
|
This is baffling. I had this question on "an exam" and it confused the heck out of me, so I've followed up with some testing and now I'm even more confused.First let's just start with the DDL (This is my own version of this, not directly from "the exam"):CREATE TABLE [dbo].[ExamTest]( [Col1] [int] NOT NULL, [Col2] [int] NOT NULL,CONSTRAINT [PK_ExamTest] PRIMARY KEY CLUSTERED ( [Col1] ASC, [Col2] ASC)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]) ON [PRIMARY]GOCREATE NONCLUSTERED INDEX [IX_ExamTest] ON [dbo].[ExamTest]( [Col2] ASC, [Col1] ASC)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]GO Now populate the table:INSERT INTO dbo.ExamTest(Col1, Col2)VALUES(CAST(RAND()*1000 as int), CAST(RAND()*1000 as int))GO 20000 Notice that the IX is simply a reverse order of the PK.Now turn on Show Actual Execution Plans in the query options.The first off thing that I run into is the fact that if you simply do a select * from this table, it uses the IX index, not the PK. The results will be ordered by the IX.Why? Why doesn't the optimizer use the PK by default?Let's go further. Run the following as a single batch:SELECT COUNT(*), Col1, Col2FROM dbo.ExamTest WITH(INDEX(PK_ExamTest))GROUP BY Col1, Col2;SELECT COUNT(*), Col2, Col1FROM dbo.ExamTestGROUP BY Col2, Col1; What you should see is that the relative cost of the first query is more than the second, in my case its 53% to 47%.How can this be? Also, if you don't force it to use the PK, it will use the IX, why would it use the IX to do a group by on Col1, Col2, when the IX is on Col2, Col1? How can that possibly be more efficient than using the PK on Col1, Col2?The question on the test (which was slightly different than this) was, how would you improve the performance of both queries? (Except on the test they didn't force the use of the PK, but the query with the PK showed in the plans to have a relative cost of 60% of the batch)I was totally baffled, I couldn't figure out why the query using the PK would be slower than the one using the IX in the first place and assumed the whole question was just totally messed up and wrong.--------------------------------------------------My blog: http://practicaltsql.net/Twitter: @RGPSoftware |
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2012-04-08 : 10:54:43
|
Two things you may not be considering:1) set statistics io on for both and you may see that the clustered index requires more reads (for the same amount of data).2) the non-clustered index covers all needed data. Sql server does not require the table to perform the query. All it needs to scan is the nonclustered index.Be One with the OptimizerTG |
 |
|
Sachin.Nand
2937 Posts |
Posted - 2012-04-08 : 14:51:35
|
For this part of your question I need to know whether you have any knowledge on IAM pages ?quote: The first off thing that I run into is the fact that if you simply do a select * from this table, it uses the IX index, not the PK. The results will be ordered by the IX.Why? Why doesn't the optimizer use the PK by default?
If no then I think I will be wasting my time explaining why the optimizer went for a unordered scan and not an ordered scan.After Monday and Tuesday even the calendar says W T F .... |
 |
|
namman
Constraint Violating Yak Guru
285 Posts |
Posted - 2012-04-08 : 22:00:13
|
Could you post the full test question, malachi151? |
 |
|
malachi151
Posting Yak Master
152 Posts |
Posted - 2012-04-09 : 09:35:39
|
quote: Originally posted by TG Two things you may not be considering:1) set statistics io on for both and you may see that the clustered index requires more reads (for the same amount of data).2) the non-clustered index covers all needed data. Sql server does not require the table to perform the query. All it needs to scan is the nonclustered index.Be One with the OptimizerTG
1) I believe I noticed then at the time, yes. At least the IO cost was higher on the clustered index.2) True, but the table has only two columns, so shouldn't the leafs for the clustered and non-clustered indexes be the same? And if so, then when grouping by Col1, Col2, wouldn't using the clustered index for Col1, Col2 be faster than using the non-clustered index for Col2, Col1?--------------------------------------------------My blog: http://practicaltsql.net/Twitter: @RGPSoftware |
 |
|
malachi151
Posting Yak Master
152 Posts |
Posted - 2012-04-09 : 09:48:32
|
quote: Originally posted by Sachin.Nand For this part of your question I need to know whether you have any knowledge on IAM pages ?If no then I think I will be wasting my time explaining why the optimizer went for a unordered scan and not an ordered scan.
Index Allocation Maps, yeah. Actually I'm reading the Query Tuning chapter in Inside SQL Server: T-SQL Querying by Itzik Ben-Gan right now, which covers the topic.--------------------------------------------------My blog: http://practicaltsql.net/Twitter: @RGPSoftware |
 |
|
malachi151
Posting Yak Master
152 Posts |
Posted - 2012-04-09 : 10:13:51
|
quote: Originally posted by namman Could you post the full test question, malachi151?
#1 I don't recall the exact question and #2 even if I did I wouldn't post it :pBut as far as I can tell, it seems to me that the correct answer of how to improve performance in this specific scenario would be to remove the PK (clustered index) and leave the table in a heap with a non-clustered index on it.--------------------------------------------------My blog: http://practicaltsql.net/Twitter: @RGPSoftware |
 |
|
Sachin.Nand
2937 Posts |
Posted - 2012-04-09 : 11:09:59
|
quote: Originally posted by malachi151
quote: Originally posted by Sachin.Nand For this part of your question I need to know whether you have any knowledge on IAM pages ?If no then I think I will be wasting my time explaining why the optimizer went for a unordered scan and not an ordered scan.
Index Allocation Maps, yeah. Actually I'm reading the Query Tuning chapter in Inside SQL Server: T-SQL Querying by Itzik Ben-Gan right now, which covers the topic.--------------------------------------------------My blog: http://practicaltsql.net/Twitter: @RGPSoftware
Great...So IAM pages basically tracks the extents which are allocated to indexes at leaf level to be precise.The order of the extents in the IAM page is same as that exists in a data file.So now with a heap allocation order scan(IAM scans)is the only way to scan the table but with a clustered table there are options available linked list(as data is physically sorted in the index with the each page having a pointer to the next and previous page in the index) or allocation order scan.It maybe highly possible that the pages on the disk are sorted by its physical order.So you might get the records actually sorted though the optimizer goes for a allocation order scan.Optimizer will always go for a allocation order scan as it is going to be faster as pages will be read in order that they are written on the disk with almost zero seeks.After Monday and Tuesday even the calendar says W T F .... |
 |
|
namman
Constraint Violating Yak Guru
285 Posts |
Posted - 2012-04-09 : 21:59:11
|
#1 I don't recall the exact question and #2 even if I did I wouldn't post it :pBut as far as I can tell, it seems to me that the correct answer of how to improve performance in this specific scenario would be to remove the PK (clustered index) and leave the table in a heap with a non-clustered index on it.I ask you to post the full test question because I think the test wants to check you knowledge on a certain area that I did not see (or not clear) on your post. You don't remember. That's fine. |
 |
|
|
|
|
|
|