Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
PeteFromOz
Starting Member
8 Posts |
Posted - 2012-03-16 : 22:57:48
|
Hi There,I am working with a huge table and am hoping to get some optimisation tips.The table is basically an event log from a system I am working with. It looks like thisCREATE TABLE [dbo].[action_log]( [doc_id] [nvarchar](64) COLLATE Latin1_General_CI_AS NOT NULL, [stamp_uid] [nvarchar](60) COLLATE Latin1_General_CI_AS NOT NULL, [stamp_date] [datetime] NOT NULL, [log_index] [int] NOT NULL, [log_comment] [nvarchar](2000) COLLATE Latin1_General_CI_AS NULL) ON [PRIMARY] And is about 58 Million Records long!!I am wrining reports that work out average times between certain events, as given by the log_index, for particular objects referred to by the doc_id.See my earlier post http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=172483The table is not mine, so its tricky (but not impossible) to do something with it. Instead, I went down the indexed view pathI came up with… CREATE VIEW [dbo].[viewReporting] WITH SCHEMABINDINGAS SELECT al.doc_id, al.stamp_uid, al.stamp_date, al.log_index, al.log_comment FROM dbo.action_log al INNER JOIN dbo.docs d ON al.doc_id = d.doc_id WHERE d.entry_date > CONVERT (DATETIME, '2012-02-01 00:00:00', 121) AND d.entry_date < CONVERT (DATETIME, '2012-02-29 23:59:59', 121) AND al.log_index IN (15000, 10000, 1020, 1001, 1008, 1018, 10503, 10507, 6012, 5013, 10507, 1008, 6007, 5006, 6018, 5019, 6019, 5020, 11001) This basically selects only those log records pertaining to the doc_id’s that I am interested in, based on an entry_date range and also only the events that I am interested in, based on the log_index.The dates will eventually be dynamic, so I will have to come up with some trickiness for that too.I then created a clustered index as followsCREATE UNIQUE CLUSTERED INDEX zz_indRepView ON dbo.viewReporting (Doc_ID, log_index, stamp_date) Which is the main thing I am not that sure about :(Any help, suggestions, discussion or questions most appreciated!Thanks heaps,Cheers,Pete. |
|
Kristen
Test
22859 Posts |
Posted - 2012-03-19 : 05:20:21
|
It may make no difference but the optimise might get on better with an unambiguous constant for the dates: WHERE d.entry_date > '20120201' AND d.entry_date < '20120229' Note: No hyphens in the date strings.Are you sure you want > rather than >= for the start? and < '20120229' rather than < '20120301' for the end? |
 |
|
PeteFromOz
Starting Member
8 Posts |
Posted - 2012-03-20 : 00:07:13
|
Hi Kristen,Thanks for the reply... I actually did have date strings intitally ('1 feb 2012' and '1 mar 2012') but I kept -seemingly randomly- getting some of the March objects in the result(?) The dates I am using do have a time component, so I figured I needed to specify it in the where clause. With that in mind, do you see another way to make sure I only get the discrete dates I am asking for?I understand that SS 2008 actually has a 'date' type - who hoo!Thanks again,Cheers,Pete |
 |
|
Kristen
Test
22859 Posts |
Posted - 2012-03-20 : 04:55:12
|
"('1 feb 2012' and '1 mar 2012')"If you use "string" dates you need to use an unambiguous form, eitheryyyymmddyyyymmdd hh:mm:ss.sssyyyy-mm-ddThh:mm:ss.sss(Only use punctuation exactly as per those examples)any other form is subject to SQL parser trying to "guess" what the date is, and is ambiguous. SQL will use information from Server locale configuration, settings for Country/Language for the user who is logged on, and so on ... so is subject to being interpreted differently if any of those criteria change. I think it would be better they have never built in that flexibility 'coz the ambiguity catches people out ... so the solution is to only use those "unambiguous" formats for string dates, and SQL will never change how it interprets them.That probably isn't the cause of your problem though.A date '20121231' has an inherent date as well (midnight) so you don't need to specify a time part if you want 00:00:00.You will get March objects if you say <= '20120301' and there is a record with a date of 1st March and time of 00:00:00Hence why I suggested using the range >= '20120201' AND < '20120301' to just get FebruaryI don't think this will help your performance issue though (it might help the optimiser, because if you specify CONVERT, or any other function call, in the WHERE clause it is possible that the Optimise will fail to spot that it can use an index, but I think there is an equal chance that the optimiser will be smart enough to sort out a simple CONVERT. However, IMHO I think it would make the code better to use an unambiguous date and no CONVERT.For optimisation try putting an index on docs.entry_dateaction_log.log_indexaction_log.doc_id and docs.doc_idand see if they make a difference. You could also (before changing the indexes) run the query in SSMS and see what indexes it suggests (using the Trace) |
 |
|
PeteFromOz
Starting Member
8 Posts |
Posted - 2012-03-20 : 20:00:25
|
Hi Kristen,Thanks heaps for that, it explains a lot actually. When I was usind the string date, I *was* getting some flakey results when comparing results from separate queries on the same data i.e. total record counts. I hadn't tried to find the reason yet, as I had bigger problems (performance!). One question though, if I use d.entry_date >= '20120201' then that is still a string and without using the 'convert' to get an explicit conversion, I get an implict conversion and hence am unable to index the view. Changing it to CONVERT(DATETIME,'20120201',112) works, but is that still ambiguous? Also, since posting this question, I have solved the performance issues too! Turns out that it was not the huge view at all, but lots of static udf's which I had no idea would cause such an issue. If you are interested, here's the thread. http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=172678 Thanks again for all your replies, very much appreciated. |
 |
|
Kristen
Test
22859 Posts |
Posted - 2012-03-21 : 18:32:22
|
"but is that still ambiguous?"No, definitely not ambiguous (I presume "112" is "yyyymmdd" - I haven't checked )But it may force the optimiser to use a less efficient route. That's something you can check though, so fingers-crossed it has no impact."lots of static udf's which I had no idea would cause such an issue."I've hit that same rock, in the past |
 |
|
|
|
|
|
|