Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Optimizing Huge View

Author  Topic 

PeteFromOz
Starting Member

8 Posts

Posted - 2012-03-16 : 22:57:48
Hi There,
I am working with a huge table and am hoping to get some optimisation tips.
The table is basically an event log from a system I am working with. It looks like this

CREATE TABLE [dbo].[action_log](
[doc_id] [nvarchar](64) COLLATE Latin1_General_CI_AS NOT NULL,
[stamp_uid] [nvarchar](60) COLLATE Latin1_General_CI_AS NOT NULL,
[stamp_date] [datetime] NOT NULL,
[log_index] [int] NOT NULL,
[log_comment] [nvarchar](2000) COLLATE Latin1_General_CI_AS NULL
) ON [PRIMARY]

And is about 58 Million Records long!!
I am wrining reports that work out average times between certain events, as given by the log_index, for particular objects referred to by the doc_id.
See my earlier post
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=172483

The table is not mine, so its tricky (but not impossible) to do something with it. Instead, I went down the indexed view path
I came up with…

CREATE VIEW [dbo].[viewReporting] WITH
SCHEMABINDING
AS
SELECT al.doc_id,
al.stamp_uid,
al.stamp_date,
al.log_index,
al.log_comment
FROM dbo.action_log al INNER JOIN dbo.docs d ON al.doc_id = d.doc_id
WHERE d.entry_date > CONVERT (DATETIME, '2012-02-01 00:00:00', 121)
AND d.entry_date < CONVERT (DATETIME, '2012-02-29 23:59:59', 121)
AND al.log_index IN
(15000,
10000,
1020,
1001,
1008,
1018,
10503,
10507,
6012,
5013,
10507,
1008,
6007,
5006,
6018,
5019,
6019,
5020,
11001)


This basically selects only those log records pertaining to the doc_id’s that I am interested in, based on an entry_date range and also only the events that I am interested in, based on the log_index.

The dates will eventually be dynamic, so I will have to come up with some trickiness for that too.

I then created a clustered index as follows


CREATE UNIQUE CLUSTERED INDEX zz_indRepView
ON dbo.viewReporting (Doc_ID, log_index, stamp_date)

Which is the main thing I am not that sure about :(

Any help, suggestions, discussion or questions most appreciated!
Thanks heaps,

Cheers,
Pete.





Kristen
Test

22859 Posts

Posted - 2012-03-19 : 05:20:21
It may make no difference but the optimise might get on better with an unambiguous constant for the dates:

WHERE d.entry_date > '20120201'
AND d.entry_date < '20120229'

Note: No hyphens in the date strings.

Are you sure you want > rather than >= for the start? and < '20120229' rather than < '20120301' for the end?
Go to Top of Page

PeteFromOz
Starting Member

8 Posts

Posted - 2012-03-20 : 00:07:13
Hi Kristen,

Thanks for the reply... I actually did have date strings intitally ('1 feb 2012' and '1 mar 2012') but I kept -seemingly randomly- getting some of the March objects in the result(?) The dates I am using do have a time component, so I figured I needed to specify it in the where clause. With that in mind, do you see another way to make sure I only get the discrete dates I am asking for?
I understand that SS 2008 actually has a 'date' type - who hoo!

Thanks again,
Cheers,

Pete
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2012-03-20 : 04:55:12
"('1 feb 2012' and '1 mar 2012')"

If you use "string" dates you need to use an unambiguous form, either

yyyymmdd
yyyymmdd hh:mm:ss.sss
yyyy-mm-ddThh:mm:ss.sss

(Only use punctuation exactly as per those examples)

any other form is subject to SQL parser trying to "guess" what the date is, and is ambiguous. SQL will use information from Server locale configuration, settings for Country/Language for the user who is logged on, and so on ... so is subject to being interpreted differently if any of those criteria change. I think it would be better they have never built in that flexibility 'coz the ambiguity catches people out ... so the solution is to only use those "unambiguous" formats for string dates, and SQL will never change how it interprets them.

That probably isn't the cause of your problem though.

A date '20121231' has an inherent date as well (midnight) so you don't need to specify a time part if you want 00:00:00.

You will get March objects if you say <= '20120301' and there is a record with a date of 1st March and time of 00:00:00

Hence why I suggested using the range >= '20120201' AND < '20120301' to just get February

I don't think this will help your performance issue though (it might help the optimiser, because if you specify CONVERT, or any other function call, in the WHERE clause it is possible that the Optimise will fail to spot that it can use an index, but I think there is an equal chance that the optimiser will be smart enough to sort out a simple CONVERT. However, IMHO I think it would make the code better to use an unambiguous date and no CONVERT.

For optimisation try putting an index on

docs.entry_date
action_log.log_index
action_log.doc_id and docs.doc_id

and see if they make a difference. You could also (before changing the indexes) run the query in SSMS and see what indexes it suggests (using the Trace)
Go to Top of Page

PeteFromOz
Starting Member

8 Posts

Posted - 2012-03-20 : 20:00:25
Hi Kristen,

Thanks heaps for that, it explains a lot actually. When I was usind the string date, I *was* getting some flakey results when comparing results from separate queries on the same data i.e. total record counts. I hadn't tried to find the reason yet, as I had bigger problems (performance!). One question though, if I use d.entry_date >= '20120201' then that is still a string and without using the 'convert' to get an explicit conversion, I get an implict conversion and hence am unable to index the view. Changing it to
CONVERT(DATETIME,'20120201',112)
works, but is that still ambiguous?

Also, since posting this question, I have solved the performance issues too! Turns out that it was not the huge view at all, but lots of static udf's which I had no idea would cause such an issue. If you are interested, here's the thread. http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=172678

Thanks again for all your replies, very much appreciated.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2012-03-21 : 18:32:22
"but is that still ambiguous?"

No, definitely not ambiguous (I presume "112" is "yyyymmdd" - I haven't checked )

But it may force the optimiser to use a less efficient route. That's something you can check though, so fingers-crossed it has no impact.

"lots of static udf's which I had no idea would cause such an issue."

I've hit that same rock, in the past
Go to Top of Page
   

- Advertisement -