Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Using parameters in aggregate functions

Author  Topic 

PeteFromOz
Starting Member

8 Posts

Posted - 2012-03-14 : 20:05:31
Hi Folks,

(if this looks familiar, I have actually posted it to one other forum, hope thats ok)

I am currently producing a report in SQL Server 2005 which needs to aggregate the days difference between two dates The dates are arrived at by calling functions that go off and scan log type tables which are HUGE (58M records and counting).

For example.

@int1 = sum (dbo.fncGetWorkingDays (dbo.fncGetSpecialDate1 (d.doc_id),dbo.fncGetSpecialDate2 (d.doc_id))),
@int2 = sum (datediff (dd,dbo.fncGetSpecialDate1 (d.doc_id),dbo.fncGetSpecialDate2(d.doc_id)))
FROM docs d
WHERE d.company=@strOrgID
AND d.date BETWEEN @dateStart AND @dateEnd


@strOrgID, @dateStart and @dateEnd are from the parent function which uses a (gasp) cursor to step through all the companies, but there are only a dozen or so.

The code for fncGetSpecialDate1 is simple in and of itself, and gopes something like this...


BEGIN
DECLARE @tblMinIn TABLE (LogIndex VARCHAR (50))

INSERT INTO @tblMinIn (LogIndex)
(SELECT Item FROM dbo.fncSplitCSV (@strMinIn, ','))

SET @dateStart =
(SELECT min (al.stamp_date)
FROM viewActionLog al
WHERE al.log_index IN (SELECT LogIndex FROM @tblMinIn)
AND al.doc_id = @strDocId)
END


The different fncGetSpecialDate functions call the above function with a specific string. The string contains the action log event codes that I am looking for. So it might be '1234,3456,4567' and I look through the action log to find the first instance of any of these events, then return the date of that event. The action log view has a id from the original object that is being logged, so if 20 events happened, then there is 20 rows with the same id... then an index code (which we are serching for) and a date (which we return). I created a view with the minimum number of rows (but still huge) and a unique clustered index on id,index,stampdate.

My problem is that the functions fncGetSpecialDate1 and fncGetSpecialDate2 are very expensive as they are the ones that go off to trawl the huge log files (of which I have made an indexed view but are still >10M records). How can I restructure the above SELECT statement so I only have to call the functions once as inputs for both the other functions?

Any advice on the aggregation or just plain optimisation strategy much appreciated.

Cheers,

Pete.

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-03-15 : 16:20:38
I think issue is basically effort spend on parsing the delimited '1234,3456,4567' value and then comparing each and every values and getting a date corresponding to first occurance of them
One suggestion would be separate this part and prepopulate values to table with date against doc_id and then in above code just include a simple join to the earlier table to get date. That would save some amount of effort in calling this udf and doing this parsing inline for large dataset

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

PeteFromOz
Starting Member

8 Posts

Posted - 2012-03-16 : 22:52:01
Thanks visakh,

This is what I came up with... basically hard coding the event codes as they are pretty set anyway, and there are only a few sets I need to query (I guess it was just my OO training that led be down the path of making a generic, reusable function )


CREATE FUNCTION dbo.fncGetSLATime (@strDocId VARCHAR (32),
@dateStartDate DATETIME,--if I already know one of these dates then pass it in
@dateEndDate DATETIME)
RETURNS INT
AS
BEGIN
DECLARE @intSLADays INT
DECLARE @dateMaxIn DATETIME
DECLARE @dateMaxOut DATETIME
DECLARE @tblIn TABLE (Log_Index INT)

INSERT INTO @tblIn (log_index)
(SELECT 6012
UNION
SELECT 5013
UNION
SELECT 10507
UNION
SELECT 1008
UNION
SELECT 6007
UNION
SELECT 5006)

IF @dateStartDate IS NULL
SET @dateMaxIn =
(SELECT max (al.stamp_date)
FROM dbo.viewReporting al
INNER JOIN
@tblIn ti
ON al.log_index = ti.log_index
WHERE al.doc_id = @strDocId)
ELSE
SET @dateMaxIn = @dateStartDate


IF @dateEndDate IS NULL
SET @dateMaxout =
(SELECT max (al.stamp_date)
FROM dbo.viewReporting al
WHERE al.doc_id = @strDocId AND al.log_index = 10000)
ELSE
SET @dateMaxIn = @dateStartDate

SET @intSLADays = dbo.fncGetWorkingDays (@dateStartDate, @dateEndDate)

RETURN @intSLADays
END


Is that what you had in mind? I think it should be more efficient (?)

Thanks again,

Cheers,

Pete.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-03-16 : 23:00:38
i would have put the values in a table in main code itself rather than using a separate udf

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

PeteFromOz
Starting Member

8 Posts

Posted - 2012-03-17 : 19:39:37
Hi visakh,

Sorry, i am confused now... is there a significant overhead just calling a udf? I have to do this several times in my main code as i am aggrigatng (see the first bit of code in the original post), so i really only did this to make things neater, but my main issue now is definately performance.

any inights much apreciated

Cheers,
Pete.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-03-18 : 10:49:04
is there a significant overhead just calling a udf?

there's a significant impact. why not create a permanent table and populate it first with required data one time. then use it across

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page
   

- Advertisement -