Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 formula for mode calculation

Author  Topic 

zion99
Posting Yak Master

141 Posts

Posted - 2010-04-30 : 04:57:57
Hi,

I am using the foll. formula for Mode calculation. Can anybody plz. suggest me a better & faster implementation for Mode calculation.

actual staging table contains 10 million records.


declare @StagingTable TABLE
(
Col1 varchar(64),
Col2 varchar(64),
Total_Size numeric(10,0)
)

INSERT into @StagingTable (Col1,Col2,Total_Size)
select 'S1','C:',600 union all
select 'S1','C:',605 union all
select 'S1','C:',605 union all
select 'S1','C:',605 union all
select 'S1','C:',602 union all
select 'S1','C:',602 union all
select 'S1','C:',601 union all

select 'S2','C:',601 union all
select 'S2','C:',601 union all
select 'S2','C:',605 union all
select 'S2','C:',605 union all
select 'S2','C:',605 union all
select 'S2','C:',601 union all
select 'S2','C:',602 union all
select 'S2','C:',601 union all

select 'S3','D:',605 union all
select 'S3','D:',605 union all
select 'S3','D:',600 union all
select 'S3','D:',600 union all
select 'S3','D:',600 union all
select 'S3','D:',602 union all
select 'S3','D:',602 union all
select 'S3','D:',602 union all
select 'S3','D:',602


;WITH CTE ( Col1, Col2, [Total_Size], [Freq_TotalSize] )
AS
(
SELECT d.Col1 as Server_Name
,Col2
,Total_Size
,COUNT(*) as Freq_TotalSize
from @StagingTable d
group by
d.Col1
,Col2
,Total_Size
)

SELECT m1.Col1,
m1.Col2,
m1.ModeCount,
m2.[Total_Size]
FROM (
SELECT Col1, Col2,
MAX( [Freq_TotalSize] ) as ModeCount
FROM CTE
GROUP BY Col1, Col2
) as m1
JOIN CTE as m2
ON m2.Col1 = m1.Col1
AND m2.Col2 = m1.Col2
AND m2.[Freq_TotalSize] = m1.ModeCount
ORDER BY m1.Col1, m1.Col2


visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2010-05-01 : 01:18:34
try this too:-

SELECT Col1,Col2,Occur,Total_Size
FROM(
SELECT DISTINCT DENSE_RANK() OVER (PARTITION BY Col1,Col2 ORDER BY Occur DESC) AS Rnk,*
FROM
(
SELECT COUNT(1) OVER (PARTITION BY Col1,Col2,Total_Size) AS Occur,*
FROM @StagingTable
)t
)r
WHERE Rnk=1


------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

zion99
Posting Yak Master

141 Posts

Posted - 2010-05-03 : 00:05:17
Hi visakh16, thanks :)
unfortunately, this query takes much more time...

will try to make some changes in your suggested query...
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2010-05-03 : 01:30:57
oh ok

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-05-03 : 04:20:37
Try this
SELECT		Col1,
Col2,
ModeCount,
Total_Size
FROM (
SELECT Col1,
Col2,
Total_Size,
ModeCount,
RANK() OVER (PARTITION BY Col1, Col2 ORDER BY ModeCount DESC) AS recID
FROM (
SELECT Col1,
Col2,
Total_Size,
COUNT(*) AS ModeCount
FROM @StagingTable
GROUP BY Col1,
Col2,
Total_Size
) AS d
) AS d
WHERE recID = 1
ORDER BY Col1,
Col2



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-05-03 : 04:22:45
[code]-- Peso
Table '@StagingTable'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0.

-- OP
Table '@StagingTable'. Scan count 2, logical reads 2, physical reads 0, read-ahead reads 0.

-- Visakh16
Table 'Worktable'. Scan count 3, logical reads 89, physical reads 0, read-ahead reads 0.
Table '@StagingTable'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0.[/code]


N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

zion99
Posting Yak Master

141 Posts

Posted - 2010-05-03 : 07:24:44
thanks Peso :)

sorry for sounding dumb, but did u calculate the logical & physical reads manually from the execution plan or is this information available through some command ?
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-05-03 : 07:34:10
It is available with
"SET STATISTICS IO ON" / "SET STATISTICS IO OFF"

How you tested my suggestion yet?
In terms of seconds, how long time did your query take before, and how long time did it take with my suggestion?


N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

zion99
Posting Yak Master

141 Posts

Posted - 2010-05-04 : 03:01:41
thanks have implemented your suggestion, the execution plans says it all...

initial query takes 10 seconds... this one takes 6 seconds
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-05-04 : 05:21:51
Only 40% faster?



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-05-04 : 05:23:00
Do you also have an index over columns {Col1, Col2, Total_Size} ?



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

zion99
Posting Yak Master

141 Posts

Posted - 2010-05-04 : 07:01:40
No index on staging table.

It has around 10 million records... with index in place, updates to 2 columns takes around 4 minutes, without index, update is done in a minute. hence have avoided indexing...
Go to Top of Page
   

- Advertisement -