Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 SQL Averages

Author  Topic 

JohnL
Starting Member

3 Posts

Posted - 2010-03-25 : 11:16:16
Is there any way to calculate averages based on distinct values from another field.

I have some OLAP style data in SQL-Server 2005 that looks something like this when de-normalised :-

Patient ID   Rx ID     Product    Rx Action   Diagnosis                   Doctor Specialty    Exit Dose
----------   -----     -------    ---------   ---------                   ----------------    ---------
690529       1317203   acarbose   Repeat      Type II diabetes mellitus   Diabetes            NULL
690529       1317203   acarbose   Repeat      Type II diabetes mellitus   Endocrinology       NULL
703972       1347083   acarbose   Repeat      Type II diabetes mellitus   NULL                100
707196       1353727   acarbose   Repeat      Type II diabetes mellitus   NULL                200
714561       1370887   acarbose   Repeat      Diabetes                    Diabetes            100
714561       1370887   acarbose   Repeat      Diabetes                    Elderly medicine    100
714561       1370887   acarbose   Repeat      Diabetes                    Stroke              100
714561       1370887   acarbose   Repeat      Obesity                     Diabetes            100
714561       1370887   acarbose   Repeat      Obesity                     Elderly medicine    100
714561       1370887   acarbose   Repeat      Obesity                     Stroke              100
714561       1370887   acarbose   Repeat      Hypertension                Diabetes            100
714561       1370887   acarbose   Repeat      Hypertension                Elderly medicine    100
714561       1370887   acarbose   Repeat      Hypertension                Stroke              100
714561       1370887   acarbose   Repeat      Ischaemic heart disease     Diabetes            100
714561       1370887   acarbose   Repeat      Ischaemic heart disease     Elderly medicine    100
714561       1370887   acarbose   Repeat      Ischaemic heart disease     Stroke              100
714561       1370887   acarbose   Repeat      Stroke                      Diabetes            100
714561       1370887   acarbose   Repeat      Stroke                      Elderly medicine    100
714561       1370887   acarbose   Repeat      Stroke                      Stroke              100
714561       1370887   acarbose   Repeat      Hyperlipidaemia             Diabetes            100
714561       1370887   acarbose   Repeat      Hyperlipidaemia             Elderly medicine    100
714561       1370887   acarbose   Repeat      Hyperlipidaemia             Stroke              100


My application is generating dynamic T-SQL to analyse this data with the required groupings and sub-totals
e.g.

select
  Product,
  'Total Specialty' as Specialty,
  'Total Diagnosis' as Diagnosis,
  count(distinct PatientID ) as [Patient Count],
  count( distinct RxID ) as [Rx Count],
  Avg( ExitDose ) as [Avg. Dose]
from #Temp
group by Product
union all
select
  Product,
  '  ' + Coalesce( DoctorSpecialty, '-' ),
  'Total Diagnosis' as Diagnosis,
  count(distinct PatientID ),
  count( distinct RxID ),
  Avg( ExitDose )
from #Temp
group by Product, '  ' + Coalesce( DoctorSpecialty, '-' )

returns

Product    Specialty            Diagnosis         Patient Count   Rx Count   Avg. Dose
-------    ---------            ---------         -------------   --------   ---------
acarbose   Total Specialty      Total Diagnosis   4               4          105
acarbose     -                  Total Diagnosis   2               2          150
acarbose     Diabetes           Total Diagnosis   2               2          100
acarbose     Elderly medicine   Total Diagnosis   1               1          100
acarbose     Endocrinology      Total Diagnosis   1               1          NULL
acarbose     Stroke             Total Diagnosis   1               1          100


All looks good? Actually no.

The patient count and Rx count figures are correct, but the average dose figure is wrong (especially at the total speciality level - where the value should be 133.33). This is because the last 18 records are not actually different values at all, but are actually for the same prescription that is just duplicated in the data for different diagnosis / doctor speciality values - which is skewing the average calculation.

I would like to calculate the average by taking only one value for each distinct Rx ID value - is there any simple way to do this?

DBA in the making
Aged Yak Warrior

638 Posts

Posted - 2010-03-25 : 11:33:48
Where did the 133.33 figure come from? In the query that generates the Total Specialty record, you're averaging the entire table. 20 Values, total of 2100. 2100/20 = 105.

There are 10 types of people in the world, those that understand binary, and those that don't.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2010-03-25 : 11:37:43
how do you populate #Temp? can you show its query?

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

JohnL
Starting Member

3 Posts

Posted - 2010-03-25 : 11:57:26
The 133.33 figure comes from the fact that there are 4 distinct Rx ID values in the data

1317203 - dose = null
1347083 - dose = 100
1353727 - dose = 200
1370887 - dose = 100

so what I need is the average of ( null, 100, 200, 100 ) = 400 / 3 = 133.33

I know this won't be produced by the SQL statement I provided - I just need to know if there is any way to actually do this - calculate average value from all this data just taking 1 row for each distinct Rx ID value..

The #Temp table is just to do proof of concept. The actual data comes from a view in the OLAP database.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2010-03-25 : 12:06:44
ok then show actual query for view please

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

JohnL
Starting Member

3 Posts

Posted - 2010-03-25 : 12:18:08
The source of the data isn't important to solving my problem.

Just assume that you have a single table with the following structure

create table #Temp
( PatientID int,
RxID int,
Product varchar(max),
RxAction varchar(max),
Diagnosis varchar(max),
DoctorSpecialty varchar(max),
ExitDose float
)

populated with the data at the top of this topic.

I want to calculate an average of ExitDose values, but only take one record for each distinct RxID value. Is there any way to do this?
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2010-03-25 : 12:20:47
nope its important for getting distinct value. Would you please post it as requested?

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2010-03-25 : 12:21:09
or atleast suggest a unique valued column in your view?

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

DBA in the making
Aged Yak Warrior

638 Posts

Posted - 2010-03-25 : 12:24:46
You might be looking for something like this:

select
t1.Product,
'Total Specialty' as Specialty,
'Total Diagnosis' as Diagnosis,
count(distinct t1.PatientID ) as [Patient Count],
count( distinct t1.RxID ) as [Rx Count],
Avg( t2.ExitDose ) as [Avg. Dose]
from #Temp t1
INNER JOIN ( SELECT x.Product, AVG(x.ExitDose ) as ExitDose
FROM ( SELECT DISTINCT t.Product, t.RxID, CAST (t.ExitDose AS DECIMAL(8,2)) AS ExitDose
FROM #temp t) x
GROUP BY x.Product ) t2
ON t1.Product = t2.Product
GROUP BY t1.Product

union all
...


The CAST will prevent that trailing decimal places from being truncated, and ExitDose appears to be an integer.

There are 10 types of people in the world, those that understand binary, and those that don't.

EDIT: Forgot the final group by.
Go to Top of Page
   

- Advertisement -