Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Multiple Queries Causing Long Load Time

Author  Topic 

tomrippity
Starting Member

37 Posts

Posted - 2012-03-28 : 11:04:55
I have a page that has a filter system on it. To create the filter, I have to select distinct values from the table that the data is pulled from, for each of the filter types.

As an example, if there were 8 fields that you could filter on, I have 8 select boxes built from 8 different "Select Distinct X From Y"

I believe this is the best way to build those filters, I may be wrong.

If this is the best way, my problem is that each of those queries takes about .3 - .5 seconds to run and return to my page, which ends up making the load time a bit long for the entire site.

There are about 11 million records right now in said table, and it will grow to about 100 million in a few months, and maybe 3 or 4 hundred million in a year or so, and then it will cease to grow. That means that .5 could turn into a full second.

Is there a way I can speed up these queries? Because of the simplicity of the query, I don't believe there are any indexes I can create to speed them up that much. I have used query analyzer and execution plans to look for suggestions, but received none.

The hardware isn't amazing, but its only a year old.

I'm not sure what information would be useful to help solve this problem, but I can provide anything needed.

Thanks in advance.

Vinnie881
Master Smack Fu Yak Hacker

1231 Posts

Posted - 2012-03-28 : 11:39:50
Can you please post a sample query to illustrate. Also, even though I think it is probably pretty self-explanitory once I see the query, try to include some wanted/expected results to help us assist.

Thanks


Success is 10% Intelligence, 70% Determination, and 22% Stupidity.
\_/ _/ _/\_/ _/\_/ _/ _/- 881
Go to Top of Page

tomrippity
Starting Member

37 Posts

Posted - 2012-03-28 : 11:47:59
quote:
Originally posted by Vinnie881

Can you please post a sample query to illustrate. Also, even though I think it is probably pretty self-explanitory once I see the query, try to include some wanted/expected results to help us assist.



The sample I gave was pretty accurate, but below is the exact query I use to fill out one of the Select boxes for the filter.

SELECT distinct school_name
FROM Student_Test_Item
WHERE COUNTY_DISTRICT = 'xxxxxx'

This returns anywhere from 30 - 50 rows out of around 11.5 million and takes about .3 seconds to run.

The result, obviously, would be a set of school names in a given district.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-03-28 : 12:17:41
i think issue is your system design.
Why do you've to go Student_Test_Item table to get school_names?
Ideally, you should be maintaining master tables for reference (dimension) attributes like these . The Student_Test_Item should just have a reference key (a fk id column) which will just specify what school student belongs to. This fk will be the primary key id value generated in the master table for school. The school master table will not have huge amount of data at any point. It will just have distinct list of schools and hence select query from school table will be quick and fast.
Suggest you to read on the concept of normalisation
With the current design, you have to fetch data from transaction table (Student_Test_Item ) for any of master reference details which will surely hurt page performance

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

tomrippity
Starting Member

37 Posts

Posted - 2012-03-28 : 14:33:51
quote:
Originally posted by visakh16

i think issue is your system design.
Why do you've to go Student_Test_Item table to get school_names?
Ideally, you should be maintaining master tables for reference (dimension) attributes like these . The Student_Test_Item should just have a reference key (a fk id column) which will just specify what school student belongs to. This fk will be the primary key id value generated in the master table for school. The school master table will not have huge amount of data at any point. It will just have distinct list of schools and hence select query from school table will be quick and fast.
Suggest you to read on the concept of normalisation
With the current design, you have to fetch data from transaction table (Student_Test_Item ) for any of master reference details which will surely hurt page performance




I considered this point. The data is not generated by our system. We are given the data by the state and it comes in the format that I query. I could create a table with just school names that references this table by foreign key, however, I would then also have to join to this table because the filter could include additional layers that add to a where clause on other fields.

I wasn't sure if it would then become more complicated and take longer still. I thought it best to ask first before building a bunch of tables. In this case I am searching school names, but there are several other fields I have to consider as well.

If you think it would be best to create a table that I join two tables instead of selecting distincts, I can definitely go that route.

Thank you for your help.
Go to Top of Page

tomrippity
Starting Member

37 Posts

Posted - 2012-03-28 : 14:40:04
Also, keep in mind that the master reference table could not simply be School_ID, School_name, District_ID...

It would have to include each school multiple times, for grade, year, content area, etc...

So while it would dramatically reduce the number of records, by the time I have them all done for the entire state, I would probably still be looking at around 8 or 9 million records and that number will grow every year when I add another year of data being sent from the state.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-03-28 : 14:55:49
quote:
Originally posted by tomrippity

Also, keep in mind that the master reference table could not simply be School_ID, School_name, District_ID...

It would have to include each school multiple times, for grade, year, content area, etc...

So while it would dramatically reduce the number of records, by the time I have them all done for the entire state, I would probably still be looking at around 8 or 9 million records and that number will grow every year when I add another year of data being sent from the state.


why should each school be repeated multiple times?
Any additional attributes like grade year, content area etc which has no direct relation to school should not be in schools table. Its something related to student's test record and should reside in your current transaction table itself ie Student_Test_Item. All other master entries like student,school,course, etc have to be represnted by master tables and should just have reference key in your transaction table.
As suggested before, please read about normalisation principles

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

Vinnie881
Master Smack Fu Yak Hacker

1231 Posts

Posted - 2012-03-28 : 15:17:29
Agree with Visakh
Table 1 (SCHOOL)
TABLE 2 (Years)
table 3 (Student_Tests)
table 4 ...

Just because the data received comes in one file, does not mean you should not parse it out properly to ensure adequate performance.


Success is 10% Intelligence, 70% Determination, and 22% Stupidity.
\_/ _/ _/\_/ _/\_/ _/ _/- 881
Go to Top of Page

tomrippity
Starting Member

37 Posts

Posted - 2012-03-28 : 15:17:41
visakh16:

I understand normalization, and I think you would understand my question more if you saw the scope of the application and data together, or my understanding of what should and should not be normalized is wrong.


Years, Content Area, Grade Level are all repeated many many times, just like a school name would be, even for an individual student.

Any one of those fields would have the same data repeated no less than a million times, where a school name might appear at most, 50 or 60 thousand times.
Go to Top of Page

Vinnie881
Master Smack Fu Yak Hacker

1231 Posts

Posted - 2012-03-28 : 15:22:34
If you will not change your data structure, then to answer your original question:

All you can do is proper index it possibly partition it, and that is your optimal performance. You have no joins or anything that would impact performance besides the data you are querying.





Success is 10% Intelligence, 70% Determination, and 22% Stupidity.
\_/ _/ _/\_/ _/\_/ _/ _/- 881
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-03-28 : 15:23:41
quote:
Originally posted by tomrippity

visakh16:

I understand normalization, and I think you would understand my question more if you saw the scope of the application and data together, or my understanding of what should and should not be normalized is wrong.


Years, Content Area, Grade Level are all repeated many many times, just like a school name would be, even for an individual student.

Any one of those fields would have the same data repeated no less than a million times, where a school name might appear at most, 50 or 60 thousand times.



thats ok. if even thats the case why should you worry?
In suggested approach, you'll just have a single entry in School,Student etc for each school,student information and they'll have an associated id value which uniquely identifies them.
In your main table ie student_test_item all records that belongs to the particular school /student will have same id repeated against them. This will avoid need of duplicating school name,student name etc for each of those records. And wherever you want to show them you dont need to pull a distinct list out of million records

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-03-28 : 15:25:51
how are you getting the data feeds currently? is it coming from file/ application?

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

tomrippity
Starting Member

37 Posts

Posted - 2012-03-28 : 15:46:29
quote:
Originally posted by visakh16

quote:
Originally posted by tomrippity

visakh16:

I understand normalization, and I think you would understand my question more if you saw the scope of the application and data together, or my understanding of what should and should not be normalized is wrong.


Years, Content Area, Grade Level are all repeated many many times, just like a school name would be, even for an individual student.

Any one of those fields would have the same data repeated no less than a million times, where a school name might appear at most, 50 or 60 thousand times.



thats ok. if even thats the case why should you worry?
In suggested approach, you'll just have a single entry in School,Student etc for each school,student information and they'll have an associated id value which uniquely identifies them.
In your main table ie student_test_item all records that belongs to the particular school /student will have same id repeated against them. This will avoid need of duplicating school name,student name etc for each of those records. And wherever you want to show them you dont need to pull a distinct list out of million records

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/





Ok, I will work based on that information. I appreciate the input.

Also, I am wondering if there is any reason why I couldn't use the school name as the foreign key here, as opposed to creating a column with an ID and then having to update the student_test_item table every time we import data from the state for each district. That process occurs a few hundred times each semester.

Instead, couldn't I just keep a master table of School_Name, School_Code, District_Number and reference based on those pieces of information? If the performance hit were minimal, I believe there would be far less chance of problems occurring when we get new data.

Thanks again.

Go to Top of Page

tomrippity
Starting Member

37 Posts

Posted - 2012-03-28 : 15:47:55
quote:
Originally posted by visakh16

how are you getting the data feeds currently? is it coming from file/ application?




The files are given to us in flat text files, broken up by districts. There are about 530 districts, and each one would have a minimum of a million records and as many as 10.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-03-28 : 15:50:14
quote:
Originally posted by tomrippity

quote:
Originally posted by visakh16

how are you getting the data feeds currently? is it coming from file/ application?




The files are given to us in flat text files, broken up by districts. There are about 530 districts, and each one would have a minimum of a million records and as many as 10.


then your ETL process should take care of populating the master tables with reference data from files

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-03-28 : 15:53:59
quote:
Originally posted by tomrippity
Ok, I will work based on that information. I appreciate the input.

Also, I am wondering if there is any reason why I couldn't use the school name as the foreign key here, as opposed to creating a column with an ID and then having to update the student_test_item table every time we import data from the state for each district. That process occurs a few hundred times each semester.

Instead, couldn't I just keep a master table of School_Name, School_Code, District_Number and reference based on those pieces of information? If the performance hit were minimal, I believe there would be far less chance of problems occurring when we get new data.

Thanks again.




the main reason is to avoid update anomalies. If at all you decide to change name of school at a later point, you just need to do it in single record which is in master table rather than seraching for all millions of rows within student_test_item having this school. The id will still be the same as its same school but it just changed its name so there wont be any change required in student_test_item. changing id value in student_test_item means changing associated school itself which will not happen for retrospective records unless source feed was wrong.

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

tomrippity
Starting Member

37 Posts

Posted - 2012-03-28 : 16:04:14
quote:
Originally posted by visakh16

quote:
Originally posted by tomrippity
Ok, I will work based on that information. I appreciate the input.

Also, I am wondering if there is any reason why I couldn't use the school name as the foreign key here, as opposed to creating a column with an ID and then having to update the student_test_item table every time we import data from the state for each district. That process occurs a few hundred times each semester.

Instead, couldn't I just keep a master table of School_Name, School_Code, District_Number and reference based on those pieces of information? If the performance hit were minimal, I believe there would be far less chance of problems occurring when we get new data.

Thanks again.




the main reason is to avoid update anomalies. If at all you decide to change name of school at a later point, you just need to do it in single record which is in master table rather than seraching for all millions of rows within student_test_item having this school. The id will still be the same as its same school but it just changed its name so there wont be any change required in student_test_item. changing id value in student_test_item means changing associated school itself which will not happen for retrospective records unless source feed was wrong.

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/





In this case, when we get new data, they give us ALL data for that school at the same time. We completely replace data each time its given to us because the state does retroactively change information for student testing. So if the school name did change, the state would go back and change all of the old records and then send us new files which we would then use to replace all of our existing data.

So if I understand you correctly, and I am not concerned with name changes causing conflict, it would be ok to do that?
Go to Top of Page

tomrippity
Starting Member

37 Posts

Posted - 2012-03-28 : 16:05:54
quote:
Originally posted by visakh16
then your ETL process should take care of populating the master tables with reference data from files





Ok, I believe I understand where you are coming from. I will look at trying to create some processes that normalize the data for us in an efficient manner.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-03-28 : 16:11:08
quote:
Originally posted by tomrippity

quote:
Originally posted by visakh16

quote:
Originally posted by tomrippity
Ok, I will work based on that information. I appreciate the input.

Also, I am wondering if there is any reason why I couldn't use the school name as the foreign key here, as opposed to creating a column with an ID and then having to update the student_test_item table every time we import data from the state for each district. That process occurs a few hundred times each semester.

Instead, couldn't I just keep a master table of School_Name, School_Code, District_Number and reference based on those pieces of information? If the performance hit were minimal, I believe there would be far less chance of problems occurring when we get new data.

Thanks again.




the main reason is to avoid update anomalies. If at all you decide to change name of school at a later point, you just need to do it in single record which is in master table rather than seraching for all millions of rows within student_test_item having this school. The id will still be the same as its same school but it just changed its name so there wont be any change required in student_test_item. changing id value in student_test_item means changing associated school itself which will not happen for retrospective records unless source feed was wrong.

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/





In this case, when we get new data, they give us ALL data for that school at the same time. We completely replace data each time its given to us because the state does retroactively change information for student testing. So if the school name did change, the state would go back and change all of the old records and then send us new files which we would then use to replace all of our existing data.

So if I understand you correctly, and I am not concerned with name changes causing conflict, it would be ok to do that?


ideally if school information is unchanged there's no reason to remove and again add it back. it will cause a new id value to be generated for it and then you've to cascade change to your student_test_item which would be big impact. so what you need to do is identify deltas (new data/modified data) out of file and reflect them in appropriate master tables

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page
   

- Advertisement -