Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
hari_sql
Starting Member
22 Posts |
Posted - 2010-03-09 : 02:18:36
|
Hi all,I need expert advice on the after effects of deleting huge amounts of data from differetn tables in a database:Would the delete operation introduce any table locks (hope it won’t)? Will the delete operation affect the indexing in the tables? Will the delete operation adversely affect the mirror database? Will the delete operation adversely affect Log Shipping? Will there be huge transaction logs created based on the deletion process? Thanks & Regards,Hari. |
|
Lumbago
Norsk Yak Master
3271 Posts |
Posted - 2010-03-09 : 05:15:05
|
1) yes2) yes3) yes4) yes5) yesHow big is your database and how much data are you planning to delete? Are you deleting from one table only or many different tables? There are numerous different strategies for deleting a lot of data but we need to know a little bit more to be able to give you more qualified advice.- LumbagoIf the facts don't fit the theory, change the facts. Albert Einstein |
 |
|
Kristen
Test
22859 Posts |
Posted - 2010-03-09 : 05:27:41
|
See also http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=141065#553313 |
 |
|
hari_sql
Starting Member
22 Posts |
Posted - 2010-03-09 : 06:21:23
|
Hi Lumbago,Thank you very much for the quick response.The DB size is around 32 GB now and data to be deleted is spread across multiple tables in the database. Hoping for a strategy as follows:-- Check for conditionWHILE EXISTS ( SELECT * FROM dbo.Routes WHERE PeriodStart < (SELECT DATEDIFF(ss, '1970-01-01 00:00:00', '2010-01-01 00:00:00.000')) )BEGIN -- Set a row count value to represent a chunk of data (in our case it can be say 10000) SET ROWCOUNT 10000 -- Actual delete operation DELETE FROM dbo. Routes WHERE PeriodStart < (SELECT DATEDIFF(ss, '1970-01-01 00:00:00', '2010-01-01 00:00:00.000')) SET ROWCOUNT 0 -- Wait 2 seconds between each delete, to reduce the constant impact, ie. to spread it over a much longer time (optional) WAITFOR DELAY '0:00:02' ENDAny better idea or expert opinion always welcome!Thanks & Regards,Hari. |
 |
|
Kristen
Test
22859 Posts |
Posted - 2010-03-09 : 07:16:50
|
Stick the (SELECT DATEDIFF(ss, '1970-01-01 00:00:00', '2010-01-01 00:00:00.000'))in a variable, and use that.Don't do SET ROWCOUNT nnnn ... see the link I posted. You are forcing the whole query to be repeated multiple times, which will usually be slower. |
 |
|
hari_sql
Starting Member
22 Posts |
Posted - 2010-03-09 : 07:42:50
|
Hi Kristen,Unfortunately, I cannot stick to your idea of:Copy rows-to-keep to new, temporary tableDrop original tableRename temporary table to original table's namebecause there are applications reading the tables where data needs to be deleted and ofcourse data is also being written into these table constantly.And by:Stick the (SELECT DATEDIFF(ss, '1970-01-01 00:00:00', '2010-01-01 00:00:00.000'))in a variable, and use that.do you mean to include the subquery in a variable? Whats the advantage of doing so?Thanks & Regards,Hari. |
 |
|
Kristen
Test
22859 Posts |
Posted - 2010-03-09 : 08:26:46
|
You can use the copy-to-table approach, as the table will be locked. But that may still not make it acceptable of course.The post I linked to also explains a method for deleting in batches without using SET ROWCOUNT; that will be more efficient than your earlier SET ROWCOUNT example.Put the result of the DATEDIFF into an @VARIABLE ahead of the actual query, and then use the @VARIABLE in place of the nested SELECT. The reasons for this are:The optimiser is more likely to use any available index, and will definitely only calculate the value once (otherwise it may calculate it multiple times - at least twice, once for the EXISTS and once for the DELETE, possibly per-row in the select if it cannot determine that it is a constant value. It also makes you code more obvious, and less prone to error during maintenance - currently you have the same code in two places, so you have to remember to change it in both places. |
 |
|
hari_sql
Starting Member
22 Posts |
Posted - 2010-03-09 : 09:46:01
|
Hi Kristen,I have modified the statement as per your suggestion as follows (also made the number of days before which the old date needs to be deleted).-- Get the number of days before which data needs to be deletedDECLARE @DayCount smallintSET @DayCount = 30-- Find the date before which data needs to be deletedDECLARE @ToDateTime datetimeSET @ToDateTime = DATEADD(dd, -@DayCount, GETDATE())-- Check for conditionDECLARE @SubQuery varchar(100)SET @SubQuery = DATEDIFF(ss, '1970-01-01 00:00:00', @ToDateTime)-- Loop through the rows that satisfy the conditionWHILE EXISTS (SELECT * FROM dbo.Routes WHERE PeriodStart < (@SubQuery))BEGIN -- Set a row count value to represent a chunk of data (in our case it can be say 10000) SET ROWCOUNT 10000 -- Actual delete operation DELETE FROM dbo.Routes WHERE PeriodStart < (@SubQuery) SET ROWCOUNT 0 -- Wait 2 seconds between each delete, to reduce the constant impact, ie. to spread it over a much longer time (optional) WAITFOR DELAY '0:00:02'ENDThanks & Regards,Hari. |
 |
|
Kristen
Test
22859 Posts |
Posted - 2010-03-09 : 09:49:36
|
You are still using SET ROWCOUNT, but I suppose that's up to you ...SET @ToDateTime = DATEADD(dd, -@DayCount, GETDATE())will set @ToDateTime including the Current TIME, whereas you may have wanted "whole days" instead. |
 |
|
hari_sql
Starting Member
22 Posts |
Posted - 2010-03-09 : 10:23:00
|
Hi Kirsten,"will set @ToDateTime including the Current TIME, whereas you may have wanted "whole days" instead." - your correct, how to get the whole days?And what is the alternative to using SET ROWCOUNT, can you please put some details into this?Again thank you very much for the reponses and follow ups!Thanks & Regards,Hari. |
 |
|
Kristen
Test
22859 Posts |
Posted - 2010-03-09 : 11:10:01
|
"how to get the whole days "SET @ToDateTime = DATEADD(Day, DATEDIFF(Day, 0, GETDATE())-@DayCount, 0)Note the use of "Day" for first parameter instead of "dd" - less likely to cause errors - do you know if "m" Months, Minutes, Milliseconds or (in SQL 2008) Microseconds? better to use the full parameter name, not an abbreviation." And what is the alternative to using SET ROWCOUNT, can you please put some details into this (I couldn't make it out from your link)?"I have provided a link to fully documented example, if you can't understand the code then I'm afraid I don't have time to explain it further, sorry. Up to you whether you consider (by testing / whatever) that your method is efficient enough, or not. |
 |
|
hari_sql
Starting Member
22 Posts |
Posted - 2010-03-09 : 23:22:25
|
Hi Kristen,Thank you for the x-tra tip provided :)Regards,Hari. |
 |
|
|
|
|
|
|