Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Administration
 Bulk delete and Transaction log

Author  Topic 

hari_sql
Starting Member

22 Posts

Posted - 2010-03-09 : 02:18:36
Hi all,

I need expert advice on the after effects of deleting huge amounts of data from differetn tables in a database:

Would the delete operation introduce any table locks (hope it won’t)?
Will the delete operation affect the indexing in the tables?
Will the delete operation adversely affect the mirror database?
Will the delete operation adversely affect Log Shipping?
Will there be huge transaction logs created based on the deletion process?

Thanks & Regards,
Hari.

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2010-03-09 : 05:15:05
1) yes
2) yes
3) yes
4) yes
5) yes

How big is your database and how much data are you planning to delete? Are you deleting from one table only or many different tables? There are numerous different strategies for deleting a lot of data but we need to know a little bit more to be able to give you more qualified advice.

- Lumbago
If the facts don't fit the theory, change the facts. Albert Einstein
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-03-09 : 05:27:41
See also http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=141065#553313
Go to Top of Page

hari_sql
Starting Member

22 Posts

Posted - 2010-03-09 : 06:21:23
Hi Lumbago,

Thank you very much for the quick response.
The DB size is around 32 GB now and data to be deleted is spread across multiple tables in the database. Hoping for a strategy as follows:

-- Check for condition

WHILE EXISTS ( SELECT * FROM dbo.Routes

WHERE PeriodStart < (SELECT DATEDIFF(ss, '1970-01-01 00:00:00', '2010-01-01 00:00:00.000'))

)

BEGIN

-- Set a row count value to represent a chunk of data (in our case it can be say 10000)

SET ROWCOUNT 10000

-- Actual delete operation

DELETE FROM dbo. Routes

WHERE PeriodStart < (SELECT DATEDIFF(ss, '1970-01-01 00:00:00', '2010-01-01 00:00:00.000'))

SET ROWCOUNT 0

-- Wait 2 seconds between each delete, to reduce the constant impact, ie. to spread it over a much longer time (optional)

WAITFOR DELAY '0:00:02'

END

Any better idea or expert opinion always welcome!

Thanks & Regards,
Hari.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-03-09 : 07:16:50
Stick the

(SELECT DATEDIFF(ss, '1970-01-01 00:00:00', '2010-01-01 00:00:00.000'))

in a variable, and use that.

Don't do SET ROWCOUNT nnnn ... see the link I posted. You are forcing the whole query to be repeated multiple times, which will usually be slower.
Go to Top of Page

hari_sql
Starting Member

22 Posts

Posted - 2010-03-09 : 07:42:50
Hi Kristen,

Unfortunately, I cannot stick to your idea of:
Copy rows-to-keep to new, temporary table
Drop original table
Rename temporary table to original table's name

because there are applications reading the tables where data needs to be deleted and ofcourse data is also being written into these table constantly.

And by:
Stick the
(SELECT DATEDIFF(ss, '1970-01-01 00:00:00', '2010-01-01 00:00:00.000'))
in a variable, and use that.

do you mean to include the subquery in a variable? Whats the advantage of doing so?

Thanks & Regards,
Hari.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-03-09 : 08:26:46
You can use the copy-to-table approach, as the table will be locked. But that may still not make it acceptable of course.

The post I linked to also explains a method for deleting in batches without using SET ROWCOUNT; that will be more efficient than your earlier SET ROWCOUNT example.

Put the result of the DATEDIFF into an @VARIABLE ahead of the actual query, and then use the @VARIABLE in place of the nested SELECT. The reasons for this are:

The optimiser is more likely to use any available index, and will definitely only calculate the value once (otherwise it may calculate it multiple times - at least twice, once for the EXISTS and once for the DELETE, possibly per-row in the select if it cannot determine that it is a constant value.

It also makes you code more obvious, and less prone to error during maintenance - currently you have the same code in two places, so you have to remember to change it in both places.
Go to Top of Page

hari_sql
Starting Member

22 Posts

Posted - 2010-03-09 : 09:46:01
Hi Kristen,

I have modified the statement as per your suggestion as follows (also made the number of days before which the old date needs to be deleted).

-- Get the number of days before which data needs to be deleted
DECLARE @DayCount smallint
SET @DayCount = 30

-- Find the date before which data needs to be deleted
DECLARE @ToDateTime datetime
SET @ToDateTime = DATEADD(dd, -@DayCount, GETDATE())

-- Check for condition
DECLARE @SubQuery varchar(100)
SET @SubQuery = DATEDIFF(ss, '1970-01-01 00:00:00', @ToDateTime)

-- Loop through the rows that satisfy the condition
WHILE EXISTS (SELECT * FROM dbo.Routes WHERE PeriodStart < (@SubQuery))
BEGIN
-- Set a row count value to represent a chunk of data (in our case it can be say 10000)
SET ROWCOUNT 10000
-- Actual delete operation
DELETE FROM dbo.Routes
WHERE PeriodStart < (@SubQuery)
SET ROWCOUNT 0
-- Wait 2 seconds between each delete, to reduce the constant impact, ie. to spread it over a much longer time (optional)
WAITFOR DELAY '0:00:02'
END

Thanks & Regards,
Hari.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-03-09 : 09:49:36
You are still using SET ROWCOUNT, but I suppose that's up to you ...

SET @ToDateTime = DATEADD(dd, -@DayCount, GETDATE())

will set @ToDateTime including the Current TIME, whereas you may have wanted "whole days" instead.
Go to Top of Page

hari_sql
Starting Member

22 Posts

Posted - 2010-03-09 : 10:23:00
Hi Kirsten,

"will set @ToDateTime including the Current TIME, whereas you may have wanted "whole days" instead." - your correct, how to get the whole days?

And what is the alternative to using SET ROWCOUNT, can you please put some details into this?

Again thank you very much for the reponses and follow ups!

Thanks & Regards,
Hari.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-03-09 : 11:10:01
"how to get the whole days "

SET @ToDateTime = DATEADD(Day, DATEDIFF(Day, 0, GETDATE())-@DayCount, 0)

Note the use of "Day" for first parameter instead of "dd" - less likely to cause errors - do you know if "m" Months, Minutes, Milliseconds or (in SQL 2008) Microseconds? better to use the full parameter name, not an abbreviation.

" And what is the alternative to using SET ROWCOUNT, can you please put some details into this (I couldn't make it out from your link)?"

I have provided a link to fully documented example, if you can't understand the code then I'm afraid I don't have time to explain it further, sorry. Up to you whether you consider (by testing / whatever) that your method is efficient enough, or not.
Go to Top of Page

hari_sql
Starting Member

22 Posts

Posted - 2010-03-09 : 23:22:25
Hi Kristen,

Thank you for the x-tra tip provided :)

Regards,
Hari.
Go to Top of Page
   

- Advertisement -