Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Administration (2000)
 BCP performance optimisation

Author  Topic 

Kristen
Test

22859 Posts

Posted - 2004-11-17 : 15:55:47
Are there particular settings that favour BCP?

I attempted to pull 20 million rows, or so, into a table the other day.

It took an hour or so to import. There was no change to MyDatabase MDF/LDF during that time, but I then realised that TEMPDB had grown instead.

Then there was another hour of activity during which neither TEMPDB nor MyDatabase MDF/LDF changed.

Then I got a [duplicate] Primary Key error :-)

MyDatabase was set to SIMPLE recovery model.

Apart from the error<g>, could I have improved things by setting MyDatabase recovery model to "Bulk Logged" - or even "Full"?

I mean, would the data have gone into the database directly, rather than into TEMPDB and then been "copied" to MyDatabase (which has got to be hard work, no?)

Could I do something to cause BCP to commit batches in, say, 10,000 row blocks?

Anything else which speeds up the process?

Thanks

Kristen

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2004-11-17 : 16:03:55
First, BULK INSERT is preferable for import performance.

Second, you could modify the batchsize switch to import in batches rather than all at once. So if you fail on the 40th batch, all the other batches will have been committed. That makes it easier if you need to do it again.
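As a minimal sketch of that batch-size switch (database, table, server, and file names here are hypothetical): bcp takes -b, and the BULK INSERT statement takes a BATCHSIZE option:

```
bcp MyDatabase.dbo.MyTable in MyTable.bcp -n -b 10000 -S MyServer -T

BULK INSERT MyDatabase.dbo.MyTable
FROM 'C:\MyTable.bcp'
WITH (DATAFILETYPE = 'native', BATCHSIZE = 10000)
```

Each completed 10,000-row batch is committed on its own, so a failure part-way through only rolls back the batch in flight.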

If you can get away with SIMPLE recovery model, then it's best to use that.

Tara
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2004-11-17 : 16:16:03
Can you clarify that please Tara:

For best import performance, am I better off with BULK INSERT or SIMPLE? (I don't need any TLogs, but if BULK INSERT is faster I'd be happy to turn them on during the job)

Thanks

Kristen
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2004-11-17 : 16:18:54
quote:
Originally posted by tduggan

First, BULK INSERT is preferable for import performance.



Really?

quote:

Data can also be transferred into a SQL Server table from a data file using the BULK INSERT statement. However, the BULK INSERT statement cannot bulk copy data from an instance of SQL Server to a data file. The BULK INSERT statement allows you to bulk copy data to an instance of SQL Server using the functionality of the bcp utility with a Transact-SQL statement, rather than from the command prompt.

It is also possible to write programs to bulk copy SQL Server data to or from a data file using the bulk copy API. The bulk copy API can be used in ODBC, OLE DB, SQL-DMO, and DB-Library-based applications.



Plus bcp is a minimally logged operation...(unless you tell it to fire triggers)...

There's some good stuff in BOL... don't know why tempdb is blowing out... must be doing read-aheads...

20 million shouldn't be too much...



Brett

8-)
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2004-11-17 : 16:24:03
Still haven't found anything about tempdb

BUT

Does your table have any constraints or indexes?

I would take them all off and rebuild them after.
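A sketch of that drop-and-rebuild approach in SQL Server 2000 syntax (index and column names hypothetical):

```sql
-- drop nonclustered indexes before the load
DROP INDEX MyTable.IX_MyTable_MyDate

-- ...bulk copy the data in here...

-- recreate them afterwards
CREATE NONCLUSTERED INDEX IX_MyTable_MyDate ON MyTable (MyDate)
```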

Also, I'm reading about something that looks pretty cool...

Loading Data in parallel..sweet....



Brett

8-)
Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2004-11-17 : 16:24:51
Not recovery models, but BULK INSERT command instead of bcp.exe.

I'm looking in BOL right now to find that statement where it says BULK INSERT is fastest for loading data from a file into a table. Maybe I saw it on MSDN.

Tara
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2004-11-17 : 16:27:49
OK...I'm confused

quote:

Logged and Minimally Logged Bulk Copy Operations
When using the full recovery model, all row-insert operations performed by bcp are logged in the transaction log. For large data loads, this can cause the transaction log to fill rapidly. To help prevent the transaction log from running out of space, a minimally logged bulk copy can be performed if all of these conditions are met:

The recovery model is simple or bulk-logged.


The target table is not being replicated.


The target table does not have any triggers.


The target table has either 0 rows or no indexes.


The TABLOCK hint is specified. For more information, see Controlling the Locking Behavior.
Any bulk copy into an instance of Microsoft® SQL Server™ that does not meet these conditions is logged.

Before doing bulk copy operations, it is recommended that you set the recovery model to bulk-logged if you usually use full recovery. This will prevent the bulk copy operations from using excessive log space and possibly filling the log. However, even with bulk-logged recovery, some transaction log space will be used. You may want to create transaction log backups during the bulk copy operation to free up transaction log space.

When bulk copying a large number of rows into a table with indexes, it can be faster to drop all the indexes, perform the bulk copy, and re-create the indexes. For more information, see Optimizing Bulk Copy Performance.



Note Although data insertions are not logged in the transaction log when a minimally logged bulk copy is performed, SQL Server still logs extent allocations each time a new extent is allocated to the table.
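Following the quoted recommendation, switching the recovery model just for the duration of the load (database name hypothetical) would look something like:

```sql
ALTER DATABASE MyDatabase SET RECOVERY BULK_LOGGED
-- ...perform the bulk copy (meeting the conditions quoted above, e.g. TABLOCK)...
ALTER DATABASE MyDatabase SET RECOVERY FULL
```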





Brett

8-)
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2004-11-17 : 16:29:01
quote:
Originally posted by tduggan

Not recovery models, but BULK INSERT command instead of bcp.exe.

I'm looking in BOL right now to find that statement where it says BULK INSERT is fastest for loading data from a file into a table. Maybe I saw it on MSDN.

Tara



Doesn't it say that they use the same APIs?



Brett

8-)
Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2004-11-17 : 16:29:11
I can't find it anywhere. All I can find is that BULK INSERT uses the same functionality as bcp.exe except the advantage of BULK INSERT is that you can do it from T-SQL. Now where did I hear about BULK INSERT being best for imports? Can't remember.

Tara
Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2004-11-17 : 16:30:26
What part from your quote are you confused about?

Tara
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2004-11-17 : 16:53:43
I clearly shouldn't have left this thread unattended for 15 minutes ...

Kristen
Go to Top of Page

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2004-11-17 : 17:56:26
AFAIK, BCP uses ODBC transport while BULK INSERT uses TDS (Tabular Data Stream). TDS is faster than ODBC...


DavidM

"Always pre-heat the oven"
Go to Top of Page

MichaelP
Jedi Yak

2489 Posts

Posted - 2004-11-17 : 18:46:17
I'm with David M on this one. I remember reading the same thing.

Michael

<Yoda>Use the Search page you must. Find the answer you will.</Yoda>
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2004-11-17 : 18:54:45
From BOL:
quote:
The bcp command prompt utility copies Microsoft® SQL Server™ data to or from a data file. It is used most frequently to transfer large volumes of data into a SQL Server table from another program, usually another database management system (DBMS). The data is first exported from the source program to a data file, and then imported from the data file into a SQL Server table using bcp. Alternatively, bcp can be used to transfer data from a SQL Server table to a data file for use in other programs. For example, the data can be copied from an instance of SQL Server into a data file. From there, another program can import the data.

Note The bcp utility is written using the ODBC bulk copy application programming interface (API). Earlier versions of the bcp utility were written using the DB-Library bulk copy API.

Data can also be transferred into a SQL Server table from a data file using the BULK INSERT statement. However, the BULK INSERT statement cannot bulk copy data from an instance of SQL Server to a data file. The BULK INSERT statement allows you to bulk copy data to an instance of SQL Server using the functionality of the bcp utility with a Transact-SQL statement, rather than from the command prompt.

It is also possible to write programs to bulk copy SQL Server data to or from a data file using the bulk copy API. The bulk copy API can be used in ODBC, OLE DB, SQL-DMO, and DB-Library-based applications.
I vaguely remember reading that BULK INSERT uses OLE-DB, but I'm probably wrong. TDS is the fastest data access method for SQL Server. SQL Server 2005 will introduce the SQL Native Client (which uses TDS I believe) and will essentially replace OLE DB and ODBC for SQL Server.

You should check your database settings; if "select into/bulk copy" is not on, it may be the reason tempdb is used. Setting the recovery model alone may not be enough. I don't know for certain, but I'd suggest turning that setting on; it won't hurt anything anyway.
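In SQL Server 2000 that setting can be checked and flipped directly with sp_dboption (database name hypothetical):

```sql
EXEC sp_dboption 'MyArchiveDB', 'select into/bulkcopy', 'true'
```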

Additionally, if you have non-clustered indexes, you should drop them, perform the bcp/BULK INSERT, then add them back. This could also be why tempdb is growing; it's rebuilding indexes. Also use the TABLOCK and ORDER hints for bcp/BULK INSERT. You CAN drop the clustered index too, you might get even more speed, but it renders the ORDER hint meaningless. If you do drop the clustered index, make sure you recreate it first before you recreate the non-clustered indexes. All of these hints are described in Books Online under "bcp" and "BULK INSERT".
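As a sketch of those hints in both forms (database, server, file, and key-column names hypothetical; the ORDER clause should match the clustered index key):

```
bcp MyDB.dbo.MyTable in MyTable.bcp -n -S MyServer -T -h "TABLOCK, ORDER (MyKeyCol ASC)"

BULK INSERT MyDB.dbo.MyTable
FROM 'C:\MyTable.bcp'
WITH (DATAFILETYPE = 'native', TABLOCK, ORDER (MyKeyCol ASC))
```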

If you want an idea of potential speed, I have imported 55 million rows of data (1.8 GB) into a table in 7 minutes using bcp, using the TABLOCK and ORDER hints. Tempdb did not grow, nor did the log. I haven't tried BULK INSERT yet, but I don't really need to. I only keep the clustered index; building the 2-3 nonclustered indexes afterwards takes about 5-7 more minutes. This is using a native format file, a text file might take longer.

Don't forget that bcp also exports data, and it's just as fast or faster than importing.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2004-11-18 : 00:57:25
7 minutes?

My data was basically stuff I wanted to insert into an Archive database, taken from a Production database - exported with BCP using Native format.

Off to investigate ... thanks for the info Rob

Kristen
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2004-11-18 : 01:37:27
The import table only has a [CLUSTERED] PK - an INT which has the IDENTITY property - perhaps that's the problem?

Here's my Export:

bcp MyProdDB.dbo.MyTable out MyTable.bcp -e MyTable-x.out -n -S MyServer -E -T

and the Import:

bcp MyArchiveDB.dbo.MyTable in MyTable.bcp -e MyTable-i.out -n -S MyServer -E -T

Database attributes (I haven't changed to Bulk Logged yet)

Attribute Value
----------------------------- -----------------------------------
Database MyArchiveDB
x1 N ("bits" from master..sysdatabases.status)
x2 N
Bulkcopy N
TruncLog Y
NoChkpt Y
x32 N
x64 N
x128 N
x256 N
x512 N
ReadOnly N
DBOOnly N
SingleUse N
x8192 N
x16384 N
x32768 N
x65536 N

IsAnsiNullDefault
IsAnsiNullsEnabled
IsAnsiPaddingEnabled
IsAnsiWarningsEnabled
IsArithmeticAbortEnabled
IsAutoClose
IsAutoCreateStatistics Yes
IsAutoShrink
IsAutoUpdateStatistics Yes
IsCloseCursorsOnCommitEnabled
IsFulltextEnabled
IsInStandBy
IsLocalCursorsDefault
IsMergePublished
IsNullConcat
IsNumericRoundAbortEnabled
IsQuotedIdentifiersEnabled
IsRecursiveTriggersEnabled
IsSubscribed
IsTornPageDetectionEnabled Yes
Collation SQL_Latin1_General_CP1_CI_AS
Recovery SIMPLE
SQLSortOrder 52
Status ONLINE
Updateability READ_WRITE
UserAccess MULTI_USER
Version 539

Kristen
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2004-11-18 : 02:00:37
OK, so I should change my IMPORT to be

bcp MyArchiveDB.dbo.MyTable in MyTable.bcp -e MyTable-i.out -n -S MyServer -E -T -h "ORDER (MyPKColumn ASC), TABLOCK"

or preferably use BULK INSERT instead

How do I sort the EXPORT? Will BCP automatically use the PK, or do I have to do a query
"SELECT * FROM MyTable ORDER BY MyPKColumn" ?

If so, does the query slow the export significantly?

Kristen
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2004-11-18 : 08:17:52
Yes, use queryout and no, it doesn't seem to affect the export speed. I haven't seen a significant difference myself, unless I was ordering on a non-indexed column. It will export in clustered index order, but if you want to be absolutely sure then add the ORDER BY, it probably won't make any difference.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2004-11-18 : 15:00:00
Darn thing is broken :-(

Here's my command:

bcp "SELECT * FROM MyProdDB.dbo.MyTable WHERE MyDate >= '20041028 00:00:00.000' AND MyDate < '20041029 00:00:00.000' ORDER BY MyPK" queryout MyTable.bcp -e MyTable.out -n -S MyServer -T

The query produces about 1,000,000 rows

It shows the row-progress messages (every 10,000 rows, or maybe 1,000, I forget exactly), gets to about the end, and then hangs.

I reduced the date range - if I do from midnight to 7AM it's fine; go to 8AM (still much MUCH less than 1,000,000 rows) and it hangs - it looks like right after the last progress message, but before the "This took nnnn seconds; nnnn rows output ..." message.

If I run it for the whole day the Progress Messages go to much higher numbers (i.e. about 1,000,000) - and then hang.

Any ideas?

<Frustrated>

Kristen
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2004-11-18 : 15:22:22
I think you should run DBCC CHECKTABLE or CHECKDB, you might have errors that need to be corrected.
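For example (run from the context of the suspect database; names hypothetical):

```sql
USE MyProdDB
DBCC CHECKTABLE ('MyTable')
-- or check the whole database:
DBCC CHECKDB ('MyProdDB')
```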
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2004-11-18 : 15:22:47
Is anything in the error file?



Brett

8-)
Go to Top of Page