Author | Topic
Kristen
Test
22859 Posts |
Posted - 2004-11-17 : 15:55:47
Are there particular settings that favour BCP?

I attempted to pull 20 million rows, or so, into a table the other day. It took an hour or so to import. There was no change to MyDatabase MDF/LDF during that time, but I then realised that TEMPDB had grown instead. Then there was another hour of activity during which neither TEMPDB nor MyDatabase MDF/LDF changed. Then I got a [duplicate] Primary Key error :-)

MyDatabase was set to SIMPLE recovery model.

Apart from the error <g>, could I have improved things by setting MyDatabase recovery model to "Bulk Logged" - or even "Full"? I mean, would the data have gone into the database direct? - rather than into TEMPDB and then "Copy" to MyDatabase (which has got to be hard work, no?)

Could I do something to cause BCP to commit batches in, say, 10,000 row blocks?

Anything else which speeds up the process?

Thanks

Kristen
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2004-11-17 : 16:03:55
First, BULK INSERT is preferable for imports for performance.

Second, you could modify the batchsize switch to import in batches rather than all at once. So if you fail on the 40th batch, all the other batches will have been committed. That makes it easier if you need to do it again.

If you can get away with SIMPLE recovery model, then it's best to use that.

Tara
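Tara's batchsize suggestion could look something like this (a sketch only: the table, file, and path names are made up, and -b is the bcp.exe equivalent of BULK INSERT's BATCHSIZE option):

```sql
-- Hypothetical sketch: commit the load in 10,000-row batches so that
-- a failure part-way through does not roll back the whole import.
BULK INSERT MyArchiveDB.dbo.MyTable
FROM 'C:\MyTable.bcp'
WITH (DATAFILETYPE = 'native', BATCHSIZE = 10000);

-- The bcp.exe equivalent is the -b switch, e.g.:
--   bcp MyArchiveDB.dbo.MyTable in MyTable.bcp -n -b 10000 -S MyServer -T
```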
Kristen
Test
22859 Posts |
Posted - 2004-11-17 : 16:16:03
Can you clarify that please Tara: for best import performance am I better off with BULK INSERT or SIMPLE? (I don't need any TLogs, but if BULK INSERT is faster I'd be happy to turn them on during the job)

Thanks

Kristen
X002548
Not Just a Number
15586 Posts |
Posted - 2004-11-17 : 16:18:54
quote: Originally posted by tduggan

First, BULK INSERT is preferable for imports for performance.

Really?

quote: Data can also be transferred into a SQL Server table from a data file using the BULK INSERT statement. However, the BULK INSERT statement cannot bulk copy data from an instance of SQL Server to a data file. The BULK INSERT statement allows you to bulk copy data to an instance of SQL Server using the functionality of the bcp utility with a Transact-SQL statement, rather than from the command prompt.

It is also possible to write programs to bulk copy SQL Server data to or from a data file using the bulk copy API. The bulk copy API can be used in ODBC, OLE DB, SQL-DMO, and DB-Library-based applications.

Plus bcp is a minimally logged operation... (unless you tell it to fire triggers)... There's some good stuff in BOL... don't know why tempdb is blowing out... must be doing read aheads... 20 million shouldn't be too much...

Brett
8-)
X002548
Not Just a Number
15586 Posts |
Posted - 2004-11-17 : 16:24:03
Still haven't found anything about tempdb

BUT

Does your table have any constraints or indexes? I would take them all off and rebuild them after.

Also, I'm reading about something that looks pretty cool... loading data in parallel... sweet...

Brett
8-)
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2004-11-17 : 16:24:51
Not recovery models, but the BULK INSERT command instead of bcp.exe. I'm looking in BOL right now to find that statement where it says BULK INSERT is fastest for loading data from a file into a table. Maybe I saw it on MSDN.

Tara
X002548
Not Just a Number
15586 Posts |
Posted - 2004-11-17 : 16:27:49
OK... I'm confused

quote: Logged and Minimally Logged Bulk Copy Operations

When using the full recovery model, all row-insert operations performed by bcp are logged in the transaction log. For large data loads, this can cause the transaction log to fill rapidly. To help prevent the transaction log from running out of space, a minimally logged bulk copy can be performed if all of these conditions are met:

- The recovery model is simple or bulk-logged.
- The target table is not being replicated.
- The target table does not have any triggers.
- The target table has either 0 rows or no indexes.
- The TABLOCK hint is specified. For more information, see Controlling the Locking Behavior.

Any bulk copy into an instance of Microsoft® SQL Server™ that does not meet these conditions is logged.

Before doing bulk copy operations, it is recommended that you set the recovery model to bulk-logged if you usually use full recovery. This will prevent the bulk copy operations from using excessive log space and possibly filling the log. However, even with bulk-logged recovery, some transaction log space will be used. You may want to create transaction log backups during the bulk copy operation to free up transaction log space.

When bulk copying a large number of rows into a table with indexes, it can be faster to drop all the indexes, perform the bulk copy, and re-create the indexes. For more information, see Optimizing Bulk Copy Performance.

Note: Although data insertions are not logged in the transaction log when a minimally logged bulk copy is performed, SQL Server still logs extent allocations each time a new extent is allocated to the table.

Brett
8-)
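The conditions in that BOL quote could be put together along these lines (a sketch, with assumed database and file names; the target table must also have no triggers, not be replicated, and have either no rows or no indexes):

```sql
-- Switch to bulk-logged for the duration of the load (minimal logging),
-- take a table lock, then switch back. Names are hypothetical.
ALTER DATABASE MyArchiveDB SET RECOVERY BULK_LOGGED;

BULK INSERT MyArchiveDB.dbo.MyTable
FROM 'C:\MyTable.bcp'
WITH (DATAFILETYPE = 'native', TABLOCK);

ALTER DATABASE MyArchiveDB SET RECOVERY SIMPLE;  -- or FULL, then back up the log
```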
X002548
Not Just a Number
15586 Posts |
Posted - 2004-11-17 : 16:29:01
quote: Originally posted by tduggan

Not recovery models, but BULK INSERT command instead of bcp.exe. I'm looking in BOL right now to find that statement where it says BULK INSERT is fastest for loading data from a file into a table. Maybe I saw it on MSDN.

Tara

Doesn't it say that they use the same APIs?

Brett
8-)
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2004-11-17 : 16:29:11
I can't find it anywhere. All I can find is that BULK INSERT uses the same functionality as bcp.exe, except the advantage of BULK INSERT is that you can do it from T-SQL. Now where did I hear about BULK INSERT being best for imports? Can't remember.

Tara
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2004-11-17 : 16:30:26
What part from your quote are you confused about?

Tara
Kristen
Test
22859 Posts |
Posted - 2004-11-17 : 16:53:43
I clearly shouldn't have left this thread unattended for 15 minutes ...

Kristen
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2004-11-17 : 17:56:26
AFAIK, BCP uses ODBC transport while BULK INSERT uses TDS (Tabular Data Stream). TDS is faster than ODBC...

DavidM
"Always pre-heat the oven"
MichaelP
Jedi Yak
2489 Posts |
Posted - 2004-11-17 : 18:46:17
I'm with David M on this one. I remember reading the same thing.

Michael

<Yoda>Use the Search page you must. Find the answer you will.</Yoda>
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2004-11-17 : 18:54:45
From BOL:

quote: The bcp command prompt utility copies Microsoft® SQL Server™ data to or from a data file. It is used most frequently to transfer large volumes of data into a SQL Server table from another program, usually another database management system (DBMS). The data is first exported from the source program to a data file, and then imported from the data file into a SQL Server table using bcp. Alternatively, bcp can be used to transfer data from a SQL Server table to a data file for use in other programs. For example, the data can be copied from an instance of SQL Server into a data file. From there, another program can import the data.

Note: The bcp utility is written using the ODBC bulk copy application programming interface (API). Earlier versions of the bcp utility were written using the DB-Library bulk copy API.

Data can also be transferred into a SQL Server table from a data file using the BULK INSERT statement. However, the BULK INSERT statement cannot bulk copy data from an instance of SQL Server to a data file. The BULK INSERT statement allows you to bulk copy data to an instance of SQL Server using the functionality of the bcp utility with a Transact-SQL statement, rather than from the command prompt.

It is also possible to write programs to bulk copy SQL Server data to or from a data file using the bulk copy API. The bulk copy API can be used in ODBC, OLE DB, SQL-DMO, and DB-Library-based applications.

I vaguely remember reading that BULK INSERT uses OLE-DB, but I'm probably wrong. TDS is the fastest data access method for SQL Server. SQL Server 2005 will introduce the SQL Native Client (which uses TDS I believe) and will essentially replace OLE DB and ODBC for SQL Server.

You should check your database settings: if "select into/bulk copy" is not on, it may be the reason tempdb is used. Setting the recovery model alone may not be enough. I don't know for certain, but I'd suggest turning that setting on, it won't hurt anything anyway.

Additionally, if you have non-clustered indexes, you should drop them, perform the bcp/BULK INSERT, then add them back. This could also be why tempdb is growing; it's rebuilding indexes. Also use the TABLOCK and ORDER hints for bcp/BULK INSERT. You CAN drop the clustered index too, you might get even more speed, but it renders the ORDER hint meaningless. If you do drop the clustered index, make sure you recreate it first before you recreate the non-clustered indexes. All of these hints are described in Books Online under "bcp" and "BULK INSERT".

If you want an idea of potential speed, I have imported 55 million rows of data (1.8 GB) into a table in 7 minutes using bcp, using the TABLOCK and ORDER hints. Tempdb did not grow, nor did the log. I haven't tried BULK INSERT yet, but I don't really need to. I only keep the clustered index; building the 2-3 nonclustered indexes afterwards takes about 5-7 more minutes. This is using a native format file, a text file might take longer.

Don't forget that bcp also exports data, and it's just as fast or faster than importing.
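Rob's drop-load-recreate sequence might be scripted along these lines (a sketch only: the index and column names are invented for illustration, and the ORDER hint assumes the data file is sorted on the clustered key):

```sql
-- Drop non-clustered indexes before the load (hypothetical names).
DROP INDEX MyTable.IX_MyTable_MyDate;

BULK INSERT MyArchiveDB.dbo.MyTable
FROM 'C:\MyTable.bcp'
WITH (DATAFILETYPE = 'native',
      TABLOCK,                       -- needed for minimal logging
      ORDER (MyPKColumn ASC));       -- file is sorted on the clustered key

-- Rebuild the non-clustered indexes afterwards.
CREATE INDEX IX_MyTable_MyDate ON MyTable (MyDate);
```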
Kristen
Test
22859 Posts |
Posted - 2004-11-18 : 00:57:25
7 minutes? My data was basically stuff I wanted to insert into an Archive database, taken from a Production database - exported with BCP using Native format.

Off to investigate ... thanks for the info Rob

Kristen
Kristen
Test
22859 Posts |
Posted - 2004-11-18 : 01:37:27
The import table only has a [CLUSTERED] PK - an INT which has the IDENTITY property - perhaps that's the problem?

Here's my Export:

bcp MyProdDB.dbo.MyTable out MyTable.bcp -e MyTable-x.out -n -S MyServer -E -T

and the Import:

bcp MyArchiveDB.dbo.MyTable in MyTable.bcp -e MyTable-i.out -n -S MyServer -E -T

Database attributes (I haven't changed to Bulk Logged yet):

Attribute                      Value
-----------------------------  -----------------------------------
Database                       MyArchiveDB
x1                             N   ("bits" from master..sysdatabases.status)
x2                             N
Bulkcopy                       N
TruncLog                       Y
NoChkpt                        Y
x32                            N
x64                            N
x128                           N
x256                           N
x512                           N
ReadOnly                       N
DBOOnly                        N
SingleUse                      N
x8192                          N
x16384                         N
x32768                         N
x65536                         N
IsAnsiNullDefault
IsAnsiNullsEnabled
IsAnsiPaddingEnabled
IsAnsiWarningsEnabled
IsArithmeticAbortEnabled
IsAutoClose
IsAutoCreateStatistics         Yes
IsAutoShrink
IsAutoUpdateStatistics         Yes
IsCloseCursorsOnCommitEnabled
IsFulltextEnabled
IsInStandBy
IsLocalCursorsDefault
IsMergePublished
IsNullConcat
IsNumericRoundAbortEnabled
IsQuotedIdentifiersEnabled
IsRecursiveTriggersEnabled
IsSubscribed
IsTornPageDetectionEnabled     Yes
Collation                      SQL_Latin1_General_CP1_CI_AS
Recovery                       SIMPLE
SQLSortOrder                   52
Status                         ONLINE
Updateability                  READ_WRITE
UserAccess                     MULTI_USER
Version                        539

Kristen
Kristen
Test
22859 Posts |
Posted - 2004-11-18 : 02:00:37
OK, so I should change my IMPORT to be

bcp MyArchiveDB.dbo.MyTable in MyTable.bcp -e MyTable-i.out -n -S MyServer -E -T -h "ORDER (MyPKColumn ASC), TABLOCK"

or preferably use BULK INSERT instead.

How do I sort the EXPORT? Will BCP automatically use the PK, or do I have to do a query "SELECT * FROM MyTable ORDER BY MyPKColumn"? If so, does the query slow the export significantly?

Kristen
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2004-11-18 : 08:17:52
Yes, use queryout and no, it doesn't seem to affect the export speed. I haven't seen a significant difference myself, unless I was ordering on a non-indexed column. It will export in clustered index order, but if you want to be absolutely sure then add the ORDER BY, it probably won't make any difference.
Kristen
Test
22859 Posts |
Posted - 2004-11-18 : 15:00:00
Darn thing is broken :-(

Here's my command:

bcp "SELECT * FROM MyProdDB.dbo.MyTable WHERE MyDate >= '20041028 00:00:00.000' AND MyDate < '20041029 00:00:00.000' ORDER BY MyPK" queryout MyTable.bcp -e MyTable.out -n -S MyServer -T

The query produces about 1,000,000 rows. It shows the rows Progress Messages (10,000 per batch, or maybe 1,000, I forget exactly). Gets to about the end, and then hangs.

I reduced the date range - if I do from midnight to 7AM it's fine, go to 8AM (much MUCH less than 1,000,000 rows) and it hangs - looks like right after the last Progress Message, but before the "This took nnnn seconds; nnnn rows output ..." message.

If I run it for the whole day the Progress Messages go to much higher numbers (i.e. about 1,000,000) - and then hang.

Any ideas?

<Frustrated>

Kristen
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2004-11-18 : 15:22:22
I think you should run DBCC CHECKTABLE or CHECKDB, you might have errors that need to be corrected.
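For example (database and table names assumed from the earlier posts):

```sql
-- Check the exported table, or the whole source database, for corruption.
USE MyProdDB;
DBCC CHECKTABLE ('dbo.MyTable');   -- just the one table
DBCC CHECKDB ('MyProdDB');         -- or everything in the database
```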
X002548
Not Just a Number
15586 Posts |
Posted - 2004-11-18 : 15:22:47
Is anything in the error file?

Brett
8-)