C # It takes only 4 seconds to import 1 million pieces of data into the SQL Server database (with source code) 2010-10-13 from: cnblogs font size: [large, medium, small]
- Abstract: This article introduces four methods for importing millions of data into the SQL Server database in C # and compares their efficiency. sqlbulkcopy can efficiently import data, test code and data are provided for download.
In practice, you sometimes need to import a large amount of data into the database and then use it for various program calculations. In this experiment, we will use step 5 to complete this process and record the time spent by various methods in detail.
The tools used in this experiment are vs2008, SQL Server 2000, and SQL Server 2008. The methods in step 5 are respectively used to import 1 million pieces of data into SQL 2000 and SQL 2008, the experiment environment is a Dell 2850 dual 2.0 gcpu, 2 GB memory server. If you are interested, you can download the source code and verify the time used.
Note that the dataprovider module in the isline framework is used for SQL statement execution in this experiment. This module only reads and encapsulates the SQL configuration, this does not affect the final result. For details about the isline framework, see the "isline framework" framework series.
Next, let's go to the question: Use Basic insert statements, bulk insert statements, bulk insert in multiple threads, sqlbulkcopy, and sqlbulkcopy in multiple threads, challenge the 4-second limit.
For databases, SQL 2000 and SQL 2008 are used. The table name is tableb and the field name is value1. You can modify the database name in APP. config. The default value is test.
Figure 1 five methods in the test
Method 1.Use BasicInsertStatement
This method is the most basic method that most people will think of at first. However, the insert statement does not seem suitable for a large number of operations. Is that true?
In this method, 1 million data is divided into 10 batches, with 0.1 million data records in each batch and 1 transaction in each 0.1 million data records. The data is imported into the database 10 times.
Basic statement:Insert into tableb (value1) values ('"+ I + "');
Note: The I in the statement is an accumulation variable in the Host Program and is used to fill the value in the database field.
SQL 2000 elapsed time: 901599
SQL 2008 elapsed time: 497638
Method 2.UseBULK INSERTStatement
The effect of this class is the most satisfactory in this experiment. It is easy to use, flexible, and fast.
The "bulk insert" statement does not seem very commonly used. aicken heard that Oracle can map external files to temporary Oracle tables, and then directly import data from temporary tables to other Oracle tables, the speed of this method is very satisfactory. Is the bulk insert of SQL Server equally satisfactory?
Basic statement:Bulk insert tableb from 'C: \ SQL .txt 'with (fieldterminator =', ', rowter/., mbminator =' | ', batchsize = 100000)
Note: "C: \ SQL .txt" is a pre-generated file containing 100 pieces of data. The data is separated by the "|" symbol, and each 0.1 million pieces of data is a transaction.
SQL 2000 elapsed time: 4009
SQL 2008 elapsed time: 10722
Method 3.Use in multiple threadsBULK INSERT
On the basis of method 2, divide 1 million pieces of data into five threads. Each thread is responsible for 0.2 million pieces of data, each 50 thousand pieces of data, and five threads are started at the same time. Let's take a look at this effect.
SQL 2000 elapsed time: 21099
SQL 2008 elapsed time: 10997
Method 4.UseSqlbulkcopyClass
This method is fast, but it depends on the memory. For complex data with tens of millions of records and multiple fields, the memory may be greatly consumed, however, you can use a 64-bit solution to solve this problem.
Tens of millions of data records and multiple fields are generally encountered in some business scenarios. For example, when calculating the consumption amount of a global consumer in a certain business cycle, first, you need to obtain a snapshot of the member consumption record in the primary database table, and store the snapshot in the temporary table for the computing program to use the data. In addition, sometimes the consumer's consumption data is not in a database server, but from multiple servers in multiple countries. In this way, we must use memory or external storage devices to transfer the data, then clean, merge, check, and finally import the dedicated table for use by the computing program.
Basic statement:
using (System.Data.SqlClient.SqlBulkCopy sqlBC = new System.Data.SqlClient.SqlBulkCopy(conn)) { sqlBC.BatchSize = 100000; sqlBC.BulkCopyTimeout = 60; sqlBC.DestinationTableName = "dbo.TableB"; sqlBC.ColumnMappings.Add("valueA", "Value1"); sqlBC.WriteToServer(dt); }
Note:
Batchsize = 100000; indicates that each 0.1 million transactions are committed.
Bulkcopytimeout = 60; indicates Processing Based on timeout in 60 seconds
Destinationtablename = "DBO. tableb"; indicates that data is imported into Table B.
Columnmappings. Add ("valuea", "value1"); indicates matching the valuea field in memory with the value1 field in tableb
Writetoserver (DT); write data to the database. DT is a pre-built able, which contains the valuea field.
SQL 2000 elapsed time: 4989
SQL 2008 elapsed time: 10412
Method 5.Use the sqlbulkcopy class in multiple threads
Based on Method 4, 1 million pieces of data are divided into five threads. Each thread is responsible for 0.2 million pieces of data, each 50 thousand pieces of data, and five threads are started at the same time. Let's take a look at this effect.
SQL 2000 elapsed time: 7682
SQL 2008 elapsed time: 10870
Summary
Database Test Method |
Sqls 2000 |
Sqls 2008 |
Basic insert |
901599 |
497638 |
Single-thread BULK INSERT |
4209 |
10722 |
Multi-thread BULK INSERT |
21099 |
10997 |
Single-thread sqlbulkcopy |
4989 |
10412 |
Multi-thread sqlbulkcopy |
7682 |
10870 |
The above are the results of the experiments over the past few days. What is disappointing is that SQL Server 2008 does not seem as good as we think.
In addition, if you download the source code, Let me see why the multi-thread performance is not as good as that of a single thread? Is it because of improper use of static resources, or why?
Source Code address: http://files.cnblogs.com/isline/sqltest.rar
Test data: http://files.cnblogs.com/isline/Data.rar