The import of large data volumes is a common problem for DBAs. Here we discuss importing millions of rows in a SQL Server environment, and hope it helps you.
In practice, you sometimes need to import a large amount of data into a database and then use it for various program calculations. In this experiment, we complete this process in five ways and record in detail the time spent by each method.
The tools used in this experiment are Visual Studio 2008, SQL Server 2000, and SQL Server 2008. In each of the five methods, 1 million rows are imported into SQL Server 2000 and SQL Server 2008 respectively. The test environment is a Dell 2850 server with dual 2.0 GHz CPUs and 2 GB of memory. If you are interested, you can download the source code and verify the times yourself.
This experiment uses the DataProvider module of the Isline Framework to execute SQL statements. The module only reads and encapsulates the SQL configuration and does not affect the final results.
For more information about the framework, see the "Isline Framework" series.
Now, to the question: use basic INSERT statements, the BULK INSERT statement, BULK INSERT in multiple threads, SqlBulkCopy, and SqlBulkCopy in multiple threads to challenge the 4-second limit.
The databases used are SQL Server 2000 and SQL Server 2008. The table name is TableB and the field name is Value1. The database name can be modified in App.config; the default is test.
Figure 1: The five methods in the test
Method 1: Use Basic insert statements
This method is the most basic one, and the first that most people would think of. However, the INSERT statement does not seem well suited to large-volume operations. Is that true?
In this method, the 1 million rows are divided into 10 batches of 100,000 rows each, with one transaction per 100,000 rows; the data is imported into the database in 10 passes.
--> Basic statement:
Insert Into TableB (Value1) Values ('" + i + "');
Note: i in the statement is an accumulator variable in the host program, used to fill in the value of the database field.
SQL Server 2000 elapsed time: 901599 ms
SQL Server 2008 elapsed time: 497638 ms
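The batching logic of Method 1 is language-neutral, even though the original host program is C#. Below is a minimal Python sketch of grouping the generated INSERT statements into transaction-sized batches; the function name and parameters are illustrative, and only the table and column names come from the article:

```python
def build_insert_batches(total_rows, batch_size):
    """Group generated INSERT statements into batches; each inner list
    is intended to be executed inside one transaction."""
    batches = []
    for start in range(0, total_rows, batch_size):
        stmts = ["Insert Into TableB (Value1) Values ('%d')" % i
                 for i in range(start, min(start + batch_size, total_rows))]
        batches.append(stmts)
    return batches
```

For the article's setup this would be called as `build_insert_batches(1_000_000, 100_000)`, yielding 10 batches of 100,000 statements each.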
Method 2: Use the bulk insert statement
This statement produced the most satisfactory results in this experiment: it is easy to use, flexible, and fast.
The BULK INSERT statement does not seem to be widely known. Aicken has heard that Oracle can map external files to temporary Oracle tables and then import the data directly from the temporary tables into other Oracle tables.
Is SQL Server's BULK INSERT equally satisfactory?
--> Basic statement:
BULK INSERT TableB FROM 'c:\sql.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '|', BATCHSIZE = 100000)
Note: 'c:\sql.txt' is a pre-generated file containing the data rows. The rows are separated by the "|" symbol, and every 100,000 rows form one transaction.
SQL Server 2000 elapsed time: 4009 ms
SQL Server 2008 elapsed time: 10722 ms
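The data file that BULK INSERT consumes can be generated with a few lines of code. The original host program is C#; this is a Python sketch under the article's format assumptions (a single-column table, rows terminated by '|'), and the function name is illustrative:

```python
def write_bulk_file(path, values, row_terminator='|'):
    """Write values for a single-column table as a BULK INSERT data file.
    Each row is one value followed by the row terminator, matching
    ROWTERMINATOR='|' in the article's statement."""
    with open(path, 'w') as f:
        for v in values:
            f.write(str(v) + row_terminator)
```

For the article's run, `values` would be the 1 million integers produced by the host program.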
Method 3: Use bulk insert in multiple threads
On the basis of Method 2, the 1 million rows are divided among five threads. Each thread is responsible for 200,000 rows, with one transaction per 50,000 rows, and all five threads start at the same time. Let's look at the effect.
SQL Server 2000 elapsed time: 21099 ms
SQL Server 2008 elapsed time: 10997 ms
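The five-way split can be sketched as a simple range partition. A minimal Python illustration (the names are assumptions, not from the article):

```python
def partition(total_rows, threads):
    """Split [0, total_rows) into one (start, end) half-open range per thread.
    Assumes total_rows divides evenly, as in the article's 1,000,000 / 5 split."""
    per = total_rows // threads
    return [(t * per, (t + 1) * per) for t in range(threads)]
```

With `partition(1_000_000, 5)`, each thread receives 200,000 rows, matching the article's scheme.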
Method 4: Use the sqlbulkcopy class
This method is fast, but memory-bound: for complex data with tens of millions of rows and many fields, memory consumption can be significant. However, a 64-bit solution can mitigate this problem.
Tens of millions of rows with many fields do occur in some business scenarios. For example, to calculate the consumption amounts of global consumers in a business cycle, you must take a snapshot of the member consumption records in the primary database table and
store the snapshot in a temporary table for the computing program to use. In addition, sometimes the consumers' consumption data is not on a single database server but comes from multiple servers in multiple countries.
In that case you must use memory or an external storage device to stage the data, then clean, merge, and validate it, and finally import it into a dedicated table for the computing program to use.
--> Basic statement:
using (System.Data.SqlClient.SqlBulkCopy sqlBC = new System.Data.SqlClient.SqlBulkCopy(conn))
{
    sqlBC.BatchSize = 100000;
    sqlBC.BulkCopyTimeout = 60;
    sqlBC.DestinationTableName = "dbo.TableB";
    sqlBC.ColumnMappings.Add("valueA", "Value1");
    sqlBC.WriteToServer(dt);
}
Note:
BatchSize = 100000 means a transaction is committed every 100,000 rows.
BulkCopyTimeout = 60 means the operation times out after 60 seconds.
DestinationTableName = "dbo.TableB" means the data is imported into TableB.
ColumnMappings.Add("valueA", "Value1") maps the valueA column in memory to the Value1 field in TableB.
WriteToServer(dt) writes the data to the database; dt is a pre-built DataTable that contains the valueA column.
SQL Server 2000 elapsed time: 4989 ms
SQL Server 2008 elapsed time: 10412 ms
Method 5: Use the sqlbulkcopy class in multiple threads
Based on Method 4, the 1 million rows are divided among five threads. Each thread is responsible for 200,000 rows, with one transaction per 50,000 rows, and all five threads start at the same time. Let's look at the effect.
SQL Server 2000 elapsed time: 7682 ms
SQL Server 2008 elapsed time: 10870 ms
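The thread orchestration shared by Methods 3 and 5 — start one worker per chunk, then wait for all of them — can be sketched as follows. The original is C#, and each worker would issue the BULK INSERT or SqlBulkCopy call for its range; the names here are illustrative:

```python
import threading

def run_partitioned(total_rows, thread_count, worker):
    """Start one thread per equal chunk and wait for all to finish,
    mirroring the article's 5-thread split of 1,000,000 rows.
    worker(start, end) handles one half-open row range."""
    per = total_rows // thread_count
    threads = [threading.Thread(target=worker, args=(t * per, (t + 1) * per))
               for t in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Note that the speedup is not guaranteed: as the results above show, multithreading helped on SQL Server 2008 for BULK INSERT but hurt on SQL Server 2000, since the server itself can become the bottleneck.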
Summary
Test Method               | SQL Server 2000 | SQL Server 2008
--------------------------|-----------------|----------------
Basic INSERT              | 901599 ms       | 497638 ms
Single-thread BULK INSERT | 4209 ms         | 10722 ms
Multi-thread BULK INSERT  | 21099 ms        | 10997 ms
Single-thread SqlBulkCopy | 4989 ms         | 10412 ms
Multi-thread SqlBulkCopy  | 7682 ms         | 10870 ms
The above are the results of the experiments over the past few days. Disappointingly, SQL Server 2008 does not seem to perform as well as we expected.
Source Code address: http://files.cnblogs.com/isline/sqltest.rar
Test data: http://files.cnblogs.com/isline/Data.rar