MSSQL batch insert Data Optimization details, mssql insert

Source: Internet
Author: User

MSSQL batch insert Data Optimization details, mssql insert

Requirement

One requirement is to insert 10 million pieces of data into the MSSQL database. The table structure is as follows. How long does it take you to insert 10 million pieces of data into the following MSSQL table?

Or how does your batch data be inserted? I will discuss this issue today.

View the data through the http interface of mvc for stress testing

First, let's just make a reference here to understand the performance of the database to be inserted, and there is no half-cent relationship with the requirements at the beginning.

The mvc interface code is as follows:

public bool Add(CustomerFeedbackEntity m)    {      using (var conn=Connection)      {        string sql = @"INSERT INTO [dbo].[CustomerFeedback]                      ([BusType]                      ,[CustomerPhone]                      ,[BackType]                      ,[Content]                     )                   VALUES                      (@BusType                      ,@CustomerPhone                      ,@BackType                      ,@Content                      )";        return conn.Execute(sql, m) > 0;      }    }

This mvc interface for stress testing inserts a single piece of data into the aggregated data graph of the database.

Use Cases: 5000 requests are divided into 500 threads to execute the post request interface.

This figure shows that the slowest request takes only 4 milliseconds. So let's make an algorithm.

As shown in the requirements at the beginning, we use the minimum response time for calculation.

It takes 100000*4 ms to insert 6.67 data records to the database, which is about minutes. So we are running on this goal to make an insert scheme.

The most common insert method

First, our engineers have written a code segment as follows:

// Number of executed data entries int cnt = 10*10000; // CustomerFeedbackEntity m = new CustomerFeedbackEntity () {BusType = 1, CustomerPhone = "1888888888", BackType = 1, content = "123123dagvhkfhsdjk will surely be the prosperous and prosperous security classification of the coquetry. The high-rise housing prices will be openly available"}; // The first public void FristWay () {using (var conn = new SqlConnection (ConnStr) {conn. open (); Stopwatch sw = new Stopwatch (); sw. start (); StringBuilder sb = new StringBuilder (); Console. W RiteLine ("from:" + DateTime. now. toString ("yyyy-MM-dd HH: mm: ss fff") + "Start cyclic execution:" + cnt + "SQL statements... "); for (int I = 0; I <= cnt; I ++) {sb. clear (); sb. append (@ "insert into [dbo]. [CustomerFeedback] ([BusType], [CustomerPhone], [BackType], [Content]) VALUES ("); sb. append (m. busType); sb. append (", '"); sb. append (m. customerPhone); sb. append ("',"); sb. append (m. backType); sb. append (", '"); sb. append (m. conten T); sb. append ("')"); using (SqlCommand cmd = new SqlCommand (sb. toString (), conn) {cmd. commandTimeout = 0; cmd. executeNonQuery () ;}} Console. writeLine (DateTime. now. when ToString ("yyyy-MM-dd HH: mm: ss fff") + ", run the" + cnt + "SQL statement! Time consumed: "+ sw. ElapsedMilliseconds +" milliseconds. ");}}

The execution result is as follows:

10 million data records, 693906 milliseconds, 11 minutes. Is it okay or acceptable. Dear friends, I don't speak and continue writing. You can see where the MSSQL database works with. Net to stop insertion?

Comments:

1. Non-stop creation and release of sqlcommon objects will result in a waste of performance.

2. Non-stop connection to databases results in high performance loss.

The execution results at these two points tell us that this method is not desirable, even if this is our most common data insertion method.

We will optimize the above two points. 1. Create a sqlcommon object and establish a connection with the database only once. The optimization code is as follows:

Public void SecondWay () {using (var conn = new SqlConnection (ConnStr) {conn. open (); Stopwatch sw = new Stopwatch (); sw. start (); StringBuilder sb = new StringBuilder (); Console. writeLine ("from:" + DateTime. now. toString ("yyyy-MM-dd HH: mm: ss fff") + "Start loop splicing:" + cnt + "SQL statement... "); for (int I = 0; I <= cnt; I ++) {sb. append (@ "insert into [dbo]. [CustomerFeedback] ([BusType], [CustomerPhone], [BackType], [Content]) VALUES ("); sb. append (m. busType); sb. append (", '"); sb. append (m. customerPhone); sb. append ("',"); sb. append (m. backType); sb. append (", '"); sb. append (m. content); sb. append ("')");} var result = sw. elapsedMilliseconds; Console. writeLine (DateTime. now. when ToString ("yyyy-MM-dd HH: mm: ss fff") + ", the SQL statement" + cnt + "is completed! Time consumed: "+ result +" milliseconds. "); Using (SqlCommand cmd = new SqlCommand (sb. toString (), conn) {cmd. commandTimeout = 0; Stopwatch sw1 = new Stopwatch (); sw1.Start (); Console. writeLine ("from:" + DateTime. now. toString ("yyyy-MM-dd HH: mm: ss fff") + "start execution:" + cnt + "SQL statement... "); cmd. executeNonQuery (); Console. writeLine (DateTime. now. when ToString ("yyyy-MM-dd HH: mm: ss fff") + ", run the" + cnt + "SQL statement! Time consumed: "+ sw1.ElapsedMilliseconds +" milliseconds. ");}}}

The execution result is as follows:

Ah, it's strange. Why is there no big difference with a solution?

First, let's take a look at how these SQL statements are executed in the database.

1. view database connection information

Select * from sysprocesses where dbid in (select dbid from sysdatabases where name = 'dbname') -- or SELECT * FROM [Master]. [dbo]. [SYSPROCESSES] WHERE [DBID] IN (SELECT [DBID] FROM [Master]. [dbo]. [SYSDATABASES] where name = 'dbname ')

2. view the SQL statements being executed by the database

SELECT [Spid] = session_id ,      ecid ,      [Database] = DB_NAME(sp.dbid) ,      [User] = nt_username ,      [Status] = er.status ,      [Wait] = wait_type ,      [Individual Query] = SUBSTRING(qt.text,                      er.statement_start_offset / 2,                      ( CASE WHEN er.statement_end_offset = -1                         THEN LEN(CONVERT(NVARCHAR(MAX), qt.text))                            * 2                         ELSE er.statement_end_offset                       END - er.statement_start_offset )                      / 2) ,      [Parent Query] = qt.text ,      Program = program_name ,      hostname ,      nt_domain ,      start_time  FROM  sys.dm_exec_requests er      INNER JOIN sys.sysprocesses sp ON er.session_id = sp.spid      CROSS APPLY sys.dm_exec_sql_text(er.sql_handle) AS qt  WHERE  session_id > 50 -- Ignore system spids.      AND session_id NOT IN ( @@SPID ) -- Ignore this current statement.ORDER BY  1 ,

Comments: although it seems to have been optimized, the implementation process is almost the same as that of the previous solution, so you don't have to say much.

Optimize User-Defined table types in the MSSQL database
The Code should be added first, so that you may be interested in the User-Defined table type.

CREATE TYPE CustomerFeedbackTemp AS TABLE(BusType int NOT NULL,CustomerPhone varchar(40) NOT NULL,BackType int NOT NULL,Content nvarchar(1000) NOT NULL)
Public void ThirdWay () {Stopwatch sw = new Stopwatch (); Stopwatch sw1 = new Stopwatch (); DataTable dt = GetTable (); using (var conn = new SqlConnection (ConnStr )) {string SQL = @ "INSERT INTO [dbo]. [CustomerFeedback] ([BusType], [CustomerPhone], [BackType], [Content]) select BusType, CustomerPhone, BackType, [Content] from @ TempTb "; using (SqlCommand cmd = new SqlCommand (SQL, conn) {cmd. commandTimeo Ut = 0; SqlParameter catParam = cmd. parameters. addWithValue ("@ TempTb", dt); catParam. sqlDbType = SqlDbType. structured; catParam. typeName = "dbo. customerFeedbackTemp "; conn. open (); Console. writeLine ("from:" + DateTime. now. toString ("yyyy-MM-dd HH: mm: ss fff") + "insert into memory table cyclically:" + cnt + "data entry... "); sw. start (); for (int I = 0; I <cnt; I ++) {DataRow dr = dt. newRow (); dr [0] = m. busType; dr [1] = m. customerPho Ne; dr [2] = m. backType; dr [3] = m. content; dt. rows. add (dr);} Console. writeLine (DateTime. now. when ToString ("yyyy-MM-dd HH: mm: ss fff") + ", insert the memory table cyclically:" + cnt + "data entries completed! Time consumed: "+ sw. ElapsedMilliseconds +" milliseconds. "); Sw1.Start (); if (dt! = Null & dt. Rows. Count! = 0) {cmd. executeNonQuery (); sw. stop ();} Console. writeLine (DateTime. now. when ToString ("yyyy-MM-dd HH: mm: ss fff") + ", execute:" + cnt + "datatable data to the database! Time consumed: "+ sw1.ElapsedMilliseconds +" milliseconds. ");}}}

Running result:

Wow, Q capture, less than 2 seconds, less than 2 seconds, how is it faster than every 4 milliseconds, can't believe it, is it running a problem.

Try again

Try again

Yes, you are not mistaken. There are 10 million data records, less than 2 seconds. Can't wait to know why? Can't wait to know what type of user-defined table we use?

User-Defined table Type

First, it should be easy to understand the type. int, varchar, bit, and so on are all types. What about the table type as wool?

In fact, the user can define a table structure and treat it as a type.

Detailed document for creating custom types: https://msdn.microsoft.com/zh-cn/library/ms175007.aspx

Second, the custom type also has some restrictions, security: https://msdn.microsoft.com/zh-cn/library/bb522526.aspx

Then how to use this type is used as a table value parameter.

Using Table value parameters, you can send multiple rows of data to a Transact-SQL statement or routine (such as a stored procedure or function) without creating temporary tables or many parameters.

Table value parameters are similar to parameter arrays in ole db and ODBC, but they have higher flexibility and are more closely integrated with Transact-SQL. Another advantage of Table value parameters is the ability to participate in dataset-based operations.

Transact-SQL transfers Table value parameters to routines through reference to avoid creating copies of input data. You can use table value parameters to create and execute Transact-SQL routines, and use any managed language to call them from Transact-SQL code, hosted clients, and local clients.

Advantages

Like other parameters, the scope of Table value parameters is stored procedures, functions, or dynamic Transact-SQL text. Similarly, table type variables have the same scope as any other local variables created using the DECLARE statement. You can declare Table value variables in dynamic Transact-SQL statements, and pass these variables as Table value parameters to stored procedures and functions.

Table value parameters are more flexible and provide better performance in some cases than temporary tables or other methods for passing parameter lists. Table value parameters have the following advantages:

When data is filled in from the client for the first time, the lock is not obtained.

Provides a simple programming model.

Complex business logic can be included in a single routine.

Reduced to round-trips to servers.

Table structures with different base numbers can be created.

Is a strong type.

Allows the client to specify the sorting order and unique key.

Cached like a temporary table when used in a stored procedure. Table value parameters are also cached for parameterized queries starting with SQL Server 2012.

Restrictions

Table value parameters have the following restrictions:

SQL Server does not maintain the statistical information of Table value parameter columns.

The table value parameter must be passed to the Transact-SQL routine as the input READONLY parameter. DML operations such as UPDATE, DELETE, or INSERT cannot be performed on Table value parameters in the routine body.

The table value parameter cannot be used as the target of the select into or insert exec statement. Table value parameters can be in the FROM clause of select into, or in the insert exec string or stored procedure.

Common bulk insert dataset insertion Optimization

Public void FourWay () {Stopwatch sw = new Stopwatch (); Stopwatch sw1 = new Stopwatch (); DataTable dt = GetTable (); using (SqlConnection conn = new SqlConnection (ConnStr )) {SqlBulkCopy bulkCopy = new SqlBulkCopy (conn); bulkCopy. bulkCopyTimeout = 0; bulkCopy. destinationTableName = "CustomerFeedback"; bulkCopy. batchSize = dt. rows. count; conn. open (); Console. writeLine ("from:" + DateTime. now. toString ("yy Yy-MM-dd HH: mm: ss fff ") +" insert into memory table cyclically: "+ cnt +" data entry... "); sw. start (); for (int I = 0; I <cnt; I ++) {DataRow dr = dt. newRow (); dr [0] = m. busType; dr [1] = m. customerPhone; dr [2] = m. backType; dr [3] = m. content; dt. rows. add (dr);} Console. writeLine (DateTime. now. when ToString ("yyyy-MM-dd HH: mm: ss fff") + ", insert the memory table cyclically:" + cnt + "data entries completed! Time consumed: "+ sw. ElapsedMilliseconds +" milliseconds. "); Sw1.Start (); if (dt! = Null & dt. Rows. Count! = 0) {bulkCopy. writeToServer (dt); sw. stop ();} Console. writeLine (DateTime. now. when ToString ("yyyy-MM-dd HH: mm: ss fff") + ", execute:" + cnt + "datatable data to the database! Time consumed: "+ sw1.ElapsedMilliseconds +" milliseconds. ");}

Execution result:

Within 1 second, within 1 second. After reading this, we have to complete the pace of inserting 10 million pieces of data within 1 second, against the day, against the day.

Bulk insert detail: https://msdn.microsoft.com/zh-cn/library/ms188365.aspx

Professional comments:

The usage of Table value parameters is similar to that of other dataset-based variables. However, frequent use of Table value parameters is faster than that of large datasets. The startup overhead of a large-capacity operation is greater than that of table-valued parameters. In contrast, table-valued parameters have good execution performance when the number of inserted rows is less than 1000.

Reusable Table value parameters can benefit from the temporary table cache. This table cache function provides better scalability than the equivalent bulk insert operation. When using a small row INSERT operation, you can use the parameter list or batch Statement (instead of bulk insert operation or table value parameter) to achieve small performance improvement. However, these methods are not very convenient in programming, and as the number of rows increases, the performance will rapidly decline.

The execution performance of Table value parameters is equivalent to or even better than that of peer parameter arrays.

Summary

The following is your favorite summary, which includes three parts:

1. I hope to keep an eye on my other articles.

2. Are there any clear instructions in the blog, or you have a better way? Join the two chat groups in the upper left corner to study and discuss them together.

3. You can forget to pay attention to likes, but never forget to scan the QR code to enjoy them.

The following is a supplement from other netizens:

Your previous insert is a concatenation of strings, and no "parameter" is used. Every SQL SERVER has to parse the statement and there is no way to cache it. Of course, it is slow. You can try to use parameterized insert, it may take several seconds to complete the insertion.

Insert large batches of data. SqlBulkCopy is preferred.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.