C#: Four Ways to Bulk Insert Data into SQL Server (repost)

Source: Internet
Author: User
Tags: bulk insert

Create a database and a table for testing. To make inserts faster, the table's primary key is a GUID and no other index is created on it. A GUID can be generated faster than an auto-increment value, because computing a new GUID costs less than querying the previous record's ID from the table and adding 1 to it. When an index exists, it has to be updated for every inserted record, which is very costly; if the table has indexes that cannot be avoided, we can improve efficiency by dropping the indexes first, bulk inserting, and finally rebuilding them, as sketched below.
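A hedged sketch of that drop-then-rebuild pattern, using ALTER INDEX ... DISABLE / REBUILD rather than literally dropping the index. The method name, the index name IX_Product_Name and the connection-string parameter are assumptions (the test table below has no secondary index at all):

    // Sketch only; requires: using System.Data.SqlClient;
    static void BulkLoadWithIndexRebuild(string connStr)
    {
        using (SqlConnection conn = new SqlConnection(connStr))
        {
            conn.Open();
            // Disable the (assumed) nonclustered index so it is not maintained row by row.
            using (SqlCommand cmd = new SqlCommand("ALTER INDEX IX_Product_Name ON Product DISABLE", conn))
                cmd.ExecuteNonQuery();

            // ... run the bulk insert here (any of the modes described below) ...

            // Rebuild the index once, after all rows are in.
            using (SqlCommand cmd = new SqlCommand("ALTER INDEX IX_Product_Name ON Product REBUILD", conn))
                cmd.ExecuteNonQuery();
        }
    }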

CREATE DATABASE CarSys;
GO
USE CarSys;

CREATE TABLE Product
(
    Id    UNIQUEIDENTIFIER PRIMARY KEY,
    Name  VARCHAR(50)   NOT NULL,
    Price DECIMAL(18,2) NOT NULL
)

We insert the data through SQL scripts; there are four common ways to do it.

Mode one: insert one row at a time. This has the worst performance and is not recommended.

INSERT INTO Product(Id, Name, Price) VALUES (NEWID(), 'Cattle Barn stage 1', 160);
INSERT INTO Product(Id, Name, Price) VALUES (NEWID(), 'Cattle Barn stage 2', 260);

Mode two: BULK INSERT

The syntax is as follows:

    BULK INSERT [ [ 'database_name' . ] [ 'owner' ] . ] { 'table_name' FROM 'data_file' }
        WITH (
            [ BATCHSIZE [ = batch_size ] ],
            [ CHECK_CONSTRAINTS ],
            [ CODEPAGE [ = 'ACP' | 'OEM' | 'RAW' | 'code_page' ] ],
            [ DATAFILETYPE [ = 'char' | 'native' | 'widechar' | 'widenative' ] ],
            [ FIELDTERMINATOR [ = 'field_terminator' ] ],
            [ FIRSTROW [ = first_row ] ],
            [ FIRE_TRIGGERS ],
            [ FORMATFILE = 'format_file_path' ],
            [ KEEPIDENTITY ],
            [ KEEPNULLS ],
            [ KILOBYTES_PER_BATCH [ = kilobytes_per_batch ] ],
            [ LASTROW [ = last_row ] ],
            [ MAXERRORS [ = max_errors ] ],
            [ ORDER ( { column [ ASC | DESC ] } [ ,...n ] ) ],
            [ ROWS_PER_BATCH [ = rows_per_batch ] ],
            [ ROWTERMINATOR [ = 'row_terminator' ] ],
            [ TABLOCK ]
        )

Description of the related parameters:

    BULK INSERT
        [ database_name . [ schema_name ] . | schema_name . ] [ table_name | view_name ]
            FROM 'data_file'
            [ WITH ( ... ) ]

- BATCHSIZE = batch_size -- sets the number of records that can be inserted into the table in a single transaction.
- CHECK_CONSTRAINTS -- specifies that all constraints on the target table or view must be checked during the bulk-import operation. Without this option, all CHECK and FOREIGN KEY constraints are ignored, and the table's constraints are marked as not trusted after the operation.
- CODEPAGE = { 'ACP' | 'OEM' | 'RAW' | 'code_page' } -- specifies the code page of the data in the data file.
- DATAFILETYPE = { 'char' | 'native' | 'widechar' | 'widenative' } -- specifies the data-file type that BULK INSERT uses for the import operation.
- FIELDTERMINATOR = 'field_terminator' -- the character that separates fields.
- FIRSTROW = first_row -- the line number of the first row to load; the default is the first row of the specified data file.
- FIRE_TRIGGERS -- whether insert triggers fire during the import.
- FORMATFILE = 'format_file_path'
- KEEPIDENTITY -- specifies that the identity values in the import file are used for the identity column.
- KEEPNULLS -- specifies that empty columns keep a NULL value during the bulk-import operation instead of having column defaults inserted.
- KILOBYTES_PER_BATCH = kilobytes_per_batch
- LASTROW = last_row -- the line number of the last row to load.
- MAXERRORS = max_errors -- the maximum number of syntax errors allowed in the data before the bulk-import operation is canceled.
- ORDER ( { column [ ASC | DESC ] } [ ,...n ] ) -- specifies how the data in the data file is sorted.
- ROWS_PER_BATCH = rows_per_batch
- ROWTERMINATOR = 'row_terminator' -- the character that terminates rows.
- TABLOCK -- specifies that a table-level lock is acquired for the duration of the bulk-import operation.
- ERRORFILE = 'file_name' -- specifies a file used to collect rows that are malformed and cannot be converted to an OLE DB rowset.
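To make the syntax concrete, here is a minimal sketch of issuing a BULK INSERT from C#/ADO.NET (the rest of this article drives everything from C#). The method name, the connection string, the file path D:\data\product.txt and the field/row terminators are all assumptions, and the data file must be readable by the SQL Server instance itself:

    // Sketch only; requires: using System.Data.SqlClient;
    static void BulkInsertFromFile()
    {
        string connStr = "server=.;database=CarSys;uid=sa;pwd=123456;"; // assumed connection string
        string sql = @"BULK INSERT Product
                       FROM 'D:\data\product.txt'
                       WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK)";
        using (SqlConnection conn = new SqlConnection(connStr))
        using (SqlCommand cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            cmd.ExecuteNonQuery(); // SQL Server reads the file directly; no rows travel through the client
        }
    }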

Mode three: INSERT INTO ... SELECT ... UNION ALL

INSERT INTO Product(Id, Name, Price)
SELECT NEWID(), 'Cattle Barn stage 1', 160
UNION ALL
SELECT NEWID(), 'Cattle Barn stage 2', 260

Mode four: splicing SQL strings

INSERT INTO Product(Id, Name, Price) VALUES
(NEWID(), 'Cattle Barn stage 1', 160),
(NEWID(), 'Cattle Barn stage 2', 260)
...

The same four approaches can be implemented in C# with ADO.NET. The listings below share a connection string and a few counters that are not shown in this copy of the article; see the sketch that follows.
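A minimal sketch of those shared members. The names StrConnMsg, TotalRow and GetRow come from the listings themselves; the values assigned here are assumptions:

    using System;
    using System.Data;
    using System.Data.SqlClient;
    using System.Diagnostics;
    using System.Text;

    // Shared members referenced by the listings below; the values are assumptions.
    static string StrConnMsg = "server=.;database=CarSys;uid=sa;pwd=123456;"; // assumed connection string
    static int TotalRow = 1000000; // 100w: total number of rows to insert
    static int GetRow = 1000;      // sample / segment size used by modes one and four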

Mode one: row-by-row INSERT
    #region Mode one
    static void InsertOne()
    {
        Console.WriteLine("Implemented by inserting rows one at a time");
        Stopwatch sw = new Stopwatch();
        using (SqlConnection conn = new SqlConnection(StrConnMsg)) // using will automatically open and close the connection.
        {
            string sql = "INSERT INTO Product(Id,Name,Price) VALUES(NEWID(),@p,@d)";
            conn.Open();
            for (int i = 0; i < TotalRow; i++)
            {
                using (SqlCommand cmd = new SqlCommand(sql, conn))
                {
                    cmd.Parameters.AddWithValue("@p", "Product" + i);
                    cmd.Parameters.AddWithValue("@d", i);
                    sw.Start();
                    cmd.ExecuteNonQuery();
                    Console.WriteLine(string.Format("Inserted one record, elapsed {0} milliseconds", sw.ElapsedMilliseconds));
                }
                if (i == GetRow)
                {
                    sw.Stop();
                    break;
                }
            }
        }
        Console.WriteLine(string.Format(
            "To insert {0} records: the first {4} rows took {1} milliseconds, so the estimated total is {2} milliseconds, i.e. {3} minutes",
            TotalRow, sw.ElapsedMilliseconds, (sw.ElapsedMilliseconds / GetRow) * TotalRow,
            GetMinute(sw.ElapsedMilliseconds / GetRow * TotalRow), GetRow));
    }

    static int GetMinute(long l)
    {
        return (int)(l / 60000);
    }
    #endregion

The results of the operation are as follows:

We will find that inserting 100w (1,000,000) records is estimated to take about 50 minutes, at roughly 3 milliseconds per insert (3 ms × 1,000,000 ≈ 3,000 seconds ≈ 50 minutes).

Mode two: use SqlBulkCopy
    #region Mode two
    static void InsertTwo()
    {
        Console.WriteLine("Implemented using SqlBulkCopy");
        Stopwatch sw = new Stopwatch();
        DataTable dt = GetTableSchema();
        using (SqlConnection conn = new SqlConnection(StrConnMsg))
        {
            SqlBulkCopy bulkCopy = new SqlBulkCopy(conn);
            bulkCopy.DestinationTableName = "Product";
            bulkCopy.BatchSize = dt.Rows.Count;
            conn.Open();
            sw.Start();
            for (int i = 0; i < TotalRow; i++)
            {
                DataRow dr = dt.NewRow();
                dr[0] = Guid.NewGuid();
                dr[1] = string.Format("Product{0}", i);
                dr[2] = (decimal)i;
                dt.Rows.Add(dr);
            }
            if (dt != null && dt.Rows.Count != 0)
            {
                bulkCopy.WriteToServer(dt);
                sw.Stop();
            }
            Console.WriteLine(string.Format("Inserting {0} records took a total of {1} milliseconds, i.e. {2} minutes",
                TotalRow, sw.ElapsedMilliseconds, GetMinute(sw.ElapsedMilliseconds)));
        }
    }

    static DataTable GetTableSchema()
    {
        DataTable dt = new DataTable();
        dt.Columns.AddRange(new DataColumn[]
        {
            new DataColumn("Id", typeof(Guid)),
            new DataColumn("Name", typeof(string)),
            new DataColumn("Price", typeof(decimal))
        });
        return dt;
    }
    #endregion

The results of the operation are as follows:

Inserting 100w (1,000,000) records takes a little over 8 seconds. Pretty slick, isn't it?

Open SQL Server Profiler trace and you will find the following statement executed:

    insert bulk Product ([Id] UniqueIdentifier, [NAME] VarChar(50) COLLATE Chinese_PRC_CI_AS, [Price] Decimal(18,2))

Mode three: insert data using TVPs (table-valued parameters)

SQL Server supports TVPs starting with SQL Server 2008. Create the table type ProductTemp by executing the following SQL.

CREATE TYPE ProductTemp AS TABLE
(
    Id    UNIQUEIDENTIFIER PRIMARY KEY,
    Name  VARCHAR(50)   NOT NULL,
    Price DECIMAL(18,2) NOT NULL
)

After executing it, you will find one more user-defined table type, ProductTemp, under the CarSys database. The C# side then passes a DataTable through this type, as sketched below.
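The C# listing for this mode is not reproduced in this copy of the article. Here is a minimal sketch of passing a DataTable as a table-valued parameter, reusing StrConnMsg, TotalRow and GetTableSchema() from the snippets above; the method name InsertThree and the parameter name @productTemp are assumptions:

    #region Mode three (sketch)
    static void InsertThree()
    {
        Console.WriteLine("Implemented using a table-valued parameter (TVP)");
        Stopwatch sw = new Stopwatch();
        DataTable dt = GetTableSchema();
        for (int i = 0; i < TotalRow; i++)
        {
            dt.Rows.Add(Guid.NewGuid(), "Product" + i, (decimal)i);
        }
        using (SqlConnection conn = new SqlConnection(StrConnMsg))
        using (SqlCommand cmd = new SqlCommand(
            "INSERT INTO Product(Id, Name, Price) SELECT Id, Name, Price FROM @productTemp", conn))
        {
            SqlParameter p = cmd.Parameters.AddWithValue("@productTemp", dt);
            p.SqlDbType = SqlDbType.Structured; // mark the parameter as a TVP
            p.TypeName = "dbo.ProductTemp";     // the table type created above
            conn.Open();
            sw.Start();
            cmd.ExecuteNonQuery();
            sw.Stop();
            Console.WriteLine(string.Format("Inserting {0} records took {1} milliseconds",
                TotalRow, sw.ElapsedMilliseconds));
        }
    }
    #endregion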

Inserting 100w (1,000,000) records this way takes a little over 11 seconds.

Mode four: splicing SQL strings

This method is limited by SQL Server: a single INSERT ... VALUES statement can list at most 1,000 rows, so the data has to be inserted in segments.

    #region Mode four
    static void InsertFour()
    {
        Console.WriteLine("Implemented by splicing bulk SQL inserts");
        Stopwatch sw = new Stopwatch();
        using (SqlConnection conn = new SqlConnection(StrConnMsg)) // using will automatically open and close the connection.
        {
            conn.Open();
            sw.Start();
            for (int j = 0; j < TotalRow / GetRow; j++)
            {
                StringBuilder sb = new StringBuilder();
                sb.Append("INSERT INTO Product(Id,Name,Price) VALUES");
                using (SqlCommand cmd = new SqlCommand())
                {
                    for (int i = 0; i < GetRow; i++)
                    {
                        sb.AppendFormat("(NEWID(),'Product{0}',{0}),", j * i + i);
                    }
                    cmd.Connection = conn;
                    cmd.CommandText = sb.ToString().TrimEnd(',');
                    cmd.ExecuteNonQuery();
                }
            }
            sw.Stop();
            Console.WriteLine(string.Format("Inserting {0} records took {1} milliseconds",
                TotalRow, sw.ElapsedMilliseconds));
        }
    }
    #endregion

The results of the operation are as follows:

We can see that it took about 10 minutes. That is a big improvement over mode one, but it is clearly still not fast enough.

Summary: for bulk inserts of large data volumes, try to avoid modes one and four; modes two and three are both very efficient ways to bulk insert data. They insert from a DataTable built in memory, so when the amount of data is so large that it cannot all be held in memory at once, it can be inserted in segments. For example, to insert 90 million rows, split them into 9 segments and insert 10 million rows at a time; a minimal sketch of this segmented approach follows below. We should also try to avoid operating on the database directly inside a for loop, because connecting, opening and closing the database on every iteration is expensive. C# does have a database connection pool: when we use using or conn.Close() to release a connection, it is not actually closed but put into a dormant state, and on the next operation the pool finds a dormant connection and wakes it up. This effectively improves concurrency and reduces the cost of connecting, and the number of connections in the pool can be configured.
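A minimal sketch of that segmented approach with SqlBulkCopy, reusing StrConnMsg and GetTableSchema() from above; the method name is an assumption, and the segment size is just the example figure from the paragraph, not a recommendation:

    // Sketch only: insert `total` rows in segments so the whole data set never has to fit in memory at once.
    static void InsertSegmented(long total, int segmentSize)
    {
        using (SqlConnection conn = new SqlConnection(StrConnMsg))
        {
            conn.Open();
            for (long done = 0; done < total; done += segmentSize)
            {
                DataTable dt = GetTableSchema();
                long rows = Math.Min(segmentSize, total - done);
                for (long i = 0; i < rows; i++)
                {
                    dt.Rows.Add(Guid.NewGuid(), "Product" + (done + i), (decimal)(done + i));
                }
                using (SqlBulkCopy bulkCopy = new SqlBulkCopy(conn))
                {
                    bulkCopy.DestinationTableName = "Product";
                    bulkCopy.WriteToServer(dt); // one segment per call keeps memory bounded
                }
            }
        }
    }

    // Example: InsertSegmented(90000000, 10000000); inserts 90 million rows in 9 segments of 10 million.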
