Four Ways to Bulk Insert Data into SQL Server in C#

In this article, I'll walk through several ways of bulk-inserting data into SQL Server.

First, create a database and table for testing. The table's primary key is a GUID, and no other indexes are created so that the data can be inserted faster. A GUID key is bound to be faster than an auto-increment one, because the time it takes to generate a GUID on the client is certainly less than the time to re-query the last ID from the table and add 1 to it. When an index exists, it has to be maintained on every inserted record, which is very performance-intensive. If a table must carry an index, we can improve efficiency by dropping (or disabling) the index first, doing the bulk insert, and then rebuilding the index, as sketched below.
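
A minimal T-SQL sketch of that drop-then-rebuild pattern (the index name IX_Product_Name is hypothetical, since the test table created below carries no extra index):

ALTER INDEX IX_Product_Name ON Product DISABLE;  -- skip index maintenance during the load
-- ... perform the bulk insert here ...
ALTER INDEX IX_Product_Name ON Product REBUILD;  -- rebuild once, after the data is in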

CREATE DATABASE Carsys;
GO
USE Carsys;
GO
CREATE TABLE Product (
    Id UNIQUEIDENTIFIER PRIMARY KEY,
    NAME VARCHAR(50) NOT NULL,
    Price DECIMAL(18,2) NOT NULL
)

In SQL scripts, data is commonly inserted in the following four ways.

Way one: insert one row at a time. This has the worst performance and is not recommended.

INSERT INTO Product(Id,Name,Price) VALUES(NEWID(), 'Commodity 1', 160);
INSERT INTO Product(Id,Name,Price) VALUES(NEWID(), 'Commodity 2', 180);
......

Way two: BULK INSERT

The syntax is as follows:

BULK INSERT [ [ 'database_name'. ] [ 'owner' ]. ] { 'table_name' FROM 'data_file' }
WITH (
    [ BATCHSIZE [ = batch_size ] ],
    [ CHECK_CONSTRAINTS ],
    [ CODEPAGE [ = 'ACP' | 'OEM' | 'RAW' | 'code_page' ] ],
    [ DATAFILETYPE [ = 'char' | 'native' | 'widechar' | 'widenative' ] ],
    [ FIELDTERMINATOR [ = 'field_terminator' ] ],
    [ FIRSTROW [ = first_row ] ],
    [ FIRE_TRIGGERS ],
    [ FORMATFILE = 'format_file_path' ],
    [ KEEPIDENTITY ],
    [ KEEPNULLS ],
    [ KILOBYTES_PER_BATCH [ = kilobytes_per_batch ] ],
    [ LASTROW [ = last_row ] ],
    [ MAXERRORS [ = max_errors ] ],
    [ ORDER ( { column [ ASC | DESC ] } [ ,...n ] ) ],
    [ ROWS_PER_BATCH [ = rows_per_batch ] ],
    [ ROWTERMINATOR [ = 'row_terminator' ] ],
    [ ERRORFILE = 'file_name' ],
    [ TABLOCK ]
)

Related parameter descriptions:

BATCHSIZE = batch_size -- the number of rows to insert in a single transaction (batch).
CHECK_CONSTRAINTS -- specifies that all constraints on the target table or view must be checked during the bulk-import operation. Without this option, all CHECK and FOREIGN KEY constraints are ignored, and after the operation the table's constraints are marked as not trusted.
CODEPAGE = { 'ACP' | 'OEM' | 'RAW' | 'code_page' } -- specifies the code page of the data in the data file.
DATAFILETYPE = { 'char' | 'native' | 'widechar' | 'widenative' } -- specifies that BULK INSERT performs the import using the specified data-file type.
FIELDTERMINATOR = 'field_terminator' -- the symbol that separates fields.
FIRSTROW = first_row -- the number of the first row to load; the default is the first row of the specified data file.
FIRE_TRIGGERS -- whether insert triggers on the target table fire during the operation.
FORMATFILE = 'format_file_path' -- the path of a format file describing the data file.
KEEPIDENTITY -- specifies that the identity values in the import data file are used for the identity column.
KEEPNULLS -- specifies that empty columns keep NULL during the bulk-import operation, instead of having default values inserted for those columns.
KILOBYTES_PER_BATCH = kilobytes_per_batch -- the approximate number of kilobytes of data per batch.
LASTROW = last_row -- the number of the last row to load.
MAXERRORS = max_errors -- the maximum number of errors allowed in the data before the bulk-import operation is canceled.
ORDER ( { column [ ASC | DESC ] } [ ,...n ] ) -- specifies how the data in the data file is sorted.
ROWS_PER_BATCH = rows_per_batch -- the approximate number of rows of data per batch.
ROWTERMINATOR = 'row_terminator' -- the symbol that terminates each row.
TABLOCK -- specifies that a table-level lock is held for the duration of the bulk-import operation.
ERRORFILE = 'file_name' -- specifies the file used to collect rows that have formatting errors and cannot be converted to an OLE DB rowset.
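
As a concrete illustration, a minimal BULK INSERT against the Product table might look like this (the file path, terminators, and batch size are assumptions about the data file, not taken from the article):

BULK INSERT Carsys.dbo.Product
FROM 'C:\data\product.txt'    -- hypothetical data file
WITH (
    FIELDTERMINATOR = ',',    -- fields separated by commas
    ROWTERMINATOR = '\n',     -- one record per line
    BATCHSIZE = 100000,       -- commit every 100,000 rows
    TABLOCK                   -- table-level lock for faster loading
);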

Way three: INSERT INTO xx SELECT ...

INSERT INTO Product(Id,Name,Price)
SELECT NEWID(), 'Commodity 1', 160
UNION ALL
SELECT NEWID(), 'Commodity 2', 180
UNION ALL
...

Way four: concatenated SQL

INSERT INTO Product(Id,Name,Price) VALUES
(NEWID(), 'Commodity 1', 160),
(NEWID(), 'Commodity 2', 180)
...

In C#, there are likewise four ways to implement these bulk operations through ADO.NET.

Way one: single-row inserts

#region Way one
// Assumes class-level fields: strConnMsg (connection string), TotalRow (total rows, e.g. 1000000), GetRow (sample size).
static void InsertOne()
{
    Console.WriteLine("Implemented with single-row inserts");
    Stopwatch sw = new Stopwatch();
    using (SqlConnection conn = new SqlConnection(strConnMsg)) // using opens and closes the connection automatically
    {
        string sql = "INSERT INTO Product(Id,Name,Price) VALUES(NEWID(),@p,@d)";
        conn.Open();
        for (int i = 0; i < TotalRow; i++)
        {
            using (SqlCommand cmd = new SqlCommand(sql, conn))
            {
                cmd.Parameters.AddWithValue("@p", "Commodity" + i);
                cmd.Parameters.AddWithValue("@d", i);
                sw.Start();
                cmd.ExecuteNonQuery();
                Console.WriteLine(string.Format("Inserting one record took {0} ms", sw.ElapsedMilliseconds));
            }
            if (i == GetRow) { sw.Stop(); break; } // stop after a sample and extrapolate
        }
    }
    Console.WriteLine(string.Format(
        "To insert {0} records: each batch of {4} rows took {1} ms, so the estimated total time is {2} ms ({3} minutes)",
        TotalRow, sw.ElapsedMilliseconds, (sw.ElapsedMilliseconds / GetRow) * TotalRow,
        GetMinute(sw.ElapsedMilliseconds / GetRow * TotalRow), GetRow));
}

static int GetMinute(long l) { return (int)(l / 60000); }
#endregion

The results of the operation are as follows:

We can see that inserting all 1,000,000 (100w) records is estimated to take about 50 minutes, since each single insert takes roughly 3 milliseconds.

Way two: using SqlBulkCopy

#region Way two
static void InsertTwo()
{
    Console.WriteLine("Implemented with SqlBulkCopy");
    Stopwatch sw = new Stopwatch();
    DataTable dt = GetTableSchema();
    using (SqlConnection conn = new SqlConnection(strConnMsg))
    {
        SqlBulkCopy bulkCopy = new SqlBulkCopy(conn);
        bulkCopy.DestinationTableName = "Product";
        bulkCopy.BatchSize = dt.Rows.Count;
        conn.Open();
        sw.Start();
        for (int i = 0; i < TotalRow; i++)
        {
            DataRow dr = dt.NewRow();
            dr[0] = Guid.NewGuid();
            dr[1] = string.Format("Commodity{0}", i);
            dr[2] = (decimal)i;
            dt.Rows.Add(dr);
        }
        if (dt != null && dt.Rows.Count != 0)
        {
            bulkCopy.WriteToServer(dt); // one round trip for the whole DataTable
            sw.Stop();
        }
        Console.WriteLine(string.Format("Inserting {0} records took {1} ms ({2} minutes)",
            TotalRow, sw.ElapsedMilliseconds, GetMinute(sw.ElapsedMilliseconds)));
    }
}

static DataTable GetTableSchema()
{
    DataTable dt = new DataTable();
    dt.Columns.AddRange(new DataColumn[] {
        new DataColumn("Id", typeof(Guid)),
        new DataColumn("Name", typeof(string)),
        new DataColumn("Price", typeof(decimal)) });
    return dt;
}
#endregion

The results of the operation are as follows:

Inserting 1,000,000 (100w) records took only a little over 8 seconds. Pretty slick, right?

Turning on SQL Server Profiler tracing, you will find that the following statement is executed:

insert bulk Product ([Id] UniqueIdentifier, [NAME] VarChar(50) COLLATE Chinese_PRC_CI_AS, [Price] Decimal(18,2))

Way three: inserting data with TVPs (table-valued parameters)

TVPs are supported starting with SQL Server 2008. Create the table type ProductTemp by executing the following SQL.

CREATE TYPE ProductTemp AS TABLE (
    Id UNIQUEIDENTIFIER PRIMARY KEY,
    NAME VARCHAR(50) NOT NULL,
    Price DECIMAL(18,2) NOT NULL
)

After it executes, you will find the table type ProductTemp under the Carsys database. The data is then inserted by passing a DataTable as a structured parameter, as sketched below.
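
A minimal sketch of the corresponding ADO.NET code (not the article's original listing; it assumes the same strConnMsg, TotalRow, and GetTableSchema() pieces used in way two):

#region Way three (sketch)
static void InsertThree()
{
    Console.WriteLine("Implemented with a table-valued parameter (TVP)");
    Stopwatch sw = new Stopwatch();
    DataTable dt = GetTableSchema(); // same three columns as the ProductTemp type
    for (int i = 0; i < TotalRow; i++)
    {
        DataRow dr = dt.NewRow();
        dr[0] = Guid.NewGuid();
        dr[1] = "Commodity" + i;
        dr[2] = (decimal)i;
        dt.Rows.Add(dr);
    }
    using (SqlConnection conn = new SqlConnection(strConnMsg))
    using (SqlCommand cmd = new SqlCommand(
        "INSERT INTO Product(Id,NAME,Price) SELECT Id,NAME,Price FROM @tvp", conn))
    {
        SqlParameter p = cmd.Parameters.AddWithValue("@tvp", dt);
        p.SqlDbType = SqlDbType.Structured; // mark the parameter as a TVP
        p.TypeName = "dbo.ProductTemp";     // the table type created above
        conn.Open();
        sw.Start();
        cmd.ExecuteNonQuery();              // a single round trip for all rows
        sw.Stop();
    }
    Console.WriteLine(string.Format("Inserting {0} records took {1} ms", TotalRow, sw.ElapsedMilliseconds));
}
#endregion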

As the results show, inserting 1,000,000 (100w) records this way takes just over 11 seconds.

Way four: concatenated SQL

This method is limited by SQL Server: a single INSERT ... VALUES statement can contain at most 1,000 row constructors, so the data must be inserted in batches.

#region Way four
static void InsertFour()
{
    Console.WriteLine("Implemented with concatenated bulk SQL inserts");
    Stopwatch sw = new Stopwatch();
    using (SqlConnection conn = new SqlConnection(strConnMsg)) // using opens and closes the connection automatically
    {
        conn.Open();
        sw.Start();
        for (int j = 0; j < TotalRow / GetRow; j++) // GetRow rows per statement, up to the 1,000-row limit
        {
            StringBuilder sb = new StringBuilder();
            sb.Append("INSERT INTO Product(Id,Name,Price) VALUES");
            using (SqlCommand cmd = new SqlCommand())
            {
                for (int i = 0; i < GetRow; i++)
                {
                    sb.AppendFormat("(NEWID(),'Commodity{0}',{0}),", j * GetRow + i); // global row index
                }
                cmd.Connection = conn;
                cmd.CommandText = sb.ToString().TrimEnd(','); // drop the trailing comma
                cmd.ExecuteNonQuery();
            }
        }
        sw.Stop();
        Console.WriteLine(string.Format("Inserting {0} records took {1} ms", TotalRow, sw.ElapsedMilliseconds));
    }
}
#endregion

The results of the operation are as follows:

We can see it took about 10 minutes. Although this is a big improvement over way one, it is clearly still not fast enough.

Summary: for bulk inserts of large data volumes, try to avoid ways one and four; ways two and three are both very efficient ways to bulk-insert data. They insert by building a DataTable, and since a DataTable lives in memory, a particularly large data set may be too big to hold in memory at once; in that case it can be inserted in segments. For example, to insert 90 million rows, you can split them into 9 segments of 10 million rows each, as sketched below. We should also avoid operating on the database directly inside a for loop: connecting, opening, and closing the connection every iteration is time-consuming. Admittedly, C# has a database connection pool: when we release a connection with using or conn.Close(), the connection is not actually closed but kept in the pool in a dormant state, and the next operation wakes a dormant connection from the pool. This effectively improves concurrency and reduces connection overhead, and the number of connections in the pool can be configured.
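
A minimal sketch of that segmented insert (illustrative only; it reuses the assumed strConnMsg field and the GetTableSchema() helper from the listings above):

static void InsertInChunks(long total, int chunkSize) // e.g. total = 90000000, chunkSize = 10000000
{
    using (SqlConnection conn = new SqlConnection(strConnMsg))
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(conn))
    {
        conn.Open();
        bulkCopy.DestinationTableName = "Product";
        for (long done = 0; done < total; done += chunkSize)
        {
            DataTable dt = GetTableSchema(); // a fresh table per segment keeps memory bounded
            long upper = Math.Min(done + chunkSize, total);
            for (long i = done; i < upper; i++)
            {
                DataRow dr = dt.NewRow();
                dr[0] = Guid.NewGuid();
                dr[1] = "Commodity" + i;
                dr[2] = (decimal)i;
                dt.Rows.Add(dr);
            }
            bulkCopy.WriteToServer(dt); // one segment per server round trip
        }
    }
}

This way the connection is opened only once, and each segment's DataTable can be collected before the next one is built.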

That's all for this article. I hope it can be of some help in your study or work, and I hope you will continue to support the Yunqi community!
