MySQL Bulk SQL INSERT: Various Performance Optimizations

Source: Internet
Author: User

For systems that handle large volumes of data, database problems include not only inefficient queries but also long data-loading times. In reporting systems in particular, the daily data import can take several hours or even more than ten hours, so optimizing database INSERT performance is well worth the effort.
After some performance tests on MySQL InnoDB, we found several ways to improve insert efficiency. They are shared here for reference.

1. Insert multiple rows with one SQL statement.

A commonly used form of INSERT looks like this:

INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES ('0', 'userid_0', 'content_0', 0);
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES ('1', 'userid_1', 'content_1', 1);

Change into:

INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES ('0', 'userid_0', 'content_0', 0), ('1', 'userid_1', 'content_1', 1);

The modified INSERT improves insertion efficiency. The main reason is that merging statements reduces the volume of logging (MySQL binlog and InnoDB transaction logs), which lowers the amount and frequency of log flushes to disk and thereby improves efficiency. Merging the SQL statements also reduces the number of statements that must be parsed and the network I/O needed to transmit them.

Some test data are provided, comparing single-row inserts against a merged single-statement insert for 100, 1,000, and 10,000 records respectively.

2. Perform the inserts inside a transaction.

Change the inserts to:

START TRANSACTION;
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES ('0', 'userid_0', 'content_0', 0);
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES ('1', 'userid_1', 'content_1', 1);
...
COMMIT;

Using a transaction can improve insert efficiency because MySQL internally creates a transaction for every insert operation anyway. By wrapping the inserts in one explicit transaction, you avoid the cost of creating and committing a transaction per statement; all inserts are executed and then committed together.

A test comparison is also provided, contrasting inserts with and without an explicit transaction for 100, 1,000, and 10,000 records.

3. Insert the data in order.

Ordered insertion means the inserted records are sorted by primary key. For example, if `datetime` is the primary key of the record:

INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES ('1', 'userid_1', 'content_1', 1);
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES ('0', 'userid_0', 'content_0', 0);
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES ('2', 'userid_2', 'content_2', 2);

Change into:

INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES ('0', 'userid_0', 'content_0', 0);
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES ('1', 'userid_1', 'content_1', 1);
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES ('2', 'userid_2', 'content_2', 2);

Because the database must maintain index data during inserts, unordered records increase the cost of index maintenance. Consider the B+tree index used by InnoDB: if each record is inserted at the end of the index, positioning is very efficient and the index needs little adjustment; if records are inserted into the middle of the index, B+tree pages need to be split and merged, which consumes more compute resources, lowers the efficiency of locating the insert position, and leads to frequent disk operations when the data volume is large.
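For reference, the original article does not show the table definition, so the following is only an assumed sketch of a schema consistent with the examples above; with `datetime` as the primary key, InnoDB clusters rows by that column, so inserting in ascending `datetime` order appends to the end of the B+tree:

-- Hypothetical schema, not taken from the original article.
CREATE TABLE `insert_table` (
  `datetime` VARCHAR(20)  NOT NULL,
  `uid`      VARCHAR(32)  NOT NULL,
  `content`  VARCHAR(255) NOT NULL,
  `type`     INT          NOT NULL,
  PRIMARY KEY (`datetime`)   -- clustered index; ordered inserts append to its tail
) ENGINE=InnoDB;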

The following compares the performance of random data versus ordered data for 100, 1,000, 10,000, 100,000, and 1 million records.


Judging from the test results, this optimization improves performance, but the improvement is not dramatic.

Combined performance test:

Here is a test that applies all three methods above at the same time to optimize insert efficiency.

The test results show that merged statements + transactions give a very noticeable improvement when the data volume is small. When the data volume is large (more than about 10 million rows), performance drops sharply, because the data no longer fits in the InnoDB buffer pool and each index lookup then involves more disk reads and writes. Merged statements + transactions + ordered data still performs well when the data volume reaches the tens of millions: with ordered data, locating the index position is cheap and does not require frequent disk reads and writes, so high performance can be maintained.
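As a concrete illustration, a minimal sketch that combines all three techniques (multi-row statements, an explicit transaction, and rows pre-sorted by primary key) could look like the following; the batch size and values are placeholders, not the original test data:

-- Sketch only: rows are sorted by the primary key (`datetime`) and merged into
-- multi-row INSERTs small enough to stay under max_allowed_packet.
START TRANSACTION;

INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES
  ('0', 'userid_0', 'content_0', 0),
  ('1', 'userid_1', 'content_1', 1),
  ('2', 'userid_2', 'content_2', 2);

INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES
  ('3', 'userid_3', 'content_3', 3),
  ('4', 'userid_4', 'content_4', 4);

COMMIT;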

Precautions:

1. SQL statements have a length limit, so when merging data into one statement the statement must not exceed it. The limit can be changed through the max_allowed_packet configuration; its default was 1M, and the tests above raised it to 8M.
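For example, the current limit can be checked and raised at runtime (assuming sufficient privileges; putting max_allowed_packet in my.cnf under [mysqld] makes the change permanent):

-- Check the current limit (value is in bytes).
SHOW VARIABLES LIKE 'max_allowed_packet';

-- Raise it to 8M for new connections; existing clients must reconnect to see it.
SET GLOBAL max_allowed_packet = 8 * 1024 * 1024;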

2. The size of the transaction needs to be controlled; a transaction that is too large can hurt efficiency. MySQL has the innodb_log_buffer_size configuration item; once this value is exceeded, InnoDB flushes the data to disk, and efficiency drops.

So it is a good idea to commit the transaction before the data reaches this value.
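As a rough sketch, the buffer size can be inspected and the load split into several bounded transactions rather than one huge one; the batch boundaries below are purely illustrative:

-- Check the redo log buffer size (bytes); keep each transaction's changes well below it.
SHOW VARIABLES LIKE 'innodb_log_buffer_size';

-- Illustrative batching: commit every N rows instead of one huge transaction.
START TRANSACTION;
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES
  ('0', 'userid_0', 'content_0', 0),
  ('1', 'userid_1', 'content_1', 1);
COMMIT;

START TRANSACTION;
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES
  ('2', 'userid_2', 'content_2', 2),
  ('3', 'userid_3', 'content_3', 3);
COMMIT;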
