MySQL bulk INSERT performance optimization tutorial

Source: Internet
Author: User
Tags: bulk insert

For systems with large data volumes, the problem is not only inefficient queries but also long data-load times. In a reporting system, for example, a daily data import can take several hours or more than ten. Optimizing database insert performance is therefore worthwhile.
After some performance testing on MySQL InnoDB, we found several ways to improve insert efficiency, summarized below for reference.

1. Insert multiple rows with a single SQL statement.
A common pattern is one INSERT statement per row:

INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
  VALUES ('0', 'userid_0', 'content_0', 0);
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
  VALUES ('1', 'userid_1', 'content_1', 1);

It can be merged into:

INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
  VALUES ('0', 'userid_0', 'content_0', 0), ('1', 'userid_1', 'content_1', 1);

The merged form improves insert efficiency mainly because it reduces the combined volume of logging (the MySQL binlog and the InnoDB transaction log), which lowers both the amount and the frequency of log flushes to disk. Merging statements also reduces the number of SQL parses and the network transmission I/O.
Test comparisons were made between importing rows one statement at a time and importing them with a single merged statement, for 100, 1,000, and 10,000 records.
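As a sketch of the merged form, the multi-row VALUES list can be built programmatically. The helper below is hypothetical, not from the article; it only constructs the SQL text with `%s` placeholders (the style MySQL drivers such as PyMySQL expect), and the table and column names are taken from the example above. The resulting statement would be handed to a driver together with a flattened parameter list.

```python
def build_multi_row_insert(table, columns, row_count):
    """Build a parameterized multi-row INSERT statement.

    Using %s placeholders keeps the values properly escaped by the
    driver instead of being interpolated into the SQL text by hand.
    """
    placeholders = "(" + ", ".join(["%s"] * len(columns)) + ")"
    cols = ", ".join(f"`{c}`" for c in columns)
    values = ", ".join([placeholders] * row_count)
    return f"INSERT INTO `{table}` ({cols}) VALUES {values}"

# One statement covering two rows; at execution time the two row
# tuples are flattened into a single parameter sequence.
sql = build_multi_row_insert("insert_table",
                             ["datetime", "uid", "content", "type"], 2)
print(sql)
```

The same helper scales the VALUES list to any batch size, which is what makes the log-flush savings described above possible.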

2. Perform the inserts inside a single transaction.
Wrap the insert statements in a transaction:

START TRANSACTION;
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
  VALUES ('0', 'userid_0', 'content_0', 0);
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
  VALUES ('1', 'userid_1', 'content_1', 1);
...
COMMIT;
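The gain here comes from the fact that with autocommit enabled (MySQL's default), every INSERT is its own implicit transaction and forces its own log flush at commit; grouping many inserts under one explicit transaction amortizes that cost to a single commit. A minimal sketch of the pattern using Python's DB-API follows; sqlite3 stands in for a MySQL driver purely so the example is self-contained and runnable, and the batching pattern is identical with a real MySQL connection.

```python
import sqlite3

# In-memory stand-in for the MySQL table from the article.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE insert_table "
             "(datetime TEXT, uid TEXT, content TEXT, type INTEGER)")

rows = [(str(i), f"userid_{i}", f"content_{i}", i) for i in range(1000)]

# One explicit transaction around all inserts: a single commit means a
# single log flush instead of one flush per statement.
with conn:  # commits on success, rolls back on exception
    conn.executemany(
        "INSERT INTO insert_table (datetime, uid, content, type) "
        "VALUES (?, ?, ?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM insert_table").fetchone()[0]
print(count)
```

With a MySQL driver, the placeholders would be `%s` rather than `?`, but the shape of the code is the same.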

3. Insert the data in order.
Ordered insertion means the inserted rows are sorted by the primary key. For example, if datetime is the primary key of the record:

INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
  VALUES ('1', 'userid_1', 'content_1', 1);
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
  VALUES ('0', 'userid_0', 'content_0', 0);
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
  VALUES ('2', 'userid_2', 'content_2', 2);

This can be reordered into:

INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
  VALUES ('0', 'userid_0', 'content_0', 0);
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
  VALUES ('1', 'userid_1', 'content_1', 1);
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
  VALUES ('2', 'userid_2', 'content_2', 2);

The database must maintain its indexes as rows are inserted, and unordered rows increase that maintenance cost. InnoDB uses a B+tree index: if each new row lands at the end of the index, maintenance is cheap and the index stays efficient; if a row lands in the middle, the B+tree may need page splits and merges, which consume more computational resources, reduce insert efficiency, and at larger data volumes cause frequent disk operations.
The following provides performance comparisons between random and sequential data, for 100, 1,000, 10,000, 100,000, and 1,000,000 records.
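The reordering step can be done client-side before any SQL is sent. A small sketch (the rows and the primary-key column match the article's example; the index position of the key is an assumption for illustration):

```python
rows = [
    ("1", "userid_1", "content_1", 1),
    ("0", "userid_0", "content_0", 0),
    ("2", "userid_2", "content_2", 2),
]

# Sort by the primary-key column (datetime, tuple index 0) before
# inserting, so each row appends at the right edge of the B+tree
# instead of forcing page splits in the middle.
rows.sort(key=lambda r: r[0])

pk_order = [r[0] for r in rows]
print(pk_order)
```

For real datetime values, sorting by the parsed timestamp (rather than the string) would be the safer choice.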

Judging from the test results, this optimization does improve performance, but the improvement on its own is modest.

Combined performance test:
Finally, a test that applies all three methods above together to optimize insert efficiency.

The test results show that merging rows + a transaction improves performance markedly at small data volumes, but performance drops sharply once the volume grows large (above roughly 10 million rows): the data then exceeds the capacity of the InnoDB buffer pool, each index lookup involves more disk reads and writes, and performance falls off quickly. Merging rows + a transaction + ordered data, however, still performs well at tens of millions of rows: with ordered data, index positioning stays cheap and does not require frequent disk reads and writes, so high performance is maintained.

Precautions:
1. SQL statements have a length limit; when merging rows into one statement, the combined statement must stay within it. The limit is controlled by the max_allowed_packet setting, which defaults to 1M and was raised to 8M for these tests.
2. Transactions need to be kept to a reasonable size; an overly large transaction hurts execution efficiency. MySQL has the innodb_log_buffer_size setting: once the log exceeds this value, InnoDB flushes the data to disk, and efficiency drops. It is therefore a good idea to commit the transaction before the data reaches this value.
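Both precautions can be handled client-side by splitting the rows into bounded batches, each inserted as one merged statement and committed before the next. The helper below is hypothetical; the row limit, byte budget, and per-row overhead estimate are illustrative values, not figures from the article.

```python
def chunk_rows(rows, max_rows=1000, max_bytes=1_000_000):
    """Yield batches bounded by row count and an approximate byte
    budget, so each merged INSERT stays under max_allowed_packet and
    each transaction stays comfortably small."""
    batch, size = [], 0
    for row in rows:
        # Rough per-row size: value text plus some SQL punctuation.
        row_size = sum(len(str(v)) for v in row) + 16
        if batch and (len(batch) >= max_rows or size + row_size > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append(row)
        size += row_size
    if batch:
        yield batch

rows = [(str(i), f"userid_{i}", f"content_{i}", i) for i in range(2500)]
batches = list(chunk_rows(rows))
# Each batch would then become one multi-row INSERT, committed before
# moving on to the next batch.
print([len(b) for b in batches])
```

Tuning max_rows and max_bytes against the server's actual max_allowed_packet and innodb_log_buffer_size settings is left to measurement.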
