MySQL Bulk SQL Insert performance optimizations

For systems with large data volumes, the database problem is not only slow queries but also long data-loading times. In a reporting system, for example, the daily data import can take several hours or even more than 10 hours. Optimizing database insert performance is therefore worthwhile.
After running performance tests on MySQL InnoDB, we found several ways to improve insert efficiency, which are described below for reference.

1. Insert multiple rows with a single SQL statement.
A typical sequence of INSERT statements looks like this:

INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
    VALUES ('0', 'userid_0', 'content_0', 0);
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
    VALUES ('1', 'userid_1', 'content_1', 1);

Modified to:

INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
    VALUES ('0', 'userid_0', 'content_0', 0), ('1', 'userid_1', 'content_1', 1);

The modified form inserts data more efficiently. The main reason is that merging the statements reduces the volume of log written (the MySQL binlog and the InnoDB transaction log), which lowers both the amount of log data and how often it is flushed to disk. Merging SQL statements also reduces the number of times SQL has to be parsed and cuts network I/O.
We compared importing one row per SQL statement against merging the rows into a single statement, testing 100, 1,000, and 10,000 records respectively.
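As a minimal sketch of how an application might generate the merged form, the Python function below builds one multi-row INSERT with `%s` placeholders plus a flat parameter list. The function name `build_multirow_insert` is hypothetical. It assumes a DB-API driver that uses the `%s` placeholder style (as PyMySQL and mysql-connector-python do), and it assumes the table and column names come from trusted code, since identifiers cannot be bound as parameters.

```python
from typing import Any, List, Sequence, Tuple


def build_multirow_insert(
    table: str, columns: Sequence[str], rows: Sequence[Sequence[Any]]
) -> Tuple[str, List[Any]]:
    """Build a single INSERT ... VALUES (...), (...) statement for many rows.

    Values are returned separately so the driver binds and escapes them.
    Identifiers are backtick-quoted but not validated (hypothetical helper:
    pass trusted table/column names only).
    """
    if not rows:
        raise ValueError("rows must not be empty")
    width = len(columns)
    if any(len(row) != width for row in rows):
        raise ValueError("every row needs exactly one value per column")
    column_list = ", ".join(f"`{name}`" for name in columns)
    row_placeholder = "(" + ", ".join(["%s"] * width) + ")"
    values_clause = ", ".join([row_placeholder] * len(rows))
    sql = f"INSERT INTO `{table}` ({column_list}) VALUES {values_clause}"
    # Flatten row by row so the values line up with the placeholders.
    params = [value for row in rows for value in row]
    return sql, params


sql, params = build_multirow_insert(
    "insert_table",
    ["datetime", "uid", "content", "type"],
    [("0", "userid_0", "content_0", 0), ("1", "userid_1", "content_1", 1)],
)
print(sql)
# INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES (%s, %s, %s, %s), (%s, %s, %s, %s)
```

With such a driver, the pair would then be executed in one round trip as `cursor.execute(sql, params)`.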

2. Perform the inserts inside a transaction.
Change the INSERT statements to:

START TRANSACTION;
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
    VALUES ('0', 'userid_0', 'content_0', 0);
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
    VALUES ('1', 'userid_1', 'content_1', 1);
...
COMMIT;

Using a transaction improves insert efficiency because, in autocommit mode, MySQL internally creates a transaction for every INSERT and performs the actual insert inside it. Wrapping the inserts in one explicit transaction avoids the cost of creating a transaction per statement: all of the inserts are executed first and then committed together.
We also compared inserting with and without a transaction, for 100, 1,000, and 10,000 records.
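To illustrate the pattern, here is a small Python sketch that uses the standard-library `sqlite3` module as a stand-in for a MySQL connection. SQLite is not InnoDB, so this shows the shape of the code, not MySQL's performance. With `isolation_level=None` the module runs in autocommit mode, so each INSERT would otherwise be its own transaction; the explicit `BEGIN` and `COMMIT` play the role of `START TRANSACTION` and `COMMIT`. The helper name `insert_in_one_transaction` is hypothetical.

```python
import sqlite3
from typing import Any, Iterable, Sequence


def insert_in_one_transaction(
    conn: sqlite3.Connection, rows: Iterable[Sequence[Any]]
) -> None:
    """Execute every INSERT inside one explicit transaction and commit once."""
    conn.execute("BEGIN")  # MySQL equivalent: START TRANSACTION
    try:
        for row in rows:
            conn.execute(
                "INSERT INTO insert_table (datetime, uid, content, type) "
                "VALUES (?, ?, ?, ?)",
                row,
            )
        conn.execute("COMMIT")  # a single commit for the whole batch
    except Exception:
        conn.execute("ROLLBACK")  # undo the partial batch on any error
        raise


# isolation_level=None: autocommit mode, so transaction boundaries are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE insert_table "
    "(datetime TEXT PRIMARY KEY, uid TEXT, content TEXT, type INTEGER)"
)
rows = [(str(i), f"userid_{i}", f"content_{i}", i) for i in range(1000)]
insert_in_one_transaction(conn, rows)
print(conn.execute("SELECT COUNT(*) FROM insert_table").fetchone()[0])  # 1000
```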

3. Insert data in primary-key order.
Ordered insertion means the inserted records are sorted by primary key. For example, if datetime is the primary key of the records:

INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
    VALUES ('1', 'userid_1', 'content_1', 1);
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
    VALUES ('0', 'userid_0', 'content_0', 0);
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
    VALUES ('2', 'userid_2', 'content_2', 2);

Modified to:

INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
    VALUES ('0', 'userid_0', 'content_0', 0);
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
    VALUES ('1', 'userid_1', 'content_1', 1);
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
    VALUES ('2', 'userid_2', 'content_2', 2);

Because the database must maintain its index data on every insert, unordered records increase the cost of index maintenance. Consider the B+tree index that InnoDB uses: if each inserted record goes at the end of the index, locating its position is very efficient and the index needs little adjustment; if the record goes in the middle of the index, the B+tree must split and merge nodes, which consumes more computing resources. Locating the insert position also becomes less efficient, and when the data volume is large this causes frequent disk operations.
We compared the performance of random versus sequential data for 100, 1,000, 10,000, 100,000, and 1 million records.

The test results show that this optimization does improve performance, but the improvement is modest.
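In application code, ordered insertion can be as simple as sorting a batch by its primary key before generating the INSERT statements. The Python sketch below assumes, as in the example above, that the key is the first column (datetime); the helper name `order_by_primary_key` and its `pk` parameter are hypothetical. The sort key has to match how the database orders the column: a numeric key held as a string, for instance, needs converting, because '10' sorts before '2' as text.

```python
from typing import Any, Callable, List, Sequence

Row = Sequence[Any]


def order_by_primary_key(
    rows: Sequence[Row], pk: Callable[[Row], Any] = lambda row: row[0]
) -> List[Row]:
    """Return the rows sorted ascending by primary key.

    Each key in the batch is then larger than the previous one, so InnoDB can
    append to the end of the clustered B+tree instead of splitting pages in
    the middle (provided the keys are also larger than those already stored).
    """
    return sorted(rows, key=pk)


batch = [
    ("1", "userid_1", "content_1", 1),
    ("0", "userid_0", "content_0", 0),
    ("2", "userid_2", "content_2", 2),
]
print([row[0] for row in order_by_primary_key(batch)])  # ['0', '1', '2']

# Numeric keys held as strings: sort by their integer value instead.
print([r[0] for r in order_by_primary_key([("10",), ("2",)], pk=lambda r: int(r[0]))])
# ['2', '10']
```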

Comprehensive Performance Testing:
We also tested insert efficiency with all three of the above methods combined.

The test results show that merging data plus using transactions improves performance dramatically when the data volume is small. When the data volume is large (more than 10 million records), however, performance drops sharply: the data now exceeds the capacity of the InnoDB buffer pool, so each index lookup involves more disk reads and writes. Combining merged data, transactions, and ordered data, by contrast, still performs well even at tens of millions of records. With ordered data, locating the position in the index is easier and does not require frequent disk reads and writes, so high performance can be maintained.
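The three techniques can be combined in one loading loop. The Python sketch below uses `sqlite3` as a stand-in for MySQL, so it demonstrates the structure, not InnoDB buffer-pool behavior: it sorts the rows by primary key, packs them into multi-row INSERT statements, and wraps several statements in each explicit transaction. The function name `bulk_load` and both batch sizes are hypothetical choices. In SQLite, 200 rows of 4 columns keeps each statement at 800 bound parameters, under the 999-parameter limit of older SQLite builds; on MySQL the statement size is bounded by max_allowed_packet instead.

```python
import random
import sqlite3
from typing import Any, Sequence

COLUMNS = ("datetime", "uid", "content", "type")


def bulk_load(
    conn: sqlite3.Connection,
    rows: Sequence[Sequence[Any]],
    rows_per_statement: int = 200,
    statements_per_commit: int = 5,
) -> int:
    """Insert rows sorted, merged, and grouped into transactions.

    Returns the number of COMMITs issued. conn must use isolation_level=None.
    """
    ordered = sorted(rows, key=lambda row: row[0])  # ordered by primary key
    one_row = "(" + ", ".join(["?"] * len(COLUMNS)) + ")"
    commits = pending = 0  # pending = statements in the open transaction
    try:
        for start in range(0, len(ordered), rows_per_statement):
            batch = ordered[start:start + rows_per_statement]
            if pending == 0:
                conn.execute("BEGIN")  # explicit transaction
            sql = (
                f"INSERT INTO insert_table ({', '.join(COLUMNS)}) VALUES "
                + ", ".join([one_row] * len(batch))  # many rows, one statement
            )
            conn.execute(sql, [value for row in batch for value in row])
            pending += 1
            if pending == statements_per_commit:  # bound the transaction size
                conn.execute("COMMIT")
                commits, pending = commits + 1, 0
        if pending:
            conn.execute("COMMIT")
            commits += 1
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
    return commits


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE insert_table "
    "(datetime INTEGER PRIMARY KEY, uid TEXT, content TEXT, type INTEGER)"
)
rows = [(i, f"userid_{i}", f"content_{i}", i % 3) for i in range(2500)]
random.Random(0).shuffle(rows)  # simulate unordered input
print(bulk_load(conn, rows))  # 3
print(conn.execute("SELECT COUNT(*) FROM insert_table").fetchone()[0])  # 2500
```

Here 2,500 rows become 13 statements, committed after the 5th and 10th statements and once more at the end. If a later transaction fails, the earlier ones stay committed, so a real loader would need to make retries safe to repeat.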

Precautions:
1. SQL statements have a length limit, so when rows are merged into one statement, that statement must not exceed the limit. The limit is controlled by the max_allowed_packet configuration option, which defaults to 1M (newer MySQL releases use larger defaults); for these tests it was raised to 8M.
2. Transaction size must be kept under control, because an overly large transaction reduces execution efficiency. MySQL has an innodb_log_buffer_size configuration option; once a transaction's data exceeds this value, InnoDB writes the data to disk and efficiency drops. It is therefore a good idea to commit the transaction before its data reaches this size.
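For the first precaution, a loader can cap the byte size of each merged statement instead of merging a fixed number of rows. The Python sketch below greedily packs pre-rendered row tuples into INSERT statements whose UTF-8 length stays within a byte budget. The function name `split_by_packet_size` is hypothetical, the row literals are assumed to be already escaped (for example by the driver), and the budget should be set somewhat below the server's max_allowed_packet to leave room for protocol overhead.

```python
from typing import Iterator, List, Sequence


def split_by_packet_size(
    header: str, row_literals: Sequence[str], max_bytes: int
) -> Iterator[str]:
    """Yield header + "(...), (...)" statements of at most max_bytes bytes.

    header: e.g. "INSERT INTO `insert_table` (...) VALUES "
    row_literals: already-escaped "(...)" tuples, one per row.
    """
    header_len = len(header.encode("utf-8"))
    current: List[str] = []
    size = header_len  # encoded length of the statement being built
    for literal in row_literals:
        literal_len = len(literal.encode("utf-8"))
        if header_len + literal_len > max_bytes:
            raise ValueError("a single row does not fit within max_bytes")
        separator = 2 if current else 0  # len(", ") between tuples
        if size + separator + literal_len > max_bytes:
            yield header + ", ".join(current)  # flush the full statement
            current, size, separator = [], header_len, 0
        current.append(literal)
        size += separator + literal_len
    if current:
        yield header + ", ".join(current)


statements = list(split_by_packet_size("INSERT INTO t VALUES ", ["(1)", "(2)", "(3)"], 29))
print(statements)
# ['INSERT INTO t VALUES (1), (2)', 'INSERT INTO t VALUES (3)']
```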

Reposted from: http://tech.uc.cn/?p=634