MySQL Bulk SQL Insert performance optimizations


For systems that handle large volumes of data, the problem with the database is not only slow queries but also long data loading times. In a reporting system, for example, the daily data import can take several hours. It therefore makes sense to optimize database insert performance.

Many experienced engineers have shared techniques online for improving INSERT efficiency; a few of them are listed below.
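All of the examples use a table named insert_table, whose definition the original article never shows. The sketch below is only a guess made for illustration: the column types, and the choice of datetime as the primary key (inferred from the ordered-insert example in section 3), are assumptions.

    -- Hypothetical schema for the example table; types and key are assumptions.
    CREATE TABLE `insert_table` (
        `datetime` VARCHAR(20)  NOT NULL,   -- holds the key values '0', '1', '2', ...
        `uid`      VARCHAR(32)  NOT NULL,
        `content`  VARCHAR(255) NOT NULL,
        `type`     INT          NOT NULL,
        PRIMARY KEY (`datetime`)
    ) ENGINE=InnoDB;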

1. Insert multiple rows with one SQL statement.
Most INSERT statements we write insert a single row, such as:

    INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
        VALUES ('0', 'userid_0', 'content_0', 0);
    INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
        VALUES ('1', 'userid_1', 'content_1', 1);

Now let's modify it to:

    INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
        VALUES ('0', 'userid_0', 'content_0', 0),
               ('1', 'userid_1', 'content_1', 1);

"Data comparison"

Below are the test figures those engineers provided, comparing importing the data one row at a time with importing it as a single merged SQL statement, for 100, 1,000, and 10,000 records respectively. The comparison shows that the merged INSERT improves the program's insert efficiency.


"Cause analysis"
The second form of SQL is more efficient mainly because it reduces the volume of logging (the MySQL binlog and the InnoDB transaction log), which lowers both the amount of log data and the frequency of log flushes to disk. Merging statements also reduces the number of times SQL has to be parsed and cuts down on network round trips.
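One rough way to see this effect on your own server is to compare MySQL's statement counter with its row counter before and after a load. This is only an illustration; the status variable names are standard MySQL, but the numbers depend on everything else running on the server.

    -- Com_insert counts executed INSERT statements, while Innodb_rows_inserted
    -- counts rows: one merged statement raises Com_insert by 1 but raises
    -- Innodb_rows_inserted by the number of rows in its VALUES list.
    SHOW GLOBAL STATUS LIKE 'Com_insert';
    SHOW GLOBAL STATUS LIKE 'Innodb_rows_inserted';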

2. Perform the inserts inside a transaction.
Transactions are also very common when manipulating data. Now let's change the INSERT statements above to:

    START TRANSACTION;
    INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
        VALUES ('0', 'userid_0', 'content_0', 0);
    INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
        VALUES ('1', 'userid_1', 'content_1', 1);
    ...
    COMMIT;

" data comparison "

Test comparisons are again provided, with and without an explicit transaction, for 100, 1,000, and 10,000 records. The results again show improved insert efficiency.

"Cause analysis"

This is because, with autocommit enabled, MySQL implicitly wraps every single INSERT in its own transaction. By using an explicit transaction you pay the cost of starting and committing a transaction only once, and all the inserts are executed before the single commit.
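The same effect can also be achieved by turning off autocommit for the session instead of writing START TRANSACTION explicitly. A minimal sketch, reusing the example table from above:

    -- With autocommit off, every statement joins the current transaction
    -- until an explicit COMMIT, so many INSERTs share a single commit.
    SET autocommit = 0;
    INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
        VALUES ('0', 'userid_0', 'content_0', 0);
    INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
        VALUES ('1', 'userid_1', 'content_1', 1);
    -- ... more INSERTs ...
    COMMIT;
    SET autocommit = 1;   -- restore the default for the session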


3. Insert the data in primary-key order.

Ordered insertion means that the inserted records are arranged in order of the primary key. For example, suppose the primary key values of the records to be inserted are 0, 1, and 2.

The original, unordered INSERT statements are as follows:

    INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
        VALUES ('1', 'userid_1', 'content_1', 1);
    INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
        VALUES ('0', 'userid_0', 'content_0', 0);
    INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
        VALUES ('2', 'userid_2', 'content_2', 2);

Modified to insert in key order:

    INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
        VALUES ('0', 'userid_0', 'content_0', 0);
    INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
        VALUES ('1', 'userid_1', 'content_1', 1);
    INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
        VALUES ('2', 'userid_2', 'content_2', 2);

"Data comparison"

The following compares the performance of inserting random data versus ordered data, for 100, 1,000, 10,000, 100,000, and 1,000,000 records. The test results show that this optimization does improve performance, but the improvement is not dramatic.


"Cause analysis"
The database has to maintain index data as rows are inserted, and unordered records increase the cost of that maintenance. InnoDB uses a B+tree index: if each new record lands at the end of the index, positioning is very efficient and the index stays compact; if a record lands in the middle of the index, B+tree pages may need to be split and merged, which consumes more compute resources, and index positioning for subsequent inserts becomes less efficient. Once the data volume is large, this also causes frequent disk operations.
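When the rows to be loaded already sit in another table (for example a staging table), one way to get ordered insertion is to sort by the target's primary key while copying. This is only a sketch; staging_table is a hypothetical table introduced here for illustration.

    -- Copy rows into the target in primary-key order so that new entries
    -- are appended at the right-hand edge of the B+tree index.
    INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
    SELECT `datetime`, `uid`, `content`, `type`
    FROM `staging_table`
    ORDER BY `datetime`;   -- assumes `datetime` holds the primary key, as above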


Comprehensive performance testing:
Here is a test of INSERT efficiency that applies the three optimizations above in combination.


The test results show that merged data + transaction gives a clear performance improvement when the data volume is small, but once the volume grows large (beyond about 10 million rows) performance drops sharply. At that point the data no longer fits in the InnoDB buffer pool, so every index lookup involves extra disk reads and writes, and performance falls off quickly. Merged data + transaction + ordered data, on the other hand, still performs well at tens of millions of rows: with ordered data, index positioning stays cheap and does not require frequent disk access, so high performance is maintained.
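Putting the three techniques together, a bulk load might look like the sketch below: one transaction wrapping multi-row INSERTs whose rows have been pre-sorted by primary key. The values are placeholders following the earlier examples.

    -- Combined approach: transaction + multi-row INSERT + rows in key order.
    START TRANSACTION;
    INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
        VALUES ('0', 'userid_0', 'content_0', 0),
               ('1', 'userid_1', 'content_1', 1),
               ('2', 'userid_2', 'content_2', 2);
    -- ... further multi-row batches, continuing in key order ...
    COMMIT;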

Precautions:

        • SQL statements have a length limit; when merging data into a single statement, it must stay within this limit. The limit can be changed through the max_allowed_packet setting, which defaults to 1M and was raised to 8M for these tests (see the sketch after this list for checking and changing it).
        • Transactions need to be kept to a reasonable size; a transaction that is too large hurts execution efficiency. MySQL has an innodb_log_buffer_size setting, and once this value is exceeded InnoDB flushes the data to disk, at which point efficiency drops. So it is a good idea to commit the transaction before the data reaches that size.
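A minimal sketch of inspecting and adjusting these settings (standard MySQL statements; changing max_allowed_packet globally requires the appropriate privilege, and innodb_log_buffer_size is normally set in the server configuration file rather than at runtime):

    -- Check the current limits.
    SHOW VARIABLES LIKE 'max_allowed_packet';
    SHOW VARIABLES LIKE 'innodb_log_buffer_size';

    -- Raise the packet limit to 8M, as in the tests above.
    -- Existing connections keep the old value; new connections pick up the new one.
    SET GLOBAL max_allowed_packet = 8 * 1024 * 1024;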
