MySQL Bulk SQL Insert performance optimizations
Posted by Kiki Titanium on December 7, 2012
For systems with large data volumes, the problem is not only inefficient queries but also long data-import times. In a reporting system, for example, the daily import can take several hours, sometimes more than ten. Optimizing insert performance is therefore worthwhile. After performance testing on MySQL InnoDB, we found several ways to improve insert efficiency, offered here for reference.
1. Insert multiple rows with one SQL statement. A typical sequence of INSERT statements looks like this:
```sql
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES ('0', 'userid_0', 'content_0', 0);
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES ('1', 'userid_1', 'content_1', 1);
```
Modified to:
```sql
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`)
VALUES ('0', 'userid_0', 'content_0', 0), ('1', 'userid_1', 'content_1', 1);
```
The modified insert noticeably improves throughput. The main reason the second form is more efficient is that it reduces the combined log volume (the MySQL binlog and the InnoDB transaction log), lowering both the amount of log data and the frequency of log flushes to disk. Merging SQL statements also reduces the number of statement parses and the number of network round trips. Test comparisons were made between importing one row per statement and importing merged statements, for 100, 1,000, and 10,000 records.
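As a client-side illustration, merging rows means building one multi-row statement from a list of tuples. Below is a minimal Python sketch using the article's example table and columns; the helper name is my own invention, and real code should prefer parameterized queries over string building:

```python
def build_multirow_insert(table, columns, rows):
    """Merge many rows into one INSERT statement. Values are assumed
    pre-escaped; this is a sketch, not an injection-safe builder."""
    cols = ", ".join("`%s`" % c for c in columns)
    tuples = ", ".join(
        "(" + ", ".join(repr(v) for v in row) + ")" for row in rows
    )
    return "INSERT INTO `%s` (%s) VALUES %s;" % (table, cols, tuples)

# One statement carrying both rows from the article's example.
sql = build_multirow_insert(
    "insert_table",
    ["datetime", "uid", "content", "type"],
    [("0", "userid_0", "content_0", 0), ("1", "userid_1", "content_1", 1)],
)
```

The single merged statement is then sent once, instead of one round trip per row.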
2. Perform inserts inside a transaction. Modify the INSERTs to:
```sql
START TRANSACTION;
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES ('0', 'userid_0', 'content_0', 0);
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES ('1', 'userid_1', 'content_1', 1);
...
COMMIT;
```
Using a transaction improves insert efficiency because MySQL internally creates a transaction for every standalone insert in order to perform the actual insert work. With an explicit transaction, that transaction-creation cost is paid once: all inserts execute first and are committed together. Test comparisons were again made with and without transactions for 100, 1,000, and 10,000 records.
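The pattern translates directly to client code: begin once, insert many times, commit once. Here is a self-contained sketch using Python's sqlite3 as a stand-in for MySQL so it runs anywhere (with MySQL you would apply the same pattern through a connector such as PyMySQL; the table mirrors the article's example):

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode, so we control
# the transaction boundaries explicitly, as the article recommends.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE insert_table (datetime TEXT, uid TEXT, content TEXT, type INTEGER)"
)

rows = [(str(i), "userid_%d" % i, "content_%d" % i, i) for i in range(1000)]

conn.execute("BEGIN")  # one explicit transaction for the whole batch
conn.executemany(
    "INSERT INTO insert_table (datetime, uid, content, type) VALUES (?, ?, ?, ?)",
    rows,
)
conn.execute("COMMIT")  # pay the commit cost once, not once per row

count = conn.execute("SELECT COUNT(*) FROM insert_table").fetchone()[0]
```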
3. Insert data in order. Ordered insertion means the inserted records are sorted by primary key; in the example below, assume `datetime` is the primary key of the records:
```sql
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES ('1', 'userid_1', 'content_1', 1);
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES ('0', 'userid_0', 'content_0', 0);
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES ('2', 'userid_2', 'content_2', 2);
```
Modified to:
```sql
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES ('0', 'userid_0', 'content_0', 0);
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES ('1', 'userid_1', 'content_1', 1);
INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES ('2', 'userid_2', 'content_2', 2);
```
Because the database must maintain index data on every insert, unordered records increase the cost of index maintenance. Consider InnoDB's B+tree index: if each inserted record lands at the end of the index, positioning is very efficient and the index stays compact; if a record lands in the middle, the B+tree may need page splits and merges, which consumes more compute resources, makes index positioning less efficient, and at large data volumes causes frequent disk operations. Below is a performance comparison of random versus sequential data, for 100, 1,000, 10,000, 100,000, and 1 million records.
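On the client side, this optimization is simply a sort by primary key before generating the INSERTs. A small sketch, assuming (as in the article's example) that `datetime` is the primary key, stored here as the first tuple element:

```python
import random

rows = [(str(i), "userid_%d" % i, "content_%d" % i, i % 2) for i in range(10)]
random.shuffle(rows)  # simulate records arriving in arbitrary order

# Sort by the primary-key column so the B+tree receives appends at
# its right edge instead of mid-tree page splits.
ordered = sorted(rows, key=lambda r: r[0])
```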
The test results show that this optimization does improve performance, though the improvement on its own is not dramatic.
Combined performance test: here all three methods above are applied together to optimize insert efficiency.
The test results show that merging data + transactions improves performance very noticeably at small data volumes, but performance drops sharply once the data grows large (beyond 10 million rows), because the data then exceeds the capacity of the InnoDB buffer pool and each index lookup involves more disk reads and writes. Merging data + transactions + ordered data, by contrast, still performs well at tens of millions of rows: with ordered data, index positioning stays cheap and does not require frequent disk access, so high performance is maintained.
Precautions:
1. SQL statements have a length limit; when merging data, the same SQL statement must not exceed it. The limit is controlled by the max_allowed_packet setting, which defaults to 1 MB and was raised to 8 MB for these tests.
2. Transactions need to be kept to a reasonable size; an oversized transaction hurts execution efficiency. MySQL has an innodb_log_buffer_size setting, and once a transaction exceeds it InnoDB flushes the data to disk, reducing efficiency. It is therefore best to commit the transaction before the pending data reaches that value.
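Both precautions point to the same remedy: chunk the rows so each merged statement stays under max_allowed_packet and each transaction stays under innodb_log_buffer_size. A hedged sketch follows; the row-count cap is an illustrative knob of my own, not a value from the article, and the right cap depends on your row size:

```python
def chunk_rows(rows, max_rows=1000):
    """Yield batches of at most max_rows rows; each batch would become
    one merged INSERT executed inside one transaction."""
    for i in range(0, len(rows), max_rows):
        yield rows[i:i + max_rows]

# 2500 rows split into batches of up to 1000.
batches = list(chunk_rows(list(range(2500)), max_rows=1000))
```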
Test code
MySQL Bulk Insert Data Method (posted 2011-12-08 11:25:42)
There are two ways to insert multiple rows: write multiple INSERT statements separated by ";", or put multiple value tuples in a single INSERT.

1. Multiple statements separated by ";". This is straightforward: the semicolon is MySQL's statement terminator, so writing several INSERTs makes MySQL execute several statements. For example:

```sql
INSERT INTO table (field1, field2, field3) VALUES ('a', 'b', 'c');
INSERT INTO table (field1, field2, field3) VALUES ('a', 'b', 'c');
INSERT INTO table (field1, field2, field3) VALUES ('a', 'b', 'c');
```

2. Multiple value tuples in one INSERT:

```sql
INSERT INTO table (field1, field2, field3) VALUES ('a', 'b', 'c'), ('a', 'b', 'c'), ('a', 'b', 'c');
```