MySQL Development Performance Research: Batch Data Insertion Optimization Methods
I. What problems have we encountered?
In standard SQL, we usually write an insert statement like the following:
INSERT INTO TBL_TEST (id) VALUES(1);
Obviously, this method also works in MySQL. However, when we need to insert data in batches, such statements can cause performance problems. For example, inserting 100,000 records requires 100,000 INSERT statements, and each statement must be submitted to the relational engine for parsing and optimization before it reaches the storage engine for the actual insertion.
Because of this performance bottleneck, the official MySQL documentation also recommends batch INSERT, that is, inserting multiple rows of values in a single INSERT statement:
INSERT INTO TBL_TEST (id) VALUES (1), (2), (3)
This practice does accelerate batch inserts, which is not hard to understand: fewer INSERT statements are submitted to the server, so the network load is lower, and although the parsing and optimization time per statement increases slightly, each statement now carries many more useful data rows, so the amortized cost per row drops. The overall performance therefore improves; according to some reports on the Internet, this method can speed up inserts by dozens of times.
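To make the multi-value form concrete, here is a minimal C++ sketch of a helper that assembles such a statement from a list of ids. The helper name `build_multi_value_insert` is hypothetical, and the sketch assumes purely numeric values, so no quoting or escaping is needed; real string data would have to be escaped first (for example with mysql_real_escape_string).

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: build one multi-value INSERT statement from a list of
// numeric ids. Assumes numeric values only, so no escaping/quoting is done.
std::string build_multi_value_insert(const std::string &table,
                                     const std::string &column,
                                     const std::vector<int> &ids) {
    std::ostringstream sql;
    sql << "INSERT INTO " << table << " (" << column << ") VALUES ";
    for (std::size_t i = 0; i < ids.size(); ++i) {
        if (i > 0) sql << ", ";            // separate value tuples with commas
        sql << "(" << ids[i] << ")";
    }
    return sql.str();
}
```

For the three-row example above, `build_multi_value_insert("TBL_TEST", "id", {1, 2, 3})` produces exactly the statement shown.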
However, I have also seen several other methods mentioned on the Internet, such as prepared SQL statements and batch commits. How do these methods perform? This article compares them.
II. Comparison Environment and Methods
My environment is modest: a sluggish virtual machine with only 2 cores and 6 GB of memory. The operating system is SUSE Linux, and the MySQL version is 5.6.15.
Given this machine's limited performance, my absolute TPS figures are very low, so the raw numbers below are not meaningful in themselves; what matters is the trend, which reflects the performance behavior of the whole insert process.
Due to our business characteristics, the table we use is very wide, with 195 fields in total. Fully written (all fields filled, including the varchar columns), a record is slightly under 4 KB; typically a record is about 3 KB.
Based on our practical experience, we are certain that grouping many INSERT statements into one transaction greatly improves performance. Therefore, all the tests below commit once for every 5,000 records inserted.
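The commit-every-5,000-records control flow can be sketched as follows. This is a minimal illustration, not the article's benchmark code: the actual INSERT and COMMIT actions are passed in as callbacks so the batching logic can be shown (and tested) without a live connection; in real code `do_insert` would call mysql_query() or mysql_stmt_execute(), and `do_commit` would call mysql_commit() with autocommit disabled.

```cpp
#include <cstddef>
#include <functional>

// Insert `total` records, committing once per `batch` records (batch = 5000
// in the tests below). Callbacks stand in for the real MySQL C API calls.
// Returns the number of commits issued.
std::size_t insert_in_batches(std::size_t total, std::size_t batch,
                              const std::function<void(std::size_t)> &do_insert,
                              const std::function<void()> &do_commit) {
    std::size_t pending = 0, commits = 0;
    for (std::size_t i = 0; i < total; ++i) {
        do_insert(i);                       // e.g. mysql_stmt_execute(stmt)
        if (++pending == batch) {           // commit every `batch` rows
            do_commit();                    // e.g. mysql_commit(&conn)
            ++commits;
            pending = 0;
        }
    }
    if (pending > 0) {                      // flush any remaining tail
        do_commit();
        ++commits;
    }
    return commits;
}
```

With 100,000 records and a batch of 5,000, this issues exactly 20 commits instead of 100,000 implicit autocommits.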
Note that all the tests below are performed through the MySQL C API and use the InnoDB storage engine.
III. Comparison Methods
Ideal Type Test (1) -- Method Comparison
Objective: find the most appropriate insert mechanism under ideal conditions.
Key Methods:
1. Each connection/thread inserts in primary key order
2. Compare different insertion methods
3. Compare the impact of different numbers of connections/threads on inserts
* "Normal method" means each INSERT statement carries only one VALUES row.
* "Prepared SQL" means using the prepared-statement MySQL C API.
* "Multi-value SQL (10 records)" means each INSERT statement inserts 10 records. Why 10? Subsequent tests show this gives the highest performance.
Conclusion: From the trends of the three methods, the multi-value SQL (10 records) method is clearly the most efficient.
Ideal Test (2) -- Comparison of Rows per Multi-Value SQL Statement
Clearly, as the data volume grows, inserting 10 records per INSERT statement is the most efficient.
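In practice this means chunking the full data set into multi-value statements of 10 rows each. A minimal sketch, with the hypothetical helper `chunk_into_inserts` and the assumption of numeric ids (so no escaping is needed):

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a list of ids into multi-value INSERT statements
// of `rows_per_stmt` value tuples each (10 per the tests above).
std::vector<std::string> chunk_into_inserts(const std::vector<int> &ids,
                                            std::size_t rows_per_stmt) {
    std::vector<std::string> stmts;
    for (std::size_t i = 0; i < ids.size(); i += rows_per_stmt) {
        std::ostringstream sql;
        sql << "INSERT INTO TBL_TEST (id) VALUES ";
        for (std::size_t j = i; j < ids.size() && j < i + rows_per_stmt; ++j) {
            if (j > i) sql << ", ";
            sql << "(" << ids[j] << ")";
        }
        sql << ";";
        stmts.push_back(sql.str());         // one 10-row statement per chunk
    }
    return stmts;
}
```

Note that the last chunk may carry fewer than 10 rows; that is harmless, since any leftover rows still go out in a single final statement.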
Ideal Test (3) -- Comparison of Connection Counts
Conclusion: Performance is highest when the number of connections is twice the number of CPU cores.
General test -- Test Based on our business volume
Objective: is the best insertion mechanism still the best under normal business conditions?
Key Methods:
1. Simulate production data (each record about 3 KB)
2. Each thread inserts primary keys out of order
Clearly, when primary keys are inserted out of order, performance drops steeply. This is consistent with InnoDB's internal implementation, which clusters rows by primary key. Even so, the multi-value SQL statement (10 records) remains the best.
Stress Testing
Objective: is the best insertion mechanism still suitable under extreme load?
Key Methods:
1. Fill every field in each data row (each record about 4 KB)
2. Each thread inserts primary keys out of order
The results follow the same pattern as before, with even steeper performance degradation. This also confirms that as records grow larger (a record may already exceed the capacity of one page, since slot and page-header information occupies space too), page splits and similar phenomena occur and performance declines.
IV. Conclusion
Based on the tests above and our understanding of InnoDB, we can draw the following conclusions:
• Use an ordered primary key policy (for example, an auto-increment primary key, or adjust the business logic so records are inserted in primary key order as much as possible).
• Multi-value INSERT statements with 10 rows each are the most suitable.
• Keeping the number of processes/threads at twice the number of CPU cores is appropriate.
V. Appendix
I found few complete examples of MySQL prepared-statement code on the Internet, so here is a simple one.
-- Create table statement
CREATE TABLE tbl_test (
    pri_key VARCHAR(30),
    nor_char CHAR(30),
    max_num DECIMAL(8, 0),
    long_num DECIMAL(12, 0),
    rec_upd_ts TIMESTAMP
);
C code
#include <string.h>
#include <iostream>
#include <mysql.h>
#include <sys/time.h>
#include <sstream>
#include <vector>

using namespace std;

#define STRING_LEN 30

char pri_key[STRING_LEN] = "123456";
char nor_char[STRING_LEN] = "abcabc";

// Compute end - begin into result; returns false if end precedes begin.
bool SubTimeval(timeval &result, timeval &begin, timeval &end)
{
    if (begin.tv_sec > end.tv_sec)
        return false;
    if ((begin.tv_sec == end.tv_sec) && (begin.tv_usec > end.tv_usec))
        return false;
    result.tv_sec  = end.tv_sec  - begin.tv_sec;
    result.tv_usec = end.tv_usec - begin.tv_usec;
    if (result.tv_usec < 0) {
        result.tv_sec--;
        result.tv_usec += 1000000;
    }
    return true;
}

int main(int argc, char **argv)
{
    const char *precompile_statment2 =
        "INSERT INTO `tbl_test`(pri_key, nor_char, max_num, long_num, rec_upd_ts) "
        "VALUES(?, ?, ?, ?, ?)";

    MYSQL conn;
    mysql_init(&conn);
    if (mysql_real_connect(&conn, "127.0.0.1", "dba", "abcdefg", "TESTDB",
                           3306, NULL, 0) == NULL) {
        fprintf(stderr, "mysql_real_connect failed\n");
        exit(1);
    }

    MYSQL_STMT *stmt = mysql_stmt_init(&conn);
    if (!stmt) {
        fprintf(stderr, "mysql_stmt_init failed\n");
        exit(1);
    }
    if (mysql_stmt_prepare(stmt, precompile_statment2,
                           strlen(precompile_statment2))) {
        fprintf(stderr, "mysql_stmt_prepare failed: %s\n", mysql_stmt_error(stmt));
        exit(1);
    }

    int i = 0;
    int max_num = 3;
    const int FIELD_NUM = 5;
    while (i < max_num) {
        MYSQL_BIND bind[FIELD_NUM];
        memset(bind, 0, FIELD_NUM * sizeof(MYSQL_BIND));

        // Parameter 1: pri_key (string)
        unsigned long str_length = strlen(pri_key);
        bind[0].buffer_type = MYSQL_TYPE_STRING;
        bind[0].buffer = (char *)pri_key;
        bind[0].buffer_length = STRING_LEN;
        bind[0].is_null = 0;
        bind[0].length = &str_length;

        // Parameter 2: nor_char (string)
        unsigned long str_length_nor = strlen(nor_char);
        bind[1].buffer_type = MYSQL_TYPE_STRING;
        bind[1].buffer = (char *)nor_char;
        bind[1].buffer_length = STRING_LEN;
        bind[1].is_null = 0;
        bind[1].length = &str_length_nor;

        // Parameters 3 and 4: numeric columns (both bound to max_num here,
        // purely to keep the example short)
        bind[2].buffer_type = MYSQL_TYPE_LONG;
        bind[2].buffer = (char *)&max_num;
        bind[2].is_null = 0;
        bind[2].length = 0;

        bind[3].buffer_type = MYSQL_TYPE_LONG;
        bind[3].buffer = (char *)&max_num;
        bind[3].is_null = 0;
        bind[3].length = 0;

        // Parameter 5: rec_upd_ts (timestamp)
        MYSQL_TIME ts;
        memset(&ts, 0, sizeof(ts));
        ts.year = 2002; ts.month = 2;  ts.day = 3;
        ts.hour = 10;   ts.minute = 45; ts.second = 20;
        bind[4].buffer_type = MYSQL_TYPE_TIMESTAMP;
        bind[4].buffer = (char *)&ts;
        bind[4].is_null = 0;
        bind[4].length = 0;

        if (mysql_stmt_bind_param(stmt, bind)) {
            fprintf(stderr, "mysql_stmt_bind_param failed: %s\n",
                    mysql_stmt_error(stmt));
            exit(1);
        }

        cout << "before execute\n";
        if (mysql_stmt_execute(stmt)) {
            fprintf(stderr, "mysql_stmt_execute failed: %s\n",
                    mysql_stmt_error(stmt));
            exit(1);
        }
        cout << "after execute\n";
        i++;
    }

    mysql_commit(&conn);
    mysql_stmt_close(stmt);
    mysql_close(&conn);
    return 0;
}
The above covers the optimization methods for batch data insertion in MySQL.