A very simple and easy-to-use approach has recently been used to solve the problem of an insert throughput on the business, which is summarized here.
First we make it clear that the insert throughput does not actually refer to the IPs (insert per second), but rather the RPS (effect rows per second).
Next we say batch insert, in fact, as the name implies, is bulk INSERT. This optimization idea is very basic, the most famous application in MySQL is group commit.
Simply put SQL A into SQL B
Insert into Table values ($valuesinsertintotablevalues ($ Values), ($values) ... ($values);
Let's take a look at what this unusual, simple change will look like.
Test environment confessed: Single ID table structure, 10w int values, local use socket connection MySQL server, using shell single process test.
First, let's take a look at the time required to insert 10w int values into the test table using SQL A, which takes 1777 seconds.
Real 29m37.090suser 9m11.705ssys 5m0.762s
Then, let's look at the time it takes to insert 10w int values into the test table using SQL B (insert in ten values each time), taking 53 seconds
Real 0m53.871suser 0m19.455ssys 0m6.285s
This is nearly 33 times times the time of Ascension. This part of the performance improvement is due to the following points:
1, every time and the MySQL server to establish a connection through a variety of initialization, authorization, syntax parsing and so on a number of steps, need to consume a certain amount of resources.
2, update a values and update n values time is basically consistent. (A comparison of the insert single values core insert values is time consuming)
Single values:
+------------------------------+----------+|Status|Duration|+------------------------------+----------+|Starting| 0.000056 ||CheckingPermissions | 0.000010 ||Opening tables| 0.000034 ||System Lock| 0.000010 ||Init| 0.000011 || Update | 0.000061 ||Waiting forQuery Cache Lock| 0.000003 || Update | 0.000015 || End | 0.000003 ||QueryEnd | 0.000053 ||Closing tables| 0.000009 ||Freeing items| 0.000021 ||Logging Slow Query| 0.000002 ||Cleaning up| 0.000003 |+------------------------------+----------+
Ten values:+------------------------------+----------+|Status|Duration|+------------------------------+----------+|Starting| 0.000061 ||CheckingPermissions | 0.000008 ||Opening tables| 0.000027 ||System Lock| 0.000008 ||Init| 0.000012 || Update | 0.000073 ||Waiting forQuery Cache Lock| 0.000003 || Update | 0.000010 || End | 0.000008 ||QueryEnd | 0.000053 ||Closing tables| 0.000010 ||Freeing items| 0.000021 ||Logging Slow Query| 0.000002 ||Cleaning up| 0.000003 |+------------------------------+----------+
But is it more efficient to accumulate more values? The answer is naturally negative, and any optimization scheme will not be purely linear and will definitely have a inflection point under some conditions.
We tested according to different values, 1, 10, 50, 100, 200, 500, 1000, 5000, 10000, respectively.
As we can see, with the increase of the values number, the time-consuming is drastically reduced, from 1777s to 53s, and then the addition of the values number does not change much until the values number exceeds 200 and the last 10,000 values Number takes up to 2 minutes.
From what we can see, as the values numbers, the QPS (Blue Line) first soared, then dropped, and eventually less than 1/s. The RPS (green Line) jumps to a high level as the increase increases, and then decreases gradually with more than 5,000 values.
Also, most crucially, the highestpeak of the QPS and RPS is not under the same values number, which means that the highest level of QPS does not represent the highest throughput of the insert .
In my simple test scenario, the most appropriate value for the values is 50, compared to the single values, which takes less time to 97%and 36 times times the insert throughput.
This value is related to the table structure and the type and size of the field. It needs to be tested according to different scenarios, but generally speaking, 50-100 is a more recommended consideration.
As for how this is done, as long as the front-end is written to join the queue can be combined according to 2 conditions
- After accumulating to n values number in the queue to write to the database, the advantage is the highest performance, the disadvantage is that time is not controllable, it is possible to wait for Nth to n seconds, when the business is not accepted.
- Queue in the accumulation of 1s, how many write how many, the advantage is time controllable, the disadvantage is that the number of values is not possible, high concurrency, perhaps 1s has accumulated thousands of values.
- The best solution is actually 2 conditions at the same time, that is, the number of validation, but also to perform time validation, regardless of the conditions to trigger subsequent write database operations.
Summarize:
1. Use batch insert to increase the throughput of the insert.
2, the superimposed values number needs to be tested according to the actual situation.
3. Use both number and time to control the valve value.
Record values with simple tests:
Valuesnum |
Time |
QPS |
Rows |
1 |
1777 |
56 |
56 |
10 |
53 |
188 |
1886 |
| |
49 |
40 |
2040 |
| |
50 |
19 |
2000 |
| |
51 |
10 |
1960 |
| |
57 |
3< /span> |
|
| 1000 |
|
2 |
1666 |
| 5000 |
|
|
|
10000 |
133 |
0.07 |
751 |