Using batch insert to troubleshoot insert throughput issues for MySQL

Source: Internet
Author: User

A very simple and easy-to-use approach has recently been used to solve the problem of an insert throughput on the business, which is summarized here.

First we make it clear that the insert throughput does not actually refer to the IPs (insert per second), but rather the RPS (effect rows per second).

Next we say batch insert, in fact, as the name implies, is bulk INSERT. This optimization idea is very basic, the most famous application in MySQL is group commit.

Simply put SQL A into SQL B

Insert  into Table values ($valuesinsertintotablevalues ($  Values), ($values) ... ($values);

Let's take a look at what this unusual, simple change will look like.

Test environment confessed: Single ID table structure, 10w int values, local use socket connection MySQL server, using shell single process test.

First, let's take a look at the time required to insert 10w int values into the test table using SQL A, which takes 1777 seconds.

Real     29m37.090suser    9m11.705ssys     5m0.762s

Then, let's look at the time it takes to insert 10w int values into the test table using SQL B (insert in ten values each time), taking 53 seconds

Real     0m53.871suser    0m19.455ssys     0m6.285s

This is nearly 33 times times the time of Ascension. This part of the performance improvement is due to the following points:

1, every time and the MySQL server to establish a connection through a variety of initialization, authorization, syntax parsing and so on a number of steps, need to consume a certain amount of resources.

2, update a values and update n values time is basically consistent. (A comparison of the insert single values core insert values is time consuming)

Single values:
+------------------------------+----------+|Status|Duration|+------------------------------+----------+|Starting| 0.000056 ||CheckingPermissions | 0.000010 ||Opening tables| 0.000034 ||System Lock| 0.000010 ||Init| 0.000011 || Update | 0.000061 ||Waiting forQuery Cache Lock| 0.000003 || Update | 0.000015 || End | 0.000003 ||QueryEnd | 0.000053 ||Closing tables| 0.000009 ||Freeing items| 0.000021 ||Logging Slow Query| 0.000002 ||Cleaning up| 0.000003 |+------------------------------+----------+
Ten values:+------------------------------+----------+|Status|Duration|+------------------------------+----------+|Starting| 0.000061 ||CheckingPermissions | 0.000008 ||Opening tables| 0.000027 ||System Lock| 0.000008 ||Init| 0.000012 || Update | 0.000073 ||Waiting forQuery Cache Lock| 0.000003 || Update | 0.000010 || End | 0.000008 ||QueryEnd | 0.000053 ||Closing tables| 0.000010 ||Freeing items| 0.000021 ||Logging Slow Query| 0.000002 ||Cleaning up| 0.000003 |+------------------------------+----------+

But is it more efficient to accumulate more values? The answer is naturally negative, and any optimization scheme will not be purely linear and will definitely have a inflection point under some conditions.

We tested according to different values, 1, 10, 50, 100, 200, 500, 1000, 5000, 10000, respectively.

As we can see, with the increase of the values number, the time-consuming is drastically reduced, from 1777s to 53s, and then the addition of the values number does not change much until the values number exceeds 200 and the last 10,000 values Number takes up to 2 minutes.

From what we can see, as the values numbers, the QPS (Blue Line) first soared, then dropped, and eventually less than 1/s. The RPS (green Line) jumps to a high level as the increase increases, and then decreases gradually with more than 5,000 values.

Also, most crucially, the highestpeak of the QPS and RPS is not under the same values number, which means that the highest level of QPS does not represent the highest throughput of the insert .

In my simple test scenario, the most appropriate value for the values is 50, compared to the single values, which takes less time to 97%and 36 times times the insert throughput.

This value is related to the table structure and the type and size of the field. It needs to be tested according to different scenarios, but generally speaking, 50-100 is a more recommended consideration.

As for how this is done, as long as the front-end is written to join the queue can be combined according to 2 conditions

    • After accumulating to n values number in the queue to write to the database, the advantage is the highest performance, the disadvantage is that time is not controllable, it is possible to wait for Nth to n seconds, when the business is not accepted.
    • Queue in the accumulation of 1s, how many write how many, the advantage is time controllable, the disadvantage is that the number of values is not possible, high concurrency, perhaps 1s has accumulated thousands of values.
    • The best solution is actually 2 conditions at the same time, that is, the number of validation, but also to perform time validation, regardless of the conditions to trigger subsequent write database operations.

Summarize:

1. Use batch insert to increase the throughput of the insert.

2, the superimposed values number needs to be tested according to the actual situation.

3. Use both number and time to control the valve value.

Record values with simple tests:

Valuesnum

Time

QPS

Rows

1

1777

56

56

10

53

188

1886

49

40

2040

50

19

2000

51

10

1960

57

3< /span>

1000

2

1666

5000

10000

133

0.07

751

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.