Use batch insert to solve MySQL insert throughput Problems

Source: Internet
Author: User

Recently I used a very simple and easy-to-use method to solve the insert throughput problem in the business. Here I will summarize it.

 

First, let's clarify that the insert throughput is not the IPS (insert per second), but the RPS (effect rows per second ).

Next, let's talk about batch insert. In fact, as the name suggests, it is batch insert. This optimization idea is very basic. The most famous MySQL application is group commit.

Simply put, SQL a is converted into SQL B.

SQL A : insert into table values ($values);SQL B : insert into table values ($values),($values)...($values);

 

Next, let's take a look at the changes that may be caused by such abnormal and simple changes.

Test Environment Description: Single-id table structure, 10 W int values, local use socket to connect to MySQL server, use shell single process test.

First, we will see the time required to insert 1777 int values into the test table using SQL A, which takes seconds.

real    29m37.090suser    9m11.705ssys     5m0.762s

Next, let's take 53 seconds to insert int values into the test table using SQL B (insert 10 values each time ).

real    0m53.871suser    0m19.455ssys     0m6.285s

This is nearly 33 times faster. The reasons for this performance improvement are as follows:

1. Each connection to the MySQL server requires various initialization, permission authentication, Syntax Parsing, and other steps, consuming certain resources.

2. Updating a values is basically the same as updating n values. (The following compares the time consumed by the profile of insert 10 values with single values cores)

Single values:
+ Duration + ---------- + | Status | Duration | + ------------------------------ + ---------- + | starting | 0.000056 | checking permissions | 0.000010 | Opening tables | 0.000034 | System lock | 0.000010 | init | 0.000011 | update | 0.000061 | Waiting for query cache lock | 0.000003 | update | 0.000015 | end | 0.000003 | query end | 0.000053 | closing tables | 0.000009 | freeing items | 0.000021 | logging slow query | 0.000002 | cleaning up | 0.000003 | + ---------------------------- + ---------- +
10 values: + Duration + ---------- + | Status | Duration | + ------------------------------ + ---------- + | starting | 0.000061 | checking permissions | 0.000008 | Opening tables | 0.000027 | System lock | 0.000008 | init | 0.000012 | update | 0.000073 | Waiting for query cache lock | 0.000003 | update | 0.000010 | end | 0.000008 | query end | 0.000053 | closing tables | 0.000010 | freeing items | 0.000021 | logging slow query | 0.000002 | cleaning up | 0.000003 | + ---------------------------- + ---------- +

 

However, is the more values accumulate, the higher the efficiency? The answer is naturally no. No optimization solution is purely linear, and there will certainly be inflection points under certain conditions.

We perform tests based on different values numbers: 1, 10, 50, 100, 200, 500, 1000, 5000, and 10000.

As we can see, with the increase of the values number, the time consumed first drops sharply, from 1777s to 53 s, and then the increase of the values number will not change much until the values number exceeds 200, the last 10000 values number took 2 minutes.

As we can see, with the increase of values numbers, QPS (Blue Line) increases first, then decreases, and eventually falls below 1/s. The RPS (Green Line) soars to a high level with the increase, and then gradually declines with the increase. After more than 5000 values number, it begins to drop sharply.

In addition, the most important thing is that,The highest QPS peak and the highest RPS peak are not in the same values number. That is to say, the highest QPS does not mean the highest insert throughput..

In my simple test scenario, the most suitable value for values number is 50. Compared with a single values, the time consumption is reduced.97%, Increases insert Throughput36 times.

This value is related to the table structure, field type, and size. It can be obtained only after testing based on different scenarios, but generally, 50-is a relatively recommended value.

 

As for how to implement this, as long as the frontend is added to the queue when writing data, it can be merged according to two conditions.

  • Data is written into the database after accumulating n values numbers in the queue. The advantage is that the performance is the highest, and the disadvantage is that the time is uncontrollable. It may take n seconds for the nth number, at this time, the service is no longer available.
  • After a queue is accumulated for 1 s, the number of values is written. The advantage is that the time is controllable. The disadvantage is that the number of values numbers is not possible. In the case of high concurrency, thousands of values may have been accumulated for 1 s.
  • The optimal solution is that two conditions take effect at the same time, that is, the number verification and time verification are performed. No matter which condition is met, subsequent database write operations are triggered.

 

Summary:

1. Using batch insert can increase the insert throughput.

2. Test the superimposed values number based on the actual situation.

3. Use both the number and time control valve values.

 

Record values for simple tests:

ValuesNum

Time

QPS

Rows

1

1777

56

56

10

53

188

1886

50

49

40

2040

100

50

19

2000

200

51

10

1960

500

57

3

1754

1000

60

2

1666

5000

69

0.3

1449

10000

133

0.07

751

 

 

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.