A case study: resolving deadlocks caused by the LOAD DATA statement in MySQL


This is a report on a deadlock that occurred in a production project. It briefly describes the cause, the fix, and the related background.

1. Background

The project is a data-analysis-style application whose data is imported into an InnoDB table entirely through LOAD DATA statements. To simplify the description, the table structure is reduced to the following:

CREATE TABLE TB (ID INT PRIMARY KEY AUTO_INCREMENT, C INT NOT NULL) ENGINE=InnoDB;

The statements that import the data are:

LOAD DATA INFILE 'data1.csv' INTO TABLE TB;

LOAD DATA INFILE 'data2.csv' INTO TABLE TB;

cat data1.csv

1

2

3


cat data2.csv

12 100


The evidence for the deadlock is the information in the LATEST DETECTED DEADLOCK section of SHOW ENGINE INNODB STATUS, simplified as follows:

Description

As the deadlock information shows, transaction 1 is waiting for a lock on a row. Transaction 2 holds the lock on that row but is waiting for the table's auto-increment lock (the AUTO-INC lock). InnoDB judges this to be a deadlock, and the transaction is rolled back.
What transaction 1 holds is not shown here, but it can be inferred that transaction 1 holds the table's AUTO-INC lock (otherwise this would not be a deadlock).
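
The inference above can be checked with a small wait-for graph. The following Python sketch is only an illustration: the names `T1`, `T2`, and `find_cycle` are hypothetical, and the code does not reproduce InnoDB's actual deadlock detector.

```python
def find_cycle(waits_for):
    """Return a list of transactions forming a wait cycle, or None.

    waits_for maps each waiting transaction to the transaction that
    holds the lock it wants (each transaction waits for at most one lock).
    """
    for start in waits_for:
        path = [start]
        current = waits_for.get(start)
        while current is not None and current not in path:
            path.append(current)
            current = waits_for.get(current)
        if current == start:
            return path
    return None


# T1 waits for the row lock held by T2; T2 waits for the AUTO-INC lock,
# which T1 is inferred to hold.
deadlock = find_cycle({"T1": "T2", "T2": "T1"})

# If T1 did not hold the AUTO-INC lock, T2 would not be waiting for T1.
no_deadlock = find_cycle({"T1": "T2"})
```

With both edges present the function reports a cycle. Without the inferred edge from T2 to T1 it reports none, which is why T1 must be the holder of the AUTO-INC lock.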

2. Background knowledge 1: the AUTO-INC lock and its options

If an InnoDB table has an auto-increment column, InnoDB maintains a table-level lock for it, called the AUTO-INC lock. Every insert of new data, and every UPDATE statement that modifies this column, needs to acquire this lock.

Because a transaction may contain several statements, not all of which touch the auto-increment column, InnoDB treats this lock specially: the AUTO-INC lock is released as soon as the statement ends. This is special because ordinary locks are held until the end of the transaction.

If a table has an auto-increment column and an INSERT statement does not specify a value for it, or specifies NULL, InnoDB assigns the current AUTO_INCREMENT value to the column and then increments AUTO_INCREMENT by 1.

One parameter related to this lock is innodb_autoinc_lock_mode. The allowed values are 0, 1, and 2, and the default is 1 (MySQL 8.0 later changed the default to 2).

First, consider how a row is inserted into a table with an auto-increment column when this value is set to 0:

1) Acquire the AUTO-INC lock

2) Read the current AUTO_INCREMENT value n, then increment AUTO_INCREMENT by 1

3) Perform the insert, filling n into the auto-increment column of the new row

4) Release the AUTO-INC lock

In this process, InnoDB narrows the lock's scope by releasing it as soon as the statement finishes, but the lock is still held too long: it covers the time of the insert operation itself. As a result, two INSERT statements cannot run in parallel.
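
The four steps can be sketched in Python as a toy model. This is not InnoDB code: the class `Mode0Table` and its attributes are hypothetical, and a `threading.Lock` merely stands in for the AUTO-INC lock.

```python
import threading


class Mode0Table:
    """Toy model of a table with an auto-increment column (not InnoDB code)."""

    def __init__(self):
        self.auto_inc_lock = threading.Lock()  # stands in for the AUTO-INC lock
        self.auto_increment = 1                # next value to hand out
        self.rows = {}                         # id -> c

    def insert_mode0(self, c):
        # 1) acquire the AUTO-INC lock
        with self.auto_inc_lock:
            # 2) take the current value n, then add 1 to the counter
            n = self.auto_increment
            self.auto_increment += 1
            # 3) do the insert while still holding the lock
            self.rows[n] = c
        # 4) the lock is released when the statement ends
        return n


tb = Mode0Table()
threads = [threading.Thread(target=tb.insert_mode0, args=(100,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because step 3 runs inside the locked region, the eight inserts are fully serialized. That keeps the IDs unique and gap-free, but it is exactly the lack of parallelism described above.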

Before this parameter existed, the behavior was the same as mode 0. The 0 option is kept for compatibility.

It is easy to guess that in mode 1, steps 3) and 4) are swapped. This article still has to discuss mode 0, however, because our subject is LOAD DATA. For statements that insert multiple rows, such as LOAD DATA (and INSERT ... SELECT ...), setting the value to 1 does not help: they fall back to mode 0.

3. Background knowledge 2: master-slave behavior of the LOAD DATA statement

Why do multi-row insert statements use mode 0 even when innodb_autoinc_lock_mode is set to 1?

The main reason is master-slave consistency. Suppose binlog_format='statement', so that a LOAD DATA statement is recorded in the master's binlog as the statement itself. How is it replayed on the slave?

1) The file used by LOAD DATA is sent to the slave, which saves it in a temporary directory.

2) The LOAD DATA statement is then executed on the slave as well.

This raises a problem: how does the slave guarantee that the LOAD DATA statement produces the same auto-increment ID values as on the master?

To solve this, the master's binlog also contains a SET INSERT_ID command, which gives the ID value of the first row inserted by this LOAD DATA statement. The slave executes this SET INSERT_ID statement before executing LOAD DATA, which ensures that the result is identical to the master's.

This mechanism guarantees master-slave consistency only on one premise: the auto-increment ID values generated by the LOAD DATA statement must be consecutive on both the master and the slave.
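
A minimal Python sketch shows why the premise matters. It assumes an increment step of 1 between IDs, and the function names `replay_on_slave` and `consistent` are hypothetical: the slave only knows the first ID, so it can reproduce the master's rows only when the master's IDs had no gaps.

```python
def replay_on_slave(insert_id, values):
    """Replay a statement-logged LOAD DATA: IDs start at insert_id and
    increase by one per row, because only the first ID is in the binlog."""
    return {insert_id + offset: c for offset, c in enumerate(values)}


def consistent(master_rows):
    """True if replaying from the first ID reproduces the master's rows."""
    ids = sorted(master_rows)
    values = [master_rows[i] for i in ids]
    return replay_on_slave(ids[0], values) == master_rows


# Master generated consecutive IDs 5, 6, 7: the slave ends up identical.
consecutive = {5: 10, 6: 20, 7: 30}
# Master generated IDs 5, 7, 8 (ID 6 went to another statement): the slave differs.
with_gap = {5: 10, 7: 20, 8: 30}
```

Here `consistent(consecutive)` holds while `consistent(with_gap)` does not, since the replay would place the second row at ID 6 instead of 7.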

4. Background knowledge 1 + 2: analysis

Back to the difference between modes 0 and 1. If the AUTO-INC lock is acquired before the whole statement begins and released only after the statement finishes, the IDs generated by the whole statement are guaranteed to be consecutive. This is mode 0.

In mode 1, the lock is released each time the next value has been obtained. If the lock has to be re-acquired for each row after data is inserted, another statement can take values in between, and the IDs are no longer consecutive.

This is why, even when the value is set to 1, multi-row operations fall back to mode 0.
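
The difference can be illustrated with a deterministic Python sketch. It is a simplified model under stated assumptions: two hypothetical statements `A` and `B` each insert three rows, `schedule` gives the order in which their rows arrive, and the function `allocate` is invented for this example.

```python
from itertools import count


def allocate(schedule, hold_for_statement):
    """Assign IDs to rows of concurrent statements.

    schedule lists, in arrival order, which statement wants to insert its
    next row. With hold_for_statement=True the first statement to arrive
    keeps the AUTO-INC lock until all of its rows are in, so the other
    statement's rows are deferred; otherwise the lock is retaken per row.
    """
    next_id = count(1)
    ids = {}
    if hold_for_statement:
        order = []
        for stmt in schedule:           # statements in order of first arrival
            if stmt not in order:
                order.append(stmt)
        for stmt in order:              # each one runs to completion
            ids[stmt] = [next(next_id) for _ in range(schedule.count(stmt))]
    else:
        for stmt in schedule:           # rows get IDs in arrival order
            ids.setdefault(stmt, []).append(next(next_id))
    return ids


schedule = ["A", "B", "A", "B", "A", "B"]
statement_level = allocate(schedule, hold_for_statement=True)
per_row = allocate(schedule, hold_for_statement=False)
```

Holding the lock for the whole statement gives `A` the IDs 1, 2, 3 and `B` the IDs 4, 5, 6. Retaking it per row interleaves them, so neither statement could be replayed from its first ID alone.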

So far we know that this deadlock occurs because the two LOAD DATA statements access not only the same records but also the same AUTO-INC lock, so they end up waiting for each other.

That is not the whole story. Two threads accessing two locks can cause a deadlock, but a deadlock has another condition, which concerns the order in which the locks are requested. Since AUTO-INC is a table-level lock, whoever gets it first should block any other LOAD DATA on the same table. So why is there a lock wait on a record?

5. Background knowledge 3: when the AUTO-INC lock is acquired

We mentioned earlier that every insert of new data requires the AUTO-INC lock, and listed the process. That process, however, applies when InnoDB has to generate the column value itself. In the other scenario, the column value is already specified in the statement.

For example, execute INSERT INTO TB VALUES (9,100) on this table. The ID value is already known to be 9, so no value needs to be generated. After the row is inserted, however, AUTO_INCREMENT may need to change (if it was less than 10, it must become 10), so the lock still cannot be skipped. The process becomes:

1) Insert the data

2) If the insert fails, the process ends

3) If it succeeds, acquire the AUTO-INC lock

4) Call the Set_max ... function to update AUTO_INCREMENT if necessary

5) Release the AUTO-INC lock at the end of the statement.
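
The reordered process can be sketched as follows. This is a toy Python model, not InnoDB's implementation: the class `ExplicitIdTable`, the counter `lock_acquisitions`, and the use of `max` in place of the elided Set_max function are all assumptions made for illustration.

```python
import threading


class ExplicitIdTable:
    """Toy model; lock_acquisitions counts how often the AUTO-INC lock is taken."""

    def __init__(self):
        self.auto_inc_lock = threading.Lock()
        self.auto_increment = 1
        self.rows = {}
        self.lock_acquisitions = 0

    def insert_with_id(self, row_id, c):
        # 1) insert the data first
        if row_id in self.rows:
            # 2) on failure (here: duplicate key) stop without touching the lock
            return False
        self.rows[row_id] = c
        # 3) on success, acquire the AUTO-INC lock
        with self.auto_inc_lock:
            self.lock_acquisitions += 1
            # 4) raise the counter only if the new ID requires it
            self.auto_increment = max(self.auto_increment, row_id + 1)
        # 5) released here; this toy statement inserts a single row
        return True


tb = ExplicitIdTable()
first = tb.insert_with_id(9, 100)
duplicate = tb.insert_with_id(9, 200)
```

After the first insert the counter moves to 10. The duplicate insert fails in step 2 and never requests the lock, so `lock_acquisitions` stays at 1.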

6. Why the AUTO-INC order was changed

What is the benefit of this adjustment? The main purpose is to avoid unnecessary lock acquisition. If an error occurs while the data is being inserted, for example a duplicate-key error on another column, the AUTO-INC lock never has to be acquired.

7. Reproducing the deadlock

Note the emphasis on "at the end of the statement". Now consider a LOAD DATA statement in which every row specifies the auto-increment column value (the case in this article):

1) Insert the first row

2) Acquire the AUTO-INC lock

3) Insert the second row

4) Acquire the AUTO-INC lock (it is already held by this statement, so this succeeds immediately)

5) ... Insert all remaining rows

6) Release the AUTO-INC lock.

So the process can be summarized as: insert the first row, acquire the AUTO-INC lock, insert all remaining rows, and then release the lock.

As mentioned earlier, the record locks that may be taken when the first row is inserted are not released until the whole transaction ends.

With this background, let's reproduce the deadlock.

As you can see, the trigger conditions are fairly strict. In particular, session 2 must need a record lock that session 1 already holds. Note that needing the same record lock does not mean the primary key values must be the same, because of how InnoDB represents records internally.
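
The step table is not reproduced here, so the following Python sketch shows one interleaving that is consistent with the description. It is an assumption, not the original table: the lock names `record R` and `record S`, the class `LockTable`, and the exact step order are hypothetical.

```python
class LockTable:
    """Toy lock manager: each lock has one owner; requests either succeed or wait."""

    def __init__(self):
        self.owner = {}      # lock name -> session holding it
        self.waiting = {}    # session -> lock name it is blocked on

    def request(self, session, lock):
        holder = self.owner.setdefault(lock, session)
        if holder == session:
            return "granted"
        self.waiting[session] = lock
        return "waits for " + holder

    def deadlocked(self):
        # Deadlock if following "waits for" edges from a session leads back to it.
        for start in self.waiting:
            seen, current = set(), start
            while current in self.waiting and current not in seen:
                seen.add(current)
                current = self.owner[self.waiting[current]]
            if current == start:
                return True
        return False


locks = LockTable()
steps = [
    ("session1", "record R"),   # session1 inserts its first row, locking R
    ("session2", "record S"),   # session2 inserts its first row
    ("session2", "AUTO-INC"),   # session2 takes the AUTO-INC lock first
    ("session1", "AUTO-INC"),   # session1 now has to wait for it
    ("session2", "record R"),   # session2's next row needs R: cycle complete
]
results = [locks.request(session, lock) for session, lock in steps]
```

The deadlock only forms because session2 takes the AUTO-INC lock in the narrow window between session1's first insert and session1's own AUTO-INC request, which is why the conditions are strict.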

8. Solution 1: remove the unnecessary AUTO_INCREMENT attribute

In this application, all data is loaded through LOAD DATA and the value of the auto-increment column is always specified, so the AUTO_INCREMENT attribute is not needed.

Without it there is no AUTO-INC lock, and with one lock fewer this deadlock cannot occur.

9. Solution 2: force mode 1

As discussed, the allowed values of innodb_autoinc_lock_mode are 0, 1, and 2. When it is set to 1, LOAD DATA statements fall back to mode 0. When it is set to 2, however, mode 1 is always used.

We said earlier that using mode 1 can make the ID values generated by LOAD DATA non-consecutive, which would make the master and slave inconsistent when binlog_format is statement. The precondition for setting the value to 2 is therefore that binlog_format is row.

Setting innodb_autoinc_lock_mode to 2 is safe when binlog_format='row'.
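
The rules in this section can be summarized in a short Python sketch. It encodes only what this article states, and the function names `effective_mode` and `safe_for_replication` are hypothetical. The safety check is deliberately conservative: it treats row as the only binlog format that is safe with mode 2.

```python
def effective_mode(lock_mode, multi_row):
    """Locking behaviour a statement gets, as described in this article."""
    if lock_mode == 0:
        return 0
    if lock_mode == 1:
        return 0 if multi_row else 1   # LOAD DATA falls back to mode 0
    if lock_mode == 2:
        return 1                       # mode 1 behaviour for every statement
    raise ValueError("innodb_autoinc_lock_mode must be 0, 1 or 2")


def safe_for_replication(lock_mode, binlog_format):
    """Mode 2 may produce non-consecutive IDs, so it needs row-based binlog."""
    if lock_mode == 2:
        return binlog_format.lower() == "row"
    return True
```

For example, `effective_mode(1, multi_row=True)` returns 0, while `effective_mode(2, multi_row=True)` returns 1, and `safe_for_replication(2, "statement")` is false.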

If it is permitted in your environment, solution 2 is lighter-weight than solution 1, because it requires no changes to the data or the table structure.
