MySQL deadlock caused by the LOAD DATA statement

Source: Internet
Author: User
This article describes how to resolve a deadlock caused by the LOAD DATA statement in MySQL, and covers some of the locking mechanisms of InnoDB tables along the way. It uses a deadlock reported by a production project to briefly explain the cause, the solutions, and the related background.

1. Background

The project is a data-analysis application. All data is imported into an InnoDB table using LOAD DATA statements. To simplify the description, the table structure is reduced to the following:

CREATE TABLE tb (id INT PRIMARY KEY AUTO_INCREMENT, c INT NOT NULL) ENGINE=InnoDB;

The statements for data import are:

LOAD DATA INFILE 'data1.csv' INTO TABLE tb;
LOAD DATA INFILE 'data2.csv' INTO TABLE tb;

cat data1.csv
1 100
2 100
3 100

cat data2.csv
10 100
11 100
12 100


The evidence of the deadlock is the information shown in the LATEST DETECTED DEADLOCK section of SHOW ENGINE INNODB STATUS, simplified as follows:

[Table: simplified LATEST DETECTED DEADLOCK output, not preserved in this copy]

As shown in the table above, transaction 1 is waiting for a lock on a row. Transaction 2 holds the lock on that row but is waiting for the table's auto-increment lock (AUTO_INC). InnoDB judges this to be a deadlock and rolls back one of the transactions.
The locks held by transaction 1 are not listed, but it can be inferred that transaction 1 holds the AUTO_INC lock of this table (otherwise there would be no deadlock).

2. Background knowledge 1: the AUTO_INC lock and its options

In an InnoDB table, if an auto-increment field exists, a table-level lock is maintained, which is called an auto-increment lock. This lock is required each time new data is inserted or this field is modified by the update statement.

Because a transaction may contain multiple statements, and not all of them involve the auto-increment field, InnoDB treats this lock specially: it is released as soon as the statement completes. This counts as special treatment because ordinary locks are released only when the transaction ends.

If a table has an auto-increment field and an insert statement does not specify a value for it, or specifies NULL, InnoDB assigns it the current AUTO_INCREMENT value and then increments AUTO_INCREMENT by 1.

One parameter related to the auto-increment lock is innodb_autoinc_lock_mode. It can be 0, 1, or 2; the default is 1 (MySQL 8.0 changed the default to 2).

Let's first look at the behavior of inserting a row of data in a table with auto-increment fields when this value is set to 0:

1) apply for the AUTO_INC lock

2) obtain the current AUTO_INCREMENT value n and add 1 to AUTO_INCREMENT.

3) execute the insert operation and enter n in the corresponding field of the newly added row.

4) release the AUTO_INC lock

We can see that, although InnoDB releases the lock as soon as the statement finishes in order to shorten the time it is held, the lock is still held for too long: the span includes the time of the insert operation itself. As a result, two insert statements cannot actually run in parallel.

Before this parameter was introduced, the behavior was the same as mode 0; the value 0 is kept for compatibility.

It is easy to guess that when the value is 1, steps 3) and 4) are swapped, so the lock is released right after the value is obtained and before the insert runs. However, this article only discusses mode 0, because the statement in question is LOAD DATA. For statements like LOAD DATA that insert multiple rows (including INSERT ... SELECT ...), setting the value to 1 does not help: the mode degrades to 0.
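The two step orderings can be written down as a small Python sketch. This is a toy model of the description in this section, not InnoDB's implementation; the function names and step labels are made up for illustration.

```python
# Toy model of the single-row insert sequence described in this section.
# Step labels and function names are illustrative, not InnoDB internals.

INSERT = "insert row with id n"
RELEASE = "release AUTO_INC"


def single_row_insert_steps(mode):
    """Return the ordered steps for one single-row insert in this model."""
    steps = ["acquire AUTO_INC", "read n, then increment AUTO_INCREMENT"]
    if mode == 0:
        # Mode 0: the lock is held across the insert itself.
        steps += [INSERT, RELEASE]
    else:
        # Mode 1 as guessed in the text: steps 3) and 4) trade places.
        steps += [RELEASE, INSERT]
    return steps


def insert_runs_under_lock(steps):
    """True if the row insert happens before the lock is released."""
    return steps.index(INSERT) < steps.index(RELEASE)


for mode in (0, 1):
    steps = single_row_insert_steps(mode)
    print(mode, insert_runs_under_lock(steps), steps)
```

In mode 0 the insert sits inside the locked span, which is why two inserts cannot overlap; with the swapped order only the value allocation is serialized.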

3. Background knowledge 2: master/slave behavior of the LOAD DATA statement

Why is mode 0 used even when innodb_autoinc_lock_mode is set to 1?

The main reason is master/slave consistency. Suppose binlog_format = 'statement'. A LOAD DATA statement on the master is recorded in the binlog as the statement itself. How does the slave replay that statement?

1) send the file used by LOAD DATA to the slave and save it in a temporary directory.

2) execute the LOAD DATA statement once on the slave.

This raises a problem: how can the slave ensure that the auto-increment ids produced by the LOAD DATA statement are the same as on the master?

To solve this, the master's binlog also contains a SET INSERT_ID command, which gives the auto-increment id of the first row inserted by the LOAD DATA statement. The slave executes this SET INSERT_ID statement before executing LOAD DATA, so that the result is exactly the same as on the master.

This mechanism can keep master and slave data consistent only on the premise that the auto-increment ids generated by the LOAD DATA statement are consecutive on both the master and the slave.
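The reason for this premise can be shown with a minimal Python sketch. It assumes the ids count up by 1 from the value given by SET INSERT_ID; `replay_ids`, `slave_matches_master`, and the sample id lists are hypothetical names and values used only for illustration.

```python
# Toy model of statement-based replay of LOAD DATA on a slave.
# Assumption: ids increase by 1 starting from the INSERT_ID value.


def replay_ids(insert_id, row_count):
    """Ids the slave generates when it starts at INSERT_ID and counts up."""
    return list(range(insert_id, insert_id + row_count))


def slave_matches_master(master_ids):
    """The slave only learns the first id, so it can reproduce the master's
    ids only when they form one consecutive run."""
    return replay_ids(master_ids[0], len(master_ids)) == master_ids


print(slave_matches_master([7, 8, 9]))   # consecutive ids on the master
print(slave_matches_master([7, 9, 11]))  # gaps on the master
```

Only the first id travels in the binlog, so any gap inside the master's id sequence is invisible to the slave.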

4. Background knowledge 1 + 2: analysis

Return to the difference between modes 0 and 1 mentioned above. If the AUTO_INC lock is acquired before the whole statement starts and released only after the statement ends, the ids generated by the whole statement are guaranteed to be consecutive. This is mode 0.

In mode 1, the lock is released each time the next value has been obtained. After a row is inserted, the lock has to be requested again for the next row, so the ids are not guaranteed to be consecutive.

This is why multi-row operations degrade to mode 0 even when the value is set to 1.
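The effect of the two locking scopes on id continuity can be illustrated with a deterministic Python sketch. The round-robin interleaving is just one possible schedule chosen for illustration, and `assign_ids` is a hypothetical helper, not part of MySQL.

```python
# Toy model: two multi-row statements drawing ids from one shared counter.
from itertools import count


def assign_ids(statements, statement_level_lock):
    """statements maps a statement name to its number of rows.
    Returns the ids each statement receives."""
    counter = count(1)
    ids = {name: [] for name in statements}
    if statement_level_lock:
        # Mode 0: a statement holds AUTO_INC from start to finish,
        # so all of its rows are numbered before the next statement starts.
        for name, rows in statements.items():
            for _ in range(rows):
                ids[name].append(next(counter))
    else:
        # Per-row locking: the lock is released after every value, so the
        # statements may interleave (modelled here as round-robin).
        remaining = dict(statements)
        while any(remaining.values()):
            for name in statements:
                if remaining[name]:
                    ids[name].append(next(counter))
                    remaining[name] -= 1
    return ids


print(assign_ids({"A": 3, "B": 3}, statement_level_lock=True))
print(assign_ids({"A": 3, "B": 3}, statement_level_lock=False))
```

With the statement-level lock each statement gets one consecutive run; with per-row locking under this schedule both statements end up with gaps.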

So far, we know that the cause of this deadlock is that the two LOAD DATA statements not only access the same record but also the same AUTO_INC lock, which leads to mutual waiting.

This explanation is not complete. Two threads requesting two locks can deadlock, but a deadlock also depends on the order in which the locks are requested. Since AUTO_INC is a table lock, whichever statement gets it first should block the other LOAD DATA from executing at all. So why is there a lock wait on a record?

5. Background knowledge 3: when the AUTO_INC lock is acquired

As mentioned above, the AUTO_INC lock is required each time new data is inserted, and the process was listed. That process, however, applies when the auto-increment value has to be obtained from InnoDB to fill in the column. The other case is that the column value is already specified in the statement.

For example, execute INSERT INTO tb VALUES (9, 100) on this table. The id is then 9. Although no auto-increment value needs to be obtained, AUTO_INCREMENT may have to be changed after the row is inserted (if it was less than 10, it must be changed to 10), so the lock cannot be skipped. The process becomes:

1) insert data

2) if the insert fails, the process ends.

3) if successful, apply for the AUTO_INC lock

4) call the set_max .... function, which modifies AUTO_INCREMENT if necessary.

5) release the AUTO_INC lock when the statement ends.
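The steps above can be sketched in Python as a toy model. The table is reduced to a set of existing ids, and `insert_with_explicit_id` is a hypothetical name; the sketch only mirrors the order of operations described here and does not reproduce InnoDB code.

```python
# Toy model of inserting a row whose auto-increment column is given explicitly.


def insert_with_explicit_id(existing_ids, auto_increment, new_id):
    """Return (new AUTO_INCREMENT value, whether AUTO_INC was requested)."""
    # 1) insert the data; 2) on failure stop, without touching AUTO_INC.
    if new_id in existing_ids:
        return auto_increment, False
    existing_ids.add(new_id)
    # 3) + 4) with the AUTO_INC lock held, raise the counter if needed.
    # 5) in the model the lock is simply considered released afterwards.
    return max(auto_increment, new_id + 1), True


rows = set()
print(insert_with_explicit_id(rows, 1, 9))   # counter below 10 is raised to 10
print(insert_with_explicit_id(rows, 10, 9))  # duplicate id: lock never requested
```

The second call shows the point of this ordering: a failed insert ends before the AUTO_INC lock is ever requested.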

6. Why change the order of AUTO_INC locking?

What is the benefit of this adjustment? It mainly reduces unnecessary lock requests. If an error occurs while inserting the data, such as a duplicate key error caused by another field, the AUTO_INC lock does not need to be requested at all.

7. Reproducing the deadlock

Note the emphasis on "when the statement ends". With that in mind, the process of a LOAD DATA statement that specifies the auto-increment column value for every row (as in this example) is:

1) insert the first data entry.

2) apply for the AUTO_INC lock

3) insert the second entry.

4) apply for the AUTO_INC lock (because it is already your own, it is directly successful)

5) ...... insert all remaining rows

6) release the AUTO_INC lock.

Put simply: insert the first row, acquire the AUTO_INC lock, insert all the remaining rows, and then release the lock.

As mentioned above, any record lock taken while inserting the first row is released only when the whole transaction ends.

With the above background knowledge, let's reproduce the deadlock process.

As can be seen, the trigger conditions are quite strict, especially the record lock with which session 2 blocks session 1. Note that, because of the way InnoDB represents records internally, contending for the same record lock does not mean that the primary key values must be the same.
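The mutual wait can be expressed as a cycle in a wait-for graph. The following Python sketch is only an illustration of that idea: the lock names, session names, and helper functions are hypothetical, and the lock ownership follows the deadlock report discussed in section 1 (session 1 holds AUTO_INC and waits for a record lock, session 2 holds that record lock and waits for AUTO_INC).

```python
# Toy wait-for graph for the two LOAD DATA sessions.


def build_wait_for(holders, waiting):
    """Map each waiting session to the session holding the lock it wants."""
    return {session: holders[lock] for session, lock in waiting.items()}


def find_cycle(waits_for):
    """Return the sessions forming a wait cycle, or None if there is none."""
    for start in waits_for:
        seen = []
        node = start
        while node in waits_for and node not in seen:
            seen.append(node)
            node = waits_for[node]
        if node in seen:
            return seen[seen.index(node):]
    return None


holders = {"AUTO_INC on tb": "session1", "record lock": "session2"}
waiting = {"session1": "record lock", "session2": "AUTO_INC on tb"}
print(find_cycle(build_wait_for(holders, waiting)))
```

If either lock is taken out of the picture, the graph has no cycle, which is the idea behind both solutions that follow.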

8. Solution 1: Remove unnecessary AUTO_INCREMENT fields

In this application, all data is imported through LOAD DATA and the value of the auto-increment field is always specified, so the AUTO_INCREMENT attribute is not needed.

With one of the two locks gone, the deadlock can no longer occur.

9. Solution 2: force mode 1

We mentioned earlier that the possible values of innodb_autoinc_lock_mode are 0, 1, and 2. When it is set to 1, the LOAD DATA statement degrades to mode 0. If it is set to 2, however, mode 1 behavior is used in all cases.

We also mentioned that mode 1 behavior can make the auto-increment ids generated by LOAD DATA non-consecutive, which leads to master/slave inconsistency when binlog_format is statement. The precondition for setting the value to 2 is therefore that binlog_format is row.

When binlog_format = 'row', setting innodb_autoinc_lock_mode to 2 is safe.

Where it is permitted, solution 2 is lighter than solution 1, because it requires no change to the data or the table structure.
