MySQL Plus Lock Processing analysis Original: http://hedengcheng.com/?p=771
1 background 1
1.1 Mvcc:snapshot Read vs current Read 2
1.2 Cluster Index: Clustered Indexes 3
1.3 2pl:two-phase Locking 3
1.4 Isolation Level 4
2 A simple SQL Lock implementation Analysis 5
2.1 Combination one: ID primary key +RC 6
2.2 Combination two: ID unique index +RC 6
2.3 Combination three: ID non-unique index +RC 7
2.4 Combination four: ID no index +RC 8
2.5 combination five: ID primary key +RR 9
2.6 Combination six: ID unique index +RR 9
2.7 Combination seven: ID non-unique index +RR 9
2.8 Combination Eight: ID no index +RR 11
2.9 Combination nine: Serializable 12
3 a complex SQL
4 dead-lock principle and analysis
5 Summary
- Background
Mysql/innodb's locking analysis has been a difficult topic. In the course of my work, my colleagues have often consulted on this issue. At the same time, the micro-blog also often receive the MySQL lock-related private messages, let me help solve some of the deadlock problem. This article, prepares to mysql/innodb the lock problem, launches the more in-depth analysis and the discussion, mainly is introduces one kind of thought, uses this idea, gets any SQL statement, can complete analysis This statement will add what lock? What are the risks of use? Even a deadlock scenario on the analysis line to understand the cause of the deadlock.
Note: MySQL is a database system that supports the plug-in storage engine. All the descriptions below are based on the InnoDB storage engine, and the performance of the other engines will be significantly different.
- Mvcc:snapshot Read vs Current Read
The MySQL InnoDB storage engine, which implements the Concurrency Control Protocol--MVCC (Multi-version Concurrency Control) based on multiple versions (note: Compared to MVCC, is lock-based concurrency control, lock-based Concurrency Control). MVCC The greatest benefits, I believe it is also familiar: Read no lock, read and write no conflict. In an OLTP application that reads and writes less, read-write conflicts are important, greatly increasing the concurrency of the system, which is why at this stage, almost all RDBMS support the MVCC.
In MVCC concurrency control, read operations can be divided into two categories: snapshot read (snapshot read) and current read. Snapshot reads, read the visible version of the record (possibly a historical version), without locking. The current read, read is the latest version of the record, and the current read returned records, will be added to the lock, to ensure that other transactions will no longer concurrently modify this record.
In a system that supports MVCC concurrency control, which read operations are snapshot reads? Which operations are currently read? Take MySQL InnoDB as an example:
- Snapshot read: a simple select operation, which belongs to the snapshot read, without locking. (Of course, there are exceptions, which are analyzed below)
- SELECT * from table where?;
- Current read: Special read operation, insert/update/delete operation, belongs to the current read, need to lock.
- SELECT * FROM table where? lock in Share mode;
- SELECT * FROM table where? for update;
- Insert into table values (...);
- Update table set? Where?;
- Delete from table where?;
All of the above statements belong to the current read, reading the latest version of the record. Also, after reading, it is necessary to ensure that other concurrent transactions cannot modify the current record and lock the read record. In addition to the first statement, the read record plus S lock (shared lock), the other operation, plus X lock (exclusive lock).
Why insert/Update/delete operations are categorized as current read? You can take a look at the following update operation in the database execution flow:
, you can see the specific process for an update operation. When update SQL is sent to MySQL, MySQL server reads the first record that satisfies the condition according to the Where condition, and then the InnoDB engine returns the first record and locks (current read). After the MySQL server receives this lock-up record, an update request is initiated and the record is updated. A record operation is completed, and the next record is read until there are no records that meet the criteria. Therefore, within the update operation, a current read is included. Similarly, the delete operation is the same. The insert operation is slightly different, simply, that the insert operation may trigger a conflict check for a unique key, and a current read.
Note : According to the interaction, for a current read SQL statement, InnoDB and MySQL server interaction is a one, so the lock is also a one-piece. Lock a record that satisfies a condition, return to MySQL Server, do some DML operations, and then read the next lock until the read is complete.
- Cluster Index: Clustered Indexes
The data organization of the INNODB storage engine is a clustered index table: A complete record, stored in the primary key index, and a primary key index that allows you to retrieve all the columns of the record. For the organization of clustered index tables, refer to the official MySQL documentation: Clustered and secondary Indexes. This article assumes that the reader has a certain understanding of this, will not do a specific introduction. Next part, the primary key index/clustered index two names, there will be some mixing, hope readers know.
- 2pl:two-phase Locking
One of the principles of traditional RDBMS locking is 2PL (two-phase Lock): two-phase Locking. Relatively speaking, 2PL is easier to understand, said the lock operation is divided into two stages: lock phase and unlock phase, and ensure that the lock phase and the unlocking phase does not intersect. Below, still take MySQL as an example, to briefly look at the implementation of 2PL in MySQL.
As can be seen, 2PL is the locking/unlock divided into two completely disjoint stages. Lock phase: Lock only, do not put lock. Unlocking phase: Lock only, no lock.
- Isolation level
Isolation levels: Isolation level is also a key feature of the RDBMS. Believe that the database has some knowledge of friends, for 4 isolation levels: Read uncommited,read committed,repeatable read,serializable, have a deep understanding. This article is not intended to discuss how the 4 isolation levels are defined in database theory, but rather to introduce how MYSQL/INNODB defines these 4 isolation levels.
4 isolation levels defined by MYSQL/INNODB:
- Read uncommited
Uncommitted records can be read. This isolation level, which is not used, is ignored.
- Read Committed (RC)
Snapshot read ignored, this article does not consider.
For the current read, theRC isolation level guarantees a read-to-record locking (record Lock), which has a phantom-read phenomenon.
- Repeatable Read (RR)
Snapshot read ignored, this article does not consider.
For the current read, theRR isolation level guarantees that the read locking (record Lock), while ensuring that the read range is locked, the new record satisfies the query condition can not be inserted (GAP Lock), there is no phantom read phenomenon.
- Serializable
From MVCC concurrency control is degraded to lock-based concurrency control. Do not distinguish between the snapshot read and the current read, all the read operation is the current read, read the read-write lock (S lock), write lock (x lock).
Serializable isolation level, read and write conflicts, so the degree of concurrency drops sharply, not recommended under MYSQL/INNODB.
- Analysis of a simple SQL lock implementation
After the introduction of some background knowledge, this article will choose a few representative examples, in detail to analyze the MySQL encryption lock processing. Of course, from the simplest example. Often have a friend sent me a SQL, and then asked me, this SQL plus what lock? Just like the following two simple SQL, what locks do they add?
- SQL1:select * from t1 where id = 10;
- SQL2:delete from t1 where id = 10;
What is the answer to this question? One of the answers I can imagine is:
- SQL1: no lock. Because MySQL is using multiple versions of concurrency control, read no lock.
- SQL2: write lock for record with ID = 10 (go to primary key index).
Is this the right answer? Can't say. That may be correct, or it may be wrong, known conditions are insufficient, and there is no answer to this question. If I were to answer this question, I would have to know some of the following prerequisites, but the answer I can give is different. What prerequisites are missing to answer this question?
- Prerequisite One:is the ID column a primary key?
- premise Two: what is the isolation level of the current system?
- Prerequisite Three:ID column If it is not a primary key, then the ID column is indexed?
- Premise Four:if there is a level two index on the ID column, is this index a unique index?
- Premise Five: What are the two SQL execution plans? Index scan? Full table scan?
Without these prerequisites, just given a SQL, and then asked the SQL will add what locks, are very amateur performance. And when these questions have a definite answer, a given SQL will lock up and be at a glance. Next, I'll combine the answers to these questions and then analyze each combination in a simple, easy-to-order sequence, and which locks will be added to the corresponding SQL?
Note: Following these combinations, I made a premise hypothesis, that is, when there is an index, the execution plan will definitely choose to use the index for Filtering (index Scan). But the reality will be much more complicated, the real execution plan, or the need to depend on the MySQL output.
- combination one:ID column is primary key, RC isolation Level
- combination two:ID column is a level two unique index, RC isolation Level
- combination three:ID column is a level two non-unique index, RC isolation Level
- combination four:no index on the ID column, RC isolation level
- Combination five:ID column is primary key, RR isolation level
- Combination SIX:ID column is a two-level unique index, RR isolation level
- combination seven:ID column is a level two non-unique index, RR isolation level
- combination Eight:no index on ID column, RR isolation level
- Combination nine: Serializable isolation Level
The permutations are not exhaustive yet, but they seem to be a lot. Is it really necessary to be so complicated? In fact, to analyze the lock, it is necessary to be so complex. But from another point of view, as long as you select a combination, SQL needs to add which locks, in fact, it is OK. Next, let's analyze each of these 9 combinations of SQL lock policy.
Note: In the previous eight combinations, that is, under the RC,RR isolation level, the sql1:select operation is not locked, using a snapshot read, so in the following discussion is ignored, the main discussion sql2:delete operation locking.
- Combination one: ID primary key +RC
This combination is the simplest and most easy to analyze combination. The ID is the primary key, Read committed isolation level, given sql:delete from t1 where id = 10; Only the record with ID = 10 on the primary key will be added to the X lock. As shown in the following:
Conclusion:when the ID is the primary key, this SQL only needs to add X lock on the id=10 record.
- Combination two: ID unique index +RC
This combination, the ID is not a primary key, but a unique two-level index key value. Then, under the RC isolation level, delete from t1 where id = 10; What kind of lock do I need to add? See:
In this combination, the ID is a unique index, and the primary key is the name column. At this time, the condition of the lock is different because of the combination one. Because the ID is a unique index, the DELETE statement chooses the index of the ID column to filter the Where condition, and when a record of id=10 is found, the id=10 index record on the unique index is first added to the X lock, and the primary key index is returned according to the Name column read ( Clustered index), and the primary key index entry for name = ' d ' on the clustered index is added to the X lock. Why are records on a clustered index also locked? Imagine if a concurrent SQL is updated with the primary key index: UPDATE t1 Set id = + WHERE name = ' d '; At this point, if the DELETE statement does not lock the record on the primary key index, the concurrency update will not perceive the presence of the DELETE statement, violating the constraint that the update/delete on the same record needs to be executed serially.
Conclusion : If the ID column is a unique column, there is a unique index on it. Then SQL needs to add two X locks, a record corresponding to ID = 10 on the ID unique index, and another lock corresponding to the record of [name= ' d ', id=10] on the clustered index.
- Combination three: ID non-unique index +RC
Relative to the combination of one or two, the combination of three has changed, the isolation level is still RC unchanged, but the constraint on the ID column is reduced, the ID column is no longer unique, only a normal index. Assume that the delete from t1 where id = 10; statement, which still selects the index on the ID column to filter where conditions, what locks are held at this time? See also:
According to this diagram, you can see that, first, the ID column index, which satisfies the ID = 10 query criteria, is locked. At the same time, the records on the primary key index corresponding to these records are also added to the lock. The only difference between the two is that the combination of the two has a maximum of one record that satisfies the equivalent query, and the combination will lock all records that satisfy the query criteria.
Conclusion : If there is a non-unique index on the ID column, all records corresponding to the SQL query criteria will be locked. At the same time, the records on the primary key index are also locked.
- Combination four: ID no index +RC
This is a very special case, relative to the previous three combinations. There is no index on the ID column, where id = 10; the filter is not filtered by the index, so it can only be filtered by a full table scan. What locks does SQL add to this combination? Or in other words, what locks will be added when the full table is scanned? There are a number of answers to this: Some say X locks are added to the table, and some say that the record with the selected ID = 10 will be added to the cluster index. So what's the reality? Please see:
Because there is no index on the ID column, you can only go through the clustered index and scan it all. As you can see, there are two records that satisfy the deletion criteria, but all records on the clustered index are added with an X lock. The x lock is all added regardless of whether the record satisfies the condition. Neither a table lock nor a row lock on a record that satisfies a condition.
Someone might ask? Why not just lock on the record that satisfies the condition? This is due to the implementation of MySQL. If a condition cannot be quickly filtered through the index, then the storage engine plane will lock all records back and then be filtered by the MySQL server layer. So they locked up all the records.
Note: In the actual implementation, MySQL has some improvements, in the MySQL server filter condition, the discovery is not satisfied, will call the Unlock_row method, the record does not meet the conditions of the lock (violating the 2PL constraints). This ensures that only the locks that satisfy the condition record will be held at the end, but the lock operation of each record cannot be omitted.
Conclusion: If there is no index on the ID column, SQL will filter through the full scan of the clustered index, because filtering is done at the MySQL server level. Therefore, each record, whether or not it satisfies the condition, is added with an X lock. However, for the sake of efficiency, MySQL is optimized, for the record that does not satisfy the condition, will put the lock after judgment, finally hold the lock on the record that satisfies the condition, but the locking/release lock action on the record that does not satisfy the condition will not be omitted. At the same time, optimization also violates the 2PL constraint.
- Combination five: ID primary key +RR
The above four combinations are the locking behavior under the Read Committed isolation level, and the next four combinations are the locking behavior under the REPEATABLE READ isolation level.
Combination five, the ID column is the primary key column, repeatable read isolation level, for delete from t1 where id = 10; This SQL, locking and combination one: [ID primary key, Read Committed] is consistent.
- Combination six: ID unique index +RR
Similar to the combination of five, the combination of the six lock, and the combination of two: [ID unique index, Read Committed] consistent. Two x lock, id a unique index satisfies the record of the condition on the previous, corresponding clustered index on one of the records.
- Combination Seven: ID non-unique index +RR
Remember the difference between the four types of isolation levels mentioned earlier in MySQL? The RC isolation level allows for Phantom reads, while the RR isolation level does not allow for Phantom reads. But in the combination of the five, combination of six, the locking behavior is the same as the lock behavior under RC. So how do you prevent phantom reading under RR isolation level? The answer to the question is revealed in group seven.
Combination seven, repeatable read isolation level, ID has a non-unique index, execute delete from t1 where id = 10; What if the index on the ID column is selected for conditional filtering and the last lock behavior? Also look at the following picture:
This figure, relative to the combination of three: [ID column on the non-unique lock, Read Committed] looks the same, actually there is a big difference. The biggest difference is that there is a gap lock in this picture, and the gap lock does not seem to be added to the record, it is like the position between the loading of two records, what is the gap lock?
In fact, this extra gap lock, is the RR isolation level, relative to the RC isolation level, there will be no magic read the key. Indeed, the gap locks the position of the lock, not the record itself, but the gap between two records. The so-called Phantom Reading, is the same transaction, do two times the current read (for example: SELECT * from t1 where id = ten for update;), then these two times the current read return is exactly the same record (the number of records consistent, the record itself is consistent), the second time the current read, no more than the first Returns more records (illusions).
How to guarantee two times the current read returns a consistent record, which is required between the first current read and the second current read, the other transaction will not insert a new record to satisfy the condition and commit. In order to realize this function, Gap lock was born.
, there are locations where you can insert a new item that satisfies the condition (id = 10), and given the order of the B + Tree index, the item that satisfies the condition must be stored continuously. Record [6,c] before, the id=10 record is not inserted, [6,c] and [10,b] can be inserted between [aa];[10,b] and [], you can insert a new [10,d] and so on; [10,bb],[10,c] and [10,d] can be inserted to meet the conditions of the [ 10,e],[10,z], and [11,f] does not insert a record that satisfies the condition. Therefore, in order to ensure [6,c] and [10,b] between [10,b] and [10,d], [10,d] and [11,f] will not insert new conditions to meet the record, MySQL chose to use the gap lock, the three gap to lock up.
Insert operations, such as insert [10,AA], are first positioned between [6,c] and [10,b], and before inserting, the gap is checked to see if it has been locked, and insert cannot insert a record if it is locked. Thus, by the current reading of the first pass, not only the record that satisfies the condition is locked (x lock), similar to the combination of the three. At the same time, add 3 gap Lock, will be inserted to meet the condition record of the 3 gap to lock, to ensure that the subsequent insert can not insert the new id=10 record, but also to eliminate the same transaction of the second current reading, the occurrence of illusion.
Heart Friends See here, you can ask: since the protection of Phantom Reading, need to rely on gap lock, why combination of five, combination six, also RR isolation level, but do not need to add gap lock it?
First of all, this is a good question. Secondly, to answer this question is also very simple. The purpose of the gap lock is to prevent two times of the same transaction from being read at present, and the illusion of reading occurs. In combination five, the ID is the primary key, and the combination six, the ID is a unique key, is guaranteed to be unique. An equivalent query can only return one record at most, and the new record with the same value will not be inserted in the new one, thus avoiding the use of the gap lock. In fact, there is a more in-depth question for this issue: if the combination of five, combination six, for the sql:select * from t1 where id = ten for update; The first query, did not find the record satisfies the query criteria, then the gap lock can also be omitted? This question is left to everyone to think about.
Conclusion: Repeatable Read Isolation level, there is a non-unique index on the ID column that corresponds to sql:delete from t1 where id = 10; First, through the ID index to locate the first to satisfy the query criteria record, add record x lock, add gap on Gap Lock, and then add the primary key clustered index record x lock, then return; then read the next one, repeat. Until the first record that does not meet the criteria [11,f] is reached, the X lock is not required, but the gap lock is still required, and the end is returned.
- Combination eight: ID no index +RR
Combination eight, repeatable the last case under the Read isolation level, there is no index on the ID column. At this point sql:delete from t1 where id = 10; There are no other paths to choose from, only full table scans. The final lock case, as shown in:
, this is a very scary phenomenon. First, all records on the clustered index are added with an X lock. Second, the gap (GAP) between each record of the cluster index is also added to the gap lock. This sample table, only 6 records, altogether requires 6 record locks, 7 gap locks. Imagine, if there are 10 million records on the table?
In this case, on this table, in addition to the non-lock snapshot degrees, any other lock of concurrent SQL, can not be executed, cannot be updated, cannot be deleted, cannot be inserted, the whole table is locked dead.
Of course, with the combination of four: [ID no index, Read Committed] similar, in this case, MySQL also did some optimizations, is called Semi-consistent Read. Semi-consistent read is on, MySQL will lock the record in advance for records that do not meet the query criteria. For the use case above, all record locks are released except for the record [d,10],[g,10], and no gap locks are added. Semi-consistent read how to trigger: either the Read Committed isolation level or the REPEATABLE read isolation level with the Innodb_locks_unsafe_for_binlog parameter set. A more detailed introduction to Semi-consistent read, can refer to my previous blog: Mysql+innodb semi-consitent read principle and implementation analysis.
Conclusion: at the REPEATABLE read isolation level, if the current read of a full table scan is performed, all records in the table are locked and all gaps within the clustered index are locked, eliminating all concurrent update/delete/insert operations. Of course, it is also possible to mitigate the locking overhead and concurrency effects by triggering semi-consistent read, but Semi-consistent read itself also poses other problems that are not recommended.
- Combination nine: Serializable
For the simple SQL mentioned earlier, the last case: Serializable isolation level. For sql2:delete from t1 where id = 10; , the Serializable isolation level is exactly the same as the repeatable read isolation level, so no introduction is made.
Serializable isolation level, which affects sql1:select * from t1 where id = 10; This SQL, under the RC,RR isolation level, is a snapshot read without locking. However, at the Serializable isolation level, SQL1 reads the lock, which means that the snapshot read is no longer present, and MVCC concurrency control is downgraded to lock-based CC.
conclusion: in Mysql/innodb, the so-called read does not lock, and does not apply to all cases, but to the isolation level related. Serializable isolation level, read without lock is no longer set up, all read operations, are currently read.
- A complex SQL
Written here, in fact, the implementation of the encryption of MySQL has also been introduced in 8899. As long as the above analysis of the idea, most of the SQL, can be analyzed to add which locks. And here, let's look at a slightly more complex SQL that illustrates another logic for MySQL locking. The SQL use cases are as follows:
What locks will be added to the SQL in? It is assumed that the lock condition under the Read Committed isolation level at the REPEATABLE read isolation level is left to the reader for analysis. At the same time, let's say that SQL is Idx_t1_pu indexed.
Before we can analyze the SQL lock-up situation in detail, we need to have a knowledge reserve, that is, how to split the Where condition in a SQL? Specific introduction, it is recommended to read one of my previous article: SQL in the Where condition, in the database extraction and application analysis. Here, I give a direct analysis of the results:
- Index key:pubtime > 1 and Puptime < 20. This condition is used to determine the query scope of SQL on the Idx_t1_pu index.
- Index Filter:userid = ' hdc '. This condition, which can be filtered on the Idx_t1_pu index, is not part of index Key.
- Table Filter:comment is not NULL. This condition, which cannot be filtered on the Idx_t1_pu index, can be filtered only on the clustered index.
After analyzing the composition of the SQL where condition, take a look at the SQL lock-up condition (RR isolation level) as shown in:
It can be seen that, under the repeatable read isolation level, the range determined by index key is added to the gap lock; The index filter locks the given condition (UserID = ' hdc ') when filtering, depending on the version of MySQL, in MySQL Before version 5.6, index Condition pushdown (ICP) was not supported, so the index filter was filtered at the MySQL server layer, and the index Condition Pushdown was supported after 5.6, then filtered on index. If the ICP is not supported, do not satisfy the index filter record, also need to add record x lock, if the ICP is supported, does not satisfy the index filter record, no need to record X lock (in the figure, with the red arrow marked X lock, whether to add, depending on whether the ICP is supported); Filter is filtered at the MySQL server level after reading it in the clustered index, so an X lock is also required on the clustered index. Finally, we select a record that satisfies the condition [8,hdc,d,5,good], but the number of locks is much larger than the number of records satisfying the condition.
Conclusion: under the REPEATABLE read isolation level, for a complex SQL, you first need to extract its where condition. Index key to determine the scope, need to add gap lock; Index filter filter, depending on whether the MySQL version supports ICP, if the ICP is supported, the index filter does not meet the record, no X lock, otherwise need x lock; Table Filter filter condition, The x lock is required whether or not it is satisfied.
- The principle and analysis of deadlock
The previous section of this article has basically covered all the lock rules for MYSQL/INNODB. In-depth understanding of how MySQL locks, there are two more important roles:
- You can write a SQL that does not have a deadlock based on the rules of the MySQL lock.
- Can be based on the rules of the lock in MySQL, locating the cause of the deadlock on the outlet;
Below, take a look at two examples of deadlocks (one is two sessions of two SQL to generate a deadlock, the other is a two session of SQL, resulting in a deadlock):
The two deadlock cases above. The first is very well understood, and is also the most common deadlock, each transaction executes two SQL, each holding a lock, and then adding another lock, resulting in a deadlock.
The second use case, although there is only one statement per session, still produces a deadlock. To analyze this deadlock, you must first use the MySQL lock rule mentioned earlier in this article. For session 1, starting from the name index, the read [HDC, 1],[HDC, 6] satisfies the condition, not only the record X lock on the name index, but also the record X lock on the clustered index, the lock order is first [1,hdc,100], and then [6,hdc,10]. and session 2, starting from the Pubtime index, [10,6],[100,1] are satisfied with the filter conditions, the same will be added to the cluster index record x lock, the lock order is [6,hdc,10], after [1,hdc,100]. Found no, with session 1 of the lock order is the opposite, if the two session happens to hold the first lock, the request to add a second lock, the deadlock occurred.
Conclusion: the occurrence or not of the deadlock is not the number of SQL statements in the transaction, the key to deadlock is : two (or more) session Lock Order is inconsistent. Using the above mentioned above, the analysis of MySQL each SQL statement locking rules, analysis of each statement of the lock order, and then check the existence of multiple concurrent SQL in reverse order to lock the case, you can analyze a variety of potential deadlock situation, you can also analyze the cause of the deadlock occurred on the line.
- Summarize
Write here, this article also came to an conclusion, do a simple summary, to do the full mastery of mysql/innodb lock rules, and even any other database lock rules, need to have some of the following knowledge points:
- Understand some basic theoretical knowledge of the database: the storage format of the data (Heap organization table vs clustered index Table), Concurrency Control Protocol (MVCC vs lock-based CC), two-phase Locking; isolation level definition for the database (isolation levels);
- Understand the execution plan of SQL itself (primary key scan vs unique key scan vs range scan vs full table scan);
- Understand some implementation details of the database itself (filter extraction; Index Condition pushdown;semi-consistent Read);
- Understand the cause of the deadlock and the method of analysis (the lock sequence is inconsistent; analyze the lock order of each SQL)
With these knowledge points, coupled with the appropriate combat experience, full control of the MYSQL/INNODB lock rules, when not a cinch.
Go MySQL Plus lock processing analysis