Objective:
We all know the ACID properties of transactions. To maintain these properties, especially consistency and isolation, databases generally rely on locking. At the same time, a database is a highly concurrent application that must serve a large number of simultaneous requests, and excessive locking drastically reduces its concurrent processing capacity. Handling locks well is therefore the essence of a database's transaction processing. This article analyzes the locking mechanism of MySQL's InnoDB engine to give readers a better understanding of what a transaction actually does inside the database.
One-phase locking or two-phase locking?
Because there is a large amount of concurrent access, a common recommendation for preventing deadlock in application code is one-phase (all-at-once) locking: at the start of a method you already know which data will be used, so you lock all of it up front and unlock everything after the method finishes. This is an effective way to avoid circular deadlocks, but it does not work in a database, because at the beginning of a transaction the database does not know which data will be used.
Databases instead follow the two-phase locking protocol, which divides a transaction into two phases, a locking (growing) phase and an unlocking (shrinking) phase (hence the name).
Locking phase: only lock acquisitions may be performed in this phase. Before reading any data, the transaction must request and obtain an S lock (shared lock; other transactions may still add shared locks, but cannot acquire an exclusive lock). Before any write it must request and obtain an X lock (exclusive lock; no other transaction can acquire any lock on the data). If a lock request fails, the transaction enters a wait state until the lock is granted, and only then continues.
Unlocking phase: once a transaction releases its first lock, it enters the unlocking phase, in which it may only release locks and may not acquire new ones.
Transaction lock/unlock processing:
BEGIN;
INSERT INTO test ...    -- acquire the lock needed by the INSERT
UPDATE test SET ...     -- acquire the lock needed by the UPDATE
DELETE FROM test ...    -- acquire the lock needed by the DELETE
COMMIT;                 -- on commit, the INSERT, UPDATE, and DELETE locks are all released
Although deadlock cannot be avoided this way, the two-phase locking protocol guarantees that concurrent schedules of transactions are serializable (serializability matters, especially for data recovery and backup).
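The protocol described above can be sketched as a toy lock manager. The class and method names here are invented for illustration; a real database tracks far more state:

```python
import threading

class TwoPhaseLockingTxn:
    """Toy transaction that enforces two-phase locking: every lock
    acquisition (growing phase) must come before the first release
    (shrinking phase)."""

    def __init__(self, lock_table):
        self.lock_table = lock_table   # shared dict: resource key -> threading.Lock
        self.held = []
        self.shrinking = False         # flips to True once unlocking begins

    def lock(self, key):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot lock after unlocking")
        lk = self.lock_table.setdefault(key, threading.Lock())
        lk.acquire()                   # blocks if another transaction holds it
        self.held.append(lk)

    def commit(self):
        # Entering the shrinking phase: release everything, as at COMMIT.
        self.shrinking = True
        for lk in self.held:
            lk.release()
        self.held.clear()
```

A transaction that tries to lock again after commit raises an error, mirroring the rule that the unlocking phase admits no further lock requests.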
Lock modes in a transaction
Four isolation levels for transactions
In database operations, transaction isolation levels were proposed to guarantee the correctness of concurrently read data, and database locks exist precisely to implement these isolation levels.
Isolation level              Dirty read    Non-repeatable read    Phantom read
READ UNCOMMITTED             possible      possible               possible
READ COMMITTED               impossible    possible               possible
REPEATABLE READ              impossible    impossible             possible
SERIALIZABLE                 impossible    impossible             impossible
Read uncommitted (READ UNCOMMITTED): allows dirty reads, i.e. a transaction may read data that has been modified but not yet committed by transactions in other sessions.
Read committed (READ COMMITTED): only committed data can be read. Most databases, such as Oracle, default to this level (non-repeatable reads are possible).
Repeatable read (REPEATABLE READ): queries within the same transaction are all consistent with the state as of the start of the transaction; this is InnoDB's default level. In the SQL standard this isolation level eliminates non-repeatable reads, but phantom reads can still occur.
Serializable (SERIALIZABLE): fully serialized reads; every read must obtain a table-level shared lock, and reads and writes block each other.
The READ UNCOMMITTED level is rarely used in practice, and none of its operations involve locking, so it is not discussed further here.
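The isolation-level table above can be restated as a small lookup. This is a sketch only; the dictionary and function names are invented:

```python
# Anomalies permitted at each SQL-standard isolation level
# (per the table above): dirty read, non-repeatable read, phantom read.
ANOMALIES = {
    "READ UNCOMMITTED": {"dirty_read", "non_repeatable_read", "phantom_read"},
    "READ COMMITTED":   {"non_repeatable_read", "phantom_read"},
    "REPEATABLE READ":  {"phantom_read"},  # per the SQL standard; InnoDB's MVCC
                                           # largely prevents phantoms in practice
    "SERIALIZABLE":     set(),
}

def allows(level, anomaly):
    """True if the SQL standard permits `anomaly` at `level`."""
    return anomaly in ANOMALIES[level]
```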
Types of locks in MySQL
There are many kinds of locks in MySQL: the familiar table locks and row locks, as well as the newer metadata locks, and so on. A table lock locks an entire table; although table locks can be divided into read locks and write locks, locking the whole table reduces concurrency, so they are usually used for DDL processing.
A row lock locks individual data rows. This approach is more complicated, but because it locks only a limited amount of data and places no restrictions on the rest, concurrency is much better; MySQL generally uses row locks to handle concurrent transactions. Row locks are the main subject of the discussion below.
Read Committed (RC)
At the RC level, reading data requires no locks, but writes, modifications, and deletes do. The effect is as follows:
mysql> show create table class_teacher \G
Table: class_teacher
Create Table: CREATE TABLE `class_teacher` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `class_name` varchar(100) COLLATE utf8mb4_unicode_ci NOT NULL,
  `teacher_id` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_teacher_id` (`teacher_id`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

mysql> select * from class_teacher;
+----+--------------+------------+
| id | class_name   | teacher_id |
+----+--------------+------------+
|  1 | First Class  |          1 |
|  3 | Class One    |          2 |
|  4 | Second Class |          2 |
+----+--------------+------------+
Since MySQL's InnoDB defaults to the RR level, we first switch the session to RC and set the binlog format:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET SESSION binlog_format = 'ROW';   -- (or 'MIXED')
Transaction A                                    Transaction B
begin;                                           begin;
update class_teacher
  set class_name='Second Class'
  where teacher_id=1;
                                                 update class_teacher
                                                   set class_name='First Class 33'
                                                   where teacher_id=1;
                                                 ERROR 1205 (HY000): Lock wait timeout exceeded;
                                                 try restarting transaction
To prevent conflicting modifications under concurrency, MySQL locks the teacher_id=1 data rows in transaction A. Since A has not committed (and so has not released the lock), transaction B cannot obtain the row lock and waits until it times out.
Note that teacher_id is covered by an index. What if the condition is on class_name, which has no index?
update class_teacher set teacher_id=3 where class_name = 'First Class';
Then MySQL locks all rows of the entire table. This sounds strange, but while that SQL runs, MySQL has no way of knowing which rows satisfy class_name = 'First Class' (there is no index); when a condition cannot be filtered quickly through an index, the storage engine level locks all the records and returns them to the MySQL server layer for filtering.
In practice, though, MySQL has made an improvement here: when the server-layer filter finds records that do not satisfy the condition, it calls the unlock_row method to release the locks on them (a deliberate violation of the two-phase locking protocol). This guarantees that in the end only the matching records remain locked, though the initial lock on every record cannot be skipped. So even MySQL violates the specification for the sake of efficiency (see High Performance MySQL, 3rd edition, p. 181 of the Chinese edition).
The same applies at MySQL's default isolation level, RR. When doing a batch modification on a large table without a suitable index, the server filters the data very slowly, and you will see rows that were never modified remain locked anyway.
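The lock-all-then-filter behavior just described can be sketched as follows. This is a simplified model of the interaction between the storage engine and the server layer, not InnoDB's actual code, and the function name is invented:

```python
def update_without_index(rows, predicate, locked):
    """Toy model of an UPDATE whose WHERE clause has no usable index.

    The storage engine cannot seek, so it locks every row it scans;
    the server layer then filters and (as InnoDB does via unlock_row)
    releases the locks on rows that did not match.  `rows` is a list
    of dicts, `locked` a set collecting the ids still locked at the end.
    """
    matched = []
    for row in rows:
        locked.add(row["id"])          # engine locks each scanned row
        if predicate(row):
            matched.append(row)        # server keeps this lock
        else:
            locked.discard(row["id"])  # unlock_row: release non-matching rows
    return matched
```

At the end only the matching rows stay locked, but every row was locked at least briefly during the scan, which is exactly the cost the text describes.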
Repeatable Read (RR)
This is InnoDB's default isolation level in MySQL. We will explain it in two modules: "read" and "write".
Read
Repeatable read means exactly that: reads can be repeated with the same result. When a transaction reads data multiple times while other transactions run concurrently, it sees the same rows each time. That is a bit abstract, so let's look at the effect.
First, the behavior at the RC (non-repeatable-read) level:
Transaction A                                    Transaction B
select id,class_name,teacher_id
  from class_teacher where teacher_id=1;

id   class_name       teacher_id
1    Second Class     1
2    Junior Class     1
                                                 update class_teacher
                                                   set class_name='First 33 Class'
                                                   where id=1;
                                                 commit;
select id,class_name,teacher_id
  from class_teacher where teacher_id=1;

id   class_name       teacher_id
1    First 33 Class   1
2    Junior Class     1
Transaction A read the data modified by transaction B: after B modified and committed the id=1 row, A's identical query returned a different result than its first one, so the read was not repeatable. This can cause problems in practice, so let's look at MySQL's behavior at the RR level:
Transaction A                       Transaction B                      Transaction C
begin;                              begin;                             begin;
select id,class_name,teacher_id
  from class_teacher where teacher_id=1;

id   class_name       teacher_id
1    Second Class     1
2    Junior Class     1
                                    update class_teacher
                                      set class_name='First 33 Class'
                                      where id=1;
                                    commit;
                                                                       insert into class_teacher
                                                                         values (null,'First 33 Class',1);
                                                                       commit;
select id,class_name,teacher_id
  from class_teacher where teacher_id=1;

id   class_name       teacher_id
1    Second Class     1
2    Junior Class     1
Transaction A did not read the data modified by transaction B: its second query returned exactly what the first one did, i.e. the read is repeatable. Nor did it read the row newly inserted by transaction C.
Notice that while transaction A was querying teacher_id=1, the id=1 row was modified and committed in transaction B, yet transaction A's second read was identical to its first. That is repeatable read. How does MySQL achieve it? Let's keep that in suspense for a moment and read on.
The difference between non-repeatable reads and phantom reads
Many people do indeed confuse non-repeatable reads with phantom reads. But the key point of a non-repeatable read is UPDATE and DELETE, while the focus of a phantom read is INSERT.
If a pure locking mechanism were used to implement these two isolation levels, then under repeatable read the rows would be locked after being read for the first time, so other transactions could not modify them, which makes the reads repeatable. But this approach cannot lock rows that do not yet exist: even if transaction A has read, or even modified, all matching rows, transaction B can still INSERT a new row and commit, and transaction A will then discover one more row than before. That is a phantom read, and it cannot be avoided with row locks alone. Avoiding it requires the SERIALIZABLE isolation level, where reads take read locks, writes take write locks, and the two are mutually exclusive; this prevents phantom reads, non-repeatable reads, and dirty reads alike, but greatly reduces the database's concurrency.
So the essential difference between non-repeatable reads and phantom reads is which locking mechanism is needed to prevent them.
The above is the pessimistic-locking approach to these two problems, but mature databases such as MySQL, Oracle, and PostgreSQL, for performance reasons, instead use MVCC (multi-version concurrency control), based on optimistic-locking theory, to avoid them.
Pessimistic lock and optimistic lock
Pessimistic lock
As its name implies, a pessimistic lock takes a conservative attitude toward modification of the data by the outside world (including other transactions in the same system and transactions from external systems): the data is kept locked during the entire processing. A pessimistic lock implementation usually relies on the locking mechanism provided by the database itself (only a database-level lock can truly guarantee exclusive access to the data; a lock implemented purely inside the application cannot guarantee that an external system will not modify it).
With pessimistic locking, ensuring transaction isolation requires locking on read as well: a lock is taken when data is read, so other transactions cannot modify it, and a lock is taken when data is modified or deleted, so other transactions cannot read it.
Optimistic lock
Compared with pessimistic locks, optimistic locking takes a more relaxed approach. Pessimistic locks mostly rely on the database's locking mechanism to guarantee the maximum degree of exclusivity, but this comes at a significant cost in database performance, especially for long transactions, which is often unacceptable.
Optimistic locking solves this problem to some extent. It is mostly implemented with a data versioning mechanism. What is a data version? A version identifier is added to the data; in the database-table-based variant this is typically a "version" field on the table. When the data is read, the version number is read along with it; when the data is updated, the version number is incremented by one. At commit time, the version of the submitted data is compared with the current version recorded for that row in the table: the update is applied only if the submitted version is newer than the current one; otherwise the data is considered stale.
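A minimal sketch of the version-column scheme, using Python's built-in sqlite3 in place of MySQL; the `account` table and its columns are invented for illustration. The UPDATE succeeds only if the version read earlier is still current:

```python
import sqlite3

# In-memory database with a version column for optimistic locking.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100, 0)")

def optimistic_update(conn, account_id, new_balance, read_version):
    """Apply the update only if nobody bumped the version since we read it."""
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, read_version))
    return cur.rowcount == 1   # 0 rows touched -> stale read, caller must retry

# Read balance and version, then update under that version.
balance, version = conn.execute(
    "SELECT balance, version FROM account WHERE id = 1").fetchone()
assert optimistic_update(conn, 1, balance - 10, version)      # first writer wins
assert not optimistic_update(conn, 1, balance - 20, version)  # stale version fails
```

The failed second update is how a lost update is detected without holding any lock between the read and the write; the application decides whether to retry.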
To be clear, MVCC has no fixed specification, and every database implements it differently; the discussion here covers only InnoDB's MVCC.
The implementation of MVCC in MySQL's InnoDB
In InnoDB, two extra hidden columns are added to each row to implement MVCC: one records when the row version was created, the other records when it expires (or is deleted). In practice what is stored is not a timestamp but a transaction version number: each time a new transaction starts, the transaction version number is incremented. Under the REPEATABLE READ isolation level:
- SELECT reads rows whose creation version number is <= the current transaction's version number, and whose deletion version number is either empty or > the current transaction's version number.
- INSERT stores the current transaction's version number as the new row's creation version number.
- DELETE stores the current transaction's version number as the row's deletion version number.
- UPDATE inserts a new row version, storing the current transaction's version number as its creation version number, and also stores the current transaction's version number as the deletion version number of the previous row version.
- With MVCC, although each row needs extra storage space, more row checks, and some additional maintenance work, the use of locks is greatly reduced: most read operations take no locks at all, reading data is simple and fast, and it is still guaranteed that only rows meeting the visibility rules are read and only the necessary rows are locked.
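The four rules above can be sketched as follows. This is a toy model using plain version numbers (real InnoDB uses read views, undo logs, and hidden columns), and all the names are invented:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RowVersion:
    data: str
    created: int                    # transaction id that created this version
    deleted: Optional[int] = None   # transaction id that deleted it, if any

def visible(row, trx_id):
    """The SELECT rule: created by this or an earlier transaction,
    and not deleted from this transaction's point of view."""
    return row.created <= trx_id and (row.deleted is None or row.deleted > trx_id)

def mvcc_update(rows, idx, new_data, trx_id):
    """UPDATE = mark the old version deleted + insert a new version."""
    rows[idx].deleted = trx_id
    rows.append(RowVersion(new_data, created=trx_id))
```

An older reader keeps seeing the old version while a newer reader sees the replacement, which is exactly why repeatable reads need no read locks.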
Whether in database textbooks or online, most explanations of the four isolation levels say that RR prevents non-repeatable reads but cannot prevent phantom reads, and that only SERIALIZABLE prevents phantoms. That is why transaction C was added to the example above: it inserts a new teacher_id=1 row and commits, so at the RR level a phantom read should occur, i.e. transaction A should see C's new row when it queries teacher_id=1 again. But testing shows this does not happen in MySQL: after transaction C commits, transaction A still does not read the row. Evidently, at MySQL's RR level the phantom-read problem is solved here. See the figure below.
With reading solved, a question remains: by MVCC's definition, concurrently committed data can conflict, so how are conflicts resolved? Let's look at how InnoDB handles data writes at the RR level.
The difference between "snapshot read" and "current read"
Readers may wonder: a transaction's isolation level is really a definition of how data is read, so why split the discussion into separate "read" and "write" modules? Mainly because "read" in MySQL is not the same as "read" in the isolation-level definitions.
As we saw, at the RR level the MVCC mechanism makes reads repeatable, but the data we read may be historical data rather than the database's current data. This can be a problem for businesses that are particularly sensitive to data freshness.
This way of reading historical data is called a snapshot read, and reading the current version of the database is called a current read. In MVCC terms:
Snapshot read: the plain SELECT.
Current read: the special read operations, plus INSERT/UPDATE/DELETE; these operate on the current data and must take locks:
SELECT * FROM table WHERE ? LOCK IN SHARE MODE;
SELECT * FROM table WHERE ? FOR UPDATE;
INSERT INTO table VALUES (...);
UPDATE table SET ? WHERE ?;
DELETE FROM table WHERE ?;
A transaction's isolation level is in fact defined in terms of current reads. MySQL improves concurrency by reducing lock handling (including waiting for other locks): it introduces the concept of snapshot reads so that plain SELECTs take no locks, while "current reads" such as UPDATE and INSERT are handled by another mechanism.
Write ("current read")
Although the isolation levels only define requirements on reading data, these can equally be seen as requirements on writing data. The "read" discussed above is really the snapshot read; the "write" here is about current reads.
To solve the phantom-read problem for current reads, MySQL transactions use the next-key lock.
Next-key Lock
A next-key lock is the combination of a row (record) lock and a gap lock. Row locks were introduced above; now for gap locks.
Row locks prevent conflicts when different transaction versions commit data modifications, but how do we stop other transactions from inserting new rows? Compare the RC and RR levels:
RC level:
Transaction A                                    Transaction B
begin;                                           begin;
select id,class_name,teacher_id
  from class_teacher where teacher_id=30;

id   class_name     teacher_id
2    Second Class   30
update class_teacher
  set class_name='Fourth Class'
  where teacher_id=30;
                                                 insert into class_teacher
                                                   values (null,'Second Class',30);
                                                 commit;
select id,class_name,teacher_id
  from class_teacher where teacher_id=30;

id   class_name     teacher_id
2    Fourth Class   30
10   Second Class   30
RR level:
Transaction A                                    Transaction B
begin;                                           begin;
select id,class_name,teacher_id
  from class_teacher where teacher_id=30;

id   class_name     teacher_id
2    Second Class   30
update class_teacher
  set class_name='Fourth Class'
  where teacher_id=30;
                                                 insert into class_teacher
                                                   values (null,'Second Class',30);
                                                 waiting...
select id,class_name,teacher_id
  from class_teacher where teacher_id=30;

id   class_name     teacher_id
2    Fourth Class   30
commit;
                                                 Only after transaction A commits does
                                                 transaction B's insert execute.
The comparison shows that at the RC level, after transaction A has modified all teacher_id=30 rows, transaction B inserts a new row, and transaction A then finds a strange "extra" teacher_id=30 row that its earlier UPDATE did not touch. This is the phantom read under "current read".
At the RR level, transaction A's UPDATE takes locks that prevent transaction B from inserting new rows, so the data transaction A reads before and after its UPDATE stays consistent and the phantom read is avoided. The lock that prevents the insert is the gap lock.
This is how MySQL implements it:
In the class_teacher table, teacher_id is an index, so InnoDB maintains a B+ tree over it; for simplicity we depict it as a linked list (it is really a tree structure, but the principle is the same).
As shown in the figure, InnoDB uses a clustered index, and teacher_id, as a secondary index, maintains a tree of index values and primary-key ids (shown here as a linked list), kept in order.
InnoDB divides the indexed data into several intervals:
(negative infinity, 5],
(5, 30],
(30, positive infinity)
update class_teacher set class_name='Fourth Class' where teacher_id=30;
not only takes a row lock on the matching data row, but also adds gap locks on the adjacent intervals (5,30) and (30, positive infinity). Transaction B therefore cannot insert new rows into either of these two intervals.
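The interval arithmetic can be sketched as follows, assuming a sorted set of secondary-index keys. This is an illustration of the idea, not InnoDB's actual data structures, and the function names are invented:

```python
def gap_intervals(index_keys):
    """Split the key space into the intervals described above:
    (-inf, k1], (k1, k2], ..., (kn, +inf) for sorted index keys."""
    bounds = [float("-inf")] + sorted(index_keys) + [float("inf")]
    return list(zip(bounds[:-1], bounds[1:]))

def locked_intervals(index_keys, target):
    """For a current read on `target`, a next-key style lock covers the
    interval containing the target, plus the interval just after an
    existing index key equal to the target."""
    return [(lo, hi) for lo, hi in gap_intervals(index_keys)
            if lo < target <= hi or lo == target]
```

With index keys 5 and 30, an update on teacher_id=30 locks both (5,30] and (30,+inf), while an update on the non-existent teacher_id=20 locks only the gap (5,30] that would contain it, matching the examples in the text.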
Limited by this implementation, InnoDB often locks intervals that do not strictly need locking, as shown below:
Transaction A                       Transaction B                      Transaction C
begin;                              begin;                             begin;
select id,class_name,teacher_id
  from class_teacher;

id   class_name     teacher_id
1    First Class    5
2    Second Class   30
update class_teacher
  set class_name='First Class'
  where teacher_id=20;
                                    insert into class_teacher
                                      values (null,'First 35 Class',10);
                                    waiting...
                                                                       insert into class_teacher
                                                                         values (null,'First 35 Class',40);
                                                                       commit;
commit;
                                    After transaction A commits, this
                                    insert succeeds.
                                    commit;
The updated teacher_id=20 lies in the (5,30] interval: even though no rows are actually modified, InnoDB adds a gap lock on this interval. Other intervals are unaffected, which is why transaction C's insert proceeds normally.
If the filter uses a field with no index, e.g. update class_teacher set teacher_id=7 where class_name='First Class 38' (even if it matches no rows at all), gap locks are added across the whole table. And unlike the earlier row-lock case, the MySQL server cannot automatically release the locks on non-matching records after filtering: without an index the field values are not ordered, so there are no intervals to narrow down. Other transactions cannot insert any data until the transaction commits.
Row locks stop other transactions from modifying or deleting rows; gap locks stop other transactions from inserting new ones. Together, the next-key locks formed by combining row locks and gap locks solve the phantom-read problem for writes at the RR level.
Serializable
This level is very simple: reads take shared locks, writes take exclusive locks, and reads and writes are mutually exclusive. It follows pessimistic-locking theory; the implementation is simple, the data very safe, but the concurrency is very poor. Use this mode only if your concurrency is very low and your business requires the data to be absolutely current and reliable.
One aside worth making here: don't assume SELECT never takes locks. At the SERIALIZABLE level, it does!