MySQL MVCC (Multi-Version Concurrency Control)
I have been reading about databases recently. After coming across MySQL MVCC I looked through various materials, but after several days of reading I still did not understand how it works. Today I found an article that explains it thoroughly and clearly, so I am recording it here.
How does MySQL implement MVCC? Countless people ask this question, but Google offers no clear answer. This article attempts to find the answer in the MySQL source code.
In MySQL, MVCC is supported by the InnoDB storage engine. InnoDB adds three hidden fields to each row record:
- 6-byte transaction ID (DB_TRX_ID)
- 7-byte rollback pointer (DB_ROLL_PTR)
- Hidden row ID (DB_ROW_ID)
The 6-byte transaction ID identifies the transaction that last modified the row. Understanding the 7-byte rollback pointer requires an understanding of InnoDB's transaction model.
1. InnoDB transaction-related concepts
To support transactions, InnoDB introduces the following concepts:
- Redo log
The redo log records the changes made by executed statements in a dedicated log file, so that MySQL can replay those changes during recovery. (InnoDB logs the modifications to data pages rather than the SQL text itself.) When the client executes a data-changing statement such as UPDATE, the redo record is first written to the log buffer. When the client executes COMMIT, the contents of the log buffer are flushed to disk as needed. The redo log exists as independent files on disk: the InnoDB log files.
- Undo log
In contrast to the redo log, the undo log is used for rollback. Before a transaction modifies a row, the row's previous content is copied to the undo buffer, and the undo buffer is flushed to disk at an appropriate time. Like the redo buffer, the undo buffer is a ring buffer; when it fills up, its contents are flushed to disk. Unlike the redo log, the undo log has no separate file on disk. All undo logs are stored in the main data file (the shared tablespace), even if each table is configured to use its own data file.
- Rollback segment
The concept of rollback segments comes from the Oracle transaction model. In InnoDB, the undo log is divided into multiple segments; the undo log for a specific row is stored in one particular segment, called a rollback segment. For the purposes of this article, undo log and rollback segment can be treated as meaning the same thing.
- Lock
InnoDB provides row-level locks. If the number of rows is very large, the number of locks under high concurrency can also be large. According to the InnoDB documentation, InnoDB stores locks space-efficiently, so memory is not exhausted even under high concurrency.
There are two types of row locks: exclusive locks and shared locks. Shared and exclusive locks are equivalent to read and write locks. If a transaction is updating a row (exclusive lock), other transactions, whether reading or writing, must wait. If a transaction is reading a row (shared lock), other reads do not have to wait, but writes do. Shared locks ensure that multiple reads do not wait for one another, but whether a lock is requested at all depends on MySQL's transaction isolation level.
- Isolation level
The isolation level limits the degree to which transactions can interact with one another directly. The standard defines four levels:
- READ_UNCOMMITTED: dirty reads allowed
- READ_COMMITTED: read committed
- REPEATABLE_READ: repeatable read
- SERIALIZABLE: serializable
InnoDB supports all four levels. Dirty read and serializable have few application scenarios; read committed and repeatable read are the widely used ones. How they are implemented is described later.
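The shared and exclusive row lock rules described under the Lock item above can be sketched as a small compatibility check. This is only an illustrative model: `LockMode`, `is_compatible`, and `can_grant` are hypothetical names, not InnoDB identifiers.

```python
from enum import Enum


class LockMode(Enum):
    SHARED = "S"      # taken by a locking read
    EXCLUSIVE = "X"   # taken by an update


def is_compatible(held: LockMode, requested: LockMode) -> bool:
    """Return True if `requested` can be granted while `held` is in place."""
    # Only shared + shared can coexist; anything involving X must wait.
    return held is LockMode.SHARED and requested is LockMode.SHARED


def can_grant(held_locks: list, requested: LockMode) -> bool:
    """A request is granted only if it is compatible with every held lock."""
    return all(is_compatible(held, requested) for held in held_locks)
```

With two readers already holding shared locks on a row, a third shared request passes `can_grant`, while an exclusive request has to wait.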
2. Row update process
The following demonstrates how transactions update a single row record.
1. Initial data row
F1 to F6 are the names of the row's columns, and 1 to 6 are the corresponding values. The three hidden fields that follow hold the row's hidden ID, transaction ID, and rollback pointer. If the row has just been inserted, the ID is 1 and the other two fields are empty.
2. Transaction 1 changes the values of the row
When transaction 1 changes the values of the row, it performs the following operations:
- Lock the row with an exclusive lock
- Write the redo log
- Copy the row's pre-modification values to the undo log (the lower row in the figure)
- Modify the values of the current row, fill in the transaction ID, and point the rollback pointer at the pre-modification row in the undo log
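The four steps above can be sketched in a few lines of Python. This is a toy in-memory model, not InnoDB's actual data structures: the `Row` and `RowVersion` classes and their methods are hypothetical, the lock check raises instead of waiting, and the initial row is assumed to have an empty transaction ID and rollback pointer.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class RowVersion:
    values: dict                             # column name -> value
    trx_id: Optional[int] = None             # hidden transaction ID (DB_TRX_ID)
    roll_ptr: Optional["RowVersion"] = None  # hidden rollback pointer (DB_ROLL_PTR)


class Row:
    """One row plus its chain of older versions (hypothetical model)."""

    def __init__(self, values: dict):
        self.current = RowVersion(dict(values))
        self.locked_by: Optional[int] = None  # holder of the exclusive lock
        self.redo_log: list = []

    def update(self, trx_id: int, changes: dict) -> None:
        # Step 1: take the exclusive lock (this sketch fails instead of waiting).
        if self.locked_by not in (None, trx_id):
            raise RuntimeError("row is locked by another transaction")
        self.locked_by = trx_id
        # Step 2: record the change in the redo log.
        self.redo_log.append((trx_id, dict(changes)))
        # Step 3: copy the pre-modification row into the undo log.
        before = RowVersion(dict(self.current.values),
                            self.current.trx_id,
                            self.current.roll_ptr)
        # Step 4: modify in place, stamp the transaction ID, link the copy.
        self.current.values.update(changes)
        self.current.trx_id = trx_id
        self.current.roll_ptr = before

    def commit(self, trx_id: int) -> None:
        # Releasing the lock is all this sketch does on commit.
        if self.locked_by == trx_id:
            self.locked_by = None

    def versions(self) -> Iterator[RowVersion]:
        """Walk from the current row back through the rollback pointers."""
        version = self.current
        while version is not None:
            yield version
            version = version.roll_ptr
```

Updating the same row from transaction 1 and then from transaction 2 leaves two older versions reachable from the current row, each one linked to the next through `roll_ptr`.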
3. Transaction 2 changes the values of the row
The steps are the same as for transaction 1. At this point there are two row records in the undo log, linked together through the rollback pointers. If the undo log were never deleted, the row's initial content could always be traced back through the rollback pointer of the current record. However, InnoDB has a purge thread that looks for undo log records older than the oldest active transaction and deletes them, so that the undo log does not grow without bound.
4. Transaction commit
When a transaction commits normally, InnoDB only needs to change the transaction state to COMMIT; no additional work is required. Rollback is slightly more complex: the pre-modification version must be found in the undo log through the current rollback pointer and restored. If the transaction affects many rows, rollback may be inefficient. Based on experience, InnoDB remains very efficient when a transaction touches between 1,000 and 10,000 rows. Clearly, InnoDB is a storage engine in which COMMIT is more efficient than ROLLBACK. It is said that the PostgreSQL implementation is exactly the opposite.
5. Insert undo log
The process above describes the transaction process for UPDATE. In fact, the undo log is divided into the insert undo log and the update undo log. Because no original data exists for an insert, the insert undo log can simply be discarded on rollback, whereas the update undo log must follow the process above.
3. Transaction levels
As we all know, updates (update, insert, delete) are transactional. In InnoDB, queries are transactions too: read-only transactions. When read and write transactions access the same row concurrently, what the reader sees depends on the transaction level:
- READ_UNCOMMITTED
Under read uncommitted, a read transaction reads the master record directly, regardless of whether the updating transaction has completed.
- READ_COMMITTED
Under read committed, a read transaction reads the most recent committed version each time, taking it from the undo log if the master record is still uncommitted. Two reads of the same field may therefore return different data (a non-repeatable read), but each read returns the latest committed data.
- REPEATABLE_READ
Each read returns the same specified version, so reads within the transaction are repeatable, but the latest data may not be read.
- SERIALIZABLE
Reads take locks, so reads and writes block each other. Rarely used.
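The difference between the first three levels comes down to which version of the row a non-locking read picks. A minimal sketch, assuming a much simplified visibility rule: `Version`, `consistent_read`, and the two sets of committed transaction IDs are hypothetical and stand in for InnoDB's real read view mechanism. SERIALIZABLE is not modeled because it relies on locking rather than on version selection.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Version:
    value: int
    trx_id: int                       # transaction that wrote this version
    prev: Optional["Version"] = None  # older version, reached via the rollback pointer


def consistent_read(latest: Version, level: str,
                    committed_now: Set[int],
                    committed_at_snapshot: Set[int]) -> int:
    """Pick the value a non-locking read sees under the given level."""
    if level == "READ_UNCOMMITTED":
        # Read the master record directly, committed or not.
        return latest.value
    if level == "READ_COMMITTED":
        visible = committed_now            # re-evaluated on every read
    elif level == "REPEATABLE_READ":
        visible = committed_at_snapshot    # fixed for the whole transaction
    else:
        raise ValueError(f"level not modeled in this sketch: {level}")
    version = latest
    # Follow the rollback pointers until a visible version is found.
    while version is not None and version.trx_id not in visible:
        version = version.prev
    if version is None:
        raise LookupError("no visible version")
    return version.value
```

Because `committed_now` changes between reads while `committed_at_snapshot` does not, the same function returns different values under READ_COMMITTED over time and a stable value under REPEATABLE_READ.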
A read transaction is generally triggered by a SELECT statement, which InnoDB guarantees to be non-blocking. The exception is SELECT ... FOR UPDATE, which places an exclusive lock on the rows, waits for the updating transaction to complete, and then reads its latest content. InnoDB's design goal is to provide efficient, non-blocking queries.
4. MVCC
The process above creates an undo log record before each update, and reads are served from it without blocking according to the policies above. The rows in the undo log are the "multiple versions" in MVCC. This may differ greatly from MVCC as we usually understand it. We generally think MVCC has the following features:
- Each row of data has a version, which is incremented every time the data is updated.
- To modify data, a transaction copies the current version and modifies the copy freely, without interference between transactions.
- On save, the version numbers are compared. If the comparison succeeds (commit), the copy overwrites the original record; if it fails, the copy is discarded (rollback).
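The three features above amount to a compare-on-save scheme. A minimal sketch for a single value, with hypothetical names (`Versioned`, `OptimisticStore`); the internal `threading.Lock` is an assumption of this sketch and only makes the compare-and-overwrite step atomic in Python, it is not held while the transaction works on its copy.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    version: int
    value: int


class OptimisticStore:
    """Single value guarded by a version number (hypothetical model)."""

    def __init__(self, value: int):
        self._current = Versioned(0, value)
        self._guard = threading.Lock()  # makes compare-and-save atomic only

    def read(self) -> Versioned:
        # The caller gets an immutable copy to work on.
        return self._current

    def save(self, based_on: Versioned, new_value: int) -> bool:
        with self._guard:
            if self._current.version != based_on.version:
                # Someone else saved first: the caller discards its copy.
                return False
            # Versions match: overwrite the original and bump the version.
            self._current = Versioned(based_on.version + 1, new_value)
            return True
```

If two transactions read the same version and both try to save, the first `save` returns True and the second returns False, which corresponds to commit and rollback in the list above.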
That is, each row has a version number, and whether a save succeeds is decided by comparing version numbers, which sounds like optimistic locking. InnoDB's implementation, by contrast, works as follows:
- A transaction modifies the original data under an exclusive lock.
- The pre-modification data is stored in the undo log and linked to the master record through the rollback pointer.
- If the modification succeeds (commit), nothing more is done; if it fails, the data is restored from the undo log (rollback).
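For contrast with the version-comparing model, the InnoDB approach listed above can be sketched as modify-in-place under a lock, with cheap commit and more expensive rollback. `LockedRow` and its methods are hypothetical; dropping the before-images on commit is a simplification of this sketch, since the text above explains that InnoDB leaves old undo records to the purge thread.

```python
from typing import Optional


class LockedRow:
    """Modify in place under an exclusive lock (hypothetical model)."""

    def __init__(self, value: int):
        self.value = value                # the master record
        self.owner: Optional[int] = None  # transaction holding the exclusive lock
        self.undo: list = []              # before-images, newest last

    def modify(self, trx_id: int, new_value: int) -> None:
        if self.owner not in (None, trx_id):
            raise RuntimeError("row is locked by another transaction")
        self.owner = trx_id
        self.undo.append(self.value)      # save the before-image first
        self.value = new_value            # then change the master record

    def commit(self, trx_id: int) -> None:
        # Commit does no data work: the new value is already in place.
        if self.owner != trx_id:
            raise RuntimeError("transaction does not hold the lock")
        self.undo.clear()                 # simplification: no purge thread here
        self.owner = None

    def rollback(self, trx_id: int) -> None:
        # Rollback walks the before-images backwards and restores them.
        if self.owner != trx_id:
            raise RuntimeError("transaction does not hold the lock")
        while self.undo:
            self.value = self.undo.pop()
        self.owner = None
```

The asymmetry is visible in the method bodies: `commit` only releases the lock, while `rollback` has to restore every saved before-image, which matches the observation that COMMIT is cheaper than ROLLBACK.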
The most essential difference between the two is whether the data must be locked with an exclusive lock when it is modified. If it is locked, is it still MVCC?
InnoDB's implementation is not really MVCC, because it does not implement the core idea of multiple versions coexisting. The content of the undo log is only the result of serialization: it records the history of multiple transactions and does not amount to multi-version coexistence. However, the ideal MVCC is hard to implement. When a transaction modifies only one row, the ideal MVCC works: the transaction can be rolled back by comparing version numbers. When a transaction affects multiple rows, the ideal MVCC is powerless.
For example, suppose Transaction1 runs under the ideal MVCC, modifies Row1 successfully, but fails to modify Row2. Row1 then needs to be rolled back. But because Row1 is not locked, its data may already have been modified by Transaction2. Rolling back Row1 at this point would destroy Transaction2's modification, causing Transaction2 to violate ACID.
The root cause of the difficulty is that the ideal MVCC attempts to replace two-phase commit with optimistic locking. Modifying two rows while keeping them consistent is essentially no different from modifying data in two distributed systems, and two-phase commit is the only way to guarantee consistency in that scenario. The essence of two-phase commit is locking, while the essence of optimistic locking is eliminating locks; the two contradict each other. Therefore the ideal MVCC cannot really be applied in practice. InnoDB only borrows the MVCC name and provides reads that do not block.
5. Summary
This does not mean that MVCC is useless. It can still play a role in scenarios with low consistency requirements and in scenarios that operate on a single piece of data.
For example, when multiple transactions change the number of online users at the same time, a transaction whose update fails can recompute and retry until it succeeds. Using MVCC in this case greatly increases concurrency and eliminates thread locks.
From: http://blog.csdn.net/chen77716/article/details/6742128