InnoDB MVCC Implementation Mechanism

Source: Internet
Author: User
Tags rollback

Multi-version concurrency control

Most of MySQL's transactional storage engines, such as InnoDB, Falcon, and PBXT, do not rely on a simple row-locking mechanism alone. They combine row locks with a technique for improving concurrency known as MVCC (multi-version concurrency control). MVCC is not unique to MySQL; other databases such as Oracle and PostgreSQL use it as well.

MVCC avoids the need for locking in many situations and so reduces overhead. Depending on how it is implemented, it can allow non-blocking reads while locking only the necessary records during write operations.

MVCC works by keeping a snapshot of the data as it existed at a point in time. This means that a transaction sees consistent data no matter how long it runs. It also means that, at the same moment, different transactions can see different data in the same table. If you have never experienced this, it may sound confusing, but it becomes easy to understand with familiarity.

Each storage engine implements MVCC differently. The variations include optimistic and pessimistic concurrency control. We use a simplified version of InnoDB's behavior to illustrate how MVCC works.

Before going further, we need a little background:

First, the basic knowledge

Transaction: A transaction is a group of SQL statements that is treated atomically, as a single unit of work. If MySQL executes every statement in the unit successfully, the transaction succeeds and all of its statements take effect on the data. If any statement fails or raises an error, the transaction fails and none of its changes take effect (the data is restored by rolling back). A transaction has four properties:

1. Atomicity: a transaction is an indivisible unit of work; it is either executed in full or not executed at all.
2. Consistency: a transaction always moves the database from one consistent state to another consistent state.
3. Isolation: the results of a transaction's operations are visible inside that transaction but are generally invisible to other transactions until it commits.
4. Durability: while a transaction is uncommitted, its changes can be rolled back; once it commits, its changes are permanent (they can of course still be changed by later updates).

PS: The MyISAM engine does not support transactions, so it is best not to mix engines (such as InnoDB and MyISAM) within one transaction. If you do, the changes made to the non-transactional tables cannot be rolled back.

Read lock: Also known as a shared lock or S lock. If transaction T places an S lock on data object A, then T can read A but cannot modify it, and other transactions can place only S locks on A, not X locks, until T releases its S lock. This ensures that other transactions can read A but cannot modify it until T releases the S lock.

Write lock: Also known as an exclusive lock or X lock. If transaction T places an X lock on data object A, then T can both read and modify A, and no other transaction can place any lock on A until T releases its lock. This ensures that other transactions can neither read nor modify A until T releases the lock.

Table lock: The locked object is an entire table. Most MySQL storage engines support this strategy (including the commonly used InnoDB). It has the lowest system overhead but also the lowest concurrency. If transaction T places a read lock on the whole table, other transactions can read the table but not write to it; if T places a write lock, other transactions can neither insert, delete, update, nor query.

Row-level lock: The locked object is a single row of a table. Row-level locks are widely used together with MVCC, but MyISAM does not support them. They are implemented by the storage engine rather than by the MySQL server. Row-level locks have higher overhead but handle concurrency better.

MVCC: Multi-version concurrency control (MVCC, Multiversion Concurrency Control). In general, a transactional storage engine does not rely on table locks and row-level locks alone; it combines them with MVCC to handle more concurrency problems. MVCC gives the highest concurrency, but its overhead is also the highest compared with table locks and row-level locks.

Autocommit: autocommit is a MySQL system variable. By default autocommit=1, which means MySQL automatically commits every SQL statement that is not part of an explicit transaction. So, to run multi-statement transactions, set autocommit to 0, which can be done with "SET SESSION autocommit=0;".
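The compatibility rules for read (S) and write (X) locks described above can be sketched as a small lookup table. This is only an illustration of the rules in the text, not InnoDB's lock manager; the names COMPATIBLE and can_grant are made up for the example.

```python
# Compatibility of shared (S) and exclusive (X) locks on one data object.
# Key: (lock already held by another transaction, lock being requested).
# This table only restates the rules in the text; it is not InnoDB code.
COMPATIBLE = {
    ("S", "S"): True,   # several readers may share the object
    ("S", "X"): False,  # a writer must wait for readers
    ("X", "S"): False,  # a reader must wait for the writer
    ("X", "X"): False,  # writers exclude each other
}


def can_grant(requested, held_by_others):
    """Return True if `requested` is compatible with every lock that
    other transactions currently hold on the object."""
    return all(COMPATIBLE[(held, requested)] for held in held_by_others)


print(can_grant("S", ["S", "S"]))  # True
print(can_grant("X", ["S"]))       # False
print(can_grant("S", ["X"]))       # False
print(can_grant("X", []))          # True
```

Only the S/S combination is compatible, which is why lock-only schemes make readers and writers wait for each other.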

Second, MVCC implementation principles, with examples (including tests)

First: Here is the explanation that appears almost word for word everywhere online, including in "High Performance MySQL, Second Edition (Chinese version)". It is easy to follow, but I think it is inaccurate in two places. Read the content first; I have marked the questionable places with (1), (2), ... and discuss each of them after the quotation. PS: this is only a surface-level understanding; the deeper explanation comes in the third part.

----------------------------------------------

InnoDB implements MVCC by storing two (1) additional hidden fields with each row, which record when the row was created and when it was deleted. Rather than storing the actual times at which these events occurred, the row stores the system version number at the time each event occurred. This number increases each time a transaction begins. Each transaction records the system version number as of the time it began, and each query checks every row's version numbers against the version of its transaction. Let's see how this applies to particular operations when the transaction isolation level is REPEATABLE READ.

SELECT: InnoDB checks each row to make sure it meets two criteria:

1. InnoDB finds only rows whose version is at least as old as the current transaction's version (that is, the row's version number must be less than or equal to the transaction's version number). This ensures that the row either existed before the transaction began or was created or modified by the transaction itself.
2. The row's deletion version must be undefined or greater than the current transaction's version number. This ensures that the row was not deleted before the transaction began (2).

Rows that meet both criteria are returned as the query result.

INSERT: InnoDB records the current system version number as the creation ID of each new row.

DELETE: InnoDB records the current system version number as the deletion ID of each deleted row.

UPDATE: InnoDB writes a new copy of the row, using the system version number as the new row's version. It also writes the system version number as the deletion version of the old row.

----------------------------------------------

(1) There are not two hidden fields but three:

1. DB_TRX_ID: a 6-byte identifier whose value is automatically incremented by 1 for each transaction processed. The "creation time" and "deletion time" mentioned above are values of this DB_TRX_ID. It is recorded for INSERT and UPDATE operations; a DELETE is treated as an update in which a special bit in the row marks it as deleted. DB_TRX_ID is the most important of the three. The current counter can be seen with the statement "SHOW ENGINE INNODB STATUS", as follows:

    ...
    ------------
    TRANSACTIONS
    ------------
    Trx id counter 0 430621
    Purge done for trx's n:o < 0 430136 undo n:o < 0 0
    History list length 7
    ...

2. DB_ROLL_PTR: 7 bytes. It points to an undo log record written to the rollback segment. If the row was updated, the undo log record contains the information needed to rebuild the row as it was before the update.

3. DB_ROW_ID: 6 bytes. Its value increases monotonically as new rows are inserted. When InnoDB generates the clustered index automatically, the clustered index contains this DB_ROW_ID value; otherwise the value does not appear in any index.

(2) The data is not physically deleted at this point; the row is only marked as deleted. The row is physically removed only later, after the deleting transaction has committed. The wording found online is easy to misunderstand.

(3) (Not marked above.) On INSERT, the "creation time" of the row is the DB_TRX_ID of the transaction and the "deletion time" is undefined. On UPDATE, the new copy of the row gets "creation time" = the DB_TRX_ID of the transaction and an undefined deletion time, while the old row keeps its "creation time" and gets deletion time = the DB_TRX_ID of the transaction. On DELETE, the "creation time" of the row is unchanged and its deletion time = the DB_TRX_ID of the transaction. SELECT modifies neither value; it only reads the corresponding data.

Second: The following illustrates graphically how MVCC handles SELECT, INSERT, DELETE, and UPDATE. Assume transactions A, B, C, and D start in that order, and that their DB_TRX_ID values satisfy the following conditions:

[Figure not available: DB_TRX_ID values of transactions A, B, C, and D]


1. B.DB_TRX_ID > A.DB_TRX_ID, because DB_TRX_ID takes its value from the system version number, and the system version number increments automatically, so DB_TRX_ID increments automatically as well. Note one case, however: if, before transaction B starts, a standalone INSERT statement (one without BEGIN and COMMIT) inserts a row, then B.DB_TRX_ID = A.DB_TRX_ID + 1 + 1, so the IDs of two transactions started one after the other do not necessarily differ by exactly 1.
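The case above can be modeled with a single shared counter. This is a toy model of the behavior the text describes, not InnoDB code; the starting value 100 and the function names are made up for the example.

```python
import itertools

# Hypothetical global counter standing in for the system version number.
_system_version = itertools.count(100)


def begin():
    """Start an explicit transaction and return its ID."""
    return next(_system_version)


def standalone_insert():
    """A statement run without BEGIN/COMMIT is a transaction of its own,
    so in this model it consumes an ID from the same counter."""
    return next(_system_version)


a = begin()
implicit = standalone_insert()
b = begin()
print(a, implicit, b)  # 100 101 102
```

Here b equals a + 2, matching the "+1+1" in the text: the IDs still increase, but not always in steps of 1 between explicit transactions.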

InnoDB implements MVCC by storing two additional hidden fields with each row, which record when the row was created and when it was deleted. Rather than storing the actual times at which these events occurred, the row stores the system version number at the time each event occurred. This number increases each time a transaction begins. Each transaction records the system version number as of the time it began, and each query checks every row's version numbers against the version of its transaction. Let's see how this applies when the transaction isolation level is REPEATABLE READ.


SELECT: InnoDB checks each row to make sure it meets two criteria.

InnoDB must find a version of the row that is at least as old as the transaction's version (that is, the row's version number is less than or equal to the transaction's version number). This ensures that the row either existed before the transaction began or was created or modified by the transaction itself.

The row's deletion version must be undefined or greater than the transaction's version number. This ensures that the row was not deleted before the transaction began.

Rows that meet both criteria are returned as the query result.
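The two criteria can be written as a small predicate. This is a minimal sketch of the simplified model in the text, not of InnoDB's real read-view logic; RowVersion and is_visible are names invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    value: str
    created: int                   # system version number at creation
    deleted: Optional[int] = None  # system version number at deletion; None = undefined


def is_visible(row, trx_version):
    """Apply the two SELECT criteria from the text."""
    created_ok = row.created <= trx_version
    not_deleted = row.deleted is None or row.deleted > trx_version
    return created_ok and not_deleted


rows = [
    RowVersion("deleted earlier", created=5, deleted=8),
    RowVersion("live", created=6),
    RowVersion("created later", created=12),
    RowVersion("deleted later", created=7, deleted=11),
]

visible = [r.value for r in rows if is_visible(r, trx_version=10)]
print(visible)  # ['live', 'deleted later']
```

A transaction with version 10 still sees the row that was deleted at version 11, because from its point of view the deletion has not happened yet.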


INSERT: InnoDB records the current system version number as the creation ID of each new row.


DELETE: InnoDB records the current system version number as the deletion ID of each deleted row.


UPDATE: InnoDB writes a new copy of the row, using the system version number as the new row's version. It also writes the system version number as the deletion version of the old row.
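The bookkeeping for the three write operations can be sketched with a toy in-memory table. This follows the simplified model in the text only; the class ToyTable and its row layout are assumptions made for illustration.

```python
class ToyTable:
    """Toy versioned table. Each row is a dict with the keys
    'key', 'value', 'created' and 'deleted' (None = undefined)."""

    def __init__(self):
        self.rows = []

    def _live(self, key):
        # The current (not yet deleted) version of the row with this key.
        return next(r for r in self.rows
                    if r["key"] == key and r["deleted"] is None)

    def insert(self, key, value, version):
        # INSERT: record the system version as the creation ID.
        self.rows.append({"key": key, "value": value,
                          "created": version, "deleted": None})

    def delete(self, key, version):
        # DELETE: record the system version as the deletion ID.
        self._live(key)["deleted"] = version

    def update(self, key, value, version):
        # UPDATE: mark the old copy as deleted and write a new copy.
        self.delete(key, version)
        self.insert(key, value, version)


t = ToyTable()
t.insert("a", 1, version=10)
t.update("a", 2, version=11)
t.delete("a", version=12)
for row in t.rows:
    print(row)
# {'key': 'a', 'value': 1, 'created': 10, 'deleted': 11}
# {'key': 'a', 'value': 2, 'created': 11, 'deleted': 12}
```

No row is overwritten or removed by these operations; every change only adds a version or fills in a deletion ID.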

The result of all this extra record keeping is that most read queries never need to acquire locks. They simply read data as fast as they can, making sure to select only rows that meet the criteria. The drawbacks are that the storage engine has to store more data with each row, do more work when examining rows, and perform some additional housekeeping operations.

MVCC works only at the REPEATABLE READ and READ COMMITTED isolation levels. READ UNCOMMITTED cannot use it, because queries at that level do not read the row version that matches their transaction version; they always read the newest row version. SERIALIZABLE cannot use MVCC because reads at that level always lock rows.

The following table summarizes the locking strategies and concurrency levels of different storage engines in MySQL.

Lock strategy          Concurrency   Overhead   Engines
Table level            Lowest        Lowest     MyISAM, Merge, Memory
Row level              High          High       NDB Cluster
Row level with MVCC    Highest       Highest    InnoDB, Falcon, PBXT, solidDB

When a database is read and written concurrently, a read may see inconsistent data (a dirty read). To avoid this, concurrent access to the database has to be controlled, and the simplest way to do so is locking. Because locks serialize reads and writes, no inconsistent state can be observed. However, reads are then blocked by writes, which significantly reduces read performance. The Java concurrent package contains the CopyOnWrite family of classes, which are designed specifically for read-mostly scenarios. Their optimization is that a write first copies the data, so the original is not affected, then modifies the copy, and finally replaces the old data atomically once the modification is complete; reads only ever see the original data. Writes done this way do not block reads, which improves read efficiency. Writes are still mutually exclusive, and every write makes a copy, so the approach only suits cases with many more reads than writes.
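The copy-on-write idea can be sketched in a few lines. This is not the Java CopyOnWrite classes; it is a minimal Python analogue, and the class name CopyOnWriteList and its methods are made up for the example. It assumes that rebinding an attribute is atomic, which holds in CPython.

```python
import threading


class CopyOnWriteList:
    """Readers never lock; writers copy, modify the copy, then swap."""

    def __init__(self, items=()):
        self._items = tuple(items)           # immutable snapshot
        self._write_lock = threading.Lock()  # writers exclude each other

    def snapshot(self):
        # Readers get the current immutable tuple without taking a lock.
        return self._items

    def append(self, item):
        with self._write_lock:
            new_items = self._items + (item,)  # copy and modify the copy
            self._items = new_items            # swap in the new version


cow = CopyOnWriteList([1, 2])
before = cow.snapshot()   # a reader takes its snapshot
cow.append(3)             # a writer publishes a new version
print(before)             # (1, 2)
print(cow.snapshot())     # (1, 2, 3)
```

The reader that took its snapshot before the write keeps seeing the old data, while later readers see the new version.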

A few more words: MVCC works on a principle similar to copy-on-write. Every read sees a consistent snapshot, and reads can be non-blocking. MVCC allows data to exist in multiple versions, where the version can be a timestamp or a globally incrementing transaction ID, and at the same point in time different transactions may see different data.

Implementation principle:

------------------------------------------------------------> Time axis
|-------R(T1)-----|
      |-----------U(T2)-----------|

For example, suppose there are two concurrent operations R(T1) and U(T2), where T1 and T2 are transaction IDs with T1 less than T2, and the system contains the data a=1 (T1). R and U operate as follows:

R: read a (T1)
U: a=2 (T2)

The version T1 of R (the read operation) indicates the version of the data to be read. Write operations update the version; read operations do not. On the time axis R overlaps with U, and because U commits after R has started, U's change is not visible to R. R therefore reads only the T1 version of the data, that is, a=1.

Because the consistency of the existing data must not be affected before the update commits, the old data is not modified in place. Instead, the update is split into an insert plus a delete: the old data is marked as deleted and the new data is inserted. Only after the update commits can later reads see the new data. A read sees only the writes that were completed before it; writes that are still in progress are invisible to it.

That is a lot of abstract theory. Here is something more concrete: how MySQL's InnoDB engine implements MVCC. InnoDB adds two fields to each row, holding the row's creation version and its deletion version. Both are filled in with the version number of a transaction, and this number is incremented each time a transaction is created. At the REPEATABLE READ isolation level, the database operations are implemented as follows:

Select: InnoDB returns a row only if both of the following conditions are met:
(1) The row's creation version number is less than or equal to the current transaction's version number. This guarantees that every operation on the row was completed before the select.
(2) The row's deletion version number is greater than the current transaction's version number, or is empty. A deletion version number greater than the current version means that a concurrent transaction deleted the row.

Insert: The creation version number of the newly inserted row is set to the current system version number.

Delete: The deletion version number of the row to be deleted is set to the current system version number.

Update: No in-place update is performed; the operation is converted into an insert plus a delete. The deletion version number of the old row is set to the current version number, and the new row is inserted with its creation version number set to the current version number.

Every write operation (insert, delete, and update) increments the system version number.

Because old data is not really deleted, it has to be cleaned up. InnoDB runs a background thread for this cleanup; the rule is to remove the rows whose deletion version number is less than the current system version number. This process is called purge.

MVCC implements transaction isolation well and can reach the REPEATABLE READ level; implementing SERIALIZABLE still requires locking.
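The R(T1)/U(T2) example and the purge rule can be traced with a short script. This is a sketch of the simplified model described in the text, not of InnoDB internals; the functions visible, read, and purge and the concrete version numbers are assumptions made for illustration.

```python
def visible(row, version):
    """SELECT rule: created <= version, and deleted is empty or > version."""
    return row["created"] <= version and (
        row["deleted"] is None or row["deleted"] > version)


def read(rows, version):
    return [r["a"] for r in rows if visible(r, version)]


def purge(rows, system_version):
    """Purge rule from the text: drop rows whose deletion version is
    less than the current system version."""
    return [r for r in rows
            if r["deleted"] is None or r["deleted"] >= system_version]


T1, T2 = 1, 2
rows = [{"a": 1, "created": T1, "deleted": None}]  # a=1 (T1)

# U(T2): a=2, performed as delete + insert rather than in place.
rows[0]["deleted"] = T2
rows.append({"a": 2, "created": T2, "deleted": None})

print(read(rows, T1))  # [1]  R still reads the T1 version
print(read(rows, T2))  # [2]  a reader at version T2 sees the update

rows = purge(rows, system_version=3)
print(rows)  # [{'a': 2, 'created': 2, 'deleted': None}]
```

The old version a=1 stays readable for R until it is purged, which is why the update never has to block the read.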
