MySQL Database InnoDB storage Engine multi-version control (MVCC) Implementation principle analysis

Source: Internet
Author: User
Tags bit set visibility

Wen/Ho Dengcheng

Guide:

The MySQL kernel technology researcher from the NetEase Research Institute, Ho Dengcheng, InnoDB the multi-version control (abbreviated: MVCC) of the MySQL database and the storage engine, doing in-depth research and detailed text chart analysis, Easy to understand the InnoDB storage engine implementation of the multi-version control technology (abbreviation: MVCC).

Basic knowledge

Assume that you have a knowledge of the basics of multi-versioning (MVCC). MySQL database InnoDB storage engine in order to achieve multiple versions of consistent read, the use of a rollback segment based on the protocol.

Row structure

The MySQL database InnoDB storage engine table data is organized as a primary key clustered index. Because of the Index organization table structure, the record's rowID is mutable (when the index page splits, Structure modification operation,smo), so the two-level index uses a combination of (index key value, primary key key value) to uniquely determine a record.

Whether it is a clustered index or a two-level index, each record contains a deleted bit that identifies whether the record was deleted. In addition, the clustered index record has two system columns: Data_trx_id,data_roll_ptr. DATA _trx_id represents the transaction id;data that produces the current record entry _roll_ptr The undo information that points to the current record entry.

Clustered index row structure (section related to multiple versions of consistent read, DELETED bit omitted):

Second-level index row structure:

From the clustered index row structure, and the two-level index row structure, it can be seen that the clustered index contains version information (transaction number + rollback pointer), level Two index does not contain version information, level Two index item visibility how to judge? The following will be given.

Read View

The default isolation level for the INNODB storage engine is repeatable read (RR), which can be read repeatedly. The InnoDB storage engine creates a read View before starting a RR read. Read view is used to determine the visibility of a record. The read view is defined in the Read0read.h file, where the most significant visibility-related properties are as follows:

Dulint low_limit_id;/*the record of the transaction number >= low_limit_id is not visible for the current read view*/Dulint up_limit_id; /*Transaction Number < up_limit_id, which is visible for the current read view*/Ulint N_trx_ids; /*Number of cells in the trx_ids array*/Dulint* TRX_IDS;/*Additional Trx IDs which the read should not see:typically, these is the active trans Actions at the time of the read is serialized, except the reading transaction itself; The TRX IDs in this array is in a descending order*/Dulint creator_trx_id; /*TRX ID of creating transaction, or (0, 0) used in purge*/

Simply put, the Read view record reads at the beginning of all the active transactions, and the modifications made by these firms are not visible to the read view. In addition, all other records that are smaller than the transaction number that created the read view are visible. Visible includes two levels of meaning:

    • The record is visible, and deleted bit = 0; The current record is a valid record that is visible.
    • The record is visible, and deleted bit = 1; The current record is a visible delete record. This record was deleted before this transaction started.

Test method:
–Create Table  and IndexCreate TableTest (IDint Primary Key, commentChar( -)) engine=InnoDB;Create IndexTest_idx ontest (comment); –InsertInsert  intoTestValues(1, ' AAA ');Insert  intoTestValues(2, ' BBB '); –Update Primary KeyUpdateTestSetId= 9 whereId= 1; –UpdateNon-Primary Key  withdifferent valueUpdateTestSetComment=' CCC 'whereId= 9; –UpdateNon-Primary Key  withSame ValueUpdateTestSetComment=' BBB 'whereId= 2  andComment=' BBB ';

–read Isolation Level

Repeatable Read (RR)

Test results

Update PRIMARY key

Code Invocation Process:

 -  -  -  -  -  -  Row_ins_index_entry

To put it simply, the old record mark bits of the cluster index are deleted, and a new record is inserted. After the statement is executed, the data structure is as follows:

The old version is still stored in the clustered index, and its data_trx_id is set to the 1811,deleted bit set to the transaction id = 1809 of the pre-image recorded in 1,undo. The new version data_trx_id is also 1811. With this diagram, you can also find that although the old and new version is a record, it is identified by two records in the clustered index. At the same time, because the primary key is updated, the level two index needs to be updated accordingly (the primary key entry is included in the level two index).

Update non-primary key (diff value)

Update the comment field, the code invocation process is different from the above, and can be traced by itself, omitted here. After the update operation has finished executing, the index structure changes as follows:

From the visible, when updating the key value of a level two index, the clustered index itself does not produce a new record entry, but instead records the old version information in Undo. At the same time, a level two index will produce a new index entry with the PK value remaining unchanged, pointing to the same record in the clustered index. The attentive reader may find that the Level two index page has a max_trx_id that records the maximum transaction ID for the update Level two index page. With max_trx_id filtering, InnoDB is able to achieve most of the secondary index coverage scans (just scan the secondary index and do not need to back up the clustered index). The specific filtering method will be given in the following section.

Update Non-primary key (same value)

The last test case is to update the comment item to the same value. In my test, the index structure after the update is as follows:

The clustered index is still updated, but the level two index remains unchanged.

Summarize
    1. Whether it is a clustered index or a two-level index, a new version is generated whenever its key value is updated. Set the old version data deleted BTI to 1, and insert the new version.
    2. For clustered indexes, if the update operation does not update primary key, the update does not produce a new version, but is updated on the previous version, and the old version enters the undo tablespace and is rolled back by the undo pointer on the record.
    3. For a level two index, a level two index record remains unchanged if the update operation does not update its key value.
    4. For a level two index, the update operation will cause the new version data to be generated by the two-level index regardless of whether the update primary key or the two-level index key value.
    5. When the clustered index setting records deleted bit, the data_trx_id column is also updated. The old version data_trx_id enters the undo tablespace; When the secondary index is set deleted bit, Undo is not written.

Visibility judgment

Primary Key Lookup

SELECT * FROM test where id = 1;

    • For Test 1, if 1811 (data_trx_id) < read_view.up_limit_id, the record 1 that is marked as deleted is visible. Delete visible--no record returned.
    • For Test 1, if 1811 (data_trx_id) >= read_view.low_limit_id, proves that the record 1 marked for deletion is not visible, and the record is rolled back by data_roll_ptr, data_trx_id = 1809 is obtained. If 1809 is visible, the record (1,AAA) is returned, otherwise no records are returned.
    • For Test 1, if up_limit_id,low_limit_id cannot determine visibility, then traverse the trx_ids in Read_view, comparing the transaction ID in turn, if it is not visible in the Trx_ids array (update uncommitted).

SELECT * FROM Test where id = 9;

    • For Test 2, if 1816 is visible, return (9,CCC).
    • For Test 2, if 1816 is not visible, rollback to 1811 via Data_roll_ptr, if 1811 is visible, return (9, AAA).
    • For Test 2, if 1811 is not visible, no result is returned.

SELECT * from Test where ID > 0;

    • For Test 1, the same record that satisfies the condition in the index, has two versions (version 1,delete bit =1). So will a record be returned two times? Certainly not, this is because the visibility of PK = 1 is consistent with the visibility of PK = 9, and PK = 1 is the version that marks the deleted bit. If transaction id = 1811 is visible. Then pk = 1 Delete is visible, no record returned, PK = 9 Returns the record, if 1811 is not visible, rollback to 1809 is visible, then PK = 1 returns record, PK = 9 no record after rollback.

Summary :

    1. Through the primary key to find records, need to cooperate with Read_view, record data_trx_id, record data_roll_ptr pointer common judgment.
    2. Read_view is used to determine whether the current record is visible (judging data_trx_id). Data_roll_ptr is used to roll back the current record to the previous version.

Non-primary Key lookup

Select comment from test where comment > ';

    • For the test 2, Level Two index, the maximum update transaction max_trx_id = 1816 for the current page. If max_trx_id < read_view.up_limit_id, all data on the current page is visible, this page can be scanned for index coverage. Discards all deleted bit = 1 records, returns the record of deleted bit = 0, and returns at this time (CCC). (Row_select_for_mysql-Lock_sec_rec_cons_read_sees)
    • For test 2, level two indexes, if the current page does not meet max_trx_id < read_view.up_limit_id, the current page cannot be indexed for coverage scanning, at which point the visibility needs to be determined for each item in the clustered index. Back to test 2, the two-level index has two PK = 9 (one deleted bit = 1 and the other 0), and the corresponding clustered index has only one item pk= 9. How can I guarantee that multiple versions of the same record coming through a level two index will be returned at most once in the clustered index? If the current transaction ID 1811 is visible. Secondary index PK = 9 of records (two items), with the clustered index of undo, are positioned to the same record entry. At this point, InnoDB is guaranteed to come from a level two index with multiple version entries that point to the same clustered index record, with at most one version returning the data, using one of the following expressions:
  if (Clust_rec    && (old_vers | | Rec_get_deleted_flag (Rec,dict_table_is_comp (sec_index,table))) &&!row_sel_sec_rec_is_for_clust_rec (REC, Sec_index, Clust_rec, Clust_index))

All clustered index records that satisfy if judgment are discarded directly, the logic of the above judgments is as follows:

    1. Need to return a clustered index scan and get a record
    2. A clustered index record is a rollback version, or a record in a level two index is a deleted version
    3. Clustered index entries, with two-level index entries whose key values are not equal

Why is it possible to discard data directly if the If judgment is satisfied? In the vernacular, that is, we pass the two level index record, locate the clustered index record, after locating, also need to check again whether the clustered index record is still the record that I see in the Level two index. If not, it is discarded directly, and if it is, it is returned.

Based on this condition, combine the query with the index structure in test 2. The visible version is transaction 1811. Both PK = 9 in the two-level index can be rolled back to the 1811 version via the clustered index. However, the level two index record (ccc,9) is not consistent with the version (aaa,9) after the clustered index rollback, and is discarded directly. Only level two index records (aaa,9) remain consistent and return directly.

Summary :

    1. The multi-version visibility of a secondary index needs to be done through a clustered index.
    2. MAX_TRX_ID is saved in the secondary index page to quickly determine whether all items are visible in the current page and can achieve index overlay scanning at level two index pages. In general, this judgment is conditional and guarantees the efficiency of index-only scan.
    3. Items in a secondary index need to be compared to the visibility in the clustered index, ensuring that the visible items in the clustered index are consistent with the item data in the two-level index.

Questions
    1. In Http://blogs. In innodb.com/wp/2011/04/mysql-5-6-multi-threaded-purge/, the author mentions that the purge operation of the InnoDB storage engine is done by iterating through undo to implement the collection of the token bit deleted items. If the level two index itself marks the deleted bit does not record undo, then how does this recycle operation complete? Or is it that purge is recycled by parsing redo? (This issue is resolved according to the process analysis for purge below)

Purge process

Purge Features:

InnoDB because the multi-version protocol is to be supported, both the update and the delete are simply setting the deleted bit mark bit on the record, rather than actually deleting the record. The subsequent actual deletion of these records is achieved through the purge daemon process. The purge process periodically scans the undo of InnoDB, reading each undo record in the order of first reading old undo and then reading the new undo. For each undo record, determine whether its corresponding record can be purge (the purge process has its own read view, which is equivalent to the view before the oldest active transaction at the beginning of the process, guaranteeing purge data, which must be invisible to anyone), If purge is possible, the full record (ROW_PURGE_PARSE_UNDO_REC) is constructed. Then follow the purge two index first, and finally purge the order of the clustered index, purge a complete record of the old version generated by an operation.

A complete purge function call flow is as follows:

row_purge_step->row_purge->trx_purge_fetch_next_rec->Row_purge_parse_undo_rec->row_purge_ del_mark->Row_purge_remove_sec_if_poss->row_purge_remove_clust_if_poss

Summarize:

    1. The purge is implemented by iterating through undo.
    2. The granularity of a purge is an action on a record. If a record has been update 3 times, resulting in 3 old versions, all can be purge. Then purge reads undo, and for each operation, a purge is called. An purge deletes an old version of an operation (as per the operation from the old to the new Order).
    3. Purge is performed in the order of the first two-level index and the last clustered index.
    4. Purge a two-level index to find and locate by constructing an index entry. You cannot do this directly for a level two page, because you do not know where the record is stored page.
    5. For level two indexing settings deleted bit is not required to record undo because purge is implemented according to the clustered index undo. So the level two index deleted bit is set to 1, no undo is recorded, and can still be purge.
    6. Purge is a time-consuming operation. Secondary index purge, need to search_path location data, equivalent to each level two index, have done an index unique scan.
    7. Once delete operation, IO doubles. The first IO is to set the record's deleted bit to 1, and the second IO is to delete the record.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.