Principle Analysis of MySQL database InnoDB Storage engine multi-Version Control (MVCC)

Source: Internet
Author: User

Text/He dengcheng

Guidance:

He dengcheng, a researcher from the MySQL kernel technology of Netease Research Institute, implements the multi-Version Control (MVCC) of the InnoDB Storage engine of MySQL database, after in-depth research and detailed analysis of text charts, you can easily understand the multi-version control technology (MVCC) implemented by the InnoDB Storage engine ).

Basic knowledge

Assume that you have an understanding of the basic knowledge of Multi-Version Control (MVCC. The InnoDB Storage engine of MySQL Databases uses a rollback segment-based protocol to achieve consistent read across multiple versions.

Row Structure

The organization of table data in the InnoDB Storage engine of MySQL database is a primary key clustered index. Because the table structure is organized by indexes, the record rowid is variable (Structure Modification Operation, SMO) when the index page is split, so what is used in the secondary index is (index key value, key value) to uniquely identify a record.

Each record of a clustered index or secondary index contains a deleted bit to identify whether the record is deleted. In addition, the clustered index record has two system columns: data_trx_id and data_roll_ptr. Data _ trx_id indicates the transaction ID that generates the current record item; Data _ roll_ptr indicates the Undo information of the current record item.

The row structure of the clustered index (the part related to consistent read of multiple versions is omitted by deleted bit ):

Secondary index row structure:

From the clustered index row structure and the secondary index row structure, we can see that the clustered index contains version information (transaction number + rollback pointer), and the secondary index does not contain version information, how can we determine the visibility of secondary index items? The following will be given.

 

Read View

The default isolation level of the InnoDB Storage engine is Repeatable read (RR), which can be re-read. Before starting an RR read, the InnoDB Storage engine creates a read view. Read view is used to determine the visibility of a record. Read view is defined in read0read. h. The main visibility-related attributes are as follows:

?
123456789101112131415161718192021 Dulint low_limit_id; /* Transaction number> = low_limit_id record, which is invisible to the current read view */   Dulint up_limit_id; /* Transaction number <up_limit_id, which is visible to the current read view */   Ulint n_trx_ids; /* Number of cells in the trx_ids array */   Dulint * trx_ids; /* Additional TRX IDs which the read shold   Not see: Typically, these are the active   Transactions at the time when the read is   Serialized, cannot the reading transaction   Itself; the TRX IDs in this array are in   Descending Order */   Dulint creator_trx_id; /* Trx id of creating transaction, or   (0, 0) used in purge */

To put it simply, the read view records all the active transactions at the beginning of the read, and the modifications made by these transactions are invisible to the read view. In addition, all other records that are less than the transaction number of the created read view can be seen. It can be seen that there are two meanings:

    • The record is visible, and deleted bit = 0; the current record is visible and valid.
    • The record is visible, and deleted bit = 1; the current record is a visible Delete record. This record has been deleted before the transaction starts.

 

Test method: ?
12345678910111213141516 - Create Table And Index Create Table Test (ID Int Primary Key , Comment Char (50) engine = InnoDB; Create Index Test_idx On Test (comment );   - Insert Insert Into Test Values (1, 'aaa '); Insert Into Test Values (2, 'bbb ');   - Update Primary Key Update Test Set Id = 9 Where Id = 1;   - Update Non- Primary Key With Different value Update Test Set Comment = 'ccc' Where Id = 9;   - Update Non- Primary Key With Same Value Update Test Set Comment = 'bbb' Where Id = 2 And Comment = 'bbb ';

-Read isolation level

Repeatable read (RR)

 

Test Results

 

Update primary key

CodeCall process:

?
1 Ha_innobase: update_row-> row_update_for_mysql-> row_upd_step-> row_upd-> row_upd_clust_step-> row_upd_clust_rec_by_insert-> row_ins_index_entry

To put it simply, the old record tag of the Cluster Index is deleted and a new record is inserted. After the statement is executed, the data structure is as follows:

The old version is still stored in the clustered index. Its data_trx_id is set to 1811, the deleted bit is set to 1, and the transaction ID of the previous image is recorded in undo. The new version of data_trx_id is 1811. Through this figure, we can also find that although the old and new versions are a record, they are identified by two records in the clustered index. At the same time, because the primary key is updated, the secondary index also needs to be updated accordingly (the secondary index contains the primary key ).

 

Update non-primary key (diff value)

Update the Comment field. The Code calling process is different from the preceding one. You can trace it by yourself. This is omitted here. After the update operation is completed, the index structure is changed as follows:

As you can see, when updating the key value of the secondary index, the clustered index itself does not produce new record items, but records the old version information in the Undo. At the same time, the secondary index will generate new index items, and its pkvalue will remain unchanged, pointing to the same record of the clustered index. Careful readers may find that there is a max_trx_id in the secondary index page, which records the maximum transaction ID of the updated secondary index page. By filtering max_trx_id, InnoDB can perform most secondary index coverage scans (only secondary indexes are scanned, and clustered indexes are not required ). The specific filter method will be provided in the following content.

 

Update non-primary key (same value)

The last test case is to update the comment item to the same value. In my tests, the updated index structure is as follows:

The clustered index is updated, but the secondary index remains unchanged.

 

Summary
    1. No matter whether it is a clustered index or a secondary index, a new version will be generated as long as its key value is updated. Set deleted BTI to 1 and insert a new version.
    2. For clustering indexes, if the update operation does not update the primary key, the update will not generate a new version, but will be updated on the original version. The old version will enter the Undo tablespace, roll back using the Undo pointer on the record.
    3. For secondary indexes, if the update operation does not update their key value, the secondary index record remains unchanged.
    4. For secondary indexes, the update operation will generate new data for the secondary index, regardless of whether the primary key is updated or the secondary index key value.
    5. When the deleted bit record is set for the clustered index, the data_trx_id column is updated at the same time. In earlier versions, data_trx_id enters the Undo tablespace. When the secondary index is set to deleted bit, undo is not written.

 

Visibility judgment

 

Primary Key search

Select * from test where id = 1;

    • For test 1, if 1811 (data_trx_id) <read_view.up_limit_id, the log marked as deleted 1 is visible. Delete visible-> no record returned.
    • For test 1, if 1811 (data_trx_id)> = read_view.low_limit_id, it indicates that the deleted record 1 is invisible. You can use the data_roll_ptr rollback record to obtain data_trx_id = 1809. If 1809 is visible, record (1, AAA) is returned; otherwise, no record is returned.
    • For test 1, if neither up_limit_id nor low_limit_id can determine the visibility, traverse the trx_ids in read_view and compare the transaction IDs in sequence. If the data_trx_id is in the trx_ids array, it is invisible (the update is not submitted ).

       

Select * from test where id = 9;

    • For test 2, if 1816 is visible, return (9, CCC ).
    • For test 2, if 1816 is invisible, roll back to 1811 through data_roll_ptr. If 1811 is visible, return (9, AAA ).
    • For test 2, if 1811 is invisible, no result is returned.

       

Select * from test where ID> 0;

    • For test 1, there are two versions of the same record that meet the condition in the index (version 1, delete bit = 1 ). Will a record be returned twice? Certainly not. This is because the visibility of PK = 1 is consistent with that of PK = 9, and PK = 1 is marked with the deleted bit version. If the transaction id = 1811 is visible. Then PK = 1 Delete visible, no record returned, PK = 9 return record; if 1811 is invisible, rollback to 1809 visible, then PK = 1 return record, PK = 9 No record after rollback.

 

Summary:

    1. To query records through the primary key, use read_view to record data_trx_id and record data_roll_ptr pointer for joint judgment.
    2. Read_view is used to determine whether the current record is visible (data_trx_id ). Data_roll_ptr is used to roll back the current record to the previous version.

 

Non-primary key Lookup

Select comment from test where comment> '';

  • Max_trx_id = 1816 is the maximum update transaction on the current page for Test 2 and secondary cable reference. If max_trx_id <read_view.up_limit_id, all data on the current page can be seen. This page can perform index coverage scanning. Discard all records with deleted bit = 1 and return records with deleted bit = 0. At this time, return (CCC ). (Row_select_for_mysql-> lock_sec_rec_cons_read_sees)
  • For test 2 and secondary index reference, if the current page cannot meet max_trx_id <read_view.up_limit_id, it means that the current page cannot perform index coverage scan. In this case, you need to determine the visibility of each item in the clustered index. Back to test 2, there are two primary keys in the secondary index = 9 (one deleted bit = 1 and the other is 0), and only one primary key = 9 in the corresponding clustered index. How can we ensure that multiple versions of the same record can be returned at most once in the clustered index? If the current transaction ID is 1811 visible. Secondary index PK = 9 records (two items), the same record item is located through the cluster index undo. In this case, InnoDB uses the following expression to ensure that multiple version items from the secondary index point to the same clustered index record. If there is only one version at most, data will be returned: ?
    1234567 If (Clust_rec  & (Old_vers | rec_get_deleted_flag (  REC, dict_table_is_comp (sec_index-> table )))  &&! Row_sel_sec_rec_is_for_clust_rec (REC, sec_index, clust_rec, clust_index ))

All clustered index records that meet the if judgment are discarded directly. The above judgment logic is as follows:

    1. Cluster Index scanning and record retrieval are required
    2. The clustered index record is the rollback version, or the record in the secondary index is the deleted version.
    3. The key value of a clustered index is not equal to that of a secondary index.

Why can data be directly discarded if the condition is true? In the vernacular, we use the secondary index record to locate the clustered index record. After positioning, we also need to check again whether the clustered index record is still the record I saw in the secondary index. If not, it is discarded directly. If yes, it is returned.

Based on this condition, the index structure in query and Test 2 is combined. It can be seen that the version is transaction 1811. Two primary keys = 9 in the secondary index can be rolled back to version 1811 through the clustered index. However, the secondary index record (CCC, 9) is inconsistent with the version (AAA, 9) after the Cluster Index is rolled back, and is directly discarded. Only secondary index records (AAA, 9) are consistent and are directly returned.

 

Summary:

    1. Multi-version visibility judgment of secondary indexes must be completed through clustering indexes.
    2. The secondary index page stores max_trx_id, which can be used to quickly determine whether all items on the current page can be seen. Secondary index page-level index coverage scanning can be implemented. Generally, this condition is met, ensuring the efficiency of index only scan.
    3. The items in the secondary index must be compared with the visibility in the clustered index to ensure that the visible items in the clustered index are consistent with the items in the secondary index.

 

Question
    1. InHttp://blogs.InnoDB.com/wp/2011/04/mysql-5-6-multi-threaded-purge/The author mentioned that the purge operation of the InnoDB Storage engine is to reclaim the tag-bit deleted items by traversing undo. If the deleted bit of the secondary index does not record undo, how can this recycling operation be completed? Or is purge recycled by parsing redo? (According to the following analysis of the purge process, this problem has been resolved)

 

Purge Process

Purge features:

InnoDB must support multi-version protocols. Therefore, whether update or delete, it only sets the deleted bit flag on the record, rather than deleting the record. The subsequent deletion of these records is implemented through the purge background process. The purge process regularly scans the InnoDB undo and reads each undo record in the order of reading the old undo and then the new undo. For each undo record, determine whether the corresponding record can be purged (the purge process has its own read view, which is equivalent to the view before the oldest active transaction when the process starts, ensure that purge data is invisible to anyone. If purge can be used, construct a complete record (row_purge_parse_undo_rec ). Then, based on the order of the first purge secondary index and the last Purge Cluster Index, the old version of the full record generated by one purge operation is complete.

A complete purge function call process is as follows:

?
123 Row_purge_step-> row_purge-> trx_purge_fetch_next_rec-> row_purge_parse_undo_rec-> Row_purge_del_mark-> row_purge_remove_sec_if_poss-> Row_purge_remove_clust_if_poss

Summary:

    1. purge is implemented by traversing undo.
    2. the purge granularity is an operation on a record. If a record is updated three times and three old versions are generated, the record can be purged. Purge reads undo data and calls purge once for each operation. A purge deletes the old version generated by an operation (in the order from old to new operations ).
    3. purge performs the following operations based on the first secondary index and the last clustered index.
    4. purge secondary index, which is searched and located by constructed index items. You cannot directly target a second-level page because you do not know the page where the record is stored.
    5. setting deleted bit for secondary indexes does not need to record undo, because purge is implemented based on the Cluster Index undo. Therefore, the deleted bit of the secondary index is set to 1 and can still be purged if Undo is not recorded.
    6. purge is a time-consuming operation. The purge of the secondary index requires the search_path to locate the data, which is equivalent to the index unique scan for each secondary index.
    7. one delete operation doubles Io. The first Io is to set the ed bit of the record to 1; the second Io is to delete the record.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.