Preliminary discussion on InnoDB MVCC source code Realization

Source: Internet
Author: User

1. Background

Based on the MySQL InnoDB source code, this paper makes some simple research on how to realize InnoDB non-locking consistency reading.

2. Basic Concepts 2.1 hidden fields

The implementation of MVCC in MySQL is mentioned in section 1.4 of the classic "high performance MySQL", which mentions

InnoDB implements MVCC by storing with each row of additional, hidden values that record when the row was created and whe n It was expired (or deleted). Rather than storing the actual times at which these events occurred, the row stores the system version number at the time Each event occurred. This is a number, increments each time a transaction begins. Each transaction keeps their own record of the current system version, as of the time it began. Each query has a to check for each row's version numbers against the transaction ' s version.

We know that the clustered index in InnoDB contains complete information about the data rows, and "high performance MySQL" Here is that the rows in the clustered index of InnoDB contain information about when the row records were created and when they were deleted. "High-performance MySQL" is described here as a convenience for the reader to understand. The rows in the clustered index actually contain so few hidden field information:

    • DATA_ROW_ID 6 bytes Internally stored monotonically incrementing row IDs
    • data_trx_id 6 bytes The latest transaction ID for adding or deleting a record
    • Data_roll_ptr 7-byte rollback segment pointer

You can refer to the storage/innobase/include/data0type.h header file for information here.

For a level two index record, it does not contain the above hidden information, but for a level two index, a page_max_trx_id is recorded on the page that represents the maximum transaction ID modified for the page data.
For information about this, refer to the storage/innobase/include/page0page.h header file

2.2 Read View

Read view is used to determine if a consistent read is visible to other transactions that modify the table.
Read view is defined in the Read0types.h header file, and the following is a look at some of these fields:

// 事务id>=m_low_limit_id的修改对于当前读不可见trx_id_t    m_low_limit_id;// 事务id<m_up_limit_id的修改对于当前读可见trx_id_t    m_up_limit_id;// 创建view的事务idtrx_id_t    m_creator_trx_id;// 创建view时处于active状态的读写事务列表,这里的ids_t可以简单看作是一个vectorids_t       m_ids;

The transaction definition in InnoDB (reference trx0trx.h header file) contains a field that represents the read View for the transaction.

ReadView*   read_view;

When the InnoDB is in a consistent read, it will determine if the current transaction's read view exists, and if it does not, get a new Read view (InnoDB has a mechanism to reuse the read view, so if there is no reusable read The view object will not be shown to the new one). Here is the Trx_assign_read_view method implementation:

ReadView*trx_assign_read_view(/*=================*/    trx_t*      trx)    /*!< in/out: active transaction */{    ut_ad(trx->state == TRX_STATE_ACTIVE);    if (srv_read_only_mode) {        ut_ad(trx->read_view == NULL);        return(NULL);    elseif (!MVCC::is_view_active(trx->read_view)) {        trx_sys->mvcc->view_open(trx->read_view, trx);    }    return(trx->read_view);}

Here's a look at how read view is initialized.

voidReadview::p repare (trx_id_t id) {Ut_ad (Mutex_own (&trx_sys->mutex)); m_creator_trx_id = ID;//TRX_SYS->MAX_TRX_ID is the currently minimal unassigned transaction ID. M_low_limit_no = m_low_limit_id = trx_sys->max_trx_id;//Copy the ID of the current read-only transaction to the M_ids in view.     if(!trx_sys->rw_trx_ids.empty ())    {copy_trx_ids (trx_sys->rw_trx_ids); }Else{m_ids.clear (); }//Trx_sys->serialisation_list is a list sorted by Trx->no when the transaction commits.     //This takes the first (if any) of the list as the m_low_limit_no for purge thread as the basis for whether to clean up undo.     if(Ut_list_get_len (Trx_sys->serialisation_list) >0) {Consttrx_t* Trx; Trx = Ut_list_get_first (trx_sys->serialisation_list);if(Trx->no < M_low_limit_no)        {m_low_limit_no = trx->no; }    }}voidReadview::complete () {//m_up_limit_id take the active transaction minimum ID. m_up_limit_id =!m_ids.empty ()?    M_ids.front (): m_low_limit_id;    Ut_ad (m_up_limit_id <= m_low_limit_id); m_closed =false;}

For the isolation level of Read Committed, the read view is closed after the end of the consistent read statement, and for the isolation level of repeatable read, the read view is closed until the end of the transaction after it has been created.

3 Read View how to determine visibility

The read view is outlined above, and the following is a look at how InnoDB determines whether a record is visible to the current transaction. The entrance here is storage/innobase/row/row0sel.cc the row_search_mvcc method.

3.1 Walk-through clustered index case

Assuming that the SQL query is a clustered index, the following Lock_clust_rec_cons_read_sees method is used to determine whether the record rec is visible to the current transaction.

boollock_clust_rec_cons_read_sees(    const rec_t*    rec,        dict_index_t*   index,    const ulint*    offsets,    ReadView*   view)   {    ut_ad(dict_index_is_clust(index));    ut_ad(page_rec_is_user_rec(rec));    ut_ad(rec_offs_validate(rec, index, offsets));        // 对于InnoDB处于只读模式或者表为临时表的情况永远都是可见的。    if (srv_read_only_mode || dict_table_is_temporary(index->table)) {        ut_ad(view == 0 || dict_table_is_temporary(index->table));        return(true);    }    // 获取行记录上的事务id。    trx_id_t    trx_id = row_get_rec_trx_id(rec, index, offsets);    // 判断是否可见。    return(view->changes_visible(trx_id, index->table->name));}

Next look at the implementation of the Readview::changes_visible method source code:

bool changes_visible(    trx_id_t        id,    const table_name_t& name) const    MY_ATTRIBUTE((warn_unused_result)){    ut_ad(id > 0);    // 如果行记录上的id<m_up_limit_id或者等于m_creator_trx_id则可见。    if (id < m_up_limit_id || id == m_creator_trx_id) {        return(true);    }    check_trx_id_sanity(id, name);    // 如果行记录上的id>=m_low_limit_id,则不可见。    if (id >= m_low_limit_id) {        return(false);    } else if (m_ids.empty()) {        return(true);    }    const ids_t::value_type*    p = m_ids.data();    // 二分判断是否在m_ids中,如果存在则不可见。    return(!std::binary_search(p, p + m_ids.size(), id));}

The basis for judging here

    • The transaction ID of the record is m_creator_trx_id, which is the modification of the current transaction and must be visible.
    • The logged transaction id<m_up_limit_id, stating that the transaction that modified this record was committed and therefore visible when the read view was initialized.
    • Logged transaction id>=m_low_limit_id, stating that when the read view is initialized, the transaction that modifies the record has not been opened (to be exact, it is not yet assigned to the transaction ID) and is therefore not visible.

If this is not satisfied, it will go to row_sel_build_prev_vers_for_mysql->row_vers_build_for_consistent_read the call, based on the information in the rollback segment to build the previous version information continuously until the current transaction is visible.

3.2 Go to the two-level index case
boollock_sec_rec_cons_read_sees(    const rec_t*        rec,        const dict_index_t* index,    const ReadView* view){    ut_ad(page_rec_is_user_rec(rec));    if (recv_recovery_is_on()) {        return(false);    elseif (dict_table_is_temporary(index->table)) {        return(true);    }    // 取索引页上的PAGE_MAX_TRX_ID字段。    trx_id_t    max_trx_id = page_get_max_trx_id(page_align(rec));    0);    return(view->sees(max_trx_id));}

The following is the implementation of the readview:sees, you can see in fact is to determine whether the page_max_trx_id is less than readview initialization of the minimum transaction ID, that is, to determine whether the maximum transaction ID on the modified page has been submitted at the time of the snapshot generation, simple and rude.

bool sees(trx_id_t id) const{    return(id < m_up_limit_id);}

Therefore, lock_sec_rec_cons_read_sees if the method returns True, then it is certainly visible, and returning false is not necessarily invisible, but the next step is to use the clustered index to get the visible version of the data.
Prior to this INNODB will use the ICP (index Push down) according to the index information to determine whether the search conditions are satisfied, if not satisfied that there is no need to go to the cluster index to take, if the ICP is determined to meet the criteria, it will go to the row_sel_get_clust_rec_for_mysql method of the clustered index to take the visible version of the data.

4. Summary

Through InnoDB source code, this paper introduces the basic data structure and concept of Read view and how to judge the visibility by creating a read view in InnoDB. In fact, read view is a snapshot of an active transaction, and both the RC and RR isolation levels reuse the same structured read view to determine visibility, unlike the read view's life cycle depending on the corresponding isolation level.

5. Reference

Official MySQL Handbook
Database Core Monthly

Preliminary discussion on InnoDB MVCC source code Realization

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.