MySQL series: page Structure Analysis for innodb source code analysis

Source: Internet
Author: User

From the tablespace structure analysis, we know that the page is InnoDB's smallest unit of physical storage allocation. In the MySQL-3.23 source code there are only two page types, the index page and the undo page. The type values are defined in fil0fil.h:

FIL_PAGE_INDEX: data index page. The inode pages and xdes pages of a tablespace also belong to this type.
FIL_PAGE_UNDO_LOG: transaction rollback (undo) log page.

This article mainly analyzes the index page; the undo log page is covered in the transaction part of this series. Both the index page and the undo log page consist of three parts: page header, page body, and page trailer. The three parts of the index page are analyzed below.

1. page header
The page header occupies 38 bytes and stores the following information:

FIL_PAGE_SPACE, 4 bytes: space id of the tablespace that the page belongs to
FIL_PAGE_OFFSET, 4 bytes: page no, generally the physical offset of the page within the tablespace
FIL_PAGE_PREV, 4 bytes: page no of the previous page (the leaf nodes of the B+ tree are chained in a linked list, so each page has a predecessor and a successor)
FIL_PAGE_NEXT, 4 bytes: page no of the next page
FIL_PAGE_LSN, 8 bytes: the largest redo log lsn of the changes made to records on the page, generally used for redo log recovery
FIL_PAGE_TYPE, 2 bytes: page type
FIL_PAGE_FILE_FLUSH_LSN, 8 bytes: the redo log lsn up to which the space file has been flushed; set only on the first page of the space
FIL_PAGE_ARCH_LOG_NO, 4 bytes: number of the last archived log file; set only on the first page of the space

2. page trailer
The page trailer is the last 8 bytes of the page. The low 4 bytes hold the checksum of the data in the page, and the high 4 bytes store part of FIL_PAGE_LSN. The checksum is computed by the buf_calc_page_checksum function, which essentially applies ut_fold_binary to the data in the page.
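The header layout described above can be sketched as a small parser. This is only an illustration, not InnoDB's own code: the numeric offsets are obtained by summing the field sizes listed above, the name FIL_PAGE_SPACE for the space id field and the big-endian byte order are assumptions, and every demo_* name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets obtained by summing the field sizes of the 38-byte header
   (assumption: the fields are packed in the order listed). */
enum {
    FIL_PAGE_SPACE          = 0,   /* 4 bytes, name assumed */
    FIL_PAGE_OFFSET         = 4,   /* 4 bytes */
    FIL_PAGE_PREV           = 8,   /* 4 bytes */
    FIL_PAGE_NEXT           = 12,  /* 4 bytes */
    FIL_PAGE_LSN            = 16,  /* 8 bytes */
    FIL_PAGE_TYPE           = 24,  /* 2 bytes */
    FIL_PAGE_FILE_FLUSH_LSN = 26,  /* 8 bytes */
    FIL_PAGE_ARCH_LOG_NO    = 34,  /* 4 bytes */
    FIL_PAGE_DATA           = 38   /* first byte after the header */
};

/* Hypothetical in-memory view of the header fields. */
struct demo_fil_header {
    uint32_t space_id, page_no, prev, next, arch_log_no;
    uint64_t lsn, file_flush_lsn;
    uint16_t type;
};

/* Big-endian readers (assumption: fields are stored most significant byte first). */
static uint16_t demo_read_be16(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

static uint32_t demo_read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t demo_read_be64(const uint8_t *p)
{
    return ((uint64_t)demo_read_be32(p) << 32) | demo_read_be32(p + 4);
}

/* Decode the 38-byte header found at the start of a page buffer. */
static void demo_parse_fil_header(const uint8_t *page, struct demo_fil_header *h)
{
    h->space_id       = demo_read_be32(page + FIL_PAGE_SPACE);
    h->page_no        = demo_read_be32(page + FIL_PAGE_OFFSET);
    h->prev           = demo_read_be32(page + FIL_PAGE_PREV);
    h->next           = demo_read_be32(page + FIL_PAGE_NEXT);
    h->lsn            = demo_read_be64(page + FIL_PAGE_LSN);
    h->type           = demo_read_be16(page + FIL_PAGE_TYPE);
    h->file_flush_lsn = demo_read_be64(page + FIL_PAGE_FILE_FLUSH_LSN);
    h->arch_log_no    = demo_read_be32(page + FIL_PAGE_ARCH_LOG_NO);
}
```

Calling demo_parse_fil_header on the first 38 bytes of a page image fills the struct; the last field ends exactly at FIL_PAGE_DATA, which matches the stated header size.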
In later versions, other algorithms can be selected for the page checksum. These two values are written when the page is flushed to disk and verified when the page is read back from disk, in order to guarantee the integrity of the page.

3. page body
The index page body consists of five parts: the body header, the records, the free records, the free heap, and the page directory. The body header is defined as follows:

#define PAGE_N_DIR_SLOTS 0 /* number of slots in the page directory */
#define PAGE_HEAP_TOP 2 /* offset of the first free byte in the heap */
#define PAGE_N_HEAP 4 /* number of records allocated from the heap: free records + PAGE_N_RECS + 2 */
#define PAGE_FREE 6 /* offset of the first record in the free (deleted) record list */
#define PAGE_GARBAGE 8 /* number of bytes in deleted records, available for re-allocation */
#define PAGE_LAST_INSERT 10 /* position of the last inserted record */
#define PAGE_DIRECTION 12 /* direction of the last insert: PAGE_LEFT, PAGE_RIGHT, PAGE_SAME_REC, PAGE_SAME_PAGE, PAGE_NO_DIRECTION */
#define PAGE_N_DIRECTION 14 /* number of consecutive inserts in the same direction */
#define PAGE_N_RECS 16 /* number of user records on the page, excluding infimum and supremum */
#define PAGE_MAX_TRX_ID 18 /* largest ID of a transaction that modified the page */
#define PAGE_HEADER_PRIV_END 26
#define PAGE_LEVEL 26 /* level of the current page in the index tree */
#define PAGE_INDEX_ID 28 /* id of the index that the page belongs to */
#define PAGE_BTR_SEG_LEAF 36 /* segment header of the B+ tree leaf node segment */
#define PAGE_BTR_SEG_TOP (36 + FSEG_HEADER_SIZE) /* segment header of the B+ tree non-leaf node segment */
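As an illustration of how the fields defined above might be read from a page image, here is a minimal sketch. It assumes that the body header starts right after the 38-byte page header (PAGE_HEADER = 38) and that the 2-byte fields are big-endian; both are assumptions, and the demo_* helpers are hypothetical names rather than InnoDB functions.

```c
#include <assert.h>
#include <stdint.h>

enum { PAGE_HEADER = 38 };       /* assumption: body header follows the page header */

enum {                           /* field offsets from the listing above */
    PAGE_N_DIR_SLOTS = 0,
    PAGE_HEAP_TOP    = 2,
    PAGE_N_HEAP      = 4,
    PAGE_FREE        = 6,
    PAGE_GARBAGE     = 8,
    PAGE_N_RECS      = 16
};

/* Read one 2-byte body header field (assumed big-endian). */
static unsigned demo_page_get_field(const uint8_t *page, unsigned field)
{
    const uint8_t *p = page + PAGE_HEADER + field;
    return ((unsigned)p[0] << 8) | p[1];
}

/* Number of records on the free list, using the relation given in the
   PAGE_N_HEAP comment: PAGE_N_HEAP = free records + PAGE_N_RECS + 2. */
static unsigned demo_page_free_rec_count(const uint8_t *page)
{
    unsigned n_heap = demo_page_get_field(page, PAGE_N_HEAP);
    unsigned n_recs = demo_page_get_field(page, PAGE_N_RECS);

    if (n_heap < n_recs + 2) {
        return 0;                /* inconsistent header, report nothing */
    }
    return n_heap - n_recs - 2;  /* the 2 stands for infimum and supremum */
}
```

With PAGE_N_HEAP = 7 and PAGE_N_RECS = 3, the sketch reports 2 records waiting on the free list.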
InnoDB manages the usable space of a page as a heap. When a record needs to be inserted, InnoDB first checks whether a suitable deleted record on the PAGE_FREE list can be reused. If not, a rec_t block of the required size is allocated at the PAGE_HEAP_TOP offset, and the record is inserted into the record list according to its key value. How, then, are the records organized?

3.1 Record organization
In the index page body, records (rec) are organized as a singly linked list. The first and the last record are virtual records defined by InnoDB, namely infimum and supremum. Physically, these two records are stored immediately after the body header. Their offsets are as follows:
#define PAGE_DATA (PAGE_HEADER + 36 + 2 * FSEG_HEADER_SIZE)
#define PAGE_INFIMUM (PAGE_DATA + 1 + REC_N_EXTRA_BYTES) /* location of the minimum index record on this page */
#define PAGE_SUPREMUM (PAGE_DATA + 2 + 2 * REC_N_EXTRA_BYTES + 8) /* location of the maximum index record on this page */
These two records are created when the index page is created; for details, see the page_create function. All other records are inserted between them.
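To make the offset macros above concrete, the following sketch evaluates them with assumed constants. PAGE_HEADER = 38, FSEG_HEADER_SIZE = 10 and REC_N_EXTRA_BYTES = 6 are not stated in this article; they are assumptions chosen for illustration, and demo_is_boundary_rec is a hypothetical helper.

```c
#include <assert.h>

/* Assumed constants (not given in the text above). */
#define PAGE_HEADER        38   /* assumption: size of the page header */
#define FSEG_HEADER_SIZE   10   /* assumption: size of one segment header */
#define REC_N_EXTRA_BYTES  6    /* assumption: per-record extra bytes */

/* The three macros exactly as listed above. */
#define PAGE_DATA     (PAGE_HEADER + 36 + 2 * FSEG_HEADER_SIZE)
#define PAGE_INFIMUM  (PAGE_DATA + 1 + REC_N_EXTRA_BYTES)
#define PAGE_SUPREMUM (PAGE_DATA + 2 + 2 * REC_N_EXTRA_BYTES + 8)

/* True when an in-page offset is one of the two virtual records. */
static int demo_is_boundary_rec(unsigned rec_offset)
{
    return rec_offset == PAGE_INFIMUM || rec_offset == PAGE_SUPREMUM;
}
```

Under these assumptions PAGE_DATA is 94, PAGE_INFIMUM is 101 and PAGE_SUPREMUM is 116, so the two virtual records sit 15 bytes apart at fixed positions on every index page.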
3.2 Free record list
In addition to valid records, the page body also contains records that were in use and have since been deleted. Such records cannot be returned directly to the heap, because records are linked in logical order rather than physical order. InnoDB manages them in the page's free record list. The list is located through PAGE_FREE in the body header, which holds the in-page offset of the first deleted record.
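The interplay between the free record list and the heap can be sketched with a simplified in-memory model. This is an assumption-laden illustration: the structs and demo_* functions are hypothetical, the free list is modeled as a small array instead of in-page offsets, and only the head of the free list is examined for reuse, which is a simplification chosen for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_FREE 16

struct demo_free_rec { unsigned offset; unsigned size; };

struct demo_page {
    unsigned heap_top;   /* models PAGE_HEAP_TOP */
    unsigned heap_end;   /* first byte the heap may not use (hypothetical) */
    unsigned garbage;    /* models PAGE_GARBAGE */
    struct demo_free_rec free_list[DEMO_MAX_FREE]; /* [0] models PAGE_FREE */
    size_t n_free;
};

/* Allocate `need` bytes: reuse the head of the free list if it is large
   enough, otherwise carve from the heap top. Returns 0 when no space fits. */
static unsigned demo_page_alloc(struct demo_page *p, unsigned need)
{
    if (p->n_free > 0 && p->free_list[0].size >= need) {
        unsigned off = p->free_list[0].offset;
        p->garbage -= p->free_list[0].size;
        for (size_t i = 1; i < p->n_free; i++) {
            p->free_list[i - 1] = p->free_list[i];
        }
        p->n_free--;
        return off;
    }
    if (p->heap_end - p->heap_top >= need) {
        unsigned off = p->heap_top;
        p->heap_top += need;
        return off;
    }
    return 0;
}

/* Put a deleted record at the head of the free list. Returns 1 on success. */
static int demo_page_free(struct demo_page *p, unsigned offset, unsigned size)
{
    if (p->n_free == DEMO_MAX_FREE) {
        return 0;
    }
    for (size_t i = p->n_free; i > 0; i--) {
        p->free_list[i] = p->free_list[i - 1];
    }
    p->free_list[0] = (struct demo_free_rec){ offset, size };
    p->n_free++;
    p->garbage += size;
    return 1;
}
```

In this model a freed 30-byte record is handed out again for a later 20-byte request, and the heap top only moves when the free list cannot satisfy the request.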
In addition to PAGE_FREE, the body header uses PAGE_GARBAGE to track the space held by released records. Its value is the total number of bytes occupied by all deleted records, so that deleted records can be reused and space utilization improves. Besides the records and the free records, the page has one contiguous region for allocating new records; a record is allocated from this region only when the free record list has no suitable space. The offset of this region is stored in PAGE_HEAP_TOP.

3.3 directory slots
To find records quickly, InnoDB defines an array of slots, called the page directory, after the body. Each slot occupies two bytes, and the slots are stored in reverse order: the infimum slot always occupies the last two bytes of the body, the next slot the two bytes before it, and so on. Each slot manages several records. The number of records that a slot may own (n_owned) is as follows:

Infimum slot owned: only 1 record
Supremum slot owned: 1 to 8 records
Common slot owned: 4 to 8 records

If a common slot or the supremum slot already owns 8 records when a new record is inserted into its range, the slot is split into two slots, which is why the range of a common slot starts at 4.
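The ownership limits and the split rule can be illustrated with a small model that tracks only the n_owned count of each slot. Everything here is a hypothetical sketch: the demo_* names and constants are made up for illustration, and the split is assumed to hand the lower half (rounded down) to the first of the two resulting slots.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SLOT_MAX_N_OWNED 8
#define DEMO_MAX_SLOTS 64

/* Directory model: n_owned[i] is the number of records owned by slot i. */
struct demo_dir {
    unsigned n_owned[DEMO_MAX_SLOTS];
    size_t n_slots;
};

/* Account for one record inserted into the range of `slot`.
   Returns 1 if the slot was split, 0 if not, -1 if the model is full. */
static int demo_dir_add_rec(struct demo_dir *d, size_t slot)
{
    if (d->n_owned[slot] < DEMO_SLOT_MAX_N_OWNED) {
        d->n_owned[slot]++;
        return 0;
    }
    if (d->n_slots == DEMO_MAX_SLOTS) {
        return -1;
    }
    /* The slot would own 9 records: split them into two slots. */
    unsigned total = d->n_owned[slot] + 1;
    unsigned lower = total / 2;

    for (size_t i = d->n_slots; i > slot; i--) {
        d->n_owned[i] = d->n_owned[i - 1];   /* make room for the new slot */
    }
    d->n_owned[slot] = lower;
    d->n_owned[slot + 1] = total - lower;
    d->n_slots++;
    return 1;
}
```

Starting from an infimum slot that owns 1 record and a supremum slot that owns 8, one more insert in the supremum range produces slots owning 4 and 5 records, both inside the 4 to 8 range described above.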
The n_owned value of the record that a slot points to is the number of records the slot owns, that is, the record itself and the records before it back to the previous slot; the other records in the slot's range have n_owned = 0. A lookup first runs a binary search over the directory to find the slot that the record belongs to, and then walks the records owned by that slot to locate the record exactly. This design keeps the directory small in page space while still making searches fast.

Slot-related functions:
page_dir_split_slot: the slot split function. When a record inserted into a slot's range pushes the slot over its maximum number of owned records, the range is split evenly into two slots.
page_dir_balance_slot: the slot balancing function. When a record deleted from a slot's range leaves the slot with fewer records than its minimum, the slot is balanced against an adjacent slot.
Both balancing and splitting serve to keep the directory compact and record searches efficient.

3.4 index page Structure Diagram
4. Page operations
The InnoDB index page supports three kinds of record operations: search, insert, and delete. They are implemented in page0cur.*. For these operations InnoDB defines page_cur_t, the page cursor. It is a purely logical cursor that is valid only in memory and points to the record currently being operated on. Its definition:
typedef struct page_cur_struct {
    byte* rec; /* pointer to the record the cursor is positioned on */
} page_cur_t;
All page operations first read the page from disk into memory, where the logical page is built, and then use a page cursor to search, insert, or delete.

4.1 Search
A search in an InnoDB B+ tree index only leads to the index page that holds the record. Once the page is found, the record is searched for within the page as follows:
1. Use the key to run a binary search over the directory slots of the page and find the slot that corresponds to the key.
2. Because a slot manages several records (a common slot owns 4 to 8), walk the records managed by that slot, comparing keys, until the record is found.
This is implemented in the page_cur_search_with_match function in page0cur.c, which also records the number of matched bytes and the number of steps taken during the search. Note that this function supports four search modes, defined as follows:
#define PAGE_CUR_G 1 /* greater-than search */
#define PAGE_CUR_GE 2 /* greater-than-or-equal search */
#define PAGE_CUR_L 3 /* less-than search */
#define PAGE_CUR_LE 4 /* less-than-or-equal search */
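The two-stage search and the four modes can be sketched on a simplified model. This is not page_cur_search_with_match itself: records are modeled as a sorted array of integer keys, slot_last[k] is the index of the last record owned by slot k, index -1 stands for infimum and index n for supremum, and all demo_* names are hypothetical.

```c
#include <assert.h>

enum { PAGE_CUR_G = 1, PAGE_CUR_GE = 2, PAGE_CUR_L = 3, PAGE_CUR_LE = 4 };

/* True when a record with rec_key lies before the boundary for the mode. */
static int demo_before(int rec_key, int key, int mode)
{
    if (mode == PAGE_CUR_LE || mode == PAGE_CUR_G) {
        return rec_key <= key;
    }
    return rec_key < key;
}

/* Returns the index the cursor lands on: -1 means infimum, n means supremum.
   Assumes slot_last[n_slots - 1] == n - 1 whenever n > 0. */
static int demo_page_search(const int *keys, int n, const int *slot_last,
                            int n_slots, int key, int mode)
{
    int lo = 0, hi = n_slots, last_before;

    /* Stage 1: binary search for the first slot whose last record is
       not before the boundary. */
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (demo_before(keys[slot_last[mid]], key, mode)) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    if (lo == n_slots) {
        last_before = n - 1;     /* every record is before the boundary */
    } else {
        /* Stage 2: walk the records owned by that slot. */
        int i = (lo == 0) ? 0 : slot_last[lo - 1] + 1;
        while (i <= slot_last[lo] && demo_before(keys[i], key, mode)) {
            i++;
        }
        last_before = i - 1;
    }
    /* L and LE stop on the last record before the boundary,
       G and GE on the first record after it. */
    return (mode == PAGE_CUR_L || mode == PAGE_CUR_LE)
        ? last_before : last_before + 1;
}
```

With keys 10 to 90 in steps of 10 and two slots ending at indexes 3 and 8, searching for 50 lands on index 4 for LE and GE, index 3 for L, and index 5 for G.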
4.2 Insert
Before a record is inserted, the position to insert at is located with the record's key, using the search mode PAGE_CUR_LE. The steps are as follows:
1. Use the record key and the record search function to find the insert position (the page cursor ends up on the record that will precede the new record).
2. Link the new record into the list after the cursor record.
3. Update the page's insert direction, its direction counter, and PAGE_LAST_INSERT.
4. Update the n_owned value of the owning slot; if it goes out of range, split the slot.
5. Because the insert modifies the page, write an mtr log entry for the inserted record so that the page can be restored after a failure.
The mtr log entry for an inserted record is fairly complex. Its structure is as follows:
[Figure: http://www.2cto.com/uploadfile/Collfiles/20141229/20141229093158128.jpg]
The whole record insertion process is implemented in the page_cur_insert_rec_low function in page0cur.c.
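The insert steps can be traced on a toy model of the record list. This is a sketch under stated assumptions, not page_cur_insert_rec_low: records are array nodes linked by index, node 0 is infimum and node 1 is supremum, the direction counter only counts consecutive inserts placed right after the previous insert, the mtr logging step is left out, and the struct and demo_* names are hypothetical.

```c
#include <assert.h>

#define DEMO_MAX_NODES 32
#define DEMO_SLOT_MAX_N_OWNED 8
enum { DEMO_INFIMUM = 0, DEMO_SUPREMUM = 1 };

struct demo_page {
    int key[DEMO_MAX_NODES];
    int next[DEMO_MAX_NODES];         /* index of the next record */
    unsigned n_owned[DEMO_MAX_NODES]; /* nonzero only on slot-owning records */
    int n_heap;                       /* nodes allocated, incl. infimum/supremum */
    int n_recs;                       /* user records */
    int last_insert;                  /* models PAGE_LAST_INSERT, -1 if none */
    int n_direction;                  /* models PAGE_N_DIRECTION */
};

/* Create an empty page holding only infimum and supremum. */
static void demo_page_create(struct demo_page *p)
{
    p->key[DEMO_INFIMUM] = p->key[DEMO_SUPREMUM] = 0;
    p->next[DEMO_INFIMUM] = DEMO_SUPREMUM;
    p->next[DEMO_SUPREMUM] = -1;
    p->n_owned[DEMO_INFIMUM] = p->n_owned[DEMO_SUPREMUM] = 1;
    p->n_heap = 2;
    p->n_recs = 0;
    p->last_insert = -1;
    p->n_direction = 0;
}

/* Step 1: position the cursor on the last record whose key is <= key. */
static int demo_find_le(const struct demo_page *p, int key)
{
    int cur = DEMO_INFIMUM;
    while (p->next[cur] != DEMO_SUPREMUM && p->key[p->next[cur]] <= key) {
        cur = p->next[cur];
    }
    return cur;
}

/* Steps 2 to 4. Returns the new node, or -1 if the insert is not possible.
   *needs_split reports whether the owning slot now exceeds its maximum. */
static int demo_page_insert_after(struct demo_page *p, int cursor, int key,
                                  int *needs_split)
{
    if (p->n_heap == DEMO_MAX_NODES || cursor == DEMO_SUPREMUM) {
        return -1;
    }
    int rec = p->n_heap++;                  /* allocate from the heap */
    p->key[rec] = key;
    p->n_owned[rec] = 0;
    p->next[rec] = p->next[cursor];         /* step 2: link into the list */
    p->next[cursor] = rec;
    p->n_direction = (p->last_insert == cursor) ? p->n_direction + 1 : 0;
    p->last_insert = rec;                   /* step 3: header bookkeeping */
    p->n_recs++;
    int owner = rec;                        /* step 4: find the slot owner */
    while (p->n_owned[owner] == 0) {
        owner = p->next[owner];
    }
    p->n_owned[owner]++;
    *needs_split = p->n_owned[owner] > DEMO_SLOT_MAX_N_OWNED;
    return rec;
}
```

Inserting keys 20, 10, 30 and 40 in that order yields the list infimum, 10, 20, 30, 40, supremum, with the supremum slot owning 5 records and no split reported.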
4.3 Delete
The record to delete is first located by its key or by its address. The procedure is as follows:
1. Position the page cursor using the record information.
2. Write an mtr log entry for the deleted record.
3. Unlink the record by changing the links of the records before and after it.
4. Update PAGE_LAST_INSERT and other header information (n_recs).
5. Insert the record at the head of the free list in the body header and update PAGE_GARBAGE.
6. Update the n_owned value of the owning slot; if it falls below the minimum for the slot, balance the slot.
The format of the mtr log entry for a delete is as follows:
Delete record:
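The delete steps can be followed on a toy model as well. This sketch makes several assumptions: records are array nodes linked by index with node 0 as infimum and node 1 as supremum, the mtr logging step is omitted, a deleted slot owner hands its slot to the preceding record, the supremum slot is treated as exempt from the minimum of 4, and all demo_* names are hypothetical.

```c
#include <assert.h>

#define DEMO_MAX_NODES 16
#define DEMO_SLOT_MIN_N_OWNED 4
enum { DEMO_NONE = -1, DEMO_INFIMUM = 0, DEMO_SUPREMUM = 1 };

struct demo_page {
    int next[DEMO_MAX_NODES];          /* record list, also reused by free list */
    unsigned size[DEMO_MAX_NODES];     /* bytes occupied by each record */
    unsigned n_owned[DEMO_MAX_NODES];  /* nonzero only on slot-owning records */
    int n_recs;
    int last_insert;                   /* models PAGE_LAST_INSERT */
    int free_head;                     /* models PAGE_FREE */
    unsigned garbage;                  /* models PAGE_GARBAGE */
};

/* Returns 1 if the owning slot now needs balancing, 0 if not, and -1 if
   rec cannot be deleted in this model. */
static int demo_page_delete(struct demo_page *p, int rec)
{
    if (rec == DEMO_INFIMUM || rec == DEMO_SUPREMUM) {
        return -1;
    }
    int prev = DEMO_INFIMUM;           /* step 1: find the predecessor */
    while (prev != DEMO_SUPREMUM && p->next[prev] != rec) {
        prev = p->next[prev];
    }
    if (prev == DEMO_SUPREMUM) {
        return -1;                     /* rec is not in the record list */
    }
    int owner = rec;
    while (p->n_owned[owner] == 0) {
        owner = p->next[owner];
    }
    if (owner == rec && p->n_owned[prev] != 0) {
        return -1;                     /* slot would vanish: not modeled */
    }
    p->next[prev] = p->next[rec];      /* step 3: unlink */
    p->n_recs--;                       /* step 4: header bookkeeping */
    p->last_insert = DEMO_NONE;
    p->next[rec] = p->free_head;       /* step 5: push onto the free list */
    p->free_head = rec;
    p->garbage += p->size[rec];
    if (owner == rec) {                /* step 6: slot ownership */
        p->n_owned[prev] = p->n_owned[rec];
        p->n_owned[rec] = 0;
        owner = prev;
    }
    p->n_owned[owner]--;
    return owner != DEMO_SUPREMUM
        && p->n_owned[owner] < DEMO_SLOT_MIN_N_OWNED;
}
```

Deleting a record from a slot that owns exactly 4 records leaves it with 3, so the function reports that balancing is needed; deleting from the supremum slot's range does not.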

5. Summary
The InnoDB index page is a storage structure that uses space efficiently. Its design weighs lookup speed against storage efficiency, and InnoDB strikes a good balance between the two. Besides the basic search, insert, and delete operations, the page also provides batch operations such as copying and deleting ranges of records, all built on the basic insert and delete operations. The batch functions are as follows:

page_copy_rec_list_end: copy rec and all records after it in the page to a new page
page_copy_rec_list_start: copy all records before rec in the page to a new page, excluding rec
page_delete_rec_list_end: delete rec and all records after it in the page
page_delete_rec_list_start: delete all records before rec in the page, excluding rec
page_move_rec_list_end: move rec and all records after it in the page to a new page; the moved records are deleted from the original page
page_move_rec_list_start: move all records before rec in the page to a new page, excluding rec; the moved records are deleted from the original page

InnoDB provides these functions for the convenience of upper-layer callers. Analyzing the page structure helps in understanding how InnoDB organizes records and how its B+ tree index works.
