MySQL kernel parsing: InnoDB page storage structure-1

Source: Internet
Author: User

This article introduces the index data page storage structure of InnoDB, mainly introduces the overall structure of the data page, and the detailed structure of the page will be described in another article.

1. B + tree

Before reading this article, you must first know some basic knowledge about B-tree. A table in InnoDB may contain multiple indexes, and each index is stored in the B + tree. Indexes include clustered indexes and secondary indexes. Clustered indexes use the primary key of the table as the index key and contain all the fields of the table. Secondary indexes only contain the index key and clustered index key (Primary Key), excluding other fields. Each index is a B + tree. Each B + tree consists of many pages, and the size of each page is generally 16 kb. From the perspective of the organizational structure of the B + tree, the pages of the B tree can be divided:

  • Leaf node: a page with a B-tree hierarchy of 0, storing all recorded content
  • Non-leaf node: a page with a B-tree hierarchy greater than 0 stores only the index key and page pointer.

The structure of a B + tree is roughly

Figure 1 overall structure of the B + tree

We can see that pages of the same level are connected by a two-way linked list. Generally, from the leftmost leaf node of the B + tree to the right, you can get all the data of the B + tree from small to large. Therefore, leaf nodes have the following features:

  • The data on the page is sorted by the index key.
  • The index key value of any record on the page is no less than any record on its left brother page.

The following describes two aspects:

  • Page Storage Format
  • How to ensure order of page records

2. Page Storage Format

The storage format of a page is shown as follows:

Figure 2 InnoDB page storage format

The storage of a page consists of the following parts:

1.Page header): Records the page control information, which consists of 150 bytes, including the page pointer and page space usage of the left and right brothers of the page. The detailed description of the page header will be described in the next article.

2.Minimum virtual record and maximum virtual record: Virtual records stored in two fixed locations are not stored. The minimum virtual record is smaller than any record, and the maximum virtual record is larger than any record.

3.Record heap): Indicates the orange part. The record space allocated to the page is also the real storage area of index data. Record heap is divided into two types:Valid recordAndDeleted records. A valid record is a record that the index is normally used. Deleting a record indicates that the index has been deleted and is not in use, such as the dark blue part. As records are updated and deleted more frequently, the more deleted records in the record stack, the more holes (fragments) will appear ). These deleted records are connected toFree Space linked list.

4.Unallocated Space: Refers to the storage space not used by the page. As the page continues to be used, unallocated space will become smaller and smaller. When a new record is inserted, the system first tries to obtain a suitable storage location (sufficient space) from the free space linked list. If not, the system will apply in the unallocated space.

5.SlotZone: Slot is a pointer to some valid page records. Each slot occupies two bytes and stores the offset of the record relative to the first address of the page. If there are n valid records on the page, the number of slots is N/8 + 2 ~ N/4 + 2. The next section describes the slot area in detail, which is the key to recording the order of pages and binary search.

6.Page tailer): The last part of the page, which occupies 8 bytes, mainly stores page verification information.

The page header, the Maximum/minimum virtual record, and the end of the page are fixed storage locations on the page. The page structure we are most concerned about may be the record heap and slot area. How can we ensure the order of page records? See the introduction in the next section.

3. How to ensure order

As mentioned above, the data on the data page is ordered by the index key. So how can we ensure this orderliness? Let's analyze some possible practices first.

 

Normal practice 1:

Rec1 <rec2 <rec3... <rec_m. Therefore, if you want to insert a new record rec index key between rec2 and rec3, You Need To rec3 record and subsequent record heap content to the right of memmov a rec size space, then Insert the new record rec. This ensures that the data is ordered, but the performance will be very low.

Common Practice 2:

In another way, if the page records are not sorted by storage space, they are connected by a linked list and sorted from the chain header to the end of the chain. Insert a record. You only need to modify the pointer of the two records before and after the record, so as to avoid moving the record while ensuring the orderliness. However, the problem is that the linked list cannot use binary search. This design will lead to low query efficiency.

 

Literary practices:

To reduce the movement of page records, ensure the order of page data, and meet the requirements of binary search, slot areas can be allocated in the space at the end of the page. Each slot occupies two bytes and stores the offset of the record relative to the page. In addition, the slot records are sequentially ordered from right to left (from high address to low address). in example 2, the record corresponding to S2 must be greater than the record pointing to S1.

Assume that a slot corresponds to a record, for example, slot1 corresponds to rec_a, slot2 corresponds to rec_ B, slot3 corresponds to rec_c, and so on, and rec_a <rec_ B <rec_c .... If you insert a new record rec between rec_a and rec_ B, you only need to translate the space between slot2 and slot_n to 2 bytes to the left, and the original slot2 space points to the new record rec. In this way, you do not need to care about the storage location of REC when inserting records. You only need to translate several slot spaces and ensure that the data is sorted by slot.

If you search for records, because each slot is fixed in size and the space is continuous, you can locate the records through binary search.

I personally think this is a relatively literary practice, but let's look at the real practice of InnoDB comparing 2B.

 

InnoDBReal 2BPractice:

 

Figure 3 logical organizational structure of a page record

In fact, the InnoDB page records have been maintained through a linked list, and are increased from the chain header to the end of the chain (common method 2 ). As mentioned before, binary search cannot be used. To solve this problem, InnoDB uses several slots to point to some records of the chain, rather than one-to-one correspondence between slots and records, as shown in 3.

The slot points to records that are still ordered from right to left. We use rec [s2] to indicate records pointed to by S2, so rec [s2]> rec [S1]. In addition, the record between two slots is calledSlotBranch, The part enclosed by the three dotted lines, which indicates the slot branch to which S2 points. Slot maintained by InnoDBBranch heightThe value range is 4-8. If the height of a branch is greater than or insufficient, the branches are split and merged.

How to query?

If you query record R1 on the page (for simplicity, assume that the index key is unique ). First, locate Slot Number X through binary search to meet

REC [X-1] <R1 <= rec [x]

Record R1 either does not exist or is in slot branch X. The next step is to traverse this branch and find the real record. However, you can only traverse the branches one by one and cannot use binary search.

How to insert?

First, the slot branches and insertion locations are determined through the query method. In the free space linked list or unallocated space, the space is obtained and the recorded content is written. The slot height is increased by 1, maintain the relationship between the original linked list.

After a record is inserted, if the height of the slot branch is greater than 8, the Branch is split into two sub-chains and one slot is applied for at the same time (Translating the slot and the space behind it ).

 

What are the benefits of this design?

What are the advantages of this design in terms of literary practices? After thinking for a long time, only one of them isReducedSlotAllocation and movement. However, this design has many disadvantages.

  • From the bucket, although the slot is reduced, but the record needs to be linked with pointers (also two bytes), there is no need to link the record in order in literature and art, as a result, the space usage is increased. (In addition, the maintenance of the height of the branch chain also costs space, which will be detailed in the next article ).
  • In terms of access efficiency, binary search cannot be used completely. Binary Search + sequential traversal is required, and query performance is also discounted.
  • The complexity greatly increases the complexity of the Code.

 

Is this design needed to achieve better performance in some clever places? This is also possible, but it has not been found yet. Some of the databases I know use the design scheme of "literature and art.

In fact, the overall storage structure of the fixed-length page is similar, mainly composed of these parts in Figure 2. The specific difference may be the detailed content of the page header and the storage format of the record, which will be introduced in the next article.

 

 

Reference

MySQL technology insider-InnoDB Storage Engine

Http://hedengcheng.com /? P = 118

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.