From MySQLBug #67718 to discuss the Split Optimization of B + tree indexes

Source: Internet
Author: User
Problem background today, we can see that the DBA team of Twitter has released its latest MySQL branch: Taobao, the most important improvement of which is to fix the MySQL Bug #67718: InnoDBdrasticallyunder-fillspagesincertainconditions. A detailed description of the Bug and how

Background: Today, the DBA team of Twitter published its latest MySQL branch: Changes in Twitter MySQL 5.5.28.t9. The most important improvement of this branch is that it fixed the MySQL Bug #67718: innoDB drastically under-fills pages in certain conditions. A detailed description of the Bug and how



Problem background

Today, the DBA team on Twitter published its latest MySQL branch: Changes in Twitter MySQL 5.5.28.t9. One of the most important improvements to this branch is to fix MySQL Bug #67718: innoDB drastically under-fills pages in certain conditions. For a detailed description of this Bug and how to reproduce it, read the above link and briefly describe the problem corresponding to this Bug:

The index split policy of InnoDB has a problem with index page split under certain circumstances. As a result, each split page only stores one record, and the page space utilization is extremely low.

This Bug aroused my interest, therefore, I am going to talk about the structure of the B + tree index, the split of the B + tree, the optimization of the split operation of the B + tree, and the cause of Bug #67718, and some personal suggestions on how to fix this Bug.

B + Tree Index Structure

Traditional relational databases (Oracle/MySQL/PostgreSQL ...), Its main index structure uses the B + tree. What's more, the table data of the InnoDB engine is stored in the form of B + tree ., It is a classic B + tree structure chart (layer-2 B + tree, fan-out of each page is 4 ):

Note:

  • This B + tree is based on the B + tree structure implemented by InnoDB;

  • The B + tree contains five user records, which are 1, 2, 3, 4, and 5;

  • The records on the upper-layer pages of the B + tree store the Low Key values on the lower-layer pages );

  • All data of the B + tree is stored on the leaf node of the B + tree;

  • All pages of B + leaf nodes are linked through a two-way linked list;

Split B + tree

On the basis of the B + tree, inserting records 6, 7, and B + tree structure will produce the following changes:

Insert record 6. The new B + tree structure is as follows:

Insert RECORD 7. Because only four records can be stored on the leaf page and record 7 can be inserted, the leaf page is split and a new leaf page is generated.

Analysis of traditional B + tree page splitting operations:

  • Split Based on 50% of the data volume on the original page. For the current split operation, 3, 4 records are retained on the original page, 5, 6 records, and moved to the new page. Finally, insert the new record 7 to the new page;

  • 50% advantages of the splitting policy:

    • After the split, the space utilization of the two pages is the same; if the new insertion is randomly selected on the two pages, the next split operation will be triggered later;

  • 50% disadvantages of the split strategy:

    • Low space utilization: according to the traditional 50% page splitting policy, the space utilization of index pages is about 50%;

    • Split frequently:Incremental insert)Every time two new records are inserted, the rightmost leaf page is split again;

Question:

How can we optimize the traditional 50% split strategy? Next, let's look at it.

Optimization of B + tree split operations

Due to the shortcomings of the traditional 50% split strategy, all relational databases, including Oracle, InnoDB, and PostgreSQL, And the Oscar database I have previously developed, the NTSE and TNT storage engines currently under development are optimized for incremental/progressive insertion of B + tree indexes. After optimization, the above B + tree indexes are inserted after record 6 is completed, and record 7 is inserted, resulting in split. The new B + tree structure is shown in:

Comparing the structure of the B + tree index after two inserted records 7, we can find that there are many differences between the two:

  • The new split policy does not move any records on the original page when inserting 7, but only writes the newly inserted RECORD 7 to the new page;

  • The utilization of the original page is still 100%;

  • Advantages of splitting policy optimization:

    • The cost of index splitting is small: records do not need to be moved;

    • The probability of index split is reduced: if the next insert is still incremental, four records need to be inserted to re-split the page. Compared with the 50% splitting strategy, the probability of splitting is halved;

    • Increase the space utilization of index pages: The New split policy ensures that the pages before the split still maintain a 100% utilization rate, improving the index space utilization;

  • Disadvantages of optimizing the splitting policy:

    • If a new insert does not meet the incremental insert condition, but is inserted to the original page, the original page will be split again, increasing the probability of split.

Therefore, this optimization splitting policy is only effective for incremental and decreasing inserts. For random inserts, the significance of optimization is lost, which leads to a higher split probability.

In the implementation of InnoDB, each index page is maintained at the position of the last insert, And the last insert is an incremental/descending identifier. Based on this information, InnoDB can determine whether the newly inserted records on the page still meet the increasing/decreasing constraints. If the constraints are met, the optimized splitting policy is adopted. If the constraints are not met, the split policy is returned to 50%.

However, the implementation of InnoDB may cause a Bug mentioned below.

Bug #67718 causes

As mentioned in Bug #67718, the index page utilization of InnoDB is extremely low under certain insert conditions, which is caused by incorrect use of the optimization split policy of InnoDB.

Consider the following B + tree. The existing user data is 100, 100, and after records are inserted, the index page is split, record 100 is inserted to the new page after the split:

Because inserting 100 can meet the increasing judgment conditions, we adopt an optimized splitting policy to split data without moving data. The new record 100 is inserted into the new page, the last insert position of the original page remains unchanged on the 6th record, and the original page remains unchanged with the incremental insert ID.

In this case, consider inserting records 9, 8, and 7 consecutively. What kind of B + tree will we get? At this time, the global incremental insert is changed to the global incremental insert.

The B + tree structure after record 9 is inserted:

Because of the InnoDB B + tree, the upper-layer node stores the minimum value (Low Key) in the lower-layer page, so record 9 is still inserted to the [3, 4, 5, 6] page, and the page is full, yes. In addition, it is determined that record 9 still meets the incremental judgment condition on the page (Last_Insert_Pos = 6, 9 is inserted to 6, and it is inserted incrementally ). Therefore, an optimized splitting policy is used to generate a new page insert record 9, and the original page record remains unchanged.

The B + tree structure after record 8 is inserted:

Insert RECORD 7. The same is true. RECORD 7 exclusive to a page using an optimized split policy.

Analysis:

  • Bug # main side effect of 67718

    • Is that the page utilization is extremely low, and each index leaf page can only store one record;

  • Bug # main cause of 67718

    • InnoDB uses an optimized index splitting policy for errors. InnoDB determines whether the incremental/progressive insertion mode is met. It uses page-level judgment. Even if the global mode changes, as long as the mode recorded on the page remains unchanged, the optimized index splitting policy will still be selected;


Bug fixing #67718 suggestions

I also encountered this problem when I was optimizing the index split of the Oscar database. The solution at that time was: split each time. If the inserted record is the last record on the page, at least the previous record is split into a new page. With this policy, insertion of, series will generate the following series B + trees:

Insert the B + tree after, 9, and 8:

When 100 is inserted, move the last record of the original page to the new page (Move 6 to the new page). The record in the new page is 6,100 ]. Next, insert 9 and 8 will be inserted into the new page. No split operation will be generated, the space utilization is improved, the index page splitting is reduced, and the Bug #67718 problem is solved.

Of course, there must be better strategies. You are welcome to discuss them together!

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.