How does a relational database work? (series, part 5): B+tree index

Source: Internet
Author: User

Although the binary search tree described in the previous section performs well when looking up a single value, it has a serious problem when you need every node between two values: you have to traverse the whole tree and check whether each node falls within the interval. That traversal is random disk I/O (translator's note: random I/O forces frequent head seeks, so it is much slower than sequential I/O), so we need a more efficient way to run range queries. To solve this problem, modern databases use a modified version of the search tree, called a B+tree. In a B+tree:

    • Only the leaf nodes (the nodes at the bottom of the tree, shown in orange in the figure) store information, namely the exact location of the row in the table (the rowID);
    • The other nodes are only there to route the search to the correct leaf.

As shown in the figure, a B+tree has more nodes than the binary search tree. The additional nodes are decision nodes that guide the search to the right leaf (which holds the exact location of the stored row), and the search complexity is still O(log(N)). The biggest difference from the binary search tree is that each leaf node holds a pointer to the next leaf (translator's note: the leaves can be viewed as an ordered singly linked list).
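The structure described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular database lays out its pages: the class names `Leaf` and `Internal`, the helpers `find_leaf` and `lookup`, the `row-…` identifiers, and the convention that a separator key is the smallest key of the subtree to its right are all assumptions made for the example.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    keys: List[int]                 # sorted keys stored in this leaf
    row_ids: List[str]              # row_ids[i] is the row location for keys[i]
    next: Optional["Leaf"] = None   # pointer to the next leaf (ordered linked list)


@dataclass
class Internal:
    keys: List[int]                             # separator keys, used for routing only
    children: List[Union["Internal", Leaf]]     # always len(keys) + 1 children


def find_leaf(node, key):
    """Follow the routing nodes down to the leaf that could contain key."""
    while isinstance(node, Internal):
        # Child i holds keys k with keys[i-1] <= k < keys[i] (assumed convention).
        node = node.children[bisect_right(node.keys, key)]
    return node


def lookup(root, key):
    """Return the row id stored for key, or None if the key is absent."""
    leaf = find_leaf(root, key)
    if key in leaf.keys:
        return leaf.row_ids[leaf.keys.index(key)]
    return None


# A hand-built two-level tree: one routing node over three linked leaves.
leaves = [
    Leaf([10, 20], ["row-10", "row-20"]),
    Leaf([40, 50], ["row-40", "row-50"]),
    Leaf([80, 100], ["row-80", "row-100"]),
]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Internal(keys=[40, 80], children=leaves)

print(lookup(root, 50))   # row-50
print(lookup(root, 30))   # None
```

Only the leaves carry row locations; the `Internal` node holds nothing but separator keys, which is why it can be small and cheap to traverse.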

With a B+tree, if we need to query for values between 40 and 100:

    • You only need to find the node for 40 or, if 40 does not exist, the nearest node greater than 40;
    • Starting from the node found in the previous step, follow the next-leaf pointers from one node to the next until you reach 100.
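The two steps above can be sketched as follows. This is an illustrative sketch under stated assumptions: the `Node` class, the `range_query` function, the sample keys, and the `r…` row identifiers are hypothetical, and a real index would read disk pages rather than Python objects.

```python
from bisect import bisect_left, bisect_right


class Node:
    """One B+tree node; a node is a leaf when children is None."""

    def __init__(self, keys, children=None, row_ids=None):
        self.keys = keys
        self.children = children   # routing nodes only
        self.row_ids = row_ids     # leaves only
        self.next = None           # leaves only: pointer to the next leaf


def range_query(root, low, high):
    """Return the row ids of all keys k with low <= k <= high, in key order."""
    # Step 1: descend to the leaf where low is, or would be, stored.
    node = root
    while node.children is not None:
        node = node.children[bisect_right(node.keys, low)]

    # Step 2: walk the leaf chain until a key exceeds high.
    results = []
    i = bisect_left(node.keys, low)   # first key >= low in the starting leaf
    while node is not None:
        while i < len(node.keys):
            if node.keys[i] > high:
                return results
            results.append(node.row_ids[i])
            i += 1
        node, i = node.next, 0
    return results


# Sample tree: the key 40 is deliberately absent, so the scan starts at 45.
leaves = [
    Node([10, 20], row_ids=["r10", "r20"]),
    Node([45, 60], row_ids=["r45", "r60"]),
    Node([80, 100], row_ids=["r80", "r100"]),
    Node([120, 150], row_ids=["r120", "r150"]),
]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Node([45, 80, 120], children=leaves)

print(range_query(root, 40, 100))   # ['r45', 'r60', 'r80', 'r100']
```

Note that the routing node is consulted only once, to find the starting leaf. Everything after that is a sequential walk along the leaf pointers.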

Translator's note: this also indirectly shows how a database builds a B+tree index: it should be built from the bottom up.

Suppose you need to find M nodes in a tree with N nodes. The time complexity is M + log(N): locating the start node costs log(N), and following the pointers to read the M nodes in order costs M. Compared with the previous binary search tree, you do not need to scan the whole tree, only M + log(N) nodes, which means less disk I/O. If M is small (for example 200 rows) and N is large (for example 1,000,000 rows), the difference between the two is huge.
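To put rough numbers on this comparison, here is a small back-of-the-envelope calculation. The function names are hypothetical, and the base-2 logarithm is an assumption chosen for simplicity; a real B+tree has a much higher fan-out per node, so its log(N) term is smaller still.

```python
import math


def nodes_visited_bplus(n, m):
    """Approximate nodes touched by a B+tree range query: log2(n) to find the start, then m."""
    return m + math.ceil(math.log2(n))


def nodes_visited_full_traversal(n):
    """Nodes touched when every node of the tree has to be checked."""
    return n


N = 1_000_000   # rows in the table
M = 200         # rows in the requested range

print(nodes_visited_bplus(N, M))          # 220
print(nodes_visited_full_traversal(N))    # 1000000
```

With these figures the range query touches a few hundred nodes instead of a million, a difference of several thousand times.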

But this raises a new problem. If the database uses a B+tree index and you insert or delete a row in a table, then:

    • You must keep the nodes of the tree in order, otherwise you will not be able to find the right node in a disordered tree.
    • You must keep the height of the tree as low as possible, otherwise the complexity of a single-value or range query degrades from O(log(N)) toward O(N). (Translator's note: when the height of the tree equals the number of nodes, the tree is shaped like a vertical line, so finding a node means traversing the whole tree.)
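The translator's "vertical line" case is easy to reproduce. The sketch below uses a plain binary search tree with no rebalancing, not a B+tree, purely to illustrate what happens to the height when nothing keeps the tree balanced; the names `BSTNode`, `insert`, and `height` are made up for the example.

```python
import math


class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into an unbalanced binary search tree (no rebalancing at all)."""
    if root is None:
        return BSTNode(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = BSTNode(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = BSTNode(key)
                return root
            node = node.right


def height(root):
    """Number of levels in the tree, computed level by level."""
    levels = 0
    level = [root] if root is not None else []
    while level:
        levels += 1
        level = [c for n in level for c in (n.left, n.right) if c is not None]
    return levels


# Inserting already-sorted keys makes every node the right child of the previous one.
root = None
for key in range(1, 1001):
    root = insert(root, key)

degenerate_height = height(root)
balanced_height = math.ceil(math.log2(1000 + 1))   # best possible for 1000 nodes

print(degenerate_height)   # 1000
print(balanced_height)     # 10
```

A lookup in the degenerate tree can take 1000 steps where a balanced tree needs at most 10, which is the O(N) versus O(log(N)) gap described above.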

In short, a B+tree must be self-ordering and self-balancing. Inserts and deletes can still be done efficiently, but they have a cost: in a B+tree, each one costs O(log(N)). This is why you often hear that creating too many indexes is a bad idea: it slows down inserting, updating, and deleting rows in the table, because every insert/update/delete makes the database update (translator's note: re-sort and rebalance) each of the table's index trees, and each index update costs O(log(N)) (translator's note: the cost mentioned above). In addition, more indexes mean more work for the transaction manager (described in later chapters).
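The write amplification caused by extra indexes can be illustrated with a toy table. Everything here is an assumption made for the example: the `Table` class is hypothetical, and a sorted Python list maintained with `bisect.insort` merely stands in for a B+tree (unlike a real B+tree, `insort` is O(N) per insert because the list has to shift elements).

```python
from bisect import insort


class Table:
    """Toy table that keeps one ordered index per indexed column."""

    def __init__(self, indexed_columns):
        self.rows = []
        # Each index is a sorted list of (value, row_id) pairs (stand-in for a B+tree).
        self.indexes = {col: [] for col in indexed_columns}
        self.index_updates = 0

    def insert(self, row):
        row_id = len(self.rows)
        self.rows.append(row)
        # Every index on the table has to be updated for every inserted row.
        for col, index in self.indexes.items():
            insort(index, (row[col], row_id))
            self.index_updates += 1
        return row_id


people = Table(indexed_columns=["name", "age", "city"])
people.insert({"name": "Ann", "age": 31, "city": "Oslo"})
people.insert({"name": "Bob", "age": 27, "city": "Rome"})

print(people.index_updates)    # 6
print(people.indexes["age"])   # [(27, 1), (31, 0)]
```

Two inserted rows triggered six index updates because the table has three indexes. With a real B+tree each of those updates would cost O(log(N)), so the write cost grows with the number of indexes.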

For more details on B+trees, see the B+tree article on Wikipedia. If you are looking for an example of a B+tree implementation in a database, see these two articles by a core MySQL developer on how InnoDB handles indexes: "The physical structure of InnoDB index pages" and "B+Tree index structures in InnoDB".
