MySQL indexing principle b-tree, B+tree

Last Update:2016-05-24 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

One: Create an index using B-tree

B-tree

To describe the B-tree, first define a data record as a two-tuple [key, Data],key is the key value of the record, for different data records, the key is not the same; data records for data other than key. Then B-tree is a data structure that meets the following criteria:

1. D is a positive integer greater than 1, called the degree of B-tree.

2. h is a positive integer, called the height of the b-tree.

3. Each non-leaf node consists of a n-1 key and n pointers, of which d<=n<=2d.

4. Each leaf node contains at least one key and two pointers, with a maximum of 2d-1 keys and 2d pointers, and the pointer of the leaf node is null.

5. All leaf nodes have the same depth, equal to the tree height H.

6. The key and the pointer are spaced each other, and the nodes are pointers at both ends.

7. Keys in one node are not descending from left to right.

8. All nodes make up the tree structure.

9. Each pointer is either null or points to a different node.

10. If a pointer is on the leftmost node and is not NULL, all key points to the node are less than V (key1), where V (key1) is the value of node's first key.

11. If a pointer is on the rightmost node and is not NULL, all keys to the node are greater than V (Keym), where V (keym) is the value of the last key of node.

12. If a pointer is keyi and keyi+1 and NOT NULL for the left and right neighboring keys of node nodes, then all keys to the node are less than V (keyi+1) and greater than V (keyi).

Figure 2 is a d=2 b-tree.

Figure 2

Because of the characteristics of B-tree, the algorithm of retrieving data by key in B-tree is very intuitive: first, binary lookup from the root node, if found to return the corresponding node of data, otherwise the corresponding interval pointer to the node to find recursively, until the node is found or a null pointer, the former lookup success , the latter lookup failed. The pseudo-code for the lookup algorithm on B-tree is as follows:

Btree_search (node, key) {    if (node = = NULL) return null;     foreach (Node.key)    {        if (node.key[i] = = key) return node.data[i];        if (Node.key[i] > key) return Btree_search (Point[i]->node);    }     Return Btree_search (Point[i+1]->node);} data = Btree_search (root, My_key);

About B-tree has a series of interesting properties, such as a degree of D B-tree, set its index N key, the upper limit of its tree height h is logd ((n+1)/2), retrieve a key, its search node number of the progressive complexity of O (LOGDN). as can be seen from this point, B-tree is a very efficient index data structure.

In addition, since inserting deletes a new data record destroys the nature of the b-tree, it is necessary to do a split, merge, transfer, etc. to preserve the b-tree nature of the tree when inserting and deleting it, and this article does not intend to discuss the contents of B-tree completely. Because there is already a lot of information detailing the mathematical nature of b-tree and the insertion and deletion algorithm, interested friends can find the corresponding information in the reference column at the end of this article to read.

B+tree

B-tree has many variants, the most common of which is b+tree, such as MySQL, which generally uses b+tree to implement its index structure.

Compared with B-tree, B+tree has the following differences:

1. The pointer to each node is capped at 2d instead of 2d+1.

2. The inner node does not store data, only the key, and the leaf node does not store the pointer.

Figure 3 is a simple b+tree schematic.

Figure 3

Since not all nodes have the same domain, the b+tree and inner nodes are generally of different sizes. This is different from B-tree, although the number of keys and pointers stored in B-tree may be inconsistent, but the domain and the upper bound of each node are consistent, so in the implementation b-tree often apply for the same size of space for each node.

In general, B+tree is more suitable than b-tree to achieve the external storage index structure, the specific reason and the principle of external memory and the principle of computer access, will be discussed below.

B+tree with sequential access pointers

Generally, the b+tree structure used in database system or file system is optimized on the basis of classical b+tree, and the sequential access pointers are added.

Figure 4

As shown in 4, a pointer to an adjacent leaf node is added to each leaf node of B+tree, and a b+tree with sequential access pointers is formed. The purpose of this optimization is to improve the performance of the interval access, example 4, if you want to query key from 18 to 49 of all data records, when the 18 is found, simply follow the node and the pointer in order to traverse to all data nodes, and greatly mentions the efficiency of the interval query.

This section provides a brief introduction to B-tree and B+tree, and the next section, combined with the memory access principle, explains why the current b+tree is the preferred data structure for the database system implementation index.

Why use B-tree (B+tree)

As mentioned above, data structures such as red and black trees can also be used to implement indexes, but file systems and database systems generally use B-/+tree as the index structure, and this section will discuss B-/+tree as the theoretical basis of index based on the knowledge of computer composition principles.

In general, the index itself is large and cannot be stored in memory, so the index is often stored as an index file on the disk. In this way, the index lookup process will generate disk I/O consumption, relative to memory access, I/O access to the consumption of a few orders of magnitude, so the evaluation of a data structure as an index is the most important indicator of the number of disk I/O operations in the process of incremental complexity. In other words, the structural organization of the index minimizes the number of disk I/O accesses during the lookup process. The following first describes the memory and disk access principles, and then combined with these principles to analyze the efficiency of B-/+tree as an index.

MySQL indexing principle b-tree, B+tree

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

MySQL indexing principle b-tree, B+tree

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

MySQL indexing principle b-tree, B+tree

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support