B tree, B-tree, B+ tree, and B* tree

Source: Internet
Author: User

B tree

Here "B tree" means the binary search tree (most literature uses "B tree" and "B-tree" for the same multiway structure, which is described in the next section):
1. Every non-leaf node has at most two children (left and right);
2. Every node stores one key;
3. The left pointer of a non-leaf node points to the subtree whose keys are smaller than its key, and the right pointer points to the subtree whose keys are larger;
For example:


A B tree search starts at the root node. If the query key equals the node's key, the search hits. Otherwise, it moves to the left child if the query key is smaller than the node's key, or to the right child if it is larger. If the required child pointer is empty, the search reports that the key was not found;
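The search procedure just described can be sketched as follows. This is a minimal illustration, not code from the original article; the names `BSTNode` and `bst_search` and the sample keys are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BSTNode:
    """Hypothetical node type: one key plus optional left/right children."""
    key: int
    left: Optional["BSTNode"] = None
    right: Optional["BSTNode"] = None


def bst_search(root: Optional[BSTNode], key: int) -> Optional[BSTNode]:
    """Return the node holding `key`, or None if the key is absent."""
    node = root
    while node is not None:
        if key == node.key:
            return node  # hit
        # Smaller keys live in the left subtree, larger keys in the right one.
        node = node.left if key < node.key else node.right
    return None  # reached an empty child pointer: key not found


# A small hand-built tree:   8
#                           / \
#                          3   10
#                         / \
#                        1   6
root = BSTNode(8, BSTNode(3, BSTNode(1), BSTNode(6)), BSTNode(10))
found = bst_search(root, 6)
missing = bst_search(root, 7)
```

Here `found` is the node with key 6, and `missing` is `None` because the search for 7 runs into the empty right pointer of node 6.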
If the left and right subtrees of every non-leaf node hold roughly the same number of nodes (the tree is balanced), search performance approaches that of binary search over a sorted array. The advantage over binary search in contiguous memory is that changing the tree structure (inserting or deleting a node) does not require moving large blocks of data; once the position is found, the change usually costs only a constant number of pointer updates;
For example:


However, after many insertions and deletions, the same set of keys may end up in differently shaped B trees:


The structure on the right, a degenerate chain, is also a B tree, but its search performance is linear. The same key set can therefore produce very different tree indexes. When using a B tree, we should keep the shape on the left (balanced) as far as possible and avoid the shape on the right; this is the so-called "balance" problem;
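The effect of insertion order on the shape of the tree can be seen in a small experiment. The sketch below assumes plain insertion without any balancing; `Node`, `insert`, `height`, and `build` are hypothetical helper names, and the key sequences are chosen only to show the two extremes.

```python
from typing import Iterable, Optional


class Node:
    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Plain (unbalanced) insertion; duplicate keys are ignored."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key == node.key:
            return root
        side = "left" if key < node.key else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, Node(key))
            return root
        node = child


def height(root: Optional[Node]) -> int:
    """Number of levels, computed level by level to avoid deep recursion."""
    levels = 0
    frontier = [root] if root is not None else []
    while frontier:
        levels += 1
        frontier = [c for n in frontier for c in (n.left, n.right) if c is not None]
    return levels


def build(keys: Iterable[int]) -> Optional[Node]:
    root = None
    for k in keys:
        root = insert(root, k)
    return root


# Same seven keys, two insertion orders.
sorted_height = height(build([1, 2, 3, 4, 5, 6, 7]))    # every node has only a right child
balanced_height = height(build([4, 2, 6, 1, 3, 5, 7]))  # every level is full
```

With sorted input the tree degenerates into a chain of 7 levels, while the second order yields a tree of 3 levels for the same keys.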
The B trees used in practice add a balancing algorithm to the basic B tree, which gives the "balanced binary tree". The key question is how to keep the nodes evenly distributed; a balancing algorithm is a strategy applied when inserting and deleting nodes;

B-tree

A B-tree is a multiway search tree (not binary):
1. Every non-leaf node has at most M children, where M > 2;
2. The root node has between 2 and M children (unless it is itself a leaf);
3. Every non-leaf node other than the root has between ⌈M/2⌉ and M children;
4. Every node other than the root stores at least ⌈M/2⌉ - 1 and at most M - 1 keys (the root stores at least 1 key);
5. In a non-leaf node, number of keys = number of child pointers - 1;
6. Keys of a non-leaf node: K[1], K[2], …, K[M-1], with K[i] < K[i+1];
7. Pointers of a non-leaf node: P[1], P[2], …, P[M], where P[1] points to the subtree whose keys are less than K[1], P[M] points to the subtree whose keys are greater than K[M-1], and every other P[i] points to the subtree whose keys lie in the open interval (K[i-1], K[i]);
8. All leaf nodes are on the same level;
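The rules above can be turned into a small checker. The following is a sketch under assumptions: `Node` and `is_valid_btree` are hypothetical names, leaves are represented by an empty `children` list, and the minimum key count is taken as ⌈M/2⌉ - 1 for non-root nodes and 1 for the root.

```python
from dataclasses import dataclass, field
from math import ceil, inf
from typing import List, Set


@dataclass
class Node:
    keys: List[int]
    children: List["Node"] = field(default_factory=list)  # empty list means leaf


def is_valid_btree(root: Node, m: int) -> bool:
    """Check key counts, key ordering, child counts, key ranges, and leaf depth."""
    leaf_depths: Set[int] = set()

    def check(node: Node, depth: int, lo: float, hi: float, is_root: bool) -> bool:
        n = len(node.keys)
        min_keys = 1 if is_root else ceil(m / 2) - 1
        if not (min_keys <= n <= m - 1):
            return False
        if any(a >= b for a, b in zip(node.keys, node.keys[1:])):
            return False  # keys must be strictly increasing
        if any(not (lo < k < hi) for k in node.keys):
            return False  # keys must lie inside the range given by the parent
        if not node.children:
            leaf_depths.add(depth)
            return True
        if len(node.children) != n + 1:
            return False  # number of keys = number of child pointers - 1
        bounds = [lo] + node.keys + [hi]
        return all(
            check(child, depth + 1, bounds[i], bounds[i + 1], False)
            for i, child in enumerate(node.children)
        )

    return check(root, 0, -inf, inf, True) and len(leaf_depths) == 1


# M = 3: every node holds 1 or 2 keys and all leaves share one level.
good = Node([10, 20], [Node([3, 7]), Node([15]), Node([25, 30])])
# Leaves on different levels violate rule 8.
uneven = Node([10], [Node([5]), Node([20], [Node([15]), Node([25])])])
good_ok = is_valid_btree(good, 3)
uneven_ok = is_valid_btree(uneven, 3)
```

`good_ok` is `True`, and `uneven_ok` is `False` because one leaf sits directly under the root while the others are one level deeper.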
For example, (M = 3 ):


B-tree search: start at the root node and binary-search the ordered key sequence inside the node. On a hit, the search ends. Otherwise, descend into the child whose range contains the query key, and repeat until the corresponding child pointer is null or the node is already a leaf;
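A minimal sketch of this search, assuming a hypothetical `BTreeNode` type whose leaves have an empty `children` list. The binary search inside each node uses `bisect_left` from the Python standard library.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List


@dataclass
class BTreeNode:
    keys: List[int]                                            # sorted ascending
    children: List["BTreeNode"] = field(default_factory=list)  # empty for a leaf


def btree_search(node: BTreeNode, key: int) -> bool:
    """Return True if `key` is stored anywhere on the root-to-leaf path it selects."""
    while True:
        i = bisect_left(node.keys, key)  # binary search inside the node
        if i < len(node.keys) and node.keys[i] == key:
            return True                  # hit, possibly at a non-leaf node
        if not node.children:
            return False                 # leaf reached without a hit
        node = node.children[i]          # child whose range contains the key


# M = 3: every node holds 1 or 2 keys.
root = BTreeNode(
    keys=[10, 20],
    children=[BTreeNode([3, 7]), BTreeNode([15]), BTreeNode([25, 30])],
)
hit_in_root = btree_search(root, 20)
hit_in_leaf = btree_search(root, 15)
miss = btree_search(root, 16)
```

The lookup of 20 ends at the root, the lookup of 15 ends in the middle leaf, and the lookup of 16 reaches that same leaf without finding the key.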
B-tree features:
1. The key set is distributed across the whole tree;
2. Each key appears in exactly one node;
3. A search may end at a non-leaf node;
4. Search performance is equivalent to a binary search over the complete key set;
5. The number of levels adjusts automatically as keys are inserted and deleted;
Because every non-leaf node other than the root must have at least ⌈M/2⌉ children, node utilization has a guaranteed minimum, and this bounds the worst-case search performance: a tree holding N keys has at most log_{⌈M/2⌉}((N+1)/2) + 1 levels, where M is the maximum number of subtrees of a non-leaf node and N is the total number of keys;
Therefore, B-tree search performance is always equivalent to binary search, regardless of the value of M, and the B-tree does not suffer from the balance problem of the B tree;
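One way to see why the result does not depend on M is the following derivation. It is a sketch under assumptions: t = ⌈M/2⌉ denotes the minimum number of children of a non-root internal node, h the number of levels, N the number of keys, and T(N) the number of key comparisons in the worst case, with a binary search costing about log2(M) comparisons per node.

```latex
% Assumed notation: t = \lceil M/2 \rceil, h = number of levels, N = number of keys.
\begin{align*}
N &\ge 1 + 2\sum_{i=1}^{h-1} (t-1)\,t^{\,i-1} = 2\,t^{\,h-1} - 1
  \quad\Longrightarrow\quad
  h \le \log_t\frac{N+1}{2} + 1, \\
T(N) &= O\bigl(\log_2 M \cdot \log_t N\bigr)
      = O\!\left(\frac{\log_2 M}{\log_2 \lceil M/2 \rceil}\,\log_2 N\right)
      = O(\log_2 N).
\end{align*}
```

The last step holds because the ratio log2(M) / log2(⌈M/2⌉) is at most 2 for every M ≥ 3, so it is absorbed into the constant.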
Because of the ⌈M/2⌉ lower bound, inserting into a node that is already full requires splitting it into two nodes, each about half full. When deleting, two sibling nodes that have fallen below the lower bound are merged;
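The split and merge steps can be sketched on plain key lists. This is an illustration under assumptions, not a full B-tree implementation: it handles only a leaf's keys, promotes the median key to the parent on a split, and uses the hypothetical names `insert_into_leaf` and `merge_siblings`.

```python
from bisect import insort
from typing import List, Optional, Tuple


def insert_into_leaf(
    keys: List[int], key: int, m: int
) -> Tuple[List[int], Optional[int], Optional[List[int]]]:
    """Insert `key` into a leaf's sorted key list and split it on overflow.

    Returns (left_keys, promoted_key, right_keys). The last two are None
    when the leaf still fits within the limit of m - 1 keys.
    """
    keys = list(keys)  # work on a copy
    insort(keys, key)
    if len(keys) <= m - 1:
        return keys, None, None
    mid = len(keys) // 2
    # The median moves up to the parent; the halves become two nodes.
    return keys[:mid], keys[mid], keys[mid + 1:]


def merge_siblings(left: List[int], separator: int, right: List[int]) -> List[int]:
    """Merge two underfull siblings, pulling their separator key down from the parent."""
    return left + [separator] + right


no_split = insert_into_leaf([10], 5, 3)
split = insert_into_leaf([10, 20], 15, 3)
merged = merge_siblings([10], 15, [20])
```

With M = 3, inserting 15 into the full leaf `[10, 20]` produces the halves `[10]` and `[20]` with 15 promoted to the parent; `merge_siblings` reverses that step.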

B+ tree

The B+ tree is a variant of the B-tree and is also a multiway search tree:
1. Its definition is basically the same as that of the B-tree, except for the points below;
2. A non-leaf node has the same number of subtree pointers and keys;
3. The subtree pointer P[i] of a non-leaf node points to the subtree whose keys fall in the half-open interval [K[i], K[i+1]) (the B-tree uses open intervals);
4. All leaf nodes are linked together by a chain pointer;
5. Every key appears in a leaf node;
For example, (M = 3 ):


Searching a B+ tree is basically the same as searching a B-tree. The difference is that a B+ tree search hits only when it reaches a leaf node (a B-tree search can hit at a non-leaf node). Its performance is likewise equivalent to a binary search over the complete key set;
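A sketch of this lookup follows, using the convention from the definition above: a non-leaf node has as many keys as children, and child i covers the half-open interval [K[i], K[i+1]). The names `BPlusNode` and `bplus_search` and the sample keys are assumptions made for the example.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BPlusNode:
    keys: List[int]
    children: List["BPlusNode"] = field(default_factory=list)  # empty for a leaf
    next_leaf: Optional["BPlusNode"] = None                     # leaf chain pointer


def bplus_search(node: BPlusNode, key: int) -> bool:
    """Descend to a leaf; only the leaf decides whether the key exists."""
    while node.children:
        # Index of the last key <= `key`, i.e. the child covering [K[i], K[i+1]).
        i = bisect_right(node.keys, key) - 1
        if i < 0:
            return False  # smaller than every key in the tree
        node = node.children[i]
    j = bisect_left(node.keys, key)
    return j < len(node.keys) and node.keys[j] == key


leaf_a = BPlusNode([5, 8])
leaf_b = BPlusNode([10, 15])
leaf_c = BPlusNode([20, 26])
leaf_a.next_leaf, leaf_b.next_leaf = leaf_b, leaf_c
# Each index key is the smallest key of the corresponding child.
root = BPlusNode(keys=[5, 10, 20], children=[leaf_a, leaf_b, leaf_c])

present = bplus_search(root, 10)
absent = bplus_search(root, 9)
```

Although 10 also appears in the root, the search continues down to `leaf_b` before reporting a hit; 9 is routed to `leaf_a` and not found there.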
Features of the B+ tree:
1. All keys appear in the linked list of leaf nodes (a dense index), and the keys in the list are in sorted order;
2. A search never hits at a non-leaf node;
3. Non-leaf nodes act as an index over the leaf nodes (a sparse index), and leaf nodes act as the data layer that stores the keys and their data;
4. It is better suited to file index systems;
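The sorted leaf chain is what makes range queries cheap, which is one reason the structure suits file indexes. The sketch below shows only the chain walk; `Leaf` and `range_scan` are hypothetical names, and for simplicity the scan starts at the first leaf, whereas a real tree would first descend the index to locate the starting leaf.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Leaf:
    keys: List[int]                 # sorted ascending
    next: Optional["Leaf"] = None   # chain pointer to the next leaf


def range_scan(start_leaf: Optional[Leaf], lo: int, hi: int) -> Iterator[int]:
    """Yield every key k with lo <= k <= hi by following the leaf chain."""
    leaf = start_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return  # keys are globally sorted, so nothing further can match
            if k >= lo:
                yield k
        leaf = leaf.next


third = Leaf([20, 26])
second = Leaf([10, 15], third)
first = Leaf([5, 8], second)
result = list(range_scan(first, 8, 20))
```

Here `result` is `[8, 10, 15, 20]`, gathered without touching any non-leaf node.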

B* tree

The B* tree is a variant of the B+ tree that adds a pointer to the sibling node in every non-root, non-leaf node of the B+ tree:

The B* tree requires each non-leaf node to hold at least (2/3) * M keys, so the minimum block utilization is 2/3 (instead of the 1/2 of the B+ tree);
B+ tree split: when a node is full, allocate a new node, copy half of the original node's data into it, and add a pointer to the new node in the parent. The split affects only the original node and the parent, not the sibling, so no sibling pointer is needed;
B* tree split: when a node is full and its next sibling is not, move part of the data to the sibling, insert the key into the original node, and then update the sibling's key in the parent (because the sibling's key range has changed). If the sibling is also full, add a new node between the original node and the sibling, copy 1/3 of the data from each into the new node, and add a pointer to the new node in the parent;
Therefore, a B* tree allocates new nodes less often than a B+ tree and achieves higher space utilization;
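The B* overflow handling can be sketched on plain key lists. This is an illustration under assumptions rather than the article's exact procedure: `spread` and `bstar_overflow` are hypothetical names, keys are redistributed as evenly as possible (the text only says "part of the data" is moved), and the required update of the parent's keys is left out.

```python
from typing import List


def spread(keys: List[int], parts: int) -> List[List[int]]:
    """Cut a sorted key list into `parts` consecutive, near-equal pieces."""
    q, r = divmod(len(keys), parts)
    pieces, start = [], 0
    for i in range(parts):
        size = q + (1 if i < r else 0)
        pieces.append(keys[start:start + size])
        start += size
    return pieces


def bstar_overflow(node: List[int], sibling: List[int], capacity: int) -> List[List[int]]:
    """Resolve an overflow in `node` (capacity + 1 keys) using its right sibling.

    Returns the resulting key lists from left to right: two lists if the
    sibling had room, three if a new node had to be allocated.
    """
    merged = node + sibling  # every key in `node` precedes every key in `sibling`
    if len(sibling) < capacity:
        return spread(merged, 2)  # sibling has room: no new node is allocated
    return spread(merged, 3)      # both full: two nodes become three


# Capacity of 6 keys per node; `node` holds 7 keys after the insertion.
with_room = bstar_overflow([1, 2, 3, 4, 5, 6, 7], [10, 11], 6)
both_full = bstar_overflow([1, 2, 3, 4, 5, 6, 7], [10, 11, 12, 13, 14, 15], 6)
sizes = [len(p) for p in both_full]
```

In the first call the nine keys are shared between the two existing nodes. In the second call the thirteen keys end up in three nodes of 5, 4, and 4 keys, so each node is at least 2/3 full.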

Summary

B tree: a binary tree in which each node stores one key. A search hits when the query key equals the node's key, goes to the left child when it is smaller, and goes to the right child when it is larger;
B-tree: a multiway search tree in which each non-root node stores between ⌈M/2⌉ - 1 and M - 1 keys, and non-leaf nodes store pointers to the children that cover each key range;
Each key appears only once in the whole tree, and a search can hit at a non-leaf node;
B+ tree: a B-tree with a linked-list pointer added to the leaf nodes. All keys appear in the leaf nodes, and non-leaf nodes serve as an index over them; a B+ tree search always hits at a leaf node;
B* tree: a B+ tree with linked-list pointers added to the non-leaf nodes as well, raising the minimum node utilization from 1/2 to 2/3;

 

Reprinted from: http://hi.baidu.com/petercao2008/item/62786f00352b8110cd34ea7d
