B tree
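Here "B tree" means the binary search tree (in most of the literature, "B tree" and "B-tree" both name the multiway tree described in the next section):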
1. Every non-leaf node has at most two children (left and right);
2. Every node stores one keyword;
3. The left pointer of a non-leaf node points to the subtree whose keywords are smaller than its own, and the right pointer points to the subtree whose keywords are larger;
For example:
A search in the binary search tree starts at the root. If the query keyword equals the node's keyword, it is a hit. Otherwise the search moves to the left child when the query keyword is smaller and to the right child when it is larger. If the required child pointer is null, the search reports that the keyword is not in the tree.
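The search procedure just described can be sketched in a few lines of Python. This is only an illustration: the `Node` class, the `bst_search` name, and the sample tree are assumptions made for the example, not part of the original article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def bst_search(root: Optional[Node], key: int) -> Optional[Node]:
    """Return the node holding `key`, or None if it is absent."""
    node = root
    while node is not None:
        if key == node.key:
            return node          # hit
        # smaller keywords live in the left subtree, larger ones in the right
        node = node.left if key < node.key else node.right
    return None                  # ran into a null child pointer: not found


# A small hand-built tree:      8
#                             /   \
#                            3     10
#                           / \
#                          1   6
root = Node(8, Node(3, Node(1), Node(6)), Node(10))
found = bst_search(root, 6)      # the node with keyword 6
missing = bst_search(root, 7)    # None: 7 is not in the tree
```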
If the left and right subtrees of every non-leaf node hold roughly the same number of nodes (the tree is balanced), search performance approaches that of binary search over a sorted array. The advantage over binary search in contiguous memory is that changing the tree (inserting or deleting nodes) does not require moving large blocks of data and usually costs only constant overhead.
For example:
However, after many insertions and deletions the same tree can end up with a very different shape:
The tree on the right is still a valid binary search tree, but its search performance has become linear. The same set of keywords can produce different tree structures, so when using a binary search tree you must keep it in the shape on the left and avoid the shape on the right. This is the so-called "balance" problem.
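To make the balance problem concrete, the following sketch inserts the same seven keywords in two different orders and compares the resulting heights. The nested-list node layout and the helper names `insert`, `height`, and `build` are assumptions chosen to keep the example short.

```python
def insert(tree, key):
    """Insert `key` into an unbalanced BST stored as [key, left, right] lists."""
    if tree is None:
        return [key, None, None]
    node = tree
    while True:
        if key == node[0]:
            return tree                      # keyword already present
        idx = 1 if key < node[0] else 2      # 1 = left child, 2 = right child
        if node[idx] is None:
            node[idx] = [key, None, None]
            return tree
        node = node[idx]


def height(tree):
    """Number of nodes on the longest root-to-leaf path."""
    best, stack = 0, [(tree, 1)]
    while stack:
        node, depth = stack.pop()
        if node is None:
            continue
        best = max(best, depth)
        stack.append((node[1], depth + 1))
        stack.append((node[2], depth + 1))
    return best


def build(keys):
    tree = None
    for k in keys:
        tree = insert(tree, k)
    return tree


chain = build([1, 2, 3, 4, 5, 6, 7])     # ascending order: only right children
bushy = build([4, 2, 6, 1, 3, 5, 7])     # same keywords, middle values first
print(height(chain), height(bushy))      # 7 3
```

With seven keywords the degenerate tree needs up to seven comparisons per search, while the balanced one needs at most three.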
Binary search trees used in practice add a balancing algorithm to the basic structure, giving the "balanced binary tree". How to keep the nodes evenly distributed is the key question for a balanced binary tree. The balancing algorithm is the strategy applied when inserting and deleting nodes.
B-tree
A multiway search tree (not binary):
1. Every non-leaf node has at most M children, where M > 2;
2. The root, if it is not a leaf, has between 2 and M children;
3. Every non-leaf node other than the root has between ceil(M/2) and M children;
4. Every node other than the root holds at least ceil(M/2)-1 and at most M-1 keywords; the root holds at least 1 keyword;
5. In a non-leaf node, number of keywords = number of child pointers - 1;
6. Non-leaf node keywords: K[1], K[2], ..., K[M-1], with K[i] < K[i+1];
7. Non-leaf node pointers: P[1], P[2], ..., P[M], where P[1] points to the subtree whose keywords are smaller than K[1], P[M] points to the subtree whose keywords are larger than K[M-1], and every other P[i] points to the subtree whose keywords lie in (K[i-1], K[i]);
8. All leaf nodes are on the same level;
Example: (M = 3)
A B-tree search starts at the root and runs a binary search over the (ordered) keyword sequence inside the node. On a hit the search ends. Otherwise it descends into the child whose range covers the query keyword. This repeats until the corresponding child pointer is null or the node is already a leaf.
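A minimal sketch of this search, assuming a hypothetical `BTreeNode` class that stores a sorted keyword list and a child list (empty for leaves). The sample tree is hand-built for M = 3 and is not taken from the original figure.

```python
from bisect import bisect_left


class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted keywords of this node
        self.children = children or []    # len(keys) + 1 pointers; empty for a leaf


def btree_search(node, key):
    """Return (node, index) where `key` is stored, or None if it is absent."""
    while node is not None:
        i = bisect_left(node.keys, key)   # binary search inside the node
        if i < len(node.keys) and node.keys[i] == key:
            return node, i                # hit, possibly at a non-leaf node
        if not node.children:
            return None                   # reached a leaf without a hit
        node = node.children[i]           # child whose range covers the keyword
    return None


# M = 3: at most 2 keywords and 3 children per node.
root = BTreeNode([20, 40], [
    BTreeNode([5, 10]),
    BTreeNode([30]),
    BTreeNode([50, 60]),
])
hit_in_root = btree_search(root, 40)      # found in the root itself
hit_in_leaf = btree_search(root, 30)      # found one level down
```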
B-tree features:
1. The set of keywords is distributed in the entire tree;
2. Any keyword appears only in one node;
3. The search may end at a non-leaf node;
4. The search performance is equivalent to performing a binary search in the complete set of keywords;
5. The number of levels is controlled automatically;
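Because every non-leaf node other than the root must have at least ceil(M/2) children, a minimum node utilization is guaranteed. In the worst case the tree has on the order of log_{ceil(M/2)}(N) levels, and a binary search inside one node costs about log2(M) comparisons, so the worst-case search cost is:
O(log2(M) * log_{ceil(M/2)}(N)) = O(log2(N))
M is the maximum number of subtrees of a non-leaf node and N is the total number of keywords.
Therefore B-tree search performance is always equivalent to binary search, regardless of the value of M, and the B-tree has no balance problem of the kind the binary search tree has.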
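The level bound can be checked numerically. The sketch below relies on the standard counting argument that a B-tree whose leaves are h levels below the root holds at least 2 * t^h - 1 keywords, where t = ceil(M/2). The function name `max_btree_height` is hypothetical, and height is counted here as the number of levels below the root, which is an assumption of this example.

```python
def max_btree_height(n_keys: int, m: int) -> int:
    """Largest possible height of a B-tree of order `m` with `n_keys` keywords.

    Height is the number of levels below the root. With t = ceil(m / 2),
    a tree of height h contains at least 2 * t**h - 1 keywords.
    """
    if n_keys < 1 or m < 3:
        raise ValueError("need n_keys >= 1 and m >= 3")
    t = (m + 1) // 2                      # ceil(m / 2): minimum children per inner node
    h = 0
    # Grow h while a tree one level taller would still fit in n_keys keywords.
    while 2 * t ** (h + 1) - 1 <= n_keys:
        h += 1
    return h


print(max_btree_height(10**6, 3))         # 18
print(max_btree_height(10**6, 101))       # 3
```

A larger M gives a much shallower tree, while the total number of comparisons stays on the order of log2(N) because each node then takes more comparisons to search.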
Because of the M/2 lower bound, inserting into a node that is already full splits it into two nodes of about M/2 keywords each, and deleting a keyword may require merging two sibling nodes that have fallen below M/2.
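The split and merge steps can be sketched as two small functions. This is an illustration under assumptions: the `Node` class and the function names are hypothetical, the split is applied to a node that has temporarily overflowed to M keywords, and updating the parent is left to the caller.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []    # empty list means the node is a leaf


def split(node, m):
    """Split a node that has overflowed to `m` keywords (one more than allowed).

    Returns (left, median, right). The caller inserts `median` and the
    pointer to `right` into the parent node.
    """
    assert len(node.keys) == m
    mid = (m - 1) // 2
    median = node.keys[mid]
    left = Node(node.keys[:mid], node.children[:mid + 1])
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    return left, median, right


def merge(left, separator, right):
    """Merge two adjacent siblings around the parent keyword between them."""
    return Node(left.keys + [separator] + right.keys,
                left.children + right.children)


left, median, right = split(Node([10, 20, 30]), 3)   # M = 3 leaf overflow
merged = merge(left, median, right)                   # undoes the split
```

For M = 3 the overflowing leaf [10, 20, 30] becomes [10] and [30], with 20 moving up to the parent.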
B+ tree
The B+ tree is a variant of the B-tree and is also a multiway search tree:
1. Its definition is basically the same as that of the B-tree, except that:
2. A non-leaf node has the same number of subtree pointers as keywords;
3. The subtree pointer P[i] of a non-leaf node points to the subtree whose keywords lie in [K[i], K[i+1]) (in the B-tree the interval is open);
4. A chain pointer is added to every leaf node;
5. All keywords appear in the leaf nodes;
Example: (M = 3)
A B+ tree search is basically the same as a B-tree search. The difference is that the B+ tree registers a hit only on reaching a leaf node (a B-tree can hit at a non-leaf node). Its performance is likewise equivalent to a binary search over the full set of keywords.
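A rough sketch of this lookup, following the article's variant in which a non-leaf node has as many keywords as pointers and keyword K[i] is the lower bound of child P[i]. The `Inner` and `Leaf` classes and the sample data are assumptions made for the example.

```python
from bisect import bisect_left, bisect_right


class Inner:
    def __init__(self, keys, children):
        self.keys = keys            # keys[i] is the smallest keyword under children[i]
        self.children = children    # same length as keys


class Leaf:
    def __init__(self, keys):
        self.keys = keys            # sorted keywords (record pointers omitted)
        self.next = None            # chain pointer to the next leaf


def bplus_search(node, key):
    """Return the leaf containing `key`, or None. A hit is decided only at a leaf."""
    while isinstance(node, Inner):
        i = bisect_right(node.keys, key) - 1   # last child whose lower bound <= key
        if i < 0:
            return None                        # smaller than every keyword in the tree
        node = node.children[i]
    i = bisect_left(node.keys, key)
    return node if i < len(node.keys) and node.keys[i] == key else None


leaves = [Leaf([5, 8]), Leaf([10, 15]), Leaf([20, 26, 28])]
for a, b in zip(leaves, leaves[1:]):
    a.next = b                                 # link the leaves into a chain
root = Inner([5, 10, 20], leaves)
```

Note that keyword 10 appears in the root as well, but the search still continues down to the leaf before reporting a hit.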
B+ tree features:
1. All keywords appear in the linked list of leaf nodes (a dense index), and the keywords in the list are in sorted order;
2. A search can never hit at a non-leaf node;
3. The non-leaf nodes act as an index over the leaf nodes (a sparse index), and the leaf nodes act as the data layer that stores the keywords;
4. Better suited to file index systems;
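One practical consequence of the ordered leaf chain is that a range query can simply walk the linked list. The sketch below assumes a hypothetical `Leaf` class and, for brevity, starts from the first leaf. In a real tree the starting leaf would presumably be found by descending from the root for the lower bound.

```python
class Leaf:
    def __init__(self, keys, next_leaf=None):
        self.keys = keys            # sorted keywords in this leaf
        self.next = next_leaf       # chain pointer to the next leaf


def range_scan(start_leaf, lo, hi):
    """Yield every keyword k with lo <= k <= hi by walking the leaf chain."""
    leaf = start_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return              # keywords are ordered, so nothing further matches
            if k >= lo:
                yield k
        leaf = leaf.next


third = Leaf([20, 26, 28])
second = Leaf([10, 15], third)
first = Leaf([5, 8], second)
print(list(range_scan(first, 8, 20)))   # [8, 10, 15, 20]
```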
B* tree
A variant of the B+ tree that adds a pointer to the sibling node in every non-root, non-leaf node of the B+ tree.
The B* tree requires each non-leaf node to hold at least (2/3)*M keywords, so the minimum block utilization is 2/3 (instead of the B+ tree's 1/2).
B+ tree split: when a node is full, allocate a new node, copy half of the original node's data into it, and add a pointer to the new node in the parent. A B+ tree split affects only the original node and its parent, not the sibling nodes, so no sibling pointer is needed.
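The B+ tree split can be sketched on a plain keyword list. The function name `bplus_split_leaf` is hypothetical, and using the first keyword of the new node as the entry added to the parent is an assumption that matches the [K[i], K[i+1]) convention described earlier.

```python
def bplus_split_leaf(keys):
    """Split a full leaf: keep the lower half, move the upper half to a new leaf.

    Returns (old_keys, new_keys, separator). The parent gains `separator`
    together with a pointer to the new leaf.
    """
    mid = len(keys) // 2
    old_keys, new_keys = keys[:mid], keys[mid:]
    return old_keys, new_keys, new_keys[0]


old_keys, new_keys, separator = bplus_split_leaf([5, 8, 10, 15])
# old_keys == [5, 8], new_keys == [10, 15], separator == 10
```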
B* tree split: when a node is full and its next sibling is not, move part of its data to the sibling, insert the keyword into the original node, and update the sibling's keyword in the parent (because the sibling's keyword range has changed). If the sibling is also full, add a new node between the two, copy one third of the data from each of them into the new node, and add a pointer to the new node in the parent.
Therefore a B* tree allocates new nodes less often than a B+ tree and achieves higher space utilization.
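The two B* cases can be sketched on plain keyword lists. This is a simplification rather than a full implementation: `siblings` stands in for the children of one parent, the function name `bstar_insert` and the `capacity` parameter are hypothetical, a next sibling is assumed to exist, and the keywords of both nodes are pooled and re-divided instead of being moved one at a time.

```python
from bisect import insort


def bstar_insert(siblings, i, key, capacity):
    """Insert `key` into siblings[i]; siblings[i + 1] is its next sibling.

    `siblings` is a list of sorted keyword lists and is modified in place.
    """
    node, nxt = siblings[i], siblings[i + 1]
    if len(node) < capacity:
        insort(node, key)                         # room left: plain insert
    elif len(nxt) < capacity:
        # Node full, sibling not: shift the upper part over to the sibling.
        # No new node is allocated.
        pooled = sorted(node + nxt + [key])
        cut = (len(pooled) + 1) // 2
        siblings[i], siblings[i + 1] = pooled[:cut], pooled[cut:]
    else:
        # Both full: allocate one new node and spread the keywords over
        # three nodes, each ending up roughly two-thirds full.
        pooled = sorted(node + nxt + [key])
        q, r = divmod(len(pooled), 3)
        a = q + (1 if r > 0 else 0)
        b = a + q + (1 if r > 1 else 0)
        siblings[i:i + 2] = [pooled[:a], pooled[a:b], pooled[b:]]
    return siblings


print(bstar_insert([[1, 2, 3, 4], [10, 11]], 0, 5, 4))
# [[1, 2, 3, 4], [5, 10, 11]]
print(bstar_insert([[1, 2, 4, 5], [10, 11, 12, 13]], 0, 3, 4))
# [[1, 2, 3], [4, 5, 10], [11, 12, 13]]
```

In the first call the sibling's smallest keyword changes from 10 to 5, which is why its entry in the parent has to be updated. In the second call a new node appears only because both neighbours were already full.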
Summary
B tree: a binary search tree. Each node stores a single keyword. An equal keyword is a hit, a smaller one goes to the left child, and a larger one goes to the right child;
B-tree: a multiway search tree. Each node other than the root stores between ceil(M/2)-1 and M-1 keywords, and non-leaf nodes store pointers to the child nodes covering each keyword range. Every keyword appears exactly once in the whole tree, and a hit may occur at a non-leaf node;
B+ tree: adds linked-list pointers to the leaf nodes of a B-tree. All keywords appear in the leaf nodes, and the non-leaf nodes serve as an index over the leaves. A B+ tree search always hits at a leaf node;
B* tree: adds linked-list pointers to the non-leaf nodes of a B+ tree as well, and raises the minimum node utilization from 1/2 to 2/3;
Original article: http://blog.csdn.net/manesking/archive/2007/02/09/1505979.aspx