I. B+ tree
1. B+ tree definition and features
The B+ tree is a variant of the B-tree and is likewise a multiway search tree.
Its definition is basically the same as that of the B-tree, except that:
1). A non-leaf node has the same number of subtree pointers as keywords;
2). The subtree pointer P[i] of a non-leaf node points to the subtree whose key values fall in [K[i], K[i+1]) (in the B-tree the interval is open);
3). A chain pointer is added to every leaf node;
4). All keywords appear in the leaf nodes.
For completeness, here is another formulation found online:
An order-m B+ tree differs from an order-m B-tree as follows:
1. A node with n subtrees contains n keywords (in the B-tree it contains n-1 keywords).
2. The leaf nodes together contain all the keywords along with pointers to the records for those keywords, and the leaf nodes are linked to one another in ascending keyword order. (The leaf nodes of a B-tree do not contain all of the information to be searched.)
3. All non-terminal nodes can be regarded as the index part: each holds only the maximum (or minimum) keyword of each of its subtrees. (The non-terminal nodes of a B-tree also contain valid information to be searched.)
A typical order-3 B+ tree is given as an example.
Features of the B+ tree:
1). All keywords appear in the linked list of leaf nodes (a dense index), and the keywords in that list are in sorted order;
2). A search can never hit (end) at a non-leaf node;
3). The non-leaf nodes act as an index over the leaf nodes (a sparse index), and the leaf nodes act as the data layer that stores the (keyword) data;
4). It is better suited to file index systems.
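The layout described above can be sketched in a few lines of Python. This is only an illustration, not a full implementation: the class names `Leaf` and `Internal`, the helpers `build_example` and `leaf_keys`, and the sample keys are all made up for this sketch, and the internal node uses the "n keywords, n children" convention with the minimum keyword of each subtree.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Leaf:
    """Leaf node: holds keywords (dense index) plus the record for each keyword."""
    keys: List[int] = field(default_factory=list)
    records: List[str] = field(default_factory=list)
    next: Optional["Leaf"] = None  # chain pointer to the right sibling leaf


@dataclass
class Internal:
    """Non-leaf node: index only (sparse index), no records.

    Assumed convention: keys[i] is the smallest keyword stored under children[i],
    so a node has as many keywords as subtree pointers.
    """
    keys: List[int] = field(default_factory=list)
    children: List[object] = field(default_factory=list)


def build_example():
    # Hand-built two-level tree; a real tree would be grown by insertions.
    a = Leaf([5, 8], ["r5", "r8"])
    b = Leaf([10, 15], ["r10", "r15"])
    c = Leaf([20, 26], ["r20", "r26"])
    a.next, b.next = b, c
    root = Internal([5, 10, 20], [a, b, c])
    return root, a


def leaf_keys(first_leaf):
    """Walk the leaf chain and collect every keyword in order."""
    out = []
    node = first_leaf
    while node is not None:
        out.extend(node.keys)
        node = node.next
    return out
```

Every keyword in the root also shows up in the leaf chain, which is what "all keywords appear in the leaf nodes" means in practice.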
2. Basic operations on the B+ tree
1) Search
Two kinds of search can be performed on a B+ tree:
A. Sequential search, starting from the smallest keyword;
B. Random search, starting from the root node.
During a search, if a keyword in a non-terminal node equals the given value, the search does not stop there but continues down to a leaf node. Therefore, in a B+ tree every search, successful or not, follows a path from the root to a leaf node. The rest is similar to a B-tree lookup.
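Both kinds of search can be sketched as follows. This is a minimal illustration under assumptions of my own: the `Leaf` and `Internal` classes, the functions `search` and `scan`, and the hand-built sample tree are hypothetical, and internal nodes store the minimum keyword of each subtree.

```python
from bisect import bisect_left, bisect_right


class Leaf:
    def __init__(self, keys, records):
        self.keys = keys
        self.records = records
        self.next = None          # chain pointer to the next leaf


class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # keys[i] = smallest keyword under children[i]
        self.children = children


def search(root, key):
    """Random search: always walks from the root down to a leaf."""
    node = root
    while isinstance(node, Internal):
        # Child i covers [keys[i], keys[i+1]); an equal keyword does not stop the descent.
        i = max(bisect_right(node.keys, key) - 1, 0)
        node = node.children[i]
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.records[i]
    return None


def scan(first_leaf):
    """Sequential search: start at the smallest keyword and follow the leaf chain."""
    node = first_leaf
    while node is not None:
        yield from zip(node.keys, node.records)
        node = node.next


# Hand-built sample tree (a real tree would be grown by insertions).
a = Leaf([5, 8], ["r5", "r8"])
b = Leaf([10, 15], ["r10", "r15"])
c = Leaf([20, 26], ["r20", "r26"])
a.next, b.next = b, c
root = Internal([5, 10, 20], [a, b, c])
```

Note that `search(root, 10)` finds 10 in the root but still descends to the leaf before returning the record.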
2) Insert
Insertion into a B+ tree is similar to insertion into a B-tree. The difference is that in a B+ tree the insertion is carried out at a leaf node. If the number of keywords in the leaf node then exceeds m, the node must be split into two nodes holding (as nearly as possible) the same number of keywords, and the parent node must contain the maximum keyword of each of the two nodes. (See Baidu encyclopedia for algorithms)
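A leaf split can be sketched roughly as below. This is a simplified illustration, not the algorithm from any particular reference: leaves are plain sorted lists, the parent is a list holding the maximum keyword of each leaf, and the names `split_leaf` and `insert_into_leaf` are hypothetical. Splitting the parent itself when it overflows is left out.

```python
from bisect import insort


def split_leaf(keys):
    """Split an overfull sorted list into two halves of (nearly) equal size."""
    mid = (len(keys) + 1) // 2   # left half gets the extra keyword when the count is odd
    return keys[:mid], keys[mid:]


def insert_into_leaf(parent_keys, leaves, leaf_index, key, m):
    """Insert key into leaves[leaf_index]; split the leaf if it now holds more than m keywords.

    parent_keys[i] is kept equal to the maximum keyword of leaves[i].
    Overflow of the parent is not handled in this sketch.
    """
    leaf = leaves[leaf_index]
    insort(leaf, key)
    if len(leaf) > m:
        left, right = split_leaf(leaf)
        leaves[leaf_index:leaf_index + 1] = [left, right]
        parent_keys[leaf_index:leaf_index + 1] = [left[-1], right[-1]]
    else:
        parent_keys[leaf_index] = leaf[-1]


# Order m = 3: one parent over two leaves.
leaves = [[5, 8, 10], [15, 20]]
parent_keys = [10, 20]
insert_into_leaf(parent_keys, leaves, 0, 7, m=3)
```

After the call, the first leaf has been split in two and the parent holds the maximum keyword of each resulting leaf.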
3) Delete
Deletion from a B+ tree is likewise performed only at a leaf node. When the maximum keyword of a leaf node is deleted, its value may remain in the non-terminal node as a "separator keyword". If the deletion leaves the node with fewer than m/2 keywords (m/2 rounded up; for example, 5/2 gives 3), the process of merging with sibling nodes is similar to that of the B-tree.
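The underflow rule can be illustrated with a small sketch. It assumes leaves are plain sorted lists and only looks at the right sibling; borrowing a keyword before merging is the usual B-tree-style handling and is my assumption here, as are the names `min_keys` and `delete_key`. Updating the parent is omitted.

```python
def min_keys(m):
    """Minimum number of keywords in a non-root node: m/2 rounded up."""
    return (m + 1) // 2


def delete_key(leaf, right_sibling, key, m):
    """Delete key from leaf; on underflow, borrow from or merge with the right sibling.

    Returns "ok", "borrowed" or "merged". Parent maintenance is not shown.
    """
    leaf.remove(key)                       # raises ValueError if the keyword is absent
    if len(leaf) >= min_keys(m):
        return "ok"
    if len(right_sibling) > min_keys(m):
        # The sibling can spare a keyword: take its smallest one, which keeps the order.
        leaf.append(right_sibling.pop(0))
        return "borrowed"
    # Neither node can spare a keyword: merge everything into the left node.
    leaf.extend(right_sibling)
    right_sibling.clear()
    return "merged"
```

For m = 5 the minimum is 3 keywords, so a leaf that drops to 2 keywords either borrows one or is merged with its neighbour; the merged node holds at most m keywords.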
PS:
A. Unlike the B+ tree, the B-tree is only suitable for random search. The B+ tree supports both random search and sequential search, and is therefore widely used in practice.
B. Why is the B+ tree better suited than the B-tree to operating-system file indexes and database indexes in real applications?
1) The disk read/write cost of the B+ tree is lower.
The internal nodes of a B+ tree hold no pointers to the specific information (records) of the keywords, so an internal node is smaller than a B-tree node. If all the keywords of one internal node are stored in the same disk block, the block can hold more keywords, so more of the keywords needed for a search are read into memory at once, and the number of I/O reads and writes drops accordingly.
For example, assume that a disk block holds 16 bytes, a keyword takes 2 bytes, and the pointer to a keyword's specific information takes 2 bytes. An internal node of an order-9 B-tree (a node can have up to eight keywords) then needs two disk blocks, while an internal node of the B+ tree needs only one (all the keywords are in the leaf nodes?). When the internal node has to be read into memory, the B-tree needs one more disk block lookup than the B+ tree (on disk, that costs a rotation time). (The inner nodes of the B+ tree only act as an index, so why "read the inner node into memory"? With the B+ tree you can go to the leaf nodes and search them in sequence.)
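The arithmetic in this example can be checked with a short sketch. It assumes that the second 2-byte item is the pointer to a keyword's record, and, like the example itself, it ignores child pointers; the constant names and the `blocks_needed` helper are made up for illustration.

```python
BLOCK_BYTES = 16        # size of one disk block in the example
KEY_BYTES = 2           # size of one keyword
RECORD_PTR_BYTES = 2    # assumed: pointer to the keyword's record (B-tree nodes only)
KEYS_PER_NODE = 8       # an order-9 node holds up to eight keywords


def blocks_needed(num_keys, bytes_per_entry, block_bytes=BLOCK_BYTES):
    """Number of disk blocks needed to store num_keys entries (rounded up)."""
    total = num_keys * bytes_per_entry
    return -(-total // block_bytes)   # integer ceiling division


# B-tree internal node: each entry is a keyword plus a record pointer.
btree_blocks = blocks_needed(KEYS_PER_NODE, KEY_BYTES + RECORD_PTR_BYTES)
# B+ tree internal node: each entry is a keyword only.
bplus_blocks = blocks_needed(KEYS_PER_NODE, KEY_BYTES)
```

With these numbers the B-tree node takes 32 bytes (two blocks) and the B+ tree node takes 16 bytes (one block).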
2) The query efficiency of the B+ tree is more stable.
A non-terminal node is not the node that finally points to the file content; it is only an index to the keywords in the leaf nodes. Therefore, every keyword search must follow a path from the root node to a leaf node. The path length is the same for all keyword queries, so the query efficiency is the same for every piece of data.
C. The biggest differences between the B+ tree and the B-tree are:
1). In the B-tree, keywords and records are stored together, and the leaf nodes can be regarded as external nodes that hold no information. The non-leaf nodes of the B+ tree hold only keywords and pointers to the next level of nodes; the records are stored only in the leaf nodes.
2). In the B-tree, the closer a record is to the root node, the faster it is found, and finding the keyword is enough to determine that the record exists. In the B+ tree, the search time for every record is basically the same: you must go from the root node to a leaf node and then compare against the keywords in that leaf node. From this perspective the B-tree seems to perform better than the B+ tree, but in real applications the B+ tree performs better. Because the non-leaf nodes of the B+ tree store no actual data, each node can hold more elements than a B-tree node, so the tree is shorter than the B-tree, which reduces the number of disk accesses. Although the B+ tree needs more comparisons than the B-tree to find a record, one disk access takes as long as hundreds of in-memory comparisons, so in practice the B+ tree may well perform better. In addition, the leaf nodes of the B+ tree are linked by pointers, which makes sequential traversal easy (for example, listing all files in a directory or all records in a table). This is why many databases and file systems use the B+ tree.
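The effect of node capacity on tree height can be estimated with a rough sketch. Every number below (4096-byte blocks, 8-byte keywords and pointers, ten million keywords) is an assumption chosen for illustration, and `levels_needed` is a hypothetical helper that simply finds the smallest height whose capacity covers the keyword count.

```python
def levels_needed(num_keys, fanout):
    """Smallest h such that fanout**h >= num_keys (a rough height estimate)."""
    levels, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


BLOCK_BYTES = 4096      # assumed disk block size
KEY_BYTES = 8           # assumed keyword size
CHILD_PTR_BYTES = 8     # assumed child pointer size
RECORD_PTR_BYTES = 8    # assumed record pointer, stored only in B-tree internal nodes

# Entries per block: the B+ tree stores keyword + child pointer,
# the B-tree stores keyword + child pointer + record pointer.
bplus_fanout = BLOCK_BYTES // (KEY_BYTES + CHILD_PTR_BYTES)
btree_fanout = BLOCK_BYTES // (KEY_BYTES + CHILD_PTR_BYTES + RECORD_PTR_BYTES)

n = 10_000_000
bplus_levels = levels_needed(n, bplus_fanout)
btree_levels = levels_needed(n, btree_fanout)
```

Under these assumptions the B+ tree fits 256 entries per block and the B-tree 170, so the B+ tree needs one level fewer for the same number of keywords, which means one fewer disk access per lookup.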
II. B* tree (introductions to it are rare on the internet, and I have not found a detailed one in textbooks)
The B* tree is a variant of the B+ tree: sibling pointers are added to the non-root, non-leaf nodes (inner nodes) of the B+ tree.
The B* tree requires each non-leaf node to hold at least (2/3) * M keywords, that is, the minimum block utilization is 2/3 (instead of the 1/2 of the B+ tree);
B+ tree split: when a node is full, allocate a new node, copy 1/2 of the data from the original node to the new node, and add a pointer to the new node in the parent node. A B+ tree split affects only the original node and the parent node, not the sibling nodes, so no pointer to the sibling is needed;
B* tree split: when a node is full, if its next sibling node is not full, move part of its data to the sibling, insert the keyword into the original node, and finally update the sibling's keyword in the parent node (because the sibling's keyword range has changed). If the sibling is also full, add a new node between the original node and the sibling, copy 1/3 of the data from each of them to the new node, and add a pointer to the new node in the parent node;
Therefore, a B* tree allocates new nodes less often than a B+ tree and has higher space utilization.
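The B* split rule can be sketched as follows. This is a simplification under my own assumptions: nodes are plain sorted lists, the keywords of the two nodes are pooled and re-divided evenly rather than moved one by one, and the names `chunk` and `bstar_insert` are hypothetical. Updating the parent is not shown.

```python
from bisect import insort


def chunk(pool, parts):
    """Divide a sorted list into `parts` consecutive pieces of nearly equal size."""
    q, r = divmod(len(pool), parts)
    out, start = [], 0
    for i in range(parts):
        size = q + (1 if i < r else 0)
        out.append(pool[start:start + size])
        start += size
    return out


def bstar_insert(node, sibling, key, capacity):
    """Insert key into node, given its right sibling; return the resulting nodes in order."""
    if len(node) < capacity:
        updated = list(node)
        insort(updated, key)
        return [updated, list(sibling)]
    pool = sorted(node + sibling + [key])
    if len(sibling) < capacity:
        # Node is full but the sibling has room: redistribute, no new node allocated.
        return chunk(pool, 2)
    # Both are full: two nodes become three, each about 2/3 full.
    return chunk(pool, 3)
```

With a capacity of 6, a full node next to a half-empty sibling is rebalanced without allocating anything; only when both are full does a third node appear, and each of the three then holds at least 4 keywords (2/3 of 6).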
For more information, see the July blog on the R tree.