B tree, B-tree, B+ tree, B* tree, red-black tree (rbtree), binary sort tree, trie, double-array dictionary lookup tree
B tree
A B tree here means a binary search tree:
1. Every non-leaf node has at most two children (left and right);
2. Every node stores one keyword;
3. The left pointer of a non-leaf node points to the subtree whose keywords are smaller than its own, and the right pointer points to the subtree whose keywords are larger;
For example:
A B tree search starts at the root node. If the query keyword equals the node's keyword, the search hits; if the query keyword is smaller, the search moves to the left child; if it is larger, the search moves to the right child. If the required left or right child pointer is null, the search reports that the keyword was not found;
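The search rule above can be sketched in a few lines of Python. This is a minimal illustration, not code from the original article: the names Node, insert, and search are assumptions chosen for the example, and insertion does no rebalancing.

```python
class Node:
    """One node of a binary search tree: a keyword plus left/right child pointers."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert key and return the (possibly new) root; duplicates are ignored."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def search(root, key):
    """Return the node holding key, or None once a null child pointer is reached."""
    node = root
    while node is not None:
        if key == node.key:
            return node                      # hit
        node = node.left if key < node.key else node.right
    return None                              # fell off the tree: not found

root = None
for k in [50, 30, 70, 20, 40, 60, 80]:
    root = insert(root, k)
```

With this tree, search(root, 60) returns the node holding 60, while search(root, 65) walks 50, 70, 60 and then returns None because the right pointer of 60 is null.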
If the left and right subtrees of every non-leaf node of a B tree hold roughly the same number of nodes (the tree is balanced), search performance approaches that of binary search. Its advantage over binary search on a contiguous block of memory is that changing the B tree structure (inserting and deleting nodes) does not require moving large segments of data in memory, and often costs only constant overhead;
For example:
However, after many insertions and deletions, a B tree may end up with a different structure:
The tree on the right is also a B tree, but its search performance is linear. The same set of keywords can produce different tree shapes, so when using a B tree we should keep the structure of the left tree as far as possible and avoid the structure of the right tree. This is the so-called "balance" problem;
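The effect of insertion order on the shape of the tree can be checked directly. The sketch below is an illustration with assumed names (Node, insert, height, build): it builds two trees from the same seven keywords without any rebalancing and compares their heights.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Iterative insert without rebalancing; returns the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        elif key > node.key:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right
        else:
            return root                      # duplicate keyword: nothing to do

def height(root):
    """Number of nodes on the longest root-to-leaf path."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

balanced = build([4, 2, 6, 1, 3, 5, 7])      # middle keywords inserted first
chain = build([1, 2, 3, 4, 5, 6, 7])         # sorted order: every node has only a right child
print(height(balanced), height(chain))       # prints: 3 7
```

The sorted insertion order produces a chain whose height equals the number of keywords, which is the linear case described above.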
The B trees used in practice add a balancing algorithm to the basic B tree, giving the "balanced binary tree". The key question for a balanced binary tree is how to keep the nodes evenly distributed; the balancing algorithm is a strategy applied when inserting and deleting nodes in the B tree;
B-tree
Is a multi-way search tree (not binary):
1. Every non-leaf node has at most M children, where M > 2;
2. The root node has between 2 and M children, i.e. [2, M];
3. Every non-leaf node other than the root has between M/2 and M children, i.e. [M/2, M] (M/2 rounded up);
4. Every node other than the root holds at least M/2 - 1 (M/2 rounded up) and at most M - 1 keywords; the root holds at least 1 keyword;
5. Number of keywords in a non-leaf node = number of child pointers - 1;
6. Keywords of a non-leaf node: K[1], K[2], …, K[M-1], with K[i] < K[i+1];
7. Pointers of a non-leaf node: P[1], P[2], …, P[M], where P[1] points to the subtree whose keywords are less than K[1], P[M] points to the subtree whose keywords are greater than K[M-1], and every other P[i] points to the subtree whose keywords lie in the open interval (K[i-1], K[i]);
8. All leaf nodes are on the same level;
Example: (M = 3)
B-tree search: start at the root node and run a binary search over the (ordered) keyword sequence inside the node. On a hit, the search ends; otherwise, descend into the child whose range contains the query keyword. Repeat until the corresponding child pointer is null or the current node is already a leaf;
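A minimal sketch of this search, assuming a node is represented as a sorted keyword list plus a list of child pointers. BTreeNode and btree_search are hypothetical names for the illustration; the in-node binary search uses bisect_left from the Python standard library.

```python
from bisect import bisect_left

class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                     # sorted keywords of this node
        self.children = children or []       # empty for a leaf, else len(keys) + 1 pointers

def btree_search(node, key):
    """Return the node that contains key, or None if the keyword is absent."""
    while node is not None:
        i = bisect_left(node.keys, key)      # binary search inside the node
        if i < len(node.keys) and node.keys[i] == key:
            return node                      # hit, possibly at a non-leaf node
        if not node.children:
            return None                      # reached a leaf without a hit
        node = node.children[i]              # child covering the gap before keys[i]
    return None

# A small tree with M = 3: each node holds 1 or 2 keywords, all leaves on one level.
root = BTreeNode([20, 40], [BTreeNode([5, 10]), BTreeNode([30]), BTreeNode([50, 60])])
```

Here btree_search(root, 40) returns the root itself, which shows a hit at a non-leaf node, and btree_search(root, 35) descends into the middle leaf and returns None.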
B-tree features:
1. The set of keywords is distributed across the entire tree;
2. Each keyword appears in exactly one node;
3. A search may end at a non-leaf node;
4. Search performance is equivalent to a binary search over the complete set of keywords;
5. The number of levels is controlled automatically;
Because every non-leaf node other than the root must have at least M/2 children, a minimum node utilization is guaranteed. In the worst case a search visits O(log_{M/2} N) nodes and runs a binary search of O(log_2 M) steps inside each one, which is O(log_2 N) in total;
Here M is the maximum number of subtrees of a non-leaf node and N is the total number of keywords;
Therefore B-tree search performance is always equivalent to binary search (regardless of the value of M), and the B-tree does not have the balance problem of the B tree;
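The conclusion that the bound does not depend on M can be derived from the definition. The following is a sketch reconstructed from the rules above, not the formula from the original article; the symbols h (number of levels), d (minimum number of children of a non-root internal node), and T(N) (worst-case search cost) are assumptions introduced for the derivation.

```latex
\begin{aligned}
N &\ge 2\,d^{\,h-1} - 1, \qquad d = \lceil M/2 \rceil
  &&\Rightarrow\quad h \le 1 + \log_{d}\frac{N+1}{2} \\
T(N) &= O(h \cdot \log_2 M)
      = O\!\left(\log_{d} N \cdot \log_2 M\right)
      = O\!\left(\log_2 N \cdot \frac{\log_2 M}{\log_2 d}\right) \\
\frac{\log_2 M}{\log_2 d} &\le 1 + \frac{1}{\log_2 d} \le 2
  \quad (M \le 2d,\ d \ge 2)
  &&\Rightarrow\quad T(N) = O(\log_2 N)
\end{aligned}
```

The first line counts the fewest keywords a tree with h levels can hold (one keyword in the root, d - 1 in every other node); the last line shows that the factor contributed by M is at most 2, so it disappears into the constant.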
Because of the M/2 lower bound, when a keyword is inserted into a node that is already full, the node must be split into two nodes that each hold about M/2 of the keywords; when a keyword is deleted, two sibling nodes that have fallen below M/2 must be merged;
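The split step can be sketched on a two-level tree. This is a simplified illustration under stated assumptions: insert_two_level is a hypothetical helper, leaves are plain sorted lists, and overflow of the root itself (which would add a new level) and merging on deletion are not handled.

```python
from bisect import bisect_left, insort

def insert_two_level(parent_keys, leaves, key, m):
    """Insert key into a two-level B-tree of order m, splitting a leaf on overflow.

    parent_keys: sorted separator keywords in the root
    leaves:      list of sorted keyword lists, len(leaves) == len(parent_keys) + 1
    """
    i = bisect_left(parent_keys, key)        # pick the leaf whose range covers key
    leaf = leaves[i]
    insort(leaf, key)
    if len(leaf) > m - 1:                    # more than M - 1 keywords: overflow
        mid = len(leaf) // 2
        left, median, right = leaf[:mid], leaf[mid], leaf[mid + 1:]
        leaves[i:i + 1] = [left, right]      # two nodes, each about half full
        parent_keys.insert(i, median)        # the median moves up as the new separator
    return parent_keys, leaves

parent_keys, leaves = insert_two_level([20], [[5, 10], [30]], 7, 3)
print(parent_keys, leaves)                   # prints: [7, 20] [[5], [10], [30]]
```

With M = 3 the leaf [5, 10] is already full, so inserting 7 splits it into [5] and [10] and pushes 7 up into the parent.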
B+ tree
The B+ tree is a variant of the B-tree and is also a multi-way search tree:
1. Its definition is basically the same as that of the B-tree, except for the following;
2. A non-leaf node has the same number of subtree pointers as keywords;
3. The subtree pointer P[i] of a non-leaf node points to the subtree whose keywords lie in the half-open interval [K[i], K[i+1]) (in the B-tree the interval is open);
4. A chain pointer is added to every leaf node;
5. All keywords appear in the leaf nodes;
Example: (M = 3)
Searching a B+ tree is basically the same as searching a B-tree. The difference is that a B+ tree search can only hit once it reaches a leaf node (a B-tree search can hit at a non-leaf node). Its performance is likewise equivalent to a binary search over the full set of keywords;
Features of the B+ tree:
1. All keywords appear in the linked list of leaf nodes (a dense index), and the keywords in the linked list are in sorted order;
2. A search can never hit at a non-leaf node;
3. The non-leaf nodes act as an index over the leaf nodes (a sparse index), and the leaf nodes act as the data layer that stores the (keyword) data;
4. It is better suited to file index systems;
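The two properties that matter most in practice, hits only at the leaves and an ordered leaf chain, can be sketched as follows. The classes Leaf and Inner and the functions find_leaf, contains, and range_scan are assumptions made for this illustration; the inner nodes follow the variant described above, with as many pointers as keywords and child i covering the half-open interval that starts at keys[i].

```python
from bisect import bisect_left, bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys          # sorted keywords (the data layer)
        self.next = None          # chain pointer to the next leaf

class Inner:
    def __init__(self, keys, children):
        self.keys = keys          # keys[i] is the lower bound of children[i]
        self.children = children  # same count as keys

def find_leaf(node, key):
    """Descend to the leaf whose range covers key; inner nodes only route."""
    while isinstance(node, Inner):
        i = max(bisect_right(node.keys, key) - 1, 0)
        node = node.children[i]
    return node

def contains(root, key):
    """A hit is decided only at the leaf level."""
    leaf = find_leaf(root, key)
    i = bisect_left(leaf.keys, key)
    return i < len(leaf.keys) and leaf.keys[i] == key

def range_scan(root, lo, hi):
    """Yield every keyword in [lo, hi] by walking the leaf chain."""
    leaf = find_leaf(root, lo)
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return
            if k >= lo:
                yield k
        leaf = leaf.next

a, b, c = Leaf([5, 8, 9]), Leaf([10, 15, 18]), Leaf([20, 26, 27])
a.next, b.next = b, c
root = Inner([5, 10, 20], [a, b, c])
print(list(range_scan(root, 9, 20)))         # prints: [9, 10, 15, 18, 20]
```

The range scan descends the index once and then follows the leaf chain, which is the reason the structure suits file indexes: an ordered scan never has to return to the non-leaf nodes.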
B* tree
Is a variant of the B+ tree that adds pointers to sibling nodes in the non-root, non-leaf nodes of the B+ tree;
The B* tree requires every non-leaf node to hold at least (2/3) * M keywords, so the minimum block utilization is 2/3 (instead of the 1/2 of the B+ tree);
B+ tree split: when a node is full, allocate a new node, copy 1/2 of the data from the original node to the new node, and add a pointer to the new node in the parent. A B+ tree split affects only the original node and the parent, not the sibling nodes, so no sibling pointer is needed;
B* tree split: when a node is full and its next sibling is not full, move part of the data to the sibling, insert the keyword into the original node, and finally update the sibling's keyword in the parent (because the sibling's keyword range has changed). If the sibling is also full, add a new node between the original node and the sibling, copy 1/3 of the data from each into the new node, and add a pointer to the new node in the parent;
Therefore a B* tree allocates new nodes less often than a B+ tree, and its space utilization is higher;
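The B* overflow rule can be sketched on plain keyword lists. This is a simplified illustration, not a full B* tree: split_evenly and bstar_overflow are hypothetical helpers, the redistribution is done evenly (the text only says "part of the data"), and updating the parent's keywords and pointers is left out.

```python
def split_evenly(keys, parts):
    """Cut a sorted list into `parts` consecutive chunks of near-equal size."""
    q, r = divmod(len(keys), parts)
    out, start = [], 0
    for i in range(parts):
        end = start + q + (1 if i < r else 0)
        out.append(keys[start:end])
        start = end
    return out

def bstar_overflow(node, sibling, key, capacity):
    """Insert key into a full node and return the resulting nodes (2 or 3 lists).

    node, sibling: sorted keyword lists of two adjacent nodes
    capacity:      maximum number of keywords per node
    """
    merged = sorted(node + sibling + [key])
    if len(sibling) < capacity:
        return split_evenly(merged, 2)       # sibling has room: no new node allocated
    return split_evenly(merged, 3)           # both full: two nodes become three

full = [1, 2, 3, 4, 5, 6]
print(bstar_overflow(full, [10, 11, 12], 7, 6))
# prints: [[1, 2, 3, 4, 5], [6, 7, 10, 11, 12]]
print(bstar_overflow(full, [10, 11, 12, 13, 14, 15], 7, 6))
# prints: [[1, 2, 3, 4, 5], [6, 7, 10, 11], [12, 13, 14, 15]]
```

In the second case each of the three nodes holds 4 or 5 of a possible 6 keywords, so utilization stays at or above 2/3, and a new node is allocated only when both neighbours are already full.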
Summary
B tree: a binary tree. Each node stores exactly one keyword; a search hits on equality, goes to the left child when the query is smaller, and goes to the right child when it is larger;
B-tree: a multi-way search tree. Each non-root node stores M/2 - 1 to M - 1 keywords, and non-leaf nodes store pointers to the child nodes that cover each keyword range;
Every keyword appears exactly once in the whole tree, and a search can hit at a non-leaf node;
B+ tree: based on the B-tree, with a linked-list pointer added to the leaf nodes. All keywords appear in the leaf nodes, and the non-leaf nodes serve as an index over the leaf nodes; a B+ tree search always hits at a leaf node;
B* tree: based on the B+ tree, with linked-list pointers also added to the non-leaf nodes, and the minimum node utilization raised from 1/2 to 2/3;
This article comes from the CSDN blog; please cite the source when reproducing it: http://blog.csdn.net/manesking/archive/2007/02/09/1505979.aspx
Red-black tree (rbtree), binary sort tree
map is implemented with a red-black tree (rb tree), which is a balanced binary tree. Its advantage is that the depths of the leaf nodes stay close to one another, so a search costs O(log n), and searching, inserting, and deleting all have the same efficiency. When the data is entirely static this offers little advantage, and a hash table may be more suitable.
hash_map is a hash table. It uses more memory and searches faster, but computing the hash takes time.
In general, hash_map searches faster than map: its lookup time is essentially constant regardless of data volume, while map lookup is O(log n). This does not always hold, because the constant is not necessarily smaller than log n and the hash function itself takes time. If efficiency matters, especially once the number of elements reaches a certain order of magnitude, consider hash_map. If memory usage is tightly constrained and the program must consume as little memory as possible, be careful: hash_map can use more memory than is comfortable, especially when there are many hash_map objects, and a hash_map is also slow to construct.
So how should you choose? Weigh three factors: search speed, data volume, and memory usage.
Trie, double-array dictionary search tree
A trie can be used for both general dictionary lookup and index lookup.
Each node corresponds to a state of a DFA, and a search ends in an end state. Searching character by character is equivalent to making a sequence of state transitions.
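The state-transition view can be sketched with a dictionary-based trie. Note the assumptions: TrieNode, trie_insert, and trie_search are names made up for this example, and each node keeps its transitions in a Python dict rather than in the double-array layout named in the heading.

```python
class TrieNode:
    def __init__(self):
        self.children = {}        # transition table: character -> next state
        self.is_end = False       # True if this state is an end (accepting) state

def trie_insert(root, word):
    """Add word to the trie, creating states along the way as needed."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_end = True

def trie_search(root, word):
    """Make one transition per character; succeed only if we stop in an end state."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False          # no transition defined for this character
    return node.is_end

root = TrieNode()
for w in ["tree", "trie", "try"]:
    trie_insert(root, w)
```

With these three words, trie_search(root, "trie") succeeds after four transitions, while trie_search(root, "tri") fails because the state reached after "tri" is not an end state.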
For a given string a1, a2, a3, ..., an, a trie lookup finishes after n steps. By comparison, B-tree lookup seems less efficient: the height of a B-tree with N keywords and minimum degree t is bounded by log_t((N + 1)/2), and as t grows the lookup becomes more efficient. This is why DB2 sets its access unit to the page size of virtual memory: it reduces how often frames are swapped and avoids frequent page switching.
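To see how quickly the height bound shrinks as t grows, the bound can simply be evaluated for a few values. This is an illustrative calculation only; btree_height_bound is a hypothetical helper, and the key count of one million is an arbitrary example value.

```python
import math

def btree_height_bound(n, t):
    """Upper bound log_t((n + 1) / 2) on the height of a B-tree that holds
    n keywords and has minimum degree t (non-root nodes have at least t children)."""
    return math.log((n + 1) / 2, t)

# Height bound for one million keywords at several minimum degrees.
for t in (2, 16, 128, 1024):
    print(t, round(btree_height_bound(1_000_000, t), 2))
```

A larger t corresponds to a larger node, so sizing a node to one page keeps the number of page reads per lookup small.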