The dynamic search trees covered here are the binary search tree (BST), the balanced binary search tree (AVL tree), the red-black tree (RBT), and the B-tree/B+ tree. These four kinds of trees share the following advantages:
(1) They are dynamic structures. Inserting or deleting an element does not require rebuilding the whole index tree. At most, a bounded number of rotations and recolorings change the shape of the tree locally, and these operations cost far less than rebuilding the tree. This advantage was mentioned in "Find Structure special topic (1): An introduction to static search structures."
(2) The time complexity of a search generally stays on the order of O(log n). Some structures, such as the BST, can be less efficient in the worst case. This is explained in detail in the comparison below.
Let's start by describing the four kinds of trees and then compare them with each other.
1. Binary search tree (BST)
The binary search tree clearly arose because static search structures cannot insert or delete nodes dynamically, or can do so only at great cost.
BST operation cost analysis:
(1) Search cost: every search starts at the root and follows one path toward the leaves, so the number of key comparisons depends closely on the shape of the tree.
When the left and right subtrees of every node have roughly the same height, the tree height is about log n. The average search length is proportional to log n, and the average time complexity of a search is O(log n).
When keys are inserted in sorted order, the BST degenerates into a single-branch tree. The tree height is then n, the average search length is (n+1)/2, and the average time complexity of a search is O(n).
(2) Insertion cost: a new node is always inserted as a leaf, so the existing nodes need no restructuring. Inserting a node costs exactly the same as searching for a key that is not in the tree.
(3) Deletion cost: deleting a node P first requires locating P, which costs one search. The shape of the tree then changes slightly. If P has at most one subtree, relinking costs only O(1). If P has both subtrees, find the rightmost node of P's left subtree (start at P's left child and follow right-child pointers until none remains), exchange it with P, and then adjust the affected subtree. The extra work is bounded by the tree height, so deletion costs no more than O(log n) in a reasonably balanced tree.
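The deletion cases above can be sketched in a few lines of Python. This is only an illustrative sketch: the names Node, insert, delete, and inorder are assumptions made for this example, and the recursive style suits small trees only.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key as a new leaf; duplicates are ignored."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def delete(root, key):
    """Delete key from the subtree rooted at root and return the new subtree root."""
    if root is None:
        return None  # key not present, nothing to do
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        # At most one subtree: splice the node out in O(1).
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # Both subtrees: take the rightmost node of the left subtree,
        # move its key into this node, then remove it from the left subtree.
        pred = root.left
        while pred.right is not None:
            pred = pred.right
        root.key = pred.key
        root.left = delete(root.left, pred.key)
    return root


def inorder(root):
    """Return the keys in ascending order."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)
```

Deleting a node with two subtrees keeps the in-order sequence sorted because the rightmost key of the left subtree is the largest key smaller than the deleted one.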
BST efficiency summary: search takes O(log n) when the tree is well balanced and O(n) in the worst case.
The insertion and deletion algorithms are simple, and their time complexity is about the same as that of search.
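To make the gap between the balanced case and the degenerate case concrete, the following sketch builds one BST from sorted keys and one from the same keys in shuffled order, then measures the height and the average search length of each. The helper names (build, shape, compare) and the seed parameter are assumptions made for this example, not part of any library.

```python
import random


class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Iteratively insert key into an unbalanced BST and return the root."""
    if root is None:
        return Node(key)
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = Node(key)
                break
            cur = cur.left
        elif key > cur.key:
            if cur.right is None:
                cur.right = Node(key)
                break
            cur = cur.right
        else:
            break  # duplicate key, ignore
    return root


def build(keys):
    """Insert the keys one by one, in the given order."""
    root = None
    for key in keys:
        root = insert(root, key)
    return root


def shape(root):
    """Return (height, average search length), counting the root as depth 1."""
    height = total = count = 0
    stack = [(root, 1)] if root is not None else []
    while stack:
        node, depth = stack.pop()
        height = max(height, depth)
        total += depth
        count += 1
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, depth + 1))
    average = total / count if count else 0.0
    return height, average


def compare(n, seed=0):
    """Return shape() for sorted insertion and for one shuffled insertion of 1..n."""
    keys = list(range(1, n + 1))
    shuffled = keys[:]
    random.Random(seed).shuffle(shuffled)
    return shape(build(keys)), shape(build(shuffled))
```

Calling compare(200) returns a height of 200 and an average search length of (200+1)/2 = 100.5 for the sorted insertion, while the shuffled insertion gives a much shallower tree.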
2. Balanced binary search tree (AVL tree)
In the worst case a binary search tree is no more efficient than sequential search, which is hard to accept. It has also been shown that when the data set is large enough, the shape of the tree has a significant effect on the efficiency of searching for certain keys. The main cause is that a BST may be poorly balanced (the height difference between left and right subtrees is too large).
That being the case, we need some algorithm that turns an unbalanced tree into a balanced one. As a result, the AVL tree was born.
AVL operation cost analysis:
(1) Search cost: an AVL tree is a strictly balanced BST (the absolute value of every balance factor is at most 1). A search works as in a BST, except that the degenerate single-branch case of the BST cannot occur. Search is therefore O(log n) in both the best and the worst case.
(2) Insertion cost: an AVL tree must stay strictly balanced (|bf| <= 1), so whenever an insertion pushes the balance factor of some node beyond 1 in absolute value, the tree must be rotated. In fact, each AVL insertion needs at most one rotation (single or double). The overall cost of an insertion therefore remains O(log n), because the insertion position has to be found first.
(3) Deletion cost: AVL deletion follows the BST deletion algorithm, but afterwards the balance factor of every node on the path from the deleted node up to the root must be checked, so deletion costs slightly more. Each deletion needs at most O(log n) rotations. The time complexity of deletion is therefore O(log n) + O(log n) = O(2 log n), which is still on the order of O(log n).
AVL efficiency summary: search time complexity stays at O(log n), with no degenerate worst case.
An AVL insertion needs at most one rotation (single or double), and its time complexity is about O(log n).
An AVL deletion costs slightly more: each deletion takes about 2 log n steps, which is still O(log n).
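A minimal sketch of AVL insertion with rebalancing is shown here as an illustration, assuming each node caches its own height. The function names (rotate_left, rotate_right, rebalance, is_avl) are choices made for this example, and deletion is left out.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1  # height of the subtree rooted here


def h(node):
    return node.height if node else 0


def update(node):
    node.height = 1 + max(h(node.left), h(node.right))


def balance_factor(node):
    return h(node.left) - h(node.right)


def rotate_right(y):
    x = y.left
    y.left = x.right
    x.right = y
    update(y)
    update(x)
    return x


def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    update(x)
    update(y)
    return y


def rebalance(node):
    """Restore |bf| <= 1 at node with a single or double rotation."""
    update(node)
    bf = balance_factor(node)
    if bf > 1:
        if balance_factor(node.left) < 0:  # left-right case: double rotation
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if bf < -1:
        if balance_factor(node.right) > 0:  # right-left case: double rotation
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node


def insert(root, key):
    """Insert key (duplicates ignored) and return the new subtree root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    else:
        return root
    return rebalance(root)


def is_avl(node):
    """True if every node's balance factor is -1, 0, or 1."""
    if node is None:
        return True
    return abs(balance_factor(node)) <= 1 and is_avl(node.left) and is_avl(node.right)


def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)
```

Inserting keys in sorted order, the case that degrades a plain BST into a chain, leaves this tree with a height that grows only logarithmically.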
3. Red-black tree (RBT)
The strict balancing strategy of the AVL tree pays a price when building and maintaining the search structure (insertion and deletion) in exchange for a stable O(log n) search time. But is it worth it?
Is there a compromise that keeps search stable and efficient without paying too much to maintain the structure? The answer is the red-black tree.
Key properties:
1. Both children of every red node must be black. In other words, no path from a leaf to the root contains two consecutive red nodes.
2. All paths from any node to each of its leaves contain the same number of black nodes.
Therefore, the longest possible path from the root to a leaf is no more than twice as long as the shortest possible path.
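The two properties and their consequence can be checked on a small hand-built tree. The sketch below is only a validator under stated assumptions: the Node layout, the color constants, and the example tree are made up for illustration, empty children count as black leaves, and the remaining standard red-black rules (such as a black root) are not checked.

```python
RED, BLACK = "red", "black"


class Node:
    def __init__(self, key, color, left=None, right=None):
        self.key = key
        self.color = color
        self.left = left
        self.right = right


def black_height(node):
    """Return the black count of every path below node, or raise ValueError."""
    if node is None:
        return 1  # an empty child counts as a black leaf
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                raise ValueError("red node %r has a red child" % node.key)
    left = black_height(node.left)
    right = black_height(node.right)
    if left != right:
        raise ValueError("unequal black counts below %r" % node.key)
    return left + (1 if node.color == BLACK else 0)


def path_lengths(node):
    """Return (shortest, longest) node count on a path from node to an empty child."""
    if node is None:
        return 0, 0
    ls, ll = path_lengths(node.left)
    rs, rl = path_lengths(node.right)
    return 1 + min(ls, rs), 1 + max(ll, rl)


def example_tree():
    """A small valid tree: black root, black left child, red right child."""
    return Node(10, BLACK,
                Node(5, BLACK),
                Node(20, RED, Node(15, BLACK), Node(30, BLACK)))
```

For the example tree, the shortest path has 2 nodes and the longest has 3, which stays within the factor of two stated above.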
RBT operation cost analysis:
(1) Search cost: because of this property (the longest path is at most twice the shortest), a red-black tree is not as strictly balanced as an AVL tree, but it is far better balanced than a plain BST. The search cost stays at about O(log n), but in the worst case (when the longest path is one less than twice the shortest path) it is slightly worse than AVL.
(2) Insertion cost: inserting a node into an RBT may require rotations and recoloring. Because the RBT only has to stay roughly balanced, an insertion needs at most 2 rotations, comparable to an AVL insertion (whose double rotation consists of two single rotations). Recoloring can take O(log n) steps, but each step is very simple and cheap.
(3) Deletion cost: RBT deletion is considerably cheaper than AVL deletion, since deleting a node needs at most 3 rotations.
RBT efficiency summary: search is O(log n). Its worst case is slightly worse than AVL but far better than BST.
Insertions and deletions are much less likely to disturb the balance of an RBT than of an AVL tree (the RBT is not highly balanced), so rotations are needed less often. When a rotation is needed, an insertion needs at most 2 rotations and a deletion at most 3 (fewer than an AVL deletion may need). Although recoloring takes O(log n) time, in practice its cost is minimal because the operation is so simple.
4. B-tree / B+ tree
The red-black tree is already very efficient as an in-memory search structure (in fact, many practical applications build further optimizations on the RBT). But what if the amount of data is very large? It is obviously impractical to load all of it into memory and organize it as an RBT. For structures such as the file directory store of an operating system or the file index of a database, the search structure cannot be built in memory and must be built on disk. Is the RBT still a good choice in that setting?
When a search structure is organized on disk, following a pointer from any node to another node may cost one disk read, after which the data is loaded into memory for comparison. As is well known, frequent disk I/O is very inefficient (mechanical motion is far slower than electronic motion). Clearly, every binary-tree search structure is inefficient on disk. The B-tree addresses this problem well.
B-tree operation cost analysis:
(1) Search cost: the B-tree is a balanced multiway (m-way) search tree. A B-tree search involves two kinds of lookup. The first is moving from one node to another, which requires locating a disk address and is very expensive. The second is searching the ordered key sequence of a node once it has been loaded into memory, which can use binary search and is very cheap by comparison. Because a B-tree is very shallow, it searches much more efficiently in this setting than any binary search tree. The B+ tree, a variant of the B-tree, is more efficient still.
(2) Insertion cost: inserting into a B-tree can cause nodes to split. When an insertion causes s nodes to split, the number of disk accesses is h (reading the nodes on the search path) + 2s (writing back the two new nodes from each split) + 1 (writing back the new root, or the node that absorbed the insertion without splitting). The number of disk accesses is therefore h + 2s + 1, and at most 3h + 1. So insertion is expensive.
(3) Deletion cost: deleting from a B-tree can cause nodes to merge. The worst-case number of disk accesses is 3h = (h read accesses to find the node containing the element to delete) + (h-1 read accesses to fetch the nearest sibling on levels 2 through h) + (h-2 write accesses for the merges on levels 3 through h) + (3 write accesses for the modified root and the two nodes on level 2).
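The two disk-access counts above are plain arithmetic, and a short sketch makes the terms easy to check. The function names are assumptions made for this example; h is the tree height and s the number of nodes that split, as in the text.

```python
def insert_disk_accesses(h, s):
    """Disk accesses for an insertion into a B-tree of height h that splits s nodes."""
    if not 0 <= s <= h:
        raise ValueError("s must satisfy 0 <= s <= h")
    reads_on_path = h          # read every node on the search path
    split_writes = 2 * s       # write back both halves of each split node
    final_write = 1            # new root, or the node that absorbed the key
    return reads_on_path + split_writes + final_write


def delete_worst_case_accesses(h):
    """Worst-case disk accesses for a deletion, summing the four terms in the text."""
    find_reads = h             # locate the node holding the element
    sibling_reads = h - 1      # nearest sibling on levels 2 through h
    merge_writes = h - 2       # merges on levels 3 through h
    top_writes = 3             # modified root plus two nodes on level 2
    return find_reads + sibling_reads + merge_writes + top_writes
```

With h = 4, an insertion that splits every node on the path costs 4 + 8 + 1 = 13 = 3h + 1 accesses, and the worst-case deletion costs 4 + 3 + 2 + 3 = 12 = 3h accesses.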
B-tree efficiency summary: because the structure is designed around disk storage, searching, deleting, and inserting in a B-tree cost far less than in any binary tree, since the number of disk reads and writes is reduced.
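A rough sketch shows why a multiway tree needs so few disk reads. It assumes one disk read per node visited and completely full nodes, which is the best case, and the order of 256 used in the usage note is a hypothetical figure. The names min_height and find_in_node are made up for this example; the in-node search uses binary search from Python's standard bisect module.

```python
from bisect import bisect_left


def min_height(n_keys, order):
    """Smallest number of levels of an order-way search tree that can hold n_keys keys.

    order is the maximum number of children per node (2 for a binary tree).
    A completely full tree with the given number of levels holds
    order ** levels - 1 keys.
    """
    levels = 0
    capacity = 0
    while capacity < n_keys:
        levels += 1
        capacity = order ** levels - 1
    return levels


def find_in_node(keys, key):
    """Binary search inside one node that is already in memory.

    Returns (found, index). When found is False, index is the position of
    the child pointer to follow next.
    """
    i = bisect_left(keys, key)
    found = i < len(keys) and keys[i] == key
    return found, i
```

For one million keys, min_height(1_000_000, 2) is 20 while min_height(1_000_000, 256) is 3, so under these assumptions a lookup touches 3 nodes on disk instead of 20.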
Comparison of dynamic search tree structures:
(1) Balanced binary tree and red-black tree [AVL vs. RBT]
Both AVL and RBT are optimizations of the binary search tree, and both perform much better than a plain BST. Each has its own advantages and suits different applications.
Structure comparison: the AVL tree is highly balanced, while the RBT is only roughly balanced. In degree of balance, AVL > RBT.
Search comparison: AVL search is O(log n) in both the best and the worst case.
RBT search is O(log n) at best, and its worst case is slightly worse than AVL.
Insertion and deletion comparison: 1. Insertions and deletions easily unbalance an AVL tree, while the RBT has a looser balance requirement. Under heavy insertion load, an RBT therefore needs to restore balance through rotation and recoloring less often than an AVL tree.
2. When rebalancing is needed, the RBT has a recoloring step that AVL lacks, and recoloring takes O(log n) time. Because the operation is simple, it is still very fast in practice.
3. When an insertion unbalances the tree, both AVL and RBT need at most 2 rotations (counting an AVL double rotation as two). When a deletion unbalances the tree, however, AVL needs up to log n rotations while RBT needs at most 3. So insertion costs are similar, but deletion is cheaper in an RBT.
4. For both AVL and RBT, most of the cost of insertion and deletion goes into finding the node to operate on, so the time complexity is essentially O(log n).
Overall evaluation: extensive practical experience with large data sets shows that the overall statistical performance of RBT is better than that of the balanced binary tree.
(2) B-tree and B+ tree [B-tree vs. B+ tree]
The B+ tree is a variant of the B-tree. Among on-disk search structures, the B+ tree is better suited to the disk storage structure of a file system.
Structure comparison: the B-tree is a balanced multiway search tree in which every node holds useful information about the keys being searched for (such as a disk pointer to a file). A node with n keys has n+1 pointers to other nodes.
Strictly speaking, the B+ tree is not a tree, because its leaf nodes are linked to one another by pointers. The non-leaf nodes of a B+ tree hold no information about the keys. Everything needed to look up a key is stored in the leaf nodes, and a non-leaf node exists only as an index over the keys in the leaves.
Search comparison: 1. For the same amount of data to be searched, a B+ tree lookup needs fewer disk I/O operations than an ordinary B-tree lookup. Because these trees are designed for disk storage, the B+ tree searches better than the B-tree.
2. B+ tree search is more stable. All leaf nodes are on the same level, and every lookup must travel the full path from the root to a leaf, so within one B+ tree every key takes the same number of node visits to find. A B-tree offers no such guarantee, since a search may end at a non-leaf node.
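The B+ tree layout described above can be illustrated with a tiny hand-built, two-level tree. This is a read-only sketch under stated assumptions: the class names Leaf and Internal, the separator convention (each separator is the smallest key of the child to its right), and the "rec" values standing in for disk pointers are all made up for this example, and insertion and deletion are omitted.

```python
from bisect import bisect_left, bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.values = ["rec%d" % k for k in keys]  # stand-ins for disk pointers
        self.next = None  # link to the next leaf


class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # separators only, no record data
        self.children = children  # len(children) == len(keys) + 1


def search(node, key):
    """Return (value or None, number of nodes visited)."""
    visited = 1
    while isinstance(node, Internal):
        # A key equal to a separator lives in the child to its right.
        node = node.children[bisect_right(node.keys, key)]
        visited += 1
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i], visited
    return None, visited


def scan(leaf):
    """Walk the leaf links and return all keys in order."""
    keys = []
    while leaf is not None:
        keys.extend(leaf.keys)
        leaf = leaf.next
    return keys


def build_example():
    """Return (root, first_leaf) of a two-level B+ tree over six keys."""
    leaves = [Leaf([1, 4]), Leaf([7, 10]), Leaf([13, 16])]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return Internal([7, 13], leaves), leaves[0]
```

Every call to search on this tree visits exactly two nodes whether or not the key exists, which matches the stability argument, and scan follows the leaf links without going back through the index.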
Insertion and deletion comparison: the B+ tree and the B-tree are about equally efficient at insertion and deletion.
Overall evaluation: in practical applications, especially file storage structures, B+ trees are more widely used and more efficient than B-trees.
String search structures
The structures in this topic (BST, AVL, RBT, B-tree, and so on) can search keys of any type. For string search (string matching), however, there are specialized structures and algorithms.
Trees: BST, AVL, red-black tree, B-tree, B+ tree