Java data structure and algorithm analysis (10)--2-3 Tree
Binary search trees are efficient for search and insertion in most cases, but they perform poorly in the worst case. A balanced search tree guarantees lg N performance even in the worst case. To achieve this, the tree must remain balanced after every insertion. In a tree with N nodes we want the height to stay close to lg N, so that any key can be found with about lg N comparisons. Unfortunately, restoring perfect balance in an ordinary binary tree after each insertion is too expensive.
The 2-3 search tree guarantees that, even in the worst case, both insertion and search run in logarithmic time.
2-3 Search Tree Overview
The 2-3 tree is the simplest B-tree structure: every non-leaf node has two or three children, and all leaves are on the same level. A 2-3 tree is not a binary tree, because a node can have three children. In most other respects, however, it resembles a binary search tree.
Unlike binary trees, 2-3 trees allow each node to hold one or two keys. An ordinary 2-node holds one key and two links to children; a 3-node holds two keys and three links. The nodes of a 2-3 search tree are defined as follows:
A 2-node holds one key (with its associated value) and two links, left and right. The left link points to a 2-3 tree in which all keys are smaller than the node's key; the right link points to a 2-3 tree in which all keys are larger than the node's key.
A 3-node holds two keys (with their associated values) and three links: left, middle, and right. The left link points to a 2-3 tree in which all keys are smaller than the smaller of the node's two keys; the middle link points to a 2-3 tree in which all keys lie between the node's two keys; the right link points to a 2-3 tree in which all keys are larger than the larger of the node's two keys.
Put more compactly, a 2-3 search tree is either an empty tree or consists of the following nodes:
1) 2-node: contains one key and two links. The keys in the 2-3 tree referenced by the left link are smaller than the node's key; the keys in the 2-3 tree referenced by the right link are larger.
2) 3-node: contains two keys and three links. The keys in the 2-3 tree referenced by the left link are smaller than both of the node's keys; the keys referenced by the middle link lie between the node's two keys; the keys referenced by the right link are larger than both.
An in-order traversal of a 2-3 search tree visits the keys in sorted order. In a perfectly balanced 2-3 search tree, the distance from the root to every null link is the same.
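The node definitions and the in-order property can be sketched in Java roughly as follows. This is a minimal illustration, not a complete implementation: the class name `InOrderSketch`, the `Node` layout, and the `sample` helper are assumptions made for this example, and keys are plain `int` values with no associated values.

```java
// Illustrative sketch only; all names here are hypothetical.
class InOrderSketch {
    static class Node {
        int key1, key2;            // key2 is meaningful only in a 3-node
        boolean three;             // true for a 3-node, false for a 2-node
        Node left, middle, right;  // middle is used only in a 3-node

        Node(int key) { key1 = key; }                        // 2-node
        Node(int small, int large) {                         // 3-node
            key1 = small; key2 = large; three = true;
        }
    }

    // In-order traversal: left subtree, key1, then (for a 3-node)
    // middle subtree and key2, and finally the right subtree.
    static void inOrder(Node n, StringBuilder out) {
        if (n == null) return;
        inOrder(n.left, out);
        out.append(n.key1).append(' ');
        if (n.three) {
            inOrder(n.middle, out);
            out.append(n.key2).append(' ');
        }
        inOrder(n.right, out);
    }

    // A small hand-built tree: a 3-node root with three leaf children.
    static Node sample() {
        Node root = new Node(10, 20);
        root.left = new Node(5);
        root.middle = new Node(12, 15);
        root.right = new Node(30);
        return root;
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        inOrder(sample(), out);
        System.out.println(out.toString().trim()); // prints: 5 10 12 15 20 30
    }
}
```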
Find
Before looking at how a 2-3 tree is kept balanced, let us assume the tree is already balanced and look at the basic search operation first.
Searching a 2-3 tree is similar to searching a binary search tree. To determine whether a key is in the tree, we first compare it with the keys in the root; if it equals one of them, the search succeeds. Otherwise we follow the link for the interval that contains the key and search recursively in that subtree. If we reach a null link, the key is not in the tree. The lookup process is as follows:
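The search described above might look like the following in Java. It is a sketch under simple assumptions: `int` keys, no stored values, and a hypothetical `Node` class invented for this example. The loop is the iterative form of the recursive description.

```java
// Illustrative sketch only; all names here are hypothetical.
class SearchSketch {
    static class Node {
        int key1, key2;            // key2 is meaningful only in a 3-node
        boolean three;             // true for a 3-node
        Node left, middle, right;  // middle is used only in a 3-node

        Node(int key) { key1 = key; }
        Node(int small, int large) { key1 = small; key2 = large; three = true; }
    }

    // Returns true if key is stored somewhere in the subtree rooted at n.
    static boolean contains(Node n, int key) {
        while (n != null) {
            if (key == n.key1 || (n.three && key == n.key2)) {
                return true;                 // search hit
            }
            if (key < n.key1) {
                n = n.left;                  // smaller than every key in the node
            } else if (n.three && key < n.key2) {
                n = n.middle;                // between the two keys of a 3-node
            } else {
                n = n.right;                 // larger than every key in the node
            }
        }
        return false;                        // reached a null link: search miss
    }

    public static void main(String[] args) {
        Node root = new Node(10, 20);
        root.left = new Node(5);
        root.middle = new Node(12, 15);
        root.right = new Node(30);
        System.out.println(contains(root, 15)); // prints: true
        System.out.println(contains(root, 13)); // prints: false
    }
}
```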
Insert
Insert operations on a 2-3 tree generally fall into the following cases:
1. Insert a new key into a 2-node. (The initial state of the tree)
2. Insert a new key into a tree that consists of a single 3-node. (The initial state of the tree)
3. Insert a new key into a 3-node whose parent is a 2-node. (Subtree split, case 1)
4. Insert a new key into a 3-node whose parent is a 3-node. (Subtree split, case 2)
5. Split the root into 2-nodes. (The tree grows upward)
Inserting an element into a 2-3 tree starts the same way as inserting into a binary search tree: first search for the key, then attach the new key at the node where the unsuccessful search ended. 2-3 trees can guarantee worst-case efficiency because they remain balanced after every insertion. If the search ends at a 2-node, the insertion is easy: put the new key into the 2-node, turning it into a 3-node. If the search ends at a 3-node, more work is needed.
Insert into a 3-node
Inserting a new key into a 3-node can arise in several different situations. We start with the simplest one: a tree that consists of a single 3-node.
Action 1: The tree contains only a single 3-node
As shown in the figure above, suppose the 2-3 tree consists of a single 3-node. The node already has two keys, so there is no room for a third. The most natural approach is to pretend that the node can hold three keys, temporarily turning it into a 4-node with three keys and four links. We then promote the middle key of the 4-node to a new root, with the smallest key as its left child and the largest key as its right child. After the insertion the result is a balanced 2-3 search tree, and the height of the tree has grown from 0 to 1.
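As a rough illustration of this step, the sketch below builds the three 2-nodes that result from inserting one key into a lone 3-node. The class and method names are hypothetical, keys are `int` values, and the code assumes the new key is different from both keys already in the node.

```java
// Illustrative sketch only; all names here are hypothetical.
class LoneThreeNodeSplitSketch {
    static class Node {
        int key1, key2;            // key2 is meaningful only in a 3-node
        boolean three;             // true for a 3-node
        Node left, middle, right;  // middle is used only in a 3-node

        Node(int key) { key1 = key; }
        Node(int small, int large) { key1 = small; key2 = large; three = true; }
    }

    // Assumes 'only' is a 3-node without children and 'key' is not already in it.
    // Conceptually the node becomes a temporary 4-node holding lo < mid < hi;
    // the middle key is promoted to a new root and the other two become its children.
    static Node insertIntoLoneThreeNode(Node only, int key) {
        int lo, mid, hi;
        if (key < only.key1) {
            lo = key; mid = only.key1; hi = only.key2;
        } else if (key < only.key2) {
            lo = only.key1; mid = key; hi = only.key2;
        } else {
            lo = only.key1; mid = only.key2; hi = key;
        }
        Node root = new Node(mid);
        root.left = new Node(lo);
        root.right = new Node(hi);
        return root;
    }

    public static void main(String[] args) {
        Node root = insertIntoLoneThreeNode(new Node(10, 20), 15);
        System.out.println(root.left.key1 + " <- " + root.key1 + " -> " + root.right.key1);
        // prints: 10 <- 15 -> 20
    }
}
```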
Action 2: The parent is a 2-node and the child is a 3-node
As in the first case, we insert the new key into the 3-node to make it a temporary 4-node, and then promote the middle key of that node to its parent, which is a 2-node. The parent thereby becomes a 3-node, and the two halves of the split node are attached as its children in the appropriate positions. The operation is shown in the following figure:
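The step in which a 2-node parent takes in the promoted key can be sketched as below. The helper name `absorb` and its parameters are assumptions made for this example: `promoted` is the middle key of the temporary 4-node, `lo` and `hi` are the two 2-nodes left after the split, and `fromLeft` says which child of the parent was split.

```java
// Illustrative sketch only; all names here are hypothetical.
class AbsorbSketch {
    static class Node {
        int key1, key2;            // key2 is meaningful only in a 3-node
        boolean three;             // true for a 3-node
        Node left, middle, right;  // middle is used only in a 3-node

        Node(int key) { key1 = key; }
    }

    // Turns the 2-node 'parent' into a 3-node by taking in the promoted key
    // and re-hanging the two halves of the split child.
    static void absorb(Node parent, int promoted, Node lo, Node hi, boolean fromLeft) {
        if (fromLeft) {
            // The split child was the left child, so promoted < parent.key1.
            parent.key2 = parent.key1;
            parent.key1 = promoted;
            parent.left = lo;
            parent.middle = hi;    // the right link is unchanged
        } else {
            // The split child was the right child, so promoted > parent.key1.
            parent.key2 = promoted;
            parent.middle = lo;
            parent.right = hi;     // the left link is unchanged
        }
        parent.three = true;
    }

    public static void main(String[] args) {
        // Parent 20 with right child 30; its left child held 5 and 10, and 7 was inserted.
        Node parent = new Node(20);
        parent.right = new Node(30);
        absorb(parent, 7, new Node(5), new Node(10), true);
        System.out.println(parent.key1 + " " + parent.key2); // prints: 7 20
    }
}
```

The height of the tree is unchanged here, because the parent had room for the extra key.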
Action 3: The parent is a 3-node and the child is a 3-node
When we insert into a 3-node, we split it and promote the middle key to the parent. If the parent is itself a 3-node, it becomes a temporary 4-node, so we split it as well and promote its middle key to the next parent. This continues until we reach a parent that is a 2-node; that parent becomes a 3-node and no further splitting is needed.
In other words, after a child splits, one key is added to its parent. If that exceeds the parent's capacity, the parent splits in turn, and the process repeats up the tree, possibly all the way to the root.
Actions 2 and 3 above do not change the height of the tree. The height changes only when the root is a 3-node and an insertion at the bottom triggers a chain of splits that reaches the root. This is the root-splitting case described next.
Root node splitting
When every node on the path from the root to the target leaf is a 3-node, inserting a new key at the leaf causes splits to propagate all the way up to the root. In the last step the root becomes a temporary 4-node. We then split the root into three 2-nodes, with the middle key becoming the new root, and the height of the tree increases by 1. The process is as follows:
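The insertion cases discussed so far can be combined into one recursive routine, sketched below. This is one possible arrangement rather than a definitive implementation: every name is hypothetical, keys are `int` values, duplicates are ignored, and each node carries room for a third key and a fourth link so that it can act as the temporary 4-node.

```java
// Illustrative sketch only; all names here are hypothetical.
class TwoThreeInsertSketch {
    static class Node {
        int[] keys = new int[3];    // third slot is used only while a temporary 4-node
        Node[] kids = new Node[4];  // fourth slot is used only while a temporary 4-node
        int n;                      // number of keys stored (1 or 2 between insertions)

        boolean isLeaf() { return kids[0] == null; }
    }

    Node root;

    void insert(int key) {
        if (root == null) {
            root = new Node();
            root.keys[0] = key;
            root.n = 1;
            return;
        }
        Node promoted = insertInto(root, key);
        if (promoted != null) root = promoted;  // the root was split: height grows by 1
    }

    // Returns null if the subtree absorbed the key, or a 2-node holding the
    // promoted middle key and the two halves of the split node.
    static Node insertInto(Node h, int key) {
        int i = 0;
        while (i < h.n && key > h.keys[i]) i++;
        if (i < h.n && key == h.keys[i]) return null;    // duplicate: nothing to do
        if (h.isLeaf()) {
            addKey(h, i, key, null, null);
        } else {
            Node up = insertInto(h.kids[i], key);
            if (up == null) return null;                 // no split below
            addKey(h, i, up.keys[0], up.kids[0], up.kids[1]);
        }
        return h.n == 3 ? split(h) : null;               // temporary 4-node must split
    }

    // Puts key at position i, shifting larger keys and links to the right;
    // link i becomes 'left' and the new link i + 1 becomes 'right'.
    static void addKey(Node h, int i, int key, Node left, Node right) {
        for (int j = h.n; j > i; j--) {
            h.keys[j] = h.keys[j - 1];
            h.kids[j + 1] = h.kids[j];
        }
        h.keys[i] = key;
        h.kids[i] = left;
        h.kids[i + 1] = right;
        h.n++;
    }

    // Splits a temporary 4-node into two 2-nodes under a 2-node with the middle key.
    static Node split(Node h) {
        Node left = new Node();
        left.keys[0] = h.keys[0]; left.kids[0] = h.kids[0]; left.kids[1] = h.kids[1]; left.n = 1;
        Node right = new Node();
        right.keys[0] = h.keys[2]; right.kids[0] = h.kids[2]; right.kids[1] = h.kids[3]; right.n = 1;
        Node up = new Node();
        up.keys[0] = h.keys[1]; up.kids[0] = left; up.kids[1] = right; up.n = 1;
        return up;
    }

    // Number of nodes on every root-to-null path, or -1 if the paths differ in length.
    static int depth(Node h) {
        if (h == null) return 0;
        int d = depth(h.kids[0]);
        for (int i = 1; i <= h.n; i++) {
            if (depth(h.kids[i]) != d) return -1;
        }
        return d < 0 ? -1 : d + 1;
    }

    static void inOrder(Node h, StringBuilder out) {
        if (h == null) return;
        for (int i = 0; i < h.n; i++) {
            inOrder(h.kids[i], out);
            out.append(h.keys[i]).append(' ');
        }
        inOrder(h.kids[h.n], out);
    }

    public static void main(String[] args) {
        TwoThreeInsertSketch tree = new TwoThreeInsertSketch();
        for (int k = 1; k <= 10; k++) tree.insert(k);
        StringBuilder out = new StringBuilder();
        inOrder(tree.root, out);
        System.out.println(out.toString().trim() + " | depth " + depth(tree.root));
    }
}
```

Returning the promoted key as a small 2-node lets the same code path handle both a split below an ordinary parent and a split of the root; only the caller decides whether the result is merged into a parent or becomes the new root.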
Local conversion
Splitting a 4-node in a 2-3 tree involves 6 possible cases. The 4-node may be the root; it may be the left or the right child of a 2-node; or it may be the left, middle, or right child of a 3-node. All of these transformations are local: no part of the tree other than the nodes and links involved needs to be examined or modified. Each transformation therefore changes only a constant number of links.
Properties
These local transformations preserve the balance of the 2-3 tree. When a 4-node that is not the root is split, the height of the tree is the same before and after the transformation. Only when the root is the 4-node does the split increase the height of the tree by one. As shown in the following illustration:
Analysis
In a perfectly balanced 2-3 search tree, the distance from the root to every leaf is the same:
The search cost of a 2-3 tree depends directly on the height of the tree:
1. In the worst case, all nodes are 2-nodes, and the search cost is lg N.
2. In the best case, all nodes are 3-nodes, and the search cost is log3 N, which is approximately 0.631 lg N.
In concrete terms, a 2-3 tree with 1 million nodes has a height between 12 and 20, and a 2-3 tree with 1 billion nodes has a height between 18 and 30.
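These ranges follow from the two bounds log3 N and lg N. A small Java check, using only `Math.log` from the standard library (the class and helper names are made up for this example), reproduces them:

```java
// Illustrative sketch only; class and method names are hypothetical.
class HeightBoundsSketch {
    // Logarithm of n in the given base, via the change-of-base rule.
    static double log(double base, double n) {
        return Math.log(n) / Math.log(base);
    }

    public static void main(String[] args) {
        long[] sizes = {1_000_000L, 1_000_000_000L};
        for (long n : sizes) {
            double best = log(3, n);   // all nodes are 3-nodes
            double worst = log(2, n);  // all nodes are 2-nodes
            System.out.printf("N = %d: log3 N = %.2f, lg N = %.2f%n", n, best, worst);
        }
    }
}
```

Rounding the lower bound down and the upper bound up gives the ranges 12 to 20 and 18 to 30 quoted above.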
An insertion needs only a few operations beyond the search itself, because it modifies only the nodes along the search path and does not need to examine any other nodes, so its cost is similar to that of a lookup. Here is the efficiency of the 2-3 search tree:
Finally, here is the construction process of a 2-3 tree: