In the previous article, "Common Data Structures and Complexity", we introduced several linear data structures commonly used in programming, including Array, ArrayList, SortedList<TKey, TValue>, List<T>, Stack<T>, Queue<T>, Hashtable, and Dictionary<TKey, TValue>. We also briefly described how these data structures are implemented internally, the computational complexity of their common operations, and how to choose an appropriate one. In this article, we introduce common tree structures and the computational complexity of their common operations.
Tree structures organize data hierarchically, like a family tree or a company's organizational structure. The company organization chart shown in the figure is one example.
A tree is a collection of nodes, and each node can have any number of child nodes. A child node is a node directly below a given node, and a parent node is the node directly above it. The root of a tree is the single node that has no parent.
All trees present some common features:
- There is only one root node;
- Every node except the root has exactly one parent node;
- There are no cycles: starting from any node, no path leads back to that node. (The first two properties are what guarantee that no cycles exist.)
A binary tree is a special type of tree in which each node has at most two children. These are called the left child and the right child of that node.
In the figure, binary tree (a) contains eight nodes, and node 1 is its root. Node 1's left child is node 2 and its right child is node 3. A node does not need both a left and a right child. For example, in binary tree (a), node 4 has only a right child, node 6. A node can also have no children at all. For example, in binary tree (b), nodes 4, 5, 6, and 7 have no children.
A node without children is called a leaf node, and a node with at least one child is called an internal node. For example, in binary tree (a), nodes 6 and 8 are leaves, and nodes 1, 2, 3, 4, 5, and 7 are internal nodes.
.NET does not provide a binary tree implementation directly, so we need to implement a BinaryTree class ourselves. In the article "Have You Ever Implemented a Binary Tree?", we implemented a simple generic binary tree model.
Array elements are stored contiguously in memory, but the nodes of a binary tree are not. In general, a BinaryTree instance holds only a reference to the root node instance. The root node references its left and right child instances, and so on down the tree. The key difference is therefore that the Node instances making up a binary tree can be scattered anywhere on the CLR managed heap, rather than stored contiguously like array elements.
To access a particular node in a binary tree, you usually have to visit the tree's nodes one by one until you locate it. Unlike an array, there is no way to jump directly to a given node. The asymptotic search time of a binary tree is therefore linear, O(n): in the worst case, every node in the tree must be examined. As the number of nodes grows, the number of steps needed to find an arbitrary node grows proportionally.
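To make the contrast concrete, here is a minimal Python sketch (not the C# BinaryTree class from the earlier article). The names `Node` and `find_linear` are hypothetical. Each node holds only references to its children. Because an ordinary binary tree imposes no ordering, a search cannot skip any subtree and may have to visit every node.

```python
from typing import Optional


class Node:
    """A binary tree node: a value plus references to its two children."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def find_linear(root: Optional[Node], target) -> Optional[Node]:
    """Search an ordinary (unordered) binary tree; O(n) in the worst case."""
    stack = [root] if root is not None else []
    while stack:
        node = stack.pop()
        if node.value == target:
            return node
        # No ordering rule says which side to skip, so both children are queued.
        for child in (node.right, node.left):
            if child is not None:
                stack.append(child)
    return None


# A small hypothetical tree; the nodes live wherever the allocator put them.
root = Node(1, Node(2, Node(4, None, Node(6))), Node(3, Node(5)))
print(find_linear(root, 6).value)  # 6
print(find_linear(root, 9))        # None
```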
So if both searching for and accessing a node in a binary tree take linear time, what advantage does it have over an array? Searching an array is also linear, O(n), but accessing an array element by index takes constant time, O(1). That is true, and an ordinary binary tree generally performs no better than an array. However, if we arrange the elements of a binary tree according to certain rules, search and access times can improve dramatically.
Binary Search Tree (BST)
A binary search tree (BST) is a special kind of binary tree that makes searching for nodes more efficient. A binary search tree has the following properties:
For any node N,
- Every descendant in the left subtree of N has a value smaller than N's value;
- Every descendant in the right subtree of N has a value greater than N's value.
The subtree of node N can be viewed as a tree rooted at N. All other nodes in the subtree are descendants of N, and the root of the subtree is N itself.
The figure shows two binary trees. Binary tree (b) satisfies the properties above and is therefore a binary search tree (BST). Binary tree (a) is not a BST. Node 10's right child, node 8, is smaller than 10 but appears in 10's right subtree. Similarly, node 8's right child, node 4, is smaller than 8 but appears in 8's right subtree. A single violation anywhere in the tree, at any depth, is enough to disqualify it. For example, the left subtree of node 9 may contain only nodes with values smaller than 9, namely 8 and 4.
These properties imply that the data stored in each BST node must be comparable with the data in other nodes. Given any two nodes, the BST must be able to determine whether the first value is less than, greater than, or equal to the second.
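The BST property is easy to check incorrectly: comparing each node only with its direct children misses violations deeper in the tree. Below is a minimal Python sketch using a hypothetical `is_bst` function. It passes down lower and upper bounds so that every descendant is checked against every ancestor. The trees are made-up examples, not the ones in the figure.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def is_bst(node, low=None, high=None) -> bool:
    """Check the BST property for every node in the subtree.

    All values in this subtree must lie strictly between low and high;
    the bounds carry each ancestor's constraint down the tree.
    """
    if node is None:
        return True
    if low is not None and node.value <= low:
        return False
    if high is not None and node.value >= high:
        return False
    return is_bst(node.left, low, node.value) and is_bst(node.right, node.value, high)


valid = Node(7, Node(3), Node(11, Node(10), Node(15)))
right_child_too_small = Node(10, None, Node(8))   # like node 10 in tree (a)
deep_violation = Node(7, Node(3, None, Node(9)))  # 9 > 3 is fine locally, but 9 > 7
print(is_bst(valid), is_bst(right_child_too_small), is_bst(deep_violation))
# True False False
```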
Suppose we want to find a node in a BST, for example the node with value 10 in binary search tree (b).
We start at the root, whose value is 7. Since 10 is greater than 7, node 10 (if it exists) must be in the root's right subtree, so we move to node 11 and continue. Since 10 is less than 11, node 10 must be in node 11's left subtree. We move to the left child of node 11 and find the target node 10.
What if the node we are looking for is not in the tree? Suppose we want to find node 9. Repeating the same process, we arrive at node 10. If node 9 existed, it would have to be in node 10's left subtree. Node 10 has no left child, so node 9 is not in the tree.
To sum up, the search algorithm works as follows.
Suppose we want to find node N, starting from the root of the BST. The algorithm keeps comparing values until it either finds the node or determines that it does not exist. At each step it deals with two nodes: a node in the tree, called node C, and the node N being searched for. Initially, C is the root of the BST. Perform the following steps:
- If C is null, N is not in the BST; stop.
- Otherwise, compare the values of C and N:
- If the values are equal, node N has been found;
- If N is less than C, then N, if it exists, must be in C's left subtree. Set C to C's left child and go back to the first step;
- If N is greater than C, then N, if it exists, must be in C's right subtree. Set C to C's right child and go back to the first step.
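The search steps above translate almost line for line into a loop. This is a minimal Python sketch with a hypothetical `bst_search` function. The example tree is an assumption, built only so that the search paths match the ones described in the text (7 to 11 to 10).

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def bst_search(root, target):
    """Return the node holding target, or None if it is not in the BST."""
    c = root                      # C starts at the root
    while c is not None:          # an empty C means target is not in the tree
        if target == c.value:     # equal: found
            return c
        # smaller: only the left subtree can hold it; larger: only the right
        c = c.left if target < c.value else c.right
    return None


root = Node(7, Node(3), Node(11, Node(10), Node(15)))
print(bst_search(root, 10).value)  # 10
print(bst_search(root, 9))         # None: node 10 has no left child
```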
Ideally, each step of a BST search halves the number of nodes that remain to be checked. For example, suppose a BST contains 15 nodes and the search starts from the root. The first comparison decides whether we move to the left or the right subtree. Either way, the number of nodes still to be considered is cut roughly in half, from 15 to 7. The next step cuts it again, from 7 to 3, and so on.
Because of this, the time complexity of the search algorithm is $O(\log_2 n)$, often written $O(\lg n)$. We described time complexity in more detail in the article "Algorithm Complexity Analysis". Recall that $\log_2 n = y$ is equivalent to $2^y = n$, so each time n doubles, the search time grows by only one step. The figure compares the growth of $O(\log_2 n)$ with linear growth $O(n)$. The running time of an $O(\log_2 n)$ algorithm is the lower of the two curves.
As the figure shows, the $O(\log_2 n)$ curve is almost flat: it grows very slowly as n increases. For example, finding an element in an array of 1,000 elements may require checking all 1,000 of them. Finding a node in a balanced BST with 1,000 nodes requires checking at most about 10 nodes ($\log_2 1024 = 10$).
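The numbers above can be checked with a little arithmetic. The hypothetical helper `max_visits_balanced` returns the worst-case number of nodes visited when searching a perfectly balanced BST of n nodes. Such a tree has $\lfloor \log_2 n \rfloor + 1$ levels, and a search visits at most one node per level. For n ≥ 1, Python's `int.bit_length()` computes exactly that value, with no floating-point rounding.

```python
def max_visits_balanced(n: int) -> int:
    """Worst-case nodes visited when searching a perfectly balanced BST of n >= 1 nodes.

    The tree has floor(log2 n) + 1 levels; n.bit_length() equals that for n >= 1.
    """
    return n.bit_length()


for n in (15, 1000, 1_000_000):
    # A linear scan of n elements may need n checks in the worst case.
    print(n, "array:", n, "balanced BST:", max_visits_balanced(n))
# 15 array: 15 balanced BST: 4
# 1000 array: 1000 balanced BST: 10
# 1000000 array: 1000000 balanced BST: 20
```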
The BST search algorithm depends heavily on the tree's topology, that is, on how the nodes are arranged relative to one another. The figure depicts a BST built by inserting nodes in the order 20, 50, 90, 150, 175, 200. Because the nodes were inserted in ascending order, the tree has no breadth. Its nodes lie in a single line rather than fanning out, so searching it takes $O(n)$ time.
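A quick way to see the effect of insertion order is to build two plain, non-balancing BSTs from the same six keys. One uses the ascending order from the text; the other uses a reordering chosen for illustration. The sketch compares their heights. `insert`, `height`, and `build` are hypothetical helpers. Height uses the convention that a leaf is 0 and an empty tree is -1.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def insert(root, value):
    """Plain (non-balancing) BST insert; returns the possibly new root."""
    if root is None:
        return Node(value)
    if value < root.value:
        root.left = insert(root.left, value)
    elif value > root.value:
        root.right = insert(root.right, value)
    return root  # duplicates are ignored


def height(node):
    """Leaf = 0, empty subtree = -1."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


def build(values):
    root = None
    for v in values:
        root = insert(root, v)
    return root


print(height(build([20, 50, 90, 150, 175, 200])))  # 5: a chain, searches are O(n)
print(height(build([90, 50, 175, 20, 150, 200])))  # 2: same keys, bushy shape
```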
When the nodes of a BST fan out, insertion, deletion, and search can in the best case run in sublinear time, $O(\log_2 n)$, because each comparison halves the number of nodes left to search. However, if the topology is a chain like the one just described, the running time degrades to linear time, $O(n)$. Each comparison then eliminates only one node, and the remaining nodes must still be checked one by one. In this case, searching a BST is essentially no better than searching an array.
Therefore, the search time of a BST depends on the tree's topology: $O(\log_2 n)$ in the best case and $O(n)$ in the worst case.
We not only need to know how to find a node in the binary search tree, but also how to insert and delete a node in the binary search tree.
When a new node is inserted into a BST, it always becomes a leaf, so the hard part is finding its parent. As in the search algorithm, we call the new node N and the node currently being visited C. Initially, C is the root of the BST and no parent has been recorded. The steps for locating N's parent are as follows:
- If C is empty, the parent recorded so far becomes N's parent. If N's value is smaller than the parent's value, N becomes the parent's left child; otherwise, it becomes the parent's right child. (If the tree was empty to begin with, N simply becomes the root.)
- Compare the values of C and N.
- If C's value equals N's value, you are attempting to insert a duplicate node. You can either discard N or throw an exception.
- If N's value is smaller than C's value, N belongs in C's left subtree. Record C as the parent, set C to C's left child, and go back to the first step.
- If N's value is greater than C's value, N belongs in C's right subtree. Record C as the parent, set C to C's right child, and go back to the first step.
The algorithm ends when it reaches an empty position. The new node is attached there as a child of the appropriate parent, and the tree remains a BST.
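The insertion steps above can be sketched as an iterative Python function, with the hypothetical name `bst_insert`. It tracks the last non-empty node visited as the parent. This sketch discards duplicates; raising an exception would be the other option the text mentions.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def bst_insert(root, value):
    """Insert value as a new leaf and return the (possibly new) root."""
    new = Node(value)
    if root is None:
        return new                       # empty tree: the new node is the root
    parent, c = None, root
    while c is not None:
        if value == c.value:
            return root                  # duplicate: discard N
        parent = c                       # remember the last non-empty node
        c = c.left if value < c.value else c.right
    # C is empty, so N hangs off the last node visited.
    if value < parent.value:
        parent.left = new
    else:
        parent.right = new
    return root


root = None
for v in (7, 3, 11, 10, 15, 10):         # the second 10 is discarded
    root = bst_insert(root, v)
print(root.right.left.value)             # 10
```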
The BST insertion algorithm has the same complexity as the search algorithm, $O(\log_2 n)$ in the best case and $O(n)$ in the worst case, because both locate nodes in the same way.
Delete a node
Deleting a node from a BST is harder than inserting one. When a non-leaf node is deleted, another node must be chosen to fill the gap it leaves in the tree. If the gap is not filled appropriately, the tree violates the BST properties.
The first step of the deletion algorithm is to locate the node to be deleted. The search algorithm described earlier does this, so the step takes $O(\log_2 n)$ time in the best case. Next, we must choose an appropriate node to take the deleted node's place. There are three cases to consider.
- Case 1: If the deleted node has no right child, its left child takes its place. The BST properties guarantee that the deleted node's left subtree is itself a valid BST. They also guarantee that all of its values are less than the deleted node's parent if the deleted node is a left child, or greater than the parent if it is a right child. Replacing the deleted node with its left subtree therefore preserves the BST properties.
- Case 2: If the deleted node's right child has no left child, the right child takes the deleted node's place and adopts the deleted node's left subtree. The right child is greater than every node in the deleted node's left subtree. It also compares with the deleted node's parent in the same way the deleted node did: less than the parent if the deleted node is a left child, greater if it is a right child. Replacing the deleted node with its right child therefore preserves the BST properties.
- Case 3: If the deleted node's right child does have a left child, the deleted node is replaced by the leftmost node in the left subtree of its right child. In other words, it is replaced by the smallest node in its right subtree.
In a BST, the smallest node is always the leftmost one and the largest is always the rightmost. The replacement comes from the deleted node's right subtree, so it is greater than every node in the deleted node's left subtree. It is also the smallest node in that right subtree, so every remaining node in the right subtree is greater than it once it takes the deleted node's place. This choice therefore preserves the BST properties.
As with search and insertion, the running time of deletion depends on the BST's topology: $O(\log_2 n)$ in the best case and $O(n)$ in the worst case.
An array, stored contiguously, is traversed in a single direction: iteration starts at the first element and moves toward the last. A BST has three common traversal methods:
- Preorder traversal
- Inorder traversal
- Postorder traversal
The three methods work in similar ways: each starts at the root and then visits its children. They differ only in the order in which a node and its children are visited.
Preorder traversal visits the current node (node C) first, then its left subtree, and then its right subtree. Initially, C is the root of the BST. The algorithm is as follows:
- Visit node C;
- Repeat these steps for the left child of C;
- Repeat these steps for the right child of C.
For the BST in the figure, preorder traversal produces 90, 50, 20, 5, 25, 75, 66, 80, 150, 95, 92, 111, 175, 166, 200.
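The preorder steps can be sketched as a recursive Python generator, with the hypothetical name `preorder`. The tree below is reconstructed from the traversal sequences listed in the text. Preorder order uniquely determines a BST, so this should match the figure's tree.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def preorder(node):
    """Visit C, then traverse C's left subtree, then C's right subtree."""
    if node is None:
        return
    yield node.value
    yield from preorder(node.left)
    yield from preorder(node.right)


tree = Node(90,
            Node(50, Node(20, Node(5), Node(25)), Node(75, Node(66), Node(80))),
            Node(150, Node(95, Node(92), Node(111)), Node(175, Node(166), Node(200))))
print(list(preorder(tree)))
# [90, 50, 20, 5, 25, 75, 66, 80, 150, 95, 92, 111, 175, 166, 200]
```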
Inorder traversal visits the left subtree of the current node (node C) first, then the current node itself, and finally the right subtree. Initially, C is the root of the BST. The algorithm is as follows:
- Repeat these steps for the left child of C;
- Visit node C;
- Repeat these steps for the right child of C.
For the same tree, inorder traversal produces 5, 20, 25, 50, 66, 75, 80, 90, 92, 95, 111, 150, 166, 175, 200. Inorder traversal of a BST always yields its values in ascending order.
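An inorder sketch differs from preorder only in where the current node is yielded. `inorder` is a hypothetical name, and the tree is again the one reconstructed from the text's traversal sequences. The result is exactly the sorted list of keys, which is a handy sanity check for any BST implementation.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def inorder(node):
    """Traverse C's left subtree, visit C, then traverse C's right subtree."""
    if node is None:
        return
    yield from inorder(node.left)
    yield node.value
    yield from inorder(node.right)


tree = Node(90,
            Node(50, Node(20, Node(5), Node(25)), Node(75, Node(66), Node(80))),
            Node(150, Node(95, Node(92), Node(111)), Node(175, Node(166), Node(200))))
print(list(inorder(tree)))
# [5, 20, 25, 50, 66, 75, 80, 90, 92, 95, 111, 150, 166, 175, 200]
```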
Postorder traversal
Postorder traversal visits the left subtree of the current node (node C) first, then the right subtree, and finally the current node itself. Initially, C is the root of the BST. The algorithm is as follows:
- Repeat these steps for the left child of C;
- Repeat these steps for the right child of C;
- Visit node C.
For the same tree, postorder traversal produces 5, 25, 20, 66, 80, 75, 50, 92, 111, 95, 166, 200, 175, 150, 90.
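For completeness, here is the postorder variant as a hypothetical `postorder` generator on the same reconstructed tree. The current node is yielded only after both subtrees, which is why the root, 90, comes last.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def postorder(node):
    """Traverse C's left subtree, then C's right subtree, then visit C."""
    if node is None:
        return
    yield from postorder(node.left)
    yield from postorder(node.right)
    yield node.value


tree = Node(90,
            Node(50, Node(20, Node(5), Node(25)), Node(75, Node(66), Node(80))),
            Node(150, Node(95, Node(92), Node(111)), Node(175, Node(166), Node(200))))
print(list(postorder(tree)))
# [5, 25, 20, 66, 80, 75, 50, 92, 111, 95, 166, 200, 175, 150, 90]
```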
The running time of BST operations is in fact related to the height of the tree. The height of a tree is the length of the longest path from its root to a leaf. Height can be defined recursively:
- If a node has no children, its height is 0;
- If a node has one child, its height is that child's height plus 1;
- If a node has two children, its height is 1 plus the larger of the two children's heights.
To compute the height of a tree, start from the leaves and set each leaf's height to 0. Then compute each parent's height using the rules above, working upward. Once every node has been labeled, the root's height is the height of the tree.
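The three height rules map directly onto a recursive function. This Python sketch uses a hypothetical `height` function that is defined only for non-empty subtrees, exactly as the rules are stated. Recursion computes the leaves first and works upward, matching the procedure described above.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def height(node):
    """Height of a non-empty subtree, following the three rules above."""
    if node.left is None and node.right is None:
        return 0                                        # leaf
    if node.left is None:
        return height(node.right) + 1                   # only a right child
    if node.right is None:
        return height(node.left) + 1                    # only a left child
    return 1 + max(height(node.left), height(node.right))  # two children


chain = Node(20, None, Node(50, None, Node(90)))        # inserted in ascending order
bushy = Node(50, Node(20), Node(90))                    # same keys, balanced shape
print(height(chain), height(bushy))                     # 2 1
```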
The figure shows several BSTs with their heights computed.
If a tree has n nodes, then for a BST to achieve $O(\log_2 n)$ asymptotic running time, its height should be close to $\lfloor \log_2 n \rfloor$, the largest integer not greater than $\log_2 n$.
Of the three BSTs, tree (b) has the best ratio of height to number of nodes. Tree (b) has height 3 and 8 nodes. Since $\log_2 8 = 3$, this matches the height of the tree exactly.
Tree (a) has 10 nodes and height 4. Since $\log_2 10 \approx 3.3219$ and the largest integer not greater than 3.3219 is 3, the ideal height of tree (a) is 3. We could reduce the height of the tree, and improve its ratio of height to node count, by moving its deepest node up into an empty position closer to the root.
Tree (c) is the worst case. It has 5 nodes, so $\log_2 5 \approx 2.3219$ and the ideal height is 2, but its actual height is 4.
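The ideal heights above can be computed exactly with integer arithmetic. The hypothetical helper `ideal_height` returns $\lfloor \log_2 n \rfloor$ via `n.bit_length() - 1`, which avoids floating-point rounding. That value is the smallest height any binary tree with n nodes can have. The node counts and actual heights are the ones the text gives for trees (a), (b), and (c).

```python
def ideal_height(n: int) -> int:
    """Smallest possible height of a binary tree with n >= 1 nodes: floor(log2 n)."""
    return n.bit_length() - 1


# (tree name, node count, actual height) as described in the text
for name, n, actual in (("a", 10, 4), ("b", 8, 3), ("c", 5, 4)):
    print(f"tree ({name}): n={n}, ideal={ideal_height(n)}, actual={actual}")
# tree (a): n=10, ideal=3, actual=4
# tree (b): n=8, ideal=3, actual=3
# tree (c): n=5, ideal=2, actual=4
```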
The real problem, then, is how to keep a BST's topology at the optimal ratio of height to node count. A BST's topology is closely tied to the order in which nodes are inserted. One approach is to make sure the data arrives in random order, which is possible if we have all of it before inserting anything (we can shuffle it first). But what if we cannot control the data source, for example when data comes from user input or real-time sensor readings? Then there is no way to guarantee random order. The other approach is to keep the tree balanced as each new node is inserted, without making any assumption about the order of the incoming data. A data structure that always keeps itself balanced in this way is called a self-balancing binary search tree.
A balanced tree is one that keeps its height and breadth within some predefined ratio. Different data structures define different ratios for maintaining balance, but all of them keep the height close to $\log_2 n$. Consequently, a self-balancing BST also has an asymptotic running time of $O(\log_2 n)$.
There are many kinds of self-balancing search tree structures, such as AVL trees, red-black trees, 2-3 trees, 2-3-4 trees, splay trees, and B-trees. In this article we briefly introduce two of them: AVL trees and red-black trees.
In 1962, the Russian mathematicians G. M. Adel'son-Vel'skii and E. M. Landis invented the first self-balancing binary search tree, called the AVL tree. An AVL tree must maintain the following balance condition for every node N:
- The heights of N's left and right subtrees differ by at most 1.
The height of a node's left or right subtree is computed with the steps described earlier. If a node has only one child, the height of the empty side is taken to be -1.
The figure illustrates the height relationship that must hold between the two subtrees of every AVL tree node.
Below are several BSTs. The number inside each node is its value, and the numbers on either side are the heights of its left and right subtrees. Trees (a) and (b) are valid AVL trees. Trees (c) and (d) are invalid because not every node in them satisfies the AVL balance condition.
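The AVL balance condition can be checked in a single bottom-up pass. This Python sketch uses hypothetical names (`check_avl`, `is_avl`) and returns both a balance flag and a height, counting an empty side as -1 as the text specifies. The example trees are small made-up cases, not the ones in the figure.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def check_avl(node):
    """Return (balanced, height) for a subtree; an empty subtree has height -1."""
    if node is None:
        return True, -1
    left_ok, left_h = check_avl(node.left)
    right_ok, right_h = check_avl(node.right)
    balanced = left_ok and right_ok and abs(left_h - right_h) <= 1
    return balanced, 1 + max(left_h, right_h)


def is_avl(root):
    return check_avl(root)[0]


print(is_avl(Node(5, Node(3))))            # True: heights 0 and -1 at node 5
print(is_avl(Node(5, Node(3, Node(2)))))   # False: heights 1 and -1 at node 5
```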
When implementing an AVL tree, the difficulty lies in maintaining the AVL balance condition, not in the tree operations themselves. Whether nodes are added or removed, the key is to keep the tree balanced. AVL trees do this with rotations. A rotation reshapes the tree's topology to restore balance. Just as importantly, the reshaped tree still satisfies the binary search tree properties.
Inserting a new node into an AVL tree is a two-phase process. In the first phase, the node is inserted using the same algorithm as for a BST, so it is added as a leaf in a position that satisfies the BST properties. Adding the node may cause the tree to violate the AVL balance condition. In the second phase, the nodes along the insertion path are therefore revisited, and the heights of each node's left and right subtrees are checked. If the heights differ by more than 1 at any node, a rotation is needed.
The figure illustrates a rotation at node 3. After phase 1 inserts the new node 2, the AVL tree is unbalanced at node 5. The heights of node 5's left and right subtrees differ by 2, more than the 1 an AVL tree allows. To fix this, a rotation is performed at node 3, the root of node 5's left subtree, lifting node 3 into node 5's position. This rotation restores the AVL balance condition and preserves the BST properties.
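A single rotation only relinks three pointers. Below is a minimal Python sketch of right and left rotations (hypothetical names `rotate_right` and `rotate_left`), each returning the new subtree root. The example assumes a subtree in which node 5 has left child 3 and node 2 has just been inserted under 3. The exact tree in the figure may differ.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def rotate_right(node):
    """Lift node.left into node's position and return the new subtree root.

    The pivot's right subtree holds values between pivot and node, so it
    becomes node's new left subtree; BST order is unchanged.
    """
    pivot = node.left
    node.left = pivot.right
    pivot.right = node
    return pivot


def rotate_left(node):
    """Mirror image of rotate_right: lift node.right into node's position."""
    pivot = node.right
    node.right = pivot.left
    pivot.left = node
    return pivot


root = rotate_right(Node(5, Node(3, Node(2))))
print(root.value, root.left.value, root.right.value)   # 3 2 5
```

Either rotation preserves the inorder sequence of keys, which is why the reshaped tree is still a valid BST.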
Beyond the simple rotation shown in the figure, some situations require several rotations in sequence. A detailed discussion of these combinations of rotations is beyond the scope of this article. The important point is that insertions and deletions can upset an AVL tree's balance, and rotations are the tool that restores it.
By ensuring that the heights of every node's left and right subtrees differ by at most 1, an AVL tree guarantees that insertion, deletion, and search always run in $O(\log_2 n)$ time. This holds regardless of the order in which nodes are inserted or deleted.
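To see the guarantee in action, here is a compact Python sketch of AVL insertion. It is a standard textbook formulation written for this example, not code from the original article. Each node caches its height (empty = -1). After the ordinary BST insert, heights are updated on the way back up the insertion path. A single or double rotation is applied wherever the two sides differ by more than 1; the double rotations are the "combinations" the text mentions. The names `avl_insert`, `h`, and `update` are hypothetical. Inserting the keys in ascending order, the worst case for a plain BST, still yields a logarithmic height.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None
        self.height = 0                      # a new node is a leaf


def h(node):
    return node.height if node is not None else -1   # empty side counts as -1


def update(node):
    node.height = 1 + max(h(node.left), h(node.right))


def rotate_right(node):
    pivot = node.left
    node.left = pivot.right
    pivot.right = node
    update(node)                             # node is now below pivot: update it first
    update(pivot)
    return pivot


def rotate_left(node):
    pivot = node.right
    node.right = pivot.left
    pivot.left = node
    update(node)
    update(pivot)
    return pivot


def avl_insert(node, value):
    """Phase 1: ordinary BST insert. Phase 2: on the way back up the path,
    recompute heights and rotate where the two sides differ by more than 1."""
    if node is None:
        return Node(value)
    if value < node.value:
        node.left = avl_insert(node.left, value)
    elif value > node.value:
        node.right = avl_insert(node.right, value)
    else:
        return node                          # duplicate: ignore
    update(node)
    balance = h(node.left) - h(node.right)
    if balance > 1:                          # left side too tall
        if value > node.left.value:          # left-right shape: double rotation
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance < -1:                         # right side too tall
        if value < node.right.value:         # right-left shape: double rotation
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node


root = None
for v in range(1, 1001):                     # ascending input
    root = avl_insert(root, v)
print(root.height)                           # logarithmic, far below the 999 of a plain BST
```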
In 1972, Rudolf Bayer, a computer scientist at the Technical University of Munich, invented the data structure now known as the red-black tree. In addition to its data and its left and right children, each node of a red-black tree stores one extra piece of information: a color, which is either red or black. A red-black tree also uses a special kind of node, called a NIL node, which serves as a pseudo-leaf. Every node that holds data is an internal node, and every child link that would otherwise be empty points to a NIL node (an external node). This concept can be hard to grasp at first; I hope the figure below helps.
A red-black tree (R-B tree) must satisfy the following properties:
- Every node is either red or black;
- The root is black (root property);
- Every NIL node is black;
- If a node is red, both of its children are black (red property);
- For any node, every path from that node down to a descendant NIL node contains the same number of black nodes (black property).
The first four properties are easy to understand; only the last one needs explanation. Starting from any node in the tree, every path from that node to any of its descendant NIL nodes must contain the same number of black nodes. Taking the root in the figure as an example, every path from node 41 to a NIL node contains three black nodes. On the path from node 41 to the NIL node in the lower-left corner, for instance, the black nodes are 41, 2, and the NIL node itself, three in total.
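The properties can be verified mechanically. In this Python sketch, the names (`RBNode`, `black_height`, `is_red_black`) are hypothetical, and `None` stands for a black NIL node. Black nodes are counted the same way as in the text: the starting node and the NIL node are both included. The example tree is made up so that the counts mirror the text's 41, 2, NIL path; it is not the tree in the figure.

```python
RED, BLACK = "red", "black"


class RBNode:
    def __init__(self, value, color, left=None, right=None):
        self.value = value
        self.color = color
        self.left = left      # None stands for a black NIL leaf
        self.right = right


def black_height(node):
    """Count black nodes (NIL included) on a path from node down to NIL.

    Raises ValueError if a red node has a red child or two paths disagree.
    """
    if node is None:
        return 1                                  # NIL nodes are black
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                raise ValueError(f"red node {node.value} has a red child")
    left, right = black_height(node.left), black_height(node.right)
    if left != right:
        raise ValueError(f"paths below node {node.value} differ in black count")
    return left + (1 if node.color == BLACK else 0)


def is_red_black(root):
    """Check the root property plus the red and black properties."""
    if root is not None and root.color != BLACK:
        return False
    try:
        black_height(root)
    except ValueError:
        return False
    return True


root = RBNode(41, BLACK,
              RBNode(20, RED, RBNode(2, BLACK), RBNode(30, BLACK)),
              RBNode(60, BLACK))
print(black_height(root), is_red_black(root))   # 3 True
root.right.color = RED                          # path 41 -> 60 -> NIL loses a black node
print(is_red_black(root))                       # False
```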
Like the AVL tree, the red-black tree is a self-balancing binary search tree. An AVL tree maintains balance by limiting the height difference between each node's left and right subtrees. A red-black tree instead maintains balance indirectly, through its coloring rules. If a tree with n nodes satisfies the red-black properties, its height is always at most $2\log_2(n+1)$. For this reason, a red-black tree guarantees that all operations on it run in $O(\log_2 n)$ time.
As with an AVL tree, inserting or deleting a node in a red-black tree requires restoring the red-black properties afterward. An AVL tree uses rotations to restore balance; a red-black tree uses recoloring and rotations. The repair depends not only on the color of the node's parent but also on the color of its uncle, which makes restoring a red-black tree more complex.
Inserting a new node into a red-black tree involves several cases. Assume we have a red-black tree T and a new node K to be inserted.
First, a special case: if tree T is empty, K simply becomes the root and is colored black, which satisfies all the R-B tree properties.
If tree T is not empty, follow these steps:
- Insert node K into tree T using the BST insertion algorithm;
- Color node K red;
- If necessary, restructure the tree to restore the R-B tree properties.
A BST always adds a new node as a leaf, so inserting K does not violate the root property. Adding a red leaf does not change the number of black nodes on any path, so the black property is also unaffected. The only property a red leaf can violate is the red property, so that is the only one we need to check. If it is violated, the tree must be restructured to satisfy the red-black properties again.
Call K's parent node P (parent), P's parent node G (grandparent), and P's sibling node S (sibling).
When node K is inserted into a non-empty tree T, what happens next depends directly on the color of its parent P. The following cases can occur.
Case 1: node P is black.
If P is black, then adding the red node K leaves tree T satisfying all the red-black properties.
Case 2: node P is red.
If P is red, then P now has a new child K that is also red, which violates the red property. To resolve these two consecutive red nodes, we must consider G's other child, that is, P's sibling S. There are two sub-cases:
Case 2a: the node S is black or empty.
If S is black or empty, nodes K, P, and G must be rotated. Depending on how K, P, and G are arranged, there are four possible rotations.
The first two possibilities occur when P is the left child of G.
If S is empty, simply omit it from the figure.
The other two possibilities occur when P is the right child of G; they mirror the process shown above.
Once the rotation is complete, the double-red violation has been resolved.
Case 2b: node S is red.
If P's sibling S is red, P, S, and G are recolored: P and S become black, and G becomes red.
Recoloring does not affect the black property of tree T. Every path through G previously contained exactly one black node among G and P (or G and S), and it still does. However, recoloring can create a new double-red violation between G and G's parent. In that case, the problem is resolved recursively, treating G and its parent the same way K and P were treated.
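The insertion cases above can be combined into one repair loop. Below is a Python sketch of red-black insertion under stated assumptions: nodes keep parent pointers, `None` plays the role of a black NIL node, and duplicates are ignored. Only insertion is covered. The class and method names are hypothetical, not the article's code. Case 2b recolors and moves up to G. Case 2a rotates; the "bent" K-P-G shapes need a first rotation at P. A small checker confirms the red and black properties after inserting keys in ascending order.

```python
RED, BLACK = "red", "black"


class Node:
    def __init__(self, value):
        self.value = value
        self.color = RED                              # new nodes are colored red
        self.left = self.right = self.parent = None   # None plays the role of NIL


class RedBlackTree:
    def __init__(self):
        self.root = None

    def _replace(self, old, new):
        """Hang `new` where `old` hung from its parent (or make it the root)."""
        new.parent = old.parent
        if old.parent is None:
            self.root = new
        elif old.parent.left is old:
            old.parent.left = new
        else:
            old.parent.right = new

    def _rotate_left(self, x):
        y = x.right
        x.right = y.left
        if y.left is not None:
            y.left.parent = x
        self._replace(x, y)
        y.left, x.parent = x, y

    def _rotate_right(self, x):
        y = x.left
        x.left = y.right
        if y.right is not None:
            y.right.parent = x
        self._replace(x, y)
        y.right, x.parent = x, y

    def insert(self, value):
        parent, c = None, self.root                   # step 1: BST insertion
        while c is not None:
            if value == c.value:
                return                                # duplicate: ignore
            parent, c = c, (c.left if value < c.value else c.right)
        k = Node(value)                               # step 2: K is red
        k.parent = parent
        if parent is None:
            self.root = k
        elif value < parent.value:
            parent.left = k
        else:
            parent.right = k
        self._fix(k)                                  # step 3: restore the properties

    def _fix(self, k):
        # Case 1 (P is black, or K is the root): the loop body never runs.
        while k.parent is not None and k.parent.color == RED:
            p = k.parent
            g = p.parent                              # exists: a red P is never the root
            s = g.right if p is g.left else g.left
            if s is not None and s.color == RED:      # Case 2b: recolor, move up to G
                p.color = s.color = BLACK
                g.color = RED
                k = g
            else:                                     # Case 2a: S black or empty, rotate
                if p is g.left:
                    if k is p.right:                  # bent shape: rotate at P first
                        self._rotate_left(p)
                        k, p = p, k
                    self._rotate_right(g)
                else:
                    if k is p.left:
                        self._rotate_right(p)
                        k, p = p, k
                    self._rotate_left(g)
                p.color, g.color = BLACK, RED
        self.root.color = BLACK                       # root property


def black_height(node):
    """Black nodes (NIL included) per path to NIL; asserts red/black properties."""
    if node is None:
        return 1
    if node.color == RED:
        assert all(ch is None or ch.color == BLACK for ch in (node.left, node.right))
    left, right = black_height(node.left), black_height(node.right)
    assert left == right, "black property violated"
    return left + (1 if node.color == BLACK else 0)


tree = RedBlackTree()
for v in range(1, 1001):                              # ascending input
    tree.insert(v)
print(tree.root.color, black_height(tree.root))
```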
An in-depth discussion of red-black trees is beyond the scope of this article, so I will not go into further detail here.
This article, "Binary Tree Structures and Complexity", was published by Dennis Gao on his blog. Any person or crawler that reposts it without the author's permission is behaving like a rogue.