In the Linux kernel, user-mode address space management uses the red-black tree data structure. Many people find this data structure confusing. I read a great deal of material to learn the principles of red-black trees, and recently I came across an article on a foreign website that explains them. I thought it was quite good and did not want to keep it to myself, so I translated it into Chinese for the reference of my fellow kernel readers. Given my limited ability, errors are inevitable; corrections are welcome.

Original article: http://sage.mc.yu.edu/kbeen/teaching/algorithms/resources/red-black-tree.html

Two additional links:
Use of the red-black tree in the kernel: http://www.linuxforum.net/forum/showthreaded.php?Cat=&Board=program&Number=556347&page=0&view=collapsed&sb=5&o=31&fpart=&vc=
Introduction to the splay tree: http://www.linuxforum.net/forum/showflat.php?Cat=&Board=linuxK&Number=609842&page=&view=&sb=&o=&vc=1

Definition of the Red-Black Tree

As defined in CLRS (translator's note: CLRS refers to the famous algorithms textbook Introduction to Algorithms; the letters stand for its authors Cormen, Leiserson, Rivest and Stein), a red-black tree is a binary search tree that satisfies the following properties:

1. Every node is either red or black.
2. The root node is black.
3. Every leaf node (actually a null pointer) is black.
4. If a node is red, both of its children are black (that is, there cannot be two adjacent red nodes).
5. For every node, all paths from that node down to its descendant leaf nodes contain the same number of black nodes.

Data items are stored only in internal nodes. The "leaf nodes" we speak of may exist only as null pointers in their parent nodes, but regarding them as actual nodes helps in describing the insertion and deletion algorithms; all of these leaf nodes are black.
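To make the five properties concrete, here is a minimal C sketch of a node type together with a recursive checker for properties 4 and 5. The type and the function name (`struct rbnode`, `check_rb`) are my own for illustration; they are not the article's code, nor the kernel's `struct rb_node`:

```c
#include <stddef.h>

/* A minimal red-black tree node; field names are illustrative only. */
enum rb_color { RB_RED, RB_BLACK };

struct rbnode {
    enum rb_color color;
    int key;
    struct rbnode *left, *right;
};

/*
 * Returns the black-height of the subtree rooted at n, or -1 if
 * property 4 (no two adjacent red nodes) or property 5 (equal black
 * counts on every path) is violated. A NULL child is a black leaf
 * of black-height 0 (property 3).
 */
int check_rb(const struct rbnode *n)
{
    int lbh, rbh;

    if (!n)
        return 0;
    if (n->color == RB_RED &&
        ((n->left && n->left->color == RB_RED) ||
         (n->right && n->right->color == RB_RED)))
        return -1;              /* two adjacent red nodes */
    lbh = check_rb(n->left);
    rbh = check_rb(n->right);
    if (lbh < 0 || rbh < 0 || lbh != rbh)
        return -1;              /* unequal black counts */
    return lbh + (n->color == RB_BLACK ? 1 : 0);
}
```

For a black root with two red children the checker returns a black-height of 1; recoloring one node so that the black counts differ between paths makes it return -1.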
Theorem: a red-black tree with n internal nodes has height h <= 2log(n + 1). (Translator's note: I believe the proof of this theorem in the original article is wrong; the proof below is written with reference to the proof in CLRS.)

Proof: First define the black-height bh of a red-black tree: the number of black nodes on any path from the root (not counting the root itself) down to a leaf (counting the leaf). The black-height of a leaf node itself is 0.

We first prove that a red-black tree with n internal nodes satisfies n >= 2^bh - 1, by induction on the height h of the tree. When h = 0 the tree is a single leaf node: its black-height bh is 0, its number of internal nodes n is 0, and 0 >= 2^0 - 1 holds. Now assume the claim holds for every tree of height <= t, and consider a tree of height t + 1 with black-height bh. Let nl and nr be the numbers of internal nodes in the left and right subtrees of the root. Each subtree has height <= t, and each has black-height at least bh - 1 (a child of the root adds one black node to the count if it is black and none if it is red). By the induction hypothesis, nl >= 2^(bh-1) - 1 and nr >= 2^(bh-1) - 1. Adding these two inequalities and counting the root itself gives n = nl + nr + 1 >= 2^bh - 1, which proves the claim.

Next we complete the remaining part of the proof. Let h be the height of the red-black tree; we show that bh >= h/2. Consider any path from the root to a leaf (excluding the root but including the leaf), let m be the number of nodes on it, and note that bh of those nodes are black.
When m is even, property 4 implies that each pair of adjacent nodes on the path contains at most one red node, so bh >= m/2. When m is odd, removing the (black) leaf node leaves an even number of nodes on the path, so the number of black nodes b' among them satisfies b' >= (m - 1)/2; adding the leaf back gives bh >= (m + 1)/2 > m/2. In both cases bh >= m/2, and for the longest such path m equals the height h of the tree, so bh >= h/2. Substituting this into n >= 2^bh - 1 gives n >= 2^(h/2) - 1, and solving for h finally yields h <= 2log(n + 1). This completes the proof.

The rest of this article explains how to insert into and delete from a red-black tree without destroying its properties, and why the number of operations needed for an insertion or deletion is proportional to the height of the tree, that is, O(log n).

Okasaki Insertion Method

First, insert the new node into the red-black tree exactly as into a binary search tree, and color it red. (The children of the new node are leaf nodes, which by definition are black.) At this point we may have violated either property 2 (the root is black) or property 4 (there cannot be two adjacent red nodes). If the newly inserted node is the root (which means the tree was empty before the insertion), we simply recolor it black and the insertion is complete. If property 4 is violated, it must be because the parent of the new node is also red. Since the root of a red-black tree must be black, the new node certainly has a grandparent, and by property 4 that grandparent must be black. The subtree rooted at the grandparent of the new node can then take one of four possible forms (translator's note: this sentence reflects my own understanding; please correct me if it is wrong), as shown in the following illustration.
In the Okasaki insertion method, each of the four possible subtrees is transformed into the single form shown in the middle of the diagram. (A, B, C and D stand for arbitrary subtrees. We said earlier that the children of the newly inserted node must be leaf nodes, but we will soon see that the diagram applies in more general situations.) First, note that the ordering <A x B y C z D> is preserved by the transformation. Note also that the transformation does not change the number of black nodes on any path from the parent of the subtree (assuming the subtree has a parent, of course) down to any leaf below it. We then find ourselves in a familiar situation: the tree may violate only property 2 (if y is the root) or property 4 (if y's parent is red), but the transformation has brought us two steps closer to the root. We repeat the operation until either y's parent is black, in which case the insertion is complete, or y becomes the root, in which case we color y black and the insertion is complete. (Coloring the root black adds the same number of black nodes, namely one, to every path from the root to a leaf, so if property 5 was intact before the recoloring it remains intact afterwards.) These steps preserve the red-black properties, and the time they take is proportional to the height of the tree, that is, O(log n).

Rotation

The structural adjustments performed on a red-black tree can often be expressed more clearly in terms of the "rotation" operation, shown in the figure below. Clearly the ordering <A x B y C> is preserved by a rotation. Therefore, if the tree was a binary search tree before the operation, and only rotations are used to adjust its structure, it is still a binary search tree afterwards.
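The rotation just described can be sketched in a few lines of C. This is a generic binary-search-tree rotation on an invented node type, not the kernel's implementation; parent pointers are omitted to keep the sketch minimal, so the caller re-attaches the returned subtree root:

```c
#include <stddef.h>

struct bst_node {
    int key;
    struct bst_node *left, *right;
};

/*
 * Left rotation around x: x's right child y becomes the subtree root,
 * x becomes y's left child, and y's old left subtree B becomes x's
 * right subtree. The in-order sequence <A x B y C> is unchanged.
 * Returns the new subtree root y.
 */
struct bst_node *rotate_left(struct bst_node *x)
{
    struct bst_node *y = x->right;

    x->right = y->left;   /* subtree B moves across */
    y->left = x;
    return y;
}

/* Right rotation is the mirror image. */
struct bst_node *rotate_right(struct bst_node *y)
{
    struct bst_node *x = y->left;

    y->left = x->right;
    x->right = y;
    return x;
}
```

Note that rotate_right exactly undoes rotate_left, which is easy to check on a two-node tree.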
In the rest of this article we adjust the tree only by rotations, so we need not explain again how the correct ordering of the elements in the tree is maintained. In the following illustration, the transformation used by the Okasaki insertion method is expressed as one or two rotations.

CLRS Insertion Method

CLRS presents an insertion method that is more complicated than the Okasaki method but more efficient: its time complexity is still O(log n), but the constant hidden in the big O is smaller. Like the Okasaki method, the CLRS method begins with a standard binary search tree insertion in which the new node is colored red; the two methods differ in how they repair a violation of property 4 (there cannot be two adjacent red nodes). We distinguish two cases according to the color of the uncle of the lower red node. (The lower red node is the child in a red-parent/red-child pair.)

Consider first the case in which the uncle is black. Each of the two red nodes may be either the left or the right child of its parent, so this case splits into four sub-cases. The following illustration shows how the tree is restructured and recolored in each of them. It is interesting to compare the diagram above with the Okasaki method: they differ in two respects. The first concerns how the final subtree (the middle subtree of the diagram) is colored. In the Okasaki method the subtree root y is colored red and its children black, whereas in the CLRS method y is colored black and its children red. Coloring y black means that property 4 (no two adjacent red nodes) cannot be violated at y, so the adjustment need not continue toward the root; in this case the CLRS insertion finishes with at most two rotations.
The second difference is that in this case the CLRS method depends on a precondition: the uncle of the lower red node must be black. The illustration above makes it clear that if the uncle (that is, the root of subtree A or D) were red, the final tree would contain two adjacent red nodes. This method therefore cannot be applied when the uncle is red.

Next consider the case in which the uncle of the lower red node is red. Here we color the upper red node and its sibling (that is, the lower red node's uncle) black and color their parent red; the structure of the tree is not changed. This case can also be divided into four sub-cases, according to whether the lower red node is the left or the right child of its parent and whether the upper red node is the left or the right child of its parent, but the four sub-cases are essentially the same; the following illustration shows only one of them. It is easy to see that this operation does not change the number of black nodes on any path from the root of the tree to a leaf. After the operation, the red-black properties can be violated only if the root of the recolored subtree is the root of the whole tree (violating property 2) or its parent is red (violating property 4). In other words, we are back in the situation we started from, except that we have moved two steps closer to the root. We repeat this process until (i) z's parent is black, and the insertion is complete; (ii) z becomes the root, we color it black, and the insertion is complete; or (iii) the uncle of the lower red node is black, at which point one or two rotations complete the insertion. In the worst case we must recolor nodes along the whole path from the newly inserted node to the root, which takes O(log n) operations.
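The whole CLRS insertion, the standard BST insertion followed by the recolor-or-rotate loop just described, can be sketched as follows. This is my own illustrative parent-pointer implementation, not the article's code and not the kernel's rb_insert_color; the symmetric left/right cases are folded together with a flag:

```c
#include <stdlib.h>

enum color { RED, BLACK };

struct node {
    int key;
    enum color color;
    struct node *left, *right, *parent;
};

static void rot_left(struct node **root, struct node *x)
{
    struct node *y = x->right;          /* y rises, x sinks to its left */
    x->right = y->left;
    if (y->left) y->left->parent = x;
    y->parent = x->parent;
    if (!x->parent) *root = y;
    else if (x == x->parent->left) x->parent->left = y;
    else x->parent->right = y;
    y->left = x;
    x->parent = y;
}

static void rot_right(struct node **root, struct node *x)
{
    struct node *y = x->left;           /* mirror image of rot_left */
    x->left = y->right;
    if (y->right) y->right->parent = x;
    y->parent = x->parent;
    if (!x->parent) *root = y;
    else if (x == x->parent->right) x->parent->right = y;
    else x->parent->left = y;
    y->right = x;
    x->parent = y;
}

/* Repair property 4 after the red node z has been attached. */
static void insert_fixup(struct node **root, struct node *z)
{
    while (z->parent && z->parent->color == RED) {
        struct node *p = z->parent, *g = p->parent;   /* g is black */
        int pleft = (p == g->left);
        struct node *u = pleft ? g->right : g->left;  /* uncle */

        if (u && u->color == RED) {
            /* Red uncle: recolor and move two steps toward the root. */
            p->color = BLACK;
            u->color = BLACK;
            g->color = RED;
            z = g;
        } else {
            /* Black uncle: at most two rotations finish the insertion. */
            if (z == (pleft ? p->right : p->left)) {
                z = p;
                if (pleft) rot_left(root, z); else rot_right(root, z);
            }
            z->parent->color = BLACK;
            g->color = RED;
            if (pleft) rot_right(root, g); else rot_left(root, g);
        }
    }
    (*root)->color = BLACK;             /* restore property 2 */
}

/* Standard BST insertion of a new red node, then fix up. */
struct node *rb_insert(struct node **root, int key)
{
    struct node *z = calloc(1, sizeof(*z));
    struct node *y = NULL, *x = *root;

    z->key = key;
    z->color = RED;
    while (x) {
        y = x;
        x = key < x->key ? x->left : x->right;
    }
    z->parent = y;
    if (!y) *root = z;
    else if (key < y->key) y->left = z;
    else y->right = z;
    insert_fixup(root, z);
    return z;
}

/* Height in nodes, counting an empty tree as 0. */
int tree_height(const struct node *n)
{
    if (!n)
        return 0;
    int l = tree_height(n->left), r = tree_height(n->right);
    return 1 + (l > r ? l : r);
}
```

Even on adversarial ascending insertions, the pathological case for a plain BST, the height stays within the theorem's 2log(n + 1) bound, which tree_height makes easy to verify.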
Deletion

To delete a node from a red-black tree, we start with the deletion operation of a standard binary search tree (see CLRS, Chapter 12). Recall the three cases of standard binary search tree deletion:

1. The node to be deleted has no children. In this case we simply remove it. If it was the root, the tree becomes empty; otherwise the corresponding child pointer in its parent is set to NULL.
2. The node to be deleted has one child. Again, we simply remove it. If it was the root, its child becomes the new root; otherwise the corresponding child pointer in its parent is set to point to the removed node's child.
3. The node to be deleted has two children. In this case we first find the node's successor, that is, the smallest node in its right subtree. We then swap the data elements of the two nodes and delete the successor. Since the successor cannot have a left child, deleting it necessarily falls into one of the two cases above.

Note that the node removed from the tree is not necessarily the node that originally held the data item being deleted. But for the purpose of restoring the red-black properties we care only about the node that is finally removed. Call this node v and its parent p(v). At least one of v's children is a leaf node. If v has a non-leaf child, that child takes v's place in the tree; otherwise a leaf node takes its place. Let u denote the node that takes v's place after the binary search tree deletion. If u is a leaf node, we know that it is black. If v is red, the deletion is complete, since removing v cannot violate any property of the red-black tree. We therefore assume from now on that v is black.
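Case 3 above hinges on locating the successor, the smallest node of the right subtree. A minimal sketch (the node type and names are invented for illustration):

```c
#include <stddef.h>

struct bst {
    int key;
    struct bst *left, *right;
};

/* Leftmost (smallest) node of the subtree rooted at n, or NULL. */
struct bst *subtree_min(struct bst *n)
{
    if (!n)
        return NULL;
    while (n->left)
        n = n->left;
    return n;
}

/*
 * Successor of a node that has a right child: the minimum of the right
 * subtree. By construction it has no left child, which is why deleting
 * it always falls into case 1 or case 2 above.
 */
struct bst *successor_below(struct bst *n)
{
    return subtree_min(n->right);
}
```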
After v is deleted, every path from the root through u down to a leaf has one fewer black node than the other root-to-leaf paths in the tree, which violates property 5. In addition, if p(v) and u are both red, property 4 is violated as well. It turns out, however, that our repair for the property 5 violation also repairs the property 4 violation without any extra work, so from now on we concentrate on property 5.

Imagine placing a black token on u. The token indicates that every path from the root down to a leaf below the token-bearing node is one black node short (initially, because v was deleted). We will move the token around until property 5 is restored. In the illustrations below the token is drawn as a black square. If the node carrying the token is black, we call it a double-black node. Note that the token is only a concept; it has no physical representation in the data structure of the tree.

We distinguish four cases.

A. If the token-bearing node is red, or is the root of the tree (or both), we simply color it black and the deletion is complete. Note that this restores property 4 (no two adjacent red nodes). Property 5 is restored as well: the token means that every path from the root through the token-bearing node to a leaf needs one more black node to match the other root-to-leaf paths, and recoloring the red node black adds exactly one black node to each of those deficient paths. If the token-bearing node is the root and it is black, we simply discard the token; every root-to-leaf path of the tree then has one fewer black node than it had before the deletion, and property 5 holds again.
In the remaining cases we may assume that the token-bearing node is black and is not the root.

B. If the sibling and both nephews of the double-black node are black, we color the sibling red and move the token one step toward the root of the tree. The following illustration shows the two possible sub-cases; the dashed outline around y means that we do not care about y's color here, and the small circle above A, B, C and D means that the roots of those subtrees are black. (Note that the double-black node necessarily has two non-leaf nephews. The token indicates that every path from the root to a leaf below the double-black node is one black node short of the other root-to-leaf paths, and since the double-black node is itself black, every path from its sibling down to a leaf must contain more than one black node; this would be impossible if either child of the sibling were a leaf node. Hence both nephews are non-leaf nodes.) Coloring the sibling red removes one black node from every path through it to a leaf, so those paths now contain the same number of black nodes as the paths through the double-black node. Moving the token up to y records that every path from the root through y to a leaf is now one black node short; the problem is not yet solved, but we have moved one step closer to the root. Clearly, this operation is possible only if both nephews of the token-bearing node are black: if either were red, the recoloring would create two adjacent red nodes.

C. If the sibling of the token-bearing node is red, we perform a rotation and recolor some nodes.
The following illustration shows the two possible sub-cases. Note that the operation does not change the number of black nodes on any path from the root to a leaf, and that it guarantees that after the operation the sibling of the double-black node is black, so the next step is either case B or case D. Since the token ends up farther from the root than before the operation, it may seem that we have taken a step backwards. But note that the parent of the double-black node is now red: if the next step is case B, the token moves up onto that red node and we simply color it black; and, as we will see below, in case D we can always consume the token and finish the deletion. So this apparent regression actually means that the deletion is almost complete.

D. Finally we come to the case in which the double-black node has a black sibling and at least one red nephew. We first define the near nephew of a node x: if x is the left child of its parent, then the left child of x's sibling is x's near nephew; otherwise the right child of x's sibling is x's near nephew. The other nephew is x's far nephew. (As the illustration below shows, x's near nephew is closer to x than its far nephew.) There are now two sub-cases: (i) the far nephew of the double-black node is black, in which case its near nephew must be red; or (ii) the far nephew is red, in which case the near nephew may be either color. As the illustration below shows, sub-case (i) can be transformed into sub-case (ii) by one rotation and recoloring, and in sub-case (ii) the deletion can be completed by one more rotation and recoloring.
Depending on whether the double-black node is the left or the right child of its parent, the two rows of the diagram below show the two symmetric forms. In this case an extra black node is produced on the deficient paths, the token is discarded, and the deletion is complete. From the illustration it is easy to see that the number of black nodes on every path through the token-bearing node to a leaf increases by 1, while the number on every other path is unchanged, and that at this point no property of the red-black tree is violated. Putting all the cases together, we see that in the worst case a constant amount of work must be done at each node along the path from a leaf to the root, so the time complexity of the deletion is O(log n).

Comparison with the AVL Tree

I had been immersed in the AVL tree of 2.4 for so long that, before I knew it, the new year had arrived (Happy New Year!!) and 2.6 had already switched to the red-black tree. The world really does change fast. Since we started from the AVL tree, let us compare the two.

Introduction: the AVL tree, also called the height-balanced binary search tree, was proposed by two Russian mathematicians, G. M. Adelson-Velsky and E. M. Landis, in 1962. It was introduced to improve the efficiency of binary search tree lookups by reducing the average search length of the tree. To this end, after every insertion the tree structure is adjusted at some node, when necessary, to keep the search tree balanced, which may lower the tree height and shorten the average search length.

AVL tree definition: a binary tree is an AVL tree if it satisfies the following conditions:
1> both its left and right subtrees are AVL trees;
2> the heights of its left and right subtrees differ by at most 1.
As condition 1 shows, this is a recursive definition, like GNU's.
Properties:
1> the height of an AVL tree with n nodes stays at O(log2 n) and does not exceed (3/2)log2(n + 1);
2> the average search length of an AVL tree with n nodes stays at O(log2 n);
3> after deleting a node from an AVL tree with n nodes, the rebalancing rotations take O(log2 n) time.

Seen in this light, the red-black tree gives up the advantage of strict height balance. In exchange, it can still search, insert and delete in O(log2 n) time, and, thanks to its design, any imbalance is resolved within three rotations. There are better, more complex data structures that can restore balance within a single rotation, but the red-black tree gives us a relatively "cheap" solution. Its asymptotic complexity is the same as the AVL tree's, and its performance in practice is statistically better than the AVL tree's.

Let us see how others comment on the two:

"AVL trees are actually easier to implement than RB trees because there are fewer cases, and AVL trees require O(1) rotations on an insertion, whereas red-black trees require O(lg n). In practice, the speed of AVL trees versus red-black trees will depend on the data that you're inserting. If your data is well distributed, so that an unbalanced binary tree would generally be acceptable (i.e. roughly in random order), but you want to handle bad cases anyway, then red-black trees will be faster, because they do less unnecessary rebalancing of already acceptable data. On the other hand, if a pathological insertion order (e.g. increasing order of keys) is common, then AVL trees will be faster, because the stricter balancing rule will reduce the tree's height. Splay trees might be even faster than either RB or AVL trees, depending on your data access distribution. And if you can use a hash instead of a tree, then that'll be fastest of all."
I am not very clear about what the splay tree is, and I have not looked into it in detail.

Now let us feel the change:

/*
 * Review the old almanac (2.4)
 */
struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
{
	struct vm_area_struct *vma = NULL;

	if (mm) {
		/* Check the cache first. */
		/* (Cache hit rate is typically around 35%.) */
		/* First check whether the most recently used virtual
		 * address range is still in the cache. */
		vma = mm->mmap_cache;
		if (!(vma && vma->vm_end > addr && vma->vm_start <= addr)) {
			/* Cache miss: search the linear list or the AVL tree. */
			if (!mm->mmap_avl) {
				/* Go through the linear list */
				vma = mm->mmap;
				while (vma && vma->vm_end <= addr)
					vma = vma->vm_next;
			} else {
				/* Then go through the AVL tree quickly */
				struct vm_area_struct *tree = mm->mmap_avl;
				vma = NULL;
				for (;;) {
					if (tree == vm_avl_empty) {
						/* Empty node: the search fails. */
						break;
					}
					if (tree->vm_end > addr) {
						vma = tree;
						if (tree->vm_start <= addr) {
							/* Found: exit the loop at once. */
							break;
						}
						tree = tree->vm_avl_left;
					} else
						tree = tree->vm_avl_right;
				}
			}
			if (vma) {
				/* Search succeeded: record the result in the cache. */
				mm->mmap_cache = vma;
			}
		}
	}
	return vma;
}

/*
 * Paste the new almanac (2.6)
 */
struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
{
	struct vm_area_struct *vma = NULL;

	if (mm) {
		/* Check the cache first. */
		/* (Cache hit rate is typically around 35%.) */
		/* First check whether the most recently used virtual
		 * address range is still in the cache. */
		vma = mm->mmap_cache;
		if (!(vma && vma->vm_end > addr && vma->vm_start <= addr)) {
			struct rb_node *rb_node;

			/* Cache miss: search the red-black tree directly. */
			rb_node = mm->mm_rb.rb_node;
			vma = NULL;
			while (rb_node) {
				struct vm_area_struct *vma_tmp;

				vma_tmp = rb_entry(rb_node,
						   struct vm_area_struct, vm_rb);
				if (vma_tmp->vm_end > addr) {
					vma = vma_tmp;
					if (vma_tmp->vm_start <= addr)
						break;
					rb_node = rb_node->rb_left;
				} else
					rb_node = rb_node->rb_right;
			}
			/* Search succeeded: record the result in the cache. */
			if (vma)
				mm->mmap_cache = vma;
		}
	}
	return vma;
}

Here we have only made a small comparison and looked at one real application in the kernel; much remains unanalyzed. I hope readers will correct me and expand on this.
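A final note on the `rb_entry` call in the 2.6 code above: it is the kernel's `container_of` pattern, which recovers the address of the enclosing structure from the address of the `rb_node` embedded inside it. Here is a self-contained sketch; the stub types are invented for illustration and are not the kernel's definitions:

```c
#include <stddef.h>

/* Stand-in for the kernel's embedded rb_node. */
struct rbnode_stub {
    struct rbnode_stub *rb_left, *rb_right;
};

/* Stand-in for vm_area_struct, with the tree node embedded inside. */
struct vma_stub {
    unsigned long vm_start, vm_end;
    struct rbnode_stub vm_rb;
};

/*
 * rb_entry(ptr, type, member) works like container_of: subtract the
 * offset of the member within the structure from the member's address
 * to recover the address of the enclosing structure.
 */
#define rb_entry_stub(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
```

This is what makes the kernel's rbtree code generic: the tree links live inside the user's own structure, and rb_entry maps a tree node back to that structure without the tree code knowing anything about it.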