Chen Shuo (chenshuo.com)
2013-01-20
std::set and std::map (std::map stands for both below) are commonly used associative containers and are also ADTs (abstract data types). That is, their interface (not an interface in the OO sense) specifies not only the operations but also the complexity (cost) of each operation. For example, set::insert(iterator first, iterator last) is usually O(N log N), where N is the length of the range; but if [first, last) is already sorted (in the key_compare sense), the complexity is O(N).
Although the C++ standard does not mandate the underlying data structure of std::map, every current STL implementation uses a balanced binary tree to meet the prescribed time complexity, and all of them use a red-black tree. Chapter 13 of Introduction to Algorithms (2nd edition) describes the principles, properties, and pseudocode of the binary search tree and the red-black tree, and Chapter 5 of Hou Jie's "STL source code analysis" analyzes the corresponding SGI STL implementation in detail. This article asks a few deeper questions about the implementation of the red-black tree (rb_tree) in the STL and gives my understanding. It analyzes the STL implementation shipped with G++ 4.7 and its specific behavior, which differs slightly from the version covered in STL source code analysis. For ease of reading, variable names and class names are slightly rewritten (for example, _Rb_tree_node becomes rb_tree_node). This article does not discuss the balancing algorithm of the red-black tree. In my opinion that is a "side branch" (see the definition in Chen Shuo's "Network Programming Learning Experience"), so we do not care about the specific color of each node.
Data structure review
First, let's review the binary search tree structure found in data structure textbooks. A node (Node) generally has three data members (left, right, and data), and a tree (Tree) has one or two members (root and node_count).

    class Node:
        def __init__(self, data):
            self.left = None
            self.right = None
            self.data = data

    class Tree:
        def __init__(self):
            self.root = None
            self.node_count = 0

In fact, the structure of the STL rb_tree is slightly more complex than this. The code I put together can be found at https://gist.github.com/4574621#file-tree-structure-cc.
Node
A Node has five members: besides left, right, and data, there are also color and parent.
The C++ implementation, in bits/stl_tree.h:

    /**
     * Non-template code
     **/
    enum rb_tree_color { kRed, kBlack };

    struct rb_tree_node_base
    {
      rb_tree_color       color_;
      rb_tree_node_base*  parent_;
      rb_tree_node_base*  left_;
      rb_tree_node_base*  right_;
    };

    /**
     * template code
     **/
    template<typename Value>
    struct rb_tree_node : public rb_tree_node_base
    {
      Value value_field_;
    };
See the figure (not reproduced here).
The existence of color is easy to understand: each node in a red-black tree is either red or black, so it must store its color. (The color needs only 1 bit of data. One optimization that saves memory is to embed the color in the highest or lowest bit of a pointer; the rbtree in the Linux kernel embeds it in the lowest bit of parent.) The existence of parent makes non-recursive traversal possible, which is discussed later.
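The pointer-embedding trick mentioned above can be sketched as follows. This is a minimal illustration, not the Linux kernel's code or anything from libstdc++: the names packed_node, get_parent, get_color, and set_parent_and_color are made up for this example, and it assumes nodes are aligned to at least 2 bytes so that bit 0 of a node address is always zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration only.
enum packed_color { kPackedRed = 0, kPackedBlack = 1 };

struct packed_node
{
  std::uintptr_t parent_and_color = 0;  // parent address with the color in bit 0
  packed_node* left = nullptr;
  packed_node* right = nullptr;
};

// Mask off bit 0 to recover the parent pointer.
inline packed_node* get_parent(const packed_node* n)
{
  return reinterpret_cast<packed_node*>(n->parent_and_color & ~std::uintptr_t(1));
}

// Bit 0 alone is the color.
inline packed_color get_color(const packed_node* n)
{
  return static_cast<packed_color>(n->parent_and_color & 1);
}

inline void set_parent_and_color(packed_node* n, packed_node* parent, packed_color c)
{
  const std::uintptr_t p = reinterpret_cast<std::uintptr_t>(parent);
  assert((p & 1) == 0);  // relies on the alignment assumption stated above
  n->parent_and_color = p | static_cast<std::uintptr_t>(c);
}
```

With this layout a node header takes three words instead of four, at the cost of a mask operation on every parent access.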
Tree
Tree has more members: it contains a complete rb_tree_node_base (color/parent/left/right), plus node_count and key_compare.
Some default template parameters, such as key_compare and allocator, are omitted here.

    template<typename Key, typename Value> // key_compare and allocator
    class rb_tree
    {
     public:
      typedef std::less<Key> key_compare;
      typedef rb_tree_iterator<Value> iterator;

     protected:
      struct rb_tree_impl // : public node_allocator
      {
        key_compare       key_compare_;
        rb_tree_node_base header_;
        size_t            node_count_;
      };
      rb_tree_impl impl_;
    };

    template<typename Key, typename T> // key_compare and allocator
    class map
    {
     public:
      typedef std::pair<const Key, T> value_type;

     private:
      typedef rb_tree<Key, value_type> rep_type;
      rep_type tree_;
    };

See the figure (not reproduced here). It shows an empty tree; the shaded part is padding bytes, because key_compare is usually an empty class. (Where is the allocator?)
The header in rb_tree is of type rb_tree_node_base, not rb_tree_node. Therefore the size of rb_tree is 6 * sizeof(void*), independent of the template type parameters. That is 24 bytes on 32-bit and 48 bytes on 64-bit, which is easy to verify with code. It is also easy to verify that sizeof() is the same for std::set and std::map.
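One way to check the size claims is with compile-time assertions. This is a small sketch; set_words is a helper name invented here. The two static_asserts express the implementation-independent part (the object size does not depend on the element type), while the figure of 6 words is what the article reports for libstdc++ with G++ 4.7 and may differ for other standard libraries.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// The container object holds only bookkeeping data, never the elements
// themselves, so its size should not depend on the element type.
static_assert(sizeof(std::set<int>) == sizeof(std::set<std::string>),
              "set size should not depend on the value type");
static_assert(sizeof(std::set<int>) == sizeof(std::map<int, int>),
              "set and map should have the same object size");

// Hypothetical helper: number of pointer-sized words in a std::set<int> object.
// The article reports 6 for libstdc++; treat any other value as a sign that a
// different standard library is in use.
constexpr std::size_t set_words()
{
  return sizeof(std::set<int>) / sizeof(void*);
}
```

Printing set_words() from a small program shows the word count for the standard library actually in use.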
Note that the header in rb_tree is not the root node. Its left and right members do not point to left and right children, but to the leftmost node (left_most) and the rightmost node (right_most). The reason, explained later, is to meet the time complexity requirements. header.parent points to the root node, and root.parent points to the header. The header is always red and the root is always black. After one node is inserted, the data structure looks as shown in the figure (not reproduced here).
Now insert two more nodes. If they end up on the left and right sides of the root, the data structure looks as shown in the figure (not reproduced here; the parent pointers are not all drawn because where they point is obvious). Note that header.left points to the leftmost node and header.right points to the rightmost node.
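Since the figures are not available, the header wiring described above can be written out as code. This is a hand-built sketch with made-up helper names (init_empty, attach_root, attach_children, begin_node, end_node); it performs no rebalancing and only covers the empty, one-node, and three-node cases from the text. The empty-tree convention (header.left and header.right pointing at the header itself) is an assumption chosen so that begin() equals end() for an empty tree.

```cpp
#include <cassert>

enum color_t { kRed, kBlack };

struct node_base
{
  color_t color;
  node_base* parent;
  node_base* left;
  node_base* right;
};

// Empty tree: no root, and leftmost/rightmost both refer to the header.
inline void init_empty(node_base& header)
{
  header.color = kRed;
  header.parent = nullptr;
  header.left = &header;
  header.right = &header;
}

// One node: it is the root, the leftmost node, and the rightmost node.
inline void attach_root(node_base& header, node_base& root)
{
  root.color = kBlack;
  root.parent = &header;
  root.left = nullptr;
  root.right = nullptr;
  header.parent = &root;
  header.left = &root;
  header.right = &root;
}

// Three nodes: one child on each side of the root.
inline void attach_children(node_base& header, node_base& l, node_base& r)
{
  assert(header.parent != nullptr);  // a root must already exist
  node_base* root = header.parent;
  l.color = kRed;  l.parent = root;  l.left = nullptr;  l.right = nullptr;
  r.color = kRed;  r.parent = root;  r.left = nullptr;  r.right = nullptr;
  root->left = &l;
  root->right = &r;
  header.left = &l;   // new leftmost node
  header.right = &r;  // new rightmost node
}

inline node_base* begin_node(node_base& header) { return header.left; }
inline node_base* end_node(node_base& header) { return &header; }
```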
Iterator
The data structure of the rb_tree iterator is very simple: it contains only one rb_tree_node_base pointer. Its ++/-- operations, however, are not simple (the functions that implement them are not in the header file but in the libstdc++ library file).

    // defined in library, not in header
    rb_tree_node_base* rb_tree_increment(rb_tree_node_base* node);
    // others: decrement, rebalance, etc.

    template<typename Value>
    struct rb_tree_node : public rb_tree_node_base
    {
      Value value_field_;
    };

    template<typename Value>
    struct rb_tree_iterator
    {
      Value& operator*() const
      {
        return static_cast<rb_tree_node<Value>*>(node_)->value_field_;
      }

      rb_tree_iterator& operator++()
      {
        node_ = rb_tree_increment(node_);
        return *this;
      }

      rb_tree_node_base* node_;
    };
end() always points to the header node, and begin() points to the first node (if any). Therefore, for an empty tree, begin() and end() both point to the header node. For a tree with one element, the iterators point as shown in the figure (not reproduced here).
For the earlier tree with three elements, the iterators point as shown in the figure (not reproduced here).
Think about it: what do you get when you dereference std::set<int>::end()? (According to the standard this is undefined behaviour, but there is no harm in trying.)
The increment and decrement operations of the rb_tree iterator are not simple. Consider the tree in the figure (not reproduced here). Assume the iterator iter points to the green node 3. After ++iter it should point to the gray node 4, and after another ++iter it should point to the yellow node 5. Each of these two increments follows two pointers.
For a larger tree (figure not reproduced here), assume the iterator iter points to the green node 7. After ++iter it should point to the gray node 8, and after another ++iter it should point to the yellow node 9. Each of these increments follows three pointers.
So a single increment or decrement of the rb_tree iterator is not guaranteed to be constant time; in the worst case it can be logarithmic (that is, proportional to the depth of the tree). Is iterating over a tree from begin() to end() then still O(N)? In other words, is the increment or decrement of the iterator amortized constant time?
Note also that when iter points to the rightmost node (7 or 15), ++iter should point to the header node, that is, end(). This step is O(log N). Similarly, decrementing end() is O(log N), which will be used later.
The implementation of the increment and decrement operations of the rb_tree iterator is not obvious at a glance. To traverse a binary tree from start to end (pre-order, in-order, or post-order), textbooks use recursion (or simulate recursion with a stack, which is the same thing), for example (https://gist.github.com/4574621#file-tree-traversal-py):

    def printTree(node):
        if node:
            printTree(node.left)
            print node.data
            printTree(node.right)

For generality, you can take a function as a parameter and visit each node through a callback, as in the code below. Java's SAX-style XML parsing works the same way.

    def visit(node, func):
        if node:
            visit(node.left, func)
            func(node.data)
            visit(node.right, func)

Going one step further and letting the caller traverse the tree from start to end with a for loop is not so easy. In Python, which supports coroutines, the yield keyword can be combined with recursion. The code is as follows, and it is not much different from the previous implementation.
In Python 3.3 yield from can also be used; this version is written for Python 2.x.

    def travel(root):
        if root.left:
            for x in travel(root.left):
                yield x
        yield root.data
        if root.right:
            for y in travel(root.right):
                yield y

Caller:

    for y in travel(root):
        print y

In C++, however, achieving this final StAX-style traversal makes the iterator implementation much more troublesome. For details, see section 5.2.4 of STL source code analysis. This is also why the node has a parent pointer: during an increment, if the current node has no right child, the iterator first has to go back up to the parent node and continue the search from there.
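The increment step described above can be sketched as follows. This is written from the description in this article and the header conventions introduced earlier (header.parent is the root, root.parent is the header, header.right is the rightmost node); it is not copied from libstdc++, and tnode and increment are names made up for this sketch. For brevity the value is stored directly in the node rather than in a derived class.

```cpp
#include <cassert>

// Hypothetical node type: the header is also a tnode whose value is unused.
struct tnode
{
  tnode* parent;
  tnode* left;
  tnode* right;
  int value;
};

// Returns the in-order successor of x, or the header if x is the last node.
// Precondition: x is an element node of a tree wired as described above.
inline tnode* increment(tnode* x)
{
  assert(x != nullptr && x->parent != nullptr);
  if (x->right != nullptr)
  {
    // The successor is the leftmost node of the right subtree.
    x = x->right;
    while (x->left != nullptr)
      x = x->left;
    return x;
  }
  // No right child: climb while we are coming up from a right child.
  tnode* y = x->parent;
  while (x == y->right)
  {
    x = y;
    y = y->parent;
  }
  // If the climb ran past the root, x is now the header and y is the root;
  // that is the one case where x->right == y, and x (the header) is returned.
  if (x->right != y)
    x = y;
  return x;
}
```

A full traversal is then `for (tnode* p = header.left; p != &header; p = increment(p))`, which needs neither recursion nor an explicit stack.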
Space complexity
Each rb_tree_node directly contains a value_type, so its size is 4 * sizeof(void*) + sizeof(value_type). When memory is actually allocated, the size is further rounded up to the alignment used by the allocator/malloc, which is generally 8 bytes on 32-bit and 16 bytes on 64-bit. Therefore each node of set<int> takes 24 bytes or 48 bytes, and a set<int> with 1 million elements occupies 48 MB of memory on x86-64. Using set<int> to sort integers is unwise in terms of both time and space.
Because of malloc alignment, set<int64_t> and set<int32_t> occupy the same space, and set<int> and map<int, int> occupy the same space, on both 32-bit and 64-bit.
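The arithmetic above can be written down as a small compile-time check. This sketch follows the simplified model used in the text (node size rounded up to the malloc alignment); round_up and node_bytes are helper names invented here, and a real allocator may add bookkeeping overhead that this model does not capture.

```cpp
#include <cstddef>

// Round n up to the next multiple of align.
constexpr std::size_t round_up(std::size_t n, std::size_t align)
{
  return (n + align - 1) / align * align;
}

// Per-node footprint under the article's model:
// four pointer-sized words plus the value, rounded up to the malloc alignment.
constexpr std::size_t node_bytes(std::size_t pointer_size,
                                 std::size_t value_size,
                                 std::size_t malloc_align)
{
  return round_up(4 * pointer_size + value_size, malloc_align);
}

static_assert(node_bytes(4, 4, 8) == 24, "set<int> node, 32-bit");
static_assert(node_bytes(8, 4, 16) == 48, "set<int> node, 64-bit");
static_assert(node_bytes(8, 8, 16) == 48, "set<int64_t> or map<int, int> node, 64-bit");
static_assert(node_bytes(4, 8, 8) == 24, "set<int64_t> or map<int, int> node, 32-bit");

// One million set<int> nodes on x86-64: 48,000,000 bytes, i.e. about 48 MB.
static_assert(1000 * 1000 * node_bytes(8, 4, 16) == 48000000, "48 MB");
```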
Why?
We have several questions about the data structure of rb_tree:
1. Why does rb_tree not contain an allocator member?
2. Why can an iterator be passed by value?
3. Why does the header have a left member that points to the leftmost node?
4. Why does the header have a right member that points to the rightmost node?
5. Why does the header have a color member that is fixed to red?
6. Why are there two layers of structure, rb_tree_node and rb_tree_node_base? What is the purpose of introducing the node base class?
7. Why are the increment and decrement of the iterator amortized constant time?
8. Why does the Poller of the muduo network library use std::map<int, Channel*> to manage file descriptors?
In my opinion, being able to answer the first seven questions well in an interview shows a solid grasp of STL map/set. The answers follow.
Why does rb_tree not contain allocator members?
Because the default allocator is an empty class (no data members and no virtual table pointer vptr), the STL uses the empty base class optimization to reduce the size of the rb_tree object. Specifically, std::map has an rb_tree as a member, rb_tree has an rb_tree_impl as a member, and rb_tree_impl inherits from the allocator. If the allocator is an empty class, rb_tree_impl is the same size as it would be without the base class. Other STL containers use the same optimization, which is why a std::vector object is three words and a std::list object is two words. boost::compressed_pair uses the same optimization.
In my opinion, the same optimization should also be applied to the default key_compare, so that each rb_tree would need only 5 words instead of 6.
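The effect of the empty base class optimization is easy to see in isolation. In this sketch, empty_alloc is a made-up stand-in for a stateless allocator and the two impl structs are invented for comparison; the sizes asserted are what mainstream compilers produce.

```cpp
#include <cstddef>

// Hypothetical stand-in for a stateless allocator: no data members, no vptr.
struct empty_alloc {};

// Holding the empty class as a member costs at least one byte plus padding.
struct impl_with_member
{
  empty_alloc alloc;
  void* header;
};

// Inheriting from it costs nothing: the empty base occupies no storage.
struct impl_with_base : empty_alloc
{
  void* header;
};

static_assert(sizeof(impl_with_base) == sizeof(void*),
              "empty base adds no size");
static_assert(sizeof(impl_with_member) > sizeof(impl_with_base),
              "empty member still takes space");
```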
Why can an iterator be passed by value?
Readers of Effective C++ will remember that one of its items is "Prefer pass-by-reference-to-const to pass-by-value", that is, pass objects by const reference where possible. The same item also says that for built-in types, STL iterators, and STL function objects, pass-by-value is fine and generally has no performance cost.
On x86-64, for a class such as the rb_tree iterator, which has a single pointer member and no user-defined copy constructor, pass-by-value can be done through registers (for example, the first four function parameters; an object returned by value counts as a parameter), just like passing an ordinary int or pointer. It may therefore be slightly faster than pass-by-const-reference, because the callee saves a dereference.
Similarly, the Date class and the Timestamp class in muduo are explicitly designed to be passed by value. Each has only one int/long member, so copying by value is no slower than passing a reference. Using pass-by-const-reference indiscriminately for every object is not exactly a mistake, but it suggests knowing the rule without knowing the reason behind it.
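As an illustration of a type designed for pass-by-value, here is a minimal sketch. tiny_timestamp and add_seconds are hypothetical names invented for this example; this is not muduo's Timestamp class, only a class with the same shape (a single integer member and implicitly generated copy operations).

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical value type with a single 64-bit member.
class tiny_timestamp
{
 public:
  constexpr explicit tiny_timestamp(std::int64_t micro_seconds)
    : micro_seconds_(micro_seconds)
  {
  }

  constexpr std::int64_t micro_seconds() const { return micro_seconds_; }

 private:
  std::int64_t micro_seconds_;
};

// Same size as the integer it wraps, and copyable with a plain bit copy,
// which is what lets a compiler pass it in a register.
static_assert(sizeof(tiny_timestamp) == sizeof(std::int64_t), "one word");
static_assert(std::is_trivially_copyable<tiny_timestamp>::value, "bitwise copy");

// Takes and returns the object by value; no const reference is needed.
constexpr tiny_timestamp add_seconds(tiny_timestamp t, std::int64_t seconds)
{
  return tiny_timestamp(t.micro_seconds() + seconds * 1000 * 1000);
}
```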
Why does the header have a left member that points to the leftmost node?
The reason is simple: it makes begin() O(1). If the header had only parent and no left, begin() would be O(log N), because the leftmost node can only be reached by taking log N steps from the root. As it is, begin() only needs to construct an iterator from header.left and return it.
Why does the header have a right member that points to the rightmost node?
This one is less obvious. end() is O(1) anyway, because the iterator can be constructed directly from the address of the header, without using the rightmost node. The source code contains this comment:
bits/stl_tree.h:

    // Red-black tree class, designed for use in implementing STL
    // associative containers (set, multiset, map, and multimap). The
    // insertion and deletion algorithms are based on those in Cormen,
    // Leiserson, and Rivest, Introduction to Algorithms (MIT Press,
    // 1990), except that
    //
    // (1) the header cell is maintained with links not only to the root
    // but also to the leftmost node of the tree, to enable constant
    // time begin(), and to the rightmost node of the tree, to enable
    // linear time performance when used with the generic set algorithms
    // (set_union, etc.)
    //
    // (2) when a node being deleted has two children its successor node
    // is relinked into its place, rather than copied, so that the only
    // iterators invalidated are those referring to the deleted node.
This comment means that if elements are inserted in sorted order, the total time is linear rather than O(N log N). That is, the running time of the following code is proportional to N:

    // insert elements at end()
    std::set<int> s;
    const int N = 1000 * 1000;
    for (int i = 0; i < N; ++i)
      s.insert(s.end(), i);
In the rb_tree implementation, inserting one element with insert(value) is O(log N) in general. However, insert(hint, value) takes, in addition to the value_type, an iterator that serves as a hint. If the actual insertion point is right next to hint, the amortized complexity is O(1). In this example, every insertion is at end() and each inserted element is larger than the current last element, so insert() is O(1). Concretely, if hint equals end() and the value is greater than the rightmost element, the new element is inserted directly as the right child of the rightmost node. header.right is what lets us reach the rightmost node in constant time, which guarantees this complexity for insert() (instead of taking log N steps from the root to reach the rightmost node).
See https://gist.github.com/4574621#file-tree-bench-cc for the run-time test. In the test results (chart not reproduced here), the vertical axis is the time per element in microseconds; the top red line is the ordinary insert(value), and the blue and black lines below it are insert(end(), value). The difference between O(log N) and O(1) is roughly visible. For the underlying analysis, see Problem 17-4 in Chapter 17 of Introduction to Algorithms (2nd edition).
However, the test results show that the comment quoted above is actually incorrect: std::inserter() combined with set_union() cannot achieve O(N) complexity. The reason is that std::insert_iterator performs ++iter once after each insertion. At that point iter points to the rightmost node, and its ++ operation is O(log N) (as mentioned earlier, decrementing end() is O(log N), and the same applies here). So the whole algorithm slows down to O(N log N). To make set_union() run in linear time, we need to write our own inserter. See end_inserter and at_inserter in the code linked above.
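The idea of a custom inserter that always uses end() as the hint can be sketched as below. This is an independent illustration, not the end_inserter from the gist: end_insert_iterator and union_of are names chosen here, and the sketch only aims to show the shape of such an output iterator, not to reproduce the measured performance.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>

// Hypothetical output iterator: every assignment inserts with end() as the
// hint, and ++ is a no-op, so no tree iterator is ever incremented.
template <typename Container>
class end_insert_iterator
{
 public:
  typedef std::output_iterator_tag iterator_category;
  typedef void value_type;
  typedef std::ptrdiff_t difference_type;
  typedef void pointer;
  typedef void reference;

  explicit end_insert_iterator(Container& c) : container_(&c) {}

  end_insert_iterator& operator=(const typename Container::value_type& value)
  {
    container_->insert(container_->end(), value);
    return *this;
  }

  end_insert_iterator& operator*() { return *this; }
  end_insert_iterator& operator++() { return *this; }
  end_insert_iterator operator++(int) { return *this; }

 private:
  Container* container_;
};

// set_union emits its output in sorted order, so end() is always the right hint.
inline std::set<int> union_of(const std::set<int>& a, const std::set<int>& b)
{
  std::set<int> result;
  std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                 end_insert_iterator<std::set<int> >(result));
  assert(result.size() <= a.size() + b.size());
  return result;
}
```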
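Why does the header have a color member that is fixed to red?
This is an implementation technique for decrementing the iterator (--iter). If the iterator currently points to end(), decrementing it should take it to the rightmost node. Since the iterator has only one data member, the way to tell whether it currently points to end() is to test (node_->color_ == kRed && node_->parent_->parent_ == node_). The root node also satisfies the parent test, because root.parent is the header and header.parent is the root; the color check is what tells them apart, since the root is always black.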
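The header test can be tried out on a hand-built tree. This is a sketch using made-up names (base_node, is_header) and the header conventions described in this article; it assumes a non-empty tree, since with an empty tree header.parent would be null.

```cpp
#include <cassert>

enum node_color { kRed, kBlack };

struct base_node
{
  node_color color;
  base_node* parent;
  base_node* left;
  base_node* right;
};

// True only for the header of a non-empty tree.
// The root also satisfies parent->parent == n, but the root is black.
inline bool is_header(const base_node* n)
{
  assert(n->parent != nullptr);  // precondition: the tree is not empty
  return n->color == kRed && n->parent->parent == n;
}
```

A red non-root node fails the second half of the test, because its grandparent can never be the node itself.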
Why are there two layers of structure, rb_tree_node and rb_tree_node_base? What is the purpose of introducing the node base class?
The purpose is to move complex functions, such as incrementing and decrementing the iterator and rebalancing the tree, out of the header file and into the library file. This reduces the code bloat caused by templates (set<int> and set<string> can share these basic rb_tree functions) and slightly speeds up compilation. Once the rb_tree_node_base base class is introduced, these operations can take a base class pointer (which does not depend on the template parameter type) as their parameter, so the function definitions do not need to be in the header file. This is why the implementation of the iterator's ++/-- cannot be found in the header file; it is in the library source code of libstdc++. Note that the base class here has nothing to do with OOP; it is purely an implementation technique.
Why are the increment and decrement of the iterator amortized constant time?
A strict proof requires amortized analysis. For one thing I can't do it, and for another few people would read it if I wrote it out. Here I illustrate the point with a simple inductive approach. Consider a special case: traverse the perfect binary tree from the earlier figure from start to end, count the total number of steps the iterator takes (that is, the number of pointers followed), and divide by the number of nodes N to get the average number of steps per increment. Since a red-black tree is balanced, this number should not be far from the actual number of steps.
For a perfect binary tree of depth 1, there is one element. Going from begin() to end() takes one step, from the root to the header.
For a perfect binary tree of depth 2, there are three elements. Going from begin() to end() takes four steps, 1 -> 2 -> 3 -> header, where going from 3 to the header takes two steps.
For a perfect binary tree of depth 3, there are 7 elements. Going from begin() to end() takes 11 steps: traverse the left subtree (4 steps), take 2 steps to reach the leftmost node of the right subtree, traverse the right subtree (4 steps), and take one last step to reach end(). 4 + 2 + 4 + 1 = 11.
For a perfect binary tree of depth 4, there are 15 elements. Going from begin() to end() takes 26 steps: traverse the left subtree (11 steps), take 3 steps to reach the leftmost node of the right subtree, traverse the right subtree (11 steps), and take one last step to reach end(). 11 + 3 + 11 + 1 = 26.
The next numbers in the sequence are 57, 120, and 247.
For a perfect binary tree of depth n, there are 2^n - 1 elements. Let f(n) be the number of steps from begin() to end(). Then f(n) = 2 * f(n-1) + n.
Expanding the recurrence gives f(n) = sum over i = 1..n of i * 2^(n-i) = 2^(n+1) - n - 2 (this equation can be proved by induction). That is, for a perfect binary tree of depth n, the number of steps for a full traversal is less than 2^(n+1) - 2, while the number of elements is 2^n - 1, so each element takes less than two steps on average. Therefore the increment and decrement of the rb_tree iterator can be said to be amortized constant time.
There also seems to be a simpler proof. During a traversal from start to end, each edge is walked at most once in each direction. A tree with N nodes has N - 1 edges, so the traversal takes at most 2 * (N - 1) + 1 steps. That is, each node needs about 2 steps on average, the same as the result above.
Finally, a slightly off-topic question.
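The recurrence f(n) = 2 * f(n-1) + n and its closed form 2^(n+1) - n - 2 can be checked against each other at compile time. This is only a numeric sanity check with helper names invented here (steps_recursive, steps_closed_form, node_count), not a proof; depth must be at least 1.

```cpp
#include <cstdint>

// f(1) = 1, f(n) = 2 * f(n-1) + n. Precondition: depth >= 1.
constexpr std::uint64_t steps_recursive(unsigned depth)
{
  return depth == 1 ? 1 : 2 * steps_recursive(depth - 1) + depth;
}

// Closed form: 2^(n+1) - n - 2.
constexpr std::uint64_t steps_closed_form(unsigned depth)
{
  return (std::uint64_t(1) << (depth + 1)) - depth - 2;
}

// A perfect binary tree of the given depth has 2^n - 1 nodes.
constexpr std::uint64_t node_count(unsigned depth)
{
  return (std::uint64_t(1) << depth) - 1;
}

// The values listed in the text.
static_assert(steps_recursive(1) == 1, "depth 1");
static_assert(steps_recursive(2) == 4, "depth 2");
static_assert(steps_recursive(3) == 11, "depth 3");
static_assert(steps_recursive(4) == 26, "depth 4");
static_assert(steps_recursive(5) == 57, "depth 5");
static_assert(steps_recursive(6) == 120, "depth 6");
static_assert(steps_recursive(7) == 247, "depth 7");

// Recurrence and closed form agree, and the average is below 2 steps per node.
static_assert(steps_recursive(20) == steps_closed_form(20), "closed form");
static_assert(steps_closed_form(20) < 2 * node_count(20), "fewer than 2 per node");
```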
Why does the Poller of the muduo network library use std::map<int, Channel*> to manage file descriptors?
In normal use muduo uses EPollPoller, which is a simple wrapper around epoll(4). It uses std::map<int, Channel*> channels_ to store the mapping from fd to Channel object. I use std::map instead of an array for the following reasons:
- epoll_ctl() is O(lg N), because the kernel manages fds with a red-black tree. Therefore, managing fds with an array does not reduce the time complexity and may increase memory usage (a hash table would be fine). Moreover, given the system call overhead, the actual speed difference between map and vector is not noticeable. (Remark: people often claim that epoll is O(1). In fact, epoll_wait() is O(N), where N is the number of active fds. poll and select are also O(N), but N has a different meaning. On careful reckoning, I am afraid only epoll_create() is O(1). Someone once proposed changing epoll to use an array, but it was rejected because that would be turning back the clock: https://lkml.org/lkml/2008/1/8/205 .)
- channels_ is accessed only when a Channel is created or destroyed. At other times (when modifying the read/write events of interest) it is accessed only inside assert(), for debug-mode assertions. Channel creation and destruction go together with socket creation and destruction, which involve system calls, so operations on channels_ account for a very small share of the time. Optimizing channels_ is therefore pointless; it would be optimizing a nop.
(End.)