Abstract: This article poses several basic questions about STL map and set. By answering them, we explain the data structure underlying the STL associative containers, and finally we discuss the choice between the balanced binary tree library functions shipped with Unix/Linux and map/set, and analyze the advantages of map and set. It should be a useful reference for readers who want to study STL in depth or who want to understand the underlying data structure of containers such as STL map.
STL map and set are not hard to use, but they can be hard to understand. For example:
Why is the insert/delete efficiency of map and set higher than that of sequential containers such as vector?
Why do previously saved iterators remain valid after each insert operation?
Why don't map and set have a reserve function, like vector, to pre-allocate memory?
When the number of elements grows (say, from 10,000 to 20,000), how do the insert and search speeds of map and set change?
Some people can give a rough answer, but to understand these questions thoroughly we also need to understand the underlying data structures of STL.
C++ STL has been widely praised and widely used. It not only provides convenient containers such as vector, string, and list; more importantly, STL encapsulates many complex data structures, algorithms, and a large number of common operations on them: vector wraps an array, list wraps a linked list, and map and set wrap a binary tree. When encapsulating these data structures, STL follows programmers' habits, exposing common operations such as insert, sort, erase, and find as member functions, so STL feels familiar to its users.
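As a minimal illustration of that familiar interface (the keys and values below are made up purely for the example):

#include <map>
#include <set>
#include <string>
#include <iostream>

int main()
{
    std::map<std::string, int> ages;              // key/value pairs, kept sorted by key
    ages.insert(std::make_pair("alice", 30));     // the familiar insert
    ages["bob"] = 25;                             // or operator[]

    std::map<std::string, int>::iterator it = ages.find("alice");   // the familiar find
    if (it != ages.end())
        std::cout << it->first << " is " << it->second << std::endl;

    std::set<int> ids;                            // keys only, no duplicates
    ids.insert(42);
    ids.erase(42);                                // the familiar erase
    return 0;
}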
In C++ STL, the standard associative containers set, multiset, map, and multimap use a very efficient balanced binary search tree for retrieval: the red-black tree, also called the RB tree. The statistical performance of the RB tree is better than that of the general balanced binary tree (called the AVL tree in some books, after its inventors Adelson-Velskii and Landis), so STL chose it as the internal structure of its associative containers. This article will not describe the implementation of AVL and RB trees in detail or compare their advantages and disadvantages; for details of the RB tree implementation, see "Red-Black Trees: Theory and Implementation (Theory)". Instead, based on the answers to the questions above, this article gives a brief introduction to the underlying data structure of map and set.
Why is the insert/delete efficiency of map and set higher than that of sequential containers such as vector?
Most people will say this is simple: associative containers never need to copy or move memory. That's true. All elements in map and set are stored as nodes. A node's structure is similar to that of a linked-list node: it points to its parent node and its child nodes. The structure looks something like this:
        A
      /   \
     B     C
    / \   / \
   D   E F   G
So inserting an element requires only a small amount of rewiring: point a few node pointers at the new node. Deletion is similar: after a small amount of rewiring, the pointers that pointed at the deleted node point at other nodes, and you're done. Everything here is a pointer adjustment and has nothing to do with moving memory.
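As a rough sketch, the node stored by map and set looks something like the following. The real red-black tree node in an STL implementation also carries a color field and uses different names; this is only meant to illustrate the pointer-based layout:

// Illustrative only: a pared-down node in the spirit of the tree node
// map and set use internally.
template <class Value>
struct TreeNode {
    TreeNode *parent;   // pointer back to the parent node
    TreeNode *left;     // pointer to the left child
    TreeNode *right;    // pointer to the right child
    Value     value;    // the element itself lives inside the node
};
// Inserting or erasing an element only rewires a few of these pointers;
// the nodes themselves never move in memory.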
Why do previously saved iterators remain valid after each insert operation?
Given the explanation above, this question is easy to answer. An iterator here is essentially a pointer to a node. If the node's memory never moves, how could a pointer to it become invalid? (Of course, an iterator to a deleted element is itself no longer valid.) With vector, by contrast, every deletion or insertion may invalidate pointers, including push_back, which inserts at the end. To keep its storage contiguous, vector may find that its internal space is not enough: it then requests a new, larger block of memory, copies the existing elements into it, appends the element being inserted, and releases the old block. The old pointers then naturally no longer point at valid data; the memory an old iterator referred to may have been reused or freed. Especially when using algorithms such as find, remember this rule: never use an expired iterator.
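A small sketch of the difference (the vector half deliberately shows what may become undefined behavior; the values are arbitrary):

#include <map>
#include <vector>
#include <cassert>

int main()
{
    std::map<int, int> m;
    m[1] = 10;
    std::map<int, int>::iterator mit = m.find(1);   // save an iterator
    m[2] = 20;                                      // insert more elements
    m[3] = 30;
    assert(mit->second == 10);                      // still valid: the node never moved

    std::vector<int> v;
    v.push_back(10);
    std::vector<int>::iterator vit = v.begin();     // save an iterator
    v.push_back(20);                                // may reallocate and move the elements
    // *vit;                                        // undefined behavior if a reallocation happened
    return 0;
}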
Why don't map and set have a reserve function, like vector, to pre-allocate memory?
I used to wonder about this myself. In principle, the reason is that what is stored in map and set is not the element itself but a node containing the element. That is to say, the allocator used inside map is not the Alloc passed in when map<Key, Data, Compare, Alloc> is declared. For example:
map<int, int, less<int>, alloc<int> > intmap;
In this case, the allocator actually used inside intmap is not alloc<int> but a converted allocator; the conversion uses alloc<int>::rebind to define a new allocator for the node type. For the detailed implementation, see "A Thorough Study of the Allocator in STL". In practice, just remember that the allocator inside map and set has been rebound to the node type, and don't expect a reserve method.
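A minimal sketch of the rebind idea (pre-C++17 style), using the standard allocator and a made-up Node type; the real node type and names inside an STL implementation differ:

#include <memory>

struct Node {                   // hypothetical stand-in for the container's internal node
    Node *parent;
    Node *left;
    Node *right;
    int   value;
};

int main()
{
    typedef std::allocator<int>              ValueAlloc;   // what the user "passes in"
    typedef ValueAlloc::rebind<Node>::other  NodeAlloc;    // what the container actually uses

    NodeAlloc a;
    Node *n = a.allocate(1);    // allocates a whole node, not just an int
    a.deallocate(n, 1);
    return 0;
}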
When the number of elements grows (say, from 10,000 to 20,000), how do the insert and search speeds of map and set change?
If you understand logarithms base 2, the answer is clear. map and set perform what amounts to a binary search: with 16 elements, a lookup needs at most 4 comparisons; with 32 elements, at most 5. What about 10,000 elements? At most about log2(10000), i.e. 14 comparisons. And 20,000 elements? At most about 15. As you can see, when the data size doubles, a search needs only one more comparison, and the search time grows by only about 1/14. Once you understand this, you can put elements into these containers with peace of mind.
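A tiny sketch that just counts the halvings, to reproduce the numbers above:

#include <cstdio>

int main()
{
    // Count how many times n can be halved (rounding up) before reaching 1:
    // a rough upper bound on the comparisons a balanced binary search tree needs.
    int sizes[] = { 16, 32, 10000, 20000 };
    for (int i = 0; i < 4; ++i) {
        int comparisons = 0;
        for (int n = sizes[i]; n > 1; n = (n + 1) / 2)
            ++comparisons;
        std::printf("%5d elements -> at most about %d comparisons\n",
                    sizes[i], comparisons);
    }
    return 0;
}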
Finally, regarding map and set, winter would also like to mention how they compare in efficiency with a C wrapper library. On many Unix and Linux platforms there is a library called ISC that provides functions with declarations like the following:
void  tree_init(void **tree);
void *tree_srch(void **tree, int (*compare)(), void *data);
void  tree_add(void **tree, int (*compare)(), void *data, void (*del_uar)());
int   tree_delete(void **tree, int (*compare)(), void *data, void (*del_uar)());
int   tree_trav(void **tree, int (*trav_uar)());
void  tree_mung(void **tree, void (*del_uar)());
Many people think that calling these functions directly must be faster than STL map, because STL map uses templates and so on. In fact the two do not differ in algorithm; they differ in memory fragmentation. If you use these functions directly, you have to allocate each new node yourself; when there are many nodes and deletions and insertions are frequent, memory fragmentation builds up. STL uses its own allocator to allocate memory and manages it in a memory pool, which greatly reduces fragmentation and improves the overall long-run performance of the system. Winter has tested this in his own system: after replacing all the code that called the ISC functions directly with map, the program speed was basically the same, and after running for a long time (for example, as a background service program) the advantages of map showed. In addition, using map greatly reduces the difficulty of coding and improves the readability of the program. So why not?
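As a purely illustrative sketch of the readability point (the key and value below are invented), the code that used to juggle a void **tree handle and function pointers typically shrinks to something like this with map:

#include <map>
#include <string>

int main()
{
    std::map<std::string, int> tree;                  // replaces the void **tree handle
    tree["some_key"] = 1;                             // replaces tree_add()
    std::map<std::string, int>::iterator it = tree.find("some_key");   // replaces tree_srch()
    if (it != tree.end())
        tree.erase(it);                               // replaces tree_delete()
    return 0;                                         // the destructor replaces tree_mung()
}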