[ZZ] C++ map and set comparison

Source: Internet
Author: User

map and set in the standard library
Abstract: This article poses a few basic questions about STL map and STL set and answers them in order to explain the internal data structure of the STL associative containers. It closes by comparing map and set with the self-balancing binary tree library functions available on Unix/Linux and analyzing the advantages of map and set. It should be a useful reference for readers who want to learn more about STL and the data structures underlying associative containers such as map.
Using STL map and set is not complicated, but a few points are hard to understand, such as:
Why is the insertion and deletion efficiency of map and set higher than that of the sequence containers?
Why do previously saved iterators remain valid after each insert?
Why can't map and set have a reserve function like vector to pre-allocate storage?
When the number of elements grows (say, from 10,000 to 20,000), how do the insertion and search speeds of map and set change?
Some people can give an approximate reason, but a thorough answer requires understanding the STL's underlying data structures.
The C++ STL is widely praised and widely used, not only because it provides convenient containers such as vector, string, and list, but more importantly because it encapsulates many complex data structures and algorithms together with a large number of commonly used operations on them. vector wraps an array, list wraps a linked list, and map and set wrap a binary tree. In wrapping these data structures, the STL follows programmers' habits and exposes the common operations as member functions, such as insert, sort, delete, and find, so that using the STL does not feel unfamiliar.
The standard associative containers set, multiset, map, and multimap in the C++ STL are built on a very efficient balanced binary search tree: the red-black tree, also called the RB tree. The statistical performance of the RB tree is better than that of the ordinary balanced binary tree (which some books call the AVL tree, after its authors Adelson-Velskii and Landis), so the STL chose it as the internal structure of the associative containers. This article does not describe the detailed implementation of AVL trees and RB trees or their relative merits; for a detailed implementation of the RB tree, see "Red and Black trees: Theory and realization (theory)". Here the underlying data structure of map and set is introduced only briefly, as far as needed to answer the questions raised above.
Why is the insertion and deletion efficiency of map and set higher than that of the sequence containers?
The usual answer is simple: associative containers do not need to copy or move memory. That is correct. Every element in a map or set is stored in a node, and the node structure resembles a linked list node, with pointers to the parent node and the child nodes. The structure might look like this:
        A
      /   \
     B     C
    / \   / \
   D   E F   G
As a result, an insertion only needs to adjust a few pointers so that they point to the new node. Deletion is similar: once the pointers that referred to the deleted node have been redirected to other nodes, the job is done. Everything that happens here, including any rebalancing of the tree, is a matter of reassigning pointers; no elements are moved in memory.
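The pointer-only nature of insertion can be sketched with a hand-written tree. This is a minimal illustration, not the STL's implementation: the `Node` layout and the `bst_insert` helper are hypothetical names, and the tree is a plain binary search tree without the red-black rebalancing a real map performs. It shows that adding a node writes one link in the existing tree and leaves every other node at its original address.

```cpp
#include <cassert>
#include <memory>

// Hypothetical node layout: a value plus links to the parent and children.
struct Node {
    int value;
    Node* parent = nullptr;
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
    explicit Node(int v) : value(v) {}
};

// Inserts v into the tree rooted at root (no rebalancing) and returns the
// address of the node holding v. Existing nodes are never moved or copied.
Node* bst_insert(std::unique_ptr<Node>& root, int v) {
    Node* parent = nullptr;
    std::unique_ptr<Node>* link = &root;
    while (*link) {
        if (v == (*link)->value) return link->get();  // already present
        parent = link->get();
        link = (v < parent->value) ? &parent->left : &parent->right;
    }
    link->reset(new Node(v));    // the only change to the existing tree: one link
    (*link)->parent = parent;
    return link->get();
}
```

Saving the pointer returned for one value, inserting many more values, and then inserting the first value again returns the same address, because nothing was relocated in between.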
Why do previously saved iterators remain valid after each insert?
Given the answer above, this one is easy to explain. An iterator here is the equivalent of a pointer to a node. The node's memory does not change, so how could a pointer to it become invalid (except, of course, for an iterator to the element that was itself deleted)? Compare this with vector, where any insertion or deletion, even a push_back at the end, may invalidate iterators. To keep its data contiguous, a vector may shift elements during insertion and deletion, so the memory an iterator points to may have been overwritten by other elements or released. Even on push_back the container's capacity may be insufficient; it then requests a new, larger block, copies the existing elements into it, appends the new element at the end, and frees the old block, so pointers into the old memory are naturally unusable. In particular, when working with algorithms such as find, keep this principle in mind: do not use an outdated iterator.
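The contrast between the two containers can be checked with a small experiment. The following is a sketch using only standard library calls; the two function names are made up for this illustration. The first function saves a map iterator, inserts 10,000 more elements, and checks that the iterator still refers to the same element at the same address. The second grows a vector until its capacity changes and checks that its storage has moved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// True if an iterator saved before many insertions still refers to the
// same element, at the same address, afterwards.
bool map_iterator_survives_inserts() {
    std::map<int, int> m;
    m[0] = 42;
    std::map<int, int>::iterator saved = m.find(0);
    const std::pair<const int, int>* address = &*saved;
    for (int i = 1; i <= 10000; ++i) m[i] = i;
    return &*saved == address && saved->second == 42;
}

// True if growing a vector past its capacity relocated its storage, which
// is what invalidates iterators and pointers taken earlier.
bool vector_storage_moves_on_growth() {
    std::vector<int> v(1, 42);
    // Keep the old address as an integer so no dangling pointer is used.
    const std::uintptr_t before = reinterpret_cast<std::uintptr_t>(v.data());
    const std::size_t old_capacity = v.capacity();
    while (v.capacity() == old_capacity) v.push_back(0);  // force reallocation
    return reinterpret_cast<std::uintptr_t>(v.data()) != before;
}
```

Both functions are expected to return true: the map node stays where it is, while the vector has to obtain a new block before releasing the old one.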
Why can't map and set have a reserve function like a vector to pre-allocate data?
As said before, the reason is that map and set do not store the elements themselves but nodes that contain the elements. That is, the allocator a map uses internally is not the Alloc passed in as a template argument when map<Key, Data, Compare, Alloc> is declared. For example:
map<int, int, less<int>, alloc<int> > intmap;
Here the allocator used inside intmap is not alloc<int> itself but one converted from it: internally, the allocator for the node type is obtained through alloc<int>::rebind. For the detailed implementation, see "Thoroughly learn the allocator in STL". Just remember that the allocator is changed inside map and set, so you should not expect a reserve method.
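The effect of rebinding can be observed with a small custom allocator. This is a sketch under stated assumptions: `SizeRecordingAlloc`, the two globals, and `node_allocation_size` are hypothetical names invented for the illustration, and the exact node size depends on the standard library implementation. The allocator records `sizeof(T)` on each allocation, so after one insertion it reveals that the map allocated something larger than the element type it was declared with.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <utility>

// Hypothetical bookkeeping: size of the most recently allocated object type.
static std::size_t g_last_alloc_size = 0;
static std::size_t g_alloc_calls = 0;

// Minimal allocator that records sizeof(T) whenever it allocates.
template <class T>
struct SizeRecordingAlloc {
    using value_type = T;
    SizeRecordingAlloc() = default;
    // Converting constructor: this is what makes rebinding to a node type work.
    template <class U>
    SizeRecordingAlloc(const SizeRecordingAlloc<U>&) noexcept {}
    T* allocate(std::size_t n) {
        g_last_alloc_size = sizeof(T);
        ++g_alloc_calls;
        return std::allocator<T>().allocate(n);
    }
    void deallocate(T* p, std::size_t n) noexcept {
        std::allocator<T>().deallocate(p, n);
    }
};
template <class T, class U>
bool operator==(const SizeRecordingAlloc<T>&, const SizeRecordingAlloc<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const SizeRecordingAlloc<T>&, const SizeRecordingAlloc<U>&) noexcept { return false; }

// Inserts one element and reports the size of the type the map allocated.
std::size_t node_allocation_size() {
    typedef std::pair<const int, int> Value;
    std::map<int, int, std::less<int>, SizeRecordingAlloc<Value> > m;
    m.insert(Value(1, 2));
    return g_last_alloc_size;
}
```

The returned size is larger than `sizeof(std::pair<const int, int>)` because the allocation is for a tree node (element plus links and bookkeeping), made through an allocator rebound from the one supplied. Since each node is allocated separately at insertion time, there is no single block that a reserve call could set aside in advance.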
How do the insertion and search speeds of map and set change as the number of elements grows (say, from 10,000 to 20,000)?
If you know how log2 behaves, the answer is clear. A lookup in map or set is a binary search: with 16 elements you need at most 4 comparisons to find the result, and with 32 elements at most 5. What about 10,000? The maximum number of comparisons is log2 10000, rounded up to 14, and with 20,000 elements it is at most 15. (These figures assume a perfectly balanced tree; a red-black tree can be up to about twice as deep, but the growth is still logarithmic.) So when the amount of data doubles, a search needs only 1 more comparison, which is about 1/14 more search time. Once you understand this, you can safely keep adding elements.
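The arithmetic above can be reproduced in a few lines. This is a minimal sketch with hypothetical helper names: `ideal_comparisons` computes the smallest k with 2^k >= n, which is the figure quoted in the text, and `comparisons_for_find` counts the comparator calls one real `std::set::find` makes. The measured count depends on the library implementation and on how balanced the tree happens to be, so no exact value is claimed for it.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Smallest k with 2^k >= n: comparisons needed by an ideal binary search.
int ideal_comparisons(std::size_t n) {
    int k = 0;
    std::size_t reach = 1;
    while (reach < n) { reach *= 2; ++k; }
    return k;
}

// Comparator that counts how often it is called (illustrative helper).
struct CountingLess {
    std::size_t* calls;
    bool operator()(int a, int b) const { ++*calls; return a < b; }
};

// Number of comparator calls one find() makes in a set holding 0..n-1.
std::size_t comparisons_for_find(int n, int key) {
    std::size_t calls = 0;
    CountingLess cmp = { &calls };
    std::set<int, CountingLess> s(cmp);
    for (int i = 0; i < n; ++i) s.insert(i);
    calls = 0;               // ignore the comparisons made while inserting
    s.find(key);
    return calls;
}
```

`ideal_comparisons` gives 4, 5, 14, and 15 for 16, 32, 10,000, and 20,000 elements. Calling `comparisons_for_find(10000, 9999)` and `comparisons_for_find(20000, 9999)` shows how little the real count moves when the set doubles in size.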
Finally, let us compare the efficiency of map and set with that of a tree library written in C. On many Unix and Linux platforms there is a library called ISC, which provides functions with declarations similar to the following:
void tree_init (void **tree);
void *tree_srch (void **tree, int (*compare) (), void *data);
void tree_add (void **tree, int (*compare) (), void *data, void (*del_uar) ());
int tree_delete (void **tree, int (*compare) (), void *data, void (*del_uar) ());
int tree_trav (void **tree, int (*trav_uar) ());
void tree_mung (void **tree, void (*del_uar) ());
Many people think that using these functions directly is faster than STL map, because STL map relies heavily on templates. In fact, the difference lies not in the algorithm but in memory fragmentation. If you use these functions directly, you have to allocate each new node yourself; when the nodes are large and are inserted and deleted frequently, memory becomes fragmented. An STL whose allocator manages memory through a memory pool (as the alloc discussed above does) greatly reduces fragmentation, which improves the overall performance of the system. I ran this test on my own system: after replacing all the code that called the ISC functions directly with map, the program's speed was basically the same. When the program runs for a long time (a background service, for example), the advantage of map starts to show. On the other hand, using map greatly reduces coding difficulty and makes the program more readable. So why not use it?
However, keep in mind that set keeps its elements ordered automatically: every insertion or deletion must find the right position and rebalance the tree. Each such operation has logarithmic complexity, and with a large number of elements and frequent modifications that cost adds up. Therefore set is the better choice when most of your operations are lookups. These trade-offs need careful consideration in use; it is an interesting question that is left for the developer to investigate.
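The ordering behavior is easy to observe. In this small sketch (the helper name `insert_then_iterate` is made up for the illustration), values are inserted in arbitrary order and come back sorted and de-duplicated when the set is iterated. No separate sort call is involved; each `insert` places its element at the right position in the tree.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Inserts the values in the given order and returns them in iteration order.
std::vector<int> insert_then_iterate(const std::vector<int>& input) {
    std::set<int> s;
    for (std::size_t i = 0; i < input.size(); ++i) {
        s.insert(input[i]);   // logarithmic: find the position, then rebalance
    }
    return std::vector<int>(s.begin(), s.end());
}
```

For the input 5, 1, 4, 1, 3 the function returns 1, 3, 4, 5.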
