Data structure and underlying implementation of map and set in the STL

Source: Internet
Author: User

Abstract: This article poses several basic questions about STL map and STL set. In answering them, it explains the data structure behind the STL associative containers. It closes by comparing map and set with the balanced binary tree library functions available on Unix/Linux and analyzing the advantages of map and set. It should be a useful reference for anyone who wants to study the STL in depth and understand the data structure underlying associative containers such as STL map.

vector -- the standard, safe dynamic array in the STL. Elements can be added efficiently only at the end of a vector.

deque (double-ended queue) -- functionally similar to vector, but elements can be added efficiently at both ends.

list -- a cursor (iterator) can move only one step at a time. If you are familiar with linked lists, the STL list is a doubly linked list (each node has two pointers, one to the previous node and one to the next).

set -- holds sorted data. Each value must be unique.

map (mapping) -- a sorted collection of pairs. Each element in a map consists of two values. The first is the key, which must be unique within the map and is used for sorting and lookup. The second is the value associated with that key, which can be retrieved from the container through the key. For example, in addition to looking up data by integer index, as in ar[43] = "overripe", a map can look up data by an arbitrary key, as in ar["banana"] = "overripe". To get an element's information, you simply supply the element's full name as the key.

multiset -- similar to set, but values need not be unique (they can repeat).

multimap -- similar to map, but keys need not be unique (they can repeat).

Using STL map and set is not complicated, but some of their behavior is hard to explain, for example:
# Why are insertion and deletion in map and set more efficient than in sequence containers?
# Why do previously saved iterators remain valid after an insert?
# Why do map and set have no reserve function, like vector's, to pre-allocate storage?
# As the number of elements grows (say from 10000 to 20000), how do the insert and search speeds of map and set change?

Some people can give a rough answer, but to understand these points thoroughly we need to understand the underlying data structure in the STL.

The C++ STL is widely praised and widely used. It provides convenient containers such as vector, string, and list, and, more importantly, it encapsulates many complex data structures and algorithms along with a large number of common operations on them. vector encapsulates an array, list encapsulates a linked list, map and set encapsulate a binary tree, and so on. In wrapping these data structures, the STL follows programmers' habits and exposes common operations such as insert, sort, delete, and search as member functions, which makes the STL feel familiar to use.

In the C++ STL, the standard associative containers set, multiset, map, and multimap are typically implemented with a very efficient balanced binary search tree: the red-black tree (RB tree). The statistical performance of the RB tree is better than that of the general balanced binary tree (some books call these AVL trees after their inventors, Adelson-Velskii and Landis), so STL implementations chose it as the internal structure of the associative containers. This article does not describe the implementation of AVL and RB trees or their advantages and disadvantages in detail. For the implementation of RB trees, see "Red-Black Trees: Theory and Implementation (Theory)". This article gives a brief introduction to the underlying data structure of map and set by answering the questions listed earlier.

Why are insertion and deletion in map and set more efficient than in sequence containers?

Most people will say the answer is simple: associative containers need no memory copying or moving. That is true. Every element in a map or set is stored as a node. The node structure resembles that of a linked list, with pointers to the parent node and the child nodes. The structure might look like this:

       A
      / \
     B   C
    / \ / \
   D  E F  G

So an insertion needs only a small adjustment that points the relevant node pointers at the new node. Deletion is similar: after a small adjustment, the pointers that referred to the deleted node are re-pointed at other nodes. Everything here is pointer manipulation and involves no movement of element memory.

Why do previously saved iterators remain valid after an insert?

With the previous answer in hand, this is easy to explain. An iterator is essentially a pointer to a node. If the node's memory does not move, how could a pointer to it become invalid? (Of course, an iterator to an erased element is no longer valid.) With a vector, by contrast, any insertion or deletion may invalidate iterators, even a push_back at the end. To keep its data contiguous, a vector shifts elements on insert and erase, so the slot an iterator points to may now hold a different element or may have been released. Even push_back can find the container's capacity exhausted; the vector must then allocate a larger block, copy the existing elements into it, place the new element at the end, and free the old block, so pointers into the old block are naturally unusable. Especially when using algorithms such as find, remember this rule: never use an invalidated iterator.

Why do map and set have no reserve function, like vector's, to pre-allocate storage?

I once asked this question myself. The underlying reason is that map and set do not store the elements themselves but nodes that contain the elements. In other words, the allocator used inside a map is not the alloc passed as a template argument when map<Key, Data, Compare, Alloc> is declared. For example:

map<int, int, less<int>, alloc<int> > intmap;

Here intmap does not allocate with alloc<int> itself but with a converted allocator. The conversion uses alloc<int>::rebind to define a new allocator for nodes. For the detailed implementation, see "Thoroughly Studying the Allocator in STL". In short, remember that the allocator inside map and set has been rebound to the node type, and each node is allocated individually as it is inserted, so do not expect a reserve method.

As the number of elements grows (say from 10000 to 20000), how do the insert and search speeds of map and set change?

If you understand log2, this answer is easy to follow. In map and set, lookup is a binary search down the tree. With 16 elements, about 4 comparisons find the result; with 32 elements, about 5. What about 10000? The maximum number of comparisons is about log2(10000) ≈ 13.3, which rounds up to 14. And for 20000 elements? About 15. So when the data size doubles, a search needs only one more comparison, which is roughly 1/14 more time. Once you understand this, you can keep adding elements with peace of mind.

Finally, on the subject of map and set, Winter (the author) should also compare their efficiency with a C tree library. Many UNIX and Linux platforms ship a library named ISC, which provides functions similar to the following declarations:

void tree_init(void **tree);
void *tree_srch(void **tree, int (*compare)(), void *data);
void tree_add(void **tree, int (*compare)(), void *data, void (*del_uar)());
int tree_delete(void **tree, int (*compare)(), void *data, void (*del_uar)());
int tree_trav(void **tree, int (*trav_uar)());
void tree_mung(void **tree, void (*del_uar)());

Many people think that calling these functions directly is faster than STL map, because STL map relies heavily on templates. In fact, the two do not differ in algorithm; they differ in memory fragmentation. If you call these functions directly, you must allocate each new node yourself. With many nodes and frequent deletion and insertion, memory fragmentation builds up. An STL implementation with a pooling allocator (such as SGI STL's default alloc) allocates nodes from a memory pool, which greatly reduces fragmentation and improves the overall performance of the system. Winter tested this on his own system by replacing all code that called the ISC functions directly with map. The program's speed was basically the same at first, and after running for a long time (as a background service does) the advantage of map became apparent. In addition, using map greatly reduces coding difficulty and makes the program more readable. Why not use it?

Address: http://hi.baidu.com/csu_yx/item/72a2d4f8e758ae7f3d198bf2
