Choosing between the map and hash_map containers in the STL

Source: Internet
Author: User

First, let's look at the analysis by a friend, alvin_lee. I think it is quite correct, and it explains the algorithmic difference between the two containers.

In fact, this problem is not specific to C++; it also comes up in the implementation and selection of standard containers in every other language. In application programs the impact may seem small, but when you write algorithms or core code you need to be careful. Today I improved some code and, along the way, reviewed the fundamentals.

 

Do you still remember Herb Sutter's delightful "C++ Dialogue Series"? One installment, "generate real hash objects", discusses the choice of map. Here is my own understanding from practice.

A map container is chosen in order to find the associated object quickly from a key. Compared with a linear container such as list, it brings two benefits: it simplifies the search algorithm, and it lets any kind of key serve as the index and be matched with the target object, which makes the search efficient. In the C++ STL, map uses a tree for searching. Its efficiency is about the same as a binary search over a sorted linear table, namely O(log2 n), but a linear table is not as easy to customize and operate on as map.
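To make the comparison concrete, here is a minimal sketch of the same lookup done two ways: through std::map, and through a binary search over a vector that is already sorted by key. The helper names find_in_map and find_in_sorted_vector and the -1 "not found" value are assumptions made for this illustration, not part of the original article.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> entry;

// Tree-based lookup: std::map::find needs O(log n) comparisons.
// Returns -1 when the key is absent (a convention chosen for this sketch).
int find_in_map(const std::map<std::string, int>& table, const std::string& key) {
    std::map<std::string, int>::const_iterator it = table.find(key);
    return it == table.end() ? -1 : it->second;
}

// Binary search over a vector that must already be sorted by key.
// std::lower_bound also needs O(log n) comparisons.
int find_in_sorted_vector(const std::vector<entry>& table, const std::string& key) {
    std::vector<entry>::const_iterator it = std::lower_bound(
        table.begin(), table.end(), key,
        [](const entry& e, const std::string& k) { return e.first < k; });
    return (it != table.end() && it->first == key) ? it->second : -1;
}
```

The two functions do the same amount of comparison work. The difference is that map keeps itself ordered on every insertion, while the vector has to be kept sorted by the caller.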

hash_map, by contrast, uses a hash table to store and match entries, and a hash table computes a position in the table from the key. When the table size is suitable and the hash function is suitable, a lookup in the hash table costs O(1). But that is the ideal case: if the positions computed from the keys collide, the worst-case complexity is O(n).
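The collision case can be shown with a small sketch. It uses std::unordered_map, the hash container standardized in C++11, as a stand-in for hash_map. The deliberately bad hash constant_hash and the helper longest_chain are hypothetical names introduced only for this example.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Deliberately poor hash (hypothetical): every key hashes to the same value,
// so every entry lands in the same bucket.
struct constant_hash {
    std::size_t operator()(const std::string&) const { return 0; }
};

// Returns the size of the fullest bucket, i.e. the longest chain that a
// single lookup might have to walk.
template <typename Table>
std::size_t longest_chain(const Table& table) {
    std::size_t longest = 0;
    for (std::size_t b = 0; b < table.bucket_count(); ++b) {
        if (table.bucket_size(b) > longest) {
            longest = table.bucket_size(b);
        }
    }
    return longest;
}
```

With constant_hash, a table of n entries has one chain of length n, so a lookup degrades to a linear scan. With a reasonable hash function the chains stay short and lookups remain close to O(1).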

With this understanding, how should we choose? Two days ago I read an article about Python in which someone claimed that Python's map is faster than C++'s map, and so on. What he did not realize is that Python uses a hash table by default (the equivalent of hash_map), and that such language features are themselves essentially written in C/C++. The difference lies in the algorithm and the approach, not in the merits of the language itself. Only when you are familiar with the various algorithms, the details of the various languages, and their design ideas can you judge what is good or bad; looking at things in a one-sided, extreme way only shows ignorance, because everything has its value, technology included. The point is simply that the C++ STL implements map with a tree structure by default.

Overall, tree search is less efficient than a hash table, but it is stable: its complexity does not fluctuate, and for any single lookup you know the worst case will not exceed O(log2 n). A hash table is different. A lookup may cost O(1), O(n), or anything in between, and you cannot tell in advance. Suppose you are developing an interface for external callers that does a keyword search internally but is not called often. Rather than being fast but unpredictable, you would want its call time to be even and stable. Suppose instead that your program searches by keyword very frequently and you want these operations to take little time overall. Then the total time of the hash table queries will be shorter than the alternatives, and so will the average time per operation. This is the trade-off to weigh.

To summarize, whether to use map or hash_map depends on how many keyword queries you perform and on whether you need to guarantee the total query time or the time of each single query. If you perform many operations and care about overall efficiency, use hash_map for its short average processing time. If you perform only a few operations, hash_map carries the risk of an unpredictable O(n) case, so use map: its average time is somewhat slower, but the time of each operation is steady. With few operations, stability matters more than overall efficiency, because if even one of those few hash_map operations hits the O(n) worst case, the advantage of hash_map is used up.
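One way to keep this decision cheap to revisit is to hide the container behind a type alias, since both containers share the find interface. The following is only a sketch: the macro PREFER_STABLE_LOOKUP and the names keyword_table and lookup_or are hypothetical, and std::unordered_map (standard since C++11) stands in for hash_map.

```cpp
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical build-time switch: define PREFER_STABLE_LOOKUP to get the
// tree-based container with its bounded O(log n) lookups; otherwise the
// hash-based container with its shorter average lookup time is used.
#ifdef PREFER_STABLE_LOOKUP
typedef std::map<std::string, int> keyword_table;
#else
typedef std::unordered_map<std::string, int> keyword_table;
#endif

// Works unchanged with either container.
int lookup_or(const keyword_table& table, const std::string& key, int fallback) {
    keyword_table::const_iterator it = table.find(key);
    return it == table.end() ? fallback : it->second;
}
```

Code that only uses find, insert and iteration does not need to change when the alias is switched, which makes it practical to measure both choices on real data.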

Let's look at a piece of code, from Jay Kint on CodeProject:

// Familiar month example used
// Mandatory contrived example to show a simple point
// Compiled using MinGW GCC 3.2.3 with gcc -c -o file.o
// file.cpp

#include <string>
#include <ext/hash_map>
#include <iostream>
#include <numeric>     // accumulate
#include <functional>  // mem_fun

using namespace std;
// Some STL implementations do not put hash_map in std
using namespace __gnu_cxx;

hash_map<const char*, int> days_in_month;

class myclass {
    static int totaldaysinyear;
public:
    void add_days(int days) { totaldaysinyear += days; }
    static void printtotaldaysinyear(void)
    {
        cout << "Total days in a year are "
             << totaldaysinyear << endl;
    }
};

int myclass::totaldaysinyear = 0;

int main(void)
{
    days_in_month["January"] = 31;
    days_in_month["February"] = 28;
    days_in_month["March"] = 31;
    days_in_month["April"] = 30;
    days_in_month["May"] = 31;
    days_in_month["June"] = 30;
    days_in_month["July"] = 31;
    days_in_month["August"] = 31;
    days_in_month["September"] = 30;
    days_in_month["October"] = 31;
    days_in_month["November"] = 30;
    days_in_month["December"] = 31;

    // Error: This line doesn't compile.
    accumulate(days_in_month.begin(), days_in_month.end(),
               mem_fun(&myclass::add_days));

    myclass::printtotaldaysinyear();

    return 0;
}
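The accumulate call in that example cannot compile: std::accumulate expects an initial value and, optionally, a binary operation, and the elements it visits are key/value pairs rather than plain ints. A minimal sketch of one way to sum the mapped values follows. It is not Jay Kint's solution; the names month_table, add_mapped_value and total_days are assumptions for this illustration, and std::unordered_map with std::string keys replaces hash_map so that keys compare by content.

```cpp
#include <numeric>
#include <string>
#include <unordered_map>

typedef std::unordered_map<std::string, int> month_table;

// Binary operation for std::accumulate: adds the mapped value (the second
// member of each key/value pair) to the running total.
struct add_mapped_value {
    int operator()(int total, const month_table::value_type& item) const {
        return total + item.second;
    }
};

// Sums the day counts of all months stored in the table.
int total_days(const month_table& days_in_month) {
    return std::accumulate(days_in_month.begin(), days_in_month.end(),
                           0, add_mapped_value());
}
```

Filling the table with the twelve months from the example and calling total_days(days_in_month) gives the yearly total without any static member state.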

 

Of course, the task in Jay Kint's example can be done entirely with STL facilities:

Reference

Standard C++ Solutions
The STL as shipped by SGI, and implementations derived from it, defines certain function adaptors, select1st, select2nd and compose1, that can be used to call a single-parameter function with either the key or the data element of a pair associative container. These adaptors are extensions and are not part of the ISO C++ standard library.

select1st and select2nd do pretty much what their respective names say they do. They return either the first or second member of a pair.

compose1 allows the use of functional composition, such that the return value of one function can be used as the argument to another. compose1(f, g) applied to x is the same as f(g(x)).

Using these function adaptors, we can use for_each to call our function.

// The template arguments were lost in the source; string keys and
// mytype* values are assumed here.
hash_map<string, mytype*> my_map;
for_each(my_map.begin(), my_map.end(),
    compose1(mem_fun(&mytype::do_something),
        select2nd<hash_map<string, mytype*>::value_type>()));
Certainly, this is much better than having to define helper functions for each pair, but it still seems a bit cumbersome, especially when compared with the clarity that a comparable for loop has.

for (hash_map<string, mytype*>::iterator i = my_map.begin();
     i != my_map.end(); ++i) {

    i->second->do_something();
}
Considering it was avoiding the for loop for clarity's sake that inspired the use of the STL algorithms in the first place, it doesn't help the case of algorithms vs. hand-written loops that the for loop is more clear and concise.
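For comparison, since C++11 the same effect is available without compose1 or select2nd, because a lambda can pick out the second member directly. This is a sketch only: the type counter is a hypothetical stand-in for mytype, and std::map is used so that the example depends on nothing outside the standard library.

```cpp
#include <algorithm>
#include <map>
#include <string>

// Hypothetical stand-in for mytype: counts how often do_something() ran.
struct counter {
    int hits;
    counter() : hits(0) {}
    void do_something() { ++hits; }
};

typedef std::map<std::string, counter> counter_map;

// Calls do_something() on the data element of every key/value pair.
void do_something_on_all(counter_map& table) {
    std::for_each(table.begin(), table.end(),
                  [](counter_map::value_type& item) { item.second.do_something(); });
}
```

The lambda plays the role that compose1(mem_fun(...), select2nd<...>()) plays in the adaptor-based version.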

with_data and with_key
with_data and with_key are function adaptors that strive for clarity while allowing the easy use of the STL algorithms with pair associative containers. They have been parameterized much the same way mem_fun has been. This is not exactly rocket science, but it is easy to see that they are much cleaner than the standard function adaptor expansion using compose1 and select2nd.

Using with_data and with_key, any function can be called and will use the data_type or key_type as the function's argument respectively. This allows hash_map, map, and any other pair associative containers in the STL to be used easily with the standard algorithms. It is even possible to use them with other function adaptors, such as mem_fun.

// The template arguments were lost in the source; string keys are assumed here.
hash_map<string, IDirect3DVertexBuffer9*> my_vert_buffers;

void releasebuffers(void)
{
    // Release the vertex buffers created so far.
    std::for_each(my_vert_buffers.begin(),
        my_vert_buffers.end(),
        with_data(boost::mem_fn(
            &IDirect3DVertexBuffer9::Release)));
}
Here boost::mem_fn is used instead of mem_fun since it recognizes the __stdcall methods used by COM, if the BOOST_MEM_FN_ENABLE_STDCALL macro is defined.
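The excerpt does not show how with_data is implemented. The following is a minimal sketch of how such an adaptor might be written, not the article's actual code. To avoid confusion with the original, it uses the hypothetical names call_with_data and call_with_data_t, and bump is a made-up example function.

```cpp
#include <algorithm>
#include <map>
#include <string>

// Wraps a callable so that, when handed a key/value pair, it forwards only
// the data element (the pair's second member) to the wrapped callable.
template <typename F>
struct call_with_data_t {
    F f;
    explicit call_with_data_t(F fn) : f(fn) {}

    template <typename Pair>
    void operator()(Pair& item) const { f(item.second); }
};

// Helper that deduces F, in the same spirit as mem_fun.
template <typename F>
call_with_data_t<F> call_with_data(F f) {
    return call_with_data_t<F>(f);
}

// Made-up example function that operates on the data element only.
void bump(int& value) { ++value; }
```

With a std::map<std::string, int> named table, the call std::for_each(table.begin(), table.end(), call_with_data(&bump)) increments every stored value.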

 

In addition, here is a practical example:
Link:
http://blog.sina.com.cn/u/4755b4ee010004hm

The excerpt is as follows:

Reference

I had been using the STL map all along, until the data volume in our library grew sharply recently. I had heard about hash_map from colleagues who work on retrieval and had long wanted to switch to it. Today I finally ran a proper experiment to test what hash_map offers and how its performance compares with map.
First of all, both data structures provide key-value storage and lookup, but they are implemented differently. map uses a red-black tree, with O(log N) query time, while hash_map uses a hash table. In theory its query time can be constant, but it consumes more memory: it trades space for time.
As for availability, map is already part of the standard library, whereas hash_map has not yet entered the standard, although it is a very common and important library.
In this test, we de-duplicated a file list on the order of millions of entries, that is, we used the file name strings as the map keys.
Header files used:

 

#include <time.h>        // time measurement
#include <ext/hash_map>  // header file containing hash_map
#include <map>           // STL map
#include <vector>
#include <string>
#include <fstream>
#include <iostream>
using namespace std;        // std namespace
using namespace __gnu_cxx;  // hash_map lives in the __gnu_cxx namespace

// The test has three steps: the efficiency of map, of hash_map with the system hash function, and of hash_map with a self-written hash function.

struct str_hash {  // self-written hash function
    size_t operator()(const string& str) const
    {
        unsigned long h = 0;
        for (size_t i = 0; i < str.size(); i++)
        {
            h = 107 * h + str[i];
        }
        return size_t(h);
    }
};

// struct str_hash {  // string hash function provided by the system
//     size_t operator()(const string& str) const
//     {
//         return __stl_hash_string(str.c_str());
//     }
// };

struct str_equal {  // string equality function
    bool operator()(const string& s1, const string& s2) const
    {
        return s1 == s2;
    }
};

// Usage
int main(void)
{
    vector<string> filtered_list;
    hash_map<string, int, str_hash, str_equal> file_map;
    map<string, int> file2_map;
    ifstream in("/dev/shm/list");
    time_t now1 = time(NULL);
    struct tm *curtime;
    curtime = localtime(&now1);
    cout << now1 << endl;
    char ctemp[20];
    strftime(ctemp, 20, "%Y-%m-%d %H:%M:%S", curtime);
    cout << ctemp << endl;
    string temp;
    int i = 0;
    if (!in)
    {
        cout << "Open failed!~" << endl;
        return 1;
    }
    while (in >> temp)
    {
        // Only the first 65 characters of each name are used as the key.
        string sub = temp.substr(0, 65);
        if (file_map.find(sub) == file_map.end())
        // if (file2_map.find(sub) == file2_map.end())
        {
            file_map[sub] = i;
            // file2_map[sub] = i;
            filtered_list.push_back(temp);
            i++;
            // cout << sub << endl;
        }
    }
    in.close();
    cout << "the total unique file number is:" << i << endl;
    ofstream out("./file_list");
    if (!out)
    {
        cout << "failed open" << endl;
        return 1;
    }
    for (size_t j = 0; j < filtered_list.size(); j++)
    {
        out << filtered_list[j] << endl;
    }
    time_t now2 = time(NULL);
    cout << now2 << endl;
    curtime = localtime(&now2);
    strftime(ctemp, 20, "%Y-%m-%d %H:%M:%S", curtime);
    cout << now2 - now1 << "\t" << ctemp << endl;
    return 0;
}
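The same experiment can be sketched with standard containers only, so that the tree-based and hash-based versions share one code path and differ only in a template argument. This is an illustration under assumptions, not the quoted author's program: std::unordered_map stands in for hash_map, the names deduplicate and time_deduplicate_ms are hypothetical, and the input is an in-memory vector instead of a file.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Keeps the first occurrence of each name. Table (std::map or
// std::unordered_map with string keys and int values) remembers the names
// that were already seen.
template <typename Table>
std::vector<std::string> deduplicate(const std::vector<std::string>& names) {
    Table seen;
    std::vector<std::string> unique_names;
    for (std::size_t i = 0; i < names.size(); ++i) {
        const int index = static_cast<int>(unique_names.size());
        // insert() reports in .second whether the key was new, so no
        // separate find() call is needed.
        if (seen.insert(typename Table::value_type(names[i], index)).second) {
            unique_names.push_back(names[i]);
        }
    }
    return unique_names;
}

// Wall-clock time of one deduplicate<Table>() run, in milliseconds.
template <typename Table>
long long time_deduplicate_ms(const std::vector<std::string>& names) {
    const std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    const std::size_t kept = deduplicate<Table>(names).size();
    const std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
    (void)kept;  // the result is only needed so that the work is performed
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

Calling time_deduplicate_ms with std::map<std::string, int> and then with std::unordered_map<std::string, int> on the same input gives a like-for-like comparison. Measuring only the container work, without the file I/O, also avoids mixing disk time into the result.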

 

Reference

The conclusion is (there were 106 million entries in the file list and 51 million after deduplication):
1. map took 34 seconds to complete the deduplication.
2. hash_map with the system's built-in hash function took 22 seconds.
3. hash_map with the self-written hash function took 14 seconds.
The test results clearly show the advantage of hash_map over map. In addition, different hash functions give different performance gains. The hash function above is an empirical one, arrived at after testing on many sets of data.
It is foreseeable that the larger the order of magnitude, the more obvious the advantage of hash_map becomes!

 

Of course, that author's conclusion is wrong: it misunderstands the principle behind hash_map. The analysis from the first friend above answers this question.

Finally, C++ Builder users can add the following include:
#include "stlport\hash_map"
in order to use hash_map correctly.

This article comes from a CSDN blog. Please credit the source when reposting: http://blog.csdn.net/skyremember/archive/2008/09/18/2941076.aspx
