Introduction and use of hash_map

From: http://blog.csdn.net/holybin/article/details/26050897

0. Overview

Although hash_map and map both originate from the STL, the C++11 standard library includes only map and not hash_map; in that sense, only part of the STL made it into the standard. (C++11 does standardize a hash-based container, but under a different name: std::unordered_map.) The mainstream GNU C++ and MSVC++ compilers provide hash_map as a compiler extension. SGI has its own hash_map, Boost provides unordered_map, which is similar to hash_map, and Google offers two implementations, dense_hash_map and sparse_hash_map (the former favors speed, the latter favors memory). They are listed below:

(1) hash_map of SGI: http://www.sgi.com/tech/stl/hash_map.html. This article uses SGI hash_map as its example.

(2) hash_map of GNU C++:

https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USERS-4.1/class____gnu__cxx_1_1hash__map.html

(3) hash_map of MSVC++: msdn.microsoft.com/en-us/library/bb398039.aspx (the one that ships with Visual Studio)

(4) Boost unordered_map (header file: <boost/unordered_map.hpp>):

http://www.boost.org/doc/libs/1_55_0/doc/html/boost/unordered_map.html

(5) Google sparsehash and densehash: http://code.google.com/p/sparsehash/source/browse/trunk/src/google?r=98

1. Comparison with map: definition

Similarities: both are associative containers in the STL, and both have the following two properties:

A. Key-value form: the element type is a key-value pair, pair<const Key, Data>.

B. Key uniqueness: no two elements have the same key.
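The two properties above can be checked with a small sketch. It uses std::map and std::unordered_map (the C++11 counterpart of hash_map) as stand-ins, on the assumption that SGI's hash_map.h is not available on most current toolchains; the helper name second_insert_rejected is made up for this illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>
#include <unordered_map>
#include <utility>

// Property A: the element type of both containers is pair<const Key, Data>.
static_assert(std::is_same<std::map<std::string, int>::value_type,
                           std::pair<const std::string, int> >::value,
              "map stores pair<const Key, Data>");
static_assert(std::is_same<std::unordered_map<std::string, int>::value_type,
                           std::pair<const std::string, int> >::value,
              "unordered_map stores pair<const Key, Data>");

// Property B: inserting an existing key does not add a second element.
// Hypothetical helper: returns true if the second insert was rejected
// and the original value was kept. Expects an empty container.
template <class Map>
bool second_insert_rejected(Map& m) {
    m.insert(std::make_pair(std::string("may"), 31));
    const bool inserted_again =
        m.insert(std::make_pair(std::string("may"), 99)).second;
    return !inserted_again && m.size() == 1 && m.find("may")->second == 31;
}
```

Calling second_insert_rejected on an empty std::map<std::string, int> or std::unordered_map<std::string, int> should return true in both cases.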

Differences:

hash_map is a hashed associative container that associates a key with a data value; map is a sorted associative container that does the same. A hashed associative container is implemented with a hash table, which sets it apart from the usual sorted associative containers: its elements have no particular order, and most operations take O(n) time in the worst case but constant time on average. Therefore, in applications that only need lookups and do not need the elements sorted, a hashed associative container is usually much more efficient than a sorted one.
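The ordering difference can be made concrete with a short sketch. It assumes std::unordered_map as a stand-in for hash_map; the typedefs and helper names are illustrative only.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

typedef std::map<std::string, int> SortedMonths;            // sorted container
typedef std::unordered_map<std::string, int> HashedMonths;  // hashed container

// Collect the keys in whatever order the container iterates them.
template <class Map>
std::vector<std::string> keys_in_iteration_order(const Map& m) {
    std::vector<std::string> keys;
    for (typename Map::const_iterator it = m.begin(); it != m.end(); ++it)
        keys.push_back(it->first);
    return keys;
}

// Fill a container with three entries, deliberately not in sorted order.
template <class Map>
Map make_months() {
    Map m;
    m["march"] = 31;
    m["january"] = 31;
    m["february"] = 28;
    return m;
}
```

Iterating the SortedMonths result always yields the keys in sorted order (february, january, march). For HashedMonths the same three keys come back, but their order depends on the hash function and bucket layout and should not be relied on.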


2. Comparison with map: implementation

map is built on a red-black tree, so its operations take O(log N) time; hash_map is built on a hash table, so its operations take constant time on average.

3. Comparison with map: applications

Once the number of elements reaches a certain order of magnitude and efficiency is the priority, use hash_map. Note, however, that although hash_map operations are faster than map operations, the hash function and collision resolution cost extra execution time, and constructing a hash_map is slower than constructing a map. In addition, because hash_map is based on a hash table, it trades space for time and therefore consumes more memory than map. Three factors must be weighed: speed, data volume, and memory.

4. Introducing hash_map

(1) Principles of hash_map

hash_map uses a hash table. It first allocates memory for a number of buckets that store elements, then uses the hash function to map each element's key to a bucket and stores the element there. The hash function is used for addressing, and the separate comparison function is used to resolve collisions. Storing an element works as follows:

A. Obtain the key of the element.

B. Map the key with the hash function (usually followed by a modulo) to obtain the hash value, that is, the index of the corresponding bucket.

C. Store the key and data of the element in that bucket.

The corresponding lookup process is:

A. Obtain the key to look up.

B. Map the key with the hash function (usually followed by a modulo) to obtain the hash value, that is, the index of the corresponding bucket.

C. Compare the keys of the elements in that bucket with the key. If none is equal, the key is not found.

D. If one is equal, return the data of that element.
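The store and lookup steps above can be sketched as a toy hash table with separate chaining. This is not SGI's implementation; the class TinyHashMap, its fixed bucket count, and the use of std::hash are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Toy hash table (illustrative only): string keys, int data, no resizing.
// bucket_count must be greater than zero.
class TinyHashMap {
    typedef std::list<std::pair<std::string, int> > Bucket;

public:
    explicit TinyHashMap(std::size_t bucket_count) : buckets_(bucket_count) {}

    // Store: hash the key, pick a bucket by modulo, put (key, data) there.
    void put(const std::string& key, int data) {
        Bucket& bucket = buckets_[index_of(key)];
        for (Bucket::iterator it = bucket.begin(); it != bucket.end(); ++it) {
            if (it->first == key) {  // key already present: overwrite data
                it->second = data;
                return;
            }
        }
        bucket.push_back(std::make_pair(key, data));
    }

    // Lookup: same addressing step, then compare keys inside the bucket.
    // Returns a pointer to the data, or nullptr if the key is not found.
    const int* get(const std::string& key) const {
        const Bucket& bucket = buckets_[index_of(key)];
        for (Bucket::const_iterator it = bucket.begin(); it != bucket.end(); ++it) {
            if (it->first == key) return &it->second;
        }
        return nullptr;
    }

private:
    // Hash function for addressing: hash value modulo the bucket count.
    std::size_t index_of(const std::string& key) const {
        return std::hash<std::string>()(key) % buckets_.size();
    }

    std::vector<Bucket> buckets_;
};
```

With only two buckets, for example TinyHashMap m(2), several month names necessarily share a bucket, and the key comparison inside get is what tells them apart.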

So the two most important pieces of a hash_map implementation are the hash function and the comparison function. The rest of this section uses SGI hash_map as the example.

(2) hash_map class definition

Constructing a map requires only a comparison function (a less-than function). Constructing a hash_map requires both a hash function and a comparison function (an equality function). In SGI STL, hash_map is declared in stl_hash_map.h as follows:

// Forward declaration of equality operator; needed for friend declaration.
template <class _Key, class _Tp,
          class _HashFcn  __STL_DEPENDENT_DEFAULT_TMPL(hash<_Key>),
          class _EqualKey __STL_DEPENDENT_DEFAULT_TMPL(equal_to<_Key>),
          class _Alloc =  __STL_DEFAULT_ALLOCATOR(_Tp) >
class hash_map;
......
template <class _Key, class _Tp, class _HashFcn, class _EqualKey,
          class _Alloc>
class hash_map
{
    ......
};

Parameters 1 and 2 are the key type and the value type. Parameters 3 and 4 are the hash function and the comparison function; the STL wraps each of them in a struct (a function object), so you can either supply your own structs or use the defaults. Parameter 5 is the allocator that hash_map uses for internal memory management.
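The role of the defaulted parameters can be illustrated by spelling them out. The sketch below uses std::unordered_map as a stand-in for SGI hash_map (an assumption; note that its default allocator is parameterized on pair<const Key, T>, whereas the SGI declaration above uses _Tp). The typedef and helper names are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <type_traits>
#include <unordered_map>
#include <utility>

// Short form: only the key and value types are given.
typedef std::unordered_map<std::string, int> ShortForm;

// Long form: hash function, comparison function and allocator spelled out.
typedef std::unordered_map<std::string, int,
                           std::hash<std::string>,
                           std::equal_to<std::string>,
                           std::allocator<std::pair<const std::string, int> > >
    LongForm;

// Both names denote the same type, so the defaults are exactly these structs.
static_assert(std::is_same<ShortForm, LongForm>::value,
              "defaults are hash<Key>, equal_to<Key>, allocator<pair<const Key, T>>");

// Hypothetical helper: returns the stored value, or -1 if the key is absent.
int lookup_days(const LongForm& months, const std::string& name) {
    LongForm::const_iterator it = months.find(name);
    return it == months.end() ? -1 : it->second;
}
```

Because the two typedefs name one type, a ShortForm object can be passed straight to lookup_days.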

The following three cases show how these two functions are used: default hash and comparison functions, a custom hash function, and a custom comparison function.

(3) default hash and comparison functions

// SGI hash_map definition
#include "hash_map.h"

int main()
{
    // uses the default hash and comparison classes
    hash_map<const char*, int> months;

    months["january"] = 31;
    months["february"] = 28;
    months["march"] = 31;
    months["april"] = 30;
    months["may"] = 31;
    months["june"] = 30;
    months["july"] = 31;
    months["august"] = 31;
    months["september"] = 30;
    months["october"] = 31;
    months["november"] = 30;
    months["december"] = 31;

    return 0;
}

From the definition of hash_map above, we can see that this example uses the default hash function (hash<_Key>) and the default comparison function (equal_to<_Key>). For this example:

hash_map<const char*, int> months;

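is equivalent to

hash_map<const char*, int, hash<const char*>, equal_to<const char*> > months;

Note that the default equal_to<const char*> compares pointer values rather than string contents, so a lookup is only guaranteed to succeed when the key pointer is the same one that was inserted. Section (5) shows a strcmp-based comparison function that avoids this.

(4) custom hash functions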

First, SGI STL provides the following default hash functions, defined in stl_hash_fun.h:

// Default hash functions
struct hash<char*>
struct hash<const char*>
struct hash<char>
struct hash<unsigned char>
struct hash<signed char>
struct hash<short>
struct hash<unsigned short>
struct hash<int>
struct hash<unsigned int>
struct hash<long>
struct hash<unsigned long>

Second, to use a custom hash function, define a struct with any name and overload operator() in it, taking a reference to an object of the custom key type as its parameter. When defining the hash_map, pass this struct as the third template parameter. Assuming the custom key type is keyclass, it looks like this:

#include "hash_map.h"

struct my_hash
{
    size_t operator()(const keyclass& x) const
    {
        ......
    }
};

// hash_map definition
hash_map<keyclass, ..., my_hash, ...> my_hash_map;

Building on the default example above, you can write a custom hash function for the string type:

#include "hash_map.h"
#include <string>

// Option 1: directly call the library's string hash function __stl_hash_string
struct str_hash
{
    size_t operator()(const string& str) const
    {
        return __stl_hash_string(str.c_str());
    }
};

// Option 2 (use instead of option 1): write the hash by hand
struct str_hash
{
    size_t operator()(const string& str) const
    {
        unsigned long h = 0;
        for (size_t i = 0; i < str.size(); i++)
            h = 5 * h + str[i];
        return size_t(h);
    }
};

// The months map in the default example above can then be declared as
hash_map<string, int, str_hash> my_months;
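For readers without SGI STL at hand, the hand-written variant can be tried with std::unordered_map, which accepts a hash struct as its third template parameter in the same way. This is a sketch under that assumption; MonthMap and make_month_map are made-up names, and the cast to unsigned char is an addition that keeps the result independent of whether char is signed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hand-written string hash: h = 5 * h + c for every character c.
struct str_hash {
    std::size_t operator()(const std::string& str) const {
        unsigned long h = 0;
        for (std::size_t i = 0; i < str.size(); ++i)
            h = 5 * h + static_cast<unsigned char>(str[i]);
        return static_cast<std::size_t>(h);
    }
};

// The hash struct is passed as the third template parameter.
typedef std::unordered_map<std::string, int, str_hash> MonthMap;

// Hypothetical helper that fills a small map for demonstration.
MonthMap make_month_map() {
    MonthMap months;
    months["january"] = 31;
    months["february"] = 28;
    return months;
}
```

For the two-character key "ab" this hash evaluates to 5 * 97 + 98 = 583, which is easy to verify by hand.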
(5) custom comparison functions

First, SGI STL provides a default comparison function, defined in stl_function.h:

// Default comparison function
template <class _Tp>
struct equal_to : public binary_function<_Tp, _Tp, bool>
{
    bool operator()(const _Tp& __x, const _Tp& __y) const { return __x == __y; }
};

// binary_function declaration
template <class _Arg1, class _Arg2, class _Result>
struct binary_function
{
    typedef _Arg1 first_argument_type;
    typedef _Arg2 second_argument_type;
    typedef _Result result_type;
};

Second, there are two ways to customize the comparison: overload operator== so that the default equal_to can compare the elements for equality, or define a custom comparison struct that overloads operator() (similar to the custom hash function).

For the first method, assume the elements to be compared have two fields, iID and len:

struct my_element
{
    int iID;
    int len;

    bool operator==(const my_element& e) const
    {
        return (iID == e.iID) && (len == e.len);
    }
};
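A type like my_element also needs a hash function before it can serve as a key. The sketch below pairs the operator== method with a hypothetical hash struct and uses std::unordered_map as a stand-in for hash_map; my_element_hash, ElementNames and has_element are names invented for this example, and the way the two fields are combined is just one simple choice.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

struct my_element {
    int iID;
    int len;
    // Used by the default equal_to<my_element>.
    bool operator==(const my_element& e) const {
        return iID == e.iID && len == e.len;
    }
};

// Hypothetical hash struct that mixes both fields.
struct my_element_hash {
    std::size_t operator()(const my_element& e) const {
        return std::hash<int>()(e.iID) * 31u + std::hash<int>()(e.len);
    }
};

// The comparison parameter is left at its default, which calls operator==.
typedef std::unordered_map<my_element, std::string, my_element_hash> ElementNames;

// Returns true if an element with these two field values is stored as a key.
bool has_element(const ElementNames& names, int iID, int len) {
    my_element key = {iID, len};
    return names.find(key) != names.end();
}
```

Two keys are treated as the same only when both fields match, because that is what operator== checks.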

For the second method, operator() takes references to two objects of the custom key type and compares them in its body. When defining the hash_map, pass the comparison struct as the fourth template parameter. Assuming the custom key type is keyclass, the custom hash struct is my_hash, and the custom comparison struct is my_compare, it looks like this:

#include "hash_map.h"

struct my_compare
{
    bool operator()(const keyclass& x, const keyclass& y) const
    {
        // compare x and y
    }
};

// hash_map definition
hash_map<keyclass, ..., my_hash, my_compare> my_hash_map;

Similarly, based on the default example above, use the second method to customize the comparison function:

#include "hash_map.h"
#include <cstring>
#include <string>

// 1. Use const char* as the key type
// Note: the default hash<const char*> function can be used directly

// Comparison function for const char*
struct str_compare
{
    bool operator()(const char* s1, const char* s2) const
    {
        return strcmp(s1, s2) == 0;
    }
};

// The months map in the default example above can then be declared as
hash_map<const char*, int, hash<const char*>, str_compare> my_months;

///////////////////////////////////////////

// 2. Use string as the key type
// Note: a string hash function must be defined as well
struct str_hash
{
    size_t operator()(const string& str) const
    {
        return __stl_hash_string(str.c_str());
    }
};

// Comparison function for string (use instead of the const char* version above)
struct str_compare
{
    bool operator()(const string& s1, const string& s2) const
    {
        return strcmp(s1.c_str(), s2.c_str()) == 0;
    }
};

// The months map in the default example above can then be declared as
hash_map<string, int, str_hash, str_compare> my_months;
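The const char* case can be tried with std::unordered_map as a stand-in, with one difference that matters: std::hash<const char*> hashes the pointer value, not the characters, so this sketch supplies a content-based hash as well as the strcmp-based comparison. The names cstr_hash, cstr_equal, CStrMonths and days_in are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <unordered_map>

// Content-based hash for C strings (hypothetical; same 5 * h + c scheme).
struct cstr_hash {
    std::size_t operator()(const char* s) const {
        unsigned long h = 0;
        for (; *s != '\0'; ++s)
            h = 5 * h + static_cast<unsigned char>(*s);
        return static_cast<std::size_t>(h);
    }
};

// Content-based equality for C strings.
struct cstr_equal {
    bool operator()(const char* s1, const char* s2) const {
        return std::strcmp(s1, s2) == 0;
    }
};

// Hash and comparison structs are the third and fourth template parameters.
typedef std::unordered_map<const char*, int, cstr_hash, cstr_equal> CStrMonths;

// Returns the stored value, or -1 if no key with the same characters exists.
int days_in(const CStrMonths& months, const char* name) {
    CStrMonths::const_iterator it = months.find(name);
    return it == months.end() ? -1 : it->second;
}
```

With both structs in place, a lookup succeeds even when the key lives in a different buffer from the one that was inserted. The container stores only the pointers, so the key strings must outlive it.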
(6) Other hash_map member functions

hash_map's member functions are similar to map's. For the parameters and full descriptions, see the SGI hash_map documentation. The most commonly used ones are:

hash_map(size_type n): if efficiency matters, set this parameter. n sets the number of hash buckets in the hash_map container. The more buckets there are, the lower the probability of hash collisions and the lower the probability of having to reallocate memory. A larger n therefore gives better performance at the cost of more memory.
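The effect of choosing the bucket count up front can be observed with std::unordered_map, whose constructor also takes a minimum bucket count. This is a sketch under the assumption that the standard container behaves like SGI hash_map in this respect; grows_while_filling is a made-up helper.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Build a map with a requested minimum number of buckets, insert n
// elements, and report whether the bucket array had to grow on the way.
bool grows_while_filling(std::size_t requested_buckets, int n) {
    std::unordered_map<int, int> m(requested_buckets);
    const std::size_t before = m.bucket_count();  // at least requested_buckets
    for (int i = 0; i < n; ++i) m[i] = i;
    return m.bucket_count() != before;
}
```

Requesting 2000 buckets and inserting 1000 elements stays below the default maximum load factor of 1.0, so no rehash is needed, whereas starting from a single requested bucket forces the table to grow.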

const_iterator find(const key_type& k) const: looks up an element. The argument is the key and the return value is an iterator.

data_type& operator[](const key_type& k): accesses elements by key, much like indexing an array. Note: if the container has no element with key k, operator[] automatically adds one (equivalent to an insert operation). If you only want to know whether the container has an element with a given key, use find instead.
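The difference between operator[] and find on a missing key is easy to demonstrate. The sketch uses std::unordered_map, which behaves the same way here; the typedef and the two helper functions are illustrative names, and both helpers work on a copy of the map.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

typedef std::unordered_map<std::string, int> Months;

// operator[] on a missing key inserts it with a value-initialized int (0).
std::size_t size_after_subscript(Months months, const std::string& key) {
    (void)months[key];
    return months.size();
}

// find on a missing key returns end() and leaves the container unchanged.
std::size_t size_after_find(Months months, const std::string& key) {
    (void)months.find(key);
    return months.size();
}
```

Starting from a map that holds only "may", subscripting with "june" produces a map of size 2, while find with "june" leaves the size at 1.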

insert: when the container does not yet contain the key, insert behaves like operator[]. As more elements are added, each bucket holds more elements; to stay efficient, hash_map automatically allocates more memory to create more buckets. Therefore, iterators obtained before an insert may no longer be valid afterwards.

erase: insert may automatically expand the container's memory when buckets hold too many elements, but in SGI STL erase does not automatically release that memory. Therefore, after a call to erase, iterators to the other elements remain valid.
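The asymmetry between insert and erase can be watched through the bucket count. The sketch below uses std::unordered_map, which shows the same pattern (growth on insert, no shrinking on erase); whether the numbers match SGI hash_map exactly is not claimed. BucketTrace and trace_buckets are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>

struct BucketTrace {
    std::size_t after_insert;  // bucket count once n elements are stored
    std::size_t after_erase;   // bucket count once they are all erased
    std::size_t remaining;     // number of elements left at the end
};

BucketTrace trace_buckets(int n) {
    std::unordered_map<int, int> m;
    for (int i = 0; i < n; ++i) m.insert(std::make_pair(i, i));

    BucketTrace t;
    t.after_insert = m.bucket_count();

    // Erasing every element empties the container but keeps the buckets.
    for (int i = 0; i < n; ++i) m.erase(i);
    t.after_erase = m.bucket_count();
    t.remaining = m.size();
    return t;
}
```

After 10000 inserts the table has at least 10000 buckets (the default maximum load factor is 1.0), and erasing everything leaves that bucket count unchanged.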

This article does not go further into the specific usage of hash_map. For details, see the STL hash_map series [blog.163.com/liuruigong_lrg/blog/static/27370306200711334341781].

5. Other hash containers

Examples include hash_set, hash_multimap, and hash_multiset. These containers differ from set, multimap, and multiset in the same way that hash_map differs from map.

6. hash_map performance test

[Reference: Performance test summary of various C++ hashmap implementations, Yu Zhiyu, Baidu Space]

The beginning of this article mentioned several hash_map implementations. Here we test the insertion and lookup performance of each one. Both the key and the value are int, that is, a map of the form map<int, int>. Each test performs 10 million insertions followed by 10 thousand lookups. The results are as follows:

(Chart from the original post: the red circle marks the time to insert 10 million records, and the green circle marks the time to look up 10 thousand records.)

As the chart shows, insertion takes roughly 1 to 4 seconds, and lookups take less time. Comparing the implementations, boost::unordered_map offers the best overall performance. google::dense_hash_map has excellent memory usage and extremely fast lookups, but it is slower than boost::unordered_map. The std::map and std::hash_map implementations that come with Visual Studio 2010 can only be described as miserable.

[Remarks]

Test environment:

CPU: Duo T6600

Memory: 4 GB

Software Version:

Visual Studio 2010

Boost 1.48.0

Google sparsehash 1.11
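A measurement of this kind could be reproduced with a small timing harness along the following lines. This is only a sketch: the original test code is not shown in the article, so the key pattern (sequential integers), the use of std::chrono::steady_clock, and the names Timing and time_map are all assumptions. It covers std::map and std::unordered_map; Boost and Google containers with the same interface could be plugged in as the template argument.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <unordered_map>

struct Timing {
    double insert_seconds;  // wall time spent inserting
    double search_seconds;  // wall time spent searching
    std::size_t hits;       // number of successful lookups
};

// Insert the keys 0..inserts-1, then look up the keys 0..searches-1.
template <class Map>
Timing time_map(int inserts, int searches) {
    typedef std::chrono::steady_clock Clock;
    Map m;

    const Clock::time_point t0 = Clock::now();
    for (int i = 0; i < inserts; ++i) m[i] = i;
    const Clock::time_point t1 = Clock::now();

    std::size_t hits = 0;
    for (int i = 0; i < searches; ++i) {
        if (m.find(i) != m.end()) ++hits;
    }
    const Clock::time_point t2 = Clock::now();

    Timing t;
    t.insert_seconds = std::chrono::duration<double>(t1 - t0).count();
    t.search_seconds = std::chrono::duration<double>(t2 - t1).count();
    t.hits = hits;
    return t;
}
```

To mirror the sizes used in the article, call for example time_map<std::map<int, int> >(10000000, 10000) and time_map<std::unordered_map<int, int> >(10000000, 10000), then compare the two Timing results. The absolute numbers will depend on the compiler, optimization level, and hardware.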
