C++ Notes: STL's hash_map

Source: Internet
Author: User
Tags: map, data structure

The following content is reproduced from: HTTP://STLCHINA.HUHOO.NET/BIN/VIEW.PL/MAIN/STLDETAILHASHMAP

Detailed explanation of the STL hash_map series
      • 0 Why do I need hash_map
      • 1 Data structure: the hash_map principle
      • 2 Using hash_map
        • 2.1 A simple example
        • 2.2 The hash function of hash_map
        • 2.3 The comparison function of hash_map
        • 2.4 hash_map functions
      • 3 Related hash containers
      • 4 Other
        • 4.1 What is the difference between hash_map and map?
        • 4.2 When should I use hash_map, and when should I use map?
        • 4.3 How do I use a custom type with hash_map?
        • 4.4 How do I replace an existing map container in a program with hash_map?
        • 4.5 Why is hash_map not standard?
        • 4.6 Do you have any suggestions for learning to use hash_map?
      • 5 Reference articles

All roads lead to Rome; why not just pick one?

0 Why do I need hash_map

Have you used map? map provides a very common capability: key-value storage and lookup. For example, suppose I want to record people's names together with information about them, add entries at any time, and quickly find and modify them:

    Yue Buqun - head of the Huashan sect, known as the "Gentleman Sword"
    Zhang Sanfeng - head of Wudang, founder of Taijiquan
    Dongfang Bubai - the top martial artist, Sunflower Manual ...

Storing this information is not complicated, but finding it is more troublesome. For example, to find Zhang Sanfeng's information, the most naive way is to go through all the records and compare the names one by one. To be fast, you need to keep the records in alphabetical order and find them by binary search. But then every time you add a record you must keep the records sorted, which means an insertion sort, and with efficiency in mind that calls for a binary tree. This could go on forever. If you use the STL map container, you can implement this easily without worrying about the details. For details of the map data structure, interested readers can see "Learning STL map, STL set: data structure basics". Here is the map implementation:

    #include <map>
    #include <string>
    using namespace std;

    int main()
    {
        map<string, string> namemap;

        // Add entries:
        namemap["Yue Buqun"] = "Head of the Huashan sect, known as the Gentleman Sword";
        namemap["Zhang Sanfeng"] = "Head of Wudang, founder of Taijiquan";
        namemap["Dongfang Bubai"] = "The top martial artist, Sunflower Manual";

        // Find:
        if (namemap.find("Yue Buqun") != namemap.end()) {
            // ... use the record
        }
        return 0;
    }

Isn't that easy to use? It is also efficient: with 1 million records, finding the one you want takes at most about 20 string comparisons, and with 2 million records it still takes only about 21.
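The 20 and 21 figures come from binary search: a perfectly balanced search over n sorted keys needs at most floor(log2 n) + 1 comparisons, which is simply the bit length of n. A minimal sketch of that arithmetic follows; the function name is hypothetical, and since a red-black tree is only approximately balanced, std::map can need somewhat more comparisons in its worst case.

```cpp
#include <cstdint>

// Hypothetical helper: worst-case comparisons for binary search over n sorted
// keys, i.e. floor(log2(n)) + 1, computed as the number of bits in n.
// Assumes a perfectly balanced search (an idealization of map's tree).
int worst_case_comparisons(std::uint64_t n) {
    int bits = 0;
    while (n > 0) {
        n >>= 1;   // each halving of the search range costs one comparison
        ++bits;
    }
    return bits;
}
```

For 1,000,000 keys this gives 20 (2^19 <= 1,000,000 < 2^20), and for 2,000,000 keys it gives 21.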

Yet speed never fully satisfies real-world needs. With 1 million records that are searched frequently, even 20 comparisons can become a bottleneck. Could that be reduced to one or two comparisons? And could it still be one or two comparisons when there are 2 million records? All while remaining as easy to use as map?

The answer is yes. What you need is hash_map. Although hash_map is not part of the C++ Standard Template Library (the standard equivalent, std::unordered_map, arrived later in C++11), almost every STL implementation provides one, and it is widely used. Before using hash_map, let's first look at how it works.

1 Data structure: the hash_map principle

This section gives an in-depth introduction to how hash_map works. If you just want to use it without understanding the principle, you can skip this section, but I still suggest reading it; understanding a little more does no harm.

hash_map is based on a hash table. The biggest advantage of a hash table is that storing and finding data take far less time, nearly constant time, at the cost of using more memory. However, with more and more memory available today, trading space for time is worthwhile. Hash tables are also easy to code.

The basic idea is to store the elements in an array with a large index range. You design a function (the hash function) that maps each element's key to a function value (the array index, or hash value), and store the element in that array slot. Put simply, the key is used to "classify" each element, and the element is stored in the place corresponding to its class, called a bucket.

However, there is no guarantee that keys map to hash values one-to-one, so different elements may well compute the same hash value. This creates a "collision": different elements are put into the same class. In general, "direct addressing" and "collision resolution" are the two main features of a hash table.

hash_map first allocates a large block of memory, forming many buckets, and then uses the hash function to map each key to a region (bucket) where it is stored. The insertion process is:

    1. Get the key
    2. Compute the hash value with the hash function
    3. Get the bucket number (usually the hash value modulo the number of buckets)
    4. Store the key and value in that bucket.

The lookup process is:

    1. Get the key
    2. Compute the hash value with the hash function
    3. Get the bucket number (usually the hash value modulo the number of buckets)
    4. Compare the elements in the bucket with the key; if none of them is equal, the key is not found.
    5. Retrieve the value of the matching record.

In hash_map, direct addressing is done by the hash function, and collisions are resolved with the comparison function. You can see that if each bucket holds only one element, a lookup needs only one comparison. And when many buckets are empty, many lookups (those for keys that are not present) are even faster.

So the parts of a hash table that concern the user are the hash function and the comparison function. These are exactly the two parameters we need to specify when using hash_map.
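The insertion and lookup steps above can be sketched as a small separate-chaining table, with the hash function and comparison function as template parameters just as in hash_map. This is a minimal illustration assuming C++11, with hypothetical names (ChainedHashTable, bucket_index); it is not SGI's implementation, and unlike hash_map it never grows its bucket array.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Teaching sketch of a separate-chaining hash table (not SGI's code).
// Hash and KeyEqual play the roles of hash_map's hash and comparison functions.
template <class Key, class Value,
          class Hash = std::hash<Key>, class KeyEqual = std::equal_to<Key>>
class ChainedHashTable {
public:
    explicit ChainedHashTable(std::size_t bucket_count = 101)
        : buckets_(bucket_count) {}

    // Insertion: hash the key, pick the bucket, store (key, value) there.
    void insert(const Key& key, const Value& value) {
        auto& bucket = buckets_[bucket_index(key)];
        for (auto& entry : bucket) {
            if (equal_(entry.first, key)) {   // key already present: overwrite
                entry.second = value;
                return;
            }
        }
        bucket.emplace_back(key, value);
    }

    // Lookup: pick the bucket, then compare each element in it with the key.
    // Returns nullptr when no element in the bucket equals the key.
    const Value* find(const Key& key) const {
        const auto& bucket = buckets_[bucket_index(key)];
        for (const auto& entry : bucket) {
            if (equal_(entry.first, key)) return &entry.second;
        }
        return nullptr;
    }

private:
    // Bucket number = hash value modulo the number of buckets.
    std::size_t bucket_index(const Key& key) const {
        return hash_(key) % buckets_.size();
    }

    std::vector<std::list<std::pair<Key, Value>>> buckets_;
    Hash hash_;
    KeyEqual equal_;
};
```

With one element per bucket, find does exactly one equality comparison; an empty bucket answers "not found" with none.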

2 Using hash_map

2.1 A simple example

Before worrying about how to store "Yue Buqun" in hash_map, let's look at a simple example: you are given arbitrary ID numbers together with the information for each ID, and the IDs range up to 2^31. How can you store them and look them up quickly?

    #include <hash_map>   // SGI STL; with GCC's libstdc++ use <ext/hash_map> and namespace __gnu_cxx
    #include <string>
    using namespace std;

    int main()
    {
        hash_map<int, string> mymap;
        mymap[9527] = "Tang Bohu Picks Qiuxiang";
        mymap[1000000] = "A millionaire's life";
        mymap[10000] = "The bottom line of a white-collar salary";
        // ...
        if (mymap.find(10000) != mymap.end()) {
            // ...
        }
        return 0;
    }

Simple enough: it is used the same way as map. You might then ask: what about the hash function and the comparison function? Don't they need to be specified? Good question. When you don't specify them, default functions are used. Looking at the hash_map declaration makes this clearer. Here is the SGI STL declaration:

    template <class _Key, class _Tp,
              class _HashFcn = hash<_Key>,
              class _EqualKey = equal_to<_Key>,
              class _Alloc = __STL_DEFAULT_ALLOCATOR(_Tp) >
    class hash_map
    {
            ...
    };

That is, in the example above, the following two declarations are equivalent:

    hash_map<int, string> mymap;
    // is equivalent to:
    hash_map<int, string, hash<int>, equal_to<int> > mymap;

We don't need to pay much attention to Alloc (readers who want to learn more about allocators can see "What can the standard library STL allocator do").
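Standard C++ (since C++11) uses the same defaulting scheme in std::unordered_map: its hash and equality parameters default to std::hash<Key> and std::equal_to<Key>. A small sketch that checks the equivalence at compile time, using std::unordered_map as a stand-in for the non-standard hash_map (the alias names are illustrative):

```cpp
#include <functional>
#include <string>
#include <type_traits>
#include <unordered_map>

// With only Key and Value given, the hash and equality parameters fall back
// to std::hash<Key> and std::equal_to<Key>, mirroring SGI hash_map's defaults.
using ShortForm = std::unordered_map<int, std::string>;
using LongForm  = std::unordered_map<int, std::string,
                                     std::hash<int>, std::equal_to<int>>;

static_assert(std::is_same<ShortForm, LongForm>::value,
              "default hash/equality parameters are std::hash and std::equal_to");
```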

2.2 The hash function of hash_map

What exactly does hash<int> look like? Look at the source code:

    template <> struct hash<int> {
        size_t operator()(int __x) const { return __x; }
    };

It turns out to be a function object. In SGI STL, the following hash functions are available:

    struct hash<char*>
    struct hash<const char*>
    struct hash<char>
    struct hash<unsigned char>
    struct hash<signed char>
    struct hash<short>
    struct hash<unsigned short>
    struct hash<int>
    struct hash<unsigned int>
    struct hash<long>
    struct hash<unsigned long>

In other words, if your key is one of the types above, you can use the default hash function. You can of course define your own hash function as well, and for other types you have no choice: for string, for example, you must define the hash function yourself. For example:

    struct str_hash {
        size_t operator()(const string& str) const
        {
            unsigned long __h = 0;
            for (size_t i = 0; i < str.size(); i++)
                __h = 5 * __h + str[i];
            return size_t(__h);
        }
    };

    // Alternatively, to use the system-defined string hash function, write:
    struct str_hash {
        size_t operator()(const string& str) const
        {
            return __stl_hash_string(str.c_str());
        }
    };

Keep the following points in mind when declaring your own hash function:

    1. Use a struct and overload operator().
    2. Return size_t.
    3. The parameter is the key type you want to hash.
    4. operator() is a const member function.

If these are hard to remember, the simplest way is to copy an existing one and modify it.

Now we can switch the "Yue Buqun" example from the beginning to hash_map. Just replace the declaration with:

    // Change to:
    hash_map<string, string, str_hash> namemap;

Nothing else needs to change. Of course, don't forget to declare str_hash and to change the header file to hash_map.
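Putting the four rules together, here is the str_hash function object from above plugged into a hash container. Because hash_map itself is non-standard, this sketch assumes C++11 and uses std::unordered_map instead; the 5*h + c formula is the one from the str_hash example, with characters converted to unsigned char (a small assumption so that non-ASCII bytes hash consistently), and the NameMap alias is illustrative.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Follows the four rules: a struct overloading operator(), returning size_t,
// taking the key type, and declared const.
struct str_hash {
    std::size_t operator()(const std::string& str) const {
        unsigned long h = 0;
        for (std::size_t i = 0; i < str.size(); ++i)
            h = 5 * h + static_cast<unsigned char>(str[i]);
        return static_cast<std::size_t>(h);
    }
};

// The third template argument replaces the default hash function.
using NameMap = std::unordered_map<std::string, std::string, str_hash>;
```

Equality still defaults to std::equal_to<std::string>, which works because std::string already has operator==.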

You might ask: what about the comparison function? Don't worry; the comparison function of hash_map is introduced next.

2.3 The comparison function of hash_map

For map, the comparison function must be a less-than function; if none is given, the default is less<Key>. In hash_map, the elements in a bucket must be compared with the key for equality, so an equality function is needed: equal_to<Key>. First look at the source code of equal_to:

    // This code is from SGI STL.
    // First, the binary_function declaration, which only defines some types.
    template <class _Arg1, class _Arg2, class _Result>
    struct binary_function {
        typedef _Arg1 first_argument_type;
        typedef _Arg2 second_argument_type;
        typedef _Result result_type;
    };

    // Now the definition of equal_to:
    template <class _Tp>
    struct equal_to : public binary_function<_Tp, _Tp, bool>
    {
        bool operator()(const _Tp& __x, const _Tp& __y) const { return __x == __y; }
    };

If you use a custom data type, such as struct mystruct, or const char* strings, how do you provide the comparison function? There are two ways. The first is to overload operator== and use equal_to; see the following example:

    struct mystruct {
        int iID;
        int len;
        bool operator==(const mystruct& my) const {
            return (iID == my.iID) && (len == my.len);
        }
    };

This way, you can use equal_to<mystruct> as the comparison function. The other way is to use a function object, defining your own comparison function body:

    #include <cstring>   // for strcmp

    struct compare_str {
        bool operator()(const char* p1, const char* p2) const {
            return strcmp(p1, p2) == 0;
        }
    };

With compare_str, you can use hash_map:

    typedef hash_map<const char*, string, hash<const char*>, compare_str> StrIntMap;
    StrIntMap namemap;
    namemap["Yue Buqun"] = "Head of the Huashan sect, known as the Gentleman Sword";
    namemap["Zhang Sanfeng"] = "Head of Wudang, founder of Taijiquan";
    namemap["Dongfang Bubai"] = "The top martial artist, Sunflower Manual";
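One caution when porting this example to the standard std::unordered_map (C++11): unlike SGI's hash<const char*>, which hashes the characters, std::hash<const char*> hashes the pointer value, so two equal strings stored at different addresses would usually not be found as the same key. This sketch pairs compare_str with a content-based hash; the cstr_hash name and its FNV-1a formula are assumptions chosen for illustration, not part of the original article.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <unordered_map>

// Equality on string contents, as in compare_str above.
struct compare_str {
    bool operator()(const char* p1, const char* p2) const {
        return std::strcmp(p1, p2) == 0;
    }
};

// Hypothetical content-based hash for C strings (64-bit FNV-1a constants,
// truncated to size_t on narrower platforms). std::hash<const char*> would
// hash the pointer instead, so a key in a different buffer would be missed.
struct cstr_hash {
    std::size_t operator()(const char* s) const {
        unsigned long long h = 14695981039346656037ULL;
        for (; *s; ++s) {
            h ^= static_cast<unsigned char>(*s);
            h *= 1099511628211ULL;
        }
        return static_cast<std::size_t>(h);
    }
};

typedef std::unordered_map<const char*, std::string, cstr_hash, compare_str> StrIntMap;
```

The map stores only the pointers, so the key strings (here, string literals) must outlive the map.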

2.4 hash_map functions

hash_map's functions are similar to map's. For the specific parameters and explanations, see the STL Programming Manual: hash_map. Here are a few of the most commonly used functions.

    1. hash_map(size_type n). If efficiency matters, set this parameter. n sets the number of buckets in the hash_map container. The more buckets, the lower the probability of hash collisions and the less often memory has to be reallocated. The larger n is, the more efficient the container, but the more memory it consumes.
    2. const_iterator find(const key_type& k) const. Lookup: takes a key and returns an iterator.
    3. data_type& operator[](const key_type& k). This is one of the functions I use most, because it is especially convenient and can be used like an array. Note, however, that if the container has no element with that key, using [key] automatically inserts one. So when you only want to know whether a key is in the container, use find; when you want to insert the element, you can use [] directly.
    4. insert. When the container does not contain the key, insert does much the same as operator[]. But as the container grows, each bucket holds more elements, and to stay efficient hash_map automatically allocates more memory and creates more buckets. As a result, iterators obtained before an insert may be invalid afterwards.
    5. erase. During insertion, hash_map may automatically grow its memory when the buckets get too long. In SGI STL, however, erase does not automatically reclaim memory, so after you call erase, iterators to the other elements remain valid.
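The difference between find and operator[] is easy to check. This sketch assumes C++11 and uses std::unordered_map as a stand-in, since its constructor also accepts an initial bucket count and its find, operator[], insert and erase behave like the hash_map functions described here for these cases; the IdMap alias and helper names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

using IdMap = std::unordered_map<int, std::string>;

// Lookup-only check: find never inserts anything.
bool contains(const IdMap& m, int key) {
    return m.find(key) != m.end();
}

// Reading through operator[] inserts a default-constructed value when the key
// is missing, so the (copied) map grows even though we only "read" it.
std::size_t size_after_subscript_read(IdMap m, int key) {
    std::string value = m[key];   // inserts key -> "" if key was absent
    (void)value;
    return m.size();
}
```

Constructing the container as IdMap m(100) requests at least 100 buckets up front, which corresponds to the hash_map(size_type n) constructor above.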

3 Related hash containers

Besides hash_map, the hash containers include hash_set, hash_multimap and hash_multiset. They differ from set, multimap and multiset in the same way that hash_map differs from map, so I don't think I need to go through them one by one.
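As with map versus multimap, the "multi" hash containers differ only in allowing duplicate keys. A sketch using the standard counterparts std::unordered_map and std::unordered_multimap (C++11), assuming they behave like SGI's hash_map and hash_multimap for this purpose; the helper name is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Inserts the same key twice into a unique-key and a multi-key hash container
// and reports the resulting sizes as (unique, multi).
std::pair<std::size_t, std::size_t> duplicate_key_sizes() {
    std::unordered_map<int, std::string> unique_map;
    std::unordered_multimap<int, std::string> multi_map;
    unique_map.insert({1, "first"});
    unique_map.insert({1, "second"});   // rejected: key 1 already present
    multi_map.insert({1, "first"});
    multi_map.insert({1, "second"});    // kept: duplicate keys allowed
    return {unique_map.size(), multi_map.size()};
}
```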

4 Other

Here are a few common questions that should help you understand and use Hash_map.

4.1 What is the difference between hash_map and map?
    • Constructor. hash_map needs a hash function and an equality function, while map needs only a comparison function (a less-than function).
    • Storage structure. hash_map stores data in a hash table, while map is usually implemented as a red-black tree (RB tree), so their in-memory data structures are different.
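The storage difference shows up when iterating: a tree-based map visits keys in sorted order, while a hash container's order depends on hash values and bucket layout. A sketch using std::map and std::unordered_map (C++11) as stand-ins for map and hash_map; the helper name is illustrative.

```cpp
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Collect keys in iteration order. For std::map this is always ascending;
// for std::unordered_map it is unspecified and may change after rehashing.
template <class Container>
std::vector<int> keys_in_iteration_order(const Container& c) {
    std::vector<int> keys;
    for (const auto& kv : c) keys.push_back(kv.first);
    return keys;
}
```

So if your code relies on visiting keys in order, switching from map to a hash container changes its behavior, not just its speed.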

4.2 When should I use hash_map, and when should I use map?

Overall, hash_map looks up faster than map: its lookup time is essentially constant, independent of the data size, while map's lookup time is O(log n). A constant is not necessarily smaller than log(n), though, since computing the hash function also takes time. So if you care about efficiency, especially once the number of elements reaches a certain order of magnitude, consider hash_map. But if you are particularly strict about memory usage and want your program to consume as little memory as possible, be careful: hash_map may let you down, especially when you have a great many hash_map objects, which makes memory even harder to control, and hash_map is also slow to construct.

Now do you know how to choose? Weigh three factors: lookup speed, amount of data, and memory usage.

For a short story about hash_map and map, see: http://dev.csdn.net/Develop/article/14/14019.shtm

4.3 How do I use a custom type with hash_map?

You only need to do two things: define a hash function and define an equality comparison function. The following code is an example:

    -bash-2.05b$ cat my.cpp
    #include <hash_map>
    #include <string>
    #include <iostream>

    using namespace std;

    // Define the class
    class ClassA {
    public:
        ClassA(int a) : c_a(a) {}
        int getvalue() const { return c_a; }
        void setvalue(int a) { c_a = a; }
    private:
        int c_a;
    };

    // 1. Define the hash function
    struct hash_A {
        size_t operator()(const class ClassA& A) const {
            // return hash<int>()(A.getvalue());
            return A.getvalue();
        }
    };

    // 2. Define the equal function
    struct equal_A {
        bool operator()(const class ClassA& a1, const class ClassA& a2) const {
            return a1.getvalue() == a2.getvalue();
        }
    };

    int main() {
        hash_map<ClassA, string, hash_A, equal_A> hmap;
        ClassA a1(12);
        hmap[a1] = "I am 12";
        ClassA a2(198877);
        hmap[a2] = "I am 198877";
        cout << hmap[a1] << endl;
        cout << hmap[a2] << endl;
        return 0;
    }
    -bash-2.05b$ make my
    c++ -O -pipe -march=pentiumpro my.cpp -o my
    -bash-2.05b$ ./my
    I am 12
    I am 198877

4.4 How do I replace an existing map container in a program with hash_map?

This is easy, but it requires good programming style. It is recommended that you use typedef as much as possible to define your types:

    typedef map<Key, Value> KeyMap;

When you want to switch to hash_map, you only need to change it to:

    typedef hash_map<Key, Value> KeyMap;

The rest stays basically the same. Of course, you need to make sure the key type has suitable hash and comparison functions.
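The typedef approach can be sketched as follows: only the alias changes, while the lookup code is written against KeyMap alone. This assumes C++11, with std::unordered_map standing in for hash_map; the USE_HASH_CONTAINER switch and the lookup helper are hypothetical names for illustration.

```cpp
#include <map>
#include <string>
#include <unordered_map>

// Flip this switch to change containers; nothing else refers to the
// concrete container type.
#define USE_HASH_CONTAINER 1

#if USE_HASH_CONTAINER
typedef std::unordered_map<std::string, std::string> KeyMap;
#else
typedef std::map<std::string, std::string> KeyMap;
#endif

// Lookup code written only against KeyMap works with either container.
std::string lookup(const KeyMap& m, const std::string& key) {
    KeyMap::const_iterator it = m.find(key);
    return it == m.end() ? std::string() : it->second;
}
```

std::string already has a standard hash and operator==, so no extra functions are needed here; a custom key type would need them, as described above.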

4.5 Why is hash_map not standard?

I don't know exactly why it isn't standard. One explanation is that when the STL was added to standard C++, the hash_map family had not yet been fully implemented; it will probably become standard eventually. If anyone knows a more convincing explanation, I would like to hear it. What I do want to point out is that, because hash_map is not standard, a platform with g++ installed does not necessarily have a hash_map implementation; I have run into this myself. So when using these non-standard libraries, be sure to test them in advance. Also, if portability across platforms is a concern, it is better to use them sparingly.

4.6 Do you have any suggestions for learning to use hash_map?

In Chinese, hash is written 哈希 and is also called 散列, so if you hear someone talk about a 散列 container, don't blame yourself for not recognizing it: it is the same thing. To learn more about the hash family, you can also read Effective STL Item 25: "Familiarize yourself with the nonstandard hashed containers". Reading the source code is also recommended. If you run into problems, you can ask on the STL forum, where experts will answer you.

5 Reference articles:
    1. Detailed explanation of STL sorting (sort)
    2. Detailed explanation of STL string
