Understanding custom hash functions and comparison functions for hash_map in C++

Last Update: 2018-12-03 Source: Internet

Author: User


First of all, I am a beginner, a genuine one, not just being modest, so there are probably errors in many places and I would welcome having them pointed out. I am writing this only to record and deepen my own understanding, not to show off.

Over the past two days I learned how to use hash_map and searched the Internet, but I didn't find any detailed description of how hash_map works internally (maybe I searched the wrong way). So I copied another user's code, modified it, and used it. Because the code was copied from someone else, it taught me quite a few lessons, described below.

The source code for the experiment is as follows:

#include "stdafx.h"     // MSVC precompiled header
#include <iostream>
#include <vector>
#include <hash_map>     // Microsoft-specific, non-standard header for stdext::hash_map

using std::vector;
using stdext::hash_map;

typedef unsigned int uint;   // uint is not a standard type; defined here so the code compiles

class hash_wchar_t
{
public:
    // I don't quite understand these two constants.
    // My guess: bucket_size is the initial size of the hash_map, and min_buckets is
    // the amount added on each expansion. (These guesses are based on vector; I have
    // barely used the STL and only skimmed it in C++ Primer.)
    // As I understand Microsoft's hash_compare documentation, bucket_size is actually the
    // average number of elements per bucket and min_buckets the minimum number of buckets
    // (a power of two).
    static const size_t bucket_size = 4;
    static const size_t min_buckets = 8;

    // Hash function
    size_t operator()(const wchar_t& gbword) const
    {
        // The commented-out formula below, found online, is said to suit Chinese (GB-encoded)
        // characters. I can't judge that, so this test uses a simple hash:
        // return ((unsigned char)gbword - 176) * 94 + (unsigned char)(gbword >> 8) - 161;
        return gbword % 100;
    }

    // Comparison function: hash_map treats it as "s1 is less than s2"
    bool operator()(const wchar_t& s1, const wchar_t& s2) const
    {
        // The code I copied used char* string keys, so it was written as:
        //     return 0 == strcmp(s1, s2);
        // After adapting the parameters to my key type I wrote the line below, and this is
        // the key to the problem: with it, the lookup in the test fails and p does not point
        // to valid data. The reason is explained after the code.
        //     return s1 == s2;
        // The final, correct version:
        return s1 < s2;   // or: return s2 > s1;
    }
};

int main()
{
    hash_map<wchar_t, vector<uint>*, hash_wchar_t> lonamemap;
    vector<uint>* lpovecuint = new vector<uint>;
    lpovecuint->push_back(2);

    lonamemap[L'C'] = lpovecuint;
    lonamemap[L'a'] = lpovecuint;
    lonamemap[L'B'] = lpovecuint;

    vector<uint>* p = lonamemap[L'a'];   // test whether the entry can be found
    std::cout << p->size() << std::endl;

    delete lpovecuint;                   // all three entries share this one vector
    return 0;
}

// Version 2 (use instead of the main() above): for a wchar_t key the default
// traits of stdext::hash_map already work, so no custom class is needed.
int main()
{
    hash_map<wchar_t, vector<uint>*> lonamemap;
    vector<uint>* lpovecuint = new vector<uint>;
    lpovecuint->push_back(2);

    lonamemap[L'C'] = lpovecuint;
    lonamemap[L'a'] = lpovecuint;
    lonamemap[L'B'] = lpovecuint;

    vector<uint>* p = lonamemap[L'a'];   // test whether the entry can be found
    std::cout << p->size() << std::endl;

    delete lpovecuint;
    return 0;
}

Many of the comments in the code were added as explanation and to show the debugging process as faithfully as possible (I worked this out with help from a more experienced developer). If something in the code is unclear, read my explanation of custom hash and comparison functions below first. Note that for the hash_map above, whose key is a plain wchar_t, you do not actually need a custom hash function or comparison function at all; the second main() omits them.
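stdext::hash_map and the <hash_map> header are Microsoft-specific and not part of standard C++. For comparison, here is a minimal sketch of the same test in standard C++11 using std::unordered_map, a swapped-in container rather than the code used above. One design difference matters for this article: std::unordered_map takes the hash and the key comparison as two separate template parameters, and its comparison is an equality predicate, so "s1 == s2" is exactly what it expects, whereas the traits class of stdext::hash_map needs a less-than comparison. The names HashWchar, EqualWchar and run_lookup_test are hypothetical, and shared_ptr replaces the raw new/delete of the original purely as a design choice.

```cpp
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical hash functor: the same simple "key % 100" hash as the test program.
struct HashWchar {
    std::size_t operator()(wchar_t w) const { return static_cast<std::size_t>(w) % 100; }
};

// Hypothetical equality predicate: std::unordered_map asks "are these keys equal?",
// not "is s1 less than s2?".
struct EqualWchar {
    bool operator()(wchar_t s1, wchar_t s2) const { return s1 == s2; }
};

// Re-creates the test: three keys share one vector, then L'a' is looked up.
// Returns the size of the vector found, or 0 if the key is missing.
std::size_t run_lookup_test() {
    auto shared = std::make_shared<std::vector<unsigned int>>();
    shared->push_back(2);

    std::unordered_map<wchar_t, std::shared_ptr<std::vector<unsigned int>>,
                       HashWchar, EqualWchar> names;
    names[L'C'] = shared;
    names[L'a'] = shared;
    names[L'B'] = shared;

    // find() does not insert on a miss, unlike operator[]
    auto it = names.find(L'a');
    return it == names.end() ? 0 : it->second->size();
}
```

Calling run_lookup_test() should return 1, the size of the shared vector.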

First, a user-defined hash function takes the key as its parameter and computes a hash value from it. hash_map then stores the key-value pair in the bucket that corresponds to that hash value; in other words, the hash value determines the bucket index (typically it is reduced to the range of existing buckets, for example modulo the bucket count). This is why a lookup takes constant time on average: not necessarily exactly one step, but close to one when the keys are spread evenly across the buckets.
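To make "the hash value selects the bucket" concrete, here is a small sketch using standard C++ std::unordered_map (not the MSVC hash_map above), which exposes bucket(key) and bucket_count(). With the same "key % 100" hash, two keys whose hash values are equal always land in the same bucket. Exactly how a hash value is reduced to a bucket index is implementation-specific. ModHundred, Table and same_bucket are hypothetical names.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical hash functor: the same simple "key % 100" hash as the test program.
struct ModHundred {
    std::size_t operator()(wchar_t w) const { return static_cast<std::size_t>(w) % 100; }
};

using Table = std::unordered_map<wchar_t, int, ModHundred>;

// Hypothetical helper: puts both keys into one table and reports whether
// they were placed in the same bucket.
bool same_bucket(wchar_t a, wchar_t b) {
    Table t;
    t[a] = 1;
    t[b] = 2;
    return t.bucket(a) == t.bucket(b);
}
```

For example, L'C' (code 67) and the character with code 167 both hash to 67, so same_bucket reports true for them regardless of how many buckets the table has.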

Next comes the key point: the custom comparison function, that is, the operator() overload above that takes two parameters. I used to think it was called on every insertion or every use of the [] operator to compare the new key with all existing elements, in other words that it simply decides whether two key-value pairs are equal. That is not how it works (otherwise every search or insertion would have to compare against every element). Instead, during a search or insertion, the comparison function comes into play when the new key-value pair (call it newvalue) has the same hash value as an existing pair (call it oldvalue), which is a hash collision, or more generally maps to the same bucket. As described above, the two elements then share a bucket, so hash_map must decide internally whether newvalue is already there and, if not, where inside the bucket to put it, because the keys within a bucket are kept in the order defined by the comparison function. That decision is made with this comparison function.

Here is how this comparison function is called. Each time hash_map enters a bucket that already holds elements (because the hash value already exists), it calls compare_function(oldvalue.key, newvalue.key). If this returns true, hash_map takes it to mean oldvalue.key < newvalue.key; in other words, hash_map uses the comparison function internally as a less-than test. If it returns false, hash_map then calls compare_function(newvalue.key, oldvalue.key). If that also returns false, newvalue.key is not less than oldvalue.key either, so the two keys must be equal (newvalue.key == oldvalue.key), and newvalue is not inserted into the bucket. If the keys turn out to be unequal, hash_map uses the results of your comparison function to find a suitable position and inserts the new element there.
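The rule described above, that two keys count as equal exactly when neither compares less than the other, fits in a tiny helper. This is a sketch of the rule itself, not MSVC's internal code; equivalent and EqualsAsLess are hypothetical names, the latter mimicking the buggy "return s1 == s2;" comparator from the test program.

```cpp
#include <functional>

// Two keys are "equal" under a less-than comparison when neither is less than the other.
template <typename Key, typename Less>
bool equivalent(const Key& a, const Key& b, Less less) {
    return !less(a, b) && !less(b, a);
}

// Hypothetical stand-in for the buggy comparator "return s1 == s2;".
struct EqualsAsLess {
    bool operator()(wchar_t s1, wchar_t s2) const { return s1 == s2; }
};
```

With std::less<wchar_t>, L'a' is equivalent to L'a' and not to L'b'. With EqualsAsLess the two answers are swapped, which is exactly the inversion that breaks the test program.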

That covers how hash_map calls the comparison function. To make it easier to follow, here is the logic as pseudo code:

if (!compare_function(oldvalue.key, newvalue.key))
{
    // oldvalue.key is not < newvalue.key; keep checking
    if (!compare_function(newvalue.key, oldvalue.key))
    {
        // newvalue.key is not < oldvalue.key either, so newvalue.key == oldvalue.key:
        // return a reference to oldvalue.value directly
    }
    else
    {
        // newvalue.key < oldvalue.key: walk the other elements in the bucket
        // and insert the new element at its place in ascending order
    }
}
else
{
    // oldvalue.key < newvalue.key: walk the other elements in the bucket
    // and insert the new element at its place in ascending order
}
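The pseudo code can be turned into a small C++ sketch. It models a single bucket as a std::vector kept in the order defined by the comparison function and walks it linearly, as described. It illustrates the described logic and is not MSVC's actual implementation; insert_into_bucket is a hypothetical name.

```cpp
#include <functional>
#include <vector>

// Inserts key into a bucket kept sorted by comp (a less-than test).
// Returns false, without inserting, if an equivalent key is already present.
template <typename Key, typename Comp = std::less<Key>>
bool insert_into_bucket(std::vector<Key>& bucket, const Key& key, Comp comp = Comp()) {
    auto it = bucket.begin();
    for (; it != bucket.end(); ++it) {
        if (comp(*it, key)) continue;        // existing < new: keep walking
        if (!comp(key, *it)) return false;   // neither is less: equal, do not insert
        break;                               // new < existing: insert before it
    }
    bucket.insert(it, key);
    return true;
}
```

Inserting 3, 1 and 2 in that order with the default std::less leaves the bucket as 1, 2, 3, and a second insertion of 2 is rejected.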

The above covers the case oldvalue.key != newvalue.key, where hash_map uses the comparison function to find a suitable position for the insertion. What counts as "suitable" depends on how the comparison function is implemented. The simplest version is return s1 < s2;, with which hash_map keeps the elements of each bucket in ascending key order. If you write return s1 > s2;, your logic is the opposite of the less-than logic hash_map assumes, so hash_map still arranges the bucket from "small" to "large" by its own notion, which in terms of the actual keys is from large to small. Finally, if you write return s1 == s2;, the function returns true only when s1 really equals s2. So when newvalue's key already exists, the first call to the comparison function returns true, hash_map concludes s1 < s2, and it inserts newvalue next to oldvalue as a separate element; and when newvalue.key != oldvalue.key, both calls return false, hash_map concludes the keys are equal, and the new element is not inserted.
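To check the three cases above, here is a self-contained sketch that repeats the linear bucket walk (an illustration of the described logic, not MSVC's code) and adds a lookup using the same walk. insert_sorted, contains and EqualsAsLess are hypothetical names. With a greater-than comparator the bucket ends up in descending order. With the "==" comparator, a second insertion of the same key creates a duplicate, a different key is rejected, and a lookup of a key that is present walks past it and fails, which is consistent with the failed lookup in the test program.

```cpp
#include <vector>

// Walks a bucket kept in comparator order; inserts key unless an equivalent key exists.
template <typename Key, typename Comp>
bool insert_sorted(std::vector<Key>& bucket, const Key& key, Comp comp) {
    auto it = bucket.begin();
    while (it != bucket.end() && comp(*it, key)) ++it;        // skip keys "less" than the new one
    if (it != bucket.end() && !comp(key, *it)) return false;  // neither is less: treated as equal
    bucket.insert(it, key);
    return true;
}

// Lookup with the same walk: found only if the first key not "less" is equivalent.
template <typename Key, typename Comp>
bool contains(const std::vector<Key>& bucket, const Key& key, Comp comp) {
    auto it = bucket.begin();
    while (it != bucket.end() && comp(*it, key)) ++it;
    return it != bucket.end() && !comp(key, *it);
}

// Hypothetical stand-in for the buggy comparator "return s1 == s2;".
struct EqualsAsLess {
    bool operator()(wchar_t s1, wchar_t s2) const { return s1 == s2; }
};
```

For instance, inserting 3, 1 and 2 with the lambda [](int a, int b) { return a > b; } yields 3, 2, 1, while with EqualsAsLess the call contains(bucket, L'a', EqualsAsLess()) returns false even right after L'a' was inserted.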

If the comparison function contains some other logic, you can work out how the elements in a bucket end up ordered by following the same reasoning.

A final note: I have never written such a detailed article before. It was tiring, but I learned a lot, and I now understand how hash_map works much more clearly than before I started writing. I admire the technical experts who write explanations like this, and I will keep working hard.

Finally, I still hope someone will point out my mistakes. Thank you.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
