STL hash_map Series

Source: Internet
Author: User
  • STL hash_map Series

    • 0 Why hash_map
    • 1. Data Structure: hash_map Principle
    • 2 Use hash_map
      • 2.1 A simple instance
      • 2.2 Hash Functions of hash_map
      • 2.3 comparison functions of hash_map
      • 2.4 hash_map Function
    • 3 related hash containers
    • 4 others
      • 4.1 what is the difference between hash_map and map?
      • 4.2 When do I need hash_map and map?
      • 4.3 how to add a custom type to hash_map?
      • 4.4 how to replace the existing map container in the program with hash_map?
      • 4.5 why is hash_map not standard?
      • 4.6 are there any suggestions for learning to use hash_map?
    • 5 References:

All roads lead to Rome. Why not pick one?

0 Why hash_map

Have you used map? A map provides a very common facility: key-value storage and lookup. For example, suppose I want to record people's names together with a description of each person, add entries at any time, and quickly look them up and modify them:

Yue Bu Qun: head of the Huashan school, known as the Gentleman Sword
Zhang Sanfeng: head of Wudang, founder of Taijiquan
Invincible in the East: the number-one master, Sunflower Manual
...

Saving this information is not complicated; finding it is the hard part. For example, to find the entry for "Zhang Sanfeng", the most naive way is to walk through all the records and compare the names one by one. To speed this up, you can keep the records sorted alphabetically and use binary search. But then the records must stay sorted as new ones are added, which means an insertion sort on every addition. For efficiency you end up needing a binary tree, and it goes on and on. With the STL map container you get all of this without worrying about the details. If you are interested in the data structure behind map, see the article "Learning STL map and STL set: Data Structure Basics". Here is the map implementation:

#include <map>
#include <string>
using namespace std;
...
map<string, string> namemap;

// Add entries
namemap["Yue Bu Qun"] = "head of the Huashan school, known as the Gentleman Sword";
namemap["Zhang Sanfeng"] = "head of Wudang, founder of Taijiquan";
namemap["Invincible in the East"] = "the number-one master, Sunflower Manual";
...

// Search
if (namemap.find("Yue Bu Qun") != namemap.end()) {
    ...
}

Easy to use, isn't it? It is also efficient: with 1 million records, at most about 20 string comparisons (string::compare) are needed to find the record you want; with 2 million records, about 21.
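The figures 20 and 21 come from repeated halving. The sketch below is only a back-of-the-envelope check: it assumes a perfectly balanced search (a real map is usually a red-black tree, which may need somewhat more comparisons), and the name max_comparisons is made up for this illustration.

```cpp
#include <cstddef>

// Worst-case number of key comparisons for a binary search over n sorted
// records (equivalently, a perfectly balanced binary search tree).
// Each comparison halves the remaining range, giving floor(log2(n)) + 1.
std::size_t max_comparisons(std::size_t n) {
    std::size_t steps = 0;
    while (n > 0) {
        n /= 2;   // one comparison discards half of the candidates
        ++steps;
    }
    return steps;
}
```

Doubling the number of records adds only one comparison, which is why map scales so well, and also why it can never get down to a fixed one or two comparisons.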

But no speed is ever fast enough for real-world needs. If there are 1 million records and I search them frequently, even 20 comparisons can become a bottleneck. Could that be cut to one or two comparisons? Could it still be one or two comparisons when the number of records reaches 2 million? And could it be as convenient as map?

The answer is yes. This is where hash_map comes in. Although hash_map is not currently part of the C++ standard library, almost every STL implementation provides one, and it is widely used. Before using hash_map, let's look at how it works.

1. Data Structure: hash_map Principle

This section gives you a deeper understanding of hash_map. If you are not interested in the principles you can skip it, but I suggest reading it anyway: knowing a little more does no harm.

hash_map is based on a hash table. The biggest advantage of a hash table is that storing and looking up data takes very little time, close to constant time. The price is higher memory consumption. With more and more memory available, trading space for time is usually worth it. Being easy to code is another of its strengths.

The basic principle is to use an array with a large index range to store the elements. A function (the hash function) is designed so that each element's key maps to a function value (an array index, called the hash value), and the array cell at that index stores the element. Put simply, each element is "classified" by its key and then stored in the bucket for its "class".

However, there is no guarantee that keys and function values correspond one to one. Different elements may well produce the same function value, which is called a "conflict" (collision): different elements land in the same "class". In short, "direct addressing" and "conflict resolution" are the two defining features of a hash table.

Hash_map first allocates a large block of memory to form many buckets. The hash function maps keys to different regions (buckets) for saving. The insert process is as follows:

  1. Get the key.
  2. Pass it through the hash function to get the hash value.
  3. Get the bucket number (usually the hash value modulo the number of buckets).
  4. Store the key and value in that bucket.

The lookup process is:

  1. Get the key.
  2. Pass it through the hash function to get the hash value.
  3. Get the bucket number (usually the hash value modulo the number of buckets).
  4. Compare each element in the bucket with the key. If none is equal, the key is not found.
  5. Return the value of the matching record.
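The insert and lookup steps above can be sketched as a toy table. This is an illustration only, not SGI's implementation: the class name ToyHashMap, the fixed bucket count (assumed to be greater than zero), the int-only keys, and the identity hash are all simplifying assumptions.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Toy chained hash table: each bucket is a list of (key, value) pairs.
class ToyHashMap {
    typedef std::list<std::pair<int, std::string> > Bucket;
    std::vector<Bucket> buckets_;

public:
    // bucket_count must be greater than zero (assumption of this sketch).
    explicit ToyHashMap(std::size_t bucket_count) : buckets_(bucket_count) {}

    // Steps 2 and 3: hash the key, then take it modulo the bucket count.
    std::size_t bucket_index(int key) const {
        std::size_t hash_value = static_cast<std::size_t>(key);  // identity hash
        return hash_value % buckets_.size();
    }

    // Insert, step 4: store the pair in its bucket (overwrite if present).
    void insert(int key, const std::string& value) {
        Bucket& bucket = buckets_[bucket_index(key)];
        for (Bucket::iterator it = bucket.begin(); it != bucket.end(); ++it) {
            if (it->first == key) { it->second = value; return; }
        }
        bucket.push_back(std::make_pair(key, value));
    }

    // Lookup, steps 4 and 5: compare keys inside the bucket only.
    // Returns a pointer to the value, or a null pointer if not found.
    const std::string* find(int key) const {
        const Bucket& bucket = buckets_[bucket_index(key)];
        for (Bucket::const_iterator it = bucket.begin(); it != bucket.end(); ++it) {
            if (it->first == key) return &it->second;
        }
        return nullptr;
    }
};
```

With 8 buckets, keys 10 and 18 both map to bucket 2, so they conflict; the equality comparison inside the bucket is what tells them apart.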

In hash_map, direct addressing is done by the hash function, and conflicts are resolved by the comparison function. If each bucket holds only one element, a lookup needs only one comparison. When many buckets are empty, many lookups are even faster (for keys that are not found).

So two parts of a hash table's implementation depend on the user: the hash function and the comparison function. These are exactly the two parameters we need to specify when using hash_map.

2 Use hash_map

2.1 A simple instance

Let's not rush into representing "Yue Bu Qun" with hash_map. First, a simple example: you are given ID numbers at random, each with some associated information. The IDs range from 1 to 2^31. How do you store them so that they can be looked up quickly?

#include <hash_map>
#include <string>
using namespace std;
int main() {
    hash_map<int, string> mymap;
    mymap[9527] = "Tang Bohu: Qiu Xiang";
    mymap[1000000] = "Life of a millionaire";
    mymap[10000] = "white-collar salary bottom line";
    ...
    if (mymap.find(10000) != mymap.end()) {
        ...
    }
}

Simple enough, and just like map. You may ask: what about the hash function and the comparison function? Don't they have to be specified? You are right, but when you don't specify them, defaults are used. A look at the declaration of hash_map makes this clearer. In SGI STL it is declared as follows:

template <class _Key, class _Tp, class _HashFcn = hash<_Key>,
class _EqualKey = equal_to<_Key>,
class _Alloc = __STL_DEFAULT_ALLOCATOR(_Tp) >
class hash_map
{
...
}

That is to say, in the above example, there are the following equivalent relationships:

...
hash_map<int, string> mymap;
// Equivalent to:
hash_map<int, string, hash<int>, equal_to<int> > mymap;

We need not pay much attention to Alloc here. (For more about allocators, see the article "Standard library STL: What Can Allocator Do".)
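The same equivalence can be checked at compile time. Since hash_map is non-standard and its header varies between compilers, this sketch uses std::unordered_map from C++11 as a stand-in; that substitution is an assumption of the example, as are the names ShortForm, LongForm and lookup_or.

```cpp
#include <functional>
#include <string>
#include <type_traits>
#include <unordered_map>

// Short form: the hash and equality parameters take their defaults.
typedef std::unordered_map<int, std::string> ShortForm;
// Long form: the same defaults written out explicitly.
typedef std::unordered_map<int, std::string,
                           std::hash<int>, std::equal_to<int> > LongForm;

// Both spellings name exactly the same type.
static_assert(std::is_same<ShortForm, LongForm>::value,
              "the defaults are std::hash<int> and std::equal_to<int>");

// Look up an ID without modifying the container.
std::string lookup_or(const ShortForm& m, int id, const std::string& fallback) {
    ShortForm::const_iterator it = m.find(id);
    return it == m.end() ? fallback : it->second;
}
```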

2.2 Hash Functions of hash_map

What does hash <int> look like? Look at the source code:

struct hash<int> {
size_t operator()(int __x) const { return __x; }
};

It turns out to be a function object. SGI STL provides the following hash functions:

struct hash<char*>
struct hash<const char*>
struct hash<char>
struct hash<unsigned char>
struct hash<signed char>
struct hash<short>
struct hash<unsigned short>
struct hash<int>
struct hash<unsigned int>
struct hash<long>
struct hash<unsigned long>

That is, if your key is one of the types above, you can use the default hash function. You can also define your own hash function, and for custom types you have no choice. For example, for string you must write the hash function yourself:

struct str_hash {
    size_t operator()(const string& str) const
    {
        unsigned long __h = 0;
        for (size_t i = 0; i < str.size(); i++)
            __h = 5 * __h + str[i];
        return size_t(__h);
    }
};
// If you want to use the string hash function provided by the system, you can write:
struct str_hash {
    size_t operator()(const string& str) const
    {
        return __stl_hash_string(str.c_str());
    }
};

Keep the following points in mind when declaring your own hash function:

  1. Use a struct and overload operator().
  2. The return type is size_t.
  3. The parameter is the type of the key you want to hash.
  4. The function is declared const.

If these are hard to remember, the simplest approach is to copy an existing hash function and adapt it.

Now we can use a hash for the "Yue Bu Qun" example from the beginning. Simply replace:

map<string, string> namemap;
// with:
hash_map<string, string, str_hash> namemap;

Everything else stays the same. Of course, do not forget to add the declaration of str_hash and to change the header file to hash_map.
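Here is a compilable sketch of the same idea. It assumes a C++11 compiler and uses std::unordered_map as a stand-in for hash_map (std::unordered_map already has a default hash for std::string, so the custom functor is shown purely to illustrate the four points above). The names StrHash, NameMap and make_name_map are made up for this example.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// A hash function object following the four rules: a struct with
// operator(), returning size_t, taking the key type, declared const.
struct StrHash {
    std::size_t operator()(const std::string& str) const {
        unsigned long h = 0;
        for (std::size_t i = 0; i < str.size(); ++i)
            h = 5 * h + static_cast<unsigned char>(str[i]);  // same 5*h + c scheme
        return static_cast<std::size_t>(h);
    }
};

// The third template argument plugs the custom hash into the container.
typedef std::unordered_map<std::string, std::string, StrHash> NameMap;

NameMap make_name_map() {
    NameMap names;
    names["Yue Bu Qun"] = "head of the Huashan school";
    names["Zhang Sanfeng"] = "head of Wudang, founder of Taijiquan";
    return names;
}
```

For the two-character key "ab" the functor computes 5 * 97 + 98 = 583, which makes it easy to check by hand.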

You may ask: What about comparison functions? Don't worry. Here we will introduce the comparison functions in hash_map.

2.3 comparison functions of hash_map

A map needs a comparison function of the less-than kind; if none is given, the default is less<Key>. In hash_map, the data in a bucket has to be compared with the key for equality, so an equality function is needed: equal_to<Key>. Here is the source code of equal_to:

// This code is taken from SGI STL
// First, the declaration of binary_function. It really just defines some types.
template <class _Arg1, class _Arg2, class _Result>
struct binary_function {
    typedef _Arg1 first_argument_type;
    typedef _Arg2 second_argument_type;
    typedef _Result result_type;
};
// Now the definition of equal_to:
template <class _Tp>
struct equal_to : public binary_function<_Tp, _Tp, bool>
{
    bool operator()(const _Tp& __x, const _Tp& __y) const { return __x == __y; }
};

If you use a custom data type, such as struct mystruct, or const char* strings, how do you provide a comparison function? There are two ways. The first is to overload the == operator and use equal_to. See the example below:

struct mystruct{
int iID;
int len;
bool operator==(const mystruct & my) const{
return (iID==my.iID) && (len==my.len) ;
}
};

You can then use equal_to<mystruct> as the comparison function. The other way is to use a function object. Define your own comparison function object:

struct compare_str{
bool operator()(const char* p1, const char*p2) const{
return strcmp(p1,p2)==0;
}
};

With compare_str, you can use hash_map with const char* keys.

typedef hash_map<const char*, string, hash<const char*>, compare_str> StrIntMap;
StrIntMap namemap;
namemap["Yue Bu Qun"] = "head of the Huashan school, known as the Gentleman Sword";
namemap["Zhang Sanfeng"] = "head of Wudang, founder of Taijiquan";
namemap["Invincible in the East"] = "the number-one master, Sunflower Manual";
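To see why the comparison function matters for const char* keys, here is a sketch using std::unordered_map from C++11 as a stand-in for hash_map. One difference is worth flagging: in standard C++, std::hash<const char*> hashes the pointer value, not the characters, so this sketch also supplies a content-based hash. The names CStrHash, CStrEqual, CStrMap and contains are assumptions of this example.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <unordered_map>

// Hash the characters of the C string, not the pointer value.
struct CStrHash {
    std::size_t operator()(const char* s) const {
        std::size_t h = 0;
        for (; *s != '\0'; ++s)
            h = 5 * h + static_cast<unsigned char>(*s);
        return h;
    }
};

// Two keys are equal when their characters are equal.
struct CStrEqual {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) == 0;
    }
};

typedef std::unordered_map<const char*, std::string, CStrHash, CStrEqual> CStrMap;

// True if some stored key has the same characters as key.
bool contains(const CStrMap& m, const char* key) {
    return m.find(key) != m.end();
}
```

A key held in a different buffer but with the same characters is found, which would not be the case with pointer comparison. Note that the map stores only the pointers, so the strings they point to must outlive the map.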

2.4 hash_map functions

The functions of hash_map are similar to those of map. For details of their parameters and behavior, see hash_map in the STL Programming Manual. Here are a few commonly used ones.

  1. hash_map(size_type n). If efficiency matters, this parameter should be set. n sets the number of hash buckets in the container. The more buckets there are, the lower the probability of hash conflicts and of having to reallocate memory. So the larger n is, the more efficient the container, at the cost of more memory.
  2. const_iterator find(const key_type& k) const. Used for lookup. It takes a key and returns an iterator.
  3. data_type& operator[](const key_type& k). This is the function I use most. It is convenient because the container can be used like an array. However, when you use the [key] operator and the container has no element with that key, an element with that key is added automatically. So if you only want to know whether the container has a given key, use find. If you want to insert the element, use the [] operator directly.
  4. insert. When the container does not contain the key, insert behaves much like the [] operator. As more and more elements are added, the number of elements per bucket grows; to stay efficient, hash_map automatically allocates more memory and creates more buckets. Therefore, iterators obtained before an insert may no longer be valid afterwards.
  5. erase. During insertion, hash_map may automatically grow the container when buckets hold too many elements. In SGI STL, however, erase does not automatically release memory, so after calling erase, iterators to the other elements remain valid.
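The difference between find and the [] operator, and the bucket-count constructor, can be sketched as follows. The example uses std::unordered_map from C++11 as a stand-in for hash_map, which is an assumption; its constructor likewise accepts a minimum bucket count. The helper names are made up for illustration.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

typedef std::unordered_map<int, std::string> IdMap;

// Construct a map with at least n buckets up front (point 1).
IdMap make_presized(std::size_t n) {
    return IdMap(n);
}

// find() only looks; it never changes the container (point 2).
bool has_key(const IdMap& m, int id) {
    return m.find(id) != m.end();
}

// operator[] inserts a default-constructed value when the key is missing
// (point 3). Returns how many elements the access added: 0 or 1.
std::size_t elements_added_by_index(IdMap& m, int id) {
    std::size_t before = m.size();
    (void)m[id];  // access only; the side effect is what we measure
    return m.size() - before;
}
```

A membership test written with [] therefore quietly grows the container with empty entries, which is why find is the better choice when you only want to ask a question.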

3 related hash containers

In addition to hash_map, the hash containers include hash_set, hash_multimap, and hash_multiset. They relate to set, multimap, and multiset in the same way that hash_map relates to map, so I won't go into detail.

4 others

A few more points that should help you understand and use hash_map.

4.1 what is the difference between hash_map and map?

  • Constructor. hash_map needs a hash function and an equality function; map needs only a comparison function (less-than).
  • Storage structure. hash_map is stored in a hash table; map is generally implemented as a red-black tree (RB tree). Their in-memory data structures are therefore different.

4.2 When do I need hash_map and map?

In general, hash_map lookup is faster than map lookup: its lookup time is essentially constant, independent of the amount of data, while map lookup is O(log n). But a constant is not necessarily smaller than log n, and computing the hash function takes time too. So if efficiency is the concern, especially once the number of elements reaches a certain order of magnitude, consider hash_map. If you are strict about memory and want the program to use as little as possible, however, be careful: hash_map may embarrass you, especially when you have many hash_map objects, where memory becomes hard to control. hash_map is also slow to construct.

Do you know how to choose now? Weigh three factors: lookup speed, data volume, and memory usage.

For a short story about hash_map and map, see: http://dev.csdn.net/Develop/article/14/14019.shtm

4.3 how to add a custom type to hash_map?

You only need to do two things: Define the hash function and define the comparison function. The following code is an example:

-bash-2.05b$ cat my.cpp
#include <hash_map>
#include <string>
#include <iostream>

using namespace std;
// define the class
class ClassA {
public:
    ClassA(int a) : c_a(a) {}
    int getvalue() const { return c_a; }
    void setvalue(int a) { c_a = a; }
private:
    int c_a;
};

// 1 define the hash function
struct hash_A {
    size_t operator()(const class ClassA& A) const {
        // return hash<int>()(A.getvalue());
        return A.getvalue();
    }
};

// 2 define the equal function
struct equal_A {
    bool operator()(const class ClassA& a1, const class ClassA& a2) const {
        return a1.getvalue() == a2.getvalue();
    }
};

int main()
{
    hash_map<ClassA, string, hash_A, equal_A> hmap;
    ClassA a1(12);
    hmap[a1] = "I am 12";
    ClassA a2(198877);
    hmap[a2] = "I am 198877";

    cout << hmap[a1] << endl;
    cout << hmap[a2] << endl;
    return 0;
}
-bash-2.05b$ make my
c++ -O -pipe -march=pentiumpro my.cpp -o my
-bash-2.05b$ ./my
I am 12
I am 198877

4.4 how to replace the existing map container in the program with hash_map?

This is easy, but you need a good programming style. We recommend that you use typedef to define your type as much as possible:

typedef map<Key, Value> KeyMap;

When you want to use hash_map for replacement, you only need to modify:

typedef hash_map<Key, Value> KeyMap;

The rest remains unchanged. Of course, you need to check whether there are hash and comparison functions of the key type.
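A minimal sketch of this typedef technique is shown below. It uses std::unordered_map from C++11 as a stand-in for hash_map, and the macro USE_ORDERED_MAP and the helper names are hypothetical. The functions touch only operations that both containers share, so they compile unchanged either way.

```cpp
#include <map>
#include <string>
#include <unordered_map>

// The only line that changes when switching containers.
#ifdef USE_ORDERED_MAP
typedef std::map<std::string, int> KeyMap;
#else
typedef std::unordered_map<std::string, int> KeyMap;
#endif

// Increment and return the counter for a word; [] creates it at 0 if absent.
int count_word(KeyMap& counts, const std::string& word) {
    return ++counts[word];
}

// Membership test that does not modify the container.
bool seen(const KeyMap& counts, const std::string& word) {
    return counts.find(word) != counts.end();
}
```

One caveat with this kind of swap: code that relies on iterating a map in sorted key order will behave differently, because a hash container makes no promise about iteration order.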

4.5 why is hash_map not standard?

I am not sure why it is not standard. One explanation I have heard is that when the STL was added to Standard C++, the hash_map family was not yet fully implemented, and that it will become standard in the future. If anyone knows a better explanation, I would like to hear it. The point I want to make is this: precisely because hash_map is not standard, the g++ compiler installed on many platforms does not necessarily provide hash_map. I have run into this myself. So test such non-standard libraries in advance and, with portability in mind, use them sparingly.

4.6 are there any suggestions for learning to use hash_map?

In Chinese, "hash" is rendered both as 哈希 (a transliteration) and as 散列 ("scatter"). So when you hear someone mention a 散列 container, do not blame yourself for ignorance. To learn more about the hash family, you can also read Effective STL, Item 25: Familiarize yourself with nonstandard hashed containers. I also recommend reading the source code. If you have questions, you can ask on the STL forum.

5 References:
  1. STL sorting (SORT)
  2. STL string
  3. Simplified STL Chinese Version
  4. Forum discussion: STL Forum: detailed explanation of STL hash_map Series

 
