Introduction to hash_map, the hash table in the C++ STL

Source: Internet
Author: User
Tags: map, data structure

0 Why do we need hash_map?

Have you used map? A map provides a very common facility: key-value storage and lookup. For example, suppose I want to record people's names together with some information about each person, add records at any time, and find and modify them quickly:

Yue Buqun - head of the Huashan sect, known as the Gentleman Sword
Zhang Sanfeng - head of Wudang, founder of Taijiquan
Dongfang Bubai - the number one master, Sunflower Manual
...
Storing this information is not complicated, but finding it is more troublesome. For example, to find the record for "Zhang Sanfeng", the most naive way is to fetch all the records and compare the names one by one. To be faster, you need to keep the records sorted alphabetically and then use binary search. But when you add records you must keep them in order, which means an insertion sort on every insert. For that to be efficient you need a binary tree. The list goes on and on; with the STL map container you get all of this without worrying about the details. For details on the data structure behind map, interested readers can see "Learning STL map, STL set: data structure foundations". Here is the map implementation:
#include <map>
#include <string>
using namespace std;
...
map<string, string> namemap;

// add
namemap["Yue Buqun"] = "Head of the Huashan sect, known as the Gentleman Sword";
namemap["Zhang Sanfeng"] = "Head of Wudang, founder of Taijiquan";
namemap["Dongfang Bubai"] = "The number one master, Sunflower Manual";
...
// find
if (namemap.find("Yue Buqun") != namemap.end()) {
        ...
}
Isn't that easy to use? It is efficient too: with 1 million records, about 20 string comparisons at most are enough to find the record you want, and with 2 million records only about 21.
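The figures 20 and 21 come from halving the search range on every comparison. The following is a minimal sketch of that arithmetic, assuming a perfectly balanced binary search tree (a real red-black tree can be somewhat deeper); `max_comparisons` is a hypothetical helper name, not a library function.

```cpp
#include <cstddef>

// Worst-case number of key comparisons for a lookup in a perfectly
// balanced binary search tree holding n keys.
// Hypothetical helper written for this illustration only.
std::size_t max_comparisons(std::size_t n) {
    std::size_t steps = 0;
    while (n > 0) {   // each comparison halves the remaining range
        n /= 2;
        ++steps;
    }
    return steps;
}
```

Under this assumption, doubling the number of records adds only one comparison, which is why map scales well but never reaches a fixed cost per lookup.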

But speed never quite satisfies real-world needs. If I have 1 million records and search them frequently, even 20 comparisons can become a bottleneck. Can this be reduced to one or two comparisons? Can it stay at one or two comparisons when there are 2 million records? And it should still be as easy to use as map.

The answer is yes. This is where hash_map comes in. Although hash_map is not currently part of the C++ standard template library, almost every STL version provides an implementation, and it is very widely used. Before using hash_map in earnest, let us look at how it works.

1 Data structure: how hash_map works

This section is meant to give you a deeper understanding of hash_map. If you only want to use it without understanding how it works, you can skip this section, but I still suggest reading it; a little more understanding does no harm.

hash_map is based on a hash table. The biggest advantage of a hash table is that storing and finding data takes much less time, close to constant time; the price is higher memory consumption. With more and more memory available, trading space for time is usually worthwhile. Ease of coding is another of its advantages.

The basic idea is to use an array with a large index range to store the elements. A function (the hash function) maps each element's key to a function value (the hash value, used as the array index), and the element is stored in that array slot. Put simply, the key is used to "classify" each element, and the element is stored in the place corresponding to its "class", called a bucket.

However, there is no guarantee that keys and function values correspond one to one, so different elements may well produce the same function value. This is a "collision": different elements are assigned to the same "class". In general, "direct addressing" and "collision resolution" are the two main features of a hash table.

hash_map first allocates a large block of memory, forming many buckets, and uses the hash function to map keys to different regions (buckets) for storage. The insertion process is:

    1. Get the key
    2. Compute the hash value with the hash function
    3. Get the bucket number (usually the hash value modulo the number of buckets)
    4. Store the key and value in that bucket.

The lookup process is:

    1. Get the key
    2. Compute the hash value with the hash function
    3. Get the bucket number (usually the hash value modulo the number of buckets)
    4. Compare the elements in the bucket with the key; if none is equal, the key is not found.
    5. Return the value of the matching record.

In hash_map the direct address is produced by the hash function, and collisions are resolved by the comparison function. As you can see, if each bucket holds only one element, a lookup needs only one comparison. When many buckets are empty, many queries are even faster (they fail immediately as not found).

So when implementing a hash table, the two things that concern the user are the hash function and the comparison function. These are exactly the two parameters we need to specify when using hash_map.
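The insertion and lookup steps described above can be sketched as a small chained hash table. This is an illustrative toy under stated assumptions, not the SGI implementation: the class name `ToyHashMap` and its `put`/`get` members are hypothetical, the bucket count is fixed (no resizing), and it must be non-zero.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Toy chained hash table; illustrative only.
template <class Key, class Value,
          class HashFcn = std::hash<Key>,
          class EqualKey = std::equal_to<Key> >
class ToyHashMap {
public:
    explicit ToyHashMap(std::size_t bucket_count = 8)
        : buckets_(bucket_count) {}

    // Insertion: key -> hash value -> bucket number -> store in bucket.
    void put(const Key& key, const Value& value) {
        Bucket& bucket = buckets_[bucket_index(key)];
        for (auto& entry : bucket) {
            if (equal_(entry.first, key)) {  // key already present: overwrite
                entry.second = value;
                return;
            }
        }
        bucket.push_back(std::make_pair(key, value));
    }

    // Lookup: same first three steps, then compare keys inside the bucket.
    // Returns a pointer to the value, or nullptr if the key is not found.
    const Value* get(const Key& key) const {
        const Bucket& bucket = buckets_[bucket_index(key)];
        for (const auto& entry : bucket) {
            if (equal_(entry.first, key)) return &entry.second;
        }
        return nullptr;
    }

private:
    typedef std::list<std::pair<Key, Value> > Bucket;

    std::size_t bucket_index(const Key& key) const {
        return hash_(key) % buckets_.size();  // hash value modulo bucket count
    }

    std::vector<Bucket> buckets_;
    HashFcn hash_;    // produces the "direct address"
    EqualKey equal_;  // resolves collisions inside a bucket
};
```

The two template parameters `HashFcn` and `EqualKey` play the roles of the hash function and the comparison function; everything else is bookkeeping the user never sees.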

2 Using hash_map

2.1 A simple example

Don't worry yet about how to put "Yue Buqun" into a hash_map; first look at a simple example. You are given arbitrary ID numbers and the information belonging to each ID, with IDs ranging from 1 to 2 to the 31st power. How do you store and look them up quickly?

#include <hash_map>
#include <string>
using namespace std;

int main() {
        hash_map<int, string> mymap;
        mymap[9527] = "Tong Pak Fu points Autumn Incense";
        mymap[1000000] = "The Life of a Millionaire";
        mymap[10000] = "white-collar salary bottom line";
        ...
        if (mymap.find(10000) != mymap.end()) {
                ...
        }
}
As you can see, it is used the same way as map. You might then ask: what about the hash function and the comparison function? Don't they have to be specified? You are right, but when you do not specify them, defaults are used. Look at the hash_map declaration and it becomes clearer. The following is the SGI STL declaration:
template <class _Key, class _Tp, class _HashFcn = hash<_Key>,
          class _EqualKey = equal_to<_Key>,
          class _Alloc = __STL_DEFAULT_ALLOCATOR(_Tp) >
class hash_map
{
        ...
};
That is, in the example above, the following equivalence holds:
hash_map<int, string> mymap;
// equivalent to:
hash_map<int, string, hash<int>, equal_to<int> > mymap;
We need not pay much attention to Alloc (readers who want to learn more about allocators can see "What can the standard library STL allocator do").
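The same equivalence can be checked at compile time. The sketch below assumes a C++11 compiler and uses std::unordered_map as a stand-in for hash_map (the two are different classes, but their template parameters have the same default pattern); the typedef names and `same_defaults` are hypothetical.

```cpp
#include <functional>
#include <string>
#include <type_traits>
#include <unordered_map>

// std::unordered_map is used here as a stand-in for hash_map.
typedef std::unordered_map<int, std::string> ShortForm;
typedef std::unordered_map<int, std::string,
                           std::hash<int>, std::equal_to<int> > LongForm;

// Compilation fails here if the two spellings named different types.
static_assert(std::is_same<ShortForm, LongForm>::value,
              "the defaults are hash<Key> and equal_to<Key>");

// Hypothetical helper: true when both spellings name the same type.
constexpr bool same_defaults() {
    return std::is_same<ShortForm, LongForm>::value;
}
```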
2.2 The hash function of hash_map

What exactly does hash<int> look like? Look at the source code:

struct hash<int> {
        size_t operator()(int __x) const { return __x; }
};
It turns out to be a function object. SGI STL provides the following hash functions:
struct hash<char*>
struct hash<const char*>
struct hash<char>
struct hash<unsigned char>
struct hash<signed char>
struct hash<short>
struct hash<unsigned short>
struct hash<int>
struct hash<unsigned int>
struct hash<long>
struct hash<unsigned long>
In other words, if your key is one of the types above, you can use the default hash function. Of course you can also define your own. For user-defined types you have no other choice; for string keys, for example, you must write the hash function yourself. For example:
struct hash_string {
        size_t operator()(const string& str) const
        {
                unsigned long __h = 0;
                for (size_t i = 0; i < str.size(); i++)
                        __h = 5 * __h + str[i];
                return size_t(__h);
        }
};

// If you wish to take advantage of the system-defined string hash function, you can write:
struct hash_string {
        size_t operator()(const string& str) const
        {
                return __stl_hash_string(str.c_str());
        }
};
Under Visual Studio, the hash function and the equal function live in a single struct, unlike SGI where they are separate.
struct hash_string
{
        static const size_t bucket_size = 4;
        static const size_t min_buckets = 8;

        // 1. Define the hash function
        size_t operator()(const string& str) const
        {
                unsigned long __h = 0;
                for (size_t i = 0; i < str.size(); i++)
                        __h = 5 * __h + str[i];
                return size_t(__h);
        }

        // 1. Alternative hash function (left commented out)
        // size_t operator()(const string& str) const
        // {
        //         return __stl_hash_string(str.c_str());
        // }

        // 2. Define the equal function
        // Note: MSVC's stdext::hash_compare documents this two-argument
        // operator as an ordering predicate (less<Key> by default), so check
        // your compiler's documentation before relying on equality here.
        bool operator()(const string& p1, const string& p2) const {
                return p1 == p2;
        }
};
Keep the following points in mind when declaring your own hash function:
    1. Use a struct and overload operator();
    2. The return type is size_t;
    3. The parameter is the type of the key you want to hash;
    4. The function is const.

If these are hard to remember, the simplest way is to find an existing function and adapt it.

Now we can use a hash for the "Yue Buqun" example from the beginning. Simply replace the declaration with the following:

hash_map<string, string, hash_string> namemap;
Nothing else in the usage needs to change. Of course, don't forget to add the declaration of hash_string and to change the header file to hash_map.
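Putting the pieces together, here is a minimal sketch of the name map with a custom string hash. It assumes a C++11 compiler and uses std::unordered_map as a stand-in for hash_map, since hash_map is not available everywhere; `NameMap` and `make_namemap` are hypothetical names. The hash follows the article's multiply-by-5 scheme, with a cast to unsigned char added so that non-ASCII bytes do not produce negative terms.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Multiply-by-5 string hash, following the four rules listed above:
// a struct, operator() overloaded, size_t returned, function marked const.
struct hash_string {
    std::size_t operator()(const std::string& str) const {
        unsigned long h = 0;
        for (std::size_t i = 0; i < str.size(); ++i)
            h = 5 * h + static_cast<unsigned char>(str[i]);
        return static_cast<std::size_t>(h);
    }
};

// Stand-in for hash_map<string, string, hash_string>.
typedef std::unordered_map<std::string, std::string, hash_string> NameMap;

// Hypothetical helper that fills the map with the article's sample records.
NameMap make_namemap() {
    NameMap namemap;
    namemap["Yue Buqun"] = "Head of the Huashan sect, known as the Gentleman Sword";
    namemap["Zhang Sanfeng"] = "Head of Wudang, founder of Taijiquan";
    namemap["Dongfang Bubai"] = "The number one master, Sunflower Manual";
    return namemap;
}
```

Lookups then read exactly as they did with map, for example `namemap.find("Yue Buqun") != namemap.end()`.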

You might ask: what about the comparison function? Don't worry, the comparison function of hash_map is introduced next.

2.3 The comparison function of hash_map

map needs a comparison function that provides less-than; if none is given, the default is less<Key>. hash_map has to test whether the data in a bucket equals the key, so it needs an equality function: equal_to<Key>. First look at the source code of equal_to:

// This code is from SGI STL.
// First look at the declaration of binary_function; it only defines some types.
template <class _Arg1, class _Arg2, class _Result>
struct binary_function {
        typedef _Arg1 first_argument_type;
        typedef _Arg2 second_argument_type;
        typedef _Result result_type;
};
// Now look at the definition of equal_to:
template <class _Tp>
struct equal_to : public binary_function<_Tp, _Tp, bool>
{
        bool operator()(const _Tp& __x, const _Tp& __y) const { return __x == __y; }
};
If you use a user-defined data type, such as struct mystruct, or a const char* string, how do you supply the comparison function?

There are two ways.

The first is to overload the == operator and use equal_to. See the following example:

struct mystruct {
        int iID;
        int len;
        bool operator==(const mystruct& my) const {
                return (iID == my.iID) && (len == my.len);
        }
};
With this, you can use equal_to<mystruct> as the comparison function.

The other way is to use a function object, that is, to define your own comparison functor:

// requires <cstring> for strcmp
struct compare_str {
        bool operator()(const char* p1, const char* p2) const {
                return strcmp(p1, p2) == 0;
        }
};
With compare_str, you can use hash_map:
typedef hash_map<const char*, string, hash<const char*>, compare_str> StrIntMap;
StrIntMap namemap;
namemap["Yue Buqun"] = "Head of the Huashan sect, known as the Gentleman Sword";
namemap["Zhang Sanfeng"] = "Head of Wudang, founder of Taijiquan";
namemap["Dongfang Bubai"] = "The number one master, Sunflower Manual";
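Both approaches can be tried side by side. The sketch below assumes a C++11 compiler and uses std::unordered_map as a stand-in for hash_map. One difference matters here: SGI's hash<const char*> hashes the characters of the string, whereas std::hash<const char*> hashes the pointer value, so the stand-in needs a content-based hash to agree with compare_str. `hash_mystruct`, `hash_cstr`, `StructMap` and `StrMap` are hypothetical names introduced for the illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <unordered_map>

// Approach 1: the key type overloads ==, so the default equal_to works.
struct mystruct {
    int iID;
    int len;
    bool operator==(const mystruct& my) const {
        return iID == my.iID && len == my.len;
    }
};

// A hash is still required for a user-defined key (arbitrary mixing rule).
struct hash_mystruct {
    std::size_t operator()(const mystruct& s) const {
        return static_cast<std::size_t>(s.iID) * 31u +
               static_cast<std::size_t>(s.len);
    }
};

typedef std::unordered_map<mystruct, std::string, hash_mystruct> StructMap;

// Approach 2: a function object compares C strings by content.
struct compare_str {
    bool operator()(const char* p1, const char* p2) const {
        return std::strcmp(p1, p2) == 0;
    }
};

// Content-based hash so that equal strings land in the same bucket.
struct hash_cstr {
    std::size_t operator()(const char* s) const {
        std::size_t h = 0;
        for (; *s; ++s) h = 5 * h + static_cast<unsigned char>(*s);
        return h;
    }
};

typedef std::unordered_map<const char*, std::string,
                           hash_cstr, compare_str> StrMap;
```

Whichever approach is used, the hash and the equality test must agree: two keys that compare equal have to produce the same hash value, otherwise lookups will miss.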
2.4 hash_map functions

The functions of hash_map are similar to those of map. For the exact parameters and explanations, see "STL Programming Manual: hash_map". Here are a few of the most commonly used ones.

    1. hash_map(size_type n). If efficiency matters to you, set this parameter. n sets the number of hash buckets in the hash_map container. The more buckets there are, the lower the probability of hash collisions and the lower the probability of having to reallocate memory. The larger n is, the higher the efficiency, but the higher the memory consumption too.
    2. const_iterator find(const key_type& k) const. Performs a lookup: takes a key and returns an iterator.
    3. data_type& operator[](const key_type& k). This is one of the functions I use most. It is especially convenient because the container can be used like an array. Note, however, that when you use the [key] operator and the container has no element with that key, an element with that key is added automatically. So if you only want to know whether the container holds a key, use find. If you want to insert the element, use the [] operator directly.
    4. The insert function. When the container does not contain the key, insert behaves much like the [] operator. But as the container holds more and more elements, the number of elements per bucket grows, and to keep lookups efficient hash_map automatically allocates more memory to create more buckets. As a result, iterators obtained earlier may no longer be valid after an insert.
    5. The erase function. During insertion hash_map may grow the container's memory automatically when buckets hold too many elements. In SGI STL, however, erase does not reclaim memory automatically. So after you call erase, iterators to the other elements are still valid.
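The difference between find, operator[] and insert is easy to get wrong, so here is a minimal sketch of it. It assumes a C++11 compiler and uses std::unordered_map as a stand-in for hash_map; the behaviour shown is that of the standard container, and the helper names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in for hash_map<int, string>.
typedef std::unordered_map<int, std::string> IdMap;

// Pre-sizing: the constructor argument is a minimum bucket count.
IdMap make_presized(std::size_t n) {
    return IdMap(n);
}

// find() only looks; it never changes the container.
bool contains(const IdMap& m, int key) {
    return m.find(key) != m.end();
}

// operator[] inserts a default-constructed value when the key is missing.
// Returns the container size after the access.
std::size_t size_after_index(IdMap& m, int key) {
    m[key];
    return m.size();
}

// insert() leaves an existing value untouched and reports that via .second.
bool insert_if_absent(IdMap& m, int key, const std::string& value) {
    return m.insert(std::make_pair(key, value)).second;
}
```

A membership test written as `if (m[key] != "")` would therefore grow the container on every miss, which is why find is the right tool for pure queries.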

3 Related hash containers

Besides hash_map, the hash containers include hash_set, hash_multimap and hash_multiset. They differ from set, multimap and multiset in the same way hash_map differs from map, so I will not go through them one by one.

4 Other

Here are a few common questions that should help you understand and use hash_map.

4.1 What is the difference between hash_map and map?
    • Constructor. hash_map requires a hash function and an equality function; map requires only a comparison function (less-than).
    • Storage structure. hash_map stores its data in a hash table, while map is generally implemented with a red-black tree (RB tree). Their in-memory data structures are therefore different.
4.2 When should I use hash_map, and when map?

Overall, lookups in hash_map are faster than in map: lookup time is essentially independent of the amount of data, that is, constant, whereas map lookup is on the order of log(n). But a constant is not necessarily smaller than log(n), since computing the hash function takes time as well. So weigh things up: if efficiency matters, especially once the number of elements reaches a certain magnitude, consider hash_map. But if you are particularly strict about memory and want the program to use as little as possible, be careful: hash_map may embarrass you, especially when you have a great many hash_map objects, where things get even harder to control. hash_map is also slow to construct.

Do you know how to choose now? Weigh three factors: lookup speed, amount of data, and memory usage.

Here's a little story about hash_map and map, see: http://dev.csdn.net/Develop/article/14/14019.shtm

4.3 How do I add a user-defined type to a hash_map?

You only have to do two things: define the hash function and define the equality comparison function. The following code is an example:

-bash-2.05b$ cat my.cpp
#include <string>
#include <iostream>
using namespace std;

// Just for "#include <hash_map>"
#if __GNUC__ > 2
#include <ext/hash_map>
using __gnu_cxx::hash_map;
#else
#include <hash_map>
#endif

// 0 Define the class
class ClassA {
public:
        ClassA(int a) : c_a(a) {}
        int getvalue() const { return c_a; }
        void setvalue(int a) { c_a = a; }
private:
        int c_a;
};

// 1 Define the hash function
struct hash_A {
        size_t operator()(const class ClassA& A) const {
                // return hash<int>(classA.getvalue());
                return A.getvalue();
        }
};

// 2 Define the equal function
struct equal_A {
        bool operator()(const class ClassA& a1, const class ClassA& a2) const {
                return a1.getvalue() == a2.getvalue();
        }
};

int main()
{
        hash_map<ClassA, string, hash_A, equal_A> hmap;
        ClassA a1(12);
        hmap[a1] = "I am 12";
        ClassA a2(198877);
        hmap[a2] = "I am 198877";

        cout << hmap[a1] << endl;
        cout << hmap[a2] << endl;
        return 0;
}
-bash-2.05b$ make my
c++ -O -pipe -march=pentiumpro my.cpp -o my
-bash-2.05b$ ./my
I am 12
I am 198877
Under Visual Studio, the hash function and the equal function live in a single struct, unlike SGI where they are separate.
class MyClass
{
        ....
};

struct my_hash
{
        static const size_t bucket_size = 4;
        static const size_t min_buckets = 8;
        size_t operator()(const MyClass& Key) const
        {
                // Pseudocode: the quoted text is a placeholder for your own hash computation.
                size_t hash = 999;
                for (size_t i = 0; i < 100000; i++)
                        hash = "hash function";
                return hash;
        }

        bool operator()(const MyClass& c1, const MyClass& c2) const
        {
                // Pseudocode: placeholder for your own comparison.
                return "equal function";
        }
};

int main()
{
        hash_map<MyClass, int, my_hash> my;
        ......
}

4.4 How do I replace an existing map container in my program with hash_map?

This is easy, provided you have a good programming style. It is recommended to define your types with typedef wherever possible:

typedef map<key, value> KeyMap;
When you want to switch to hash_map, you only need to change this to:
typedef hash_map<key, value> KeyMap;
The rest stays basically the same. Of course, you need to check whether the key type has a hash function and a comparison function.
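A minimal sketch of this typedef technique follows, assuming a C++11 compiler and using std::unordered_map as a stand-in for hash_map. The macro `USE_HASH_CONTAINER` and the helpers `count_word` and `has_word` are hypothetical names chosen for the illustration.

```cpp
#include <map>
#include <string>
#include <unordered_map>

// Define USE_HASH_CONTAINER at build time to switch containers.
#ifdef USE_HASH_CONTAINER
typedef std::unordered_map<std::string, int> KeyMap;  // stand-in for hash_map
#else
typedef std::map<std::string, int> KeyMap;
#endif

// Code written against KeyMap compiles unchanged with either container,
// as long as it does not depend on iteration order.
int count_word(KeyMap& counts, const std::string& word) {
    return ++counts[word];
}

bool has_word(const KeyMap& counts, const std::string& word) {
    return counts.find(word) != counts.end();
}
```

The one behavioural difference to watch for is ordering: iterating a map visits keys in sorted order, while a hash container makes no such promise.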
4.5 Why is hash_map not standard?

I do not know exactly why. One explanation is that when the STL was added to standard C++, the hash_map family was not yet fully implemented, and that it should become standard later. If anyone knows a better explanation, I would be glad to hear it. The point I want to make is that, precisely because hash_map is not standard, a platform with the g++ compiler installed does not necessarily have a hash_map implementation. I have run into such a case myself. So when using these non-standard libraries, be sure to test them in advance. And if you are thinking about porting across platforms, it is best to use them as little as possible.

5 Hash Map

Usage is the same as map.

The namespace differs between Windows and Linux.

#ifdef WIN32
using namespace stdext;
#else
using namespace __gnu_cxx;
#endif
