C++ series: unordered_map

Source: Internet
Author: User

1. Conclusion
hash_map has been superseded by unordered_map, so only unordered_map and map are compared here.
Operational efficiency: unordered_map is the fastest; map is slower, but its performance is stable and it keeps its elements in sorted order.
Memory footprint: map uses slightly less memory and unordered_map slightly more; in both cases memory grows linearly with the number of elements.
If you need an unordered container with fast find and delete and are not worried about slightly higher memory use, choose unordered_map. If you need an ordered container with stable find and delete performance, or memory is a major concern, choose map.
2. Principle


Internally, map is implemented as a balanced binary tree (a red-black tree). hash_map is internally a hash table, generally implemented as a large vector whose elements (the buckets) can each hold a linked list of nodes to resolve collisions.

The insertion process of hash_map is:
Get the key.
Compute the hash value with the hash function.
Get the bucket number (usually the hash value modulo the number of buckets).
Store the key and value in that bucket.

The lookup process is:
Get the key.
Compute the hash value with the hash function.
Get the bucket number (usually the hash value modulo the number of buckets).
Compare the keys of the elements in the bucket with the lookup key; if none is equal, the key is not found.
Otherwise, return the value of the matching record.

In hash_map, the direct address (the bucket) is produced by the hash function, and collisions within a bucket are resolved with the comparison function.
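
The insertion and lookup steps above can be sketched as a small separate-chaining table. This is only an illustration, not the actual implementation of hash_map or unordered_map: the class name TinyHashMap, its put and get members, and the fixed, non-zero bucket count are assumptions made for the example, and the table never grows its bucket array the way a real container does.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Illustrative only: a tiny separate-chaining table. The bucket count is
// fixed at construction and assumed to be greater than zero.
class TinyHashMap {
public:
    explicit TinyHashMap(std::size_t bucket_count = 8)
        : buckets_(bucket_count) {}

    // Insert: key -> hash value -> bucket number -> store in the bucket.
    void put(const std::string& key, int value) {
        auto& bucket = buckets_[bucket_index(key)];
        for (auto& entry : bucket) {
            if (entry.first == key) {   // key already present: overwrite
                entry.second = value;
                return;
            }
        }
        bucket.emplace_back(key, value);
    }

    // Lookup: same bucket computation, then compare keys inside the bucket.
    // Returns nullptr when no key in the bucket is equal.
    const int* get(const std::string& key) const {
        const auto& bucket = buckets_[bucket_index(key)];
        for (const auto& entry : bucket) {
            if (entry.first == key) {
                return &entry.second;
            }
        }
        return nullptr;
    }

private:
    // Hash value modulo the number of buckets gives the bucket number.
    std::size_t bucket_index(const std::string& key) const {
        return std::hash<std::string>()(key) % buckets_.size();
    }

    std::vector<std::list<std::pair<std::string, int>>> buckets_;
};
```

With only two buckets, three different keys must share at least one bucket, so the key comparison inside get is what tells them apart.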
3. Memory Footprint Test

Test code:
Test conditions: Windows, VS2015 C++. The key is a string and the value is an int.
1. unordered_map:
#include <unordered_map>
#include <functional>
#include <string>
#include <iostream>
#include <windows.h>
#include <psapi.h>
#pragma comment(lib, "psapi.lib")
using namespace std;
using namespace stdext;

// Print the current and peak working set and pagefile usage of this process.
void ShowMemoryInfo(void)
{
    HANDLE handle = GetCurrentProcess();
    PROCESS_MEMORY_COUNTERS pmc;
    GetProcessMemoryInfo(handle, &pmc, sizeof(pmc));
    cout << "Memory use: " << pmc.WorkingSetSize / 1024.0f << "KB/"
         << pmc.PeakWorkingSetSize / 1024.0f << "KB, Virtual Memory use: "
         << pmc.PagefileUsage / 1024.0f << "KB/"
         << pmc.PeakPagefileUsage / 1024.0f << "KB" << endl;
}

// Define the class
/*-------------------------------------------*/
/* Function class
 * Used as the hash function of hash_map
 * string does not have a default hash function
 */
class str_hash {
public:
    size_t operator()(const string& str) const
    {
        unsigned long h = 0;
        for (size_t i = 0; i < str.size(); i++)
            h = 5 * h + str[i];
        return size_t(h);
    }
};

/*-------------------------------------------*/
/* Function class
 * Used as the comparison function of hash_map
 * (different keys may produce the same hash value, so keys are compared on lookup)
 */
class str_compare
{
public:
    bool operator()(const string& str1, const string& str2) const
    {
        return str1 == str2;
    }
};

struct charless : public binary_function<const string&, const string&, bool>
{
public:
    result_type operator()(const first_argument_type& _left, const second_argument_type& _right) const
    {
        return (_left.compare(_right) < 0 ? true : false);
    }
};

int main()
{

    cout << "Test HashMap (Unorder map) Memory use Start ..." << endl;
    // Custom type under VC: hash_compare is an MSVC extension (namespace stdext).
    // With a standard library, unordered_map<string, int> uses std::hash<string> by default.
    unordered_map<string, int, hash_compare<string, charless> > CharHash;
    for (int i = 0; i < 10000000; i++)
    {
        string key = to_string(i);
        CharHash[key] = i;
    }
    cout << "Test HashMap (Unorder map) Memory use End." << endl;
    ShowMemoryInfo();
    while (1); // keep the process alive so its memory use can be inspected
    return 0;
}


2. map:
#include <iostream>
#include <map>
#include <string>
#include <windows.h>
#include <psapi.h>
#pragma comment(lib, "psapi.lib")
using namespace std;

// Print the current and peak working set and pagefile usage of this process.
void ShowMemoryInfo(void)
{
    HANDLE handle = GetCurrentProcess();
    PROCESS_MEMORY_COUNTERS pmc;
    GetProcessMemoryInfo(handle, &pmc, sizeof(pmc));
    cout << "Memory use: " << pmc.WorkingSetSize / 1024.0f << "KB/"
         << pmc.PeakWorkingSetSize / 1024.0f << "KB, Virtual Memory use: "
         << pmc.PagefileUsage / 1024.0f << "KB/"
         << pmc.PeakPagefileUsage / 1024.0f << "KB" << endl;
}


int main()
{
    cout << "Test Map (red-black Tree) Memory use Start ..." << endl;
    // Custom type under VC
    //map<const char*, int, hash_compare<const char*, charless> > CharHash;
    map<string, int> CharMap;
    for (int i = 0; i < 10000000; i++)
    {
        string key = to_string(i);
        CharMap[key] = i;
    }
    cout << "Test Map (red-black Tree) Memory use End." << endl;
    ShowMemoryInfo();
    while (1); // keep the process alive so its memory use can be inspected
    return 0;
}

Test results:
[Screenshots: memory usage of map and unordered_map at 1,000 elements, 100,000 elements, and 10 million elements]

You can see that unordered_map always has a larger memory footprint than map, and that the footprint grows linearly with the number of elements.

4. Performance Features

For infrequent queries, map is the more stable choice; for frequent queries, hash_map is more efficient. The C++11 unordered_map is faster still for queries, but its memory consumption is slightly higher than that of hash_map. unordered_map is the hash map implementation from Boost.

In fact, std::map corresponds to TreeMap in Java, and boost::unordered_map corresponds to HashMap in Java.
The built-in dictionary in Python is implemented as a hash map, so its lookups take constant time on average, rather than the logarithmic time of a C++ map lookup. (The official Python interpreter is written in C and the HotSpot Java virtual machine mostly in C++, so the underlying ideas and methods are the same.)

If ordering matters, stable query speed is required, the container holds fewer than about 1,000 elements, and queries are infrequent, consider using map.
If queries are very frequent (beyond about 100 elements, unordered_map will be faster than map), the elements do not need to be ordered, and the data set is larger than about 1k elements, up to hundreds of thousands or millions, consider using unordered_map. (With tens of millions of elements and 4 GB of memory, you may run short of memory and need to move the data to disk, for example into a database.)
Compared with unordered_map, hash_map uses about 15 MB less memory at the scale of ten million elements and about 300 MB less at the scale of a hundred million; below a million elements, unordered_map uses less memory.
unordered_map is also faster than hash_map at inserting and deleting; its lookup speed is about the same as hash_map's, or only about 1/50 to 1/100 faster.
When neither ordering nor stable performance is required, use unordered_map; the same applies to the set types.
unordered_map is about five times faster at lookups, is faster at inserts, and saves some memory. If sorting is not needed, prefer a hash map (unordered_map is the hash map implementation in Boost).
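
Figures like these depend heavily on the compiler, the key type, and the data, so it is worth measuring on your own workload. The following is a rough sketch of such a measurement, not the benchmark behind the numbers above; the helper names make_keys, build_map, and time_lookups_ms are made up for the example, and the keys are decimal strings like those in the memory test.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Build n string keys "0", "1", ..., n-1.
std::vector<std::string> make_keys(int n) {
    std::vector<std::string> keys;
    keys.reserve(n);
    for (int i = 0; i < n; ++i) {
        keys.push_back(std::to_string(i));
    }
    return keys;
}

// Fill any map-like container with key -> index.
template <class Map>
Map build_map(const std::vector<std::string>& keys) {
    Map m;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        m[keys[i]] = static_cast<int>(i);
    }
    return m;
}

// Look up every key once. Returns the elapsed time in milliseconds and
// reports the number of keys found through the hits parameter.
template <class Map>
double time_lookups_ms(const Map& m, const std::vector<std::string>& keys,
                       std::size_t& hits) {
    hits = 0;
    auto start = std::chrono::steady_clock::now();
    for (const auto& k : keys) {
        if (m.find(k) != m.end()) {
            ++hits;
        }
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

To compare the two containers, build a std::map<std::string, int> and a std::unordered_map<std::string, int> from the same keys with build_map, call time_lookups_ms on each, and repeat the run several times in an optimized build before drawing conclusions.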
5. Using unordered_map

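To use a custom type as the key, unordered_map needs a hash function for that type (an overload of hash_value for boost::unordered_map; a hash functor or a std::hash specialization for std::unordered_map) and an overloaded operator==.
For details, see the following article (thanks to orzlzro for writing such a good one):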
http://blog.csdn.net/orzlzro/article/details/7099231
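
As a minimal sketch of those two requirements with std::unordered_map: the key type Point, the functor PointHash, and the helper make_labels below are hypothetical names chosen for illustration, and the way the two hash values are combined is a simple choice rather than a recommended one.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical key type used only for illustration.
struct Point {
    int x;
    int y;
};

// Equality: decides whether two keys that land in the same bucket are the same key.
inline bool operator==(const Point& a, const Point& b) {
    return a.x == b.x && a.y == b.y;
}

// Hash functor passed as the third template argument of unordered_map.
// Combining with a multiplier is a simple choice, not an optimal one.
struct PointHash {
    std::size_t operator()(const Point& p) const {
        std::size_t hx = std::hash<int>()(p.x);
        std::size_t hy = std::hash<int>()(p.y);
        return hx * 31 + hy;
    }
};

// Small helper so the example can be exercised: label a few points.
std::unordered_map<Point, std::string, PointHash> make_labels() {
    std::unordered_map<Point, std::string, PointHash> labels;
    labels[Point{0, 0}] = "origin";
    labels[Point{1, 0}] = "unit-x";
    return labels;
}
```

Keys that compare equal must produce the same hash value; here both functions use exactly the members x and y, which keeps them consistent.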
6. Issues to be aware of when using hash_map

/**
 * \author peakflys
 * \brief Demonstrates the problems caused by changing a hash_map key
 *        (the misuse below is deliberate)
 */
#include <iostream>
#include <cstring>
#include <ext/hash_map>
struct Unit
{
    char name[32];
    unsigned int score;
    Unit(const char *_name, const unsigned int _score) : score(_score)
    {
        strncpy(name, _name, 32);
    }
};
int main()
{
    typedef __gnu_cxx::hash_map<char*, Unit*> uhMap;
    typedef uhMap::value_type hmType;
    typedef uhMap::iterator hmIter;
    uhMap hMap;
    Unit *unit1 = new Unit("Peak", 100);
    Unit *unit2 = new Unit("Joey", 20);
    Unit *unit3 = new Unit("Rachel", 40);
    Unit *unit4 = new Unit("Monica", 90);
    hMap[unit1->name] = unit1;
    hMap[unit2->name] = unit2;
    hMap.insert(hmType(unit3->name, unit3));
    hMap.insert(hmType(unit4->name, unit4));
    for (hmIter it = hMap.begin(); it != hMap.end(); ++it)
    {
        std::cout << it->first << "\t" << it->second->score << std::endl; // normal operation
    }
    for (hmIter it = hMap.begin(); it != hMap.end(); ++it)
    {
        Unit *unit = it->second;
        //hMap.erase(it++);
        delete unit; // delete frees the memory the key points to, but the entry is not removed from hMap, which leaves hMap internally inconsistent and may crash
    }
    hmIter it = hMap.begin();
    strncpy(it->first, "cc", 32); // forced change of the key
    for (hmIter it = hMap.begin(); it != hMap.end(); ++it)
    {
        std::cout << it->first << "\t" << it->second->score << std::endl; // infinite loop; see the explanation of operator++ below
        /* operator++ starts from _M_cur and first tries _M_cur->_M_next. If that is null, it walks the bucket vector until it finds a non-empty node. To walk the vector it needs the bucket position of the current element (see the hash_map lookup process above), and _M_bkt_num_key(key) computes that position from the modified key. If the key has been changed, the function can return a bucket position before the current element, which creates an infinite loop.
         */
    }
    return 0;
}
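
For contrast, here is a sketch of how the same two operations (deleting entries during iteration and changing a key) can be done safely with std::unordered_map. The names ScoreMap, erase_below, and rename_key are hypothetical helpers, and std::string keys are assumed so that the container owns its keys instead of pointing into objects that may be deleted.

```cpp
#include <string>
#include <unordered_map>

typedef std::unordered_map<std::string, int> ScoreMap;

// Erase entries while iterating: since C++11, erase() returns the iterator
// following the removed element, so the loop never touches an invalid iterator.
void erase_below(ScoreMap& scores, int threshold) {
    for (ScoreMap::iterator it = scores.begin(); it != scores.end(); ) {
        if (it->second < threshold) {
            it = scores.erase(it);
        } else {
            ++it;
        }
    }
}

// "Change" a key by removing the old entry and inserting a new one,
// never by writing through the stored key.
bool rename_key(ScoreMap& scores, const std::string& old_key,
                const std::string& new_key) {
    ScoreMap::iterator it = scores.find(old_key);
    if (it == scores.end() || scores.count(new_key) != 0) {
        return false;   // nothing to rename, or the new key is already taken
    }
    int value = it->second;
    scores.erase(it);
    scores.emplace(new_key, value);
    return true;
}
```

Because the entry is re-inserted, the container recomputes the bucket from the new key, which is exactly the step the forced strncpy in the listing bypasses.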
7. VC Reference Example

#include "stdafx.h"

Storage process: key -> hash function -> hash value -> modulo the number of buckets to get the bucket number (if the bucket already holds elements, the conflict is resolved by chaining) -> store the key and value in the bucket.
Retrieval process: key -> hash function -> hash value -> modulo the number of buckets to get the bucket number -> compare the keys in the bucket for equality;
if no key is equal, return the end iterator, otherwise return an iterator to the matching element.

1. hash_map provides the hash function (used to map a key to a hash value) and the comparison function (used to resolve conflicts) for keys of the following types:
struct hash<char*>
struct hash<const char*>
struct hash<char>
struct hash<unsigned char>
struct hash<signed char>
struct hash<short>
struct hash<unsigned short>
struct hash<int>
struct hash<unsigned int>
struct hash<long>
struct hash<unsigned long>
For these built-in types you can declare hash_map<int, string> mymap; directly and use it just like an ordinary map.

2. Custom hash functions and comparison functions
Keep the following points in mind when declaring your own hash function:

Use a struct and overload operator().
The return type is size_t.
The parameter is the type of the key you want to hash.
The function is const-qualified.

To define your own comparison function:
Use a struct and overload operator().
The return type is bool.
The parameters are two const references to the key type, which are compared with each other.
The function is const-qualified.

Using the custom hash function and comparison function:
hash_map<ClassA, string, hash_A, equal_A> hmap;
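
The same pattern carries over to std::unordered_map, where the hash functor and the equality functor are the third and fourth template arguments. The sketch below uses const char* keys, because the default std::hash for a pointer hashes the address rather than the characters; the names CStrHash, CStrEqual, CStrMap, and lookup_or are made up for the example, and the hash reuses the 5*h + c scheme of this article's str_hash.

```cpp
#include <cstddef>
#include <cstring>
#include <unordered_map>

// Hash functor: struct, operator(), returns size_t, takes the key type, const.
struct CStrHash {
    std::size_t operator()(const char* s) const {
        std::size_t h = 0;
        for (; *s != '\0'; ++s) {
            h = 5 * h + static_cast<unsigned char>(*s);
        }
        return h;
    }
};

// Comparison functor: returns bool, takes two keys, const.
// Compares the characters, not the pointer values.
struct CStrEqual {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) == 0;
    }
};

typedef std::unordered_map<const char*, int, CStrHash, CStrEqual> CStrMap;

// Look up a key; returns fallback when the key is absent.
int lookup_or(const CStrMap& m, const char* key, int fallback) {
    CStrMap::const_iterator it = m.find(key);
    return it == m.end() ? fallback : it->second;
}
```

The map stores only the pointers, so the strings used as keys must outlive the map; string literals, as in the VC example, satisfy that.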

3. Common hash_map functions

The functions of hash_map are similar to those of map. For the exact parameters and explanations, see the STL programming manual entry for hash_map; the most commonly used functions are described here.

hash_map(size_type n): if efficiency matters, this parameter should be set. n mainly sets the number of hash buckets in the hash_map container.
The more buckets there are, the lower the probability of hash collisions and of having to reallocate memory. The larger n is, the higher the efficiency, but the greater the memory consumption.

const_iterator find(const key_type& k) const: lookup. It takes a key and returns an iterator.

data_type& operator[](const key_type& k): this is one of the functions I use most. It is particularly handy because the container can be used like an array.
Note, however, that when you use the [key] operator and the container has no element with that key, an element with that key is added automatically.
So when you only want to know whether the container contains a key, use find. If you want to insert the element, you can use the [] operator directly.

The insert function: when the container does not contain the key, insert behaves like the [] operator. As the container fills up,
the number of elements in each bucket grows, and to preserve efficiency hash_map automatically requests more memory to create more buckets. So after an insert,
previously obtained iterators may no longer be valid.

The erase function: during insertion, hash_map may automatically grow the container's memory when the chain in a bucket becomes too long.
In SGI STL, however, erase does not automatically reclaim memory, so after you call erase, iterators to the other elements remain valid.
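
The difference between find, operator[], and insert is easy to check with a few lines. The sketch below uses std::unordered_map rather than hash_map, and the helper names (NameMap, contains, touch, insert_if_absent, make_reserved) are assumptions made for the example; reserve plays a role similar to the bucket-count constructor argument described above.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

typedef std::unordered_map<int, std::string> NameMap;

// find() only looks; it never adds an element.
bool contains(const NameMap& m, int key) {
    return m.find(key) != m.end();
}

// operator[] inserts a default-constructed value when the key is missing,
// so the size can grow just by "reading". Returns the size afterwards.
std::size_t touch(NameMap& m, int key) {
    m[key];
    return m.size();
}

// insert() keeps the existing value when the key is already present;
// the returned bool says whether a new element was added.
bool insert_if_absent(NameMap& m, int key, const std::string& value) {
    return m.insert(NameMap::value_type(key, value)).second;
}

// Reserve room for n elements up front to avoid rehashing while filling.
NameMap make_reserved(std::size_t n) {
    NameMap m;
    m.reserve(n);
    return m;
}
```

Checking membership with contains leaves the container unchanged, while touch on a missing key increases the size by one and stores an empty string.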

#include <hash_map>
#include <cstring>
#include <functional>
#include <string>
#include <iostream>
using namespace std;
using namespace stdext;
// Define the class
/*-------------------------------------------*/
/* Function class
 * Used as the hash function of hash_map
 * string does not have a default hash function
 */
class str_hash {
public:
    size_t operator()(const string& str) const
    {
        unsigned long h = 0;
        for (size_t i = 0; i < str.size(); i++)
            h = 5 * h + str[i];
        return size_t(h);
    }
};

/*-------------------------------------------*/
/* Function class
 * Used as the comparison function of hash_map
 * (different keys may produce the same hash value, so keys are compared on lookup)
 */
class str_compare
{
public:
    bool operator()(const string& str1, const string& str2) const
    { return str1 == str2; }
};

struct charless : public binary_function<const char*, const char*, bool>
{
public:
    result_type operator()(const first_argument_type& _left, const second_argument_type& _right) const
    {
        return (strcmp(_left, _right) < 0 ? true : false);
    }
};

int main()
{
    // Built-in type
    hash_map<int, string> myHashMap;
    myHashMap[0] = "Jessecen";
    myHashMap[1] = "Ozz";
    hash_map<int, string>::iterator itrHash = myHashMap.find(0);
    if (itrHash != myHashMap.end())
    {
        cout << "My Name is:" << itrHash->second.c_str() << endl;
    }

    // Custom type under VC (hash_compare is an MSVC extension in namespace stdext)
    hash_map<const char*, int, hash_compare<const char*, charless> > CharHash;
    CharHash["a"] = 123;
    CharHash["B"] = 456;
    hash_map<const char*, int, hash_compare<const char*, charless> >::iterator itrChar = CharHash.find("B");
    if (itrChar != CharHash.end())
    {
        cout << "The Find number is:" << itrChar->second << endl;
    }

    return 0;
}
