STL source code analysis: hash tables

Last Update:2014-08-21 Source: Internet

Author: User

This article analyzes the implementation of hash tables in the G++ STL. Besides map and set, whose underlying storage structure is a red-black tree, the STL also provides hash_map and hash_set, which are implemented with hash tables. Lookups in map and set take logarithmic time, while hash_map and hash_set are faster and can reach constant time on average. However, a hash table needs more memory, trading space for time, and choosing a good hash function is not easy.

I. Basic concepts of hash tables

A hash table is a data structure that accesses storage directly based on a key. A hash function maps the key to a position in an array, so the data can be accessed in O(1) time. For example, suppose a hash table stores household information and is queried by name. With hash function F() and storage array info[N], Zhang San's information is stored in info[F(Zhang San)]. From the name alone we can then look up how many people are in Zhang San's household, how many mu of land each of them has, and how many cows are in the fields. Fast, isn't it? But sometimes F(Zhang San) equals F(Li Si). This is called a hash collision. Collisions are caused by the hash function: a good hash function can only reduce the probability of a collision, not avoid it completely. There are two ways to handle collisions:

1. Open addressing

Zhang San's information was stored first. When Li Si's record is stored and its position turns out to be occupied already, what should we do? Find another position. There are many ways to do this. The record can go into the next position; if that one is taken too, into the one after it, and so on. This is called linear probing. Searching one position at a time may be too slow, so we can instead probe at offsets of 1^2, 2^2, 3^2, ... This is called quadratic probing. Or we can call another hash function G() to get a new position, which is called rehashing...
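The linear probing described above can be sketched as follows. This is only an illustration, not STL code: the class name LinearProbeTable, the use of unsigned integer keys, and the hash function key % capacity are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-size table of unsigned keys using open addressing
// with linear probing. The hash function is simply key % capacity.
class LinearProbeTable {
public:
    explicit LinearProbeTable(std::size_t capacity)
        : keys_(capacity, 0), used_(capacity, false) {
        assert(capacity > 0);
    }

    // Returns the slot index where key ends up, or -1 if the table is full.
    long insert(unsigned key) {
        const std::size_t n = keys_.size();
        const std::size_t home = key % n;
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t pos = (home + i) % n;  // try the next slot
            if (!used_[pos] || keys_[pos] == key) {
                keys_[pos] = key;
                used_[pos] = true;
                return static_cast<long>(pos);
            }
        }
        return -1;
    }

    bool contains(unsigned key) const {
        const std::size_t n = keys_.size();
        const std::size_t home = key % n;
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t pos = (home + i) % n;
            if (!used_[pos]) return false;  // an empty slot ends the probe
            if (keys_[pos] == key) return true;
        }
        return false;
    }

private:
    std::vector<unsigned> keys_;
    std::vector<bool> used_;
};
```

With a capacity of 7, the keys 3, 10 and 17 all hash to position 3, so they end up in slots 3, 4 and 5 respectively. Replacing the offset i with i * i in the probe would give a quadratic probing variant.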

2. Separate chaining (open chaining)

Finding a new slot would be too much trouble for Li Si, so he simply shares the position with Zhang San, and the two records are connected by a linked list. This is separate chaining. With this method, one position may hold multiple records.

The ratio of the number of elements in a hash table to the length of its array is called the load factor. With open addressing, every element lives in the fixed-size array itself, so the load factor cannot exceed 1. As the load factor grows, so does the collision probability; beyond roughly 0.8, cache misses during lookups rise sharply. The load factor should therefore be kept strictly below 0.7-0.8, and the array should be enlarged once that limit is exceeded. With separate chaining the load factor can exceed 1: the expected insertion time is O(1) and the expected lookup time is O(1 + a), where a is the load factor. When a grows too large, the array must be enlarged here as well.
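To make the chaining method and the load factor concrete, here is a minimal sketch. The class ChainedSet, its member names, and the hash function key % bucket count are assumptions for illustration; it never grows its array, so the load factor can be observed going above 1.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <iterator>
#include <vector>

// Hypothetical set of unsigned keys using separate chaining.
class ChainedSet {
public:
    explicit ChainedSet(std::size_t buckets) : buckets_(buckets), size_(0) {
        assert(buckets > 0);
    }

    // Returns false if the key was already present.
    bool insert(unsigned key) {
        std::forward_list<unsigned>& chain = buckets_[key % buckets_.size()];
        for (unsigned k : chain)
            if (k == key) return false;
        chain.push_front(key);  // colliding keys share one bucket
        ++size_;
        return true;
    }

    std::size_t size() const { return size_; }

    // Number of elements divided by the number of buckets.
    double load_factor() const {
        return static_cast<double>(size_) / static_cast<double>(buckets_.size());
    }

    // Length of the chain stored in the given bucket.
    std::size_t chain_length(std::size_t bucket) const {
        const std::forward_list<unsigned>& chain = buckets_[bucket];
        return static_cast<std::size_t>(std::distance(chain.begin(), chain.end()));
    }

private:
    std::vector<std::forward_list<unsigned> > buckets_;
    std::size_t size_;
};
```

Inserting 1, 5, 9, 13 and 2 into a table with 4 buckets gives a load factor of 5 / 4 = 1.25, with four of the keys chained in bucket 1. A lookup for any of them has to walk that chain, which is where the O(1 + a) expected cost comes from.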

II. STL hash table structure

The STL implements its hash table with separate chaining. Each hash node holds the data and a next pointer:

template<class _Val>
  struct _Hashtable_node
  {
    _Hashtable_node* _M_next;
    _Val _M_val;
  };

A size n is specified when the hash table is constructed, but the array length actually allocated is a prime number computed from n:

void
_M_initialize_buckets(size_type __n)
{
  const size_type __n_buckets = _M_next_size(__n);
  _M_buckets.reserve(__n_buckets);
  _M_buckets.insert(_M_buckets.end(), __n_buckets, (_Node*) 0);
  _M_num_elements = 0;
}

inline unsigned long
__stl_next_prime(unsigned long __n)
{
  const unsigned long* __first = _Hashtable_prime_list<unsigned long>::_S_get_prime_list();
  const unsigned long* __last = __first + (int)_S_num_primes;
  const unsigned long* pos = std::lower_bound(__first, __last, __n);
  return pos == __last ? *(__last - 1) : *pos;
}

__stl_next_prime finds the first number in the prime list that is not less than n, or returns the largest entry if n exceeds every prime in the list. The list is a precomputed static array of 29 primes:

template<typename _PrimeType>
  const _PrimeType
  _Hashtable_prime_list<_PrimeType>::__stl_prime_list[_S_num_primes] =
  {
    5ul,          53ul,         97ul,         193ul,        389ul,
    769ul,        1543ul,       3079ul,       6151ul,       12289ul,
    24593ul,      49157ul,      98317ul,      196613ul,     393241ul,
    786433ul,     1572869ul,    3145739ul,    6291469ul,    12582917ul,
    25165843ul,   50331653ul,   100663319ul,  201326611ul,  402653189ul,
    805306457ul,  1610612741ul, 3221225473ul, 4294967291ul
  };
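As a standalone check of the lookup described above, the following sketch copies the 29 primes into an ordinary array and applies the same std::lower_bound logic. The names kPrimes, kNumPrimes and next_bucket_count are made up for this example and are not part of the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Copy of the prime list quoted in the article.
static const unsigned long kPrimes[] = {
    5ul,          53ul,         97ul,         193ul,        389ul,
    769ul,        1543ul,       3079ul,       6151ul,       12289ul,
    24593ul,      49157ul,      98317ul,      196613ul,     393241ul,
    786433ul,     1572869ul,    3145739ul,    6291469ul,    12582917ul,
    25165843ul,   50331653ul,   100663319ul,  201326611ul,  402653189ul,
    805306457ul,  1610612741ul, 3221225473ul, 4294967291ul
};
static const std::size_t kNumPrimes = sizeof(kPrimes) / sizeof(kPrimes[0]);

// Hypothetical helper mirroring __stl_next_prime: the first prime that is
// not less than n, or the largest prime if n is beyond the end of the list.
inline unsigned long next_bucket_count(unsigned long n) {
    const unsigned long* first = kPrimes;
    const unsigned long* last = kPrimes + kNumPrimes;
    const unsigned long* pos = std::lower_bound(first, last, n);
    return pos == last ? *(last - 1) : *pos;
}
```

Under these assumptions, next_bucket_count(50) yields 53 and next_bucket_count(100) yields 193, because 97 is smaller than 100. A request of exactly 53 returns 53 itself, since lower_bound accepts an equal element.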

For example, if a table length of 50 is requested, 53 buckets are actually allocated; if 100 is requested, 193 are allocated. Note that in the __stl_prime_list array, from 53 onward, each number is about twice the previous one. This is not a coincidence. When data is inserted and the number of elements would exceed the length of the bucket array, resize is called to reallocate, so that the load factor of the hash table never exceeds 1. The growth rate is similar to that of vector: each reallocation roughly doubles the array length.

template<class _Val, class _Key, class _HF, class _Ex, class _Eq, class _All>
  void
  hashtable<_Val, _Key, _HF, _Ex, _Eq, _All>::
  resize(size_type __num_elements_hint)
  {
    const size_type __old_n = _M_buckets.size();
    if (__num_elements_hint > __old_n)
      {
        const size_type __n = _M_next_size(__num_elements_hint);
        if (__n > __old_n)
          {
            _Vector_type __tmp(__n, (_Node*)(0), _M_buckets.get_allocator());
            __try
              {
                for (size_type __bucket = 0; __bucket < __old_n; ++__bucket)
                  {
                    _Node* __first = _M_buckets[__bucket];
                    while (__first)
                      {
                        size_type __new_bucket = _M_bkt_num(__first->_M_val,
                                                            __n);
                        _M_buckets[__bucket] = __first->_M_next;
                        __first->_M_next = __tmp[__new_bucket];
                        __tmp[__new_bucket] = __first;
                        __first = _M_buckets[__bucket];
                      }
                  }
                _M_buckets.swap(__tmp);
              }
            __catch(...)
              {
                for (size_type __bucket = 0; __bucket < __tmp.size();
                     ++__bucket)
                  {
                    while (__tmp[__bucket])
                      {
                        _Node* __next = __tmp[__bucket]->_M_next;
                        _M_delete_node(__tmp[__bucket]);
                        __tmp[__bucket] = __next;
                      }
                  }
                __throw_exception_again;
              }
          }
      }
  }

Each newly inserted element is placed at the head of its bucket's linked list, before the current first node:

template<class _Val, class _Key, class _HF, class _Ex, class _Eq, class _All>
  pair<typename hashtable<_Val, _Key, _HF, _Ex, _Eq, _All>::iterator, bool>
  hashtable<_Val, _Key, _HF, _Ex, _Eq, _All>::
  insert_unique_noresize(const value_type& __obj)
  {
    const size_type __n = _M_bkt_num(__obj);
    _Node* __first = _M_buckets[__n];

    for (_Node* __cur = __first; __cur; __cur = __cur->_M_next)
      if (_M_equals(_M_get_key(__cur->_M_val), _M_get_key(__obj)))
        return pair<iterator, bool>(iterator(__cur, this), false);

    _Node* __tmp = _M_new_node(__obj);
    __tmp->_M_next = __first;
    _M_buckets[__n] = __tmp;
    ++_M_num_elements;
    return pair<iterator, bool>(iterator(__tmp, this), true);
  }

III. Hash functions

The hash function is used to compute an element's position in the array. _M_bkt_num_key wraps the hash function and returns the position by taking the hash value modulo the array length.

size_type
_M_bkt_num_key(const key_type& __key, size_t __n) const
{ return _M_hash(__key) % __n; }
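A small sketch of this modulo step for integer keys follows. It assumes an identity hash like the hash<int> specialization quoted later in the article; the function names identity_hash and bucket_index are invented for the example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an integer hash: the value itself as size_t.
inline std::size_t identity_hash(int x) {
    return static_cast<std::size_t>(x);
}

// Bucket position: hash value modulo the number of buckets.
inline std::size_t bucket_index(int key, std::size_t n) {
    assert(n > 0);
    return identity_hash(key) % n;
}
```

With 53 buckets, the keys 1, 54 and 107 all land in bucket 1, so even an identity hash produces collisions once the modulo is applied.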

_M_hash is the hash function object. The string hash and several of the hash specializations are defined as follows:

inline size_t
__stl_hash_string(const char* __s)
{
  unsigned long __h = 0;
  for ( ; *__s; ++__s)
    __h = 5 * __h + *__s;
  return size_t(__h);
}

template<>
  struct hash<char*>
  {
    size_t
    operator()(const char* __s) const
    { return __stl_hash_string(__s); }
  };

template<>
  struct hash<const char*>
  {
    size_t
    operator()(const char* __s) const
    { return __stl_hash_string(__s); }
  };

template<>
  struct hash<char>
  {
    size_t
    operator()(char __x) const
    { return __x; }
  };

template<>
  struct hash<int>
  {
    size_t
    operator()(int __x) const
    { return __x; }
  };

template<>
  struct hash<unsigned int>
  {
    size_t
    operator()(unsigned int __x) const
    { return __x; }
  };

template<>
  struct hash<long>
  {
    size_t
    operator()(long __x) const
    { return __x; }
  };

……
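To see what the string hash produces, the recurrence h = 5 * h + c can be reproduced in a few lines. The names string_hash and bucket_of are made up for this sketch, and the values mentioned afterwards assume ASCII character codes.

```cpp
#include <cassert>
#include <cstddef>

// Same recurrence as __stl_hash_string: h = 5 * h + c for each character.
inline std::size_t string_hash(const char* s) {
    assert(s != 0);
    unsigned long h = 0;
    for (; *s; ++s)
        h = 5 * h + *s;
    return static_cast<std::size_t>(h);
}

// Hypothetical helper: bucket position of a string in a table of n buckets.
inline std::size_t bucket_of(const char* s, std::size_t n) {
    assert(n > 0);
    return string_hash(s) % n;
}
```

For "ab" the hash is 5 * 97 + 98 = 583, and "b]" gives 5 * 98 + 93 = 583 as well, so the two strings collide before the modulo is even applied. With 53 buckets, "ab" lands in bucket 0 (583 = 11 * 53) while "ba", with hash 587, lands in bucket 4.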

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
