STL source code analysis: the hash table

Source: Internet
Author: User

This article analyzes the hash table implementation in the g++ STL. Besides map and set, which are backed by red-black trees, the STL also provides hash_map and hash_set, which are backed by hash tables. Lookups in map and set take logarithmic time, while hash_map and hash_set are faster and reach constant time on average. The price is memory: a hash table trades space for time, and choosing a good hash function is not easy.

I. Basic concepts of hash tables

A hash table is a data structure that accesses storage directly by key. A hash function maps the key to a position in an array, so the data can be reached in O(1) time. For example, suppose a hash table stores household information and is queried by name. With hash function F() and storage array info[N], Zhang San's information is stored at info[F(Zhang San)]. To learn how many people live in Zhang San's household, how many mu of land they own, or how many cows they keep, there is no need to ask door to door; one array access is enough. Fast, right? But sometimes F(Zhang San) equals F(Li Si). This is called a hash collision. Collisions are caused by the hash function: a good hash function lowers the probability of a collision but cannot eliminate it. There are two ways to handle collisions:

1. Open addressing

Suppose Zhang San's information is stored first. When Li Si arrives and finds the position already occupied, he has to look for a new one. There are many ways to do so. The simplest is to try the next position, and if that is also occupied, the one after it, and so on; this is called linear probing. Stepping one position at a time can be slow, so the search can instead jump by offsets of 1², 2², 3², ...; this is called quadratic probing. Alternatively, a second hash function G() can supply a new position, which is called rehashing...

2. Separate chaining (open chaining)

Li Si finds looking for a new slot too much trouble. He would rather share the position with Zhang San, with the two records connected by a linked list. This is separate chaining: one position in the array may hold several records.
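The two strategies can be compared side by side in a small sketch. Everything here is hypothetical and for illustration only: toy_hash is a deliberately weak hash (the sum of the character codes) chosen so that collisions are easy to produce, and ProbingTable and ChainedTable are not the STL's types.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <string>
#include <vector>

// Hypothetical toy hash: the sum of the character codes. It is weak on
// purpose, so that anagrams such as "ab" and "ba" always collide.
std::size_t toy_hash(const std::string& key) {
    std::size_t h = 0;
    for (unsigned char c : key) h += c;
    return h;
}

// Open addressing with linear probing. An empty string marks a free slot,
// so empty keys are not supported in this sketch.
struct ProbingTable {
    std::vector<std::string> slots;
    explicit ProbingTable(std::size_t n) : slots(n) { assert(n > 0); }

    // Returns the slot finally used, or slots.size() when the table is full.
    std::size_t insert(const std::string& key) {
        assert(!key.empty());
        const std::size_t n = slots.size();
        const std::size_t home = toy_hash(key) % n;
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t pos = (home + i) % n;  // step to the next slot
            if (slots[pos].empty() || slots[pos] == key) {
                slots[pos] = key;
                return pos;
            }
        }
        return n;
    }
};

// Separate chaining: colliding keys share one bucket through a linked list.
struct ChainedTable {
    std::vector<std::forward_list<std::string>> buckets;
    explicit ChainedTable(std::size_t n) : buckets(n) { assert(n > 0); }

    // Returns the bucket used; the key is linked in at the head of its chain.
    // No duplicate check is made in this sketch.
    std::size_t insert(const std::string& key) {
        const std::size_t pos = toy_hash(key) % buckets.size();
        buckets[pos].push_front(key);
        return pos;
    }
};
```

With 7 slots, "ab" and "ba" both hash to 195, and 195 % 7 is 6. Linear probing puts the second key in slot 0 after wrapping around, while chaining keeps both keys in bucket 6.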

The ratio of the number of elements in a hash table to the length of its array is called the load factor. With open addressing the array size is fixed, so the load factor cannot exceed 1. As the load factor grows, so does the collision probability; once it passes 0.8, the cache miss rate during lookups rises along an exponential curve. The load factor should therefore be kept strictly below 0.7 to 0.8, and the array should be enlarged when that limit is exceeded. With separate chaining the load factor can exceed 1. The expected insertion time is O(1) and the expected lookup time is O(1 + a), where a is the load factor; when a grows too large, the array must be enlarged as well.
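The load factor check is simple enough to write down directly. The sketch below is only an illustration: load_factor and needs_grow are hypothetical helper names, and the threshold is passed in by the caller rather than taken from any library.

```cpp
#include <cassert>
#include <cstddef>

// Load factor = number of stored elements / length of the array.
double load_factor(std::size_t num_elements, std::size_t num_buckets) {
    assert(num_buckets > 0);
    return static_cast<double>(num_elements) / static_cast<double>(num_buckets);
}

// Hypothetical policy helper: true when the table has passed max_load
// and its array should be enlarged.
bool needs_grow(std::size_t num_elements, std::size_t num_buckets,
                double max_load) {
    return load_factor(num_elements, num_buckets) > max_load;
}
```

An open addressing table might call needs_grow with a limit of about 0.7, while a chained table can tolerate a limit of 1 or more.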

II. STL hash table structure

The STL implements its hash table with separate chaining. Each hash node holds the data and a next pointer:

    template<class _Val>
      struct _Hashtable_node
      {
        _Hashtable_node* _M_next;
        _Val _M_val;
      };

A bucket count n is given when the hash table is constructed, but the array actually allocated has a prime length computed from n:

    void
    _M_initialize_buckets(size_type __n)
    {
      const size_type __n_buckets = _M_next_size(__n);
      _M_buckets.reserve(__n_buckets);
      _M_buckets.insert(_M_buckets.end(), __n_buckets, (_Node*) 0);
      _M_num_elements = 0;
    }

    inline unsigned long
    __stl_next_prime(unsigned long __n)
    {
      const unsigned long* __first = _Hashtable_prime_list<unsigned long>::_S_get_prime_list();
      const unsigned long* __last = __first + (int)_S_num_primes;
      const unsigned long* pos = std::lower_bound(__first, __last, __n);
      return pos == __last ? *(__last - 1) : *pos;
    }

__stl_next_prime returns the first number in the prime list that is not less than n. If n is larger than every entry, it returns the last one. The list is a precomputed static array of 29 primes:

    template<typename _PrimeType> const _PrimeType
    _Hashtable_prime_list<_PrimeType>::__stl_prime_list[_S_num_primes] =
      {
        5ul,          53ul,         97ul,         193ul,        389ul,
        769ul,        1543ul,       3079ul,       6151ul,       12289ul,
        24593ul,      49157ul,      98317ul,      196613ul,     393241ul,
        786433ul,     1572869ul,    3145739ul,    6291469ul,    12582917ul,
        25165843ul,   50331653ul,   100663319ul,  201326611ul,  402653189ul,
        805306457ul,  1610612741ul, 3221225473ul, 4294967291ul
      };
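The lookup can be tried outside the library with a standalone sketch. The names kPrimes, kNumPrimes and next_prime are hypothetical; the table values are copied from the list above and the function body follows the same lower_bound logic as __stl_next_prime.

```cpp
#include <algorithm>
#include <cstddef>

// The 29 primes copied from the list above.
static const unsigned long kPrimes[] = {
    5ul,          53ul,         97ul,         193ul,        389ul,
    769ul,        1543ul,       3079ul,       6151ul,       12289ul,
    24593ul,      49157ul,      98317ul,      196613ul,     393241ul,
    786433ul,     1572869ul,    3145739ul,    6291469ul,    12582917ul,
    25165843ul,   50331653ul,   100663319ul,  201326611ul,  402653189ul,
    805306457ul,  1610612741ul, 3221225473ul, 4294967291ul
};
static const std::size_t kNumPrimes = sizeof(kPrimes) / sizeof(kPrimes[0]);

// Returns the first prime in the table that is not less than n, or the
// largest prime in the table when n exceeds every entry.
unsigned long next_prime(unsigned long n) {
    const unsigned long* first = kPrimes;
    const unsigned long* last = kPrimes + kNumPrimes;
    const unsigned long* pos = std::lower_bound(first, last, n);
    return pos == last ? *(last - 1) : *pos;
}
```

Because lower_bound finds the first element that is not less than its argument, a request for exactly 53 returns 53 rather than 97.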

For example, a requested length of 50 actually allocates 53 buckets, and a request for 100 allocates 193. In the __stl_prime_list array, from 53 onward, each number is roughly twice the previous one. This is no coincidence. When data is inserted and the number of elements would exceed the length of the bucket array, resize is called to reallocate, so that the load factor never exceeds 1. Growth is similar to that of vector: each reallocation roughly doubles the array length.

    template<class _Val, class _Key, class _HF, class _Ex, class _Eq, class _All>
      void
      hashtable<_Val, _Key, _HF, _Ex, _Eq, _All>::
      resize(size_type __num_elements_hint)
      {
        const size_type __old_n = _M_buckets.size();
        if (__num_elements_hint > __old_n)
          {
            const size_type __n = _M_next_size(__num_elements_hint);
            if (__n > __old_n)
              {
                _Vector_type __tmp(__n, (_Node*)(0), _M_buckets.get_allocator());
                __try
                  {
                    for (size_type __bucket = 0; __bucket < __old_n; ++__bucket)
                      {
                        _Node* __first = _M_buckets[__bucket];
                        while (__first)
                          {
                            size_type __new_bucket = _M_bkt_num(__first->_M_val,
                                                                __n);
                            _M_buckets[__bucket] = __first->_M_next;
                            __first->_M_next = __tmp[__new_bucket];
                            __tmp[__new_bucket] = __first;
                            __first = _M_buckets[__bucket];
                          }
                      }
                    _M_buckets.swap(__tmp);
                  }
                __catch(...)
                  {
                    for (size_type __bucket = 0; __bucket < __tmp.size();
                         ++__bucket)
                      {
                        while (__tmp[__bucket])
                          {
                            _Node* __next = __tmp[__bucket]->_M_next;
                            _M_delete_node(__tmp[__bucket]);
                            __tmp[__bucket] = __next;
                          }
                      }
                    __throw_exception_again;
                  }
              }
          }
      }
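The core of resize is the inner loop that moves nodes from the old array to the new one without copying them. The following is a simplified sketch of that relinking step only, under stated assumptions: Node and rehash are hypothetical names, the hash is assumed to be the identity on an unsigned long value, and the exception handling of the real code is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for _Hashtable_node.
struct Node {
    Node* next;
    unsigned long value;
};

// Moves every node from `buckets` into a new array of `new_size` buckets.
// No node is allocated or copied; only the next pointers are rewired.
void rehash(std::vector<Node*>& buckets, std::size_t new_size) {
    assert(new_size > 0);
    std::vector<Node*> tmp(new_size, static_cast<Node*>(nullptr));
    for (std::size_t b = 0; b < buckets.size(); ++b) {
        Node* first = buckets[b];
        while (first) {
            // Identity hash assumed: the new bucket is value % new_size.
            const std::size_t nb = first->value % new_size;
            buckets[b] = first->next;  // unlink from the old chain
            first->next = tmp[nb];     // push onto the head of the new chain
            tmp[nb] = first;
            first = buckets[b];        // continue with the old chain's new head
        }
    }
    buckets.swap(tmp);
}
```

Since each node is pushed onto the head of its new chain, nodes that stay together in one bucket end up in reverse order relative to the old chain.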

Each newly inserted element is placed at the head of its bucket's linked list, after the chain has been searched for an element with an equal key:

    template<class _Val, class _Key, class _HF, class _Ex, class _Eq, class _All>
      pair<typename hashtable<_Val, _Key, _HF, _Ex, _Eq, _All>::iterator, bool>
      hashtable<_Val, _Key, _HF, _Ex, _Eq, _All>::
      insert_unique_noresize(const value_type& __obj)
      {
        const size_type __n = _M_bkt_num(__obj);
        _Node* __first = _M_buckets[__n];

        for (_Node* __cur = __first; __cur; __cur = __cur->_M_next)
          if (_M_equals(_M_get_key(__cur->_M_val), _M_get_key(__obj)))
            return pair<iterator, bool>(iterator(__cur, this), false);

        _Node* __tmp = _M_new_node(__obj);
        __tmp->_M_next = __first;
        _M_buckets[__n] = __tmp;
        ++_M_num_elements;
        return pair<iterator, bool>(iterator(__tmp, this), true);
      }
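The same search-then-prepend pattern can be reduced to a few lines. This is a sketch under simplifying assumptions, not the library code: Node, insert_unique and destroy_all are hypothetical names, values are plain unsigned integers hashed by identity, and a raw pointer stands in for the iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Simplified stand-in for _Hashtable_node.
struct Node {
    Node* next;
    unsigned value;
};

// Returns the node holding `value` and true if it was newly inserted,
// or the existing node and false if an equal value was already present.
std::pair<Node*, bool> insert_unique(std::vector<Node*>& buckets,
                                     unsigned value) {
    assert(!buckets.empty());
    const std::size_t n = value % buckets.size();  // identity hash assumed
    Node* first = buckets[n];
    for (Node* cur = first; cur; cur = cur->next)
        if (cur->value == value)
            return std::make_pair(cur, false);
    Node* tmp = new Node;
    tmp->next = first;   // the old head becomes the second node
    tmp->value = value;
    buckets[n] = tmp;    // the new node becomes the head of the chain
    return std::make_pair(tmp, true);
}

// Frees every node so the sketch does not leak.
void destroy_all(std::vector<Node*>& buckets) {
    for (std::size_t b = 0; b < buckets.size(); ++b)
        while (buckets[b]) {
            Node* next = buckets[b]->next;
            delete buckets[b];
            buckets[b] = next;
        }
}
```

Prepending needs no walk to the tail of the chain, so once the duplicate check is done the insertion itself costs constant time.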

III. Hash functions

The hash function determines an element's position in the array. _M_bkt_num_key wraps the hash function and returns the hash value modulo the array length as the element's position.

    size_type
    _M_bkt_num_key(const key_type& __key, size_t __n) const
    { return _M_hash(__key) % __n; }

_M_hash is the hash function object. The hash template is specialized for C strings and the built-in integer types as follows:

    inline size_t
    __stl_hash_string(const char* __s)
    {
      unsigned long __h = 0;
      for ( ; *__s; ++__s)
        __h = 5 * __h + *__s;
      return size_t(__h);
    }

    template<>
      struct hash<char*>
      {
        size_t
        operator()(const char* __s) const
        { return __stl_hash_string(__s); }
      };

    template<>
      struct hash<const char*>
      {
        size_t
        operator()(const char* __s) const
        { return __stl_hash_string(__s); }
      };

    template<>
      struct hash<char>
      {
        size_t
        operator()(char __x) const
        { return __x; }
      };

    template<>
      struct hash<int>
      {
        size_t
        operator()(int __x) const
        { return __x; }
      };

    template<>
      struct hash<unsigned int>
      {
        size_t
        operator()(unsigned int __x) const
        { return __x; }
      };

    template<>
      struct hash<long>
      {
        size_t
        operator()(long __x) const
        { return __x; }
      };

    ...
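To see how a string key ends up in a bucket, the string hash and the modulo step can be combined in a standalone sketch. The names hash_string and bucket_index are hypothetical; the recurrence is the one used by __stl_hash_string, and the modulo mirrors _M_bkt_num_key.

```cpp
#include <cassert>
#include <cstddef>

// Same recurrence as __stl_hash_string: h = 5 * h + c for each character.
std::size_t hash_string(const char* s) {
    unsigned long h = 0;
    for (; *s; ++s)
        h = 5 * h + *s;
    return static_cast<std::size_t>(h);
}

// Mirrors _M_bkt_num_key: the hash value modulo the number of buckets.
std::size_t bucket_index(const char* key, std::size_t num_buckets) {
    assert(num_buckets > 0);
    return hash_string(key) % num_buckets;
}
```

For the key "abc" the hash is 5 * (5 * 97 + 98) + 99 = 3014, so in a table of 53 buckets it lands in bucket 3014 % 53 = 46. Integer keys skip the first step entirely, because their hash is the value itself.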
