hash_map Summary

Source: Internet
Author: User

0 Why hash_map

map is one of the most commonly used containers. It is implemented with a red-black tree, which is very efficient: among 1 million records, about 20 comparisons are enough to find the record you are looking for, and only about 21 comparisons are needed for 2 million records.
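The comparison counts quoted above follow from the height of a balanced binary tree, which is roughly log2(n). The sketch below reproduces that arithmetic. The helper name `approx_comparisons` is made up for illustration, and a real red-black tree is only approximately balanced, so it may need a few more comparisons.

```cpp
#include <cmath>
#include <cstddef>

// Rough number of key comparisons needed to find a record in a
// perfectly balanced binary search tree holding n records: ceil(log2(n)).
// Hypothetical helper, not part of any library.
std::size_t approx_comparisons(std::size_t n) {
    if (n <= 1) return 1;
    return static_cast<std::size_t>(
        std::ceil(std::log2(static_cast<double>(n))));
}
```

Calling `approx_comparisons(1000000)` and `approx_comparisons(2000000)` gives 20 and 21: doubling the data adds only one comparison.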

hash_map needs only one or two comparisons. Although hash_map is not currently part of the C++ Standard Template Library, almost every STL implementation provides one, and it is widely used.

 

1 Data structure: how hash_map works

hash_map is based on a hash table. The biggest advantage of a hash table is that storing and looking up data takes very little time, close to constant time. The price is higher memory consumption.

The basic principle is to store elements in an array with a large index range. You design a function (the hash function) that maps the key of each element to a function value (the hash value, used as an array index), and the element is stored in the array slot with that index. Put simply, each element is "classified" by its key and then stored in the bucket that belongs to its class.

However, there is no guarantee that keys and function values correspond one to one. Different elements may well produce the same function value, which is called a "conflict" (collision): different elements are assigned to the same class. In short, "direct addressing" and "conflict resolution" are the two defining features of a hash table.
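A tiny sketch makes the idea of a conflict concrete. It assumes the simplest possible hash function for non-negative integer keys (the key itself) and takes the bucket index as the hash value modulo the bucket count. The name `bucket_of` is hypothetical.

```cpp
#include <cstddef>

// Toy direct addressing: the hash value is the key itself, and the
// bucket index is that value modulo the number of buckets.
// Intended for non-negative keys only.
std::size_t bucket_of(int key, std::size_t bucket_count) {
    return static_cast<std::size_t>(key) % bucket_count;
}
```

With 8 buckets, the keys 9527 and 15 both land in bucket 7, so they conflict, while 10000 lands in bucket 0.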

hash_map first allocates a large block of memory to form many buckets. The hash function maps each key to one of these buckets, where it is stored. Insertion works as follows:

    1. Get the key.
    2. Apply the hash function to obtain the hash value.
    3. Compute the bucket number (usually the hash value modulo the number of buckets).
    4. Store the key and value in that bucket.

Lookup works as follows:

    1. Get the key.
    2. Apply the hash function to obtain the hash value.
    3. Compute the bucket number (usually the hash value modulo the number of buckets).
    4. Compare the key with each element in the bucket. If none is equal, the lookup fails.
    5. Otherwise, return the value of the matching record.

In hash_map, direct addressing is done by the hash function, and conflicts are resolved with the comparison function. If each bucket holds only one element, a lookup needs only one comparison. When many buckets are empty, unsuccessful lookups are fast as well, because an empty bucket requires no comparison at all.
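The insert and lookup steps above can be sketched as a minimal chained hash table. This is an illustration only, not how any particular STL implements hash_map: the class name `TinyHashMap`, its `put`/`get` methods, and the fixed bucket count are all assumptions made for the example.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Minimal chained hash table mapping int -> string (illustration only).
// bucket_count must be greater than zero; keys are assumed non-negative.
class TinyHashMap {
public:
    explicit TinyHashMap(std::size_t bucket_count) : buckets_(bucket_count) {}

    // Insert: hash the key, pick the bucket, store (or overwrite) the pair.
    void put(int key, const std::string& value) {
        auto& bucket = buckets_[index_of(key)];
        for (auto& entry : bucket) {
            if (entry.first == key) { entry.second = value; return; }
        }
        bucket.emplace_back(key, value);
    }

    // Lookup: hash the key, pick the bucket, compare keys inside it.
    // Returns nullptr when no element in the bucket matches.
    const std::string* get(int key) const {
        const auto& bucket = buckets_[index_of(key)];
        for (const auto& entry : bucket) {
            if (entry.first == key) return &entry.second;
        }
        return nullptr;
    }

private:
    // Hash function (the key itself) followed by the modulo step.
    std::size_t index_of(int key) const {
        return static_cast<std::size_t>(key) % buckets_.size();
    }

    std::vector<std::list<std::pair<int, std::string>>> buckets_;
};
```

The hash function only selects the bucket. The equality comparison inside the bucket is what tells conflicting keys apart.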

The hash table implementation therefore depends on two things supplied by the user: the hash function and the comparison function. These are exactly the two parameters we need to specify when using hash_map.

       

2 Using hash_map

2.1 A simple example

    #include <hash_map>
    #include <string>
    using namespace std;
    int main() {
        hash_map<int, string> mymap;
        mymap[9527] = "Tang bohu: Qiu Xiang";
        mymap[1000000] = "Life of a millionaire";
        mymap[10000] = "white-collar salary bottom line";
        ...
        if (mymap.find(10000) != mymap.end()) {
            ...
        }
    }

As with map, if you do not specify a hash function or a comparison function, defaults are used. The declaration of hash_map makes this clearer. In SGI STL it is declared as follows:

       

    template <class _Key, class _Tp, class _HashFcn = hash<_Key>,
              class _EqualKey = equal_to<_Key>,
              class _Alloc = __STL_DEFAULT_ALLOCATOR(_Tp)>
    class hash_map
    {
        ...
    };

       

2.2 Hash functions of hash_map

Here is what the default hash<int> looks like:

    struct hash<int> {
        size_t operator()(int __x) const { return __x; }
    };

It turns out to be a function object. SGI STL provides the following hash functions:

       

    struct hash<char*>
    struct hash<const char*>
    struct hash<char>
    struct hash<unsigned char>
    struct hash<signed char>
    struct hash<short>
    struct hash<unsigned short>
    struct hash<int>
    struct hash<unsigned int>
    struct hash<long>
    struct hash<unsigned long>

That is, if your key is one of the types above, you can use the default hash function. You can also define your own hash function, and for custom types you have to. For example, string is not in the list above, so you must define a hash function for it:

       

    struct str_hash {
        size_t operator()(const string& str) const
        {
            unsigned long __h = 0;
            for (size_t i = 0; i < str.size(); i++)
                __h = 5 * __h + str[i];
            return size_t(__h);
        }
    };
    // If you want to use the string hash function defined internally by SGI STL,
    // you can write this instead:
    struct str_hash {
        size_t operator()(const string& str) const
        {
            return __stl_hash_string(str.c_str());
        }
    };

Keep the following points in mind when declaring your own hash function:

    1. Use a struct and overload operator().
    2. The return type is size_t.
    3. The parameter is the type of the key you want to hash.
    4. The function is const.
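Putting these points together, here is a self-contained sketch of a string hash functor plugged into a container. It uses `std::unordered_map`, the hash container later added to the C++ standard, as a stand-in for hash_map so that it compiles without vendor extensions. The names `StrHash` and `NameAges` are made up for the example, and the cast to `unsigned char` is an addition that keeps the result independent of whether `char` is signed.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// A struct with an overloaded, const operator() that takes the key
// type and returns size_t, following the four points above.
struct StrHash {
    std::size_t operator()(const std::string& str) const {
        unsigned long h = 0;
        for (std::size_t i = 0; i < str.size(); ++i)
            h = 5 * h + static_cast<unsigned char>(str[i]);
        return static_cast<std::size_t>(h);
    }
};

// The functor is passed as the third template argument.
using NameAges = std::unordered_map<std::string, int, StrHash>;
```

For example, after `NameAges ages; ages["tom"] = 30;` a lookup of `"tom"` goes through `StrHash` to find its bucket.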

         

2.3 Comparison functions of hash_map

map requires a less-than comparison function; if none is specified, the default is less<key>. hash_map instead has to test whether the data in a bucket is equal to the key, so it needs an equality function: equal_to<key>. Here is the equal_to source code:

         

    // This code is from SGI STL.
    // First, the declaration of binary_function. It only defines some types.
    template <class _Arg1, class _Arg2, class _Result>
    struct binary_function {
        typedef _Arg1 first_argument_type;
        typedef _Arg2 second_argument_type;
        typedef _Result result_type;
    };
    // The definition of equal_to:
    template <class _Tp>
    struct equal_to : public binary_function<_Tp, _Tp, bool>
    {
        bool operator()(const _Tp& __x, const _Tp& __y) const { return __x == __y; }
    };

If you use a custom data type, such as struct mystruct or const char*, how do you supply a comparison function? There are two ways. The first is to overload the == operator and rely on equal_to. For example:

         

    struct mystruct {
        int iID;
        int len;
        bool operator==(const mystruct& my) const {
            return (iID == my.iID) && (len == my.len);
        }
    };
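To show the overloaded == being picked up by the default equality functor, here is a sketch using `std::unordered_map` as a standard stand-in for hash_map. The names `MyStruct`, `MyStructHash`, and `StructMap` are made up for the example, and the way the two fields are combined into one hash value is only one simple choice among many.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

struct MyStruct {
    int id;
    int len;
    bool operator==(const MyStruct& other) const {
        return id == other.id && len == other.len;
    }
};

// Hypothetical hash functor: combines the hashes of both fields.
struct MyStructHash {
    std::size_t operator()(const MyStruct& s) const {
        return std::hash<int>()(s.id) * 31u + std::hash<int>()(s.len);
    }
};

// No equality argument is given, so std::equal_to<MyStruct> is used,
// and it calls MyStruct::operator==.
using StructMap = std::unordered_map<MyStruct, std::string, MyStructHash>;
```

Two keys are treated as the same entry only when both fields match, because that is how operator== is written.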

You can then use equal_to<mystruct> as the comparison function. The other way is to use a function object. Define your own comparison functor:

         

    struct compare_str {
        bool operator()(const char* p1, const char* p2) const {
            return strcmp(p1, p2) == 0;
        }
    };

With compare_str, you can declare the hash_map:

         

    typedef hash_map<const char*, string, hash<const char*>, compare_str> StrIntMap;
    StrIntMap namemap;
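A standard-library version of the same idea is sketched below. One difference matters here: `std::hash<const char*>` hashes the pointer value rather than the characters, so with `std::unordered_map` a content-based hash functor has to be supplied as well. `CStrHash`, `CStrEqual`, and `CStrMap` are hypothetical names chosen for the example.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <unordered_map>

// Hash the characters of the C string, not the address it is stored at.
struct CStrHash {
    std::size_t operator()(const char* s) const {
        std::size_t h = 0;
        for (; *s; ++s) h = 5 * h + static_cast<unsigned char>(*s);
        return h;
    }
};

// Two keys are equal when the strings they point to have equal contents.
struct CStrEqual {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) == 0;
    }
};

using CStrMap =
    std::unordered_map<const char*, std::string, CStrHash, CStrEqual>;
```

With both functors in place, a lookup succeeds for any buffer holding the same text as a stored key, even if it lives at a different address. The container stores only the pointer, so the key string must outlive the entry.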

2.4 hash_map functions

hash_map's functions are similar to map's. The most important ones are described below.

    1. hash_map(size_type n). Set this parameter when efficiency matters. n sets the number of hash buckets in the container. The more buckets there are, the lower the probability of hash conflicts and the less likely the container has to reallocate memory. A larger n gives higher efficiency but consumes more memory.
    2. const_iterator find(const key_type& k) const. Looks up a key. The input is the key and the return value is an iterator.
    3. data_type& operator[](const key_type& k). This is the function I use most. It is convenient because the container can be used like an array. However, if the container has no element with the given key, using [key] automatically adds one. So when you only want to know whether the container has an element with that key, use find. If you want to insert the element, you can use the [] operator directly.
    4. insert. When the container does not contain the key, insert behaves much like the [] operator. As more elements are added, each bucket holds more of them. To stay efficient, hash_map automatically allocates more memory to create more buckets. Therefore, iterators obtained before an insert may no longer be valid afterwards.
    5. erase. During insertion, when buckets hold too many elements, hash_map may automatically grow the container's memory. In SGI STL, however, erase does not automatically release memory. Therefore, after you call erase, iterators to other elements remain valid.
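The behaviour of the bucket-count constructor, find, and operator[] described in the list can be checked with the sketch below. It uses `std::unordered_map`, whose interface is close to hash_map's for these three members. The helper names are made up for the example.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Constructs a map with a requested bucket count and reports how many
// buckets the implementation actually allocated (at least n).
std::size_t buckets_when_presized(std::size_t n) {
    std::unordered_map<int, std::string> m(n);
    return m.bucket_count();
}

// find() only inspects the container and never inserts anything.
bool contains_key(const std::unordered_map<int, std::string>& m, int key) {
    return m.find(key) != m.end();
}

// operator[] inserts a default-constructed value when the key is absent.
// The map is taken by value so that the caller's map stays unchanged.
std::size_t size_after_subscript(std::unordered_map<int, std::string> m,
                                 int key) {
    m[key];  // adds (key, "") if key was not present
    return m.size();
}
```

For a map holding only key 1, `size_after_subscript(m, 2)` returns 2 because the subscript created a new element, while `contains_key(m, 2)` leaves the map untouched.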

           

3 Related hash containers

Besides hash_map, the hash containers also include hash_set, hash_multimap, and hash_multiset. They are used in the same way as set, multimap, and multiset.
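As a rough illustration of the sibling containers, the sketch below uses the standard counterparts `std::unordered_set` and `std::unordered_multimap` in place of hash_set and hash_multimap. The function names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Set semantics: duplicates collapse, so the size is the number of
// distinct values among the n inputs.
std::size_t count_distinct(const int* values, std::size_t n) {
    std::unordered_set<int> seen(values, values + n);
    return seen.size();
}

// Multimap semantics: the same key may be stored several times.
std::size_t entries_for_key(
    const std::unordered_multimap<std::string, int>& mm,
    const std::string& key) {
    return mm.count(key);
}
```

The calls mirror those of set and multimap. Only the underlying storage (hash table instead of tree) differs.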

           

4 Other topics

The following questions should help you understand and use hash_map.

4.1 What is the difference between hash_map and map?

    • Constructor. hash_map requires a hash function and an equality function; map needs only a comparison function (less-than).
    • Storage structure. hash_map uses a hash table, while map generally uses a red-black tree (RB tree). Their in-memory data structures are therefore different.
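One visible consequence of the different storage structures is iteration order. The sketch below compares `std::map` with `std::unordered_map` (used here as a standard stand-in for hash_map). The helper `keys_of` is a made-up name.

```cpp
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Collects the keys of any map-like container in iteration order.
template <class Map>
std::vector<int> keys_of(const Map& m) {
    std::vector<int> keys;
    for (const auto& kv : m) keys.push_back(kv.first);
    return keys;
}
```

For a `std::map` built from the keys 3, 1, 2, `keys_of` returns them sorted as 1, 2, 3, because the tree orders elements with the less-than comparison. For the hash-based container the same three keys come back in an unspecified order that depends on the hash function and bucket layout.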

           

4.2 When should I use hash_map, and when map?

In general, hash_map lookups are faster than map lookups, and the lookup speed is essentially independent of the data size (constant time), whereas map lookups take O(log n) time. A constant is not necessarily smaller than log n, though, and computing the hash function also takes time. So weigh it up: if efficiency matters, especially once the number of elements reaches a certain order of magnitude, consider hash_map. But if your program is strict about memory usage and must consume as little memory as possible, be careful: hash_map may cause you trouble, especially when you have many hash_map objects, where memory becomes hard to control. In addition, hash_map is slow to construct.

Do you know how to choose now? Weigh three factors: lookup speed, data volume, and memory usage.

           

4.3 How do I use a custom type as a hash_map key?

          You only need to do two things: Define the hash function and define the comparison function. The following code is an example:

           

    -bash-2.05b$ cat my.cpp
    #include <hash_map>
    #include <string>
    #include <iostream>

    using namespace std;
    // Define the class
    class classa {
    public:
        classa(int a) : c_a(a) {}
        int getvalue() const { return c_a; }
        void setvalue(int a) { c_a = a; }
    private:
        int c_a;
    };

    // 1 Define the hash function
    struct hash_a {
        size_t operator()(const class classa& a) const {
            // return hash<int>(classa.getvalue());
            return a.getvalue();
        }
    };

    // 2 Define the equality function
    struct equal_a {
        bool operator()(const class classa& a1, const class classa& a2) const {
            return a1.getvalue() == a2.getvalue();
        }
    };

    int main()
    {
        hash_map<classa, string, hash_a, equal_a> hmap;
        classa a1(12);
        hmap[a1] = "I am 12";
        classa a2(198877);
        hmap[a2] = "I am 198877";

        cout << hmap[a1] << endl;
        cout << hmap[a2] << endl;
        return 0;
    }
    -bash-2.05b$ make my
    c++ -O -pipe -march=pentiumpro my.cpp -o my
    -bash-2.05b$ ./my
    I am 12
    I am 198877

           

4.4 Why is hash_map not standard?

I do not know exactly why. One explanation I have seen is that when the STL was added to standard C++, the hash_map family was not yet fully implemented, and that it will become standard in the future. If anyone knows a better explanation, please tell me. The point I want to make is that precisely because hash_map is not standard, the g++ compiler installed on many platforms does not necessarily provide hash_map; I have run into such a case. So test non-standard libraries like this in advance. It is also better to keep portability in mind and use them sparingly.

           

4.5 Performance comparison with map

          When the key is int, the insertion and query performance is as follows (x-axis represents the data size, Y-axis represents the time, in microseconds)

           

          When the key is string, the insertion and query performance is as follows:

           

Conclusion: when the data size is small (around 100 elements), hash_map and map take almost the same time. As the data size grows, the gap between the two widens, and it becomes pronounced once the data volume reaches the millions.
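Readers who want to reproduce such a comparison on their own machine could start from a sketch like the following. It times successful lookups in any map-like container, using `std::map` and `std::unordered_map` (the latter as a standard stand-in for hash_map). The names `LookupResult` and `time_lookups` are made up, and the numbers it produces depend heavily on the compiler, optimization flags, and hardware, so no results are claimed here.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <unordered_map>

struct LookupResult {
    std::size_t hits;        // number of keys found
    long long microseconds;  // wall-clock time spent in the lookup loop
};

// Fills a Map with the keys 0..n-1, then times n successful lookups.
template <class Map>
LookupResult time_lookups(std::size_t n) {
    Map m;
    for (std::size_t i = 0; i < n; ++i)
        m[static_cast<int>(i)] = static_cast<int>(i);

    const auto start = std::chrono::steady_clock::now();
    std::size_t hits = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (m.find(static_cast<int>(i)) != m.end()) ++hits;
    const auto stop = std::chrono::steady_clock::now();

    return {hits, std::chrono::duration_cast<std::chrono::microseconds>(
                      stop - start).count()};
}
```

Calling `time_lookups<std::map<int, int>>(n)` and `time_lookups<std::unordered_map<int, int>>(n)` for growing n gives one time per container and size, which can then be plotted against the data size.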
