Hash algorithm (HashMap implementation principle)

Source: Internet
Author: User

Hashing takes an input of arbitrary length (also called the pre-image) and, through a hash algorithm, transforms it into a fixed-length output called the hash value. The conversion is a compressing map: the space of hash values is usually much smaller than the space of inputs, so different inputs may hash to the same output, and the input cannot be uniquely determined from its hash value. Put simply, a hash function compresses a message of any length into a message digest of fixed length.

Hashing is widely used in information security, where it turns messages of different lengths into a seemingly scrambled fixed-length code, for example 128 bits; these codes are called hash values. Hashing can also be described as finding a mapping between a piece of data and the address where it is stored.
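To make the "fixed-length output" point concrete, here is a minimal Java sketch using the standard `java.security.MessageDigest` class. The class name `DigestLengthDemo` and the helper `md5` are made up for this illustration, and MD5 is chosen only because it matches the 128-bit figure above; it is no longer considered secure.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestLengthDemo {
    // Returns the MD5 digest of the text. MD5 always yields 16 bytes (128 bits),
    // no matter how long the input is.
    static byte[] md5(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return md.digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Not expected on a standard JDK; rethrown unchecked to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5("a").length * 8);                           // prints 128
        System.out.println(md5("a much longer input message").length * 8); // prints 128
    }
}
```

Both inputs produce a digest of the same size, which is exactly the compression property described above.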

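For example, a simple hash of the string "Hello" could be computed like this in C:

    const char *value = "Hello";
    /* unsigned arithmetic, so overflow wraps around instead of being undefined */
    unsigned int key = (((((((27u * 'H' + 27u) * 'e') + 27u) * 'l') + 27u) * 'l' + 27u) * 27u) + 'o';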
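The same multiply-and-add idea can be written as a loop. The following Java sketch is an illustration, not the formula above: the class `PolyHash` and its `multiplier` parameter are hypothetical names. With a multiplier of 31 it produces the same value as `String.hashCode()`, whose documented formula has this shape.

```java
class PolyHash {
    // Multiply-and-add hash: h = h * multiplier + c for each character c.
    // Java int arithmetic wraps around on overflow, which is harmless for hashing.
    static int hash(String s, int multiplier) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = h * multiplier + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hash("Hello", 31) == "Hello".hashCode()); // prints true
    }
}
```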

Arrays are easy to index but make insertion and deletion difficult; linked lists are the opposite: hard to index, but easy to insert into and delete from. Can we combine the strengths of both into a data structure that is easy to index and also easy to insert into and delete from? The answer is yes: this is the hash table. Hash tables can be implemented in several ways; the most common one, explained here, is the zipper method (also known as separate chaining), which can be understood as an "array of linked lists".


At its core, HashMap is a linear array; that is, the container in which the data is stored is a linear array. This may seem puzzling: how can a linear array store and retrieve data by key-value pairs? HashMap does some extra work to make this possible.

1. HashMap implements a static nested class, Entry, whose important fields are key, value, and next. From the key and value fields it is clear that Entry is the basic bean that holds one key-value pair of the HashMap. The linear array mentioned above is an Entry[], and the contents of the map are stored in this Entry[].

2. Since this is a linear array, how is random access by key possible? HashMap uses a small algorithm, which roughly works as follows:

Java code
    // When storing (table is the Entry[] array):
    int hash = key.hashCode();  // hashCode() is not detailed here; it is enough to know that each key's hash is a fixed int value
    int index = (hash & 0x7fffffff) % table.length;  // the mask clears the sign bit so the index is never negative
    table[index] = entry;
    // When retrieving:
    int hash = key.hashCode();
    int index = (hash & 0x7fffffff) % table.length;
    return table[index];

This is the basic principle by which HashMap stores and retrieves key-value pairs.
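One detail worth spelling out: `hashCode()` may return a negative int, and in Java the `%` operator keeps the sign of the dividend, so a plain `hash % length` can produce a negative, invalid array index. The sketch below uses the standard `Math.floorMod` to stay within range. The class `BucketIndex` and method `indexFor` are names assumed for this illustration; this is not how `java.util.HashMap` itself is written, which keeps a power-of-two table length and masks the hash bits.

```java
class BucketIndex {
    // Maps any int hash, including negative ones, to an index in [0, length).
    static int indexFor(int hash, int length) {
        return Math.floorMod(hash, length);
    }

    public static void main(String[] args) {
        System.out.println(-7 % 16);          // prints -7, not usable as an array index
        System.out.println(indexFor(-7, 16)); // prints 9
        System.out.println(indexFor(35, 16)); // prints 3
    }
}
```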

3. Question: if two keys yield the same index (hash % length of the Entry[] array), is there a risk that one overwrites the other?

Here HashMap uses a chained data structure. As mentioned above, the Entry class has a next field that points to the next Entry. Suppose the first key-value pair, A, arrives and the hash of its key gives index = 0; we record Entry[0] = A. Later a key-value pair B arrives whose index is also 0. What now? HashMap does this: B.next = A, Entry[0] = B. If C arrives with index 0 as well, then C.next = B, Entry[0] = C. The slot at index 0 therefore holds all three key-value pairs A, B, and C, linked through the next field, so nothing is overwritten.
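The A, B, C walkthrough above can be sketched as a small chained map that inserts new pairs at the head of the chain. This is a minimal illustration under assumed names (`ChainedMap`, `Node`, `chainLength`), with a fixed capacity and no resizing; it is not the JDK source.

```java
import java.util.Objects;

class ChainedMap<K, V> {
    // One key-value pair plus a link to the next pair in the same bucket.
    private static class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node<K, V>[] table;

    @SuppressWarnings("unchecked")
    ChainedMap(int capacity) {
        table = (Node<K, V>[]) new Node[capacity];
    }

    private int indexFor(Object key) {
        return Math.floorMod(Objects.hashCode(key), table.length);
    }

    void put(K key, V value) {
        int i = indexFor(key);
        // If the key is already in the chain, only its value is replaced.
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                n.value = value;
                return;
            }
        }
        // Head insertion: the new node points at the old head (B.next = A, table[0] = B).
        table[i] = new Node<>(key, value, table[i]);
    }

    V get(Object key) {
        for (Node<K, V> n = table[indexFor(key)]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                return n.value;
            }
        }
        return null;
    }

    // Number of pairs stored in one bucket; only here to make the chain visible.
    int chainLength(int index) {
        int count = 0;
        for (Node<K, V> n = table[index]; n != null; n = n.next) {
            count++;
        }
        return count;
    }
}
```

With a capacity of 4, the Integer keys 0, 4, and 8 all land in bucket 0, so `chainLength(0)` reports 3 while each key can still be looked up individually.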

By now the general implementation of HashMap should be clear.

Of course, HashMap also includes some optimizations, which are briefly mentioned here.

For example: the length of Entry[] is fixed, so as more and more data is put into the map, the chain at a given index grows very long. Won't that hurt performance?

HashMap has a load factor for this: as the map grows, Entry[] is enlarged according to a fixed rule. In java.util.HashMap the default load factor is 0.75, and the array is doubled once the number of entries exceeds capacity × load factor.
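The growth rule can be sketched as follows. This is a simplified illustration under assumed names (`GrowingMap`, `capacity`, `resize`): once the number of entries exceeds capacity × load factor, the array is doubled and every entry is moved to its new bucket. The real JDK implementation differs in its details.

```java
import java.util.Objects;

class GrowingMap<K, V> {
    private static class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private Node<K, V>[] table;
    private int size;
    private final float loadFactor;

    @SuppressWarnings("unchecked")
    GrowingMap(int capacity, float loadFactor) {
        this.table = (Node<K, V>[]) new Node[capacity];
        this.loadFactor = loadFactor;
    }

    int capacity() {
        return table.length;
    }

    void put(K key, V value) {
        int i = Math.floorMod(Objects.hashCode(key), table.length);
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                n.value = value;
                return;
            }
        }
        table[i] = new Node<>(key, value, table[i]);
        // Grow once the entry count exceeds capacity * loadFactor.
        if (++size > table.length * loadFactor) {
            resize();
        }
    }

    V get(Object key) {
        int i = Math.floorMod(Objects.hashCode(key), table.length);
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                return n.value;
            }
        }
        return null;
    }

    // Doubles the array and relinks every node into its bucket in the new array.
    @SuppressWarnings("unchecked")
    private void resize() {
        Node<K, V>[] old = table;
        table = (Node<K, V>[]) new Node[old.length * 2];
        for (Node<K, V> head : old) {
            while (head != null) {
                Node<K, V> next = head.next;
                int i = Math.floorMod(Objects.hashCode(head.key), table.length);
                head.next = table[i];
                table[i] = head;
                head = next;
            }
        }
    }
}
```

Starting from a capacity of 4 and a load factor of 0.75, the fourth insertion pushes the entry count past 3 and the array grows to 8, which keeps the chains short.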

Solutions to hash collisions

1) Open addressing (linear probing, quadratic probing, pseudo-random probing)

2) Rehashing (applying another hash function)

3) Chaining (the chain address method)

4) A shared overflow area

Java's HashMap resolves collisions with chaining.
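For contrast with chaining, here is a minimal sketch of the first option in the list, open addressing with linear probing: on a collision, the next slots of the same array are tried in order. The class `LinearProbingTable` is hypothetical, has a fixed capacity, and does not support deletion; it only illustrates the probing idea.

```java
class LinearProbingTable {
    private final String[] keys;
    private final int[] values;

    LinearProbingTable(int capacity) {
        keys = new String[capacity];
        values = new int[capacity];
    }

    // Stores the pair in the first free slot at or after the home index.
    // Returns false if every slot is occupied by other keys.
    boolean put(String key, int value) {
        int start = Math.floorMod(key.hashCode(), keys.length);
        for (int step = 0; step < keys.length; step++) {
            int i = (start + step) % keys.length;
            if (keys[i] == null || keys[i].equals(key)) {
                keys[i] = key;
                values[i] = value;
                return true;
            }
        }
        return false;
    }

    // Probes from the home index until the key or an empty slot is found.
    Integer get(String key) {
        int start = Math.floorMod(key.hashCode(), keys.length);
        for (int step = 0; step < keys.length; step++) {
            int i = (start + step) % keys.length;
            if (keys[i] == null) {
                return null;
            }
            if (keys[i].equals(key)) {
                return values[i];
            }
        }
        return null;
    }
}
```

The strings "Aa" and "BB" have the same `hashCode()`, so they share a home index; the second one simply moves on to the next slot.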
