In-depth Java Collection Learning Series: HashMap's Implementation Principle (repost)


Original from: http://www.cnblogs.com/xwdreamer/archive/2012/06/03/2532832.html

1. HashMap Overview:

HashMap is a non-synchronized, hash-table-based implementation of the Map interface (Hashtable is similar to HashMap; the main difference is that Hashtable's methods are synchronized, i.e., thread-safe). This implementation provides all of the optional map operations and permits null values and the null key. The class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
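
As a quick, hypothetical illustration of these properties (the class name and values are made up for the example), the following snippet shows a null key, a null value, value replacement for a duplicate key, and the lack of any ordering guarantee:

import java.util.HashMap;
import java.util.Map;

public class HashMapBasics {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<String, Integer>();
        map.put("a", 1);
        map.put("b", null);   // null values are allowed
        map.put(null, 3);     // one null key is allowed
        map.put("a", 10);     // same key: the old value 1 is replaced (put() returns it)

        System.out.println(map.get("a"));   // 10
        System.out.println(map.get(null));  // 3
        // Iteration order is unspecified and may even change over time (e.g., after a resize).
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}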

2. HashMap Data Structure:

In the Java programming language there are two most basic structures: the array and the reference (an analogue of a pointer). All other data structures can be built from these two, and HashMap is no exception. HashMap is in fact an "array of linked lists": each slot of the array holds the head node of a linked list, so it is a combination of array and linked list.

As you can see, the underlying structure of HashMap is an array, and each slot of the array is a linked list. When a new HashMap is created, this array is initialized. The source code looks roughly as follows:

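The collapsed "View Code" block from the original page is lost here. As a stand-in, the following is a sketch of the JDK 6-era default constructor and Entry class, reconstructed from memory (names such as table, threshold, and DEFAULT_INITIAL_CAPACITY follow that source, but the sketch may differ from the exact JDK code):

// Default constructor: allocate the table with the default capacity of 16.
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR;                               // 0.75f
    threshold = (int) (DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);  // 16 * 0.75 = 12
    table = new Entry[DEFAULT_INITIAL_CAPACITY];                         // Entry[16]
    init();
}

// Each array slot holds the head of a singly linked list of Entry nodes.
static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;   // reference to the next Entry in the same bucket
    final int hash;

    Entry(int h, K k, V v, Entry<K,V> n) {
        value = v;
        next = n;
        key = k;
        hash = h;
    }
    // getKey(), getValue(), setValue(), equals(), hashCode() omitted
}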

As you can see, Entry is the element type of the array. Each Map.Entry is in fact a key-value pair, and it holds a reference to the next element; that reference is what forms the linked list.

3. HashMap Access Implementation:

1) Storage (put):

public V put(K key, V value) {
    // HashMap allows null keys and null values.
    // When key is null, putForNullKey() is called and the value is put in the first bucket of the array.
    if (key == null)
        return putForNullKey(value);
    // Recompute the hash based on the key's hashCode().
    int hash = hash(key.hashCode());
    // Search for the index in the table that corresponds to the hash value.
    int i = indexFor(hash, table.length);
    // If the Entry at index i is not null, keep traversing the chain via e.next.
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    // The Entry at index i is null, so there is no existing entry for this key.
    // modCount records the number of structural modifications of the HashMap.
    modCount++;
    // Add the key-value pair at index i.
    addEntry(hash, key, value, i);
    return null;
}

As the source above shows: when we put an element into the HashMap, we first recompute a hash from the key's hashCode(), and that hash determines the element's position (index) in the array. If other elements are already stored at that index, they are kept there in the form of a linked list, with the newly added element placed at the head of the chain and the earliest elements toward the tail. If no element is stored at that index, the element is placed directly at that position in the array.

The addEntry(hash, key, value, i) method places the key-value pair at index i of the table array, based on the computed hash value. addEntry is a package-private method of HashMap (that is, it carries none of the public, protected, or private modifiers, which gives it the default package-level access). The code is roughly as follows:

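The collapsed code block is lost; below is a sketch of the JDK 6-era addEntry(), reconstructed from memory rather than copied verbatim:

void addEntry(int hash, K key, V value, int bucketIndex) {
    // Take the current head of the chain at this bucket...
    Entry<K,V> e = table[bucketIndex];
    // ...and make the new Entry the new head, pointing at the old head.
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    // If the number of entries has reached the threshold, double the table size.
    if (size++ >= threshold)
        resize(2 * table.length);
}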

When the system decides where to store a key-value pair in the HashMap, it does not consider the value in the Entry at all; it computes and decides the storage location of each Entry based solely on the key. We can think of a Map's values as attachments to their keys: once the system has decided where a key is stored, its value is simply stored along with it.

The hash(int h) method recomputes a hash from the key's hashCode(). The algorithm mixes the high-order bits into the calculation, so that keys whose hash codes differ only in the high bits do not all collide in the low bits used for indexing.

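The collapsed code block is lost; to the best of my recollection the JDK 6-era supplemental hash function looks like this (treat it as a sketch, not the verbatim JDK source):

static int hash(int h) {
    // Spread the higher bits of h into the lower bits, so that keys whose
    // hashCode() values differ only in the high bits still land in different buckets.
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}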

We can see that to find an element in a HashMap, we need to obtain its position in the array from the hash of its key; how that position is computed is the hash algorithm. As said earlier, HashMap's data structure is a combination of an array and linked lists, so we naturally want the elements to be spread across the HashMap as evenly as possible, ideally with only one element per slot. Then, when we use the hash algorithm to obtain a position, we immediately know that the element at that position is the one we want, without having to traverse a linked list, which greatly improves query efficiency.

For any given object, as long as its hashCode() return value is the same, the hash code computed by hash(int h) is always the same. The first idea that comes to mind is to take this hash value modulo the array length, which would distribute elements fairly evenly. However, the modulo operation is relatively expensive, so HashMap does it differently: it calls the indexFor(int h, int length) method to compute which index of the table array the object should be stored at. The code of indexFor(int h, int length) is as follows:

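The collapsed code block is lost; the JDK 6-era indexFor() is essentially a one-liner (sketch from memory):

static int indexFor(int h, int length) {
    // Valid only because length is always a power of two:
    // h & (length - 1) is then equivalent to h % length, but cheaper.
    return h & (length - 1);
}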

This method is very clever: it obtains the object's bucket index via h & (table.length - 1), and the length of HashMap's underlying array is always a power of two, which is a speed optimization in HashMap. The following code appears in the HashMap constructor:

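The collapsed code block is lost; the capacity-rounding loop in the JDK 6-era constructor reads approximately as follows (sketch from memory):

// Find a power of 2 >= initialCapacity
int capacity = 1;
while (capacity < initialCapacity)
    capacity <<= 1;

table = new Entry[capacity];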

This code guarantees that the capacity of the HashMap is always a power of two at initialization, i.e., the length of the underlying array is always 2 to the n-th power.

When length is always a power of two, h & (length - 1) is equivalent to taking h modulo length, i.e., h % length, but & is more efficient than %.

This looks very simple, but is actually quite subtle. Let us illustrate with an example:

Assume the array length is 15 or 16, and the (already re-hashed) hash codes are 8 and 9. The results of the & operation are then as follows:

h & (table.length - 1)       hash      table.length - 1      result

8 & (15 - 1):                0100   &  1110              =   0100
9 & (15 - 1):                0101   &  1110              =   0100
-------------------------------------------------------------------
8 & (16 - 1):                0100   &  1111              =   0100
9 & (16 - 1):                0101   &  1111              =   0101

As the example shows: when 8 and 9 are ANDed with (15 - 1) = 1110 (binary), they produce the same result, 0100, which means they map to the same position in the array. That is a collision: 8 and 9 end up in the same slot and form a linked list, so a query has to traverse that list to fetch 8 or 9, which reduces query efficiency. At the same time, we can see that when the array length is 15, ANDing any hash value with (15 - 1) = 1110 makes the last bit always 0, so the slots 0001, 0011, 0101, 0111, 1001, 1011, and 1101 can never hold an element: the waste of space is considerable, and worse, the number of usable slots is much smaller than the array length, which further increases the probability of collisions and slows down queries!

When the array length is 16, i.e., a power of two, every bit of 2^n - 1 is 1 (for example, (2^4 - 1) in binary is 1111), so the & operation simply keeps the low bits of the original hash. Combined with the hash(int h) method, which further mixes the key's hashCode() by bringing the high bits into the calculation, only two values with the same hash end up in the same array slot and form a linked list.

So when the array length is a power of two, the probability that different keys compute the same index is smaller, the data is distributed more evenly across the array, and collisions are less likely; consequently, queries rarely have to traverse a linked list at a slot, and query efficiency is higher.

According to the source of the put method above, when the program tries to put a key-value pair into the HashMap, it first determines the Entry's storage location based on the key's hashCode() return value: if two keys have the same hashCode() return value, their entries are stored at the same location. If those two keys also return true from equals(), the value of the newly added Entry overwrites the value of the existing Entry, but the key is not replaced. If the two keys return false from equals(), the newly added Entry forms a chain with the existing Entry, with the newly added Entry at the head of the chain; see the description of the addEntry() method above for the details.

2) Reading (get):

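The collapsed get() source is lost; below is a sketch of the JDK 6-era implementation, reconstructed from memory:

public V get(Object key) {
    // The null key is handled separately (conceptually in bucket 0).
    if (key == null)
        return getForNullKey();
    // Recompute the hash, locate the bucket, then walk the chain comparing keys with equals().
    int hash = hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}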

With the storage hash algorithm above as background, this code is easy to understand. As the source shows: to get an element from the HashMap, first compute the hash of the key and find the corresponding slot in the array, then use the key's equals() method to locate the desired element in the linked list at that slot.

3) Summary: simply put, HashMap treats each key-value pair as a whole, and that whole is an Entry object. HashMap uses an Entry[] array to hold all key-value pairs. When an Entry needs to be stored, the hash algorithm decides its slot in the array and the equals() method decides its place in the linked list at that slot; when an Entry needs to be retrieved, the hash algorithm again locates the slot in the array and the equals() method picks the Entry out of the linked list at that slot.

4. HashMap's Resize (rehash):

As more and more elements are added to the HashMap, the probability of hash collisions rises, because the length of the array is fixed. Therefore, to keep queries efficient, the HashMap's array has to be expanded. Array expansion also appears in ArrayList; it is a common operation. After the HashMap's array is expanded, the most performance-costly step appears: every element in the old array must have its position in the new array recomputed and be placed there. This is what resize (rehash) means.

So when does HashMap expand? When the number of elements in the HashMap exceeds array size * loadFactor, the array is expanded. The default value of loadFactor is 0.75, a compromise value. That is, by default, with an array size of 16, once the number of elements exceeds 16 * 0.75 = 12 (this value is the threshold field in the code, also called the critical value), the array size is expanded to 2 * 16 = 32, i.e., doubled, and the position of every element in the array is then recomputed. This is a very costly operation, so if we can predict the number of elements the HashMap will hold, presetting the capacity can noticeably improve HashMap's performance, as the sketch below shows.
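
To make the pre-sizing advice concrete, here is a minimal sketch (the class name, key/value types, and the expected size of 1000 are made up for illustration): the initial capacity is chosen so that expectedSize never exceeds capacity * loadFactor, so filling the map triggers no resize.

import java.util.HashMap;
import java.util.Map;

public class PresizedHashMap {
    public static void main(String[] args) {
        int expectedSize = 1000;
        // Ask for expectedSize / 0.75 (+1) so that threshold = capacity * 0.75 >= expectedSize.
        // HashMap rounds the requested capacity up to the next power of two internally.
        Map<Integer, String> map = new HashMap<Integer, String>((int) (expectedSize / 0.75f) + 1);
        for (int i = 0; i < expectedSize; i++) {
            map.put(i, "value-" + i);   // no resize happens during this loop
        }
        System.out.println(map.size());
    }
}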

The code for HashMap expansion is roughly as follows:
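
The collapsed code block is lost; the following is a reconstruction from memory of the JDK 6-era resize() and transfer(), to be read as a sketch rather than the verbatim JDK source:

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable);                        // re-hash every entry into the new array
    table = newTable;
    threshold = (int) (newCapacity * loadFactor);
}

// Move all entries from the current table into newTable, recomputing each bucket index.
void transfer(Entry[] newTable) {
    Entry[] src = table;
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) {
        Entry<K,V> e = src[j];
        if (e != null) {
            src[j] = null;
            do {
                Entry<K,V> next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];          // insert at the head of the new chain
                newTable[i] = e;
                e = next;
            } while (e != null);
        }
    }
}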

5. Performance Parameters of HashMap:

HashMap provides the following constructors:

    1. HashMap(): constructs a HashMap with an initial capacity of 16 and a load factor of 0.75.
    2. HashMap(int initialCapacity): constructs a HashMap with an initial capacity of initialCapacity and a load factor of 0.75.
    3. HashMap(int initialCapacity, float loadFactor): constructs a HashMap with the specified initial capacity and the specified load factor.
    4. The base constructor HashMap(int initialCapacity, float loadFactor) has two parameters: the initial capacity initialCapacity and the load factor loadFactor.
    5. initialCapacity: the initial capacity of the HashMap, i.e., the length of the underlying array.
    6. loadFactor: the load factor, defined as: the number of elements actually stored in the hash table (n) / the capacity of the hash table (m).

The load factor measures how full a hash table is allowed to become: the larger the load factor, the fuller the hash table gets before resizing, and vice versa. For a hash table that resolves collisions by chaining, the average time to find an element is O(1 + a), where a is the load factor. So if the load factor is larger, space is used more fully but lookups become slower; if the load factor is too small, the hash table's data will be too sparse and a great deal of space is wasted.

In the HashMap implementation, the maximum number of elements the HashMap can hold before resizing is determined by the threshold field:

threshold = (int) (capacity * loadFactor);

By the definition of the load factor, threshold is the maximum number of elements allowed at this loadFactor and capacity; once it is exceeded, resize() is called to bring the actual load factor back down (in other words, although the array length is capacity, the trigger for expansion is threshold). The default load factor of 0.75 is a balanced choice between space and time efficiency. When the number of elements exceeds this limit, the capacity of the HashMap after resize is twice the old capacity:

if (size++ >= threshold)
    resize(2 * table.length);

6. Fail-fast Mechanism:

We know that java.util.HashMap is not thread-safe, so if other threads modify the map while it is being iterated, a ConcurrentModificationException will be thrown; this is the so-called fail-fast strategy. (It is also mentioned in the Core Java book.)

In the source code, this strategy is implemented through the modCount field. modCount, as the name implies, is the modification count: any structural change to the HashMap's contents increments this value, and when an iterator is initialized the value is copied into the iterator's expectedModCount.

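The collapsed code block is lost; below is a sketch, reconstructed from memory, of the JDK 6-era HashIterator initialization, showing expectedModCount being captured from modCount:

private abstract class HashIterator<E> implements Iterator<E> {
    Entry<K,V> next;        // next entry to return
    int expectedModCount;   // snapshot of modCount, used for the fail-fast check
    int index;              // current bucket index
    Entry<K,V> current;     // current entry

    HashIterator() {
        expectedModCount = modCount;
        if (size > 0) {     // advance to the first non-empty bucket
            Entry[] t = table;
            while (index < t.length && (next = t[index++]) == null)
                ;
        }
    }
}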

During iteration, the iterator checks whether modCount still equals expectedModCount; if they are not equal, some other thread (or code outside the iterator) has modified the map:

final Entry<K,V> nextEntry() {
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();
    ...
}

Note that modCount is declared volatile (in this JDK version), which guarantees that modifications made by one thread are visible to others: writes to a volatile variable are not kept in a thread-local cache but are flushed to main memory. (volatile guarantees visibility, not full thread safety.)

The HashMap API documentation notes:

The iterators returned by all of HashMap's collection view methods are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.

Note that the fail-fast behavior of an iterator cannot be guaranteed; generally speaking, it is impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. It would therefore be wrong to write a program whose correctness depends on this exception: the fail-fast behavior of iterators should be used only to detect bugs.

Resources:

JDK API HashMap

HashMap Source Code

Deep understanding of HashMap

Research on Hash storage mechanism by analyzing JDK source code

Brief analysis of Java.util.HashMap source code points

