Analysis of the implementation principle of HashMap in Java

Source: Internet
Author: User

This article explains how HashMap is implemented in Java. It has some reference value, and I hope it helps readers who need it.

1. HashMap Overview:
HashMap is a hash-table-based, unsynchronized implementation of the Map interface. It provides all of the optional map operations and permits null values and the null key. The class makes no guarantees about the order of the mappings; in particular, it does not guarantee that the order will remain constant over time.
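The null-handling claims above can be checked directly against the standard java.util.HashMap API. This is a small sketch; the class name NullKeyDemo and the sample keys are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

// Exercises the overview claims on the real java.util.HashMap:
// one null key and any number of null values are accepted.
class NullKeyDemo {
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value-for-null-key"); // the null key is permitted
        map.put("k", null);                  // a null value is permitted
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        System.out.println(map.get(null));        // value-for-null-key
        System.out.println(map.containsKey("k")); // true
        System.out.println(map.get("k"));         // null
    }
}
```

Because get returns null both for a missing key and for a key mapped to null, containsKey is the way to tell the two cases apart.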
2. HashMap Data Structure:
In the Java programming language there are two basic structures: arrays and simulated pointers (references). All data structures can be built from these two, and HashMap is no exception. HashMap is in fact a "linked-list hash" structure, that is, a combination of an array and linked lists.
At the bottom layer, HashMap is an array, and each item of the array is a linked list. When a new HashMap is created, this array is initialized.

/**
 * The table, resized as necessary. Length MUST Always be a power of two.
 */
transient Entry[] table;

static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    final int hash;
    ...
}
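To make the "array of linked lists" picture concrete, here is a stripped-down model of the structure above. BucketDemo, Node, add and chainLength are hypothetical names, not JDK code, and add deliberately skips the duplicate-key check that put performs. It relies on the fact that "Aa" and "BB" have the same String hashCode (2112), so they always share a bucket.

```java
// A minimal model of an array whose slots hold singly linked chains of nodes.
// Hypothetical class for illustration; not the JDK implementation.
class BucketDemo {
    static final class Node {
        final int hash;
        final String key;
        String value;
        Node next;

        Node(int hash, String key, String value, Node next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    final Node[] table = new Node[16]; // length is a power of two

    // Links a new node into its bucket (no duplicate-key check here).
    void add(String key, String value) {
        int h = key.hashCode();
        int i = h & (table.length - 1);
        table[i] = new Node(h, key, value, table[i]); // new node becomes the chain head
    }

    int chainLength(int index) {
        int n = 0;
        for (Node e = table[index]; e != null; e = e.next) n++;
        return n;
    }

    public static void main(String[] args) {
        BucketDemo d = new BucketDemo();
        d.add("Aa", "1"); // "Aa".hashCode() == 2112
        d.add("BB", "2"); // "BB".hashCode() == 2112 as well
        int i = "Aa".hashCode() & 15;
        System.out.println(i + " -> " + d.chainLength(i)); // 0 -> 2
    }
}
```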

3. HashMap Access Implementation:
1) Storage:

public V put(K key, V value) {
    // HashMap allows null keys and null values.
    // When key is null, putForNullKey is called and the value goes to the first slot of the array.
    if (key == null)
        return putForNullKey(value);
    // Recompute the hash from the key's hashCode.
    int hash = hash(key.hashCode());
    // Find the table index for this hash.
    int i = indexFor(hash, table.length);
    // Walk the chain at index i; if an equal key is found, replace its value.
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    // No Entry with this key exists at index i.
    modCount++;
    // Add the key-value pair at index i.
    addEntry(hash, key, value, i);
    return null;
}

As the source above shows, when an element is put into a HashMap, a hash is first recomputed from the key's hashCode, and the element's position in the array (its index) is derived from that hash. If other elements are already stored at that position, the elements there are kept as a linked list: the newly added element is placed at the head of the chain, and the earliest added one ends up at the tail. If the array has no element at that position, the element is placed directly at that position.
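The put steps just described can be sketched as a small generic class. This is a simplified model under stated assumptions, not the JDK code: SketchPutMap is a hypothetical name, the table is fixed at 16 slots with no resizing, the supplemental hash() is skipped, and the null key is simply given hash 0 (so it lands in slot 0, matching putForNullKey's use of the first slot).

```java
import java.util.Objects;

// Simplified put(): hash the key, pick a bucket, replace the value on a matching key,
// otherwise insert a new entry at the head of the chain. Hypothetical, no resizing.
class SketchPutMap<K, V> {
    static final class Entry<K, V> {
        final int hash;
        final K key;
        V value;
        Entry<K, V> next;

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16];

    V put(K key, V value) {
        int hash = (key == null) ? 0 : key.hashCode(); // null key goes to bucket 0
        int i = hash & (table.length - 1);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                V old = e.value; // existing key: overwrite the value only
                e.value = value;
                return old;
            }
        }
        table[i] = new Entry<>(hash, key, value, table[i]); // new entry at chain head
        return null;
    }

    public static void main(String[] args) {
        SketchPutMap<String, Integer> m = new SketchPutMap<>();
        System.out.println(m.put("a", 1)); // null: new key
        System.out.println(m.put("a", 2)); // 1: old value returned, value replaced
    }
}
```

Comparing the cached hash before calling equals mirrors the JDK source: the int comparison is cheap and filters out most non-matching entries in a chain.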
The addEntry(hash, key, value, i) method places the key-value pair at index i of the table array, as determined by the computed hash. addEntry is a package-private method of HashMap; its code is as follows:

void addEntry(int hash, K key, V value, int bucketIndex) {
    // Get the Entry currently at bucketIndex
    Entry<K,V> e = table[bucketIndex];
    // Put the new Entry at bucketIndex and make it point to the old Entry
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    // If the number of key-value pairs in the Map exceeds the threshold,
    if (size++ >= threshold)
        // double the length of the table
        resize(2 * table.length);
}

When the system decides where to store a key-value pair in a HashMap, it ignores the value in the Entry entirely and computes the storage location of each Entry from the key alone. The value can be thought of as an attachment to the key: once the key's storage location is determined, the value is stored there with it.

The hash(int h) method recomputes a hash from the key's hashCode. It mixes the high-order bits into the low-order bits, so that hash codes differing only in their high bits do not collide when the index is taken from the low bits alone.

static int hash(int h) {
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
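To see why the mixing matters, the sketch below copies the supplemental hash shown above (shift constants 20, 12, 7 and 4) and compares two hash codes, 0x10000 and 0x20000, that differ only above the low 4 bits. SupplementalHash is an illustrative class name; indexFor is the masking method discussed next in the text.

```java
// The supplemental hash from the text, plus a comparison showing why mixing in
// high bits matters when the index uses only the low bits.
class SupplementalHash {
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000, b = 0x20000; // differ only in bits 16 and 17
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16));             // 0 0
        System.out.println(indexFor(hash(a), 16) + " " + indexFor(hash(b), 16)); // 1 2
    }
}
```

Without the extra hashing both keys would share slot 0 of a 16-slot table; after it they land in different slots.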

To find an element in a HashMap, we compute its position in the array from the hash value of the key; the rule for computing this position is the hash algorithm. Since HashMap combines an array with linked lists, we want its elements spread as evenly as possible, ideally with only one element per position. Then, when the hash algorithm yields a position, we immediately know that the element there is the one we want, without traversing a linked list, which greatly improves lookup efficiency.

For any given object, as long as its hashCode() return value stays the same, the hash computed by hash(int h) is always the same. The first idea that comes to mind is to take this hash modulo the array length, which distributes elements fairly evenly. However, the modulo operation is relatively expensive, so HashMap instead calls indexFor(int h, int length) to compute the index of the table array where the object should be stored. The code of indexFor(int h, int length) is as follows:

static int indexFor(int h, int length) {
    return h & (length-1);
}

This method is clever: it obtains the object's storage index via h & (table.length - 1), and because the length of HashMap's underlying array is always a power of two, this is a speed optimization. The following code is in the HashMap constructor:

int capacity = 1;
while (capacity < initialCapacity)
    capacity <<= 1;

This code guarantees that the capacity of a HashMap at initialization is always a power of two, that is, the length of the underlying array is always 2^n.
When length is a power of two, h & (length - 1) is equivalent to taking h modulo length (exactly h % length for non-negative h; for negative h it yields the non-negative remainder, like Math.floorMod), but & is cheaper than %.
A power-of-two length has a second benefit: length - 1 has all of its low bits set, so every index bit comes from the hash and every slot of the array can be reached. Different keys are then less likely to get the same index, the data is distributed more evenly across the array, and collisions are rarer. As a result, lookups seldom have to traverse the linked list at a position, so query efficiency is higher.
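The three claims in the last few paragraphs can be checked with a small sketch. IndexMath, roundUpToPowerOfTwo and distinctSlots are illustrative names; roundUpToPowerOfTwo wraps the constructor loop shown above and assumes initialCapacity is at most 1 << 30, since the real constructor clamps larger values to MAXIMUM_CAPACITY first. The length-15 case is hypothetical (HashMap never uses it) and only shows what the mask trick would do with a non-power-of-two length.

```java
import java.util.HashSet;
import java.util.Set;

// Checks: the constructor loop rounds up to a power of two, h & (length - 1)
// matches a floor modulo for such lengths, and a non-power-of-two mask wastes slots.
class IndexMath {
    // Constructor loop from the text; assumes initialCapacity <= 1 << 30.
    static int roundUpToPowerOfTwo(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        return capacity;
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Number of distinct slots that hashes 0..length-1 reach under the mask trick.
    static int distinctSlots(int length) {
        Set<Integer> slots = new HashSet<>();
        for (int h = 0; h < length; h++) {
            slots.add(indexFor(h, length));
        }
        return slots.size();
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(10)); // 16
        System.out.println(roundUpToPowerOfTwo(17)); // 32
        System.out.println(indexFor(-7, 16) + " " + Math.floorMod(-7, 16) + " " + (-7 % 16)); // 9 9 -7
        System.out.println(distinctSlots(16)); // 16: every slot is reachable
        System.out.println(distinctSlots(15)); // 8: mask 0b1110 never yields an odd index
    }
}
```

With length 15 the mask is 14 (binary 1110), so the lowest index bit is always 0 and half of the slots can never be used, which is exactly the uneven distribution the text warns about.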

According to the source of the put method above, when a key-value pair is put into a HashMap, the storage location of the Entry is first determined from the return value of the key's hashCode(). If the keys of two Entries have the same hashCode() return value, the Entries are stored at the same location. If the two keys are also equal according to equals, the value of the newly added Entry overwrites the value of the existing Entry, but the key is not replaced. If equals returns false, the new Entry forms an Entry chain with the existing one, and the new Entry sits at the head of the chain (see the description of the addEntry() method for details).
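The rule that an equal key keeps the original key object and only replaces the value can be observed on java.util.HashMap. The Key class below is a hypothetical type whose equals and hashCode look only at id, so two distinct objects count as the same key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key type whose equals/hashCode consider only `id`, used to show
// which key object java.util.HashMap keeps when an equal key is put again.
class KeyRetentionDemo {
    static final class Key {
        final int id;
        final String label; // deliberately ignored by equals/hashCode

        Key(int id, String label) {
            this.id = id;
            this.label = label;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }

        @Override
        public int hashCode() {
            return id; // equal keys must have equal hash codes
        }
    }

    public static void main(String[] args) {
        Map<Key, String> map = new HashMap<>();
        map.put(new Key(1, "first"), "v1");
        String old = map.put(new Key(1, "second"), "v2"); // equal key: only the value changes
        Key kept = map.keySet().iterator().next();
        System.out.println(old + " " + map.get(new Key(1, "other")) + " " + kept.label); // v1 v2 first
    }
}
```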
2) Read:

public V get(Object key) {
    if (key == null)
        return getForNullKey();
    int hash = hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}

With the storage logic above as background, this code is easy to understand. To get an element from a HashMap, first compute the hash from the key's hashCode to find the corresponding position in the array, then use the key's equals method to find the desired element in the linked list at that position.
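The lookup path can be sketched the same way. SketchGetMap is a hypothetical class with a fixed 16-slot table, no resizing and no supplemental hash; its small put exists only to populate the table. "Aa", "BB" and "Cc" all land in slot 0 here, so get has to walk the chain and rely on equals.

```java
import java.util.Objects;

// Simplified get(): locate the bucket, then walk its chain comparing the cached
// hash first and equals second. Hypothetical class, fixed table, no resizing.
class SketchGetMap<K, V> {
    static final class Entry<K, V> {
        final int hash;
        final K key;
        V value;
        Entry<K, V> next;

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16];

    private static int hashOf(Object key) {
        return key == null ? 0 : key.hashCode(); // null key uses bucket 0
    }

    V get(Object key) {
        int hash = hashOf(key);
        for (Entry<K, V> e = table[hash & (table.length - 1)]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null; // no matching key in the chain
    }

    // Minimal put, only here so the example has something to look up.
    void put(K key, V value) {
        int hash = hashOf(key);
        int i = hash & (table.length - 1);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) { e.value = value; return; }
        }
        table[i] = new Entry<>(hash, key, value, table[i]);
    }

    public static void main(String[] args) {
        SketchGetMap<String, Integer> m = new SketchGetMap<>();
        m.put("Aa", 1);
        m.put("BB", 2); // same hashCode as "Aa": same bucket, longer chain
        System.out.println(m.get("Aa") + " " + m.get("BB") + " " + m.get("Cc")); // 1 2 null
    }
}
```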

3) In short, HashMap internally handles each key-value pair as a single unit, an Entry object. HashMap keeps all key-value pairs in an Entry[] array. When an Entry needs to be stored, the hash algorithm determines its position in the array, and the equals method determines its position in the linked list at that array slot. When an Entry needs to be retrieved, the hash algorithm likewise locates its position in the array, and the equals method finds the Entry in the linked list at that position.
4. HashMap Resize (rehash):
As the number of elements in a HashMap grows, the probability of hash collisions rises, because the length of the array is fixed. To keep queries efficient, the HashMap's array must therefore be expanded. Array expansion also happens in ArrayList and is a common operation. For HashMap, however, the most expensive step comes after the array is enlarged: the position of every element in the original array must be recomputed for the new array, and the element moved there. This is resize.

So when is a HashMap expanded? When the number of elements in the HashMap exceeds array length * loadFactor, the array is expanded. The default value of loadFactor is 0.75, a compromise value. By default the array length is 16, so once the number of elements exceeds 16 * 0.75 = 12, the array is expanded to 2 * 16 = 32, that is, doubled, and the position of every element in the array is recomputed. Because this is a very expensive operation, if the number of elements in a HashMap can be estimated in advance, presetting the capacity can noticeably improve its performance.
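The rehashing step can be modeled in a few lines. ResizeSketch is a hypothetical class that omits threshold bookkeeping and the supplemental hash: it allocates a table twice as long and moves each node to the index recomputed for the new length. The keys "a" (hashCode 97) and "q" (hashCode 113) share slot 1 in a 16-slot table but separate after doubling.

```java
// A simplified model of resize(): allocate a table twice as long and move every
// node to the index recomputed for the new length. Hypothetical class.
class ResizeSketch {
    static final class Node {
        final int hash;
        final String key;
        Node next;

        Node(int hash, String key, Node next) {
            this.hash = hash;
            this.key = key;
            this.next = next;
        }
    }

    static void insert(Node[] table, String key) {
        int h = key.hashCode();
        int i = h & (table.length - 1);
        table[i] = new Node(h, key, table[i]); // head insertion, as in addEntry
    }

    static Node[] resize(Node[] oldTable) {
        Node[] newTable = new Node[oldTable.length * 2];
        for (Node head : oldTable) {
            Node e = head;
            while (e != null) {
                Node next = e.next;                     // remember the rest of the old chain
                int i = e.hash & (newTable.length - 1); // index for the new length
                e.next = newTable[i];                   // relink at the head of the new chain
                newTable[i] = e;
                e = next;
            }
        }
        return newTable;
    }

    public static void main(String[] args) {
        Node[] table = new Node[16];
        insert(table, "a"); // 97 & 15 == 1
        insert(table, "q"); // 113 & 15 == 1 (same chain)
        Node[] bigger = resize(table);
        System.out.println(bigger[1].key + " " + bigger[17].key); // a q
    }
}
```

Because doubling a power-of-two length adds exactly one bit to the mask, each entry either keeps its old index i or moves to i + oldLength; that is why "a" stays at 1 while "q" moves to 17.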
5. HashMap Performance Parameters:
HashMap contains the following constructors:
HashMap(): constructs a HashMap with an initial capacity of 16 and a load factor of 0.75.
HashMap(int initialCapacity): constructs a HashMap with an initial capacity of initialCapacity and a load factor of 0.75.
HashMap(int initialCapacity, float loadFactor): constructs a HashMap with the specified initial capacity and the specified load factor.
The base constructor, HashMap(int initialCapacity, float loadFactor), takes two parameters: the initial capacity initialCapacity and the load factor loadFactor.
initialCapacity: the initial capacity of the HashMap, that is, the length of the underlying array (rounded up to a power of two).
loadFactor: the load factor is defined as the number of elements actually stored in the hash table (n) divided by the capacity of the hash table (m).
The load factor measures how full the hash table is allowed to become: the larger the load factor, the fuller the table, and vice versa. For a hash table that resolves collisions with linked lists, the average time to find an element is O(1 + α), where α is the load factor. A larger load factor therefore uses space more fully but lowers lookup efficiency; a load factor that is too small leaves the table data too sparse and wastes a lot of space.
In the HashMap implementation, the maximum number of elements before a resize is held in the threshold field:

threshold = (int) (capacity * loadFactor);

From the definition of the load factor, threshold is the maximum number of elements allowed for the current loadFactor and capacity; once the number of elements exceeds it, the table is resized to bring the actual load factor back down. The default load factor of 0.75 is a balance between space and time efficiency. When the number of elements exceeds this threshold, resize doubles the capacity of the HashMap:

if (size++ >= threshold)
    resize(2 * table.length);
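The threshold formula and the presizing advice can be made concrete. ThresholdDemo and capacityFor are hypothetical names; capacityFor assumes the resize rule above (a resize happens only when the element count would exceed threshold) and returns the smallest power-of-two capacity whose threshold holds the expected number of entries, so filling the map to that size never triggers a resize.

```java
// Recomputes threshold = (int) (capacity * loadFactor) for the default settings and
// finds the smallest power-of-two capacity whose threshold fits the expected entries.
class ThresholdDemo {
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Hypothetical helper; assumes a positive loadFactor and a reasonable entry count.
    static int capacityFor(int expectedEntries, float loadFactor) {
        int capacity = 1;
        while (threshold(capacity, loadFactor) < expectedEntries)
            capacity <<= 1;
        return capacity;
    }

    public static void main(String[] args) {
        for (int capacity = 16; capacity <= 64; capacity <<= 1) {
            // prints 16 -> threshold 12, 32 -> threshold 24, 64 -> threshold 48
            System.out.println(capacity + " -> threshold " + threshold(capacity, 0.75f));
        }
        System.out.println(capacityFor(100, 0.75f)); // 256, since 128 * 0.75 = 96 < 100
    }
}
```

The result can be passed to the HashMap(int initialCapacity) constructor when roughly that many entries are expected.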

6. Fail-fast mechanism:
We know that java.util.HashMap is not thread-safe, so if the map is structurally modified while an iterator is in use (by another thread, or by the same thread other than through the iterator), a ConcurrentModificationException is thrown. This is called the fail-fast policy.

In the source code this policy is implemented with the modCount field. As the name suggests, modCount counts modifications: every change to the HashMap's contents increments it, and when an iterator is initialized, this value is copied into the iterator's expectedModCount.

HashIterator() {
    expectedModCount = modCount;
    if (size > 0) { // advance to first entry
        Entry[] t = table;
        while (index < t.length && (next = t[index++]) == null)
            ;
    }
}

During iteration, the iterator checks whether modCount still equals expectedModCount; if they differ, the map has been modified by someone else (for example, another thread):

final Entry<K,V> nextEntry() {
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();
    ...
}

Note that modCount is declared volatile here, which guarantees the visibility of changes to it across threads.

The HashMap API documentation notes:

The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.

Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
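A single-threaded sketch shows both sides of the rule on java.util.HashMap. FailFastDemo and its methods are illustrative names. In modifyDuringIteration, adding a key inside a for-each loop over keySet() changes modCount, and with three entries the next call to the iterator's next() detects the mismatch; removeOddValues deletes through Iterator.remove(), which keeps expectedModCount in sync. As the documentation says, detection is best-effort in general; this small deterministic case is only meant to illustrate the mechanism.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Fail-fast iteration on java.util.HashMap, single-threaded.
class FailFastDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> m = new HashMap<>();
        m.put("a", 1);
        m.put("b", 2);
        m.put("c", 3);
        return m;
    }

    // Returns true if adding a key inside a for-each loop over keySet() throws.
    static boolean modifyDuringIteration() {
        Map<String, Integer> m = sample();
        try {
            for (String k : m.keySet()) {
                m.put(k + "!", 0); // structural change not made through the iterator
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    // Removes odd values through Iterator.remove(), the sanctioned way to delete.
    static Map<String, Integer> removeOddValues() {
        Map<String, Integer> m = sample();
        for (Iterator<Map.Entry<String, Integer>> it = m.entrySet().iterator(); it.hasNext(); ) {
            if (it.next().getValue() % 2 == 1) {
                it.remove();
            }
        }
        return m;
    }

    public static void main(String[] args) {
        System.out.println(modifyDuringIteration()); // true
        System.out.println(removeOddValues());       // {b=2}
    }
}
```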
