In-depth analysis of how Java's HashMap is implemented

Source: Internet
Author: User

Note: hashCode() and equals() work together, so whenever you override equals() you should also override hashCode(), ensuring that two objects that are equal according to equals() also have equal hash codes. The converse follows: if two hashCode() values are unequal, equals() must return false; if the hashCode() values are equal, equals() may or may not return true.

This matters because HashMap's get compares the hash first and then equals; only when both the hash comparison and the equality check succeed is the key considered the same key.
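A minimal sketch of that contract, assuming a hypothetical key class named PointKey that is not part of the original article. Because equals() and hashCode() are built from the same fields, a second instance with the same field values finds the entry stored under the first:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class EqualsHashCodeDemo {
    // Hypothetical key class, used only for illustration.
    static final class PointKey {
        private final int x;
        private final int y;

        PointKey(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof PointKey)) return false;
            PointKey that = (PointKey) other;
            return x == that.x && y == that.y;
        }

        // Derived from the same fields as equals(), so equal keys share a hash code.
        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    // Stores a value under one key instance and looks it up with another, equal instance.
    static String lookupWithEqualKey() {
        Map<PointKey, String> map = new HashMap<>();
        map.put(new PointKey(1, 2), "found");
        return map.get(new PointKey(1, 2));
    }

    public static void main(String[] args) {
        System.out.println(lookupWithEqualKey());
    }
}
```

If hashCode() were left at its default while equals() was overridden, the two instances would usually hash differently and the lookup would return null.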

1. HashMap Overview:

HashMap is an unsynchronized, hash-table-based implementation of the Map interface. It provides all of the optional map operations and permits null values and the null key. The class makes no guarantees about the order of the map; in particular, it does not guarantee that the order will remain constant over time.
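As a small illustration of the null handling described above, the following sketch stores a null key and a null value in a java.util.HashMap. The class and method names are made up for this example. Since get returns null both for a missing key and for a key mapped to null, containsKey is used to tell the two apart:

```java
import java.util.HashMap;
import java.util.Map;

class NullKeyDemo {
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value for the null key"); // null key is allowed
        map.put("k", null);                      // null value is allowed
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        System.out.println(map.get(null));
        // get("k") and get("missing") both return null; containsKey distinguishes them.
        System.out.println(map.containsKey("k") + " " + map.containsKey("missing"));
    }
}
```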

2. HashMap data structure:

In the Java programming language there are two most basic structures: the array and the reference (a simulated pointer). All data structures can be built from these two, and HashMap is no exception. HashMap is actually a "linked-list hash" data structure, that is, a combination of an array and linked lists.

As the figure above shows, the bottom layer of HashMap is an array, and each slot of the array holds a linked list. When a new HashMap is created, this array is initialized.

The source code is as follows:

/**
 * The table, resized as necessary. Length MUST always be a power of two.
 */
transient Entry[] table;

static class Entry<K,V> implements Map.Entry<K,V> {
  final K key;
  V value;
  Entry<K,V> next;
  final int hash;
  ...
}

As you can see, Entry is the element type of the array. Each Map.Entry is a key-value pair that also holds a reference to the next entry, and this reference is what forms the linked list.

3. HashMap Access implementation:

1) Storage:

public V put(K key, V value) {
  // HashMap allows null keys and null values.
  // When key is null, putForNullKey is called and the value is placed in the first slot of the array.
  if (key == null)
    return putForNullKey(value);
  // Recompute the hash value from the key's hashCode.
  int hash = hash(key.hashCode());
  // Find the index in the table that corresponds to this hash value.
  int i = indexFor(hash, table.length);
  // If the Entry at index i is not null, walk the chain through e.next.
  for (Entry<K,V> e = table[i]; e != null; e = e.next) {
    Object k;
    if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
      V oldValue = e.value;
      e.value = value;
      e.recordAccess(this);
      return oldValue;
    }
  }
  // No Entry with this key exists at index i (the slot is empty or the chain has no match).
  modCount++;
  // Add the key and value at index i.
  addEntry(hash, key, value, i);
  return null;
}

As the source code above shows, when we put an element into a HashMap, the hash value is first recomputed from the key's hashCode, and that hash value determines the element's position (index) in the array. If other elements are already stored at that position, the elements there are kept as a linked list: the newly added element is placed at the head of the chain, and the element added earliest ends up at the tail. If there is no element at that position, the new element is placed directly in that slot of the array.

The addEntry(hash, key, value, i) method places the key-value pair at index i of the table array, according to the computed hash value. addEntry is a package-private method of HashMap; its code is as follows:

void addEntry(int hash, K key, V value, int bucketIndex) {
  // Get the Entry at the specified bucketIndex.
  Entry<K,V> e = table[bucketIndex];
  // Place the newly created Entry at bucketIndex and let it point to the original Entry.
  table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
  // If the number of key-value pairs in the map has reached the threshold,
  if (size++ >= threshold)
    // expand the table to twice its original length.
    resize(2 * table.length);
}

When the system decides where to store a key-value pair in the HashMap, it does not consider the value in the Entry at all; the storage location of each Entry is computed from the key alone. We can treat the value simply as an attachment to the key: once the system has decided where the key is stored, the value is stored there along with it.

The hash(int h) method recomputes the hash once more from the key's hashCode. The algorithm mixes the high-order bits into the result, to prevent collisions between hash codes whose low-order bits are identical and that differ only in the high-order bits.

static int hash(int h) {
  h ^= (h >>> 20) ^ (h >>> 12);
  return h ^ (h >>> 7) ^ (h >>> 4);
}
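To see what the extra mixing buys, here is a small sketch. It assumes the shift amounts 20, 12, 7 and 4 used by the JDK 6 version of this method; the class name and the two sample hash codes are made up for illustration. The two values differ only above bit 15, so without mixing they both map to slot 0 of a 16-slot table:

```java
class HashSpreadDemo {
    // Supplemental hash; the shift amounts are assumed to match the JDK 6 source.
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000; // a and b share the same (all-zero) low 16 bits
        int b = 0x20000;
        // Without mixing: both land in slot 0.
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16));
        // With mixing: the high bits now influence the slot.
        System.out.println(indexFor(spread(a), 16) + " " + indexFor(spread(b), 16));
    }
}
```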

As we can see, to find an element in a HashMap we need to locate the corresponding array position from the key's hash value. How that position is computed is the hash algorithm. As noted earlier, HashMap's data structure combines an array with linked lists, so we naturally want the elements to be distributed as evenly as possible, ideally with only one element at each position. Then, once the hash algorithm gives us the position, we know immediately that the element there is the one we want, without having to traverse a linked list, which greatly improves query efficiency.

For any given object, as long as its hashCode() return value is the same, the hash value computed by hash(int h) is always the same. The first idea that comes to mind is to take the hash value modulo the array length, which distributes the elements fairly evenly. However, the modulo operation is relatively expensive, so HashMap does this instead: it calls indexFor(int h, int length) to compute the index in the table array at which the object should be stored.

The code for the indexFor(int h, int length) method is as follows:

static int indexFor(int h, int length) {
  return h & (length - 1);
}

This method is very clever: it uses h & (table.length - 1) to obtain the slot in which the object is stored. The length of HashMap's underlying array is always a power of two (2^n), and this is one of HashMap's speed optimizations. The HashMap constructor contains the following code:

int capacity = 1;
  while (capacity < initialCapacity)
    capacity <<= 1;

This code guarantees that, at initialization, the capacity of the HashMap is always a power of two, that is, the length of the underlying array is always 2^n.
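The rounding loop can be tried on its own. The sketch below wraps it in a hypothetical helper, roundUpToPowerOfTwo, which is not a method of HashMap. It assumes the requested capacity is at most 2^30, because the loop would overflow beyond that:

```java
class CapacityRounding {
    // Smallest power of two that is >= initialCapacity.
    // Assumes initialCapacity <= 1 << 30; larger values would overflow the shift.
    static int roundUpToPowerOfTwo(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        return capacity;
    }

    public static void main(String[] args) {
        for (int requested : new int[] {1, 10, 16, 17, 100}) {
            System.out.println(requested + " -> " + roundUpToPowerOfTwo(requested));
        }
    }
}
```

A requested capacity of 10 therefore yields a table of length 16, and 17 yields 32.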

When length is a power of two, h & (length - 1) is equivalent to taking h modulo length, that is, h % length (for non-negative h), but & is more efficient than %.

This looks simple, but there is more to it. An example illustrates:

Suppose the array lengths are 15 and 16, and the optimized hash values are 8 and 9. The results of the & operation are as follows:

h & (table.length-1)    hash     table.length-1    result
8 & (15-1):             1000  &  1110           =  1000
9 & (15-1):             1001  &  1110           =  1000
-----------------------------------------------------------
8 & (16-1):             1000  &  1111           =  1000
9 & (16-1):             1001  &  1111           =  1001

As the example shows, when 8 and 9 are ANDed with 15-1 (1110), they produce the same result, so they are placed at the same position in the array. This is a collision: 8 and 9 form a linked list at that position, and a query has to traverse the list to find 8 or 9, which reduces query efficiency. We can also see that when the array length is 15, every hash value is ANDed with 15-1 (1110), so the last bit of the result is always 0, and the positions 0001, 0011, 0101, 0111, 1001, 1011, and 1101 can never hold an element. A considerable amount of space is wasted. Worse, the number of usable positions is much smaller than the array length, which further increases the chance of collisions and slows down queries. When the array length is 16, that is, a power of two, every bit of the binary form of 2^n - 1 is 1, so the AND keeps the low-order bits of the original hash unchanged. Combined with the further mixing that hash(int h) applies to the key's hashCode, which folds in the high-order bits, two keys land in the same array position and form a linked list only when their hash values agree in those low-order bits.

So when the array length is a power of two, different keys are less likely to compute the same index, the data is distributed more evenly across the array, and collisions are less likely. As a result, a query less often has to traverse the linked list at a position, and query efficiency is higher.
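The comparison between lengths 15 and 16 can be checked mechanically. The following sketch uses illustrative helper names (usedSlots and maskEqualsModulo are not HashMap methods). It collects every index that h & (length - 1) can produce and checks whether the mask agrees with h % length for non-negative h:

```java
import java.util.Set;
import java.util.TreeSet;

class IndexMaskDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Every distinct index produced for the hash values 0 .. count-1.
    static Set<Integer> usedSlots(int length, int count) {
        Set<Integer> used = new TreeSet<>();
        for (int h = 0; h < count; h++) {
            used.add(indexFor(h, length));
        }
        return used;
    }

    // True if the mask gives the same index as the modulo for hash values 0 .. count-1.
    static boolean maskEqualsModulo(int length, int count) {
        for (int h = 0; h < count; h++) {
            if (indexFor(h, length) != h % length) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("length 15 uses " + usedSlots(15, 1000)); // only even indices
        System.out.println("length 16 uses " + usedSlots(16, 1000).size() + " slots");
        System.out.println("mask == modulo for 16: " + maskEqualsModulo(16, 1000));
        System.out.println("mask == modulo for 15: " + maskEqualsModulo(15, 1000));
    }
}
```

With length 15 only 8 of the 15 slots are ever used, while length 16 uses all 16 and the mask matches the modulo.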

According to the source code of the put method above, when the program tries to put a key-value pair into the HashMap, it first determines where the Entry is stored from the hashCode() return value of the key: if the keys of two Entry objects have the same hashCode() return value, they are stored at the same location. If the keys of the two Entry objects return true when compared with equals, the value of the newly added Entry overwrites the value of the existing Entry, but the key is not overwritten. If the keys return false when compared with equals, the newly added Entry forms an Entry chain with the existing Entry, and the newly added Entry is at the head of the chain; see the description of the addEntry() method for details.
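The "value is overwritten, key is not" behavior can be observed with java.util.HashMap directly. This sketch uses two distinct String instances that are equal according to equals; the class and variable names are made up for illustration. After the second put, the map still holds the first key instance, now mapped to the new value:

```java
import java.util.HashMap;
import java.util.Map;

class OverwriteDemo {
    static boolean keepsFirstKeyInstance() {
        // Two different objects that are equal according to equals().
        String first = new String("id");
        String second = new String("id");

        Map<String, String> map = new HashMap<>();
        map.put(first, "old");
        String previous = map.put(second, "new"); // returns the replaced value

        String storedKey = map.keySet().iterator().next();
        return map.size() == 1
                && "old".equals(previous)
                && "new".equals(map.get(first))
                && storedKey == first; // identity check: the original key object is kept
    }

    public static void main(String[] args) {
        System.out.println(keepsFirstKeyInstance());
    }
}
```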

2) Read:

public V get(Object key) {
  if (key == null)
    return getForNullKey();
  int hash = hash(key.hashCode());
  for (Entry<K,V> e = table[indexFor(hash, table.length)];
       e != null;
       e = e.next) {
    Object k;
    if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
      return e.value;
  }
  return null;
}

With the hash algorithm for storage as a foundation, this code is easy to understand. As the source above shows, to get an element from a HashMap, the hash of the key is computed first to find the corresponding position in the array, and then the key's equals method is used to find the required element in the linked list at that position.

3) To sum up, HashMap treats each key-value pair as a whole at the bottom layer, and that whole is an Entry object. HashMap uses an Entry[] array to hold all key-value pairs. When an Entry needs to be stored, the hash algorithm determines its position in the array, and the equals method determines its place in the linked list at that position. When an Entry needs to be retrieved, the hash algorithm again finds its position in the array, and the equals method then picks the Entry out of the linked list at that position.
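The summary above can be condensed into a toy implementation. The following is a deliberately simplified sketch, not the JDK class: MiniChainedMap and Node are made-up names, and the sketch has no resize, no supplemental hash, and no null-key support. It does show the three ingredients: an array of buckets, head insertion into a chain, and hash-then-equals matching:

```java
class MiniChainedMap<K, V> {
    // One link in a bucket's chain.
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node<K, V>[] table;
    private int size;

    // The caller must pass a power of two so the index mask works.
    @SuppressWarnings("unchecked")
    MiniChainedMap(int powerOfTwoLength) {
        table = (Node<K, V>[]) new Node[powerOfTwoLength];
    }

    private int indexFor(int hash) {
        return hash & (table.length - 1);
    }

    // Non-null keys only. Returns the previous value, or null if the key was new.
    V put(K key, V value) {
        int hash = key.hashCode();
        int i = indexFor(hash);
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || key.equals(e.key))) {
                V old = e.value;
                e.value = value; // same key: replace the value only
                return old;
            }
        }
        table[i] = new Node<>(hash, key, value, table[i]); // head insertion
        size++;
        return null;
    }

    V get(Object key) {
        int hash = key.hashCode();
        for (Node<K, V> e = table[indexFor(hash)]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || key.equals(e.key))) {
                return e.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }
}
```

With a table of length 16, the Integer keys 1 and 17 share slot 1 and form a two-element chain, yet each is still found by its own hash and equals check.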

4. Resize of HashMap (rehash):

As more and more elements are added to a HashMap, the probability of hash collisions rises, because the length of the array is fixed. To keep queries efficient, the HashMap's array has to be expanded. Array expansion also occurs in ArrayList and is a common operation, but after the HashMap array is expanded, the most expensive step follows: every element in the original array must have its position in the new array recomputed and be moved there. This is resize.
So when is a HashMap expanded? When the number of elements in the HashMap exceeds array size * loadFactor, the array is expanded. The default loadFactor is 0.75, which is a compromise value. That is, by default the array size is 16, so when the number of elements exceeds 16 * 0.75 = 12, the array is expanded to 2 * 16 = 32, that is, doubled, and the position of each element in the array is then recomputed. This is a very expensive operation, so if we can predict the number of elements in the HashMap, presetting the capacity accordingly can effectively improve HashMap's performance.
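To make the cost concrete, the sketch below simulates only the bookkeeping of the rule described here (resize when size++ >= threshold, then double the capacity) and counts how often a resize would happen. It does not touch a real HashMap, and the names countResizes and capacityFor are hypothetical helpers introduced for this example:

```java
class ResizeCountDemo {
    // Number of doublings while inserting n distinct keys under the JDK 6-style rule.
    static int countResizes(int initialCapacity, float loadFactor, int n) {
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        int threshold = (int) (capacity * loadFactor);
        int size = 0;
        int resizes = 0;
        for (int i = 0; i < n; i++) {
            if (size++ >= threshold) {
                capacity *= 2;
                threshold = (int) (capacity * loadFactor);
                resizes++;
            }
        }
        return resizes;
    }

    // A common sizing heuristic: request enough capacity that `expected` elements stay under the threshold.
    static int capacityFor(int expected, float loadFactor) {
        return (int) (expected / loadFactor) + 1;
    }

    public static void main(String[] args) {
        System.out.println("default capacity: " + countResizes(16, 0.75f, 1000) + " resizes");
        int presized = capacityFor(1000, 0.75f);
        System.out.println("presized to " + presized + ": " + countResizes(presized, 0.75f, 1000) + " resizes");
    }
}
```

Under these assumptions, inserting 1000 elements into a default-sized map triggers 7 resizes, while a map created with the computed capacity triggers none.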

5. Performance parameters of HashMap:

HashMap has the following constructors:

HashMap(): constructs a HashMap with an initial capacity of 16 and a load factor of 0.75.

HashMap(int initialCapacity): constructs a HashMap with an initial capacity of initialCapacity and a load factor of 0.75.

HashMap(int initialCapacity, float loadFactor): constructs a HashMap with the specified initial capacity and the specified load factor.

HashMap's underlying constructor, HashMap(int initialCapacity, float loadFactor), takes two parameters: the initial capacity initialCapacity and the load factor loadFactor.

initialCapacity: the initial capacity of the HashMap, that is, the length of the underlying array.

loadFactor: the load factor, defined as the number of elements actually in the hash table (n) / the capacity of the hash table (m).

The load factor measures how full a hash table is: the larger the load factor, the fuller the hash table, and vice versa. For a hash table that uses chaining, the average time to find an element is O(1 + a), where a is the load factor. So a larger load factor uses space more fully but lowers lookup efficiency, while a load factor that is too small leaves the data in the hash table too sparse and wastes a great deal of space.

In the HashMap implementation, the threshold field determines the maximum number of elements the HashMap can hold before it is resized:

threshold = (int) (capacity * loadFactor);

By the definition of the load factor, threshold is the maximum number of elements allowed for this loadFactor and capacity; beyond it, the map is resized to bring the actual load factor back down. The default load factor of 0.75 is a balanced trade-off between space and time efficiency. When the number of elements exceeds this threshold, the HashMap capacity after resize is twice the previous capacity:

if (size++ >= threshold)
  resize(2 * table.length);

6. Fail-fast mechanism:

We know that java.util.HashMap is not thread-safe, so if another thread modifies the map while an iterator is in use, a ConcurrentModificationException is thrown. This is the fail-fast policy.
This policy is implemented in the source code through the modCount field. As its name suggests, modCount is the modification count: modifications to the HashMap's contents increment this value, and when an iterator is initialized, the value is assigned to the iterator's expectedModCount.

HashIterator() {
  expectedModCount = modCount;
  if (size > 0) { // advance to first entry
    Entry[] t = table;
    while (index < t.length && (next = t[index++]) == null)
      ;
  }
}

During iteration, modCount and expectedModCount are compared; if they are not equal, the map has been modified, for example by another thread:
Note that modCount is declared volatile, which guarantees that modifications are visible across threads.

final Entry<K,V> nextEntry() {
  if (modCount != expectedModCount)
    throw new ConcurrentModificationException();
  ...
}

The HashMap API documentation notes:

The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator throws a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.

Note that the fail-fast behavior of an iterator cannot be guaranteed; generally speaking, it is impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depends on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
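The fail-fast check does not require a second thread to trigger. As a minimal single-threaded sketch (class and method names are made up for illustration), adding a key while iterating over the key set makes the next call to the iterator throw, whereas removing through the iterator's own remove method is allowed:

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class FailFastDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Adds a new key during iteration; the iterator detects the change on its next step.
    static boolean structuralChangeDuringIterationThrows() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.put("new-" + key, 0); // structural modification outside the iterator
            }
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    // Removes odd values through the iterator itself, which keeps the counts in sync.
    static int sizeAfterIteratorRemove() {
        Map<String, Integer> map = sample();
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() % 2 == 1) {
                it.remove();
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(structuralChangeDuringIterationThrows());
        System.out.println(sizeAfterIteratorRemove());
    }
}
```

As the documentation quoted above says, the exception is a debugging aid; the sketch only demonstrates the check and should not be read as a pattern to rely on.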
