Java Collections --- HashMap Source Code Analysis


I. HashMap Overview

HashMap is a hash table based implementation of the Map interface. It provides all of the optional map operations and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

It is worth noting that HashMap is NOT thread-safe. To obtain a thread-safe map, you can wrap a HashMap with the static synchronizedMap method of the Collections class:

 Map map = Collections.synchronizedMap(new HashMap());

 

II. Data Structure of HashMap

The underlying storage of HashMap is implemented with arrays and linked lists. Its very fast lookup comes from determining the storage location by computing a hash code: HashMap computes the hash value from the key's hashCode, and identical hashCodes always produce identical hash values. When many objects are stored, different objects may compute the same hash value, which leads to a so-called hash collision. As anyone who has studied data structures knows, there are many ways to resolve hash collisions; HashMap resolves them with linked lists (chaining).

 

Picture the structure as a hash table, also called a hash array: each element of the array is the head node of a singly linked list, and the linked list resolves collisions. If different keys are mapped to the same position of the array, they are chained together in that slot's list.
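To make chaining concrete, here is a small illustrative demo (the class and variable names are ours, not from the source article). It relies on the well-known fact that the distinct strings "Aa" and "BB" have identical hashCode() values in Java, so they land in the same bucket and are chained together:

import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    public static void main(String[] args) {
        // "Aa" and "BB" are distinct strings with identical hashCode() values
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112

        // Both keys therefore map to the same bucket and are chained in its
        // linked list; lookups still work because equals() distinguishes
        // them while walking the chain
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2
    }
}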

Let's look at the Entry class code in HashMap:

 

/**
 * Entry is a node of a singly linked list; it is how HashMap's chained
 * storage is realized. It implements the Map.Entry interface, i.e.
 * getKey(), getValue(), setValue(V value), equals(Object o), hashCode().
 */
static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;   // points to the next node in the chain
    final int hash;

    // Constructor: takes the hash value (h), key (k), value (v), next node (n)
    Entry(int h, K k, V v, Entry<K,V> n) {
        value = v;
        next = n;
        key = k;
        hash = h;
    }

    public final K getKey() {
        return key;
    }

    public final V getValue() {
        return value;
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    // Two entries are equal only if both their keys and their values are equal
    public final boolean equals(Object o) {
        if (!(o instanceof Map.Entry))
            return false;
        Map.Entry e = (Map.Entry) o;
        Object k1 = getKey();
        Object k2 = e.getKey();
        if (k1 == k2 || (k1 != null && k1.equals(k2))) {
            Object v1 = getValue();
            Object v2 = e.getValue();
            if (v1 == v2 || (v1 != null && v1.equals(v2)))
                return true;
        }
        return false;
    }

    public final int hashCode() {
        return (key == null ? 0 : key.hashCode()) ^
               (value == null ? 0 : value.hashCode());
    }

    public final String toString() {
        return getKey() + "=" + getValue();
    }

    // Called when the value of an entry already in the map is overwritten;
    // a no-op here (subclasses such as LinkedHashMap override it)
    void recordAccess(HashMap<K,V> m) {
    }

    // Called when an entry is removed from the map; also a no-op here
    void recordRemoval(HashMap<K,V> m) {
    }
}

 

HashMap is actually an array of Entry objects. Each Entry holds a key and a value, and its next field references another Entry; this is how hash collisions are handled, by forming a linked list.

 

III. HashMap Source Code Analysis

 

1. Key Attributes

Let's take a look at some key attributes in the HashMap class:

 

transient Entry[] table;  // the array that stores the entries
transient int size;       // the number of key-value pairs currently stored
int threshold;            // resize threshold: when the actual size exceeds it,
                          // the table is expanded; threshold = capacity * loadFactor
final float loadFactor;   // the load factor
transient int modCount;   // count of structural modifications

 

The loadFactor field, the load factor, measures how full the hash table is allowed to become.

The larger the load factor, the fuller the table gets. The advantage is high space utilization, but the chance of collision rises, the linked lists grow longer, and search efficiency drops.

Conversely, the smaller the load factor, the fewer elements the table holds before expanding. Collisions become less likely, but space is wasted: the table stays sparse and expansion kicks in sooner. And the greater the chance of collision, the higher the search cost.

Therefore, a balance and compromise must be found between collision probability and space utilization. This balance is essentially an instance of the classic time-space trade-off in data structures.

If machine memory is plentiful and you want faster queries, you can set the load factor to a smaller value. If memory is tight and query speed is not critical, you can set it to a larger value. Usually, however, there is no need to set it at all: the default value of 0.75 works well.
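As an illustration (the sizes and factors below are arbitrary examples, not recommendations), the two-argument constructor lets you tune this trade-off explicitly:

import java.util.HashMap;
import java.util.Map;

public class LoadFactorDemo {
    public static void main(String[] args) {
        // Lower load factor: resizes earlier, uses more space, shorter chains
        Map<String, String> fastLookup = new HashMap<>(64, 0.5f);
        // Higher load factor: resizes later, saves space, longer chains
        Map<String, String> compact = new HashMap<>(64, 0.9f);
        fastLookup.put("k", "v");
        compact.put("k", "v");
        System.out.println(fastLookup.get("k") + " " + compact.get("k"));
    }
}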

 

2. Constructors

Let's take a look at HashMap's constructors:

 

public HashMap(int initialCapacity, float loadFactor) {
    // Validate the arguments
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);

    // Find a power of 2 >= initialCapacity, i.e. the smallest power of 2
    // that is not less than initialCapacity
    int capacity = 1;
    while (capacity < initialCapacity)
        capacity <<= 1;

    this.loadFactor = loadFactor;
    threshold = (int) (capacity * loadFactor);
    table = new Entry[capacity];
    init();
}

public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    threshold = (int) (DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
    table = new Entry[DEFAULT_INITIAL_CAPACITY];
    init();
}

 

We can see that when we construct a HashMap with a specified initial capacity and load factor, the first constructor is called; otherwise the defaults are used. The default initial capacity is 16 and the default load factor is 0.75. Note the while loop in the first constructor: it guarantees that capacity is a power of 2, specifically the smallest power of 2 that is greater than or equal to initialCapacity. Why the capacity must be a power of 2 is explained later.
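The rounding logic can be seen in isolation in the following sketch (the method name roundUpToPowerOfTwo is ours, not from the JDK source):

public class CapacityDemo {
    // Standalone copy of the capacity-rounding loop from the first constructor
    static int roundUpToPowerOfTwo(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1; // keep doubling until capacity >= initialCapacity
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(10)); // 16
        System.out.println(roundUpToPowerOfTwo(16)); // 16
        System.out.println(roundUpToPowerOfTwo(17)); // 32
    }
}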

 

Next we analyze the two most heavily used methods in HashMap: put and get.

3. Data Storage

Let's take a look at how HashMap stores data. First, the put method of HashMap:

public V put(K key, V value) {
    // If the key is null, put the key-value pair into table[0]
    if (key == null)
        return putForNullKey(value);
    // Otherwise compute the hash value of the key...
    int hash = hash(key.hashCode());
    // ...and find the bucket index
    int i = indexFor(hash, table.length);
    // Walk the chain at that bucket; if an entry with an equal key already
    // exists, replace the old value with the new one and return the old value
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;                    // modification count + 1
    addEntry(hash, key, value, i); // add the key-value pair as a new entry
    return null;
}

 

The program above uses an important internal interface, Map.Entry: each Map.Entry is a key-value pair. Notice that when HashMap decides where to store a key-value pair, the value plays no part at all; the storage location of each Entry is computed from the key alone. This confirms the earlier conclusion: the value in a Map can be regarded as an appendage of the key. Once the key's storage location is determined, the value is simply stored alongside it.

Let's analyze this function step by step. The first two statements handle the case where the key is null. Let's look at the putForNullKey(value) method:

 

private V putForNullKey(V value) {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null) { // an entry with a null key already exists: overwrite it
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(0, null, value, 0); // a null key always has hash 0
    return null;
}

 

Note: if the key is null, the hash value is 0 and the entry is stored at index 0 of the array, that is, in table[0].
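A quick illustrative check of the null-key behavior (the class and variable names are ours):

import java.util.HashMap;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put(null, "first");
        map.put(null, "second");           // overwrites the existing null-key entry
        System.out.println(map.get(null)); // second
        System.out.println(map.size());    // 1 -- at most one null key can exist
    }
}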

Going back to the put method: after the null-key check, it computes the hash code from the key's hashCode value. Below is the hash function:

 

// Compute the hash value from the key's hashCode
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

 

After the hash code is obtained, it is used to compute the index in the array where the entry should be stored. The index function is as follows:

 

// Compute the bucket index from the hash value and the array length.
// Using h & (length - 1) here has a purpose: it guarantees that the
// computed index stays within the bounds of the array
static int indexFor(int h, int length) {
    return h & (length - 1);
}

 

The natural approach for a hash table is to take the hash value modulo the length (division hashing), and that is exactly what Hashtable does. Modulo distributes elements fairly evenly across the table, but the division operation is expensive. HashMap uses h & (length - 1) instead of the modulo: it achieves the same even distribution but is much faster, which is one of HashMap's improvements over Hashtable.
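A minimal sketch (names are ours) verifying that, for a power-of-two length and a non-negative hash, the bit mask and the modulo give identical indexes:

public class IndexDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // a power of 2
        for (int h : new int[]{8, 9, 31, 2112}) {
            // both expressions print the same index for non-negative h
            System.out.println(h + ": " + indexFor(h, length) + " == " + (h % length));
        }
    }
}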

 

Next, let's analyze why the hash table capacity must be a power of 2. First, if length is a power of 2, h & (length - 1) is equivalent to taking h modulo length, which preserves the uniformity of the hash while improving efficiency. Second, a power of 2 is even, so length - 1 is odd and its lowest bit is 1. This ensures that the lowest bit of h & (length - 1) can be either 0 or 1 (depending on h), i.e. the result can be even or odd, so the hash spreads over all slots. If length were odd, length - 1 would be even with a lowest bit of 0, so the lowest bit of h & (length - 1) could only be 0, i.e. the result could only be even; every hash value would land on an even index of the array, wasting nearly half of the space. A power-of-2 length therefore minimizes the probability that different hash values collide and lets elements hash evenly across the table.

 

This looks simple, but it is subtle. Here is an example to illustrate:

Assume array lengths of 15 and 16, and hash codes (after the hash function) of 8 and 9. The results of the & operation are as follows:

h & (table.length - 1)    hash      table.length - 1    result
8 & (15-1):               0100   &  1110              = 0100
9 & (15-1):               0101   &  1110              = 0100
------------------------------------------------------------
8 & (16-1):               0100   &  1111              = 0100
9 & (16-1):               0101   &  1111              = 0101

 

As the example shows, when 8 and 9 are ANDed with 15 - 1 (1110), they produce the same result: both land in the same array slot, which is a collision. 8 and 9 end up chained in a linked list at that position, so a lookup for either must traverse the chain, reducing query efficiency. We can also see that with an array length of 15, every hash value ANDed with 1110 has a final bit of 0, so the odd-numbered slots 0001, 0011, 0101, 0111, 1001, 1011, 1101, and 1111 can never hold an element: a considerable waste of space. Worse, the number of usable slots is then far smaller than the array length, which raises the collision probability further and slows queries even more. With an array length of 16, i.e. a power of 2, every bit of 2^n - 1 is 1, so the AND simply keeps the low bits of the hash unchanged. Combined with the hash(int h) method, which further mixes the key's hashCode, this means only keys with genuinely identical hash values end up chained in the same slot.

Therefore, when the array length is a power of 2, different keys are less likely to compute the same index, data is distributed more evenly across the array, and collisions are rarer. Correspondingly, a query is less likely to have to traverse a chain at a given position, so query efficiency is higher.
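The wasted-space argument can be checked directly with a small sketch (names are ours): count the distinct bucket indexes reachable through the mask for length 15 versus length 16:

import java.util.HashSet;
import java.util.Set;

public class MaskDemo {
    public static void main(String[] args) {
        for (int length : new int[]{15, 16}) {
            Set<Integer> buckets = new HashSet<>();
            for (int h = 0; h < 1024; h++)
                buckets.add(h & (length - 1));
            // length 15 (mask 1110): only 8 buckets reachable -- all odd slots unused
            // length 16 (mask 1111): all 16 buckets reachable
            System.out.println("length " + length + ": " + buckets.size() + " buckets used");
        }
    }
}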

   

According to the source code of the put method, when the program puts a key-value pair into the HashMap, it first determines the Entry's storage location from the return value of the key's hashCode: if the hashCode() values of two entries' keys are the same, they are stored in the same bucket. If the two keys also compare equal via equals(), the new Entry's value overwrites the existing Entry's value, but the key itself is not replaced. If equals() returns false, the new Entry is chained together with the existing entries in that bucket, with the newly added Entry at the head of the chain. For details, see the addEntry() method:

 

 

void addEntry(int hash, K key, V value, int bucketIndex) {
    // If the bucket already holds an entry, it becomes the next node of the
    // new entry, i.e. the new entry is inserted at the head of the chain
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    if (size++ >= threshold)      // past the threshold: resize
        resize(2 * table.length); // grow by a factor of 2
}

 

The bucketIndex parameter is the index computed by the indexFor function. The method first fetches the Entry currently stored at that index, then constructs a new Entry from hash, key, and value and places it at position bucketIndex; the previous occupant of that position becomes the next of the new Entry, forming a linked list.

The final if statement checks whether the size after the put has reached the threshold; if so, the table must be expanded. HashMap expands to twice its previous size.

 

4. Resize

The resize() method adjusts the capacity of the HashMap; newCapacity is the new table length:

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable);                           // move all entries from the old table into newTable
    table = newTable;                             // replace the old table
    threshold = (int) (newCapacity * loadFactor); // recompute the threshold
}

 

A new underlying array is created, and the code calls the transfer method, which moves every element of the old table into the new one, recomputing each element's index in the new array.
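The transfer method itself is not shown in the excerpt above. The following is a sketch of what it does, modeled on the JDK 6 implementation: it walks every bucket of the old table and re-inserts each entry at the head of its newly computed bucket in the new table.

void transfer(Entry[] newTable) {
    Entry[] src = table;
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) {
        Entry<K,V> e = src[j];
        if (e != null) {
            src[j] = null; // detach the chain from the old table (helps GC)
            do {
                Entry<K,V> next = e.next;
                // recompute the bucket index for the new, larger table
                int i = indexFor(e.hash, newCapacity);
                // head insertion into the new bucket (this reverses chain order)
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            } while (e != null);
        }
    }
}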

 

 

As a HashMap accumulates more and more elements, the probability of hash collisions rises, because the array length is fixed. To keep queries efficient, the HashMap's array must therefore be resized. Array expansion also appears in ArrayList; it is a common operation. After the array is expanded, the most performance-consuming step follows: every entry in the old array must have its position recalculated and be placed into the new array. That is what resize does.

 

When will HashMap be resized? When the number of elements exceeds arraySize * loadFactor. The default loadFactor of 0.75 is a compromise value. By default the array size is 16, so once the element count exceeds 16 * 0.75 = 12, the array is expanded to 2 * 16 = 32, i.e. doubled, and the position of every element is recalculated. Since expansion requires copying the array, which is a very performance-consuming operation, if we can predict the number of elements in advance, presizing the map can noticeably improve HashMap's performance, as sketched below.
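For example (the sizing idiom below is a common rule of thumb, not part of the HashMap API), if we expect roughly 1000 entries, presizing avoids the intermediate resize/copy cycles:

import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    public static void main(String[] args) {
        int expected = 1000;
        // Request a capacity large enough that `expected` entries fit without
        // crossing the default 0.75 threshold; HashMap rounds the requested
        // capacity up to a power of 2
        Map<Integer, String> map = new HashMap<>((int) (expected / 0.75f) + 1);
        for (int i = 0; i < expected; i++)
            map.put(i, "value" + i); // no resize happens during this loop
        System.out.println(map.size()); // 1000
    }
}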

 

 

5. Data Reading

 

 

public V get(Object key) {
    if (key == null)
        return getForNullKey();
    int hash = hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}

With the storage (put) logic above as the basis, this code is easy to understand. As the source shows, when we get an element from a HashMap, we first compute the hash of the key's hashCode and locate the corresponding position in the array, then walk the linked list at that position, using the key's equals method to find the required element.

 

6. Performance Parameters of HashMap

 

HashMap provides the following constructors:

HashMap(): constructs a HashMap with an initial capacity of 16 and a load factor of 0.75.

HashMap(int initialCapacity): constructs a HashMap with an initial capacity of initialCapacity and a load factor of 0.75.

HashMap(int initialCapacity, float loadFactor): constructs a HashMap with the specified initial capacity and load factor.

HashMap's base constructor, HashMap(int initialCapacity, float loadFactor), takes two parameters: the initial capacity initialCapacity and the load factor loadFactor.

initialCapacity: the initial capacity of the HashMap, that is, the length of the underlying array.

loadFactor: the load factor, defined as the actual number of elements in the hash table (n) divided by the capacity of the table (m).

The load factor measures how full a hash table is: the larger the load factor, the fuller the table; the smaller, the emptier. For a hash table that resolves collisions by chaining, the average time to search for an element is O(1 + a), where a is the load factor. A larger load factor makes fuller use of space, but at the cost of reduced search efficiency; a load factor that is too small leaves the table sparse and wastes space badly.

In the HashMap implementation, the maximum number of elements the table can hold before expanding is determined by the threshold field:

 

threshold = (int)(capacity * loadFactor);  
 

According to the definition of the load factor, threshold is the maximum number of elements allowed at the given loadFactor and capacity. When this number is exceeded, the table is resized, which lowers the actual load factor again. The default load factor of 0.75 is a balance between space and time efficiency. When the element count exceeds the threshold, the capacity of the HashMap after the resize is twice the previous capacity:
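A small sketch (ours) of how capacity and threshold evolve across resizes at the default load factor:

public class ThresholdDemo {
    public static void main(String[] args) {
        float loadFactor = 0.75f;
        // capacity doubles on each resize; the threshold scales with it
        for (int capacity = 16; capacity <= 128; capacity *= 2)
            System.out.println("capacity " + capacity
                + " -> threshold " + (int) (capacity * loadFactor));
        // prints: 16 -> 12, 32 -> 24, 64 -> 48, 128 -> 96
    }
}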

 

 

This article is excerpted from http://www.cnblogs.com/ITtangtang/p/3948406.html#a1
