java.util.HashMap
HashMap is one of the most commonly used classes in the JDK. It is a hash-table-based implementation of the Map interface. Although it is easy to use, there are many points worth studying.
HashMap stores its data as key-value pairs. Each pair is represented in the implementation by a static nested class, Entry, which holds the key, the value, the hash of the key, and a reference to the next entry in the linked list that is used when hash conflicts occur.
The underlying implementation of HashMap uses an array to store the entries. Its initial capacity is 16 by default and must always be a power of two; the maximum capacity is 1 << 30 (about 1.07 billion). In addition, a load factor controls when the map's hash table is expanded. Its default value is 0.75, meaning the table is resized once the number of entries reaches 3/4 of the capacity (this is not the only condition, as we will see below).
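As a quick illustration of these two knobs, both constructors are shown below (the class name CapacityDemo is our own):

import java.util.HashMap;
import java.util.Map;

public class CapacityDemo {
    public static void main(String[] args) {
        // Default: capacity 16, load factor 0.75f -> resize threshold 12
        Map<String, Integer> defaults = new HashMap<>();
        // Custom: capacity 64, load factor 0.5f -> resize threshold 32
        Map<String, Integer> tuned = new HashMap<>(64, 0.5f);
        defaults.put("a", 1);
        tuned.put("b", 2);
        System.out.println(defaults.size() + " " + tuned.size());
    }
}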
When an element is added to a HashMap, the hash of its key is computed, and the storage location in the array is determined from that hash and the array length. When a hash conflict occurs, the entries that map to the same bucket are stored as a linked list.
Next, let's take a look at the source code. First, let's look at the constructor.
public HashMap(int initialCapacity, float loadFactor) {
    // The initial capacity cannot be negative; otherwise an exception is thrown
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    // The initial capacity cannot exceed the maximum capacity of 1 << 30
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    // Validate the load factor: it must be a positive number
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);

    this.loadFactor = loadFactor;
    threshold = initialCapacity;
    // The init method is a hook left for subclass extension
    init();
}
We can see that creating a HashMap does not allocate the backing array; memory is allocated only when data is actually added to the map. You can see this from the put method:
public V put(K key, V value) {
    // No space is allocated when the map is created, so if the table is
    // still empty, allocate the memory now
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    // Special handling of a null key
    if (key == null)
        return putForNullKey(value);
    // Compute the hash of the key
    int hash = hash(key);
    // Determine the element's position in the hash table, i.e. the hash
    // bucket, from the hash and the current table length
    int i = indexFor(hash, table.length);
    // Check whether the key already exists; if so, replace the old value
    // with the new one and return the old value. Note that both the hash
    // and the equals method are used to decide whether a key exists.
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    // Increase the modification count, used by the fail-fast mechanism
    modCount++;
    // Add the element to the computed bucket in the hash table
    addEntry(hash, key, value, i);
    // Returning null indicates the key did not exist before
    return null;
}

void addEntry(int hash, K key, V value, int bucketIndex) {
    // Decide whether to resize: the size has reached the threshold AND a
    // hash conflict occurs (the target bucket already holds an element)
    if ((size >= threshold) && (null != table[bucketIndex])) {
        // Double the capacity
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        // Recompute the bucket index for the new table
        bucketIndex = indexFor(hash, table.length);
    }

    // Create an Entry and store it in the hash table
    createEntry(hash, key, value, bucketIndex);
}

void createEntry(int hash, K key, V value, int bucketIndex) {
    // Take the entry currently at the head of the bucket
    Entry<K,V> e = table[bucketIndex];
    // Place the new element at the head of the linked list, pointing its
    // next reference at the previous head
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    // Update the element count
    size++;
}
From the code, we can see that a resize only happens when both of the following conditions are met:

- the number of entries has reached the threshold (capacity * load factor), and
- a hash conflict occurs, i.e. the target bucket already holds an element.

Because both conditions must hold, the number of entries can exceed the threshold without triggering a resize. For example, with an initial capacity of 16 the threshold is 16 * 0.75 = 12; if hash conflicts happen before the threshold is reached they trigger no resize, and if the later additions rarely conflict, the map can grow past 12 entries, even up to 16, before a collision finally forces a resize. This is also why the hash algorithm must distribute keys evenly and keep conflicts to a minimum.
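To see these two conditions in action, here is a small experiment of our own that uses reflection to peek at HashMap's private table field. It assumes the JDK 7 addEntry logic shown above (JDK 8 changed the resize condition and would print a table length of 32), and on JDK 9+ it needs --add-opens java.base/java.util=ALL-UNNAMED to open the field:

import java.lang.reflect.Field;
import java.util.HashMap;

public class ResizeDemo {
    // Read the length of HashMap's private backing array via reflection
    static int tableLength(HashMap<?, ?> map) throws Exception {
        Field f = HashMap.class.getDeclaredField("table");
        f.setAccessible(true);
        return ((Object[]) f.get(map)).length;
    }

    public static void main(String[] args) throws Exception {
        HashMap<Integer, Integer> map = new HashMap<>();
        // Integer keys 0..15 hash to 16 distinct buckets, so no collision
        // occurs; on JDK 7 the table stays at 16 even though the size
        // passes the threshold of 12
        for (int i = 0; i < 16; i++) {
            map.put(i, i);
        }
        System.out.println(map.size() + " entries, table length " + tableLength(map));
    }
}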
The above is the implementation of adding elements. Now let's look at how the table is initialized and its memory allocated:
private void inflateTable(int toSize) {
    // Round the capacity up to a power of two
    int capacity = roundUpToPowerOf2(toSize);
    // Compute and cache the resize threshold during initialization, to
    // avoid recalculating it on every put
    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    // The memory is allocated here
    table = new Entry[capacity];
    // Initialize the hash seed if needed
    initHashSeedAsNeeded(capacity);
}

/**
 * Ensure the capacity is a power of two and does not exceed the maximum
 * capacity. For example, an input of 15 yields 16, and an input of 17
 * yields 32, i.e. the smallest power of two greater than or equal to
 * the input.
 */
private static int roundUpToPowerOf2(int number) {
    return number >= MAXIMUM_CAPACITY
            ? MAXIMUM_CAPACITY
            : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
}
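The rounding behavior is easy to check with a standalone copy of the helper (our own class; the method body mirrors the JDK 7 source above):

public class PowerOfTwoDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Our own copy of the JDK 7 helper, for experimentation
    static int roundUpToPowerOf2(int number) {
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 15, 16, 17, 100}) {
            System.out.println(n + " -> " + roundUpToPowerOf2(n)); // 1, 16, 16, 32, 128
        }
    }
}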
Special handling of the null key:
private V putForNullKey(V value) {
    // If a null key already exists, replace the old value
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    // Increase the modification count, used by the fail-fast mechanism
    modCount++;
    // The hash of the null key is fixed at 0, and so is its bucket index
    addEntry(0, null, value, 0);
    return null;
}
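From the caller's point of view, the effect is simply that a HashMap accepts a single null key; a minimal usage sketch:

import java.util.HashMap;

public class NullKeyDemo {
    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>();
        map.put(null, "first");
        map.put(null, "second");           // replaces the value for the null key
        System.out.println(map.get(null)); // prints "second"
        System.out.println(map.size());    // prints 1: only one null key is kept
    }
}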
Let's take a look at how to determine the location of a non-null key.
static int indexFor(int h, int length) { return h & (length-1); }
Here h is the hash of the key, and length is the current length of the hash table. h & (length - 1) is equivalent to h % length, but the former is a bitwise operation, which is faster than the modulo operation. Why can the & operation replace the modulo operation? Because length is always a power of two, length - 1 has all of its low bits set to 1, so h & (length - 1) keeps only the low bits of h; the result is always in the range [0, length), exactly as h % length would be for a non-negative h. If length were not a power of two, this would not work, so the trick is used very cleverly here.
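A quick check of our own confirms the equivalence for non-negative h when length is a power of two:

public class IndexForDemo {
    public static void main(String[] args) {
        int length = 16; // a power of two, as HashMap guarantees
        for (int h : new int[] {0, 5, 16, 31, 12345}) {
            // Both expressions yield the same bucket index
            System.out.println(h + ": " + (h & (length - 1)) + " == " + (h % length));
        }
    }
}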
Next, let's look at the core hash method, which computes the final hash value from the key's hashCode:
final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }
    // Call the key's hashCode() method to obtain the hashCode
    h ^= k.hashCode();
    // Perform a series of shifts and XORs on the hashCode and return the
    // result as the final hash
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
Why perform this series of shifts and XORs? Because indexFor only looks at the low bits of the hash, keys whose hashCodes differ only in the high bits would otherwise all land in the same bucket. The shifts and XORs spread the influence of the high bits into the low bits, so the 0 and 1 bits are distributed more evenly, which reduces hash conflicts and improves the efficiency of the whole HashMap.
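To illustrate, here is a sketch of our own: the two hash codes below differ only in their high bits, so without the spreading step they land in the same bucket of a 16-slot table. The spread method copies the JDK 7 shift/XOR sequence above; it is not a public API:

public class SpreadDemo {
    // Our own copy of the JDK 7 shift/XOR sequence shown above
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args) {
        int a = 0x10000, b = 0x20000; // differ only in their high bits
        int mask = 16 - 1;            // what indexFor uses for a 16-slot table
        System.out.println((a & mask) + " vs " + (b & mask));                 // 0 vs 0: collision
        System.out.println((spread(a) & mask) + " vs " + (spread(b) & mask)); // 1 vs 2: no collision
    }
}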
Rehash during resizing:
void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    // Recreate the underlying array
    Entry[] newTable = new Entry[newCapacity];
    // Rehash the existing elements into the new hash buckets
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    table = newTable;
    // Update the resize threshold
    threshold = (int) Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while (null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            // Compute the entry's bucket in the new table and insert it
            // at the head of that bucket's list
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}
Because the length of the hash table has changed, the bucket index of every existing entry must be recalculated and the entry moved into its new bucket (and, if the hash seed changed, its hash recomputed). This is a time-consuming operation. Therefore, if you have an estimate of the amount of data in advance, set an appropriate initial capacity when creating the HashMap, to avoid the performance loss of repeated resizing while data is being added.
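Here is a minimal pre-sizing sketch; the expected count of 10,000 is just an example value. Choosing a capacity of expected / loadFactor + 1 ensures the threshold is never reached while the map is filled:

import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    public static void main(String[] args) {
        int expected = 10_000;
        // 10_000 / 0.75 + 1 = 13334, which HashMap rounds up to 16384,
        // giving a threshold of 12288 -- above the expected entry count
        int initialCapacity = (int) (expected / 0.75f) + 1;
        Map<Integer, String> map = new HashMap<>(initialCapacity);
        for (int i = 0; i < expected; i++) {
            map.put(i, "value-" + i); // no resize happens during this loop
        }
        System.out.println(map.size());
    }
}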
Next let's take a look at the get operation.
public V get(Object key) {
    // The null key gets special treatment
    if (key == null)
        return getForNullKey();
    // Obtain the Entry corresponding to the key
    Entry<K,V> entry = getEntry(key);
    // If found, return the key's value; otherwise return null
    return null == entry ? null : entry.getValue();
}

final Entry<K,V> getEntry(Object key) {
    // A size of 0 means there are no elements, so return null directly
    if (size == 0) {
        return null;
    }
    // Obtain the key's hash
    int hash = (key == null) ? 0 : hash(key);
    // Walk the linked list in the hash bucket the key maps to
    for (Entry<K,V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
        Object k;
        // Match by hash first, then by reference or equals
        if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    // Not found
    return null;
}
The default value of the load factor is 0.75, which is a compromise; we can change it through the constructor. However, the larger the load factor, the higher the query cost may become: more entries are packed into the same number of buckets before a resize, so hash conflicts are more likely, the linked lists in the buckets grow longer, and a lookup has to traverse those lists. Since both get and put start with a lookup, keeping the lists short is what keeps HashMap fast. A smaller load factor therefore exchanges space for time: a sparser table costs more memory but keeps the chains short and queries fast.
Why is HashMap so efficient? It ensures its efficiency through the following points:

- a well-designed hash algorithm makes hash conflicts rare
- array-based storage gives fast access to elements
- the load factor lets you exchange space for time