From: Deep Java Collection Learning Series: The Implementation Principle of HashMap
Reference documents
Reference: Deep Java Collection Learning Series: The Implementation Principle of HashMap. Most of this post follows that blog, with only slight modifications.
My own write-up: the implementation principle of HashMap
1. HashMap overview:
HashMap is an unsynchronized implementation of the Map interface based on a hash table. (Hashtable is similar to HashMap; the main differences are that the methods of Hashtable are thread-safe, that is, synchronized, and that Hashtable does not accept null keys or null values.) This implementation provides all of the optional map operations and permits null values and the null key. This class makes no guarantees about the order of the mappings; in particular, it does not guarantee that the order will stay the same over time.
2. HashMap data structure:
In the Java programming language there are two basic structures: the array and the reference (a simulated pointer). All data structures can be built from these two, and HashMap is no exception. HashMap is in fact an "array of linked lists": each array element stores the head node of a chain, so the structure is a combination of an array and linked lists.
As the figure above shows, the bottom layer of HashMap is an array, and each item in the array is a linked list. When a new HashMap is created, an array is initialized. The source code is as follows:
/**
 * The table, resized as necessary. Length MUST always be a power of two.
 */
transient Entry[] table;

static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    final int hash;
    ...
}
As you can see, Entry is the element type of the array. Each Map.Entry is a key-value pair that also holds a reference to the next element, which is what forms the linked list.
3. HashMap access implementation:
1) Storage:
public V put(K key, V value) {
    // HashMap allows null keys and null values to be stored.
    // When the key is null, putForNullKey is called and the value is placed in the first position of the array.
    if (key == null)
        return putForNullKey(value);
    // Recalculate the hash value from the key's hashCode.
    int hash = hash(key.hashCode());
    // Find the index in the table that corresponds to this hash value.
    int i = indexFor(hash, table.length);
    // If the Entry at index i is not null, walk the chain through each next element.
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    // Reaching this point means there is no Entry with this key at index i.
    // modCount records the number of structural modifications of the HashMap.
    modCount++;
    // Add the key and value at index i.
    addEntry(hash, key, value, i);
    return null;
}
From the source code above we can see that when an element is put into the HashMap, the hash value is first recomputed from the key's hashCode, and that hash value determines the element's position (that is, its index) in the array. If other elements are already stored at that position, the elements there are kept as a linked list: the newly added element is placed at the head of the chain, and the element added first ends up at the tail. If there is no element at that position, the new element is placed directly at that position in the array.
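The behavior of put described above can be observed from the outside with the standard java.util.HashMap. The following is only a small sketch; the class name PutDemo and its run method are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; only HashMap and Map come from the JDK.
class PutDemo {
    // Returns what put() hands back for a new key and for an existing key,
    // plus the value stored under the null key.
    static String[] run() {
        Map<String, String> map = new HashMap<String, String>();
        String first = map.put("k", "v1");   // no previous mapping for "k"
        String second = map.put("k", "v2");  // "k" exists: value is replaced, old value is returned
        map.put(null, "forNullKey");         // a null key is accepted
        return new String[] { first, second, map.get(null) };
    }

    public static void main(String[] args) {
        String[] r = run();
        System.out.println(r[0] + ", " + r[1] + ", " + r[2]);
    }
}
```

The first call returns null because no mapping existed, and the second returns the replaced value, which matches the return null and return oldValue branches in the source.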
The addEntry(hash, key, value, i) method places the key-value pair at index i of the table array according to the computed hash value. addEntry is a package-access method of HashMap (that is, it carries none of the three access modifiers public, protected, or private and so has default access, although the keyword default is not written in the code). The code is as follows:
void addEntry(int hash, K key, V value, int bucketIndex) {
    // Get the Entry at the specified bucketIndex.
    Entry<K,V> e = table[bucketIndex];
    // Place the newly created Entry at bucketIndex and let the new Entry point to the original Entry.
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    // If the number of key-value pairs in the map has reached the threshold,
    if (size++ >= threshold)
        // expand the table to twice its original length.
        resize(2 * table.length);
}
When the system decides where to store a key-value pair in the HashMap, it does not consider the value in the Entry at all; it computes and determines the storage location of each Entry from the key alone. We can simply treat the value in the map as an attachment to the key: once the system has determined where the key is stored, the value is stored there with it.
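To make the head insertion concrete, here is a minimal sketch of a chained hash table. It is not the JDK code: TinyChainedMap, Node, and headKeyOfBucket are hypothetical names, the table has a fixed length of 16, there is no resizing, and null keys are not handled.

```java
// Hypothetical, simplified chained map: fixed 16-slot table, no resize, non-null keys only.
class TinyChainedMap<K, V> {
    static final class Node<K, V> {
        final K key;
        V value;
        final Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    // Only the key decides the bucket; the value plays no part.
    private int indexFor(K key) {
        return key.hashCode() & (table.length - 1);
    }

    V put(K key, V value) {
        int i = indexFor(key);
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        // Head insertion: the new node points to the previous head of the chain.
        table[i] = new Node<K, V>(key, value, table[i]);
        return null;
    }

    V get(K key) {
        for (Node<K, V> e = table[indexFor(key)]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    // Exposes the key at the head of the chain that the given key maps to.
    K headKeyOfBucket(K key) {
        Node<K, V> head = table[indexFor(key)];
        return head == null ? null : head.key;
    }
}
```

With Integer keys 1 and 17, which land in the same slot of a 16-element table, the key inserted last (17) sits at the head of the chain while both values remain reachable through get.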
The hash(int h) method recomputes the hash from the key's hashCode. The algorithm mixes the high-order bits into the computation to prevent collisions between hash codes whose low-order bits are identical and whose high-order bits differ.
static int hash(int h) {
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
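The effect of this extra mixing step can be checked with a small experiment. The sketch below assumes the JDK 6 shift distances (20, 12, 7, 4) and uses a hypothetical class name, HashSpreadDemo; the two sample hash codes were chosen so that they differ only in their high-order bits.

```java
// Hypothetical demo; hash() assumes the JDK 6 shift distances 20, 12, 7 and 4.
class HashSpreadDemo {
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000;  // low 16 bits are all zero
        int b = 0x20000;  // same low bits, different high bits
        int length = 16;
        // Without mixing, only the low 4 bits are used, so both land in the same bucket.
        System.out.println("raw:   " + indexFor(a, length) + " and " + indexFor(b, length));
        // After mixing, the high bits influence the index.
        System.out.println("mixed: " + indexFor(hash(a), length) + " and " + indexFor(hash(b), length));
    }
}
```

For these two inputs the raw indexes coincide, while the mixed indexes differ, which is the collision this step is meant to prevent.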
We can see that to find an element in a HashMap, we need to obtain the corresponding array position from the hash value of the key. How this position is computed is the hash algorithm. As mentioned earlier, the data structure of HashMap is a combination of an array and linked lists, so we naturally want the elements to be distributed as evenly as possible, ideally with only one element at each position. Then, when the hash algorithm gives us a position, we immediately know that the element there is the one we want and we do not have to traverse a linked list, which greatly improves query efficiency.
For any given object, as long as its hashCode() return value is the same, the hash value computed by hash(int h) is always the same. The first idea that comes to mind is to take the hash value modulo the array length, which distributes the elements fairly evenly. However, the modulo operation is relatively expensive, so HashMap does this instead: it calls indexFor(int h, int length) to compute the index in the table array at which the object should be stored. The code for indexFor(int h, int length) is as follows:
static int indexFor(int h, int length) {
    return h & (length - 1);
}
This method is very clever: it uses h & (table.length - 1) to get the object's storage position, and the length of HashMap's underlying array is always a power of two (2^n), which is a speed optimization in HashMap. The HashMap constructor contains the following code:
int capacity = 1;
while (capacity < initialCapacity)
    capacity <<= 1;
This code guarantees that, at initialization, the capacity of the HashMap is always a power of two, that is, the length of the underlying array is always 2^n.
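As a quick illustration, the loop can be wrapped in a helper and tried on a few requested capacities. The helper name roundUpToPowerOfTwo and the class CapacityDemo are made up for this sketch, and it assumes the requested capacity is at most 1 << 30 so that the shift cannot overflow.

```java
// Hypothetical demo of the capacity loop; assumes initialCapacity <= (1 << 30).
class CapacityDemo {
    static int roundUpToPowerOfTwo(int initialCapacity) {
        int capacity = 1;
        // Double the capacity until it is at least the requested size.
        while (capacity < initialCapacity)
            capacity <<= 1;
        return capacity;
    }

    public static void main(String[] args) {
        int[] requested = { 1, 10, 16, 17 };
        for (int r : requested)
            System.out.println(r + " -> " + roundUpToPowerOfTwo(r));
    }
}
```

A request for 10 becomes 16, a request for 16 stays 16, and a request for 17 becomes 32.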
When length is a power of two, h & (length - 1) is equivalent to taking h modulo length, that is, h % length (for non-negative h), but & is more efficient than %.
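The equivalence is easy to verify by brute force. The following sketch, with the hypothetical class MaskVsModulo, counts how often the two expressions disagree over a range of non-negative values. Negative values are left out on purpose: in Java, % can return a negative result while the mask cannot.

```java
// Hypothetical check: compares h & (length - 1) with h % length for non-negative h.
class MaskVsModulo {
    // Counts values h in [0, limit) for which the mask and the remainder differ.
    static int mismatches(int length, int limit) {
        int count = 0;
        for (int h = 0; h < limit; h++) {
            if ((h & (length - 1)) != (h % length))
                count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("length 16: " + mismatches(16, 10000) + " mismatches");
        System.out.println("length 15: " + mismatches(15, 10000) + " mismatches");
    }
}
```

For length 16 the count is zero; for length 15 the two expressions disagree for many values, so the mask is only a valid replacement for the remainder when the length is a power of two.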
This looks very simple, but there is more to it than meets the eye. Let us illustrate with an example:
Suppose the array length is 15 in one case and 16 in the other, and the optimized hash codes are 8 and 9. The results of the & operation are as follows:
h & (table.length-1)      hash       table.length-1     result
8 & (15-1):               1000   &   1110           =   1000
9 & (15-1):               1001   &   1110           =   1000
-----------------------------------------------------------------
8 & (16-1):               1000   &   1111           =   1000
9 & (16-1):               1001   &   1111           =   1001
-----------------------------------------------------------------
As the example above shows, when 8 and 9 are ANDed with (15-1) = (1110) in binary, they produce the same result, 1000, which means they map to the same position in the array. This is a collision: 8 and 9 are placed at the same position and form a linked list, so a lookup has to traverse the list to reach 8 or 9, which reduces query efficiency. We can also see that when the array length is 15, every hash value is ANDed with (15-1) = (1110), so the last bit of the result is always 0. The positions 0001, 0011, 0101, 0111, 1001, 1011, and 1101 can therefore never store an element, which wastes a lot of space. Worse, the number of usable positions is much smaller than the array length, which further increases the probability of collisions and slows down queries.
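The wasted positions can also be counted directly. This sketch uses a hypothetical class, BucketUsage, to feed the hash values 0 to 99 through h & (length - 1) and count how many distinct array positions are ever hit.

```java
// Hypothetical experiment: how many array positions can h & (length - 1) reach?
class BucketUsage {
    // Returns the number of distinct indexes produced for h = 0 .. hashCount - 1.
    static int usedBuckets(int length, int hashCount) {
        boolean[] used = new boolean[length];
        int distinct = 0;
        for (int h = 0; h < hashCount; h++) {
            int i = h & (length - 1);
            if (!used[i]) {
                used[i] = true;
                distinct++;
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        System.out.println("length 15: " + usedBuckets(15, 100) + " positions used");
        System.out.println("length 16: " + usedBuckets(16, 100) + " positions used");
    }
}
```

With length 15 only the 8 even positions are ever used, leaving the 7 odd positions listed above empty, while with length 16 all 16 positions are reachable.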