Collection Series: HashMap Source Analysis
We have already analyzed two collections, ArrayList and LinkedList. ArrayList is backed by an array and LinkedList by a linked list, and each has its own strengths and weaknesses: ArrayList is faster than LinkedList at locating an element by position, while LinkedList is faster than ArrayList at inserting and deleting elements. The HashMap covered in this article combines the advantages of both. It is backed by a hash table, and if hash collisions are left out of consideration, its lookups and modifications can reach O(1) time complexity. Let's look at the structure of the hash table it is built on.
As the illustration above shows, a hash table is a structure that combines an array with linked lists. The illustration is admittedly a poor example: a good hash function should spread the elements evenly across the array, which reduces hash collisions and keeps the linked lists short. The longer a linked list is, the more nodes a lookup has to traverse and the worse the hash table performs. Let's take a look at some of the member variables of HashMap.
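To make the effect of the hash function concrete, here is a minimal sketch that distributes keys over a fixed number of buckets and reports the longest chain. The class `BucketDemo`, the helper `longestChain`, and the deliberately bad key type `BadKey` are all hypothetical names made up for this illustration; none of this is HashMap's own code.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class BucketDemo {
    // Places each key in bucket (hashCode mod buckets) and returns the longest chain length.
    static int longestChain(List<?> keys, int buckets) {
        List<LinkedList<Object>> table = new ArrayList<>();
        for (int i = 0; i < buckets; i++) {
            table.add(new LinkedList<>());
        }
        for (Object k : keys) {
            int index = (k.hashCode() & 0x7fffffff) % buckets; // clear the sign bit first
            table.get(index).add(k);
        }
        int longest = 0;
        for (LinkedList<Object> chain : table) {
            longest = Math.max(longest, chain.size());
        }
        return longest;
    }

    // Hypothetical key whose hashCode is constant, so every key collides.
    static final class BadKey {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 1; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        List<Object> good = new ArrayList<>();
        List<Object> bad = new ArrayList<>();
        for (int i = 0; i < 16; i++) {
            good.add(i);              // Integer.hashCode() is the value itself
            bad.add(new BadKey(i));
        }
        System.out.println(longestChain(good, 16)); // 1: one key per bucket
        System.out.println(longestChain(bad, 16));  // 16: every key in the same bucket
    }
}
```

With the constant hash code, a lookup degenerates into a walk over a single 16-node list, which is exactly the situation a good hash function avoids.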
// Default initial capacity
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
// Maximum capacity
static final int MAXIMUM_CAPACITY = 1 << 30;
// Default load factor: how full the hash table may get before it is resized
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// An empty hash table
static final Entry<?,?>[] EMPTY_TABLE = {};
// The hash table actually in use
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;
// Size of the HashMap, that is, the number of key-value pairs it stores
transient int size;
// Threshold on the number of key-value pairs, used to decide whether the hash table must be resized
int threshold;
// Load factor
final float loadFactor;
// Modification count, used by the fail-fast mechanism
transient int modCount;
// Default threshold for switching to alternative hashing
static final int ALTERNATIVE_HASHING_THRESHOLD_DEFAULT = Integer.MAX_VALUE;
// Random hash seed that helps reduce the number of hash collisions
transient int hashSeed = 0;
As the member variables show, the default initial capacity of HashMap is 16 and the default load factor is 0.75. The threshold is the number of key-value pairs the map can hold before it is resized; by default it is the initial capacity times the load factor, that is, 16 * 0.75 = 12. Once the number of key-value pairs reaches the threshold, the hash table is considered saturated: adding more elements would increase hash collisions and degrade HashMap's performance. At that point the automatic resize mechanism kicks in to keep performance up. We can also see that the hash table is really an Entry array, and each Entry in the array is the head node of a singly linked list. Entry is a static nested class of HashMap; let's look at its member variables.
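The threshold arithmetic can be checked with a few lines of Java. This is only a sketch of the capacity * load factor rule described above; `ThresholdDemo` and its `threshold` helper are hypothetical names, not part of HashMap.

```java
public class ThresholdDemo {
    // Threshold as described in the text: capacity times load factor, truncated to int.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // 12
        System.out.println(threshold(32, 0.75f)); // 24
        System.out.println(threshold(64, 0.75f)); // 48
    }
}
```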
static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;       // Key
    V value;           // Value
    Entry<K,V> next;   // Reference to the next entry
    int hash;          // Hash code
    // ... remaining code omitted
}
An Entry instance is a key-value pair holding a key and a value, and each Entry instance also holds a reference to the next Entry. To avoid recomputing it, each Entry also stores the hash code of its key. The Entry array is the core of HashMap: every operation works against this array. Because the HashMap source is long, we cannot cover all of its methods, so we will focus on the key points. The rest of this article is organized around the following questions, which we use to dig into the internals of HashMap.
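The following sketch shows how a chain of such nodes can be searched, and why caching the hash code helps: the cheap integer comparison filters out most nodes before equals is called. `Node` is a hypothetical stand-in for HashMap's Entry, and `find` is an illustrative helper, not a JDK method.

```java
public class ChainDemo {
    // Hypothetical stand-in for HashMap.Entry: key, value, next pointer, cached hash.
    static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        final int hash;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Walks the chain from head; compares the cached hash first, then the key itself.
    static <K, V> V find(Node<K, V> head, K key) {
        int h = key.hashCode();
        for (Node<K, V> e = head; e != null; e = e.next) {
            if (e.hash == h && (e.key == key || key.equals(e.key))) {
                return e.value;
            }
        }
        return null; // key is not in this chain
    }

    public static void main(String[] args) {
        Node<String, Integer> tail = new Node<>("b".hashCode(), "b", 2, null);
        Node<String, Integer> head = new Node<>("a".hashCode(), "a", 1, tail);
        System.out.println(find(head, "b")); // 2
        System.out.println(find(head, "c")); // null
    }
}
```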
1. What does HashMap do when it is constructed?
// Constructor taking an initial capacity and a load factor
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0) {
        throw new IllegalArgumentException("Illegal initial capacity: "
                                           + initialCapacity);
    }
    // If the initial capacity exceeds the maximum capacity, clamp it to the maximum capacity
    if (initialCapacity > MAXIMUM_CAPACITY) {
        initialCapacity = MAXIMUM_CAPACITY;
    }
    // Throw an exception if the load factor is not positive or is NaN
    if (loadFactor <= 0 || Float.isNaN(loadFactor)) {
        throw new IllegalArgumentException("Illegal load factor: "
                                           + loadFactor);
    }
    // Set the load factor
    this.loadFactor = loadFactor;
    // For now the threshold simply holds the initial capacity
    threshold = initialCapacity;
    init();
}

void init() {}
Every constructor ends up calling this one. Besides validating its parameters, it does two things: it sets the load factor to the value passed in, and it sets the threshold to the initial capacity passed in. The init method is empty and does nothing. Note that the constructor does not create an Entry array from the initial capacity. So when is the array created? Read on.
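The parameter checks can be observed from the outside with the public java.util.HashMap constructor. The sketch below assumes the JDK in use applies the same validation as the listing above; `ConstructorDemo` and `rejects` are hypothetical names used only for this illustration.

```java
import java.util.HashMap;

public class ConstructorDemo {
    // Returns true if the constructor rejects the arguments with IllegalArgumentException.
    static boolean rejects(int initialCapacity, float loadFactor) {
        try {
            new HashMap<String, String>(initialCapacity, loadFactor);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejects(-1, 0.75f));     // true: negative capacity
        System.out.println(rejects(16, 0f));        // true: load factor must be positive
        System.out.println(rejects(16, Float.NaN)); // true: NaN load factor
        System.out.println(rejects(16, 0.75f));     // false: valid arguments
    }
}
```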
2. What happens when HashMap adds a key-value pair?
// Put a key-value pair into the HashMap
public V put(K key, V value) {
    // Initialize the hash table if it has not been initialized yet
    if (table == EMPTY_TABLE) {
        // Initialize the hash table
        inflateTable(threshold);
    }
    if (key == null) {
        return putForNullKey(value);
    }
    // Compute the hash code of the key
    int hash = hash(key);
    // Locate the slot in the hash table from the hash code
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        // If the key already exists, replace its value and return the old value
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    // No matching key was found, so add a new Entry to the HashMap
    addEntry(hash, key, value, i);
    // Return null after a successful add
    return null;
}
As you can see, adding a key-value pair first checks whether the hash table is empty and initializes it if so. Next, the hash function is called to compute the hash code of the key. The hash code determines the slot in the Entry array, and the singly linked list in that slot is then traversed: if the key already exists, its value is replaced; otherwise a new Entry is added to the hash table.
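The return values of put described above are easy to confirm against java.util.HashMap. This is a small usage sketch; `PutDemo` is a hypothetical class name.

```java
import java.util.HashMap;
import java.util.Map;

public class PutDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();

        System.out.println(map.put("a", 1));  // null: "a" was not present
        System.out.println(map.put("a", 2));  // 1: old value returned, value replaced
        System.out.println(map.get("a"));     // 2
        System.out.println(map.put(null, 0)); // null: a null key is accepted
        System.out.println(map.size());       // 2
    }
}
```

Note that a null return is ambiguous when null values are stored: it can mean either "no previous mapping" or "the previous value was null".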
3. How is the hash table initialized?
// Initialize the hash table. The capacity may be enlarged, because the requested capacity might not be a power of 2
private void inflateTable(int toSize) {
    // The capacity of the hash table must be a power of 2
    int capacity = roundUpToPowerOf2(toSize);
    // Set the threshold, which is normally capacity * loadFactor
    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    // Create a hash table with the computed capacity
    table = new Entry[capacity];
    // Initialize the hash seed
    initHashSeedAsNeeded(capacity);
}
As we saw above, constructing a HashMap does not create the Entry array. Instead, put checks whether the current hash table is empty and, if it is, calls inflateTable to initialize it. The code of this method is shown above. It recomputes the capacity of the Entry array, because the initial capacity passed to the constructor may not be a power of 2; the number is rounded up to a power of 2 and the new Entry array is created with that capacity. The threshold is also reset during initialization, normally to capacity * loadFactor. In addition, the hash seed (hashSeed) is initialized along with the table. hashSeed is used to optimize the hash function: the default value of 0 means alternative hashing is not used, but hashSeed can be set to a different value to get the optimization. This is discussed further below.
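The body of roundUpToPowerOf2 is not shown in this article. The sketch below is one way to implement the rounding, written as an assumption about what such a helper could look like rather than a quote of the JDK source; `InflateDemo` is a hypothetical class name, and the threshold line mirrors the expression in inflateTable.

```java
public class InflateDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Assumed implementation: smallest power of 2 >= number, capped at MAXIMUM_CAPACITY.
    static int roundUpToPowerOf2(int number) {
        if (number >= MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;
        }
        // (number - 1) << 1 keeps exact powers of 2 unchanged after highestOneBit.
        return (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    // Same expression as in inflateTable above.
    static int threshold(int capacity, float loadFactor) {
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOf2(10)); // 16
        System.out.println(roundUpToPowerOf2(16)); // 16
        System.out.println(roundUpToPowerOf2(17)); // 32
        System.out.println(threshold(roundUpToPowerOf2(10), 0.75f)); // 12
    }
}
```

So a map constructed with an initial capacity of 10 ends up with a 16-slot table and a threshold of 12 once the first put triggers initialization.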
4. When does HashMap decide to resize, and how does it resize?
// Add an entry, resizing the table first if necessary
void addEntry(int hash, K key, V value, int bucketIndex) {
    // If the size of the HashMap has reached the threshold and the target slot is not null
    if ((size >= threshold) && (null != table[bucketIndex])) {
        // The table is saturated and this insert would collide, so double the capacity
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }
    // Create the entry in its slot (the slot is recomputed above if the table was resized)
    createEntry(hash, key, value, bucketIndex);
}
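Why does addEntry recompute bucketIndex after resize? Because the slot depends on the table length. The sketch below assumes indexFor masks the hash with length - 1, which works because the length is always a power of 2; the body of indexFor is not shown in this article, so treat it as an assumption. `ResizeIndexDemo` is a hypothetical class name.

```java
public class ResizeIndexDemo {
    // Assumed implementation: keep the low bits of the hash (length must be a power of 2).
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int h = 21; // binary 10101
        System.out.println(indexFor(h, 16)); // 5: only the low 4 bits are used
        System.out.println(indexFor(h, 32)); // 21: the fifth bit now counts, so the slot moves

        int g = 5;  // binary 00101
        System.out.println(indexFor(g, 16)); // 5
        System.out.println(indexFor(g, 32)); // 5: the fifth bit is 0, so the slot stays
    }
}
```

Under this assumption, doubling the table either leaves an entry in its old slot or moves it by exactly the old capacity, so an index computed before the resize cannot be reused blindly.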
// Resize the hash table
void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    // If the table is already at maximum capacity, only the threshold can be raised
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;