HashMap plays a very important role in Java development, and every Java programmer should understand it.
This article analyzes the design ideas behind HashMap from a source-code perspective, explains several key concepts, explores HashMap's internal structure and implementation details, and discusses its performance.
1. HashMap Design idea and internal structure composition
HashMap Design Ideas
Map<K,V> is a container that stores key-value pairs. HashMap organizes its storage around the hashCode value of the key, so that data can be accessed very quickly and efficiently by key.
For each key-value pair <Key,Value>, HashMap internally encapsulates it in a corresponding Entry<Key,Value> object; that is, the Entry<Key,Value> object is the organizational form of the key-value pair <Key,Value>.
For every object, the JVM can produce a hashCode value. When HashMap stores a key-value pair as an Entry<Key,Value>, it applies a mapping based on the key's hashCode to decide where in the HashMap that Entry<Key,Value> should be stored.
When data is fetched by key, the key's hashCode and the same internal mapping locate the position of the corresponding value directly, so the value can be retrieved very efficiently.
To realize this design, HashMap organizes its Entry<Key,Value> objects in the form of an array plus linked lists.
HashMap internally maintains an Entry[] table array. When we create a HashMap with new HashMap(), the default length of the Entry[] table is 16 (see the Java API). The length of the Entry[] table is also called the capacity of the HashMap.
Each element of the Entry[] table is either null or a linked list of Entry<Key,Value> objects. The number of Entry<Key,Value> objects in the HashMap is called the size of the HashMap.
An element of the Entry[] table together with its corresponding Entry<Key,Value> list is also called a bucket.
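As a quick illustration of the access pattern just described, a minimal usage sketch (the class name BasicUsage is mine, for illustration only):

```java
import java.util.HashMap;
import java.util.Map;

public class BasicUsage {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("apple", 1);   // the key's hashCode() decides which bucket stores the entry
        map.put("banana", 2);
        // get() recomputes the mapping from the key and jumps straight to that bucket
        System.out.println(map.get("apple"));  // prints 1
        System.out.println(map.get("pear"));   // prints null: no mapping for this key
    }
}
```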
Its structure is shown in the following illustration:
The internal organization of HashMap is shown in the illustration above. Next, look at its basic workflow:
HashMap is designed so that, based on the key's hashCode, it can locate the corresponding Entry<Key,Value> object as directly as possible and then obtain the value.
Consider such a problem:
When we execute HashMap map = new HashMap(), we create a HashMap whose internal Entry[] table has a length of 16. Suppose for a moment that the Entry[] table never changes size. Now add 160 key-value pairs <Key,Value> with completely different keys: the Entry<Key,Value> linked list inside each bucket will become very long. We know that searching a linked list has time complexity O(n), so the performance of such a HashMap would be very poor, as shown in the following figure:
Now analyze this problem. The current HashMap achieves:
1. Locating the bucket that stores the Entry<Key,Value> directly from the key's hashCode, with time complexity O(1);
2. Finding the corresponding Entry<Key,Value> node within the bucket, which requires traversing the bucket's Entry<Key,Value> linked list, with time complexity O(n);
So we should minimize the O(n) cost of step 2, and the reader has probably already thought of it: the linked list in each bucket should be as short as possible. The shorter the list, the less time a lookup takes; ideally each bucket holds only one Entry<Key,Value> node.
Consequently, each bucket should hold as few Entry<Key,Value> nodes as possible, which requires the HashMap to have as many buckets as possible.
The buckets of a HashMap are the elements of the Entry[] table array. Because an array occupies a contiguous block of memory, its space cost is high, but its random-access speed is the fastest among Java collections. HashMap increases the number of buckets to shorten the Entry<Key,Value> lists and so speed up data retrieval; this is a typical strategy of trading space for time.
But we cannot simply allocate a huge number of buckets up front (i.e. start with a very large Entry[] table), because an array is a contiguous block of memory that is expensive to create, and we do not know in advance how much of that space the HashMap will actually use. To solve this problem, HashMap allocates the number of buckets dynamically according to the actual situation.
HashMap's Trade-off Strategy
To allocate the number of buckets dynamically, a trade-off strategy is needed, and HashMap's strategy is this:
If HashMap size > HashMap capacity (i.e. the Entry[] table length) * load factor (empirical value 0.75),
then the capacity of the Entry[] table in the HashMap is doubled,
and the Entry<Key,Value> lists in the old buckets are redistributed across the new buckets.
The product above, HashMap capacity (i.e. the Entry[] table length) * load factor (empirical value 0.75), is the so-called threshold:
Threshold = capacity * load factor
Capacity: the length of HashMap's internal Entry[] table array
Load factor: defaults to 0.75
Threshold: when the HashMap's size exceeds the threshold, the HashMap doubles its capacity and rehashes.
Finally, look at an example:
A HashMap created with new HashMap() has a default capacity of 16. So after how many completely different key-value pairs <Key,Value> have been added will the HashMap's capacity expand?
A very simple calculation: since the default load factor is 0.75, the map's threshold is 16 * 0.75 = 12. That is, when the 13th key-value pair <Key,Value> is added, the map's capacity doubles.
There may be a doubt here: the capacity of the Entry[] table is 16, but expansion happens after only 12 key-value pairs <Key,Value>; doesn't that leave at least 4 Entry[] table elements unused, wasting precious space? This is true, but this empirical value was chosen to keep the Entry<Key,Value> lists in the buckets as short as possible and preserve the HashMap's access performance. If access efficiency matters less to you than space, you can use new HashMap(int initialCapacity, float loadFactor) to set a larger load factor.
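The threshold arithmetic above can be verified directly (the class name ThresholdDemo is mine, for illustration only):

```java
public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;          // default HashMap capacity
        float loadFactor = 0.75f;   // default load factor
        int threshold = (int) (capacity * loadFactor);
        System.out.println(threshold);  // prints 12: the 13th entry triggers a resize
        // after one resize the capacity doubles, and the threshold doubles with it
        System.out.println((int) (capacity * 2 * loadFactor));  // prints 24
    }
}
```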
2. HashMap Algorithm Implementation Analysis
The two most important methods in HashMap's implementation are put() and get(), analyzed below:
public V put(K key, V value);
public V get(Object key);
In addition, HashMap supports null keys, which will also be discussed.
1. Storing a key-value pair <Key,Value> into the HashMap --- the put() method:
put() method - stores a key-value pair <Key,Value> into the HashMap
A. Obtain the key's hashCode and use it to determine which bucket the pair should be stored in, i.e. the index of the target bucket;
B. Traverse the Entry<Key,Value> list in that bucket to check whether an Entry<Key,Value> object with the same key already exists;
C1. If it exists, locate the corresponding Entry<Key,Value>, update its value to the new value, and return the old value;
C2. If not, create a new Entry<Key,Value> object from <Key,Value> and add it to the head of the bucket's Entry<Key,Value> list;
D. Check whether the current size of the HashMap (the number of Entry<Key,Value> nodes) exceeds the threshold; if it does, expand the capacity of the HashMap (i.e. the Entry[] table) and redistribute the Entry<Key,Value> nodes inside it.
The detailed process is shown in the following code:
/**
 * Saves the <Key,Value> pair into the HashMap; if the key already exists
 * in the HashMap, the replaced value is eventually returned.
 * Both key and value may be null.
 */
public V put(K key, V value) {
    // 1. If key is null, place this value in table[0], i.e. the first bucket
    if (key == null)
        return putForNullKey(value);
    // 2. Recompute the hash from the key's hashCode
    int hash = hash(key.hashCode());
    // 3. Compute which bucket this hash maps to, obtaining the bucket index
    int i = indexFor(hash, table.length);
    // 4. Loop through this bucket's Entry list
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        // 5. If an Entry<Key,Value> with this key already exists in the list,
        //    overwrite the value of that Entry<Key,Value> node
        //    (note this judgment condition carefully; it is very important)
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    // 6. No such key exists: create a new Entry<Key,Value> object from the
    //    key-value pair <Key,Value> and add it to the head of this bucket's list
    addEntry(hash, key, value, i);
    return null;
}

/**
 * key is null: the Entry<null,Value> is placed in the first bucket, table[0]
 */
private V putForNullKey(V value) {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(0, null, value, 0);
    return null;
}
/**
 * Recomputes a hash from the key's original hashCode.
 * Because the low-order bits (lower bits) of JVM-generated hashCodes
 * collide relatively often (the JDK simply works this way), HashMap
 * mixes the high-order bits of the key's hashCode into the result
 * to improve performance.
 */
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
/**
 * Returns the index of the bucket to which this hash should be allocated.
 */
static int indexFor(int h, int length) {
    return h & (length - 1);
}
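Because the capacity is always a power of two, h & (length - 1) is equivalent to h % length for non-negative h, but avoids the cost of a division. A small check (the class name IndexForDemo is mine, for illustration only):

```java
public class IndexForDemo {
    // the same masking trick HashMap's indexFor uses;
    // valid only when length is a power of two
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        for (int h = 0; h < 100; h++) {
            // for power-of-two lengths, the mask and the modulo agree
            if (indexFor(h, length) != h % length) {
                throw new AssertionError("mismatch at " + h);
            }
        }
        System.out.println(indexFor(37, 16));  // prints 5 (37 % 16 == 5)
    }
}
```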
Capacity expansion when the size of the HashMap exceeds the threshold
When the current size of the HashMap exceeds the threshold, HashMap expands its capacity, i.e. its internal Entry[] table array.
HashMap has two requirements for capacity (the Entry[] table array length):
1. The capacity must be a power of 2;
2. When the size exceeds the threshold, the capacity doubles;
Point 2 is very important here. If the current HashMap capacity is 16 and expansion is needed, the capacity becomes 16*2 = 32, then 32*2 = 64, 64*2 = 128, 128*2 = 256 ... As you can see, the capacity grows exponentially.
The expansion operation can be divided into the following steps:
1. Allocate a new array twice the current capacity;
2. Recompute the bucket indexes of the entries in the old array's Entry[] table lists and redistribute them into the new, larger array;
3. Release the old array;
As the process above shows, a single expansion is very expensive, so each expansion doubles the capacity, minimizing the number of expansions.
To improve HashMap performance:
1. If you have a good estimate of how many Entry<Key,Value> pairs the HashMap will hold, specify its capacity directly when creating it;
2. If you are sure the HashMap will be used with a very large size, control the load factor and consider setting it larger, to avoid an Entry[] table that is very large but poorly utilized.
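The two tuning tips above map directly onto HashMap's constructors. A minimal sketch (the class name TuningDemo and the specific sizes are mine, for illustration only):

```java
import java.util.HashMap;
import java.util.Map;

public class TuningDemo {
    public static void main(String[] args) {
        // tip 1: if ~1000 entries are expected, pre-size the map so that
        // no resize/rehash ever occurs: 2048 * 0.75 = 1536 > 1000
        Map<String, String> preSized = new HashMap<>(2048);

        // tip 2: a larger load factor trades some lookup speed for memory,
        // since buckets fill more densely before a resize is triggered
        Map<String, String> denser = new HashMap<>(16, 0.9f);

        preSized.put("k", "v");
        denser.put("k", "v");
        System.out.println(preSized.get("k") + " " + denser.get("k"));  // prints v v
    }
}
```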
/**
 * Rehashes the contents of this map into a new array with a larger
 * capacity. This method is called automatically when the number of
 * keys in this map reaches its threshold.
 *
 * If current capacity is MAXIMUM_CAPACITY, this method does not
 * resize the map, but sets threshold to Integer.MAX_VALUE.
 * This has the effect of preventing future calls.
 *
 * @param newCapacity the new capacity, MUST be a power of two;
 *        must be greater than current capacity unless current
 *        capacity is MAXIMUM_CAPACITY (in which case value
 *        is irrelevant).
 */
void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable);
    table = newTable;
    threshold = (int) (newCapacity * loadFactor);
}

/**
 * Transfers all entries from the current table to newTable.
 */
void transfer(Entry[] newTable) {
    Entry[] src = table;
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) {
        Entry<K,V> e = src[j];
        if (e != null) {
            src[j] = null;
            do {
                Entry<K,V> next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            } while (e != null);
        }
    }
}
Why does the JDK recommend that when we override the Object.equals(Object obj) method, we must also ensure that equal objects return the same hashCode value?
Java programmers who have read the JDK API documentation will have seen this note on the Object.equals(Object obj) method:
"Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes."
Some people know this contract but do not really understand why it exists. Now let's see why.
Look again at the put() implementation above, specifically at the judgment condition used while traversing a bucket's Entry<Key,Value> list to find a matching Entry:
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
    Object k;
    // 5. Check whether an Entry<Key,Value> with this key already exists
    //    in the list; if so, overwrite the value of that
    //    Entry<Key,Value> object node
    if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
        V oldValue = e.value;
        e.value = value;
        e.recordAccess(this);
        return oldValue;
    }
}
For a given key and value, to decide whether the key equals the key of some Entry object in the list, there is the condition ((k = e.key) == key || key.equals(k)); but there is also a second condition: the hash produced from the key by the hash function must equal the hash field of the current Entry object (that hash field stores the value produced by applying the hash method to that Entry's key).
This can be summed up as: HashMap has two requirements for deciding that a key already exists in the HashMap:
1. The key values are equal;
2. The hash codes are equal;
So when we define a class, if the equals() method is overridden but hashCode() does not guarantee that equal objects produce equal hash codes, then using instances of that class as HashMap keys causes the HashMap to "work abnormally" and produce results you do not want. Let's look at this "abnormal" scenario with an example:
Example: define a simple Employee class that overrides the equals method without overriding hashCode(). Then create two instances of this class and put them into a HashMap:
package com.hash;

/**
 * Simple Employee bean: overrides the equals method,
 * without overriding the hashCode() method
 * @author louluan
 */
public class Employee {
    private String employeeCode;
    private String name;

    public Employee(String employeeCode, String name) {
        this.employeeCode = employeeCode;
        this.name = name;
    }

    public String getEmployeeCode() {
        return employeeCode;
    }

    public String getName() {
        return name;
    }

    @Override
    public boolean equals(Object o) {
        if (o instanceof Employee) {
            Employee e = (Employee) o;
            if (this.employeeCode.equals(e.getEmployeeCode()) && name.equals(e.getName())) {
                return true;
            }
        }
        return false;
    }
}
package com.hash;

import java.util.HashMap;

public class Test {
    public static void main(String[] args) {
        Employee em1 = new Employee("123", "Anndy");
        Employee em2 = new Employee("123", "Anndy");
        boolean equals = em1.equals(em2);
        System.out.println("em1 equals em2? " + equals);

        HashMap map = new HashMap();
        map.put(em1, "test1");
        map.put(em2, "test2");
        System.out.println("map size: " + map.size());
    }
}
Run result:
em1 equals em2? true
map size: 2
Results Analysis:
In the example above, we used new Employee("123", "Anndy") to create two identical objects, em1 and em2; to us they are the same. We then put these two objects, which we consider equal, into the HashMap as keys, expecting this result: the HashMap should contain one Entry<Key,Value> pair, and that entry's value should be replaced from "test1" by "test2". But the actual result is: the size of the HashMap is 2, i.e. there are two Entry<Key,Value> pairs in the HashMap...
The reason is now clear: because the hashCode() of em1 and em2 is inherited from Object, they return two different values; that is, em1 and em2 have different hashCode values.
As you can see from the above example:
When we override the Object.equals(Object obj) method, we must ensure that equal objects return the same hashCode; otherwise the HashMap will behave unpredictably.
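For contrast, here is a corrected version that overrides hashCode() consistently with equals(); with this change the second put() replaces the first entry and the map size stays 1. (The class name FixedEmployee is mine, and java.util.Objects.hash is just one common way to build the hash code.)

```java
import java.util.HashMap;
import java.util.Objects;

public class FixedEmployee {
    private final String employeeCode;
    private final String name;

    public FixedEmployee(String employeeCode, String name) {
        this.employeeCode = employeeCode;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FixedEmployee)) return false;
        FixedEmployee e = (FixedEmployee) o;
        return employeeCode.equals(e.employeeCode) && name.equals(e.name);
    }

    @Override
    public int hashCode() {
        // equal objects now produce equal hash codes, as the contract requires
        return Objects.hash(employeeCode, name);
    }

    public static void main(String[] args) {
        HashMap<FixedEmployee, String> map = new HashMap<>();
        map.put(new FixedEmployee("123", "Anndy"), "test1");
        map.put(new FixedEmployee("123", "Anndy"), "test2");  // replaces "test1"
        System.out.println("map size: " + map.size());                   // prints map size: 1
        System.out.println(map.get(new FixedEmployee("123", "Anndy"))); // prints test2
    }
}
```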
2. The implementation of the get() method:
Fetching a value from the HashMap by a specific key is simpler:
get() method - fetches the value from the HashMap by key
A. Obtain the key's hashCode and use it to decide which bucket to search;
B. Traverse the Entry<Key,Value> list in that bucket to check whether an Entry<Key,Value> object with this key exists;
C1. If it exists, locate the corresponding Entry<Key,Value> and return its value;
C2. If it does not exist, return null;
The specific algorithm is as follows:
/**
 * Returns the value to which the specified key is mapped,
 * or {@code null} if this map contains no mapping for the key;
 * supports null keys.
 *
 * <p>More formally, if this map contains a mapping from a key
 * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
 * key.equals(k))}, then this method returns {@code v}; otherwise
 * it returns {@code null}. (There can be at most one such mapping.)
 *
 * <p>A return value of {@code null} does not <i>necessarily</i>
 * indicate that the map contains no mapping for the key; it is also
 * possible that the map explicitly maps the key to {@code null}.
 * The {@link #containsKey containsKey} operation may be used to
 * distinguish these two cases.
 *
 * @see #put(Object, Object)
 */
public V get(Object key) {
    if (key == null)
        return getForNullKey();
    int hash = hash(key.hashCode());
    // traverse the list of the bucket this hash maps to
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}
3. HashMap's support for null keys
HashMap allows access with a null key. It places the Entry<null,Value> with a null key into table[0], i.e. the first bucket; the put() and get() operations handle the null key specially:
/**
 * Offloaded version of get() to look up null keys. Null keys map
 * to index 0. This null case is split out into separate methods
 * for the sake of performance in the two most commonly used
 * operations (get and put), but incorporated with conditionals in
 * others.
 */
private V getForNullKey() {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null)
            return e.value;
    }
    return null;
}
/**
 * key is null: the Entry<null,Value> is placed in the first bucket, table[0]
 */
private V putForNullKey(V value) {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(0, null, value, 0);
    return null;
}
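The null-key behavior above can be observed from the outside: only one null key can exist at a time, and a second put with a null key replaces the first. A minimal sketch (the class name NullKeyDemo is mine, for illustration only):

```java
import java.util.HashMap;

public class NullKeyDemo {
    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>();
        map.put(null, "first");            // stored in the first bucket, table[0]
        map.put(null, "second");           // replaces the previous null-key entry
        System.out.println(map.get(null)); // prints second
        System.out.println(map.size());    // prints 1: at most one null key exists
    }
}
```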
4. Removing a key-value pair Entry<Key,Value> ---- the remove(key) method
Removing a key-value pair by key is also fairly simple; internally there are two key steps:
1. Locate the Entry<Key,Value> object in the HashMap using the key's hashCode and the key itself;
2. Since Entry<Key,Value> objects form a linked list, the deletion is a linked-list node removal;
/**
 * Removes the mapping for the specified key from this map if present.
 *
 * @param key key whose mapping is to be removed from the map
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V remove(Object key) {
    Entry<K,V> e = removeEntryForKey(key);
    return (e == null ? null : e.value);
}

/**
 * Removes and returns the entry associated with the specified key
 * in the HashMap. Returns null if the HashMap contains no mapping
 * for this key.
 */
final Entry<K,V> removeEntryForKey(Object key) {
    int hash = (key == null) ? 0 : hash(key.hashCode());
    int i = indexFor(hash, table.length);
    Entry<K,V> prev = table[i];
    Entry<K,V> e = prev;

    while (e != null) {
        Entry<K,V> next = e.next;
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k)))) {
            modCount++;
            size--;
            if (prev == e)
                table[i] = next;
            else
                prev.next = next;
            e.recordRemoval(this);
            return e;
        }
        prev = e;
        e = next;
    }

    return e;
}
3. Summary of HashMap's characteristics:
1. HashMap is not thread-safe. If you need thread safety, you can use Hashtable, which provides the same functionality as HashMap; HashMap can be seen as a lightweight (non-synchronized) variant of Hashtable;
2. It allows a null key, storing such pairs as <null,Value>;
3. HashMap lookups are very efficient, because the hash table lets it jump directly to the bucket where the key resides;
4. When using HashMap, pay attention to the relationship between capacity and load factor, which directly affects performance: a load factor that is too small improves lookup efficiency but consumes a lot of memory; a load factor that is too large saves space but reduces lookup efficiency.
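On point 1, besides Hashtable, a thread-safe view of a HashMap can also be obtained with Collections.synchronizedMap, which wraps every method call in synchronization (ConcurrentHashMap is another common choice; this is a supplement to the summary, not something the source code above covers):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SyncDemo {
    public static void main(String[] args) {
        // every call on the wrapper is synchronized on the wrapper object,
        // so the underlying HashMap is never accessed concurrently
        Map<String, Integer> safe = Collections.synchronizedMap(new HashMap<>());
        safe.put("a", 1);
        System.out.println(safe.get("a"));  // prints 1
    }
}
```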