In Java interviews, HashMap is practically a required topic. HashMap and HashSet are two important members of the Java Collections Framework: HashMap is a commonly used implementation of the Map interface, and HashSet is a commonly used implementation of the Set interface. Although the two classes implement different interface specifications, their underlying hash storage mechanism is exactly the same; HashSet is itself implemented using HashMap.
The hash storage mechanism is analyzed below through the source code of HashMap and HashSet.
In fact, HashSet and HashMap have a great deal in common. For HashSet, the system uses a hash algorithm to determine where each element of the set is stored, so that elements can be stored and retrieved quickly. For HashMap, the system treats each key-value pair as a single unit and always computes its storage location from the key using the hash algorithm, so that the key-value pairs of the Map can likewise be stored and retrieved quickly.
Before looking at how collections store their elements, one point needs to be made: although a collection is said to store Java objects, the objects themselves are not actually placed in the Set; the Set only holds references to them. In other words, a Java collection is really a collection of reference variables that point to the actual Java objects.
Collections and references
This is just like an array of a reference type: when we put a Java object into an array, we do not put the object itself into the array, only a reference to it. Each array element is a reference variable.
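As a small illustration of this point, the sketch below (the class and variable names are made up for this example) stores a StringBuilder in a list and then modifies it through the original variable. The list sees the change because it holds a reference to the object, not a copy of it.

```java
import java.util.ArrayList;
import java.util.List;

final class ReferenceDemo {
    // Returns true when the list element and the original variable
    // refer to the very same object.
    static boolean storesReference() {
        StringBuilder sb = new StringBuilder("Java");
        List<StringBuilder> list = new ArrayList<StringBuilder>();
        list.add(sb);            // only the reference is stored in the list
        sb.append(" HashMap");   // mutate the object through the original variable
        return list.get(0) == sb
                && list.get(0).toString().equals("Java HashMap");
    }

    public static void main(String[] args) {
        System.out.println(storesReference()); // true
    }
}
```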
Storage implementation of HashMap
When the program puts several key-value pairs into a HashMap, take the following code snippet as an example:
    HashMap<String, Double> map = new HashMap<String, Double>();
    map.put("Language", 80.0);
    map.put("mathematics", 89.0);
    // map.put( ... the rest of the snippet is truncated in the source
HashMap uses a so-called "Hash algorithm" to determine where each element is stored.
When the program executes map.put("Language", 80.0);, the system calls the hashCode() method of "Language" to obtain its hash code. Every Java object has a hashCode() method that returns its hash code. Once the system has the hash code of the key, it uses that value to determine where the element is stored.
We can look at the source code of the put(K key, V value) method of the HashMap class:
    public V put(K key, V value) {
        // If key is null, call putForNullKey to handle it
        if (key == null)
            return putForNullKey(value);
        // Compute the hash value from the hashCode of the key
        int hash = hash(key.hashCode());
        // Find the index in the table that corresponds to this hash value
        int i = indexFor(hash, table.length);
        // If the Entry at index i is not null, loop over the chain by
        // following the next element of e
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            // An existing key equal to the key being put was found
            // (same hash value, and equals returns true)
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        // If the Entry at index i is null, there is no Entry there yet
        modCount++;
        // Add key and value at index i
        addEntry(hash, key, value, i);
        return null;
    }
The code above uses an important internal interface, Map.Entry: each Map.Entry is in fact one key-value pair. As the code shows, when the system decides where to store a key-value pair in a HashMap, it does not consider the value in the Entry at all; it computes the storage location of each Entry from the key alone. This supports the earlier conclusion: the value of a Map can be regarded as an attachment to its key, and once the system has determined where the key is stored, the value is simply stored along with it.
The put method above relies on a helper that computes a hash code from the hashCode() return value: hash(), which is a purely arithmetic calculation:
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
For any given object, as long as its hashCode() return value is the same, the hash code computed by hash(int h) is always the same. The program then calls indexFor(int h, int length) to work out at which index of the table array the object should be stored. The code of indexFor(int h, int length) is as follows:
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }
This method is quite ingenious: it always obtains the storage location through h & (table.length - 1), and the length of HashMap's underlying array is always a power of 2 (2 to the n-th power), as the discussion of the HashMap constructor later on shows.
Because length is always a power of 2, h & (length - 1) works very neatly. Assuming h = 5 and length = 16, h & (length - 1) gives 5; if h = 6 and length = 16, it gives 6; and so on up to h = 15 and length = 16, which gives 15. When h = 16 and length = 16, h & (length - 1) gives 0; when h = 17 and length = 16, it gives 1; and so on. This ensures that the computed index always falls within the bounds of the table array.
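The values listed above can be checked with a few lines of Java. This is only a sketch: the class name IndexForDemo is made up, and the method body repeats the one-line indexFor expression quoted earlier.

```java
final class IndexForDemo {
    // Same expression as the indexFor method quoted above.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // must be a power of 2 for the mask to work
        for (int h : new int[] {5, 6, 15, 16, 17}) {
            System.out.println(h + " -> " + indexFor(h, length));
        }
        // For non-negative h and a power-of-2 length,
        // the mask gives the same result as h % length.
        System.out.println(indexFor(37, length) == 37 % length); // true
    }
}
```

The bit mask is used instead of the % operator because it is cheaper and, unlike %, never yields a negative index for a negative h.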
According to the source code of the put method above, when the program puts a key-value pair into a HashMap, it first determines the storage location of the Entry from the hashCode() return value of the key: if the keys of two Entry objects have the same hashCode() return value, they are stored in the same location. If the keys of these two Entry objects are equal according to equals, the value of the newly added Entry overwrites the value of the Entry already in the collection, but the key is not overwritten. If equals returns false for the two keys, the newly added Entry forms an Entry chain with the Entry already in the collection, and the new Entry sits at the head of the chain. The description of the addEntry() method that follows gives the details.
When a key-value pair is added to a HashMap, the hashCode() return value of its key determines where the pair (that is, the Entry object) is stored. When the keys of two Entry objects have the same hashCode() return value, the result of comparing the keys with equals() decides whether the value is overwritten (equals returns true) or an Entry chain is formed (equals returns false).
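Both outcomes can be observed with java.util.HashMap itself. The sketch below relies on the fact that the strings "Aa" and "BB" are different but have the same hashCode() value; the class name CollisionDemo and the helper build() are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

final class CollisionDemo {
    // Builds a map with two colliding keys, then puts one key a second time.
    static Map<String, Double> build() {
        Map<String, Double> map = new HashMap<String, Double>();
        map.put("Aa", 1.0);
        map.put("BB", 2.0); // same hash code, equals() is false: both entries are kept
        map.put("Aa", 3.0); // same hash code, equals() is true: the value is overwritten
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        Map<String, Double> map = build();
        System.out.println(map.size());    // 2
        System.out.println(map.get("Aa")); // 3.0
        System.out.println(map.get("BB")); // 2.0
    }
}
```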
The put method above also calls addEntry(hash, key, value, i);. addEntry is a method with package access in HashMap that is used only to add a key-value pair. Here is the code of the method:
    void addEntry(int hash, K key, V value, int bucketIndex) {
        // Get the Entry at the specified bucketIndex
        Entry<K,V> e = table[bucketIndex];   // ①
        // Place the newly created Entry at bucketIndex and let the new
        // Entry point to the original Entry
        table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
        // If the number of key-value pairs in the Map exceeds the limit
        if (size++ >= threshold)
            // double the length of the table
            resize(2 * table.length);   // ②
    }
The code of this method is simple, but it contains an elegant design: the system always places the newly added Entry object at the bucketIndex position of the table array. If there is already an Entry object at bucketIndex, the newly added Entry points to the original Entry (producing an Entry chain). If there is no Entry object at bucketIndex, that is, the variable e at the line marked ① is null, the newly added Entry points to null and no Entry chain is produced.
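The head-insertion behaviour can be reproduced in isolation. The following is a simplified sketch, not the JDK code: Node is a hypothetical stand-in for HashMap's Entry, and the bucket index is passed in directly instead of being computed from a hash.

```java
final class ChainDemo {
    // Hypothetical stand-in for HashMap's Entry: key, value, link to the next node.
    static final class Node {
        final String key;
        final double value;
        final Node next;

        Node(String key, double value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Mirrors the two central lines of addEntry: the new node becomes the
    // head of the bucket and points to whatever was there before (or null).
    static void addEntry(Node[] table, int bucketIndex, String key, double value) {
        Node e = table[bucketIndex];
        table[bucketIndex] = new Node(key, value, e);
    }

    // Lists the keys of one bucket from head to tail.
    static String keysInBucket(Node[] table, int bucketIndex) {
        StringBuilder sb = new StringBuilder();
        for (Node e = table[bucketIndex]; e != null; e = e.next) {
            if (sb.length() > 0) sb.append(" -> ");
            sb.append(e.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node[] table = new Node[4];
        addEntry(table, 2, "first", 1.0);
        addEntry(table, 2, "second", 2.0);
        addEntry(table, 2, "third", 3.0);
        System.out.println(keysInBucket(table, 2)); // third -> second -> first
    }
}
```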
JDK Source
In the JDK installation directory you can find a compressed file named src.zip that contains the source files of the Java base class library. Readers who are interested can open this file and read the source code of the Java class library, which is a great help in improving programming ability. Note that the source code in src.zip does not contain the explanatory comments shown above; they were added by the author.
Performance options for Hash algorithms
As the code above shows, when a bucket stores an Entry chain, the newly added Entry is always at the head of the chain in that bucket, and the Entry that was placed in the bucket first is at the end of the chain.
The code above also uses two more variables:
* size: the number of key-value pairs contained in the HashMap.
* threshold: the limit on the number of key-value pairs the HashMap can hold, equal to the capacity of the HashMap multiplied by the load factor.
As the line marked ② in the code above shows, when size++ >= threshold, HashMap automatically calls the resize method to expand its capacity. Each expansion doubles the capacity of the HashMap.
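To make the doubling rule concrete, the sketch below simulates only the size++ >= threshold check from addEntry, without storing anything. The class and method names are made up, and the threshold is recomputed as capacity * load factor after each doubling, in line with the constructor shown later in this article.

```java
final class ResizeDemo {
    // Returns the table capacity after adding the given number of entries,
    // starting from initialCapacity (assumed to be a power of 2).
    static int capacityAfter(int entries, int initialCapacity, float loadFactor) {
        int capacity = initialCapacity;
        int threshold = (int) (capacity * loadFactor);
        int size = 0;
        for (int n = 0; n < entries; n++) {
            if (size++ >= threshold) {      // same check as in addEntry
                capacity *= 2;              // resize(2 * table.length)
                threshold = (int) (capacity * loadFactor);
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(capacityAfter(12, 16, 0.75f)); // 16
        System.out.println(capacityAfter(13, 16, 0.75f)); // 32
        System.out.println(capacityAfter(25, 16, 0.75f)); // 64
    }
}
```

With the default capacity of 16 and load factor of 0.75 the threshold is 12, so in this simulation it is the 13th entry that triggers the first doubling.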
The table used in the code above is just an ordinary array. Every array has a fixed length, and the length of this array is the capacity of the HashMap. HashMap provides the following constructors:
* HashMap(): builds a HashMap with an initial capacity of 16 and a load factor of 0.75.
* HashMap(int initialCapacity): builds a HashMap with an initial capacity of initialCapacity and a load factor of 0.75.
* HashMap(int initialCapacity, float loadFactor): builds a HashMap with the specified initial capacity and the specified load factor.
When a HashMap is created, the system automatically creates a table array to hold the Entry objects of the HashMap. The following is the code of one of the HashMap constructors:
    public HashMap(int initialCapacity, float loadFactor) {
        // The initial capacity cannot be negative
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
        // If the initial capacity is greater than the maximum capacity, use the maximum capacity
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        // The load factor must be a number greater than 0
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
        // Find the smallest power of 2 that is not less than initialCapacity
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        this.loadFactor = loadFactor;
        // Set the capacity limit to capacity * load factor
        threshold = (int) (capacity * loadFactor);
        // Initialize the table array
        table = new Entry[capacity];   // ①
        init();
    }
The while loop in the code above is a concise way to find the smallest power of 2 that is not less than initialCapacity, which is then used as the actual capacity of the HashMap (held in the capacity variable). For example, given an initialCapacity of 10, the actual capacity of the HashMap is 16.
The line marked ① shows that table is in essence an array, an array of length capacity.
HashMap and its subclasses use a hash algorithm to determine where the elements of the collection are stored. When the system initializes a HashMap, it creates an Entry array of length capacity. Each position in this array that can hold elements is called a bucket; every bucket has its own index, and the system can quickly access the elements stored in a bucket through that index.
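The rounding loop is easy to try on its own. In this sketch the class and method names are made up; the loop body and the clamp to 2^30 follow the constructor code quoted above, where MAXIMUM_CAPACITY is assumed to be 1 << 30 as in the JDK source.

```java
final class CapacityDemo {
    // Assumed value of HashMap.MAXIMUM_CAPACITY.
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of 2 that is not less than initialCapacity.
    static int roundUpToPowerOfTwo(int initialCapacity) {
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY; // keeps the shift below from overflowing
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        return capacity;
    }

    // Capacity limit, computed as in the constructor.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(10)); // 16
        System.out.println(roundUpToPowerOfTwo(16)); // 16
        System.out.println(roundUpToPowerOfTwo(17)); // 32
        System.out.println(threshold(16, 0.75f));    // 12
    }
}
```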
Read implementation of HashMap
HashMap performs best when each bucket holds only a single Entry, that is, when no Entry chain has been formed through the next pointer. When the program retrieves a value by its key, the system first computes the hashCode() return value of the key, uses it to find the key's index in the table array, takes the Entry at that index, and finally returns the value that corresponds to the key. Here is the code of the get(Object key) method of the HashMap class:
    public V get(Object key) {
        // If key is null, call getForNullKey to fetch the corresponding value
        if (key == null)
            return getForNullKey();
        // Compute the hash code from the hashCode value of the key
        int hash = hash(key.hashCode());
        // Take the Entry at the corresponding index of the table array directly
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             // move on to the next Entry in the chain
             e = e.next)   // ①
        {
            Object k;
            // If the key of this Entry is the same as the key being searched for
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                return e.value;
        }
        return null;
    }
As the code above shows, if every bucket of the HashMap holds only one Entry, HashMap can fetch the Entry in a bucket directly by its index. When a hash collision occurs, a bucket holds not a single Entry but an Entry chain, and the system must walk the chain one Entry at a time until it finds the one it is looking for. If the Entry being searched for happens to be at the very end of the chain (the Entry that was placed in the bucket first), the system has to loop all the way to the end before it finds the element.
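The cost of walking a chain can be made visible by counting the entries visited. This is a simplified sketch rather than the JDK code: Node is a hypothetical stand-in for Entry, and the chain is built by hand in head-insertion order.

```java
final class LookupDemo {
    // Hypothetical stand-in for HashMap's Entry.
    static final class Node {
        final String key;
        final double value;
        final Node next;

        Node(String key, double value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Returns how many entries are visited before key is found,
    // or -1 if the chain does not contain the key.
    static int stepsToFind(Node head, String key) {
        int steps = 0;
        for (Node e = head; e != null; e = e.next) {
            steps++;
            if (key.equals(e.key))
                return steps;
        }
        return -1;
    }

    // Chain as head insertion would leave it: third -> second -> first.
    static Node sampleChain() {
        Node first = new Node("first", 1.0, null);
        Node second = new Node("second", 2.0, first);
        return new Node("third", 3.0, second);
    }

    public static void main(String[] args) {
        Node head = sampleChain();
        System.out.println(stepsToFind(head, "third"));   // 1
        System.out.println(stepsToFind(head, "first"));   // 3
        System.out.println(stepsToFind(head, "missing")); // -1
    }
}
```

The entry that was inserted first is the most expensive one to find, which is why long chains hurt lookup performance.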
To sum up briefly: at the lowest level, HashMap treats each key-value pair as a single unit, and that unit is an Entry object. HashMap uses an Entry[] array to hold all key-value pairs. When an Entry object needs to be stored, the hash algorithm determines its storage location; when an Entry needs to be retrieved, the same hash algorithm finds its storage location and the Entry is taken out directly. This is why HashMap can store and retrieve its Entry objects quickly, just as our mothers taught us in everyday life: keep different things in different places so that you can find them quickly when you need them.
When a HashMap is created, it has a load factor whose default value is 0.75. This value is a tradeoff between time and space costs. Increasing the load factor reduces the memory occupied by the hash table (that is, the Entry array) but increases the time cost of looking up data, and lookup is the most frequent operation (both the get() and the put() method of HashMap perform a lookup). Reducing the load factor improves lookup performance but increases the memory occupied by the hash table.
With this knowledge, we can adjust the load factor to suit actual needs when creating a HashMap. If the program cares more about space overhead and memory is tight, the load factor can be increased appropriately; if the program cares more about time overhead and memory is plentiful, the load factor can be reduced appropriately. In most cases, programmers do not need to change the load factor.
If you know from the start that the HashMap will hold many key-value pairs, you can give it a larger initial capacity when creating it. As long as the number of Entry objects in the HashMap never exceeds the limit (capacity * load factor), HashMap never needs to call resize() to reallocate the table array, which keeps performance good. Of course, setting the initial capacity too high wastes space (the system has to create an Entry array of length capacity), so the initial capacity also needs to be chosen with care when creating a HashMap.
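One way to apply this advice is to derive the initial capacity from the expected number of entries. The sketch below is only an illustration: the class name PresizeDemo and both helper methods are made up, and the formula simply inverts the limit capacity * load factor described above.

```java
import java.util.HashMap;
import java.util.Map;

final class PresizeDemo {
    // Smallest initial capacity for which expectedEntries does not exceed
    // capacity * loadFactor (HashMap then rounds it up to a power of 2).
    static int initialCapacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    // Creates a map sized for the expected number of entries.
    static Map<String, Double> newMapFor(int expectedEntries) {
        float loadFactor = 0.75f; // the default load factor
        return new HashMap<String, Double>(
                initialCapacityFor(expectedEntries, loadFactor), loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(initialCapacityFor(100, 0.75f)); // 134
        Map<String, Double> scores = newMapFor(100);
        scores.put("Language", 80.0);
        System.out.println(scores.get("Language")); // 80.0
    }
}
```

For 100 expected entries the requested capacity is 134, which the constructor rounds up to 256, giving a limit of 192 entries before any resize.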
Docmike
Links: http://www.imooc.com/article/19343
Source: MU-Class Network
HashMap of the Java interview