HashMap and HashSet source code analysis: the hash storage mechanism

Source: Internet
Author: User

Collections and references

It works just like an array of a reference type: when we put a Java object into an array, we do not actually put the object itself into the array. We only put a reference to the object into the array, and each array element is a reference variable.

In fact, HashSet and HashMap have a lot in common. For HashSet, the system uses a hash algorithm to determine where each collection element is stored, so that elements can be stored and retrieved quickly. For HashMap, the system treats each key-value pair as a whole and always computes the storage location of the pair with a hash algorithm, so that the key-value pairs of the Map can be stored and retrieved quickly.

Before introducing how collections store their elements, it is important to point out that although a collection is said to store Java objects, it does not actually put the Java objects into the Set; it only keeps references to those objects in the Set. In other words, a Java collection is really a collection of reference variables that point to the actual Java objects.
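The point about references can be checked with a few lines of code. The following is a minimal sketch, not part of the original article; the class name ReferenceDemo and its two helper methods are made up for illustration. It stores a StringBuilder in an array and in a HashSet, then changes the object through the original variable and reads it back through the collection.

```java
import java.util.HashSet;
import java.util.Set;

class ReferenceDemo {
    /** Returns true when the array slot and the variable refer to the same object. */
    static boolean arrayHoldsSameObject() {
        StringBuilder sb = new StringBuilder("abc");
        StringBuilder[] array = { sb };
        return array[0] == sb;
    }

    /** Changes an object through the original variable and reads it back through a set. */
    static String readThroughSet() {
        StringBuilder sb = new StringBuilder("abc");
        Set<StringBuilder> set = new HashSet<StringBuilder>();
        set.add(sb);          // only the reference is stored, not a copy
        sb.append("123");     // modify the object via the original variable
        return set.iterator().next().toString();
    }

    public static void main(String[] args) {
        System.out.println(arrayHoldsSameObject());
        System.out.println(readThroughSet());
    }
}
```

StringBuilder is used here because it does not override hashCode(), so changing its content does not affect where the set looks for it.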

Storage implementation of HashMap

When a program tries to put several key-value pairs into a HashMap, take the following code snippet as an example:

HashMap<String, Double> map = new HashMap<String, Double>();
map.put("language", 80.0);
map.put("mathematics", 89.0);
map.put("English", 78.2);

HashMap uses a so-called "Hash algorithm" to determine where each element is stored.

When the program executes map.put("language", 80.0);, the system calls the hashCode() method of "language" to get its hashCode value. Every Java object has a hashCode() method through which its hashCode value can be obtained. Once it has the object's hashCode value, the system uses that value to decide where the element is stored.

We can look at the source code of the put(K key, V value) method of the HashMap class:

public V put(K key, V value)
{
    // If key is null, call putForNullKey to handle it
    if (key == null)
        return putForNullKey(value);
    // Compute the hash value from the key's hashCode
    int hash = hash(key.hashCode());
    // Look up the index in the table for that hash value
    int i = indexFor(hash, table.length);
    // If the Entry at index i is not null, walk the chain through e.next
    for (Entry<K,V> e = table[i]; e != null; e = e.next)
    {
        Object k;
        // Found a key equal to the one being put
        // (same hash value, and == or equals() returns true)
        if (e.hash == hash && ((k = e.key) == key
            || key.equals(k)))
        {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    // No Entry with an equal key exists at index i
    modCount++;
    // Add key and value at index i
    addEntry(hash, key, value, i);
    return null;
}
JDK Source Code

In the JDK installation directory you can find a compressed file named src.zip that contains all the source files of the Java base class library. Interested readers can open this file and read the source code of the Java class library, which is very helpful for improving programming skills. Note that the source code in src.zip does not contain the explanatory comments shown above; they were added by the author.

The program above uses an important internal interface: Map.Entry. Each Map.Entry is simply a key-value pair. As the program shows, when the system decides where to store a key-value pair in the HashMap, it does not consider the value in the Entry at all; it computes and determines the storage location of each Entry from the key alone. This also supports the earlier conclusion: we can regard the value of a Map as an attachment of the key, and once the system has decided where the key is stored, the value is stored there with it.

The put method above relies on a method that computes a hash code from the hashCode() return value: hash(). It is a purely mathematical calculation, implemented as follows:

static int hash(int h)
{
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

For any given object, as long as its hashCode() return value is the same, the hash code computed by the hash(int h) method is always the same. The program then calls the indexFor(int h, int length) method to work out at which index of the table array the object should be stored. The code of the indexFor(int h, int length) method is as follows:

static int indexFor(int h, int length)
{
    return h & (length - 1);
}

This method is very ingenious: it always obtains the storage position of the object through h & (table.length - 1), and the length of HashMap's underlying array is always a power of 2, as can be seen in the discussion of the HashMap constructor later on.

When length is always a power of 2, h & (length - 1) is a very clever design. Suppose h=5 and length=16: then h & (length - 1) yields 5. If h=6 and length=16, it yields 6, and so on. If h=15 and length=16, it yields 15. But when h=16 and length=16, it yields 0, and when h=17 and length=16, it yields 1. This guarantees that the computed index always falls within the bounds of the table array.
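The arithmetic above can be reproduced directly. The sketch below copies the expression from the indexFor listing into a made-up class, IndexForDemo, and prints the index for the sample values next to h % length. For a non-negative h and a power-of-2 length the two agree; unlike %, the bit mask also gives a non-negative index when h is negative.

```java
class IndexForDemo {
    // Same expression as in the indexFor listing above
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;   // must be a power of 2 for the mask to work
        int[] hashes = { 5, 6, 15, 16, 17 };
        for (int h : hashes) {
            System.out.println("h=" + h + " -> index " + indexFor(h, length)
                    + " (h % length = " + (h % length) + ")");
        }
        // A negative hash value still maps to a valid index
        System.out.println("h=-1 -> index " + indexFor(-1, length));
    }
}
```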

According to the source code of the put method above, when the program tries to put a key-value pair into a HashMap, it first determines the storage location of the Entry from the hashCode() return value of its key: if the keys of two Entry objects have the same hashCode() return value, they are stored at the same location. If the keys of these two Entry objects compare as equal through equals(), the value of the newly added Entry overwrites the value of the Entry already in the collection, but the key is not overwritten. If the keys compare as not equal through equals(), the newly added Entry forms an Entry chain with the Entry already in the collection, and the newly added Entry sits at the head of the chain. For details, see the description of the addEntry() method below.

When a key-value pair is added to a HashMap, the hashCode() return value of its key determines where the pair (that is, the Entry object) is stored. When the keys of two Entry objects have the same hashCode() return value, comparing the keys with equals() decides whether the new Entry overwrites the old one (equals() returns true) or an Entry chain is formed (equals() returns false).

The program above also calls addEntry(hash, key, value, i);. addEntry is a method of HashMap with package access that is used only to add a key-value pair. Here is the code of the method:

void addEntry(int hash, K key, V value, int bucketIndex)
{
    // Get the Entry at the specified bucketIndex
    Entry<K,V> e = table[bucketIndex];    // ①
    // Put the newly created Entry at bucketIndex and
    // let the new Entry point to the original Entry
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    // If the number of key-value pairs in the Map has reached the threshold
    if (size++ >= threshold)
        // Expand the table to twice its length
        resize(2 * table.length);    // ②
}

The code of this method is simple, but it contains an elegant design: the system always places the newly added Entry object at the bucketIndex position of the table array. If an Entry object already exists at bucketIndex, the newly added Entry points to the original Entry (producing an Entry chain). If there is no Entry object at bucketIndex, that is, the variable e at the line marked ① is null, the newly added Entry points to null and no Entry chain is produced.
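To see the head insertion in action, here is a deliberately simplified sketch of a chained hash table. It is not the JDK class: TinyChainedMap, Node and bucketKeys are names invented for this illustration, the table never resizes, null keys are not supported, and the extra hash() mixing step is left out. It only mirrors the two behaviors described above, overwriting the value of an equal key and linking a new node in front of the existing chain.

```java
class TinyChainedMap<K, V> {
    /** One key-value pair plus a link to the next pair in the same bucket. */
    private static final class Node<K, V> {
        final K key;
        V value;
        final Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node<K, V>[] table;

    /** capacity is assumed to be a power of 2; this sketch never resizes. */
    @SuppressWarnings("unchecked")
    TinyChainedMap(int capacity) {
        table = (Node<K, V>[]) new Node[capacity];
    }

    private int indexFor(Object key) {
        return key.hashCode() & (table.length - 1);
    }

    V put(K key, V value) {
        int i = indexFor(key);
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                V old = e.value;
                e.value = value;   // equal key: overwrite the value, keep the key
                return old;
            }
        }
        // No equal key found: the new node becomes the head of the chain
        table[i] = new Node<K, V>(key, value, table[i]);
        return null;
    }

    V get(Object key) {
        for (Node<K, V> e = table[indexFor(key)]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    /** Keys stored in bucket i, listed from the head of the chain. */
    String bucketKeys(int i) {
        StringBuilder sb = new StringBuilder();
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (sb.length() > 0) {
                sb.append(" -> ");
            }
            sb.append(e.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TinyChainedMap<Integer, String> map = new TinyChainedMap<Integer, String>(16);
        map.put(1, "a");     // 1, 17 and 33 all map to index 1 when length is 16
        map.put(17, "b");
        map.put(33, "c");
        System.out.println(map.bucketKeys(1));
        System.out.println(map.put(17, "B"));
        System.out.println(map.get(17));
    }
}
```

The keys 1, 17 and 33 are chosen because an Integer's hashCode() is its own value, so with a table of length 16 they all land in bucket 1 and the chain lists the most recently added key first.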

Performance options for Hash algorithms

As the code above shows, when several Entry objects are chained in the same bucket, the most recently added Entry is always at the head of the bucket, and the Entry that was placed in the bucket first is at the end of the chain.

There are two more variables in the above program:

    • size: this variable holds the number of key-value pairs contained in the HashMap.
    • threshold: this variable holds the limit on the number of key-value pairs the HashMap can hold before it is expanded; its value equals the capacity of the HashMap multiplied by the load factor.

From the line marked ② in the program above, it can be seen that when size++ >= threshold, HashMap automatically calls the resize method to expand its capacity. Each expansion doubles the capacity of the HashMap.
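The doubling rule can be traced with a small simulation. The sketch below does not touch a real HashMap; ResizeSimulation and capacityAfter are hypothetical names, and the loop only replays the condition size++ >= threshold from the addEntry listing, assuming that every added key is new and ignoring the maximum-capacity handling of the real resize method.

```java
class ResizeSimulation {
    /**
     * Capacity after adding n distinct keys, replaying the rule from the
     * addEntry listing: if (size++ >= threshold) resize(2 * table.length).
     */
    static int capacityAfter(int n, int initialCapacity, float loadFactor) {
        int capacity = initialCapacity;
        int threshold = (int) (capacity * loadFactor);
        int size = 0;
        for (int i = 0; i < n; i++) {
            if (size++ >= threshold) {
                capacity *= 2;                               // double the table
                threshold = (int) (capacity * loadFactor);   // recompute the limit
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        int[] counts = { 12, 13, 24, 25 };
        for (int n : counts) {
            System.out.println(n + " entries -> capacity " + capacityAfter(n, 16, 0.75f));
        }
    }
}
```

With the default capacity of 16 and load factor of 0.75 the threshold is 12, so in this model the table stays at 16 slots for 12 entries and doubles to 32 when the 13th entry is added.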

The table used in the program above is just an ordinary array. Every array has a fixed length, and the length of this array is the capacity of the HashMap. HashMap provides the following constructors:

    • HashMap(): constructs a HashMap with an initial capacity of 16 and a load factor of 0.75.
    • HashMap(int initialCapacity): constructs a HashMap with an initial capacity of initialCapacity and a load factor of 0.75.
    • HashMap(int initialCapacity, float loadFactor): creates a HashMap with the specified initial capacity and the specified load factor.

When a HashMap is created, the system automatically creates a table array to hold the Entry objects of the HashMap. The following is the code of one of the HashMap constructors:

// Create a HashMap with the specified initial capacity and load factor
public HashMap(int initialCapacity, float loadFactor)
{
    // The initial capacity cannot be negative
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
            initialCapacity);
    // If the initial capacity is greater than the maximum capacity, use the maximum capacity
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    // The load factor must be greater than 0
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
            loadFactor);
    // Find the smallest power of 2 that is greater than or equal to initialCapacity
    int capacity = 1;
    while (capacity < initialCapacity)
        capacity <<= 1;
    this.loadFactor = loadFactor;
    // The threshold equals capacity * load factor
    threshold = (int)(capacity * loadFactor);
    // Initialize the table array
    table = new Entry[capacity];    // ①
    init();
}

The while loop in the code above is a concise implementation: it finds the smallest power of 2 that is greater than or equal to initialCapacity and uses it as the actual capacity of the HashMap (held in the capacity variable). For example, given an initialCapacity of 10, the actual capacity of the HashMap is 16.
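The rounding loop can be tried on its own. In the sketch below the loop from the constructor is wrapped in a helper named roundUpToPowerOfTwo inside a class named CapacityRounding; both names are invented for this example and do not exist in the JDK.

```java
class CapacityRounding {
    /**
     * Smallest power of 2 that is greater than or equal to initialCapacity.
     * Assumes initialCapacity <= 1 << 30, which the constructor guarantees
     * by clamping to MAXIMUM_CAPACITY first.
     */
    static int roundUpToPowerOfTwo(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity) {
            capacity <<= 1;
        }
        return capacity;
    }

    public static void main(String[] args) {
        int[] requested = { 1, 10, 16, 17, 100 };
        for (int n : requested) {
            System.out.println("initialCapacity " + n + " -> capacity " + roundUpToPowerOfTwo(n));
        }
    }
}
```

A request of 16 stays at 16 because the loop stops as soon as capacity is no longer smaller than initialCapacity, while 17 is rounded up to 32.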

initialCapacity and the capacity of the hash table

The initialCapacity specified when a HashMap is created is not necessarily equal to the actual capacity of the HashMap. In general the actual capacity is greater than initialCapacity, unless the initialCapacity we specify is exactly a power of 2. Of course, once you understand how HashMap allocates its capacity, you should pass a power of 2 as initialCapacity when creating a HashMap, which saves the system some computation.

As the line marked ① in the program shows, table is essentially an array, an array of length capacity.

HashMap and its subclasses use a hash algorithm to determine where the elements of the collection are stored. When the system initializes a HashMap, it creates an Entry array of length capacity. Each position of this array where an element can be stored is called a "bucket"; every bucket has its own index, and the system can quickly access the elements stored in a bucket through that index.

At any time, each "bucket" of a HashMap stores only one element (that is, one Entry). Because an Entry object can contain a reference variable (the last parameter of the Entry constructor) that points to the next Entry, the following situation is possible: a bucket of the HashMap holds only one Entry, but that Entry points to another Entry, which forms an Entry chain, as shown in Figure 1:

Figure 1. Storage schematic of HashMap

Read implementation of HashMap

HashMap performs best when each bucket holds just a single Entry, that is, when no Entry chain has been formed through the next reference. When the program retrieves the value for a key, the system first computes the hashCode() return value of the key, uses it to find the index of the key in the table array, takes the Entry at that index, and finally returns the value that corresponds to the key. Look at the code of the get(Object key) method of the HashMap class:

public V get(Object key)
{
    // If key is null, call getForNullKey to fetch the corresponding value
    if (key == null)
        return getForNullKey();
    // Compute the hash code from the key's hashCode value
    int hash = hash(key.hashCode());
    // Take the Entry at the computed index in the table array
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
        e != null;
        // Move on to the next Entry in the chain
        e = e.next)         // ①
    {
        Object k;
        // If the key of this Entry equals the key being searched for
        if (e.hash == hash && ((k = e.key) == key
            || key.equals(k)))
            return e.value;
    }
    return null;
}

As the code above shows, if every bucket of the HashMap holds only one Entry, HashMap can fetch the Entry in a bucket quickly by index. When a "hash collision" occurs, a single bucket holds not one Entry but an Entry chain, and the system has to traverse the entries one by one until it finds the one it is looking for. If the Entry being searched for happens to be at the very end of the chain (it was the first one placed in the bucket), the system has to loop all the way to the end before it finds the element.
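A collision is easy to provoke with java.util.HashMap itself. The strings "Aa" and "BB" have the same hashCode() value (2112), so they always fall into the same bucket, yet both can still be looked up because the search compares keys with equals(). The class CollisionDemo and its build method below are illustrative names only.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    /** Builds a map whose two keys share one hashCode() value. */
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<String, Integer>();
        map.put("Aa", 1);
        map.put("BB", 2);
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = build();
        System.out.println("Aa".hashCode() == "BB".hashCode());
        System.out.println(map.get("Aa"));
        System.out.println(map.get("BB"));
    }
}
```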

To sum up briefly, HashMap treats each key-value pair as a whole internally, and that whole is an Entry object. HashMap uses an Entry[] array to hold all key-value pairs. When an Entry needs to be stored, the hash algorithm determines its storage location; when an Entry needs to be retrieved, the same hash algorithm finds its storage location and the Entry is taken out directly. This is why HashMap can store and retrieve its entries so quickly. It is just what our mothers taught us as children: put different things in different places, and you can find them quickly when you need them.

When a HashMap is created, it has a default load factor of 0.75, which is a trade-off between time and space costs. Increasing the load factor reduces the memory used by the hash table (that is, the Entry array) but increases the time cost of lookups, and lookup is the most frequent operation (both the get() and put() methods of HashMap perform a lookup). Decreasing the load factor improves lookup performance but increases the memory used by the hash table.

With this knowledge, we can adjust the load factor to actual needs when creating a HashMap. If the program cares more about space overhead and memory is tight, the load factor can be increased appropriately; if the program cares more about time overhead and memory is plentiful, the load factor can be decreased appropriately. Usually, programmers do not need to change the load factor.

If you know from the start that a HashMap will hold many key-value pairs, you can give it a larger initial capacity at creation time. If the number of Entry objects in the HashMap never exceeds the threshold (capacity * load factor), HashMap never needs to call the resize() method to reallocate the table array, which keeps performance good. Of course, setting the initial capacity too high at the start wastes space (the system has to create an Entry array of length capacity), so the initial capacity also deserves some care when creating a HashMap.
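One way to pick an initial capacity is to work backwards from the expected number of entries. The sketch below is only a suggestion under that assumption: initialCapacityFor is a hypothetical helper, not a JDK method, and it simply divides the expected entry count by the load factor and rounds up, so that the count does not exceed capacity * load factor once HashMap has rounded the capacity up to a power of 2.

```java
import java.util.HashMap;
import java.util.Map;

class PresizeDemo {
    /** Hypothetical helper: smallest initialCapacity whose threshold covers expectedEntries. */
    static int initialCapacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    public static void main(String[] args) {
        int expected = 1000;
        int initialCapacity = initialCapacityFor(expected, 0.75f);
        System.out.println("initialCapacity for " + expected + " entries: " + initialCapacity);

        // Create the map with room for all expected entries up front
        Map<Integer, String> map = new HashMap<Integer, String>(initialCapacity);
        for (int i = 0; i < expected; i++) {
            map.put(i, "value " + i);
        }
        System.out.println("entries stored: " + map.size());
    }
}
```

For 1000 entries and a load factor of 0.75 the helper returns 1334. Passing a larger value than necessary only costs memory, as the paragraph above points out.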

Implementation of HashSet

HashSet is implemented on top of HashMap: internally it uses a HashMap to hold all of its elements, so the implementation of HashSet is fairly simple. Looking at the source code of HashSet, you can see the following:

public class HashSet<E>
    extends AbstractSet<E>
    implements Set<E>, Cloneable, java.io.Serializable
{
    // Use the keys of a HashMap to hold all elements of the HashSet
    private transient HashMap<E,Object> map;
    // A dummy Object used as the value for every key in the HashMap
    private static final Object PRESENT = new Object();
    ...
    // Initialize the HashSet; internally this initializes a HashMap
    public HashSet()
    {
        map = new HashMap<E,Object>();
    }
    // Create a HashSet with the specified initialCapacity and loadFactor;
    // this actually creates a HashMap with the corresponding parameters
    public HashSet(int initialCapacity, float loadFactor)
    {
        map = new HashMap<E,Object>(initialCapacity, loadFactor);
    }
    public HashSet(int initialCapacity)
    {
        map = new HashMap<E,Object>(initialCapacity);
    }
    HashSet(int initialCapacity, float loadFactor, boolean dummy)
    {
        map = new LinkedHashMap<E,Object>(initialCapacity, loadFactor);
    }
    // Call the map's keySet to return all keys
    public Iterator<E> iterator()
    {
        return map.keySet().iterator();
    }
    // Call HashMap's size() to return the number of Entry objects,
    // which is the number of elements in the Set
    public int size()
    {
        return map.size();
    }
    // Call HashMap's isEmpty() to decide whether the HashSet is empty;
    // when the HashMap is empty, the HashSet is empty too
    public boolean isEmpty()
    {
        return map.isEmpty();
    }
    // Call HashMap's containsKey to decide whether the given key is present;
    // all elements of the HashSet are held as keys of the HashMap
    public boolean contains(Object o)
    {
        return map.containsKey(o);
    }
    // Put the given element into the HashSet, that is, put it into the HashMap as a key
    public boolean add(E e)
    {
        return map.put(e, PRESENT) == null;
    }
    // Call HashMap's remove to delete the given Entry,
    // which deletes the corresponding element of the HashSet
    public boolean remove(Object o)
    {
        return map.remove(o) == PRESENT;
    }
    // Call the map's clear to remove all Entry objects,
    // which removes all elements of the HashSet
    public void clear()
    {
        map.clear();
    }
    ...
}

As the source code above shows, the implementation of HashSet really is simple: it just wraps a HashMap object that stores all the collection elements. Every element put into the HashSet is actually held as a key of the HashMap, while the value stored for each key is PRESENT, a static final Object.

Most of the methods of HashSet are implemented by calling methods of HashMap, so HashSet and HashMap are essentially the same in terms of implementation.
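The delegation idea fits in a few lines. The following is a stripped-down sketch written for this explanation, not the JDK class: TinyHashSet is an invented name, and it keeps only four methods that forward to a java.util.HashMap in the same way as the listing above.

```java
import java.util.HashMap;

class TinyHashSet<E> {
    // Dummy value stored for every key, as in the HashSet listing above
    private static final Object PRESENT = new Object();

    private final HashMap<E, Object> map = new HashMap<E, Object>();

    /** Returns true only if the element was not present before. */
    boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    boolean contains(Object o) {
        return map.containsKey(o);
    }

    /** Returns true only if the element was present and has been removed. */
    boolean remove(Object o) {
        return map.remove(o) == PRESENT;
    }

    int size() {
        return map.size();
    }

    public static void main(String[] args) {
        TinyHashSet<String> set = new TinyHashSet<String>();
        System.out.println(set.add("abc"));       // first add
        System.out.println(set.add("abc"));       // second add of an equal element
        System.out.println(set.size());
        System.out.println(set.contains("abc"));
    }
}
```

Because put() returns the previous value for an existing key, add() can report whether the element was new simply by checking that return value against null.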

HashMap's put and HashSet's add

Because the add() method of HashSet actually delegates to the put() method of HashMap when adding an element, the following holds: when the key of a newly put Entry equals the key of an existing Entry in the HashMap (their hashCode() return values are equal and equals() returns true), the value of the new Entry overwrites the value of the existing Entry, but the key is left unchanged. Therefore, if you add an element that already exists to a HashSet, the newly added element (which the underlying HashMap would store as a key) does not overwrite the existing element.
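This behavior can be observed with java.util.HashSet by adding two String objects that are equal but not identical. The sketch below uses an invented class name, AddExistingDemo, and checks by reference comparison which of the two objects the set still holds afterwards.

```java
import java.util.HashSet;
import java.util.Set;

class AddExistingDemo {
    /** Returns true if the set keeps the first of two equal objects. */
    static boolean keepsFirstObject() {
        String first = new String("abc");
        String second = new String("abc");   // equal to first, but a different object
        Set<String> set = new HashSet<String>();
        boolean addedFirst = set.add(first);
        boolean addedSecond = set.add(second);   // rejected: an equal element exists
        return addedFirst && !addedSecond && set.iterator().next() == first;
    }

    public static void main(String[] args) {
        System.out.println(keepsFirstObject());
    }
}
```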

With this theory in hand, take a look at an example program and test whether you have really understood how the HashMap and HashSet collections work.

import java.util.HashSet;
import java.util.Set;

class Name
{
    private String first;
    private String last;

    public Name(String first, String last)
    {
        this.first = first;
        this.last = last;
    }

    // equals() is overridden here, but hashCode() is not
    public boolean equals(Object o)
    {
        if (this == o)
        {
            return true;
        }
        if (o != null && o.getClass() == Name.class)
        {
            Name n = (Name) o;
            return n.first.equals(first)
                && n.last.equals(last);
        }
        return false;
    }
}

public class HashSetTest
{
    public static void main(String[] args)
    {
        Set<Name> s = new HashSet<Name>();
        s.add(new Name("abc", "123"));
        System.out.println(
            s.contains(new Name("abc", "123")));
    }
}

After adding a new Name("abc", "123") object to the HashSet, the program above immediately checks whether the HashSet contains a new Name("abc", "123") object. At first glance it is easy to assume that the program will print true.

Actually running the program prints false. The reason is that HashSet considers two objects equal only if equals() returns true and, in addition, their hashCode() return values are equal. The program above does not override the hashCode() method of the Name class, so the two Name objects use the default hashCode() inherited from Object, which generally returns different values for different objects. HashSet therefore treats them as two distinct objects, and the program prints false.

Thus, when we want to use objects of a class as HashMap keys or store objects of the class in a HashSet, it is important to override both the equals(Object obj) method and the hashCode() method of that class. The return values of the two methods must also be consistent: when two objects compare as equal through equals(), their hashCode() return values must be the same. In general, every key attribute that takes part in computing the hashCode() return value should also be used as a criterion in the equals() comparison.
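As a sketch of that rule, the class below overrides both methods and bases them on the same two fields. It is a variation written for this explanation rather than the author's listing: FullName and FullNameTest are invented names, and Objects.hash (available since Java 7) is just one convenient way to combine the fields.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class FullName {
    private final String first;
    private final String last;

    FullName(String first, String last) {
        this.first = first;
        this.last = last;
    }

    // equals() compares both fields
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || o.getClass() != FullName.class) {
            return false;
        }
        FullName n = (FullName) o;
        return first.equals(n.first) && last.equals(n.last);
    }

    // hashCode() is computed from the same two fields that equals() compares
    @Override
    public int hashCode() {
        return Objects.hash(first, last);
    }
}

class FullNameTest {
    /** Adds one object and looks it up through an equal copy. */
    static boolean containsEqualCopy() {
        Set<FullName> set = new HashSet<FullName>();
        set.add(new FullName("abc", "123"));
        return set.contains(new FullName("abc", "123"));
    }

    public static void main(String[] args) {
        System.out.println(containsEqualCopy());
    }
}
```

With hashCode() overridden consistently, the equal copy lands in the same bucket as the stored object, so the lookup that failed for a class without hashCode() now succeeds.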

hashCode() and equals()

For details on how to correctly override the hashCode() method and the equals() method of a class, refer to the "Crazy Java Handout" book in the Crazy Java series.

The following program correctly overrides the hashCode() and equals() methods of the Name class:

import java.util.HashSet;

class Name
{
    private String first;
    private String last;

    public Name(String first, String last)
    {
        this.first = first;
        this.last = last;
    }

    // Decide whether two Name objects are equal based on first
    public boolean equals(Object o)
    {
        if (this == o)
        {
            return true;
        }
        if (o != null && o.getClass() == Name.class)
        {
            Name n = (Name) o;
            return n.first.equals(first);
        }
        return false;
    }

    // Compute the hashCode() value of a Name object based on first
    public int hashCode()
    {
        return first.hashCode();
    }

    public String toString()
    {
        return "Name[first=" + first + ", last=" + last + "]";
    }
}

public class HashSetTest2
{
    public static void main(String[] args)
    {
        HashSet<Name> set = new HashSet<Name>();
        set.add(new Name("abc", "123"));
        set.add(new Name("abc", "456"));
        System.out.println(set);    // ①
    }
}

The program above provides a Name class that overrides the equals() and hashCode() methods, both of which are based on the first instance variable of the Name class. When the first instance variables of two Name objects are equal, the hashCode() return values of the two objects are the same, and comparing them through equals() also returns true.

The main method of the program first adds a Name object whose first is "abc" to the HashSet, and then tries to add another Name object whose first is also "abc". Clearly the new Name object cannot be added to the HashSet at this point: because its first is also "abc", HashSet decides that the new Name object is the same as the existing one and therefore does not add it. When the program prints the set at the line marked ①, you will see that the collection contains only one Name object, the first one, whose last is "123".

Original address: http://www.ibm.com/developerworks/cn/java/j-lo-hash/
