Java Collection framework 08 -- HashMap and source code analysis

Last Update:2016-04-20 Source: Internet

Author: User

1. Introduction to HashMap

First, let's take a look at the inheritance relationship of HashMap.

java.lang.Object
   ↳ java.util.AbstractMap<K,V>
         ↳ java.util.HashMap<K,V>

public class HashMap<K,V>
    extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable { }

We can see that HashMap extends AbstractMap and implements the Map, Cloneable, and Serializable interfaces, so a HashMap can be cloned and serialized. HashMap is not synchronized, but the static method synchronizedMap of the Collections class returns a thread-safe wrapper around it. That is:

Map map = Collections.synchronizedMap(new HashMap());
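As a rough sketch of how the wrapper might be used (the class name SynchronizedMapDemo and the thread counts are illustrative assumptions, not part of the original article): single calls such as put are synchronized by the wrapper, but iterating over the map still requires the caller to synchronize on it manually.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SynchronizedMapDemo {
    // Fills a synchronized map from several threads and returns the number of keys.
    static int fillConcurrently(int threads, int perThread) {
        Map<Integer, Integer> map = Collections.synchronizedMap(new HashMap<Integer, Integer>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put(base + i, i); // each call is synchronized by the wrapper
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        // Iteration spans many calls, so the caller must hold the map's lock.
        int count = 0;
        synchronized (map) {
            for (Integer ignored : map.keySet()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(fillConcurrently(4, 1000));
    }
}
```

Each thread writes a disjoint key range here, so the final key count equals threads x perThread.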

Next, let's take a look at the APIs of HashMap, and then analyze them in detail.

void                 clear()
Object               clone()
boolean              containsKey(Object key)
boolean              containsValue(Object value)
Set<Entry<K,V>>      entrySet()
V                    get(Object key)
boolean              isEmpty()
Set<K>               keySet()
V                    put(K key, V value)
void                 putAll(Map<? extends K, ? extends V> map)
V                    remove(Object key)
int                  size()
Collection<V>        values()
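To make the listing more concrete, here is a small tour of several of these methods. It is only an illustrative sketch; the class name HashMapApiTour and the sample data are assumptions chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;

class HashMapApiTour {
    // Exercises several HashMap methods and records each result in a string.
    static String tour() {
        Map<String, Integer> stock = new HashMap<>();
        StringBuilder log = new StringBuilder();
        log.append(stock.isEmpty()).append(' ');            // map starts empty
        stock.put("apple", 3);
        stock.put("pear", 5);
        log.append(stock.size()).append(' ');               // two mappings
        log.append(stock.get("apple")).append(' ');         // value for "apple"
        log.append(stock.containsKey("pear")).append(' ');
        log.append(stock.containsValue(7)).append(' ');
        log.append(stock.remove("pear")).append(' ');       // remove returns the old value
        int total = 0;
        for (Map.Entry<String, Integer> e : stock.entrySet()) {
            total += e.getValue();                           // iterate over key-value pairs
        }
        log.append(total).append(' ');
        stock.clear();
        log.append(stock.size());
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(tour()); // prints "true 2 3 true false 5 3 0"
    }
}
```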

2. Data Structure of HashMap

2.1 Storage Structure

The array in which HashMap stores its data is defined as follows; its elements are Entry objects:

transient Entry<K,V>[] table;

Under the hood, HashMap is implemented with an array combined with linked lists. Lookups are fast because the storage location is computed from a hash code. The hash value is calculated from the hashCode of the key, and that hash value selects the array slot in which the entry is stored. Keys with the same hashCode always produce the same hash value. As more objects are stored, different keys may end up with the same hash value or the same slot, which is called a hash conflict (collision). There are many ways to resolve hash conflicts; HashMap resolves them with linked lists. Let's take a look at its storage structure:

In the figure, the purple part represents the hash table, which is actually an array. Each element of the array is the head node of a singly linked list. The linked lists resolve hash conflicts: if different keys map to the same position of the array, they are placed in the same list. The following figure may better illustrate this from the code perspective:

Next, let's take a look at the source code of the Entry object class stored in the array.

2.2 Entry entity

/**
 * Entry is a node of a singly linked list: the list used by HashMap's chained storage.
 * It implements the Map.Entry interface, that is, the getKey(), getValue(),
 * setValue(V value), equals(Object o) and hashCode() methods.
 */
static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next; // points to the next node
    int hash;

    /**
     * Constructor that creates an Entry.
     * Parameters: hash value h, key k, value v and next node n.
     */
    Entry(int h, K k, V v, Entry<K,V> n) {
        value = v;
        next = n;
        key = k;
        hash = h;
    }

    public final K getKey() {
        return key;
    }

    public final V getValue() {
        return value;
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    // Two entries are equal only if both their keys and their values are equal
    public final boolean equals(Object o) {
        if (!(o instanceof Map.Entry))
            return false;
        Map.Entry e = (Map.Entry) o;
        Object k1 = getKey();
        Object k2 = e.getKey();
        if (k1 == k2 || (k1 != null && k1.equals(k2))) {
            Object v1 = getValue();
            Object v2 = e.getValue();
            if (v1 == v2 || (v1 != null && v1.equals(v2)))
                return true;
        }
        return false;
    }

    // Implements hashCode
    public final int hashCode() {
        return Objects.hashCode(getKey()) ^ Objects.hashCode(getValue());
    }

    public final String toString() {
        return getKey() + "=" + getValue();
    }

    /**
     * Called when put(k, v) overwrites the value of a key k
     * that is already in the HashMap.
     * No processing is done here.
     */
    void recordAccess(HashMap<K,V> m) {
    }

    /**
     * Called when an Entry is removed from the HashMap.
     * No processing is done here.
     */
    void recordRemoval(HashMap<K,V> m) {
    }
}

From the source code of Entry, we can see that HashMap is essentially an array of Entry objects. Each Entry holds a key and a value, and its next field is another Entry that is used to handle hash conflicts by forming a linked list. With this picture of HashMap in mind, we can now analyze its source code in detail.
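The array-of-linked-lists structure can be illustrated with a deliberately tiny map. This is a simplified sketch, not HashMap's actual code: the names TinyChainedMap and Node are hypothetical, keys are assumed to be non-null strings, the table has a fixed length of 16, and there is no resizing. The strings "Aa" and "BB" have the same hashCode (2112), so they land in the same bucket.

```java
class TinyChainedMap {
    // One node of a bucket's singly linked list (plays the role of Entry).
    static final class Node {
        final String key;
        int value;
        Node next;

        Node(String key, int value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node[] table = new Node[16]; // fixed length, a power of 2

    private int indexOf(String key) {
        return key.hashCode() & (table.length - 1);
    }

    void put(String key, int value) {
        int i = indexOf(key);
        for (Node n = table[i]; n != null; n = n.next) {
            if (n.key.equals(key)) {   // key already present: overwrite
                n.value = value;
                return;
            }
        }
        table[i] = new Node(key, value, table[i]); // new node becomes the head
    }

    Integer get(String key) {
        for (Node n = table[indexOf(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }

    // Length of the linked list in the bucket that the key maps to.
    int chainLength(String key) {
        int length = 0;
        for (Node n = table[indexOf(key)]; n != null; n = n.next) {
            length++;
        }
        return length;
    }

    public static void main(String[] args) {
        TinyChainedMap map = new TinyChainedMap();
        map.put("Aa", 1);
        map.put("BB", 2); // same hashCode as "Aa", so same bucket
        System.out.println(map.get("Aa") + " " + map.get("BB") + " " + map.chainLength("Aa"));
    }
}
```

Both keys remain retrievable even though they share a bucket, because lookups compare the keys along the chain.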

3. HashMap source code analysis

In earlier chapters, the source code analysis pasted all of the source code at once and put the analysis into the comments. That is a lot to take in, and a few hundred lines of source code are hard to follow. Starting with this chapter, the source code is split into sections that are analyzed one at a time, which is easier to read.

3.1 member attributes

Let's take a look at several key attributes of HashMap:

// The default initial capacity is 16; it must be a power of 2
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

// Maximum capacity (must be a power of 2 no larger than 1 << 30;
// a larger requested capacity is replaced by this value)
static final int MAXIMUM_CAPACITY = 1 << 30;

// Default load factor. The load factor is how full the hash table
// may get before its capacity is automatically increased
static final float DEFAULT_LOAD_FACTOR = 0.75f;

// The default empty Entry array
static final Entry<?,?>[] EMPTY_TABLE = {};

// The array that stores the entries. Its length is always a power of 2.
// HashMap uses separate chaining, and each Entry is in effect
// the head of a singly linked list
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

// Size of the HashMap, that is, the number of key-value pairs stored in it
transient int size;

// Threshold used to decide whether the capacity of the HashMap must be adjusted
int threshold;

// The actual load factor
final float loadFactor;

// Number of times the HashMap has been modified; used by the fail-fast mechanism
transient int modCount;
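The role of modCount can be observed from the outside. The sketch below (the class name FailFastDemo is an assumption for illustration) structurally modifies a map while an iterator over it is in use; the iterator notices that the modification count changed and throws ConcurrentModificationException.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class FailFastDemo {
    // Returns true if a modification made during iteration is detected.
    static boolean modificationIsDetected() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        Iterator<String> it = map.keySet().iterator();
        it.next();           // the iterator was created with the current modCount
        map.put("d", 4);     // adding a new key is a structural modification
        try {
            it.next();       // the iterator sees that modCount has changed
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(modificationIsDetected());
    }
}
```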

Let's take a look at the loadFactor attribute. loadFactor indicates how full the hash table is allowed to get.

If the load factor is large, more elements are stored before the table grows, so space utilization is higher, but the chance of conflict increases. The more conflicts there are, the longer the linked lists become and the slower lookups get.

If the load factor is small, fewer elements are stored before the table grows, so space utilization is lower and the table is sparser, but the chance of conflict decreases, the linked lists stay short, and lookups are faster.

An example makes this more concrete. Suppose the array capacity is 100 and the load factor is 0.8, so the table is resized only once it holds 80 entries. While those entries are being added, many keys may map to the same slot and end up in the same linked list (the table cannot be resized before 80 entries are stored), so many lists can become long. In other words, keys colliding in the same slot is likely to happen well before the table reaches 80 entries.

If the load factor is 0.1 instead, the table is resized as soon as it holds 10 entries. That point is reached quickly, and the probability that two of only 10 entries collide is much lower than in the previous case. After resizing, the slot of every entry is recomputed against the new length, so entries that collided before are likely to be spread apart. The linked lists stay short, often just a single head node, but space utilization is very low because most of the array is still empty whenever a resize starts.

Therefore, we need to strike a balance between reducing conflicts and using space efficiently, which is the classic time-space trade-off in data structures. If memory is plentiful and you want faster lookups, set the load factor to a smaller value. If memory is tight and lookup speed is not critical, set it to a larger value. In general, we use the default value 0.75.
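The trade-off can be made concrete with a little arithmetic. The helper below is only a sketch (the class and method names are assumptions); it applies threshold = capacity x load factor to show how many entries a table of a given capacity holds before it is resized.

```java
class LoadFactorDemo {
    // Number of entries at which a table of the given capacity is resized,
    // following threshold = capacity * loadFactor.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.25f)); // 4: resizes early, sparse table
        System.out.println(threshold(16, 0.75f)); // 12: the default compromise
        System.out.println(threshold(16, 1.0f));  // 16: dense table, longer chains
    }
}
```

With the default capacity of 16 and the default load factor of 0.75, the table is resized once it holds 12 entries.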

3.2 Constructor

Let's take a look at several construction methods of HashMap:

// Constructor with initial capacity and load factor
public HashMap(int initialCapacity, float loadFactor) {
    // Make sure the arguments are valid
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);

    this.loadFactor = loadFactor;
    // Store the initial capacity in threshold. This is not the real threshold yet;
    // it is used to size the table later, at which point threshold is recalculated
    threshold = initialCapacity;
    init(); // empty hook method for subclasses
}

// Constructor with initial capacity; the load factor is set to the default value
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

// Constructor that uses the default initial capacity and load factor
public HashMap() {
    this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}

// Constructs a new HashMap with the same mappings as the given Map
public HashMap(Map<? extends K, ? extends V> m) {
    this(Math.max((int) (m.size() / DEFAULT_LOAD_FACTOR) + 1,
                  DEFAULT_INITIAL_CAPACITY), DEFAULT_LOAD_FACTOR);
    inflateTable(threshold);

    putAllForCreate(m);
}

We can see that if we specify the initial capacity and the load factor when constructing a HashMap, the first constructor is called with our values; the other constructors delegate to it and fill in the defaults. The default initial capacity is 16 and the default load factor is 0.75.
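The following sketch exercises the four constructors and the argument checks described above. The class name ConstructorDemo and the chosen capacities are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

class ConstructorDemo {
    // Returns true if HashMap rejects the given constructor arguments.
    static boolean rejects(int initialCapacity, float loadFactor) {
        try {
            new HashMap<String, Integer>(initialCapacity, loadFactor);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    // Builds maps with each constructor and returns the total number of entries.
    static int totalSize() {
        Map<String, Integer> source = new HashMap<>();         // defaults: 16 and 0.75
        source.put("a", 1);
        source.put("b", 2);
        Map<String, Integer> sized = new HashMap<>(64);         // capacity only
        Map<String, Integer> tuned = new HashMap<>(64, 0.5f);   // capacity and load factor
        Map<String, Integer> copy = new HashMap<>(source);      // copies the mappings
        return sized.size() + tuned.size() + copy.size();
    }

    public static void main(String[] args) {
        System.out.println(rejects(-1, 0.75f));     // negative capacity
        System.out.println(rejects(16, Float.NaN)); // NaN load factor
        System.out.println(totalSize());
    }
}
```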
3.3 Access Method

The access part focuses on the put and get methods, because these two methods are also the most commonly used. Other access methods are analyzed in the code. First, let's take a look at how data is stored in HashMap and see the put method:

public V put(K key, V value) {
    if (table == EMPTY_TABLE) { // the hash table has not been initialized yet
        inflateTable(threshold); // allocate the table, using the threshold set in the constructor (in fact, the initial capacity)
    }
    // If key == null, the value is stored at table[0].
    // There is only ever one such value; a new value overwrites the old one
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key); // calculate the hash value
    int i = indexFor(hash, table.length); // find the index of this hash in the table
    // Walk the list at table[i]. If a mapping for the key already exists, replace the old value
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue; // and return the old value
        }
    }

    modCount++;
    // The key was not found at table[i], so add a new Entry to the list at this position
    addEntry(hash, key, value, i);
    return null;
}

Next we will analyze what the put method has done:

First, put checks whether the table is still the empty table. If it is, the table has not been initialized yet, so inflateTable(threshold) is called to initialize it. The method is as follows:

// Inflate (allocate) the table
private void inflateTable(int toSize) {
    // Find a power of 2 >= toSize
    int capacity = roundUpToPowerOf2(toSize); // smallest power of 2 not less than toSize

    // Recalculate the threshold: threshold = capacity * loadFactor
    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    table = new Entry[capacity]; // allocate the table with this capacity
    initHashSeedAsNeeded(capacity);
}

// Round the requested capacity up to a power of 2
private static int roundUpToPowerOf2(int number) {
    // assert number >= 0 : "number must be non-negative";
    return number >= MAXIMUM_CAPACITY
            ? MAXIMUM_CAPACITY // cap at the maximum capacity
            // otherwise use the smallest power of 2 not less than number
            : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
}

In inflateTable, the array capacity is determined first. The capacity is always a power of 2 (the reason is analyzed below), so roundUpToPowerOf2 is called to round the requested capacity up to the nearest power of 2. The threshold is then recalculated as threshold = capacity x load factor, and finally the table is allocated. The table is therefore not initialized in the constructor shown first, which simply stores the requested capacity in threshold; it is initialized the first time an entry is put into the HashMap.
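The rounding step is easy to try in isolation. The sketch below copies the expression from roundUpToPowerOf2 into a standalone class (PowerOfTwoDemo and thresholdFor are names assumed for the example) and applies the threshold formula from inflateTable to the result.

```java
class PowerOfTwoDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same expression as in the listing: smallest power of 2 >= number, capped at the maximum.
    static int roundUpToPowerOf2(int number) {
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    // Threshold that inflateTable would compute for a requested size and load factor.
    static int thresholdFor(int toSize, float loadFactor) {
        int capacity = roundUpToPowerOf2(toSize);
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOf2(16));   // 16 is already a power of 2
        System.out.println(roundUpToPowerOf2(17));   // rounded up to 32
        System.out.println(roundUpToPowerOf2(100));  // rounded up to 128
        System.out.println(thresholdFor(17, 0.75f)); // 32 * 0.75 = 24
    }
}
```

So new HashMap(17) does not allocate 17 buckets: on the first put, the table is created with 32 buckets and a threshold of 24.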

After the table is initialized, data can be stored in it. The table stores Entry objects, while the put method receives a key and a value. Therefore, we need to do the following:

1. Locate the location to be saved in the table array;

2. encapsulate the key and value into the Entry and store them.

Let's go back to the put method and look at the first step. The storage location depends on the key: the hash value is computed from the key, and the position in the table is determined from the hash value. When the key is null, put calls the putForNullKey method, which is implemented as follows:

// Store an Entry whose key is null
private V putForNullKey(V value) {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    // No Entry with a null key was found in table[0]
    addEntry(0, null, value, 0); // for a null key, the hash value is 0
    return null;
}

As the method shows, the hash value of a null key is 0, so the lookup starts at table[0] and walks the list looking for an Entry whose key is null. If one exists, its value is replaced with the new value and the old value is returned. If there is none, addEntry is called to wrap the null key and the value in an Entry and place it in table[0]. The addEntry method is as follows:
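The null-key behaviour is visible through the public API. A minimal sketch (the class name NullKeyDemo is assumed for illustration):

```java
import java.util.HashMap;
import java.util.Map;

class NullKeyDemo {
    static String demo() {
        Map<String, String> map = new HashMap<>();
        String first = map.put(null, "a");   // no previous mapping, so put returns null
        String second = map.put(null, "b");  // replaces "a" and returns it
        return first + "," + second + "," + map.get(null) + "," + map.size();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "null,a,b,1"
    }
}
```

There is never more than one mapping for the null key: the second put overwrites the first, so the size stays at 1.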

// Add an Entry to the HashMap
void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length); // double the table length
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }

    createEntry(hash, key, value, bucketIndex);
}

// Create an Entry
void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex]; // first, save the Entry currently in the table
    // Create a new Entry in the table and hang the original Entry off its next field
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    // So each position in the table always holds the most recently added Entry,
    // and the other entries are linked behind it one after another
    size++;
}

In addEntry, the first parameter is the hash value, the two in the middle are the key and value, and the last is the index of the bucket in the table. Before inserting, addEntry checks whether the table must grow: if size has reached the threshold and the target bucket is already occupied, the table length is doubled, and the hash value and bucket index are recomputed for the new length with the hash and indexFor methods. If no resize is needed, the Entry is created directly at the given index. The two methods are as follows:

final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }

    h ^= k.hashCode();

    // Pre-process the hash value so that a poorly distributed hashCode
    // does not leave parts of the table unused
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

// This method is interesting: it is the reason why the capacity must be a power of 2
static int indexFor(int h, int length) {
    // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
    return h & (length-1);
}
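To see what the pre-processing step buys, the sketch below reproduces the two shift-and-XOR lines in a standalone method. It assumes hashSeed is 0 and ignores the special String path; the names HashSpreadDemo and spread are hypothetical. The hash codes 0x10000 and 0x20000 differ only in their high bits, so without spreading both would fall into bucket 0 of a 16-slot table.

```java
class HashSpreadDemo {
    // The supplemental hash from the listing, assuming hashSeed == 0.
    static int spread(int hashCode) {
        int h = hashCode;
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000;
        int b = 0x20000;
        // Raw hash codes: only the low 4 bits matter, and they are all 0 for both values.
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16));
        // After spreading, the high bits influence the low bits, so the buckets differ.
        System.out.println(indexFor(spread(a), 16) + " " + indexFor(spread(b), 16));
    }
}
```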

The indexFor method does only one thing: h & (length-1). What does this actually do, and why does it require the capacity to be a power of 2? The detailed analysis is as follows:

First, when length is a power of 2, h & (length-1) is equivalent to h % length for non-negative h, but h % length is less efficient (Hashtable uses the modulo form). Why are the two equivalent? If length is a power of 2, its binary form is 100...00 (a 1 followed by zeros), so length-1 is 011...11. For any h smaller than length, ANDing with 011...11 yields h itself. For h = length, the result is 0. For h greater than length, the AND discards the high bits, which is the same as h - j * length for some integer j, that is, h % length. This is why the capacity must be a power of 2: it lets the index be computed with a single, cheap bit operation.

Second, if length is a power of 2 (greater than 1), it is an even number, so length-1 is odd and its lowest bit is 1. The lowest bit of h & (length-1) can then be either 0 or 1 depending on h, so the result can be odd or even, which lets the entries spread evenly over the table array. If length were odd, length-1 would be even and its lowest bit would be 0, so the lowest bit of h & (length-1) would always be 0 and the result could only be even. Every hash value would then map to an even index of the array, wasting nearly half of the space! Therefore, length is chosen to be an integer power of 2, which keeps the probability of collision between different hash values small and lets the elements be distributed evenly in the hash table.
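Both claims can be checked by brute force. The sketch below (IndexForDemo and its helper names are assumptions for the example) compares the mask against the modulo for non-negative hash values, and counts how many buckets are reachable for a power-of-2 length versus an odd length.

```java
class IndexForDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // True if h & (length-1) equals h % length for every h in [0, limit).
    static boolean maskEqualsModulo(int length, int limit) {
        for (int h = 0; h < limit; h++) {
            if (indexFor(h, length) != h % length) {
                return false;
            }
        }
        return true;
    }

    // Number of distinct bucket indexes produced for h in [0, limit).
    static int reachableBuckets(int length, int limit) {
        boolean[] seen = new boolean[length];
        int count = 0;
        for (int h = 0; h < limit; h++) {
            int i = indexFor(h, length);
            if (!seen[i]) {
                seen[i] = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(maskEqualsModulo(16, 1000)); // power of 2: mask and modulo agree
        System.out.println(maskEqualsModulo(15, 1000)); // not a power of 2: they differ
        System.out.println(reachableBuckets(16, 1000)); // all 16 buckets are used
        System.out.println(reachableBuckets(15, 1000)); // only the even indexes are used
    }
}
```

With length 15 the mask is 14 (binary 1110), so only the 8 even indexes 0, 2, ..., 14 can ever be produced.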

Back in addEntry, createEntry is then called to create an Entry at the right position of the table array. The next field of the new Entry is set to the Entry that was previously at that position, so the old Entry hangs off the new one. Every time a new Entry lands on the same position, the previous head is attached behind it, and the entries form a linked list. The table itself always holds the most recently added Entry of each bucket; there is no separate linked-list object, just entries connected one after another through next.
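The head-insertion step can be sketched without any hashing at all. In the hypothetical HeadInsertionDemo below, each new node takes the previous head as its next, which mirrors the single assignment in createEntry; walking the chain then visits the entries from newest to oldest.

```java
class HeadInsertionDemo {
    // Minimal chain node: a key and a link to the node that was the head before it.
    static final class Node {
        final String key;
        final Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Inserts the keys one by one at the head and returns the resulting chain order.
    static String chainAfterInserting(String... keys) {
        Node head = null;
        for (String key : keys) {
            head = new Node(key, head); // the new node points at the old head
        }
        StringBuilder order = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            if (order.length() > 0) {
                order.append(" -> ");
            }
            order.append(n.key);
        }
        return order.toString();
    }

    public static void main(String[] args) {
        System.out.println(chainAfterInserting("a", "b", "c")); // prints "c -> b -> a"
    }
}
```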

Now let's go back to the put method. We have just analyzed the case key == null; the rest works the same way. The hash value is computed, the position in the table is located, and the list there is searched for an Entry with the same key. If one is found, the old value is replaced with the new value. If not, a new Entry holding the key and value is placed in the table and linked to the Entry that was there before. The process is exactly the same as in the analysis above; the only difference is that key != null. I will not go into details here.
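From the caller's side, the two branches of put can be told apart by the return value. A small sketch (the class name PutReturnDemo is an assumption):

```java
import java.util.HashMap;
import java.util.Map;

class PutReturnDemo {
    static String demo() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("k", 1);   // new key: a new Entry is added, returns null
        Integer second = map.put("k", 2);  // existing key: value replaced, returns the old value
        return first + "," + second + "," + map.get("k") + "," + map.size();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "null,1,2,1"
    }
}
```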

After analyzing the put method, it should be easy to understand the get method. Next let's take a look at the get Method for reading data in HashMap:

public V get(Object key) {
    if (key == null)
        return getForNullKey(); // when key == null, look in table[0]
    Entry<K,V> entry = getEntry(key); // key != null -> getEntry

    return null == entry ? null : entry.getValue();
}

private V getForNullKey() {
    if (size == 0) {
        return null;
    }
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null)
            return e.value; // return the value stored for key == null in table[0]
    }
    return null;
}

final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }

    // Mirror of the put operation: compute the hash, then search the bucket
    int hash = (key == null) ? 0 : hash(key);
    for (Entry<K,V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
        Object k;
        // If the hash values are equal and the keys are equal,
        // this Entry is the one we are looking for
        if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}
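One consequence of get returning null when no Entry is found is that a null result is ambiguous: the key may be absent, or it may be mapped to null. The sketch below (class name GetDemo assumed for illustration) uses containsKey to tell the two cases apart.

```java
import java.util.HashMap;
import java.util.Map;

class GetDemo {
    static String demo() {
        Map<String, String> map = new HashMap<>();
        map.put("present", null); // a key explicitly mapped to null
        return map.get("present") + "," + map.get("absent") + ","
                + map.containsKey("present") + "," + map.containsKey("absent");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "null,null,true,false"
    }
}
```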

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
