Java Collections: HashMap Source Analysis

Source: Internet
Author: User
Tags: modulus

I. Overview of HashMap
II. Data structure of HashMap
III. HashMap source analysis
1. Key attributes
2. Construction methods
3. Storing data
4. Resizing
5. Reading data
6. HashMap performance parameters
7. Fail-fast mechanism

I. Overview of HashMap

HashMap is a hash-table-based implementation of the Map interface. It provides all of the optional map operations and permits null values and a null key. (HashMap is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees about the order of the map; in particular, it does not guarantee that the order will remain constant over time.

It is important to note that HashMap is not thread-safe. If you need a thread-safe map, you can obtain a synchronized wrapper through the static Collections.synchronizedMap method:

Map map = Collections.synchronizedMap(new HashMap());
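A minimal sketch of that wrapping (the class and method names here are illustrative, not from the HashMap source). Note that even with the synchronized wrapper, iteration over the map still requires manual locking on the wrapper object:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SyncMapDemo {
    // Build a synchronized view of a plain HashMap
    static Map<String, Integer> makeSyncMap() {
        return Collections.synchronizedMap(new HashMap<String, Integer>());
    }

    public static void main(String[] args) {
        Map<String, Integer> map = makeSyncMap();
        map.put("a", 1);
        // Single calls like put/get are synchronized by the wrapper,
        // but compound operations and iteration must lock manually
        synchronized (map) {
            for (Map.Entry<String, Integer> e : map.entrySet()) {
                System.out.println(e.getKey() + "=" + e.getValue());
            }
        }
    }
}
```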

II. Data structure of HashMap

The bottom layer of HashMap is based on an array and linked lists; its fast lookups come from computing the storage location from the key's hash code. HashMap derives a hash value from the key's hashCode(), and equal hash codes always yield the same hash value. As more objects are stored, different objects may end up with the same hash value; this is the so-called hash collision. Anyone who has studied data structures knows there are many ways to resolve hash collisions; at the bottom layer, HashMap resolves them with linked lists (separate chaining).
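Chaining can be observed from the outside with a key type whose hashCode() is deliberately constant (a contrived sketch; BadKey is a hypothetical name I introduce here). Both keys land in the same bucket, yet both survive as separate entries in that bucket's list:

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    // A hypothetical key type whose hashCode is constant on purpose,
    // so every instance maps to the same bucket and must be chained
    static final class BadKey {
        final String name;
        BadKey(String name) { this.name = name; }
        @Override public int hashCode() { return 42; }   // all keys collide
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        map.put(new BadKey("a"), 1);
        map.put(new BadKey("b"), 2);   // same hash, different key: chained, not overwritten
        assert map.size() == 2;
        assert map.get(new BadKey("a")) == 1;
        assert map.get(new BadKey("b")) == 2;
    }
}
```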

In the figure, the purple part represents the hash table, also called the hash array. Each element of the array is the head node of a singly linked list, which is used to resolve collisions: if different keys map to the same position in the array, they are placed in that position's linked list.

Let's look at the code for the Entry class in HashMap:

    /**
     * Entry is a node of a singly linked list; it is how HashMap implements chaining.
     * It implements the Map.Entry interface: getKey(), getValue(), setValue(V value),
     * equals(Object o) and hashCode().
     */
    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        // Points to the next node
        Entry<K,V> next;
        final int hash;

        // Constructor. Parameters: hash value (h), key (k), value (v), next node (n)
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }

        public final K getKey() {
            return key;
        }

        public final V getValue() {
            return value;
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        // Two entries are equal if both their keys and their values are equal
        public final boolean equals(Object o) {
            if (!(o instanceof Map.Entry))
                return false;
            Map.Entry e = (Map.Entry) o;
            Object k1 = getKey();
            Object k2 = e.getKey();
            if (k1 == k2 || (k1 != null && k1.equals(k2))) {
                Object v1 = getValue();
                Object v2 = e.getValue();
                if (v1 == v2 || (v1 != null && v1.equals(v2)))
                    return true;
            }
            return false;
        }

        public final int hashCode() {
            return (key == null ? 0 : key.hashCode()) ^
                   (value == null ? 0 : value.hashCode());
        }

        public final String toString() {
            return getKey() + "=" + getValue();
        }

        // Called when an element is accessed via put(); does nothing here
        void recordAccess(HashMap<K,V> m) {}

        // Called when an element is removed from the HashMap; does nothing here
        void recordRemoval(HashMap<K,V> m) {}
    }

HashMap is in fact an array of Entry objects. Each Entry holds a key and a value; its next field, also an Entry, is used to handle hash collisions by forming a linked list.

III. HashMap source analysis

1. Key attributes

First look at some of the key properties in the HashMap class:

transient Entry[] table;   // the array of entries that stores the elements
transient int size;        // the number of key-value pairs stored
int threshold;             // resize threshold: when size exceeds it, the table is resized; threshold = capacity * load factor
final float loadFactor;    // the load factor
transient int modCount;    // the number of structural modifications

The loadFactor field (load factor) measures how full the hash table is allowed to become.

The larger the load factor, the more elements are packed in. The advantage is high space utilization; the drawback is that the chance of collisions increases, the linked lists grow longer, and lookup efficiency drops.

Conversely, the smaller the load factor, the fewer elements are stored before resizing. The advantage is a reduced chance of collisions; the drawback is wasted space: the data in the table becomes sparse, with much of the array unused before expansion begins.

The greater the chance of collisions, the higher the cost of a lookup.

It is therefore necessary to find a balance between "chance of collision" and "space utilization". This trade-off is essentially an instance of the well-known time-space trade-off in data structures.

If machine memory is plentiful and you want faster queries, you can set the load factor a bit smaller; if memory is tight and query speed is not critical, you can set it a bit larger. In general, though, there is no need to set it: the default value of 0.75 works well.

2. Construction methods

Here's a look at the constructors of HashMap:

public HashMap(int initialCapacity, float loadFactor) {
    // Validate the arguments
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);

    // Find a power of 2 >= initialCapacity
    int capacity = 1;
    while (capacity < initialCapacity)   // ensures capacity is the smallest power of 2 >= initialCapacity
        capacity <<= 1;

    this.loadFactor = loadFactor;
    threshold = (int) (capacity * loadFactor);
    table = new Entry[capacity];
    init();
}

public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    threshold = (int) (DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
    table = new Entry[DEFAULT_INITIAL_CAPACITY];
    init();
}

As we can see, when we construct a HashMap, the first constructor is called if we specify both the load factor and the initial capacity; otherwise the defaults are used. The default initial capacity is 16 and the default load factor is 0.75. The while loop in the first constructor ensures that the capacity is a power of two: the smallest power of two greater than or equal to initialCapacity. As for why the capacity must be a power of two, we will see shortly.
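The capacity-rounding loop can be lifted out into a standalone sketch (the class and method names here are mine, not the JDK's), which makes its behavior easy to verify:

```java
public class CapacityDemo {
    // Reproduces the constructor's loop as a standalone method:
    // find the smallest power of two >= initialCapacity
    static int roundUpToPowerOfTwo(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        return capacity;
    }

    public static void main(String[] args) {
        assert roundUpToPowerOfTwo(1) == 1;
        assert roundUpToPowerOfTwo(13) == 16;   // 13 rounds up to 16
        assert roundUpToPowerOfTwo(16) == 16;   // already a power of two
        assert roundUpToPowerOfTwo(17) == 32;
    }
}
```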

Let's focus on the two most important methods of HashMap: put and get.

3. Storing data

Here's how HashMap stores data. First, look at the put method:

public V put(K key, V value) {
    // If the key is null, the key-value pair is added to table[0]
    if (key == null)
        return putForNullKey(value);
    // Otherwise compute the key's hash value, then insert into the matching bucket's list
    int hash = hash(key.hashCode());
    // Find the index in the table for this hash value
    int i = indexFor(hash, table.length);
    // Walk the bucket's list; if an entry with the same key already exists,
    // replace the old value with the new one and return the old value
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    // Structural modification count +1
    modCount++;
    // Add the key-value pair at table[i]
    addEntry(hash, key, value, i);
    return null;
}

The program above uses an important internal interface, Map.Entry; each Map.Entry is a key-value pair. It also shows that when HashMap stores a key-value pair, it never considers the value in the Entry: it computes and decides each Entry's storage location based only on the key. This confirms the earlier conclusion: the values of a Map can be regarded as attachments of the keys, and once the system has decided where a key is stored, the value is simply stored alongside it.

Let's walk through this function slowly. The first two lines handle the case where the key is null; let's look at the putForNullKey(value) method:

private V putForNullKey(V value) {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null) {   // if an entry with a null key exists, overwrite its value
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(0, null, value, 0);   // a null key always has hash value 0
    return null;
}

Note: if the key is null, its hash value is 0, and the entry is stored at index 0 of the array, i.e. table[0].
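This null-key behavior is visible from the public API; here is a small sketch (class name illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        // null is a legal key; it always lands in table[0]
        map.put(null, "first");
        assert map.get(null).equals("first");
        // Putting null again overwrites: put returns the old value
        String old = map.put(null, "second");
        assert old.equals("first");
        assert map.size() == 1;   // there is only ever one entry for the null key
    }
}
```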

Now let's return to the put method. It computes the hash value from the key's hashCode(); here is the function that computes it:

// Compute the hash value from the key's hashCode
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

After the hash value is obtained, it is used to compute the index at which the entry should be stored in the array; the index function is as follows:

static int indexFor(int h, int length) {
    // Compute the bucket index from the hash value and the table length.
    // Using h & (length - 1) is no accident: because length is a power of 2,
    // the result is guaranteed to fall within the array bounds.
    return h & (length - 1);
}

This deserves attention. When hashing into a table, we naturally think of taking the hash value modulo the table length (division hashing), and that is indeed what Hashtable does. That method basically ensures a uniform spread of elements across the table, but the modulo requires a division operation, which is slow. HashMap instead uses h & (length - 1): it achieves the same uniform distribution but is much faster. This is one of HashMap's improvements over Hashtable.
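The equivalence of the bitmask and the modulus for power-of-two lengths can be checked directly (a standalone sketch, not the JDK source; names are mine):

```java
public class IndexForDemo {
    // The bucket-index computation, as a standalone sketch
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;   // must be a power of two
        for (int h = 0; h < 1000; h++) {
            // For power-of-two lengths the mask equals the remainder
            assert indexFor(h, length) == h % length;
        }
        assert indexFor(37, 16) == 5;   // 37 % 16 == 5
    }
}
```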

Next, let's analyze why the capacity of the hash table must be a power of two. First, when length is a power of two, h & (length - 1) is equivalent to taking the hash modulo length, so the distribution stays uniform while the computation is faster. Second, a power of two is even, so length - 1 is odd and its lowest bit is 1. The lowest bit of h & (length - 1) may then be 0 or 1 (depending on h), so the result can be either even or odd, preserving the uniformity of the hash. If length were odd, length - 1 would be even with a lowest bit of 0, so the lowest bit of h & (length - 1) would always be 0: every hash value would land on an even index, wasting nearly half the array. Taking length as a power of two therefore keeps the collision probability of different hash values small and lets the elements hash uniformly across the table.

This looks simple, but it is actually rather subtle. Let's illustrate with an example:

Suppose the array length is 15 or 16, and the (optimized) hash codes are 8 and 9. The results of the & operation are as follows:

       h & (table.length - 1)     hash      table.length - 1     result
       8 & (15-1):                0100   &       1110          =  0100
       9 & (15-1):                0101   &       1110          =  0100
       ----------------------------------------------------------------
       8 & (16-1):                0100   &       1111          =  0100
       9 & (16-1):                0101   &       1111          =  0101

As the example shows, when 8 and 9 are ANDed with 15 - 1 (1110), they produce the same result: both are placed at the same array position, causing a collision. 8 and 9 then sit in the same bucket as a linked list, and a lookup must traverse that list to find 8 or 9, reducing query efficiency. We can also see that when the array length is 15, ANDing any hash value with 1110 always yields a lowest bit of 0, so positions 0001, 0011, 0101, 0111, 1001, 1011 and 1101 can never hold elements. The waste of space is considerable, and worse, far fewer positions are usable than the array length, which further increases the chance of collisions and slows lookups. When the array length is 16, i.e. a power of two, every bit of 2^n - 1 is 1, so the low bits of h & (length - 1) equal the low bits of h. Combined with the further mixing that the hash(int h) method applies to the key's hashCode (which folds in the high bits), only keys whose final hash values are equal end up at the same array position, chained in one list.

So when the array length is a power of two, the probability that different keys compute the same index is smaller, the data is distributed more uniformly across the array (i.e., collisions are rarer), and lookups rarely have to traverse a list, so query efficiency is higher.
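The even-index-only effect of a non-power-of-two length can also be verified directly (a small sketch with illustrative names):

```java
public class OddLengthDemo {
    public static void main(String[] args) {
        // With length 15, length - 1 = 14 = 1110b: the lowest bit of
        // h & 14 is always 0, so only even bucket indices are ever used
        for (int h = 0; h < 1000; h++) {
            assert (h & 14) % 2 == 0;
        }
        // With length 16, length - 1 = 15 = 1111b: both even and odd
        // indices occur, matching the table above
        assert (8 & 15) == 8;
        assert (9 & 15) == 9;
    }
}
```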

   

According to the source of the put method above, when the program tries to put a key-value pair into a HashMap, it first decides the Entry's storage location from the return value of the key's hashCode(): if two Entry keys have the same hashCode() return value, they are stored at the same location. If those two keys also return true from equals(), the new Entry's value overwrites the existing Entry's value, but the key is not replaced. If the two keys return false from equals(), the new Entry forms an Entry chain with the existing Entry, with the newly added Entry at the head of the chain; see the description of the addEntry() method below.

void addEntry(int hash, K key, V value, int bucketIndex) {
    // If the target bucket already holds an entry, that entry becomes
    // the next node of the new entry, i.e. head insertion into the list
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    if (size++ >= threshold)       // if the threshold is reached
        resize(2 * table.length);  // double the capacity
}

The parameter bucketIndex is the index computed by the indexFor function. The method first fetches the Entry currently stored at index bucketIndex, then constructs a new Entry from hash, key and value at that index, setting the previous occupant as the new Entry's next node so that the two form a linked list.

It then checks whether the size after the put has reached the threshold; if so, the table is resized. HashMap doubles its capacity on each resize.
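The overwrite-versus-chain behavior of put described above shows up in its return value, which we can check from the public API (class name illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class PutSemanticsDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        // A fresh key: put returns null
        assert map.put("k", 1) == null;
        // Same key (equals returns true): the value is overwritten
        // and the old value is returned; the key itself is unchanged
        assert map.put("k", 2) == 1;
        assert map.size() == 1;
        assert map.get("k") == 2;
    }
}
```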

4. Resizing

The resize() method is as follows:

Resize the HashMap; newCapacity is the new table length:

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable);                            // move all entries from the old table into newTable
    table = newTable;                              // replace the old table
    threshold = (int) (newCapacity * loadFactor);  // recompute the threshold
}

A new underlying array is created; the call to the transfer method moves all of the HashMap's elements into the new array, recalculating each element's index in the new array.

As more and more elements are added to a HashMap, the probability of hash collisions rises, because the array length is fixed. So, to keep lookups efficient, the HashMap's array must be expanded. Array expansion also appears in ArrayList; it is a common operation. After the array is expanded, the most performance-costly step appears: every element in the old array must have its position in the new array recalculated and be re-inserted. That is what resize does.

So when does HashMap resize? When the number of elements exceeds arraySize * loadFactor, the array is expanded. The default loadFactor is 0.75, a compromise value. That is, by default, with an array of size 16, once the HashMap holds more than 16 * 0.75 = 12 elements the array is expanded to 2 * 16 = 32, i.e. doubled, and every element's position in the new array is then recalculated. Expansion requires copying the array, which is a very performance-intensive operation, so if we can predict the number of elements in the HashMap, presetting the capacity can effectively improve its performance.
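The pre-sizing advice can be sketched as a small helper (newPresizedMap is a hypothetical name, and the sketch assumes the default 0.75 load factor): choose an initial capacity large enough that the expected number of elements never pushes size past capacity * 0.75, so no resize (and no array copy) happens while filling the map.

```java
import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    // Pick an initial capacity so that expectedSize elements
    // stay below capacity * 0.75 after power-of-two rounding
    static <K, V> Map<K, V> newPresizedMap(int expectedSize) {
        int initialCapacity = (int) (expectedSize / 0.75f) + 1;
        return new HashMap<>(initialCapacity);
    }

    public static void main(String[] args) {
        Map<Integer, Integer> map = newPresizedMap(1000);
        for (int i = 0; i < 1000; i++)
            map.put(i, i * i);
        assert map.size() == 1000;
    }
}
```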

5. Reading data

public V get(Object key) {
    if (key == null)
        return getForNullKey();
    int hash = hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}

With the storage hashing explained above as a basis, this code is easy to understand. From the source we can see that to get an element from a HashMap, we first compute the key's hashCode to find the corresponding position in the array, then use the key's equals method to search the linked list at that position for the desired element.

6. HashMap performance parameters

HashMap provides the following constructors:

HashMap(): constructs a HashMap with an initial capacity of 16 and a load factor of 0.75.

HashMap(int initialCapacity): constructs a HashMap with the given initial capacity and a load factor of 0.75.

HashMap(int initialCapacity, float loadFactor): constructs a HashMap with the given initial capacity and load factor.

The base constructor HashMap(int initialCapacity, float loadFactor) takes two parameters: the initial capacity initialCapacity and the load factor loadFactor.

initialCapacity: the initial capacity of the HashMap, i.e. the length of the underlying array.

loadFactor: the load factor, defined as (the number of elements stored in the hash table n) / (the capacity of the hash table m).

The load factor measures how full a hash table is: the larger it is, the fuller the table, and vice versa. For a hash table using chaining, the average time to find an element is O(1 + a), where a is the load factor. A larger load factor makes fuller use of space but reduces lookup efficiency; a load factor that is too small leaves the table's data sparse and seriously wastes space.

In the implementation of HashMap, the threshold field determines the maximum number of elements the current table can hold:

threshold = (int) (capacity * loadfactor);  

From the definition of the load factor, threshold is the maximum number of elements allowed at the given loadFactor and capacity; when more elements are stored, a resize occurs, which lowers the actual load factor. The default load factor of 0.75 is a balanced choice between space and time efficiency. When the number of elements exceeds this threshold, the resized HashMap has twice the previous capacity.

Reference Links:

In-depth Java Collection Learning series: the implementation principle of HashMap

Java collection: HashMap source Analysis
