Java Collection Series [3]: HashMap Source Code Analysis

Source: Internet
Author: User

We have already analyzed ArrayList and LinkedList. We know that ArrayList is implemented on top of an array, while LinkedList is implemented as a linked list, and each has its own advantages and disadvantages: ArrayList is better than LinkedList at locating and looking up elements, while LinkedList is better than ArrayList at adding and removing elements. The HashMap introduced in this article combines the advantages of the two. It is built on a hash table, and if hash collisions are ignored, its add, delete, update, and query operations all run in O(1) time. Let's first look at the structure of the hash table it is based on.

We can see that the hash table is a structure composed of an array and linked lists. Of course, the one illustrated is a bad example: a good hash function should spread elements across the array as evenly as possible, reducing hash collisions and therefore the length of the linked lists. The longer a linked list is, the more nodes have to be traversed during a lookup, and the worse the hash table performs. Next, let's take a look at some member variables of HashMap.

    // Default initial capacity (16)
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;

    // Maximum capacity
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Default load factor: how full the hash table may get before it is resized
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Shared empty hash table
    static final Entry<?,?>[] EMPTY_TABLE = {};

    // The hash table actually in use
    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

    // Size of the HashMap, that is, the number of key-value pairs it stores
    transient int size;

    // Threshold on the number of key-value pairs, used to decide whether to resize
    int threshold;

    // Load factor
    final float loadFactor;

    // Number of structural modifications, used by the fail-fast mechanism
    transient int modCount;

    // Default threshold for alternative hashing
    static final int ALTERNATIVE_HASHING_THRESHOLD_DEFAULT = Integer.MAX_VALUE;

    // Random hash seed, which helps reduce the number of hash collisions
    transient int hashSeed = 0;

As the member variables show, the default initial capacity of HashMap is 16 and the default load factor is 0.75. The threshold is the number of key-value pairs the map can hold before it is resized; with the defaults it is initial capacity * load factor, that is, 16 * 0.75 = 12. When the number of key-value pairs goes beyond the threshold, the hash table is considered saturated: adding more elements would increase hash collisions and degrade performance, so the automatic resizing mechanism is triggered to keep HashMap fast. We can also see that the hash table is actually an Entry array, and each Entry in the array is the head node of a singly linked list. Entry is a static nested class of HashMap. Let's take a look at its member variables.
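The arithmetic above can be checked with a small sketch. The class and method names here (ThresholdSketch, thresholdFor) are made up for illustration and are not part of HashMap; the formula is simply capacity * load factor, capped in the same way as the threshold assignments shown later in this article.

```java
// Illustration only: ThresholdSketch and thresholdFor are hypothetical names,
// not part of java.util.HashMap.
class ThresholdSketch {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // threshold = capacity * loadFactor, capped at MAXIMUM_CAPACITY + 1
    static int thresholdFor(int capacity, float loadFactor) {
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    public static void main(String[] args) {
        System.out.println(thresholdFor(16, 0.75f)); // 12
        System.out.println(thresholdFor(32, 0.75f)); // 24
    }
}
```

With the default load factor, the threshold is always three quarters of the current capacity, so it doubles whenever the capacity doubles.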

    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;       // key
        V value;           // value
        Entry<K,V> next;   // reference to the next Entry
        int hash;          // hash code

        // ... remaining code omitted
    }

An Entry instance is a key-value pair, holding a key and a value. Each Entry also holds a reference to the next Entry and, to avoid repeated computation, the hash code of its key. The Entry array can be said to be the core of HashMap: all operations are carried out on this array. Because the HashMap source code is fairly long, we cannot cover every method, so we focus on the key parts. The rest of the article is problem-oriented and explores the internal mechanism of HashMap through the following questions.

1. What operations does HashMap perform during construction?

    // Constructor taking the initial capacity and the load factor
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0) {
            throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
        }
        // If the initial capacity is greater than the maximum capacity, use the maximum capacity
        if (initialCapacity > MAXIMUM_CAPACITY) {
            initialCapacity = MAXIMUM_CAPACITY;
        }
        // The load factor must be greater than 0 and must not be NaN
        if (loadFactor <= 0 || Float.isNaN(loadFactor)) {
            throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
        }
        // Set the load factor
        this.loadFactor = loadFactor;
        // The threshold temporarily holds the initial capacity
        threshold = initialCapacity;
        init();
    }

    void init() {}

All the other constructors delegate to this one. Apart from validating its parameters, it does two things: it stores the given load factor, and it sets the threshold to the given initial capacity. The init method is empty and does nothing. Note that no Entry array is created from the given initial capacity at this point. So when is the array created? Read on.

2. What operations does HashMap perform when adding a key-value pair?

    // Put a key-value pair into the HashMap
    public V put(K key, V value) {
        // Initialize the hash table if it has not been initialized yet
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        if (key == null) {
            return putForNullKey(value);
        }
        // Compute the hash code of the key
        int hash = hash(key);
        // Locate the slot in the hash table from the hash code
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            // If the key already exists, replace its value and return the old value
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        // The key does not exist yet, so add a new Entry to the hash table
        addEntry(hash, key, value, i);
        // Return null after a successful addition
        return null;
    }

We can see that when a key-value pair is added, HashMap first checks whether the hash table is the empty table and, if so, initializes it. A null key is handled separately by putForNullKey. Otherwise, the hash function is called to compute the hash code of the given key, the hash code is used to locate a slot in the Entry array, and the singly linked list in that slot is traversed. If the key already exists, its value is replaced and the old value is returned; otherwise, a new Entry is added to the hash table.
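The return-value behavior of put can be observed from the outside with the public java.util.HashMap API. The sketch below is only a usage illustration; PutDemo and run are names chosen for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Usage illustration only: PutDemo and run are names made up for this example.
class PutDemo {
    static List<Integer> run() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("a", 1);   // key absent: a new entry is added, null is returned
        Integer second = map.put("a", 2);  // key present: value is replaced, old value is returned
        return Arrays.asList(first, second, map.get("a"), map.size());
    }

    public static void main(String[] args) {
        System.out.println(run()); // [null, 1, 2, 1]
    }
}
```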

3. How is a hash table initialized?

    // Initialize the hash table; the capacity may be enlarged because the
    // requested size may not be a power of 2
    private void inflateTable(int toSize) {
        // The capacity of the hash table must be a power of 2
        int capacity = roundUpToPowerOf2(toSize);
        // Set the threshold to capacity * loadFactor
        threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
        // Create a hash table with the computed capacity
        table = new Entry[capacity];
        // Initialize the hash seed
        initHashSeedAsNeeded(capacity);
    }

As noted above, no Entry array is created when a HashMap is constructed. Instead, put checks whether the current hash table is the empty table and, if so, calls inflateTable to initialize it. Inside that method the size of the Entry array is recomputed, because the initial capacity passed to the constructor may not be a power of 2: it is rounded up to a power of 2, and the Entry array is created with that capacity. The threshold is also reset at this point, normally to capacity * loadFactor. In addition, hashSeed is initialized along with the hash table. hashSeed is used to optimize the hash function; by default it is 0, which means alternative hashing is not used, but alternative hashing can be enabled so that hashSeed takes a random value. The details are described below.
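The source above calls roundUpToPowerOf2 without showing its body. One way such rounding could be written is sketched below; it is an assumption made for illustration and not necessarily the exact JDK code. PowerOfTwoSketch is a hypothetical class name.

```java
// Sketch only: one possible way to round a requested size up to a power of 2.
// PowerOfTwoSketch is a hypothetical name; this is not claimed to be the JDK source.
class PowerOfTwoSketch {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    static int roundUpToPowerOf2(int number) {
        if (number >= MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;        // never exceed the maximum capacity
        }
        if (number <= 1) {
            return 1;                       // smallest table has one slot
        }
        // (number - 1) << 1 has its highest set bit at the next power of 2 >= number
        return Integer.highestOneBit((number - 1) << 1);
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOf2(10)); // 16
        System.out.println(roundUpToPowerOf2(16)); // 16
        System.out.println(roundUpToPowerOf2(17)); // 32
    }
}
```

So a map requested with an initial capacity of 10 would, under this sketch, end up with a table of 16 slots.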

4. When does HashMap decide to resize, and how does it resize?

    // Add an Entry, first deciding whether the table needs to be resized
    void addEntry(int hash, K key, V value, int bucketIndex) {
        // If the size has reached the threshold and the target slot is not empty
        if ((size >= threshold) && (null != table[bucketIndex])) {
            // The table is saturated and a collision is about to occur, so double the capacity
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }
        // Create the Entry in the (possibly recomputed) slot
        createEntry(hash, key, value, bucketIndex);
    }

    // Resize the hash table
    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        // If the current capacity is already the maximum, only the threshold can be raised
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }
        // Otherwise, allocate the new table
        Entry[] newTable = new Entry[newCapacity];
        // Migrate the entries from the old table to the new table
        transfer(newTable, initHashSeedAsNeeded(newCapacity));
        // Make the new table the current hash table
        table = newTable;
        // Update the threshold
        threshold = (int) Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

When put is called to add a key-value pair and the key is not already in the map, addEntry is called to create a new Entry. As the addEntry code above shows, before creating the Entry it checks whether the number of elements has reached the threshold and whether the target slot is already occupied. If both are true, it calls resize, passing a new capacity that is twice the length of the current hash table. Unless the table is already at its maximum capacity, resize creates a new Entry array of that capacity and migrates all elements of the old hash table into it; the return value of initHashSeedAsNeeded decides whether the elements' hash codes are recomputed during the migration. After the migration, the new table replaces the current one, and the threshold is recalculated from the new capacity.
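The transfer method itself is not shown in this article. As a rough model of what migrating chained slots into a larger table involves, here is a simplified sketch. Node, ResizeSketch, grow, and count are hypothetical names, rehashing with a new seed is left out, and this is not the JDK source.

```java
// Simplified model of migrating chained slots into a table twice as large.
// Node, ResizeSketch, grow and count are hypothetical names; this is not the JDK source.
class ResizeSketch {
    static final class Node {
        final int hash;
        Node next;
        Node(int hash, Node next) { this.hash = hash; this.next = next; }
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Move every node of oldTable into a new table with double the length.
    static Node[] grow(Node[] oldTable) {
        Node[] newTable = new Node[oldTable.length * 2];
        for (Node e : oldTable) {
            while (e != null) {
                Node next = e.next;                       // remember the rest of the old chain
                int i = indexFor(e.hash, newTable.length); // slot in the larger table
                e.next = newTable[i];                     // insert at the head of the new chain
                newTable[i] = e;
                e = next;
            }
        }
        return newTable;
    }

    // Count all nodes in all chains.
    static int count(Node[] table) {
        int n = 0;
        for (Node head : table) {
            for (Node e = head; e != null; e = e.next) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        Node[] table = new Node[4];
        for (int h : new int[] {1, 5, 9, 13}) {           // all four land in slot 1 when length is 4
            int i = indexFor(h, table.length);
            table[i] = new Node(h, table[i]);
        }
        Node[] bigger = grow(table);
        System.out.println(count(bigger));                // 4
    }
}
```

In this example the single chain of four nodes is split between slots 1 and 5 of the larger table, which is how doubling the capacity shortens chains.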

5. Why must the size of the Entry array be a power of 2?

    // Return the array index that corresponds to the hash code
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

The indexFor method computes the array index that corresponds to a hash code. Internally it uses the bitwise AND (&) operator, which works on the two operands bit by bit: a result bit is 1 only if both input bits are 1, otherwise it is 0. AND is often used to mask off the high-order bits of an operand, for example 01011010 & 00001111 = 00001010. Let's go back to the code and see what h & (length - 1) does.

The length passed in is the length of the Entry array. Array indexes start at 0, so the largest index is length - 1. If length is a power of 2, the low-order binary digits of length - 1 are all 1s and the digits above them are all 0s. h & (length - 1) therefore discards the high-order bits of h and keeps only its low-order bits as the array index. This is why the size of the Entry array is restricted to powers of 2: it allows this algorithm to be used to compute the index.
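A short sketch makes the masking concrete. It compares h & (length - 1) with Math.floorMod for a power-of-2 length, and then shows what goes wrong when the length is not a power of 2. IndexForSketch is a hypothetical class name; indexFor is copied from the code above.

```java
// Illustration only: IndexForSketch is a hypothetical name.
class IndexForSketch {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // power of 2, so length - 1 is 1111 in binary
        for (int h : new int[] {37, 1000, -7}) {
            // For a power-of-2 length the mask gives the same result as a floor modulus
            System.out.println(indexFor(h, length) + " " + Math.floorMod(h, length));
        }
        // With length 10 the mask is 1001 in binary, so bits 1 and 2 of h are ignored
        // and only slots 0, 1, 8 and 9 can ever be selected.
        System.out.println(indexFor(2, 10)); // 0
        System.out.println(indexFor(6, 10)); // 0
    }
}
```

For the three sample hash codes the two columns printed by the loop agree (5, 8, and 9), while with a length of 10 the hash codes 2 and 6 both fall into slot 0 even though they differ modulo 10.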

6. How does the Hash function calculate the Hash code?

    // Hash function
    final int hash(Object k) {
        int h = hashSeed;
        // If alternative hashing is enabled and the key is a String, use another hash algorithm
        if (0 != h && k instanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }
        h ^= k.hashCode();
        // Disturbance function
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

The last two lines of the hash method are what actually compute the hash value. This algorithm is called a disturbance (perturbation) function: it mixes the bits together. Here it uses four unsigned right shifts, whose purpose is to fold the high-order bits of h into the low-order bits and so increase the randomness of the low-order bits. Recall that the array index is determined by the low-order bits of the hash code. The hash code of a key comes from its hashCode method, and a poor hashCode method may produce many hash codes with identical low-order bits. To spread hash codes evenly over the array, the disturbance function blends features of the high-order bits into the low-order bits, which makes the distribution more uniform and improves performance.
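To see the effect, consider two hash codes that differ only in their high-order bits. The sketch below applies the same shifts as the hash method above, assuming hashSeed is 0, and compares the resulting slots in a 16-slot table. SpreadSketch and spread are hypothetical names used only for this example.

```java
// Illustration only: SpreadSketch and spread are hypothetical names.
// spread applies the same shifts as the hash method above, assuming hashSeed == 0.
class SpreadSketch {
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000; // low 16 bits are all 0
        int b = 0x20000; // low 16 bits are all 0
        // Without the disturbance function both keys collide in slot 0
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16));                 // 0 0
        // With it, the high-order bits influence the slot
        System.out.println(indexFor(spread(a), 16) + " " + indexFor(spread(b), 16)); // 1 2
    }
}
```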

7. What is alternative hashing?

We can see that the hash method first assigns hashSeed to h. hashSeed is the hash seed, a random value that helps optimize the hash function. It defaults to 0, which means alternative hashing is not used by default. So when is hashSeed used? First, alternative hashing has to be enabled by setting the jdk.map.althashing.threshold system property. The property defaults to -1, in which case the alternative hashing threshold is Integer.MAX_VALUE, which means alternative hashing may never be used. You can of course set the threshold to a smaller value, so that a random hashSeed is generated once the capacity of the hash table reaches it (the check is made by initHashSeedAsNeeded, which receives the capacity). This increases the randomness of the hash function. Why use alternative hashing? By the time the threshold you set is reached, the map is large and hash collisions become much more likely; a more random hash function then helps distribute the elements more evenly across the hash table.
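The decision described above can be modeled in a few lines. This is a simplified sketch under stated assumptions, not the JDK code: AltHashingSketch, effectiveThreshold, and useAltHashing are hypothetical names, an unset property is treated like -1, and the comparison is assumed to be made against the table capacity because initHashSeedAsNeeded is called with the capacity.

```java
// Simplified model of the alternative hashing decision; not the JDK source.
// AltHashingSketch, effectiveThreshold and useAltHashing are hypothetical names.
class AltHashingSketch {
    // Turn the raw property value into the threshold actually used.
    // Assumption: an unset property or -1 both disable alternative hashing.
    static int effectiveThreshold(String propertyValue) {
        int threshold = (propertyValue != null)
                ? Integer.parseInt(propertyValue)
                : Integer.MAX_VALUE;
        return (threshold == -1) ? Integer.MAX_VALUE : threshold;
    }

    // Assumption: alternative hashing applies once the capacity reaches the threshold.
    static boolean useAltHashing(int capacity, int threshold) {
        return capacity >= threshold;
    }

    public static void main(String[] args) {
        String raw = System.getProperty("jdk.map.althashing.threshold");
        int threshold = effectiveThreshold(raw);
        System.out.println(useAltHashing(16, threshold));
    }
}
```

Running the sketch with -Djdk.map.althashing.threshold=16 makes the check succeed for a 16-slot table; without the property it never does for any realistic capacity.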

Note: All the above analyses are based on JDK1.7. Major changes may occur between different versions.
