The HashMap class in the Java collections framework

Source: Internet
Author: User
Tags concurrentmodificationexception

JDK 1.8.0_144

As one of the most commonly used collections, HashMap extends AbstractMap. The JDK 8 implementation of HashMap differs from that of JDK 7: a red-black tree is added to the underlying data structure, which makes the structure more complex but lookups in crowded buckets faster. HashMap also overrides many AbstractMap methods to suit its own needs. This article focuses on HashMap and discusses in detail its underlying data structure, its expansion (resize) mechanism, and the endless loop that can occur in a concurrent environment.

As in JDK 7, HashMap in JDK 8 provides its own implementation of Map.Entry, but the class is renamed to Node. I think the rename makes the class easier to understand alongside the red-black tree. Its methods are roughly the same as in JDK 7, with only a few removed. The structure of Node (formerly Entry) is:

    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;   // hash value of the node
        final K key;      // key
        V value;          // value
        Node<K,V> next;   // next node in the same bucket

        // Methods omitted. JDK 8 adds several comparison methods to Map.Entry
        // (comparingByKey, comparingByValue); they are static methods on the
        // interface, so Node does not implement them.
    }

A HashMap Node is simple: it only holds a key-value pair and a link to the next node. The following sections go through the Map methods that HashMap re-implements.

public int size()

HashMap does not inherit the size method of AbstractMap but overrides it. HashMap keeps a size field in the class and returns it directly, instead of calling entrySet and counting the returned set. We can guess that this field is incremented whenever a new key-value pair is inserted.

public boolean isEmpty()

Checks whether the size field is 0.

public boolean containsKey(Object key)

AbstractMap implements this method by traversing every Entry. HashMap evidently considers that too slow, so it does not reuse it and overrides the method instead.

Because the underlying data structure of HashMap in JDK 8 introduces the red-black tree, its implementation is slightly more complex than that of JDK 7. Let's look at the JDK 7 implementation of this method first.

    // JDK 7, HashMap#containsKey
    public boolean containsKey(Object key) {
        return getEntry(key) != null; // delegates to getEntry
    }

The implementation of getEntry is also relatively simple, because HashMap in JDK 7 is an array plus linked lists. When the hashes of two keys conflict, separate chaining (the "link address method") is used: the conflicting Entry is linked into the list at that bucket through the next pointer. So getEntry first computes the hash of the key, then finds the key's index in the hash table, and finally traverses the linked list at that position to return the result.
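The lookup idea described above (hash the key, find the bucket, walk the chain) can be sketched roughly as follows. This is a simplified illustration rather than the JDK source: the class ChainedLookupSketch, its Entry class, the add helper, and the fixed 4-bucket table are hypothetical and chosen only for the example, and the extra hash mixing that the real HashMap applies is left out.

```java
import java.util.Objects;

class ChainedLookupSketch {
    // Hypothetical minimal entry: cached hash, key, value, and next pointer.
    static final class Entry {
        final int hash;
        final Object key;
        final Object value;
        final Entry next;

        Entry(int hash, Object key, Object value, Entry next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Table length is a power of two so that (length - 1) works as a bit mask.
    final Entry[] table = new Entry[4];

    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    // Helper for the example only: inserts at the head of the bucket, no duplicate check.
    void add(Object key, Object value) {
        int hash = Objects.hashCode(key);
        int i = indexFor(hash, table.length);
        table[i] = new Entry(hash, key, value, table[i]);
    }

    // Hash the key, locate the bucket, then walk the chain in that bucket.
    Entry getEntry(Object key) {
        int hash = Objects.hashCode(key);
        for (Entry e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                return e;
            }
        }
        return null;
    }

    boolean containsKey(Object key) {
        return getEntry(key) != null;
    }

    public static void main(String[] args) {
        ChainedLookupSketch m = new ChainedLookupSketch();
        m.add(1, "one");
        m.add(5, "five"); // 5 & 3 == 1, so 5 lands in the same bucket as 1
        System.out.println(m.containsKey(5)); // prints true
        System.out.println(m.containsKey(9)); // prints false (same bucket, no matching key)
    }
}
```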

In JDK 8, when the number of nodes in a bucket's linked list reaches the threshold value of 8, the linked list is converted to a red-black tree. Once that has happened, the key can no longer be found by traversing a linked list, so JDK 8 extends the lookup to search the red-black tree as well. The red-black tree algorithm itself is not described here.

    // JDK 8, HashMap#containsKey
    public boolean containsKey(Object key) {
        // JDK 8 adds the getNode method; the hash of the key is computed first
        // and passed in as a parameter.
        return getNode(hash(key), key) != null;
    }

    // HashMap#getNode
    final Node<K,V> getNode(int hash, Object key) {
        // This method is basically the same as getEntry in JDK 7. The only
        // difference is that when the bucket holds more than one node, the check
        // "first instanceof TreeNode" decides whether the bucket is a red-black
        // tree. If it is, getTreeNode searches the tree; otherwise the bucket is
        // a linked list and is traversed as before.
    }

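public boolean containsValue(Object value)

Traverses all elements in the hash table and compares their values.

public V get(Object key)

In JDK 8, get calls getNode, the same method that containsKey uses; it plays the role that getEntry plays in the get method of JDK 7. getNode locates the bucket from the hash and compares the key with the first node in that bucket: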

3.1 If they are equal, the Node is returned directly;

3.2 If they are not equal, check whether the current node has a successor node:

3.2.1 Check whether the bucket is a red-black tree. If it is, call getTreeNode to find the Node whose key equals key;

3.2.2 If the bucket is a linked list, traverse the entire list.

public V put(K key, V value)

This method is the most critical one: it inserts a key-value pair into the Map. It computes the hash of the key, uses the hash to locate the hash bucket, and checks whether that bucket already holds an entry (a conflict). On a conflict, separate chaining resolves it by forming a linked list; from JDK 8 on, a list that reaches 8 elements is converted to a red-black tree. Insertion also has to decide whether the table must be expanded, which brings in the design of the expansion mechanism and the endless loop that expansion can cause in a concurrent environment.

Because JDK 7 is relatively simple, let's first look at the put source code in JDK 7.

JDK 7: HashMap#put

    // JDK 7, HashMap#put
    public V put(K key, V value) {
        // 1. Check whether this is the first insertion, that is, whether the hash
        //    table still points to the empty array. If so, call inflateTable to
        //    initialize the HashMap.
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        // 2. If the key is null, call putForNullKey to store the key-value pair.
        //    HashMap supports key == null.
        if (key == null)
            return putForNullKey(value);
        // 3. Call hash to compute the hash of the key, then call indexFor to
        //    compute the index i in the hash table from the hash and the table length.
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        // 4. Traverse the bucket to check whether the key already exists. The
        //    condition is that the hashes are equal and the keys are either the
        //    same reference or equal according to equals. If so, overwrite the
        //    value and return the old one. If the bucket is empty (no conflict),
        //    the loop body is never entered.
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        // Insert a new entry.
        // modCount records the number of structural modifications and drives the
        // fail-fast mechanism. When an iterator is created, modCount is copied to
        // the iterator's expectedModCount. During iteration the two are compared,
        // and ConcurrentModificationException is thrown if they differ, for
        // example because another thread modified the map.
        modCount++;
        // Insert the key-value pair: pass in the hash, the key, the value, and
        // the index i in the hash table.
        addEntry(hash, key, value, i);
        return null;
    }
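The modCount check mentioned in the comments above can be observed even in a single thread. The following is a small demonstration under stated assumptions: the class name FailFastDemo and the sample keys are made up for the example, and it relies only on the public java.util.HashMap API. Note that the HashMap documentation describes fail-fast behavior as best-effort, so it should be treated as a bug detector, not as a synchronization mechanism.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

class FailFastDemo {
    // Returns true if modifying the map while iterating triggered the fail-fast check.
    static boolean modifyWhileIterating() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        try {
            for (String key : map.keySet()) {
                // Adding a new key is a structural modification, so modCount changes
                // while the iterator still holds the old expectedModCount.
                map.put("new-" + key, 0);
            }
        } catch (ConcurrentModificationException ex) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(modifyWhileIterating()); // prints true
    }
}
```

Overwriting the value of an existing key inside the loop would not trigger the exception, because that is not a structural modification and does not change modCount.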

 

    // JDK 7, HashMap#addEntry. This method is the core of put: it decides
    // whether to expand the table and then inserts the entry.
    void addEntry(int hash, K key, V value, int bucketIndex) {
        // First decide whether to expand. Both conditions must hold:
        //   1. The number of key-value pairs in the Map is greater than or equal
        //      to the threshold (threshold = table capacity (array size) * load factor).
        //   2. The bucket that the key maps to is not null.
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);                   // the new table is twice the size of the old one
            hash = (null != key) ? hash(key) : 0;       // recompute the hash of the key
            bucketIndex = indexFor(hash, table.length); // recompute the index in the new table
        }
        // Create an Entry and insert it. Each new entry is inserted at the head
        // of the bucket's linked list.
        createEntry(hash, key, value, bucketIndex);
    }

Let's see how HashMap is resized. In JDK 7, the table expands to twice the size of the previous hash table: 2 * table.length.

void resize(int newCapacity)

The most important part of this method is the transfer(Entry[], boolean) method. The first parameter is the new hash table created for the expansion, and the second parameter indicates whether keys must be rehashed because the hash seed was re-initialized.

The following walkthrough describes how HashMap expands in JDK 7.

Assume a HashMap with initial capacity initialCapacity = 4 and load factor loadFactor = 0.5, so the threshold at initialization is threshold = 4 * 0.5 = 2. When the third element is inserted, size has already reached the threshold (2 >= 2), so the map resizes, provided the target bucket is already occupied, as addEntry requires. We analyze the expansion mechanism in two cases: one where the key-value pairs produce no hash conflicts, and one where they do.
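The arithmetic in this example can be checked with a few lines of code. This is a rough sketch of the rule, not the JDK implementation: the class ThresholdSketch and its two helper methods are hypothetical names, and the real JDK 7 code additionally caps the threshold at the maximum capacity.

```java
class ThresholdSketch {
    // Simplified: threshold = capacity * loadFactor (the JDK also caps this value).
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // The JDK 7 addEntry rule: expand only when size has reached the threshold
    // AND the bucket the new key maps to is already occupied.
    static boolean shouldResize(int size, int threshold, boolean bucketOccupied) {
        return size >= threshold && bucketOccupied;
    }

    public static void main(String[] args) {
        int t = threshold(4, 0.5f);
        System.out.println(t);                         // prints 2
        System.out.println(shouldResize(1, t, true));  // prints false: below the threshold
        System.out.println(shouldResize(2, t, false)); // prints false: target bucket is empty
        System.out.println(shouldResize(2, t, true));  // prints true: third insert hits an occupied bucket
    }
}
```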

  1. During expansion, the key-value pairs currently in the HashMap have no hash conflicts.

Calculate the new position i of e based on the new table, and then insert the element into the new hash table using head insertion.

Insert A at position i of the new hash table using head insertion. The pointer then moves on through e = next, and the element to be inserted becomes B.

The key of element B is then hashed to calculate its position in the new hash table. Wherever it lands, it is inserted at the head of the bucket. Assume that B still conflicts with A at the same position: after head insertion, B sits in front of A.

We can see that the transfer of the linked list is the key step of the expansion. Because entries are transferred using head insertion, conflicting elements end up in the new hash table in the opposite order from the old hash table.
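The order reversal caused by head insertion can be reproduced with a small sketch. It mirrors the shape of the JDK 7 transfer loop but is not the JDK source: the class HeadInsertTransferSketch, its Entry class, the chain helper, and the hash values 1 and 9 are assumptions chosen so that A and B share a bucket both before and after the table doubles from 4 to 8.

```java
class HeadInsertTransferSketch {
    // Hypothetical minimal entry with a cached hash and a next pointer.
    static final class Entry {
        final int hash;
        final String key;
        Entry next;

        Entry(int hash, String key, Entry next) {
            this.hash = hash;
            this.key = key;
            this.next = next;
        }
    }

    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    // Moves every entry to the new table, always inserting at the head of the bucket.
    static void transfer(Entry[] oldTable, Entry[] newTable) {
        for (Entry e : oldTable) {
            while (e != null) {
                Entry next = e.next;                       // remember the rest of the old chain
                int i = indexFor(e.hash, newTable.length); // position in the new table
                e.next = newTable[i];                      // head insertion: e goes in front
                newTable[i] = e;
                e = next;
            }
        }
    }

    // Renders a bucket as "X -> Y -> Z" for printing.
    static String chain(Entry head) {
        StringBuilder sb = new StringBuilder();
        for (Entry e = head; e != null; e = e.next) {
            if (sb.length() > 0) sb.append(" -> ");
            sb.append(e.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Entry[] oldTable = new Entry[4];
        Entry b = new Entry(9, "B", null);
        Entry a = new Entry(1, "A", b);   // old bucket 1 holds A -> B
        oldTable[1] = a;

        Entry[] newTable = new Entry[8];
        transfer(oldTable, newTable);
        System.out.println(chain(newTable[1])); // prints B -> A
    }
}
```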

There is one more thing to note about the HashMap resizing mechanism. Under concurrency, HashMap may not only produce incorrect data but also drive CPU usage to 100%. The reason is that, under concurrent access, the expansion mechanism of HashMap in JDK 7 can create an endless loop. The following walkthrough shows why.

Assume that in a concurrent environment two threads, T1 and T2, are resizing the same HashMap at the same time.

Thread T1 has already finished transferring the elements of the HashMap to its new table. However, because of the Java memory model, thread T2 still sees its own stale view of the table from before the transfer. T2 now starts transferring data.

In the new hash table of T2, newTable[i] points to element A, and the node to be inserted next becomes B.

Under normal circumstances, next would now be null. But because T1 has already reversed the list A -> B into B -> A, next refers to A again. B is then inserted into newTable[i] of T2.

Because next is not null, the following step assigns it to e (e = next), so A is processed once more. A and B end up pointing at each other, and the loop never terminates.

Therefore, do not use HashMap in a concurrent environment. Once the endless loop occurs, CPU usage stays at 100%, and the problem is hard to reproduce and troubleshoot. In a concurrent environment, use the thread-safe ConcurrentHashMap class instead.
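As a minimal illustration of that advice, the sketch below has two threads insert disjoint key ranges into a ConcurrentHashMap while it grows. The class name ConcurrentPutDemo and the key ranges are arbitrary choices for the example; only standard java.util.concurrent classes are used.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentPutDemo {
    // Two threads insert disjoint key ranges; returns the final number of entries.
    static int fillFromTwoThreads(int perThread) {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < perThread; i++) map.put(i, i);
        });
        Thread t2 = new Thread(() -> {
            for (int i = perThread; i < 2 * perThread; i++) map.put(i, i);
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", ex);
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(fillFromTwoThreads(10_000)); // prints 20000
    }
}
```

Running the same two writers against a plain HashMap gives no such guarantee: entries may be lost, and on JDK 7 the resize can corrupt a bucket as described above.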

After discussing the put method in JDK 7, let's look at how HashMap in JDK 8, with the red-black tree added, inserts an element, how it resizes, and how it converts a linked list to a red-black tree.

JDK 8: HashMap#put

    // JDK 8, HashMap#put
    public V put(K key, V value) {
        // In JDK 8, put calls putVal directly. putVal takes five parameters: the
        // hash of the key, the key, the value, onlyIfAbsent (if true, an existing
        // value in the Map is not replaced), and evict, which has no effect in
        // HashMap itself.
        return putVal(hash(key), key, value, false, true);
    }

Therefore, the key method is putVal.

    // putVal in JDK 8 follows roughly the same steps as put in JDK 7. It also
    // checks whether this is the first insertion and whether the target bucket
    // has a conflict. The difference is that the inserted node may be a
    // linked-list node or a red-black tree node.
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // 1. If this is the first insertion, reuse resize to initialize the hash table.
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // 2. Use i = (n - 1) & hash to compute the index in the hash table and
        //    check whether tab[i] already holds an element, that is, whether there
        //    is a conflict. If not, insert directly. Note that a null key is
        //    handled slightly differently from JDK 7: JDK 7 handles it in a
        //    separate method, while in JDK 8 a null key always maps to the first
        //    position (index 0) and is chained like any other node if that bucket
        //    already has elements.
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        // 3. If tab[i] has an element, there is a conflict. JDK 7 would simply use
        //    head insertion. In JDK 8, the bucket may already be a red-black tree,
        //    or the list may be at the point where it turns into one, so several
        //    cases have to be distinguished.
        else {
            Node<K,V> e; K k;
            // 3.1 Special case: the hash and key of tab[i] are the same as those of
            //     the inserted element, so the value can simply be overwritten.
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // 3.2 The bucket is a red-black tree, so insert a tree node.
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            // 3.3 The bucket is a linked list. After insertion it may stay a list or
            //     be converted to a red-black tree once it holds more than 8
            //     elements, so a counter tracks the number of elements on tab[i].
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        // Reached the node whose next is null: append at the tail.
                        // This differs from the head insertion used by JDK 7.
                        p.next = newNode(hash, key, value, null);
                        // The number of elements on tab[i] exceeds the critical
                        // value of 8: convert the list to a red-black tree.
                        if (binCount >= TREEIFY_THRESHOLD - 1)
                            treeifyBin(tab, hash);
                        break;
                    }
                    // An element with the same key already exists: leave the loop
                    // and overwrite its value below instead of inserting.
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            // The key of the inserted element already exists in the Map. No node is
            // inserted; the value is overwritten and the old value returned.
            if (e != null) {
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        // Modification count. An Iterator compares against this variable and
        // throws ConcurrentModificationException if the values differ.
        ++modCount;
        // Decide whether to resize.
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict); // has no effect in HashMap itself
        return null;
    }
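The overwrite branch and the onlyIfAbsent flag are visible through the public API: put returns the previous value and replaces it, while putIfAbsent keeps an existing value. The following sketch uses only java.util classes; the class name PutSemanticsDemo and the key "k" are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PutSemanticsDemo {
    // Collects the return values of a few insertions on the same key.
    static List<Integer> run() {
        Map<String, Integer> map = new HashMap<>();
        List<Integer> results = new ArrayList<>();
        results.add(map.put("k", 1));         // null: there was no previous mapping
        results.add(map.put("k", 2));         // 1: old value returned, value overwritten
        results.add(map.putIfAbsent("k", 3)); // 2: key exists, so the value is not replaced
        results.add(map.get("k"));            // 2: still the value written by the second put
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [null, 1, 2, 2]
    }
}
```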

 

From the source code analysis of the put methods of JDK 7 and JDK 8, JDK 8 is indeed a lot more complicated, and without patience this material is rather dry. It helps to review the insertion flows of JDK 7 and JDK 8 side by side before analyzing red-black tree insertion in JDK 8.

In summary, the put methods of JDK 7 and JDK 8 are basically the same. The core is to compute the hash of the key, compute the index in the hash table from the hash, and then determine whether a conflict exists. There are only slight differences in implementation details. For example, JDK 7 handles key == null in a special method, while in JDK 8 a null key always lands at index 0 without a special case. On a conflict, JDK 7 inserts at the head of the linked list, while JDK 8 inserts at the tail. Of course, the biggest difference is that JDK 8 must distinguish three kinds of node: a linked-list node, a red-black tree node, and the node whose insertion turns a linked list into a red-black tree.
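The hash and index computation summarized above can be sketched as follows. The hash method has the same shape as the one in JDK 8's HashMap, and the index expression is the (n - 1) & hash used in putVal; the class name HashIndexSketch, the index helper, and the sample keys are assumptions for illustration.

```java
class HashIndexSketch {
    // Same shape as JDK 8's HashMap.hash: a null key hashes to 0, and the high
    // 16 bits of hashCode are XORed into the low 16 bits.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Index in a table whose length n is a power of two: (n - 1) & hash.
    static int index(Object key, int tableLength) {
        return (tableLength - 1) & hash(key);
    }

    public static void main(String[] args) {
        System.out.println(index(null, 16));  // prints 0: a null key always maps to index 0
        System.out.println(index("a", 16));   // prints 1: "a".hashCode() is 97, and 97 & 15 == 1
        System.out.println(index(65537, 16)); // prints 0: the high bit is mixed into the low bits
    }
}
```

Without the XOR step, the key 65537 would map to index 1 (65537 & 15), the same bucket as the key 1; mixing in the high bits lets keys that differ only above the mask spread over different buckets.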

Red-black tree insertion is not analyzed for the time being. The next step is to analyze the JDK 8 expansion method.

    // JDK 8, HashMap#resize. The table still expands to twice the size of the
    // previous hash table.
    final Node<K,V>[] resize() {
        // 1. Because JDK 8 also uses resize to initialize the hash table, oldTab is
        //    checked first: whether its capacity is 0 (initialization) and whether
        //    it has already reached the maximum capacity. After these checks,
        //    newCap is twice the old capacity and newThr (the threshold) is twice
        //    the previous threshold. (Source omitted.)
        // 2. Once the new capacity is known, allocate the new hash array.
        Node<K,V>[] newTab = (Node<K,V>[]) new Node[newCap];
        table = newTab;
        // 3. If this is initialization (oldTab == null), the new hash array is
        //    returned directly. Otherwise the old entries are transferred.
        // 4. Traverse the old hash table.
        for (int j = 0; j < oldCap; ++j) {
            // 5. If e = oldTab[j] is not null, continue:
            // 5.1 If position j has no conflict (a single node), move it directly.
            //     The hash is not recomputed for the transferred element: the
            //     stored e.hash is reused in e.hash & (newCap - 1). After
            //     expansion, an element is either at its original index or at
            //     original index + oldCap.
            if (e.next == null)
                newTab[e.hash & (newCap - 1)] = e;
            // 5.2 The bucket is a red-black tree.
            else if (e instanceof TreeNode)
                ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
            // 5.3 The bucket is a linked list.
            else {
                ...
            }
        }
        return newTab;
    }

 

Compared with JDK 7, the expansion mechanism of JDK 8 not only adds a check for red-black tree nodes but also makes some minor optimizations. In particular, JDK 8 does not recompute the hash of the key during expansion.
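The claim that an element ends up either at its original index or at original index + oldCap follows from the table length being a power of two: doubling the table exposes exactly one more bit of the hash. A small sketch of that rule, with the hypothetical class name ResizeIndexSketch and sample hashes chosen for illustration:

```java
class ResizeIndexSketch {
    // New index after doubling, computed without rehashing: the bit of the hash
    // selected by oldCap decides whether the node stays or moves by oldCap.
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        System.out.println(newIndex(5, oldCap));  // prints 5: bit 16 is clear, the node stays
        System.out.println(newIndex(21, oldCap)); // prints 21: bit 16 is set, 5 + 16
        // Both agree with masking against the doubled table directly.
        System.out.println(21 & (2 * oldCap - 1)); // prints 21
    }
}
```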

public V remove(Object key)

If you already understand the put process clearly, the other methods in HashMap follow the same routine, and remove is no exception. It computes hash(key) and the index i in the hash table, checks whether position i has elements, and checks whether those elements form a red-black tree or a linked list.

This method has an easy trap: the key is a custom POJO class that does not override the equals and hashCode methods. If a new POJO instance is then used as the key for deletion, it is very likely that the entry "cannot be deleted". The fix is to override equals and hashCode so that two POJO objects with the same content are considered equal.
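The trap can be reproduced with two small key classes. Both classes are hypothetical and exist only for this example: PlainKey does not override equals and hashCode, ValueKey does.

```java
import java.util.HashMap;
import java.util.Map;

class KeyEqualityDemo {
    // Key class that keeps the identity-based equals/hashCode inherited from Object.
    static final class PlainKey {
        final int id;
        PlainKey(int id) { this.id = id; }
    }

    // Key class whose equality is based on its content.
    static final class ValueKey {
        final int id;
        ValueKey(int id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            return o instanceof ValueKey && ((ValueKey) o).id == id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    // Tries to remove an entry using a new, content-equal PlainKey instance.
    static boolean removeWithPlainKey() {
        Map<PlainKey, String> map = new HashMap<>();
        map.put(new PlainKey(1), "x");
        return map.remove(new PlainKey(1)) != null;
    }

    // Same attempt with ValueKey, which defines equals and hashCode.
    static boolean removeWithValueKey() {
        Map<ValueKey, String> map = new HashMap<>();
        map.put(new ValueKey(1), "x");
        return map.remove(new ValueKey(1)) != null;
    }

    public static void main(String[] args) {
        System.out.println(removeWithPlainKey()); // prints false: the entry cannot be deleted
        System.out.println(removeWithValueKey()); // prints true
    }
}
```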

The remaining methods follow the same ideas: compute the hash, traverse the bucket, and determine the node type. Once the put and resize methods are understood, the other methods are all similar. After reading this article, you should try to ask yourself the following questions:

 

 
