First, the background: While reading the HashMap source code I looked at resize(). When it copies the entries referenced by an old bucket list into the new bucket list, the copy comes out in reverse order. The reversal is harmless for storage, and the design is efficient: it does not insert at the tail, which would mean traversing to the end of the list on every insertion. To summarize the principle: in JDK 1.7, HashMap's resize() fills each list of the new table[] in LIFO fashion, that is, by inserting at the head. The purpose is to avoid tail traversal, meaning the walk to the end of the list that would otherwise be needed each time an entry is added to the new list. Inserting directly at the head is more efficient, but it reverses the order of the list.
For example, if the original order is 10 20 30 40, the new list grows as follows: 10, then 20 10, then 30 20 10, then 40 30 20 10.
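To make the reversal concrete, here is a minimal sketch of head insertion on a singly linked list. The class HeadInsertDemo and its Node type are hypothetical stand-ins for illustration, not the JDK's own Entry class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; Node stands in for HashMap's internal Entry.
class HeadInsertDemo {
    static final class Node {
        final int key;
        Node next;
        Node(int key) { this.key = key; }
    }

    // Inserts each key at the head of the list and returns the resulting order.
    static List<Integer> headInsert(int... keys) {
        Node head = null;
        for (int k : keys) {
            Node n = new Node(k);
            n.next = head; // the new node points at the old head
            head = n;      // and becomes the new head: O(1), no traversal
        }
        List<Integer> order = new ArrayList<>();
        for (Node n = head; n != null; n = n.next) {
            order.add(n.key);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(headInsert(10, 20, 30, 40)); // [40, 30, 20, 10]
    }
}
```

Each insertion touches only the head reference, which is why no walk to the tail is needed, and why the last key inserted ends up first.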
Second, the existing problems:
Head insertion is what leads to HashMap's infinite loop problem in a multi-threaded environment.
Symptoms of the problem
In the past, our Java code used HashMap for some reason, but the program was single-threaded and everything was fine. Later the program ran into performance problems, so it had to become multi-threaded. Once the multi-threaded version went live, we found that the program often took 100% of the CPU. Looking at the stack, you will find that the program is hung in the HashMap.get() method. The problem disappears after restarting the program, but it comes back after a while. Moreover, it can be difficult to reproduce in a test environment.
A quick look at our own code shows that the HashMap is being manipulated by multiple threads. The Java documentation says HashMap is not thread-safe and that ConcurrentHashMap should be used instead.
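As a rough illustration of that advice, the sketch below has several threads write into one shared ConcurrentHashMap. The class name SharedMapDemo and the helper fill are made up for this example, and the thread and key counts are arbitrary.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo: several threads write disjoint key ranges into one shared map.
class SharedMapDemo {
    static int fill(int threads, int keysPerThread) {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * keysPerThread; // each thread gets its own key range
            workers[t] = new Thread(() -> {
                for (int k = 0; k < keysPerThread; k++) {
                    map.put(base + k, k);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // wait until every writer has finished
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(ex);
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(fill(4, 1000)); // 4000
    }
}
```

With a plain HashMap in place of ConcurrentHashMap, the same code has no such guarantee: entries can be lost, and on JDK 1.7 a concurrent resize can corrupt a bucket list.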
Still, it is worth looking at the reason.
Hash table Data Structure
HashMap usually uses an array of pointers (call it table[]) to scatter all the keys. When a key is added, the hash algorithm computes an array index i from the key, and the <key, value> pair is placed into table[i]. If two different keys compute to the same i, that is called a conflict, or collision, and the entries form a linked list at table[i].
We know that if table[] is very small, say only 2 slots, and you put in 10 keys, collisions are very frequent, so what should be an O(1) lookup degrades into a linked-list traversal with O(n) performance. This is the weakness of a hash table.
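The effect of table size on chain length can be sketched as follows. This assumes the simplified index rule "key mod table size" and integer keys; the real HashMap derives the index from a spread hash code, and CollisionDemo is a hypothetical name.

```java
// Hypothetical demo: how many keys land in the fullest bucket for a given table size.
class CollisionDemo {
    // Assumes the simplified index rule: index = key mod tableSize.
    static int longestChain(int tableSize, int... keys) {
        int[] sizes = new int[tableSize];
        int longest = 0;
        for (int k : keys) {
            int i = Math.floorMod(k, tableSize);
            sizes[i]++;
            longest = Math.max(longest, sizes[i]);
        }
        return longest;
    }

    public static void main(String[] args) {
        int[] keys = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(longestChain(2, keys));  // 5: every lookup walks a list
        System.out.println(longestChain(16, keys)); // 1: every key has its own slot
    }
}
```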
Therefore, the size and capacity of the hash table matter a great deal. In general, when an entry is inserted, the hash table checks whether its size exceeds a set threshold. If it does, the table must be enlarged, but then every element in the table has to be placed again according to the new size. This is called rehash, and its cost is considerable.
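A simplified sketch of the threshold check is below. It assumes HashMap's default initial capacity of 16 and load factor of 0.75, and it resizes whenever the size exceeds capacity times load factor; the exact condition differs slightly between JDK versions. ThresholdDemo is a hypothetical name.

```java
// Hypothetical demo: count how often a growing table would have to be resized.
class ThresholdDemo {
    // Returns {number of resizes, final capacity} after the given number of inserts.
    static int[] simulate(int inserts) {
        int capacity = 16;        // default initial capacity
        float loadFactor = 0.75f; // default load factor
        int threshold = (int) (capacity * loadFactor);
        int size = 0;
        int resizes = 0;
        for (int n = 0; n < inserts; n++) {
            size++;
            if (size > threshold) { // simplified check
                capacity *= 2;      // the table doubles ...
                threshold = (int) (capacity * loadFactor);
                resizes++;          // ... and every entry must be rehashed
            }
        }
        return new int[] {resizes, capacity};
    }

    public static void main(String[] args) {
        int[] r = simulate(100);
        System.out.println(r[0] + " resizes, capacity " + r[1]); // 4 resizes, capacity 256
    }
}
```

Each of those resizes moves every entry, which is why sizing the map up front is cheaper when the number of entries is known.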
I believe you are familiar with this basic knowledge.
HashMap's Rehash source code
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    // The for loop walks each linked list, recomputes the index position, and copies the
    // old array's data into the new array. The array does not store the actual data, so
    // only references are copied: a shallow copy, like clone() in ArrayList or LinkedList.
    for (Entry<K,V> e : table) {
        while (null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            // Point the current entry's next reference at the new index position.
            // newTable[i] may be empty or may already hold an entry chain; if it holds
            // a chain, the entry is inserted directly at the head of the list.
            // The first time through, newTable[i] == null.
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}
This code is quite ordinary, and in a single thread there is nothing wrong with it.
The process of normal rehash
I drew a diagram to walk through it.
- I assume that our hash algorithm simply takes the key mod the size of the table (that is, the length of the array).
- At the top is the old hash table with size=2, so the keys 3, 7 and 5 all collide in table[1] after mod 2.
- The next three steps show the hash table being resized to 4 and every <key, value> being rehashed.
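The same walk-through can be replayed in code. This is a minimal sketch under the assumptions stated above (index = key mod table length, old bucket order 3, 7, 5); RehashDemo, sampleTable and chain are hypothetical names, and the loop body mirrors the head insertion in transfer().

```java
// Hypothetical single-threaded replay of the rehash from size 2 to size 4.
class RehashDemo {
    static final class Entry {
        final int key;
        Entry next;
        Entry(int key, Entry next) { this.key = key; this.next = next; }
    }

    // Old table of size 2 with table[1] = 3 -> 7 -> 5.
    static Entry[] sampleTable() {
        Entry[] table = new Entry[2];
        table[1] = new Entry(3, new Entry(7, new Entry(5, null)));
        return table;
    }

    // Head-insertion transfer, assuming index = key mod capacity.
    static Entry[] transfer(Entry[] oldTable, int newCapacity) {
        Entry[] newTable = new Entry[newCapacity];
        for (Entry e : oldTable) {
            while (e != null) {
                Entry next = e.next;
                int i = e.key % newCapacity;
                e.next = newTable[i]; // link in front of the current head
                newTable[i] = e;
                e = next;
            }
        }
        return newTable;
    }

    // Renders a bucket as "a->b->c" (empty string for an empty bucket).
    static String chain(Entry e) {
        StringBuilder sb = new StringBuilder();
        for (; e != null; e = e.next) {
            if (sb.length() > 0) sb.append("->");
            sb.append(e.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Entry[] t = transfer(sampleTable(), 4);
        System.out.println("table[1]: " + chain(t[1])); // table[1]: 5
        System.out.println("table[3]: " + chain(t[3])); // table[3]: 7->3
    }
}
```

Keys 3 and 7 both move to table[3], and their relative order flips from 3, 7 to 7, 3.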
Rehash under concurrency
1) Suppose we have two threads. I marked them in red and light blue.
Let's look back at this detail in our transfer code:
int i = indexFor(e.hash, newCapacity); // assume thread one loses the CPU right after this line
// Point the current entry's next reference at the new index position. newTable[i] may be
// empty or may already hold an entry chain; if it holds a chain, the entry is inserted
// directly at the head of the list.
// The first time through, newTable[i] == null.
e.next = newTable[i];
newTable[i] = e;
e = next;
Meanwhile thread two runs to completion, which leaves us in the following state.
Note that thread one's e points to key(3) and its next points to key(7). After thread two's rehash, both of them point into the linked list that thread two has reorganized. We can see that the order of the linked list has been reversed.
2) Thread one is scheduled back in and resumes.
- First it executes newTable[i] = e;
- Then e = next, which makes e point to key(7).
- In the next iteration, next = e.next makes next point to key(3).
3) All is still well.
Thread one carries on. It takes key(7), puts it at the head of newTable[i], and moves e and next along.
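4) The circular link appears.
e.next = newTable[i] causes key(3).next to point to key(7).
Note: at this point key(7).next already points to key(3), so the list has become a ring.
So when a thread calls get(11) on this hash table, the tragedy appears: an infinite loop. Under the mod assumption, key 11 maps to table[3], the bucket that holds the ring, and since 11 is not in it the traversal never reaches null.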
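The interleaving in steps 1) to 4) can be replayed deterministically in a single thread. The sketch below is only an illustration under the same assumptions (index = key mod table length, old bucket 3, 7, 5): ResizeRaceReplay, resume and hasCycle are hypothetical names, and the cycle check uses Floyd's tortoise-and-hare so the demo itself terminates.

```java
// Hypothetical single-threaded replay of the two-thread resize race.
class ResizeRaceReplay {
    static final class Entry {
        final int key;
        Entry next;
        Entry(int key) { this.key = key; }
    }

    // The transfer loop for one bucket, resumable from a saved (e, next) pair.
    static void resume(Entry e, Entry next, Entry[] newTable) {
        while (e != null) {
            int i = e.key % newTable.length;
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
            if (e != null) next = e.next; // start of the next iteration
        }
    }

    // Floyd's cycle detection: true if following next never reaches null.
    static boolean hasCycle(Entry head) {
        Entry slow = head, fast = head;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) return true;
        }
        return false;
    }

    // Returns thread one's new table after the interleaving described in the text.
    static Entry[] replay() {
        Entry e3 = new Entry(3), e7 = new Entry(7), e5 = new Entry(5);
        e3.next = e7;
        e7.next = e5; // old table[1]: 3 -> 7 -> 5

        // Thread one reads e and next, then is suspended.
        Entry e = e3, next = e3.next;

        // Thread two runs its whole transfer: its table[3] becomes 7 -> 3.
        Entry[] tableTwo = new Entry[4];
        resume(e3, e3.next, tableTwo);

        // Thread one wakes up and continues with its stale e and next.
        Entry[] tableOne = new Entry[4];
        resume(e, next, tableOne);
        return tableOne;
    }

    public static void main(String[] args) {
        Entry[] table = replay();
        System.out.println(hasCycle(table[3])); // true: key(3) and key(7) point at each other
    }
}
```

In thread one's table, key(3) and key(7) now reference each other, so any lookup that hashes to table[3] and does not find its key will spin forever.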
Third, problem solving: JDK 1.8 optimizes this by adding tail pointers. Entries are appended directly at the tail of the new list, which avoids this infinite loop and also avoids tail traversal. Personally I think this improvement is much better. The LinkedList class in JDK 1.8 is also designed around a head and a tail reference, which avoids errors and improves efficiency as well. The code is as follows:
if (oldTab != null) {
    for (int j = 0; j < oldCap; ++j) {
        Node<K,V> e;
        if ((e = oldTab[j]) != null) {
            oldTab[j] = null;
            if (e.next == null)
                newTab[e.hash & (newCap - 1)] = e;
            else if (e instanceof TreeNode)
                ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
            else { // preserve order
                Node<K,V> loHead = null, loTail = null;
                // JDK 1.8 improved the rehash algorithm. The capacity doubles; entries that
                // move to the newly added half are tracked as hi, and entries that stay in
                // the original half are tracked as lo.
                Node<K,V> hiHead = null, hiTail = null;
                // Head and tail pointers are declared for both lists.
                Node<K,V> next;
                do {
                    next = e.next;
                    if ((e.hash & oldCap) == 0) {
                        if (loTail == null)
                            loHead = e;
                        else
                            loTail.next = e;
                        loTail = e;
                    }
                    else {
                        if (hiTail == null)
                            hiHead = e;
                        else
                            hiTail.next = e;
                        hiTail = e;
                    }
                } while ((e = next) != null);
                if (loTail != null) {
                    loTail.next = null;
                    newTab[j] = loHead;
                }
                if (hiTail != null) {
                    hiTail.next = null;
                    newTab[j + oldCap] = hiHead;
                }
            }
        }
    }
}
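To see the order-preserving split in isolation, here is a small sketch that applies the same lo/hi logic to a single bucket. It assumes each node's hash equals its key and reuses the earlier example (bucket 1 of a size-2 table holding 3, 7, 5); SplitDemo and its helpers are hypothetical names.

```java
// Hypothetical demo of the JDK 1.8 style lo/hi split on one bucket.
class SplitDemo {
    static final class Node {
        final int hash;
        Node next;
        Node(int hash, Node next) { this.hash = hash; this.next = next; }
    }

    // Splits the list at old index j into newTab[j] (lo) and newTab[j + oldCap] (hi).
    static Node[] split(Node head, int j, int oldCap) {
        Node[] newTab = new Node[oldCap * 2];
        Node loHead = null, loTail = null;
        Node hiHead = null, hiTail = null;
        Node e = head;
        while (e != null) {
            Node next = e.next;
            if ((e.hash & oldCap) == 0) { // stays at the same index
                if (loTail == null) loHead = e; else loTail.next = e;
                loTail = e;               // append at the tail: order is kept
            } else {                      // moves to index j + oldCap
                if (hiTail == null) hiHead = e; else hiTail.next = e;
                hiTail = e;
            }
            e = next;
        }
        if (loTail != null) { loTail.next = null; newTab[j] = loHead; }
        if (hiTail != null) { hiTail.next = null; newTab[j + oldCap] = hiHead; }
        return newTab;
    }

    // Renders a bucket as "a->b->c" (empty string for an empty bucket).
    static String chain(Node n) {
        StringBuilder sb = new StringBuilder();
        for (; n != null; n = n.next) {
            if (sb.length() > 0) sb.append("->");
            sb.append(n.hash);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node bucket = new Node(3, new Node(7, new Node(5, null)));
        Node[] t = split(bucket, 1, 2);
        System.out.println("table[1]: " + chain(t[1])); // table[1]: 5
        System.out.println("table[3]: " + chain(t[3])); // table[3]: 3->7
    }
}
```

Compared with the JDK 1.7 result of 7, 3 in table[3], the relative order 3, 7 is preserved here because each node is appended at the tail.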