Causes of and solutions to the HashMap infinite loop in Java

Source: Internet
Author: User
Tags: hash, rehash

On Taobao's intranet I saw a colleague post about a production fault in which CPU usage hit 100%, and it had happened several times. The cause was a race condition from using Java's HashMap concurrently, which produced an infinite loop. I ran into this myself four or five years ago and thought there was nothing worth writing about, since Java's HashMap is not thread-safe and problems under concurrency are inevitable. But I've found that many people have hit it in recent years (search for "HashMap Infinite Loop" and you'll see plenty of discussion), so it is clearly a common problem. This article is a vaccine against it, and shows how a perfect race condition forms.

Symptoms of the problem

Our Java code originally used a HashMap, and since the program was single-threaded everything was fine. Later, to address performance problems, we made the program multi-threaded. After it went live, we found that it often pinned the CPU at 100%. Looking at the stack, the program was hung in the HashMap.get() method; restarting made the problem disappear, but it came back after a while. Moreover, the problem can be difficult to reproduce in a test environment.

A quick look at our own code told us that the HashMap was being manipulated by multiple threads. The Java documentation says HashMap is not thread-safe; for concurrent access you should use ConcurrentHashMap.
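As a quick illustration of the fix, here is a minimal sketch (the class name and key ranges are my own, not from the original code) in which several threads write disjoint keys into a ConcurrentHashMap; a plain HashMap in the same position is exactly the workload that can corrupt the table:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeMapDemo {
    public static void main(String[] args) throws InterruptedException {
        // ConcurrentHashMap tolerates concurrent puts; a plain HashMap here
        // is what triggers the infinite-loop fault described in this article.
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            final int base = t * 10_000;           // disjoint key range per thread
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    map.put(base + i, i);
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            th.join();
        }
        System.out.println(map.size());            // 40000: nothing lost, no hang
    }
}
```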

Still, let's look at why this happens.

Hash table Data Structure

I need to briefly describe the classic data structure behind HashMap.

HashMap usually uses an array of pointers (call it table[]) to disperse all the keys. When a key is added, a hash function computes an index i into the array from the key, and the entry is inserted at table[i]. If two different keys hash to the same i, that is called a collision, and the entries at table[i] are chained into a linked list.

We know that if table[] is very small, say only 2 slots, then putting in 10 keys makes collisions very frequent; the O(1) lookup degenerates into a linked-list traversal with O(n) performance. This is the weakness of hash tables (see the hash collision DoS issue).

Therefore, the size and capacity of the hash table matter a great deal. In general, when data is inserted, the container checks whether the element count exceeds the configured threshold; if it does, the hash table must grow, and every element needs to be re-indexed. This is called a rehash, and its cost is considerable.
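In concrete numbers, using HashMap's real defaults (initial capacity 16, load factor 0.75), the threshold arithmetic from the resize() code shown below works out like this (the class name is just for illustration):

```java
public class ThresholdDemo {
    public static void main(String[] args) {
        // threshold = capacity * loadFactor, as computed in HashMap's resize()
        int capacity = 16;
        float loadFactor = 0.75f;
        int threshold = (int) (capacity * loadFactor);
        System.out.println(threshold);       // 12: the 13th insert triggers a resize
        int newCapacity = 2 * capacity;      // resize(2 * table.length)
        System.out.println(newCapacity);     // 32
    }
}
```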

I believe you are familiar with this basic knowledge.

Rehash source code for HashMap

Next, let's take a look at the Java HashMap source code.

Putting a key/value pair into the hash table:

public V put(K key, V value)
{
    ......
    // Compute the hash value
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    // If the key already exists, replace the old value (walk the chain)
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    // The key does not exist; a new node needs to be added
    addEntry(hash, key, value, i);
    return null;
}

Checking whether the capacity is exceeded:

void addEntry(int hash, K key, V value, int bucketIndex)
{
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    // If the current size exceeds the threshold we set, resize is needed
    if (size++ >= threshold)
        resize(2 * table.length);
}

Creating a larger hash table, then migrating the data from the old table to the new one:

void resize(int newCapacity)
{
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    ......
    // Create a new hash table
    Entry[] newTable = new Entry[newCapacity];
    // Migrate the data from the old hash table to the new one
    transfer(newTable);
    table = newTable;
    threshold = (int) (newCapacity * loadFactor);
}

The migration source code; note the highlighted part:

void transfer(Entry[] newTable)
{
    Entry[] src = table;
    int newCapacity = newTable.length;
    // The following code picks each element off the old table
    // and inserts it into the new table
    for (int j = 0; j < src.length; j++) {
        Entry<K,V> e = src[j];
        if (e != null) {
            src[j] = null;
            do {
                Entry<K,V> next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            } while (e != null);
        }
    }
}

Well, this code is perfectly normal on its own. There's nothing wrong with it.
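One subtlety worth seeing in isolation: transfer()'s head insertion reverses the order of a bucket's chain even in the single-threaded case. Here is a minimal sketch (this Entry class is a stripped-down stand-in for HashMap's) replaying the move for the article's 3 -> 7 -> 5 bucket, pretending all three keys still collide after the resize:

```java
import java.util.ArrayList;
import java.util.List;

public class TransferOrderDemo {
    static class Entry {
        final int key;
        Entry next;
        Entry(int key, Entry next) { this.key = key; this.next = next; }
    }

    public static void main(String[] args) {
        // Old bucket chain: 3 -> 7 -> 5, as in the article's example (table size 2).
        Entry e = new Entry(3, new Entry(7, new Entry(5, null)));

        // Same head-insertion loop as transfer(), into a single new bucket.
        Entry newBucket = null;
        while (e != null) {
            Entry next = e.next;
            e.next = newBucket;   // head insertion: newest entry becomes the head
            newBucket = e;
            e = next;
        }

        // The chain order has been reversed.
        List<Integer> order = new ArrayList<>();
        for (Entry cur = newBucket; cur != null; cur = cur.next) {
            order.add(cur.key);
        }
        System.out.println(order);  // [5, 7, 3]
    }
}
```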

The normal process of rehash

I drew a picture and made a demo to walk through the process.

Assume our hash algorithm simply takes the key mod the table size (that is, the array length).

At the top is the old hash table, of size 2, so keys 3, 7, and 5 all collide at table[1] after mod 2.

The next three steps show the hash table being resized to 4 and all the elements being rehashed.
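The bucket arithmetic of that demo can be checked directly (the class name is mine; key % size stands in for indexFor):

```java
import java.util.Arrays;

public class RehashIndexDemo {
    public static void main(String[] args) {
        int[] keys = {3, 7, 5};
        int[] oldBuckets = new int[3];
        int[] newBuckets = new int[3];
        for (int j = 0; j < keys.length; j++) {
            oldBuckets[j] = keys[j] % 2;  // table size 2: all three collide in bucket 1
            newBuckets[j] = keys[j] % 4;  // table size 4: 3 -> 3, 7 -> 3, 5 -> 1
        }
        System.out.println(Arrays.toString(oldBuckets)); // [1, 1, 1]
        System.out.println(Arrays.toString(newBuckets)); // [3, 3, 1]
    }
}
```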

Concurrent Rehash

1) Suppose we have two threads, marked red and light blue in the diagrams.

Let's look back at this detail in our transfer code:

do {
    Entry<K,V> next = e.next; // <-- assume the thread is suspended by the scheduler right after this line executes
    int i = indexFor(e.hash, newCapacity);
    e.next = newTable[i];
    newTable[i] = e;
    e = next;
} while (e != null);

Meanwhile, thread two's execution runs to completion. So we end up with the following picture.

Note that thread one's e points to key(3) and its next points to key(7). After thread two's rehash, these now point into thread two's reorganized list, and we can see that the order of the linked list has been reversed.

2) Thread one is scheduled back in and resumes execution.

First it executes newTable[i] = e;

Then e = next, which makes e point to key(7).

And next = e.next in the next loop iteration makes next point to key(3).

3) Everything still looks OK.

Thread one continues working: it takes key(7) off the chain, puts it at the head of newTable[i], and moves e and next forward.

4) Ring link appears.

e.next = newTable[i] causes key(3).next to point to key(7).

Note: key(7).next already points to key(3) at this moment, so a circular linked list appears.

So when a thread then calls HashMap.get(11), the tragedy strikes: an infinite loop.
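The interleaving above can be replayed deterministically in a single thread. The sketch below (Entry is again a stripped-down stand-in for HashMap's node) performs thread two's completed transfer by hand, then resumes thread one's suspended loop, and checks that keys 3 and 7 end up pointing at each other:

```java
public class RingDemo {
    static class Entry {
        final int key;
        Entry next;
        Entry(int key) { this.key = key; }
    }

    public static void main(String[] args) {
        // Shared nodes in the old bucket: 3 -> 7 (key 5 rehashes elsewhere).
        Entry k3 = new Entry(3), k7 = new Entry(7);
        k3.next = k7;

        // Thread one executes `next = e.next` and is suspended: e = k3, next = k7.
        Entry e = k3, next = k7;

        // Thread two completes its whole transfer: head insertion leaves 7 -> 3.
        k3.next = null;
        k7.next = k3;

        // Thread one resumes, inserting into its own (still empty) new bucket.
        Entry bucket = null;
        while (e != null) {
            e.next = bucket;                  // head insertion
            bucket = e;
            e = next;
            if (e != null) next = e.next;     // stale links now loop back
        }

        // key(3).next -> key(7) and key(7).next -> key(3): a circular list.
        boolean cycle = (k3.next == k7) && (k7.next == k3);
        System.out.println(cycle);  // true: a get() on this bucket would spin forever
    }
}
```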

Other

Someone reported the problem to Sun, but Sun didn't consider it a bug, because HashMap does not support concurrency: if you want concurrency, use ConcurrentHashMap.

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6423457

I record this here simply so that people can understand and appreciate the dangers of concurrent environments.

Reference: http://mailinator.blogspot.com/2009/06/beautiful-race-condition.html
