ConcurrentHashMap for Java concurrent programming


ConcurrentHashMap
ConcurrentHashMap is a thread-safe hash table. It provides the same functionality as Hashtable, with the same thread-safety guarantees, but ConcurrentHashMap can read data without locking at all, and its internal structure keeps the granularity of write locks as small as possible instead of locking the entire map for every write.

Internal Structure of ConcurrentHashMap
To improve concurrency, ConcurrentHashMap internally adopts a structure called a Segment. A Segment is essentially a small hash table in its own right: each Segment maintains its own array of linked lists. The following figure shows the internal structure of ConcurrentHashMap:

From this structure we can see that locating an element in ConcurrentHashMap requires two hash operations: the first locates the Segment, and the second locates the head of the linked list that holds the element. The side effect of this design is that the hash process is longer than in a plain HashMap, but the benefit is that a write only needs to lock the Segment containing the element, leaving all other Segments untouched. In the ideal case, ConcurrentHashMap can therefore support as many simultaneous write operations as there are Segments (when those writes are evenly distributed across the Segments), which greatly improves its concurrency.
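The idea behind this design, lock striping, can be sketched independently of the JDK source. The class below is a toy illustration (all names here are hypothetical, not JDK code): keys are hashed once to pick a stripe, each of which has its own lock and its own small map, and then hashed again inside the stripe. Writes that land on different stripes never contend. Note one simplification: unlike the real ConcurrentHashMap, this sketch also locks on reads.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Toy illustration of lock striping (hypothetical class, not the JDK code):
// each stripe has its own lock, so writes to different stripes never contend.
public class StripedMapDemo {
    static final int STRIPES = 16; // a power of two, like the Segment count

    final ReentrantLock[] locks = new ReentrantLock[STRIPES];
    @SuppressWarnings("unchecked")
    final Map<String, String>[] buckets = new HashMap[STRIPES];

    StripedMapDemo() {
        for (int i = 0; i < STRIPES; i++) {
            locks[i] = new ReentrantLock();
            buckets[i] = new HashMap<>();
        }
    }

    // First hash: pick the stripe from the high bits of the key's hash.
    int stripeFor(Object key) {
        return (key.hashCode() >>> 28) & (STRIPES - 1);
    }

    public void put(String key, String value) {
        int s = stripeFor(key);
        locks[s].lock(); // only this stripe is locked; the other 15 stay available
        try {
            buckets[s].put(key, value); // second hash happens inside the stripe's map
        } finally {
            locks[s].unlock();
        }
    }

    public String get(String key) {
        int s = stripeFor(key);
        locks[s].lock(); // simplification: the real ConcurrentHashMap avoids this lock on reads
        try {
            return buckets[s].get(key);
        } finally {
            locks[s].unlock();
        }
    }
}
```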

Segment
Let's take a look at the Segment data structure:

static final class Segment<K,V> extends ReentrantLock implements Serializable {
    transient volatile int count;
    transient int modCount;
    transient int threshold;
    transient volatile HashEntry<K,V>[] table;
    final float loadFactor;
}
The significance of the member variables in Segment is as follows:

count: the number of elements in the Segment
modCount: the number of operations that change the table size (such as put or remove)
threshold: the resize threshold; when the number of elements in the Segment exceeds this value, the Segment is resized
table: the linked-list array; each element of the array is the head of a linked list
loadFactor: the load factor used to compute threshold
HashEntry
The elements in the Segment are stored in the linked list array in the form of HashEntry. Let's take a look at the structure of the HashEntry:

static final class HashEntry<K,V> {
    final K key;
    final int hash;
    volatile V value;
    final HashEntry<K,V> next;
}
Notice one feature of HashEntry: every field except value is final. This is done to prevent the linked-list structure from being broken while it is being traversed concurrently, which would otherwise lead to concurrent-modification problems.

ConcurrentHashMap Initialization
Next we will analyze the implementation of ConcurrentHashMap alongside the source code, starting with the constructor:

public ConcurrentHashMap(int initialCapacity,
                         float loadFactor, int concurrencyLevel) {
    if (!(loadFactor > 0) || initialCapacity < 0 || concurrencyLevel <= 0)
        throw new IllegalArgumentException();

    if (concurrencyLevel > MAX_SEGMENTS)
        concurrencyLevel = MAX_SEGMENTS;

    // Find power-of-two sizes best matching arguments
    int sshift = 0;
    int ssize = 1;
    while (ssize < concurrencyLevel) {
        ++sshift;
        ssize <<= 1;
    }
    segmentShift = 32 - sshift;
    segmentMask = ssize - 1;
    this.segments = Segment.newArray(ssize);

    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    int c = initialCapacity / ssize;
    if (c * ssize < initialCapacity)
        ++c;
    int cap = 1;
    while (cap < c)
        cap <<= 1;

    for (int i = 0; i < this.segments.length; ++i)
        this.segments[i] = new Segment<K,V>(cap, loadFactor);
}
The constructor of ConcurrentHashMap takes three parameters: initialCapacity, the initial capacity; loadFactor, the load factor; and concurrencyLevel, which determines the number of Segments in the ConcurrentHashMap. Once specified, the number of Segments cannot be changed. If the number of elements later grows and the ConcurrentHashMap needs to expand, it does not add Segments; it only enlarges the linked-list array inside the affected Segment. The advantage is that expansion never rehashes the entire ConcurrentHashMap, only the elements within one Segment.

The initialization itself is simple. First, the Segment array is created based on concurrencyLevel: the number of Segments is the smallest power of two that is not less than concurrencyLevel, so the Segment count is always a power of two. The benefit is that the segment index can be computed with shift and mask operations, which speeds up hashing. Next, the capacity of each Segment is determined from initialCapacity; each Segment's capacity is also a power of two, which likewise accelerates hashing.

Pay special attention to two variables, segmentShift and segmentMask, which play a major role later. If the constructor determines that the number of Segments is 2 to the power n, then segmentShift equals 32 minus n, and segmentMask equals 2 to the power n minus one.
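The arithmetic can be checked in isolation. This sketch reproduces the constructor's power-of-two search (the class and method names are mine, for illustration only):

```java
public class SegmentMath {
    // Reproduces the constructor's power-of-two search for a given concurrencyLevel.
    static int[] compute(int concurrencyLevel) {
        int sshift = 0;
        int ssize = 1;
        while (ssize < concurrencyLevel) {
            ++sshift;
            ssize <<= 1; // ssize ends up as the smallest power of two >= concurrencyLevel
        }
        int segmentShift = 32 - sshift; // shift that exposes the top sshift bits of a hash
        int segmentMask = ssize - 1;    // mask over the segment index
        return new int[] { ssize, segmentShift, segmentMask };
    }

    public static void main(String[] args) {
        int[] r = compute(16);
        // For concurrencyLevel 16: ssize=16, segmentShift=28, segmentMask=15
        System.out.println("ssize=" + r[0] + " segmentShift=" + r[1] + " segmentMask=" + r[2]);
    }
}
```

For the default concurrencyLevel of 16, n is 4, so segmentShift is 28 and segmentMask is 15; a non-power-of-two request such as 10 is rounded up to 16 segments.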

Get operation of ConcurrentHashMap
As mentioned above, the get operation of ConcurrentHashMap does not need to be locked. Let's take a look at its implementation here:

public V get(Object key) {
    int hash = hash(key.hashCode());
    return segmentFor(hash).get(key, hash);
}
In the third line, the segmentFor function determines which Segment the operation should be performed on; almost every ConcurrentHashMap operation uses this function. Let's look at its implementation:

final Segment<K,V> segmentFor(int hash) {
    return segments[(hash >>> segmentShift) & segmentMask];
}
This function determines the Segment with bitwise operations: it shifts the incoming hash value right by segmentShift bits (an unsigned shift) and then ANDs the result with segmentMask. Combining this with the values of segmentShift and segmentMask described above, we can draw the following conclusion: if the number of Segments is 2 to the power n, then the high n bits of an element's hash value determine which Segment the element is in.
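A concrete example makes the bit manipulation tangible. Assuming 16 segments (n = 4, so segmentShift = 28 and segmentMask = 15, as derived above):

```java
public class SegmentForDemo {
    static final int segmentShift = 28; // 32 - 4, assuming 16 segments
    static final int segmentMask = 15;  // 2^4 - 1

    // Same bit manipulation as segmentFor: the top 4 bits of the hash pick the segment.
    static int segmentIndex(int hash) {
        return (hash >>> segmentShift) & segmentMask;
    }

    public static void main(String[] args) {
        // 0xA0000001 has top four bits 1010 = 10, so it lands in segment 10.
        System.out.println(segmentIndex(0xA0000001)); // prints 10
    }
}
```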

After determining which segment to operate, the next thing is to call the get method of the corresponding Segment:

V get(Object key, int hash) {
    if (count != 0) { // read-volatile
        HashEntry<K,V> e = getFirst(hash);
        while (e != null) {
            if (e.hash == hash && key.equals(e.key)) {
                V v = e.value;
                if (v != null)
                    return v;
                return readValueUnderLock(e); // recheck
            }
            e = e.next;
        }
    }
    return null;
}
First, look at the second line, where count is checked; count is the number of elements in the Segment. Recall its definition:

transient volatile int count;
We can see that count is volatile. The volatile semantics being relied on here is:

A write to a volatile field happens-before every subsequent read of that same field.

Operations such as put and remove also update count, so when contention occurs the volatile semantics guarantee that the write to count happens-before the later read. The write is therefore visible to subsequent reads, and a later get can observe the fully constructed element contents.

Then, in the third line, getFirst() is called to obtain the head of the linked list:

HashEntry<K,V> getFirst(int hash) {
    HashEntry<K,V>[] tab = table;
    return tab[hash & (tab.length - 1)];
}
Again a bitwise operation determines the list head: the hash value is ANDed with the table length minus one. The result is the low n bits of the hash value, where n is the base-2 logarithm of the table length.
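For a power-of-two length, this masking trick is equivalent to taking the nonnegative remainder of the hash modulo the length, and it stays correct even for negative hash values, which `%` alone would not handle:

```java
public class MaskDemo {
    // For a power-of-two length, hash & (length - 1) keeps the low n bits of
    // the hash, i.e. the nonnegative remainder of hash modulo length.
    static int bucketIndex(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        System.out.println(bucketIndex(37, 16)); // 37 = 0b100101, low 4 bits = 0101 = 5
        System.out.println(bucketIndex(-7, 16)); // masking also yields a valid index for negative hashes
    }
}
```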

Once the head of the list is determined, the whole list can be traversed, comparing each entry's hash and key. When a match is found, its value is read; if the value obtained is null, this key/value pair may currently be in the middle of a put, so readValueUnderLock is called to reread it under the lock, guaranteeing that the retrieved value is complete. If it is not null, the value is returned directly.

Put operation of ConcurrentHashMap
Having seen the get operation, let's look at put. A put also begins by locating the Segment; that process is the same as before and is not repeated here. Let's examine the key method, Segment.put:

V put(K key, int hash, V value, boolean onlyIfAbsent) {
    lock();
    try {
        int c = count;
        if (c++ > threshold) // ensure capacity
            rehash();
        HashEntry<K,V>[] tab = table;
        int index = hash & (tab.length - 1);
        HashEntry<K,V> first = tab[index];
        HashEntry<K,V> e = first;
        while (e != null && (e.hash != hash || !key.equals(e.key)))
            e = e.next;

        V oldValue;
        if (e != null) {
            oldValue = e.value;
            if (!onlyIfAbsent)
                e.value = value;
        }
        else {
            oldValue = null;
            ++modCount;
            tab[index] = new HashEntry<K,V>(key, hash, first, value);
            count = c; // write-volatile
        }
        return oldValue;
    } finally {
        unlock();
    }
}
First, the put operation locks the Segment. Then, on line 5, if the number of elements in the Segment would exceed threshold (computed from loadFactor in the constructor), the Segment must be resized and its elements rehashed. The rehash process is left for the reader to explore; it is not covered in detail here.

Lines 7 through 9 repeat the getFirst logic, determining the position of the head of the linked list.

On line 11, a while loop searches the list for an entry whose key equals the key being put. If one is found, its value is updated directly. If none is found, execution reaches the else branch: a new HashEntry is created and becomes the new head of that bucket's list, and then count is updated.

Remove operation of ConcurrentHashMap
The first part of the remove operation is the same as get and put: locate the Segment, then call the Segment's remove method:

V remove(Object key, int hash, Object value) {
    lock();
    try {
        int c = count - 1;
        HashEntry<K,V>[] tab = table;
        int index = hash & (tab.length - 1);
        HashEntry<K,V> first = tab[index];
        HashEntry<K,V> e = first;
        while (e != null && (e.hash != hash || !key.equals(e.key)))
            e = e.next;

        V oldValue = null;
        if (e != null) {
            V v = e.value;
            if (value == null || value.equals(v)) {
                oldValue = v;
                // All entries following removed node can stay
                // in list, but all preceding ones need to be
                // cloned.
                ++modCount;
                HashEntry<K,V> newFirst = e.next;
                for (HashEntry<K,V> p = first; p != e; p = p.next)
                    newFirst = new HashEntry<K,V>(p.key, p.hash,
                                                  newFirst, p.value);
                tab[index] = newFirst;
                count = c; // write-volatile
            }
        }
        return oldValue;
    } finally {
        unlock();
    }
}
The remove operation also begins by locating the element to be deleted. However, deletion here is not simply a matter of pointing the previous entry's next at the entry after the one being removed: as we have already seen, next in HashEntry is final and cannot be modified once assigned. So after locating the element to delete, the code clones every entry that precedes it and relinks the clones, one by one, in front of the remainder of the list. The figure below illustrates the process:

 

If element 3 is deleted from the original list shown in the first figure, the list after deletion is shown in the second figure. Note that the cloned portion of the list (the entries that preceded element 3) ends up in reverse order.
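The clone-and-relink step can be demonstrated with a minimal immutable node (a hypothetical stand-in for HashEntry, keys only), which also shows why the cloned prefix comes out reversed:

```java
public class RemoveCloneDemo {
    // Hypothetical immutable node, mirroring HashEntry's final next field.
    static final class Node {
        final int key;
        final Node next;
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    // Remove the node with the given key by cloning everything before it,
    // following the same scheme as Segment.remove.
    static Node remove(Node first, int key) {
        Node e = first;
        while (e != null && e.key != key)
            e = e.next;
        if (e == null)
            return first;           // not found: list unchanged
        Node newFirst = e.next;     // the tail after the removed node is reused as-is
        for (Node p = first; p != e; p = p.next)
            newFirst = new Node(p.key, newFirst); // each prefix node is re-pushed, reversing the prefix
        return newFirst;
    }

    static String keys(Node n) {
        StringBuilder sb = new StringBuilder();
        for (; n != null; n = n.next) sb.append(n.key);
        return sb.toString();
    }

    public static void main(String[] args) {
        Node list = new Node(1, new Node(2, new Node(3, new Node(4, new Node(5, null)))));
        System.out.println(keys(remove(list, 3))); // prints "2145"
    }
}
```

Removing 3 from 1→2→3→4→5 yields 2→1→4→5: the suffix 4→5 is shared unchanged, while the prefix 1→2 is rebuilt in reverse. Concurrent readers still traversing the old list are unaffected, because no existing node is ever mutated.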

ConcurrentHashMap size operation
The previous sections covered operations within a single Segment, but some ConcurrentHashMap operations span multiple Segments: size, for example. The size operation uses a clever method to avoid locking all the Segments in the common case.

As mentioned earlier, every Segment has a modCount variable recording the number of operations that change the number of elements in that Segment; this value only ever increases. The size operation traverses all Segments twice, recording each Segment's modCount on each pass, and then compares the two sets of modCounts. If they match, no write occurred in between, and the sum from the traversal is returned. If they differ, the process is repeated; if it still fails, all Segments are locked and traversed one by one. For the exact implementation, refer to the ConcurrentHashMap source; it is not reproduced here.
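The optimistic retry-then-lock pattern can be sketched as follows. This is a simplified illustration (the Seg class and constants here are hypothetical stand-ins, not the JDK code), showing the shape of the algorithm: try lock-free passes, check that no modCount moved, and fall back to locking everything:

```java
import java.util.concurrent.locks.ReentrantLock;

public class OptimisticSizeDemo {
    // Hypothetical segment: just a count and a modCount guarded by a lock,
    // extending ReentrantLock the same way Segment does.
    static final class Seg extends ReentrantLock {
        volatile int count;
        int modCount;
    }

    static final int RETRIES_BEFORE_LOCK = 2;

    // Sum the counts optimistically; if any modCount changes during a pass,
    // a writer interfered, so eventually fall back to locking every segment.
    static long size(Seg[] segments) {
        for (int retries = 0; retries < RETRIES_BEFORE_LOCK; retries++) {
            long sum = 0;
            int[] mc = new int[segments.length];
            for (int i = 0; i < segments.length; i++) {
                sum += segments[i].count;
                mc[i] = segments[i].modCount; // snapshot each segment's modCount
            }
            boolean clean = true;
            for (int i = 0; i < segments.length; i++)
                if (mc[i] != segments[i].modCount) { clean = false; break; }
            if (clean)
                return sum; // no writes observed during the pass
        }
        // Slow path: lock all segments, sum under the locks, then unlock.
        for (Seg s : segments) s.lock();
        try {
            long sum = 0;
            for (Seg s : segments) sum += s.count;
            return sum;
        } finally {
            for (Seg s : segments) s.unlock();
        }
    }

    public static void main(String[] args) {
        Seg[] segs = new Seg[4];
        for (int i = 0; i < segs.length; i++) segs[i] = new Seg();
        segs[0].count = 3;
        segs[2].count = 5;
        System.out.println(size(segs)); // prints 8
    }
}
```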
