The previous article introduced the principles of HashMap. This section is the last in the ConcurrentMap series, so it gives a complete walkthrough of the ConcurrentHashMap implementation.
ConcurrentHashMap principles
The section on read-write locks described a way to implement a map with a read-write lock. That approach looks workable and its throughput should be decent. However, as the earlier analysis of read-write locks showed, a read-write lock suits scenarios where reads far outnumber writes, i.e. reads should make up the vast majority of operations. A read-write lock also has a more serious problem: reads and writes can never happen at the same time. To allow reading and writing simultaneously (at least for different elements), the lock has to be split so that different elements are guarded by different locks. This technique is called "lock striping" (lock separation).
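As a refresher, that read-write-lock map can be sketched in a few lines (a minimal illustration, not the exact code from the earlier section): concurrent reads may proceed together, but any write excludes all readers.

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// A minimal read-write-lock map: reads share the lock, writes are exclusive.
class RwLockMap<K, V> {
    private final Map<K, V> map = new HashMap<K, V>();
    private final ReadWriteLock rw = new ReentrantReadWriteLock();

    public V get(Object key) {
        rw.readLock().lock();
        try { return map.get(key); } finally { rw.readLock().unlock(); }
    }

    public V put(K key, V value) {
        rw.writeLock().lock();
        try { return map.put(key, value); } finally { rw.writeLock().unlock(); }
    }
}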
By default, ConcurrentHashMap uses 16 HashMap-like structures, each of which holds its own exclusive lock. In other words, a hash algorithm maps every element evenly onto one of these sub-maps, and any operation on an element goes only to the sub-map it belongs to, without touching the others. This supports up to 16 concurrent write operations.
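The striping idea itself can be sketched without the JDK source. The toy class below (entirely hypothetical, not the real ConcurrentHashMap) guards 16 sub-maps with 16 locks, so writes to keys landing in different stripes never contend:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// A toy illustration of lock striping: each stripe owns a HashMap and a lock.
class StripedMap<K, V> {
    private static final int STRIPES = 16;
    @SuppressWarnings("unchecked")
    private final Map<K, V>[] maps = new Map[STRIPES];
    private final ReentrantLock[] locks = new ReentrantLock[STRIPES];

    StripedMap() {
        for (int i = 0; i < STRIPES; i++) {
            maps[i] = new HashMap<K, V>();
            locks[i] = new ReentrantLock();
        }
    }

    private int stripeFor(Object key) {
        // crude spreading; the real map uses a Wang/Jenkins rehash (see below)
        return (key.hashCode() & 0x7fffffff) % STRIPES;
    }

    public V put(K key, V value) {
        int i = stripeFor(key);
        locks[i].lock();
        try { return maps[i].put(key, value); } finally { locks[i].unlock(); }
    }

    public V get(Object key) {
        int i = stripeFor(key);
        locks[i].lock(); // the real map mostly avoids locking reads; see below
        try { return maps[i].get(key); } finally { locks[i].unlock(); }
    }
}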
The class diagram of ConcurrentHashMap (figure omitted here) shows, together with the HashMap analysis above, that ConcurrentHashMap divides the entire set of entries into segmentMask+1 fragments (Segment). Each fragment is a structure similar to HashMap: it holds an array of HashEntry objects, and each slot of that array is a linked list chained through the next reference of HashEntry.
The data-structure definitions behind this class diagram carry a lot of subtlety, and the rest of this section analyzes them one by one.
First, how do we navigate from ConcurrentHashMap down to a HashEntry? As the HashMap principles section explained, a hash-based data structure wants its data spread as evenly as possible, both to avoid wasted space and to locate entries quickly. A map lookup therefore first locates the Segment, then navigates from the Segment to the right HashEntry linked list, and finally traverses that list to find the required element.
Setting concurrency aside for the moment, consider how a HashEntry is located. In ConcurrentHashMap, the Segment is obtained via hash(key.hashCode()) followed by segmentFor(hash). Listing 1 shows how the Segment is located. Here hash(int) rehashes the key's hashCode a second time so that entries spread evenly across the segmentMask+1 Segments (16 by default). This differs slightly from HashMap: the algorithm used here is a variant of the Wang/Jenkins hash (see references 1 and 2 if interested). In short, its purpose is to distribute elements evenly over the Segments so that up to segmentMask+1 concurrent operations are supported, where segmentMask+1 is the size of the segments array.
Listing 1: Locating a Segment
private static int hash(int h) {
    // Spread bits to regularize both segment and index locations,
    // using variant of single-word Wang/Jenkins hash.
    h += (h << 15) ^ 0xffffcd7d;
    h ^= (h >>> 10);
    h += (h << 3);
    h ^= (h >>> 6);
    h += (h << 2) + (h << 14);
    return h ^ (h >>> 16);
}

final Segment<K,V> segmentFor(int hash) {
    return segments[(hash >>> segmentShift) & segmentMask];
}
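To make the arithmetic concrete, here is a small standalone demo. The constants are the JDK 6 defaults for 16 segments (segmentShift = 28, segmentMask = 15); the rehash function is copied from Listing 1:

public class SegmentIndexDemo {
    static final int segmentShift = 28; // 32 - 4 bits, for 16 segments
    static final int segmentMask = 15;  // 16 - 1

    private static int hash(int h) {    // same rehash as Listing 1
        h += (h << 15) ^ 0xffffcd7d;
        h ^= (h >>> 10);
        h += (h << 3);
        h ^= (h >>> 6);
        h += (h << 2) + (h << 14);
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        for (String key : new String[] { "a", "b", "c", "d" }) {
            int h = hash(key.hashCode());
            // the top 4 bits of the spread hash select one of the 16 segments
            System.out.println(key + " -> segment " + ((h >>> segmentShift) & segmentMask));
        }
    }
}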
Clearly, since a Segment cannot be enlarged, the size of the segments array must be fixed. So segments, segmentMask and segmentShift are effectively constants in ConcurrentHashMap: once initialized they are never modified again. segmentShift is the fixed offset used to pick a Segment.
Once the Segment is found, locating the HashEntry works just like in HashMap: the hash value is ANDed with (length of the Segment's HashEntry array minus 1) to find the head of a HashEntry linked list, which is then traversed to complete the corresponding operation.
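As an aside, this masking trick only works because the table length is always a power of two; a tiny standalone check (not JDK code):

public class MaskDemo {
    public static void main(String[] args) {
        int hash = 0x7654321;
        int length = 16; // table lengths are always powers of two
        // for a non-negative hash, hash & (length - 1) == hash % length,
        // but the AND avoids a slower division
        System.out.println(hash & (length - 1)); // 1
        System.out.println(hash % length);       // 1
    }
}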
Being able to locate elements gives ConcurrentHashMap the functionality of a HashMap; what remains is the concurrency problem, and solving that inevitably involves locking. Looking back at Segment's class diagram, you can see that besides the element count of volatile type, Segment also extends ReentrantLock. As the earlier articles on atomic operations and locking mechanisms explained, to maximize concurrency the idea is to leave read operations unlocked wherever possible and to lock only write operations. If reads go unlocked while writes are locked, the contended fields must be declared volatile. The volatile type guarantees the happens-before rule, so volatile roughly preserves correctness while minimizing the cost of locking, and it does not conflict with the locks taken by write operations.
At the same time, to keep a traversal of a HashEntry list from being corrupted, every field of HashEntry other than value should be immutable; otherwise concurrent modification problems (the kind that produce ConcurrentModificationException in unsynchronized maps) would be unavoidable. This is why key, hash and next are constants (final fields) in the HashEntry data structure.
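For reference, the JDK 6 definition of HashEntry looks like this: value is volatile, everything else is final.

static final class HashEntry<K,V> {
    final K key;
    final int hash;
    volatile V value;
    final HashEntry<K,V> next;

    HashEntry(K key, int hash, HashEntry<K,V> next, V value) {
        this.key = key;
        this.hash = hash;
        this.next = next;
        this.value = value;
    }
}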
With the analysis and constraints above in place, Segment's get/put/remove operations become much easier to follow.
Get operation
Listing 2: How a Segment locates an element
V get(Object key, int hash) {
    if (count != 0) { // read-volatile
        HashEntry<K,V> e = getFirst(hash);
        while (e != null) {
            if (e.hash == hash && key.equals(e.key)) {
                V v = e.value;
                if (v != null)
                    return v;
                return readValueUnderLock(e); // recheck
            }
            e = e.next;
        }
    }
    return null;
}

HashEntry<K,V> getFirst(int hash) {
    HashEntry<K,V>[] tab = table;
    return tab[hash & (tab.length - 1)];
}

V readValueUnderLock(HashEntry<K,V> e) {
    lock();
    try {
        return e.value;
    } finally {
        unlock();
    }
}
Listing 2 shows how a Segment locates an element. First count > 0 is checked; a Segment's count records how many HashEntry objects it holds. If the Segment contains elements, getFirst positions us at the head node of the appropriate HashEntry list, and the list is then traversed; once the element matching the key is found, its value is returned. Notice, however, that Listing 2 also checks the HashEntry's value: if it is null, the value must be re-read under the lock (readValueUnderLock). Why is this necessary? Although ConcurrentHashMap never allows null values to be added, it is still possible to read a null value, which means the value is not yet visible to the current thread (this happens when a HashEntry has been published before being fully constructed; the mechanism is discussed further below).
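This is also why ConcurrentHashMap forbids null keys and values in the first place: a null returned from get would otherwise be ambiguous between "no mapping" and "not yet visible". A quick standalone check of this documented behavior:

import java.util.concurrent.ConcurrentHashMap;

public class NullDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<String, String>();
        try {
            map.put("key", null);           // rejected
        } catch (NullPointerException expected) {
            System.out.println("null values are not allowed");
        }
        try {
            map.put(null, "value");         // rejected as well
        } catch (NullPointerException expected) {
            System.out.println("null keys are not allowed");
        }
    }
}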
Put operation
Listing 3 describes Segment's put operation. First the lock must be acquired; modifying a contended resource requires locking, no question. Note that Segment extends ReentrantLock, so the lock here is exclusive: only one put can run on a given Segment at a time.
Next comes the capacity check: as in HashMap, the table is grown when needed, doubling in size while rehashing the existing entries.
Finding the element works exactly like a get, and once found its value can simply be overwritten. The onlyIfAbsent flag exists solely to implement ConcurrentMap's putIfAbsent operation. A few points need explaining:
- If a HashEntry with the key is found, its value is modified directly. If none is found, a new HashEntry must be constructed and installed as the new head of the list, with the old head linked in behind it. This is because HashEntry's next is final, so elements can only be added to the list by replacing the head node.
- If a new entry is added, count+1 must be written back. As mentioned earlier, count is volatile and reads take no lock, so count may only be updated once the element has actually been written into the Segment; that is why the write to count is the last step of the whole operation.
- When a new HashEntry is written into the table, its value is set by the constructor, so the assignment into the table could in principle become visible before the write to value, i.e. a half-constructed HashEntry could be observed. This is the problem reordering can cause, and it is why a read that sees a null value must lock and re-read. Why does locking help? Acquiring the lock pairs with the lock release of the previous write operation, so by the happens-before rule the results of that write become visible to the reading thread. (After JDK 6.0, this problem does not necessarily occur.)
- Inside Segment, the table variable is volatile. Reading a volatile field repeatedly costs more than reading a plain variable, and the compiler cannot optimize those reads away, so the put operation first copies table into a local variable tab. Multiple reads and writes of tab are cheaper than going through the volatile table, and the JVM can also optimize them.
Listing 3: Segment's put operation
V put(K key, int hash, V value, boolean onlyIfAbsent) {
    lock();
    try {
        int c = count;
        if (c++ > threshold) // ensure capacity
            rehash();
        HashEntry<K,V>[] tab = table;
        int index = hash & (tab.length - 1);
        HashEntry<K,V> first = tab[index];
        HashEntry<K,V> e = first;
        while (e != null && (e.hash != hash || !key.equals(e.key)))
            e = e.next;

        V oldValue;
        if (e != null) {
            oldValue = e.value;
            if (!onlyIfAbsent)
                e.value = value;
        }
        else {
            oldValue = null;
            ++modCount;
            tab[index] = new HashEntry<K,V>(key, hash, first, value);
            count = c; // write-volatile
        }
        return oldValue;
    } finally {
        unlock();
    }
}
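The onlyIfAbsent flag surfaces in the public API as ConcurrentMap.putIfAbsent. A short usage example:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PutIfAbsentDemo {
    public static void main(String[] args) {
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<String, Integer>();
        System.out.println(map.putIfAbsent("a", 1)); // null: "a" was absent, now mapped to 1
        System.out.println(map.putIfAbsent("a", 2)); // 1: mapping left unchanged
        System.out.println(map.put("a", 3));         // 1: plain put overwrites, returns old value
        System.out.println(map.get("a"));            // 3
    }
}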
Remove operation
Listing 4 describes how a Segment deletes an element. Like put, remove must lock, because it modifies the table. Since HashEntry's next is final, deleting an element from the middle of a list means the nodes in front of it cannot simply be relinked; instead, the Segment clones every node that precedes the deleted element and splices the clones onto the deleted element's successor (forming a new list head) to build the new list.
Listing 4: Segment's remove operation
V remove(Object key, int hash, Object value) {
    lock();
    try {
        int c = count - 1;
        HashEntry<K,V>[] tab = table;
        int index = hash & (tab.length - 1);
        HashEntry<K,V> first = tab[index];
        HashEntry<K,V> e = first;
        while (e != null && (e.hash != hash || !key.equals(e.key)))
            e = e.next;

        V oldValue = null;
        if (e != null) {
            V v = e.value;
            if (value == null || value.equals(v)) {
                oldValue = v;
                // All entries following removed node can stay
                // in list, but all preceding ones need to be
                // cloned.
                ++modCount;
                HashEntry<K,V> newFirst = e.next;
                for (HashEntry<K,V> p = first; p != e; p = p.next)
                    newFirst = new HashEntry<K,V>(p.key, p.hash,
                                                  newFirst, p.value);
                tab[index] = newFirst;
                count = c; // write-volatile
            }
        }
        return oldValue;
    } finally {
        unlock();
    }
}
The following walks through deleting an existing element. Suppose the list in some Segment's table slot is B1->B2->B3->B4->B5 and we want to delete B3. First the Segment containing B3 is located, then the table slot holding the head B1. Traversing the list finds B3. Then, starting from B1, a new node B1' is constructed and placed in front of B4; next B2 is cloned as B2' and placed in front of B1'; this continues until B3 is reached and the loop terminates, yielding the new list B2'->B1'->B4->B5 (note the cloned prefix ends up in reverse order). Finally the head of this new list, B2', is written into the Segment's table slot, which completes the deletion of B3. Note that although the old chain B1->B2->B3->B4->B5 still exists for a while, nothing references its head any more, so the unreferenced nodes B1, B2 and B3 will eventually be reclaimed by the GC. One benefit of this scheme is that a read that had already positioned itself on the old list before the deletion can still finish its traversal; it merely sees the old data, which is acceptable under concurrency.
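The rebuild can be sketched independently of the JDK types. The following toy Node class (hypothetical, not the JDK HashEntry) shows how deletion from a list with final next references clones the preceding nodes in reverse order:

final class Node {
    final String key;
    final Node next;
    Node(String key, Node next) { this.key = key; this.next = next; }
}

class RemoveSketch {
    static Node remove(Node first, String key) {
        Node e = first;
        while (e != null && !e.key.equals(key))
            e = e.next;
        if (e == null)
            return first;               // key not found: list unchanged
        Node newFirst = e.next;         // nodes after e are reused as-is
        for (Node p = first; p != e; p = p.next)
            newFirst = new Node(p.key, newFirst); // clone preceding nodes (order reversed)
        return newFirst;
    }
}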
Besides operations on individual elements, there are operations that span all Segments, such as size().
Size operation
The size operation counts the sizes of all Segments, which requires traversing every Segment. Locking each one would effectively lock the whole map, stalling every operation that needs a lock. ConcurrentHashMap solves this problem cleverly.
Each Segment has a variable modCount that records the number of structural changes to the Segment, including additions and removals: every add increments it by 1, every remove increments it by 1, and every clear also increments it by 1. In other words, any operation that changes the number of elements bumps modCount, which only ever grows and never decreases.
ConcurrentHashMap traverses its Segments twice, recording each Segment's modCount on each pass. If the modCount values match between the two passes, the counts gathered during the traversal are trusted and their sum is returned as the total number of elements. If they differ, the whole procedure is retried. If it still fails after the retries, all Segments are locked, their sizes (count) are read one by one and summed to get the total, and finally the locks are released one by one. Listing 5 shows the process.
A more advanced question here is why count must always be read before modCount, rather than the other way around. In other words, can the following two statements be swapped?
sum += segments[i].count;
mcsum += mc[i] = segments[i].modCount;
The answer is no! Why? modCount is always modified while the lock is held, so there is never a race of multiple threads modifying it at once, which means it does not need to be volatile. Furthermore, modCount is always modified just before count is modified, and count is a volatile variable. The code here exploits exactly these characteristics of volatile.
According to happens-before rule (3): a write to a volatile field happens-before every subsequent read of that same field. In other words, once a volatile write is observed, all actions before that write become visible to the reading thread. Since modCount is always modified before count, reading a value of count guarantees that the modCount changes preceding that count write are also visible: if you see the count value change, you must see the corresponding modCount change. If the two statements above were swapped, there would be no guarantee of this result.
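A minimal sketch of this ordering argument (field names are illustrative, not the JDK code):

class VisibilitySketch {
    int modCount;          // plain field: only ever written while holding the lock
    volatile int count;    // volatile field: always written after modCount

    void writer() {        // runs under the segment's exclusive lock
        modCount++;        // (1) ordered before (2) within this thread
        count++;           // (2) volatile write publishes (1) along with itself
    }

    void reader() {        // runs without the lock
        int c = count;     // (3) volatile read: happens-after (2)
        int m = modCount;  // therefore sees at least the modCount++ from (1)
    }
}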
In ConcurrentHashMap.containsValue you can see that int c = segments[i].count; is executed on every pass over the Segments, even though the following statements never use the variable c. The JVM still cannot optimize the statement away, because it is a read of a volatile field, and that read is essential for guaranteeing the happens-before ordering of the subsequent operations. From this you can see:
ConcurrentHashMap pushes volatile to the extreme!
In addition, the isEmpty operation is similar to size and is not analyzed in detail; a short sketch follows Listing 5.
Listing 5: ConcurrentHashMap's size operation
public int size() {
    final Segment<K,V>[] segments = this.segments;
    long sum = 0;
    long check = 0;
    int[] mc = new int[segments.length];
    // Try a few times to get accurate count. On failure due to
    // continuous async changes in table, resort to locking.
    for (int k = 0; k < RETRIES_BEFORE_LOCK; ++k) {
        check = 0;
        sum = 0;
        int mcsum = 0;
        for (int i = 0; i < segments.length; ++i) {
            sum += segments[i].count;
            mcsum += mc[i] = segments[i].modCount;
        }
        if (mcsum != 0) {
            for (int i = 0; i < segments.length; ++i) {
                check += segments[i].count;
                if (mc[i] != segments[i].modCount) {
                    check = -1; // force retry
                    break;
                }
            }
        }
        if (check == sum)
            break;
    }
    if (check != sum) { // Resort to locking all segments
        sum = 0;
        for (int i = 0; i < segments.length; ++i)
            segments[i].lock();
        for (int i = 0; i < segments.length; ++i)
            sum += segments[i].count;
        for (int i = 0; i < segments.length; ++i)
            segments[i].unlock();
    }
    if (sum > Integer.MAX_VALUE)
        return Integer.MAX_VALUE;
    else
        return (int)sum;
}
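As noted above, isEmpty applies the same modCount cross-check; the following is a simplified sketch of the idea, not verbatim JDK source:

public boolean isEmpty() {
    int[] mc = new int[segments.length];
    int mcsum = 0;
    for (int i = 0; i < segments.length; ++i) {
        if (segments[i].count != 0)      // any non-empty segment settles it
            return false;
        mcsum += mc[i] = segments[i].modCount;
    }
    if (mcsum != 0) {                    // the map has been modified at some point:
        for (int i = 0; i < segments.length; ++i)
            if (segments[i].count != 0 || mc[i] != segments[i].modCount)
                return false;            // a concurrent change slipped in between reads
    }
    return true;
}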
ConcurrentSkipListMap/Set
I originally intended to introduce ConcurrentSkipListMap next, but after opening the source I gave up completely. I suspect I could study the data structures and algorithms in there for a week without fully understanding them. A long time ago, reading TreeMap already made my head spin over the complexities of the red-black binary tree. I blame my earlier neglect of data structures and algorithms; looking back at these complex algorithms now gives me a headache, so to spare my remaining brain cells I will leave these "gadgets" alone for the time being. See reference 4 for an introduction to TreeMap.
Resources:
1. Hashing
2. The single-word Wang/Jenkins hash in ConcurrentHashMap
3. Reordering and the happens-before rule
4. Studying TreeMap's red-black tree implementation through the JDK source code
In layman's Java Concurrency (18): Concurrent collections, Part 3: ConcurrentMap (3) [repost]