The Evolution of Java's ConcurrentHashMap

Source: Internet
Author: User
Tags: array length, CAS, modulus, rehash, visibility, volatile, ConcurrentModificationException

1. Thread-Unsafe HashMap

Note: The code in this section is based on JDK 1.7.0_67.

HashMap is not thread-safe. Its thread-safety problems show up mainly as an infinite loop during resize and as fail-fast behavior when iterators are used.

1.1 How HashMap Works

1.1.1 HashMap Addressing

For data being inserted or read, HashMap takes the hash value of the key modulo the array length, and the result is the index of the entry in the array. A modulo operation costs much more than a bit operation, so HashMap requires the array length to be a power of two (2^n). The hash value of the key is then ANDed with 2^n-1, which has the same effect as the modulus. HashMap does not require the caller to pass a power of two as the capacity; it derives one from the requested capacity with Integer.highestOneBit, which returns the largest power of two that is not greater than its argument. In JDK 1.7.0_67 the helper roundUpToPowerOf2 applies it to (number - 1) << 1, so the table length becomes the smallest power of two that is not less than the requested capacity. Integer.highestOneBit is implemented as follows.

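    public static int highestOneBit(int i) {
        // Copy the highest set bit into every lower position...
        i |= (i >>  1);
        i |= (i >>  2);
        i |= (i >>  4);
        i |= (i >>  8);
        i |= (i >> 16);
        // ...then clear everything below the highest bit.
        return i - (i >>> 1);
    }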
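The equivalence between masking and the modulus can be checked with a small sketch. The class name IndexForDemo and the helper roundUpToPowerOfTwo are illustrative names chosen here (the helper mirrors the rounding described above and assumes 1 <= requested <= 2^30); only Integer.highestOneBit and Math.floorMod are real JDK calls.

```java
class IndexForDemo {
    // Index computation by masking; only valid when length is a power of two.
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    // Illustrative helper: rounds a requested capacity up to a power of two.
    // Assumes 1 <= requested <= 2^30 so the shift cannot overflow.
    static int roundUpToPowerOfTwo(int requested) {
        return requested > 1 ? Integer.highestOneBit((requested - 1) << 1) : 1;
    }

    public static void main(String[] args) {
        System.out.println(Integer.highestOneBit(13));   // 8: largest power of two <= 13
        System.out.println(roundUpToPowerOfTwo(13));     // 16: smallest power of two >= 13
        int hash = 0x7A3F19C5;
        // For a power-of-two length, masking agrees with the (floor) modulus.
        System.out.println(indexFor(hash, 16) == Math.floorMod(hash, 16));
    }
}
```

Math.floorMod is used for the comparison because the % operator returns a negative result for a negative hash, while the mask always yields a valid non-negative index.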

The distribution of the keys' hash values directly determines how the data is spread over the hash table and how likely hash collisions are. To guard against keys with a poor hashCode implementation (for example, hash codes whose low bits are identical and that differ only in the high bits, so that ANDing with 2^n-1 gives the same result), the HashMap in JDK 1.7 uses the following method to spread the 1 bits of the final hash value as evenly as possible and so reduce hash collisions.

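    int h = hashSeed;
    h ^= k.hashCode();
    // Fold the high bits down into the low bits used for indexing.
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);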
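A short sketch of why this mixing step matters: two hash codes that differ only above the low 16 bits collide under a plain mask but separate after mixing. The class name SupplementalHashDemo and the method name mix are illustrative; the sketch leaves out hashSeed and works on raw int hash codes.

```java
class SupplementalHashDemo {
    // The JDK 1.7 bit-mixing step, applied to a raw hash code (no hashSeed).
    static int mix(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args) {
        int a = 0x10000;
        int b = 0x20000;       // a and b have identical (all-zero) low 16 bits
        int mask = 16 - 1;     // table length 16
        System.out.println((a & mask) + " " + (b & mask));           // 0 0: same bucket
        System.out.println((mix(a) & mask) + " " + (mix(b) & mask)); // 1 2: different buckets
    }
}
```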
1.2 Infinite Loop During Resize

1.2.1 The transfer Method

When the size of the HashMap exceeds capacity * loadFactor, the HashMap must be resized. A new array twice the length of the original is created, which keeps the capacity a power of two so that the addressing method above still applies. All existing entries are then re-inserted (rehashed) into the new array by the transfer method, which reverses the order of each linked list as it moves it.

    void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {          // traverse the buckets of the original array
            while (null != e) {               // for each node on the linked list
                Entry<K,V> next = e.next;     // remember the next node to move
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                int i = indexFor(e.hash, newCapacity);  // recompute the index
                e.next = newTable[i];         // head insertion: e becomes the new head
                newTable[i] = e;
                e = next;
            }
        }
    }

This method is not thread-safe. When several threads call it concurrently, a linked list can end up containing a cycle, and a later lookup in that bucket then loops forever (the worked example is omitted here).
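The list reversal caused by head insertion can be seen in a single-threaded sketch. It does not reproduce the race itself; it only shows the reordering that, when two threads interleave on the same list, makes a cycle possible. The class HeadInsertionDemo and its helpers are illustrative names, and the sketch assumes every node lands in the same new bucket.

```java
class HeadInsertionDemo {
    static final class Node {
        final String key;
        Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    // Builds a singly linked list in the given order.
    static Node chain(String... keys) {
        Node head = null;
        for (int i = keys.length - 1; i >= 0; i--) {
            head = new Node(keys[i], head);
        }
        return head;
    }

    // Moves every node into a fresh bucket using head insertion,
    // the same pointer updates as the inner loop of transfer.
    static Node transfer(Node head) {
        Node newHead = null;
        Node e = head;
        while (e != null) {
            Node next = e.next;
            e.next = newHead;
            newHead = e;
            e = next;
        }
        return newHead;
    }

    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            if (sb.length() > 0) sb.append("->");
            sb.append(n.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(transfer(chain("A", "B", "C")))); // C->B->A
    }
}
```

If a second thread still holds stale e and next references taken before this reversal, its own head insertions can link a node back to one that already points to it.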

1.3 Fail-Fast

1.3.1 Cause

If a HashMap is modified while an iterator over it is in use, ConcurrentModificationException is thrown. This is the fail-fast policy.

When an iterator is requested from the HashMap (for example through entrySet().iterator()), a new EntryIterator object is constructed and returned. The expectedModCount of the EntryIterator is set to the HashMap's modCount, a variable that records how many times the HashMap has been modified.

When the next entry is accessed through the iterator's next method, the iterator first checks whether its expectedModCount equals the HashMap's modCount. If they differ, the HashMap has been modified, and ConcurrentModificationException is thrown immediately. The iterator's remove method performs a similar check. The exception is thrown to alert the user to a thread-safety problem as early as possible.
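The check described above can be triggered from a single thread. In this minimal sketch (FailFastDemo and modifyWhileIterating are illustrative names), the map is changed through its own put method while a key-set iterator is still in use.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

class FailFastDemo {
    // Returns true if iterating while modifying the map triggers the fail-fast check.
    static boolean modifyWhileIterating() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        try {
            for (String key : map.keySet()) {
                // Structural change made through the map, not through the iterator:
                // modCount increases while the iterator's expectedModCount does not.
                map.put("d", 4);
            }
        } catch (ConcurrentModificationException ex) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(modifyWhileIterating()); // true
    }
}
```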

1.3.2 Thread Safety Solutions

In a single-threaded program, ConcurrentModificationException can be avoided as follows:

    1. Modify the data either only through the HashMap itself or only through the iterator; do not call HashMap's own modifying methods until you have finished using the iterator.
    2. Removing data through the iterator is safe, because the iterator's remove method brings its expectedModCount back in line with the HashMap's new modCount, so the two stay equal.
    3. Adding data can only be done through the HashMap itself. To continue traversing afterwards, obtain a new iterator; its expectedModCount equals the modCount of the updated HashMap.
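The second rule can be sketched as follows: removal goes through the iterator, so no exception is raised. IteratorRemoveDemo and removeOdd are illustrative names; the method works on a copy so that the caller's map is left untouched.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class IteratorRemoveDemo {
    // Removes every entry with an odd value, using only the iterator to modify the map.
    static Map<String, Integer> removeOdd(Map<String, Integer> source) {
        Map<String, Integer> map = new HashMap<>(source);
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() % 2 != 0) {
                it.remove(); // keeps expectedModCount and modCount equal
            }
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        System.out.println(removeOdd(map)); // {b=2}
    }
}
```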

In a multithreaded program, you can wrap the map with Collections.synchronizedMap to obtain a synchronized map, or use the thread-safe ConcurrentHashMap directly.

2. Java 7 ConcurrentHashMap Based on Segment Locks

Note: The code in this section is based on JDK 1.7.0_67.

2.1 Data Structures

The underlying data structures of ConcurrentHashMap in Java 7 are still arrays and linked lists. Unlike HashMap, the outermost layer of ConcurrentHashMap is not one large array but an array of Segments. Each Segment contains an array of linked lists, similar to the HashMap data structure. Overall, the structure is a Segment array in which every Segment holds its own small hash table.

2.2 Addressing Mode

When reading or writing a key, first take the hash value of the key. The high n bits of the hash value, taken modulo the number of Segments, determine which Segment the key belongs to; that Segment is then operated on much like a HashMap. To spread keys evenly across the Segments, the hash value is computed with the following method.

    private int hash(Object k) {
        int h = hashSeed;

        if ((0 != h) && (k instanceof String)) {
            return sun.misc.Hashing.stringHash32((String) k);
        }

        h ^= k.hashCode();

        // Spread the bits so that both the Segment index and the table index are well distributed.
        h += (h <<  15) ^ 0xffffcd7d;
        h ^= (h >>> 10);
        h += (h <<   3);
        h ^= (h >>>  6);
        h += (h <<   2) + (h << 14);
        return h ^ (h >>> 16);
    }

To make the modulus cheap here as well, ssize is computed as the smallest power of two that is not less than concurrencyLevel, and segmentMask is 2^n-1 (that is, ssize - 1). This follows the same idea as the array length calculation above. For the hash of a key, shifting right by segmentShift bits leaves the high sshift bits, and ANDing the result with segmentMask gives the key's index in the Segment array.

    int sshift = 0;
    int ssize = 1;
    while (ssize < concurrencyLevel) {
        ++sshift;
        ssize <<= 1;
    }
    this.segmentShift = 32 - sshift;
    this.segmentMask = ssize - 1;
    Segment<K,V>[] ss = (Segment<K,V>[])new Segment[ssize];
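To make the shift-and-mask step concrete, the sketch below repeats the constructor arithmetic and derives a Segment index from a hash. SegmentIndexDemo and segmentFor are illustrative names; the field names follow the snippet above.

```java
class SegmentIndexDemo {
    final int ssize;
    final int segmentShift;
    final int segmentMask;

    SegmentIndexDemo(int concurrencyLevel) {
        int sshift = 0;
        int ssize = 1;
        while (ssize < concurrencyLevel) {
            ++sshift;
            ssize <<= 1;
        }
        this.ssize = ssize;
        this.segmentShift = 32 - sshift;
        this.segmentMask = ssize - 1;
    }

    // The high sshift bits of the hash select the Segment.
    int segmentFor(int hash) {
        return (hash >>> segmentShift) & segmentMask;
    }

    public static void main(String[] args) {
        SegmentIndexDemo demo = new SegmentIndexDemo(10);
        System.out.println(demo.ssize);                  // 16: rounded up from 10
        System.out.println(demo.segmentShift);           // 28
        System.out.println(demo.segmentFor(0xA0000000)); // 10: the top four bits are 1010
    }
}
```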
2.3 Synchronization Mode

Segment extends ReentrantLock, so each Segment can easily be locked.

For read operations, visibility must be guaranteed when fetching the Segment that holds the key. This could be implemented with the volatile keyword or with a lock. However, locking is too expensive, and every write to a volatile field forces the cached copies held by other CPUs to be invalidated, which also has a cost. ConcurrentHashMap therefore uses the following call to read the latest Segment with volatile semantics (Java 8 changed this again and relies on a volatile table field).

    Segment<K,V> s = (Segment<K,V>) UNSAFE.getObjectVolatile(segments, u);

A similar approach is used when fetching a HashEntry within a Segment.

    HashEntry<K,V> e = (HashEntry<K,V>) UNSAFE.getObjectVolatile(tab, ((long)(((tab.length - 1) & h)) << TSHIFT) + TBASE);

A write operation does not need to acquire all Segment locks at once, since that would amount to locking the entire map. It first acquires the lock of the Segment in which the key-value pair resides and, once it holds that lock, operates on the Segment much as it would on a HashMap. Because the locks of the other Segments are not acquired, the map can in theory support concurrencyLevel (equal to the number of Segments) threads reading and writing concurrently and safely.

When acquiring the lock, lock() is not called right away, because lock() suspends the thread if the lock is unavailable (see ReentrantLock). Instead, the Segment spins: if tryLock fails, meaning another thread holds the lock, it requests the lock again with tryLock in a loop. If the head of the linked list for the key changes while spinning, the retry count is reset. Once the retry count exceeds a threshold, the Segment falls back to lock(). (Java 8 no longer uses Segment locks; it combines synchronized, volatile, and CAS.)

Spinning is used because, when the lock is held only briefly, it avoids the cost of suspending and resuming the thread. It does consume CPU while waiting, which is why the code switches to a blocking lock once the spin count exceeds the threshold.
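The spin-then-block pattern can be sketched on top of ReentrantLock. This is a simplified illustration, not the JDK's scanAndLockForPut: it omits the list scanning and the retry reset, and the class name SpinThenBlockLock, the method spinThenLock, and the constant MAX_SPINS = 64 are assumptions made for this example.

```java
import java.util.concurrent.locks.ReentrantLock;

class SpinThenBlockLock extends ReentrantLock {
    private static final long serialVersionUID = 1L;

    // Hypothetical spin budget chosen for this sketch.
    static final int MAX_SPINS = 64;

    // Acquires the lock and returns how many tryLock attempts failed first.
    int spinThenLock() {
        int retries = 0;
        while (!tryLock()) {          // non-blocking attempt
            if (++retries > MAX_SPINS) {
                lock();               // spin budget used up: block until available
                break;
            }
        }
        return retries;
    }

    public static void main(String[] args) {
        SpinThenBlockLock lock = new SpinThenBlockLock();
        int failed = lock.spinThenLock();
        try {
            System.out.println(failed); // 0 when the lock is uncontended
        } finally {
            lock.unlock();
        }
    }
}
```

The caller releases the lock with unlock() in a finally block, exactly as with a plain ReentrantLock.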

2.4 Size operation

The put, remove, and get operations only need to deal with one Segment, while the size operation has to traverse all Segments to compute the size of the whole map. A simple solution is to lock all Segments, compute the size, and then unlock them. However, while size holds all the locks, no thread can write to the map, which works against concurrent use of the map.

To better support concurrent operation, ConcurrentHashMap first computes the size without locking, up to three times. Each Segment, like HashMap, tracks its own number of modifications in modCount and increments it on every modification. If the modCount totals over all Segments obtained from two consecutive passes are equal, no update happened between those passes, the sizes computed by the two passes are equal, and that value is returned directly as the final result. If the map keeps being updated during the three passes, all Segments are locked and the size is computed again.
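The optimistic size calculation can be sketched with a reduced model in which each Segment only keeps a count and a modCount. SegmentedCounter is an illustrative class, not JDK code; RETRIES_BEFORE_LOCK = 2 is chosen here so that three unlocked passes are made before falling back to locking.

```java
import java.util.concurrent.locks.ReentrantLock;

class SegmentedCounter {
    static final int RETRIES_BEFORE_LOCK = 2;

    static final class Segment extends ReentrantLock {
        private static final long serialVersionUID = 1L;
        volatile int count;      // number of elements in this segment
        volatile int modCount;   // number of modifications of this segment

        void add(int delta) {
            lock();
            try {
                count += delta;
                modCount++;
            } finally {
                unlock();
            }
        }
    }

    final Segment[] segments;

    SegmentedCounter(int n) {
        segments = new Segment[n];
        for (int i = 0; i < n; i++) segments[i] = new Segment();
    }

    void add(int hash, int delta) {
        segments[(hash & 0x7fffffff) % segments.length].add(delta);
    }

    int size() {
        long last = -1L;   // modCount total seen in the previous pass
        for (int retries = 0; ; retries++) {
            boolean locked = retries > RETRIES_BEFORE_LOCK;
            if (locked) for (Segment s : segments) s.lock();
            try {
                long sum = 0L;
                int size = 0;
                for (Segment s : segments) {
                    sum += s.modCount;
                    size += s.count;
                }
                // Stable between two passes, or computed under all locks: trust it.
                if (locked || sum == last) return size;
                last = sum;
            } finally {
                if (locked) for (Segment s : segments) s.unlock();
            }
        }
    }

    public static void main(String[] args) {
        SegmentedCounter counter = new SegmentedCounter(4);
        counter.add(1, 1);
        counter.add(2, 1);
        counter.add(7, 3);
        System.out.println(counter.size()); // 5
    }
}
```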

2.5 Differences

ConcurrentHashMap differs from HashMap in the following ways.

    1. ConcurrentHashMap is thread-safe, while HashMap is not.
    2. HashMap allows null keys and null values, while ConcurrentHashMap allows neither.
    3. HashMap does not allow the map to be modified through the HashMap itself while an iterator is traversing it, whereas ConcurrentHashMap allows this, and the update may be visible to the rest of the traversal.
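Points 2 and 3 can be observed directly with the standard classes. MapDifferencesDemo and its two helper methods are illustrative names; the behavior shown is that of java.util.HashMap and java.util.concurrent.ConcurrentHashMap.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class MapDifferencesDemo {
    // True if the map refuses a null key.
    static boolean rejectsNullKey(Map<String, String> map) {
        try {
            map.put(null, "v");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // True if the map can be modified through itself during iteration.
    static boolean toleratesUpdateDuringIteration(Map<String, String> map) {
        map.put("a", "1");
        map.put("b", "2");
        map.put("c", "3");
        try {
            for (String key : map.keySet()) {
                map.put("z", key); // same key every time, so the loop stays bounded
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsNullKey(new HashMap<String, String>()));                           // false
        System.out.println(rejectsNullKey(new ConcurrentHashMap<String, String>()));                 // true
        System.out.println(toleratesUpdateDuringIteration(new HashMap<String, String>()));           // false
        System.out.println(toleratesUpdateDuringIteration(new ConcurrentHashMap<String, String>())); // true
    }
}
```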
3. Java 8 ConcurrentHashMap Based on CAS

Note: The code in this section is based on JDK 1.8.0_111.

3.1 Data structures

To allow parallel access, Java 7 introduced the Segment structure and implemented segment locking, so the theoretical maximum concurrency equals the number of Segments. To improve concurrency further, Java 8 drops the segment-lock scheme and uses one large array directly. To improve lookup performance under hash collisions, Java 8 also converts a linked list (lookup time complexity O(n)) into a red-black tree (lookup time complexity O(log n)) when the list length exceeds a threshold (8). The resulting structure is a single array whose bins hold either linked lists or red-black trees.

3.2 Addressing Mode

The ConcurrentHashMap of Java 8 also determines the index of a key in the array by taking the key's hash value modulo the array length. To guard against keys with a poorly designed hashCode, it likewise computes the final hash value of the key with the method below. The difference is that the author of the Java 8 ConcurrentHashMap considers lookups efficient enough even under heavy hash collisions now that red-black trees are used, so the hash calculation is kept simple: it XORs the key's hashCode with its own high 16 bits and forces the highest bit to 0, which guarantees that the final result is non-negative.

    static final int spread(int h) {
        return (h ^ (h >>> 16)) & HASH_BITS;
    }
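A few sample inputs show both effects of spread: the high half is folded into the low half, and the sign bit is cleared. The class name SpreadDemo is illustrative; HASH_BITS is given the value 0x7fffffff, which is what masking off the sign bit requires.

```java
class SpreadDemo {
    static final int HASH_BITS = 0x7fffffff; // all bits except the sign bit

    static int spread(int h) {
        return (h ^ (h >>> 16)) & HASH_BITS;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(spread(0x10000)));           // 10001
        System.out.println(Integer.toHexString(spread(-1)));                // 7fff0000
        System.out.println(Integer.toHexString(spread(Integer.MIN_VALUE))); // 8000
    }
}
```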
3.3 Synchronization Mode

For a put operation, if the array element for the key is null, the new node is installed with a CAS operation. If the array element for the key (the head of a linked list or the root of a tree) is not null, put locks that element with the synchronized keyword and then performs the update. If the put makes the list length exceed a threshold, the linked list is converted to a tree, which improves lookup efficiency.
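The two write paths can be sketched with AtomicReferenceArray standing in for the Unsafe-based table access. CasBinTable is an illustrative class written for this example, not JDK code; it assumes a fixed power-of-two capacity and leaves out resizing, removal, and tree conversion.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

class CasBinTable<K, V> {
    static final class Node<K, V> {
        final K key;
        volatile V val;
        volatile Node<K, V> next;
        Node(K key, V val) { this.key = key; this.val = val; }
    }

    private final AtomicReferenceArray<Node<K, V>> table;

    // capacity must be a power of two (assumption of this sketch)
    CasBinTable(int capacity) {
        table = new AtomicReferenceArray<>(capacity);
    }

    private int indexOf(Object key) {
        int h = key.hashCode();
        return (h ^ (h >>> 16)) & (table.length() - 1);
    }

    V put(K key, V value) {
        int i = indexOf(key);
        for (;;) {
            Node<K, V> head = table.get(i);
            if (head == null) {
                // Empty bin: install the node with CAS, no lock needed.
                if (table.compareAndSet(i, null, new Node<>(key, value))) return null;
                // Another thread won the race; loop and take the locked path.
            } else {
                synchronized (head) {                    // lock only this bin
                    if (table.get(i) != head) continue;  // head changed: retry
                    for (Node<K, V> e = head; ; e = e.next) {
                        if (e.key.equals(key)) {
                            V old = e.val;
                            e.val = value;
                            return old;
                        }
                        if (e.next == null) {
                            e.next = new Node<>(key, value);
                            return null;
                        }
                    }
                }
            }
        }
    }

    // Lock-free read: relies on the volatile fields of Node.
    V get(K key) {
        for (Node<K, V> e = table.get(indexOf(key)); e != null; e = e.next) {
            if (e.key.equals(key)) return e.val;
        }
        return null;
    }
}
```

Threads writing to different bins never contend for the same monitor, which is where the improvement over Segment-level locking comes from.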

For read operations, the array is declared volatile, so the visibility of the array reference is not a concern. Each element is a Node instance (a HashEntry in Java 7) whose key and hash fields are final and therefore immutable, so there is no need to worry about the visibility of changes to them. Its value and its reference to the next element are declared volatile, so their visibility is guaranteed as well.

    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        volatile V val;
        volatile Node<K,V> next;
    }

The visibility of the array element for a key is guaranteed by Unsafe's getObjectVolatile method.

    static final <K,V> Node<K,V> tabAt(Node<K,V>[] tab, int i) {
        return (Node<K,V>)U.getObjectVolatile(tab, ((long)i << ASHIFT) + ABASE);
    }
3.4 Size operation

Both the put method and the remove method maintain the size of the map through the addCount method. The size method reads the size maintained by addCount through sumCount.
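In JDK 8 the count kept by addCount is spread over a base counter plus an array of counter cells, a design similar to java.util.concurrent.atomic.LongAdder, and sumCount adds the parts together. The sketch below is only an analogy under that assumption: it uses LongAdder as a stand-in and compares its sum with the map's own mappingCount after concurrent inserts. StripedSizeDemo and countConcurrently are illustrative names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

class StripedSizeDemo {
    // Inserts threads * perThread distinct keys concurrently.
    // Returns the total if the striped counter and the map agree, otherwise -1.
    static long countConcurrently(int threads, int perThread) {
        LongAdder counter = new LongAdder();
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put(base + i, i);   // the map updates its own count internally
                    counter.increment();    // contended updates spread over cells
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1L;
        }
        // sum() adds the base value and all cells, much like sumCount does.
        return counter.sum() == map.mappingCount() ? counter.sum() : -1L;
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 1000)); // 4000
    }
}
```

Both values are exact here because they are read after all writer threads have finished; while updates are still in flight, either one is only an estimate.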

4. Summary

    1. Consider the similarities and differences between JDK 7 and JDK 8 in how read and write operations are handled.

Reference:
http://www.jasongj.com/java/concurrenthashmap/
