In-Depth Analysis of ConcurrentHashMap

Source: Internet
Author: User

Term definitions
Hash algorithm: an algorithm that transforms input of arbitrary length into output of a fixed length; that output is called the hash value.
Hash table: a table in which a set of keys is mapped into a limited address range by a hash function h(key) together with a collision-handling method, and each key is stored at the location the function computes; such a table is called a hash table, and the computed location a hash address.

Thread-unsafe HashMap

A HashMap put operation in a multithreaded environment can cause an infinite loop (a bucket's linked list can become circular during a concurrent resize), driving CPU utilization to nearly 100%, so HashMap must not be used in concurrent scenarios. For example, the following code:

final HashMap<String, String> map = new HashMap<String, String>(2);
Thread t = new Thread(new Runnable() {
    @Override
    public void run() {
        for (int i = 0; i < 10000; i++) {
            new Thread(new Runnable() {
                @Override
                public void run() {
                    map.put(UUID.randomUUID().toString(), "");
                }
            }, "ftf" + i).start();
        }
    }
}, "ftf");
t.start();
t.join();
Inefficient Hashtable container

Hashtable uses synchronized to ensure thread safety, but under intense lock contention it is inefficient. When one thread is inside a synchronized Hashtable method, any other thread calling a synchronized method blocks or spins. If thread 1 is adding an element with put, thread 2 can neither add another element with put nor retrieve one with get, so the fiercer the contention, the lower the efficiency.

Lock Segmentation Technique

Hashtable is inefficient under contention because every thread accessing it must compete for the same single lock. If the container instead held multiple locks, each guarding a portion of its data, then threads accessing different segments of data would never contend, which markedly improves concurrent access efficiency. This is the lock segmentation technique used by ConcurrentHashMap: the data is divided into segments, each segment gets its own lock, and while one thread holds the lock for one segment, the data in the other segments remains accessible to other threads.
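To make the idea concrete, here is a minimal sketch of lock striping, not the JDK source: the data is split into stripes, each guarded by its own ReentrantLock, so threads updating different stripes never block each other. The class and method names are hypothetical.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of lock striping: one lock per data stripe.
class StripedCounter {
    private final ReentrantLock[] locks;
    private final int[] counts;

    StripedCounter(int stripes) {
        locks = new ReentrantLock[stripes];
        counts = new int[stripes];
        for (int i = 0; i < stripes; i++)
            locks[i] = new ReentrantLock();
    }

    private int stripeFor(Object key) {
        // Toy distribution; >>> 1 keeps the index non-negative.
        return (key.hashCode() >>> 1) % locks.length;
    }

    void add(Object key) {
        int s = stripeFor(key);
        locks[s].lock();      // only this stripe is locked
        try {
            counts[s]++;
        } finally {
            locks[s].unlock();
        }
    }

    int total() {
        int sum = 0;
        for (int c : counts) sum += c;
        return sum;
    }
}
```

Threads whose keys fall into different stripes acquire different locks and proceed in parallel, which is exactly the property the segmented design below exploits.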

Structure of ConcurrentHashMap

We analyze the structure of ConcurrentHashMap through its class diagram.

ConcurrentHashMap is composed of an array of Segment objects and arrays of HashEntry objects. Segment is a reentrant lock (it extends ReentrantLock) and plays the role of the lock in ConcurrentHashMap; HashEntry stores the key-value pairs. A ConcurrentHashMap holds one Segment array. A Segment is structured much like a HashMap: an array plus linked lists. Each Segment holds a HashEntry array, each HashEntry is a node of a linked list, and each Segment guards the elements of its own HashEntry array: to modify data in that array, a thread must first acquire the corresponding Segment lock.

Initialization of ConcurrentHashMap

The ConcurrentHashMap constructor uses the parameters initialCapacity, loadFactor and concurrencyLevel to initialize the segments array, the segment shift segmentShift, the segment mask segmentMask, and the HashEntry array inside every segment.

Initializing the segments array. Let us look at the source code that initializes segmentShift, segmentMask and the segments array.

if (concurrencyLevel > MAX_SEGMENTS)
    concurrencyLevel = MAX_SEGMENTS;
// Find power-of-two sizes best matching arguments
int sshift = 0;
int ssize = 1;
while (ssize < concurrencyLevel) {
    ++sshift;
    ssize <<= 1;
}
segmentShift = 32 - sshift;
segmentMask = ssize - 1;
this.segments = Segment.newArray(ssize);

The code above shows that the length ssize of the segments array is computed from concurrencyLevel. To locate an index in the segments array with a bitwise AND, the array length must be a power of two, so ssize is the smallest power of two greater than or equal to concurrencyLevel. If concurrencyLevel is 14, 15 or 16, ssize will be 16, meaning the container holds 16 locks. Note that concurrencyLevel has a maximum of 65535, so the segments array has a maximum length of 65536, which corresponds to 16 binary bits.
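The power-of-two rounding is easy to check in isolation. This small helper (a hypothetical extraction, not the JDK source) reproduces the loop above:

```java
// Reproduces the power-of-two rounding from the snippet above:
// ssize is the smallest power of two >= concurrencyLevel.
class SegmentSize {
    static final int MAX_SEGMENTS = 1 << 16;

    static int ssizeFor(int concurrencyLevel) {
        if (concurrencyLevel > MAX_SEGMENTS)
            concurrencyLevel = MAX_SEGMENTS;
        int ssize = 1;
        while (ssize < concurrencyLevel)
            ssize <<= 1;
        return ssize;
    }
}
```

For inputs 14, 15 and 16 it returns 16, matching the claim above; for 17 it jumps to 32.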

Initializing segmentShift and segmentMask. These two global variables are used by the hashing that locates a segment. sshift is the number of times 1 must be shifted left to produce ssize; with the default concurrencyLevel of 16, 1 is shifted left 4 times, so sshift equals 4. segmentShift gives the number of bits to shift in the locating hash; it equals 32 minus sshift, i.e. 28. The 32 comes from the fact that the output of ConcurrentHashMap's hash() method is at most 32 bits, as we will see in the tests below. segmentMask is the mask of the hash operation, equal to ssize minus 1, i.e. 15; every bit of the mask's binary value is 1. Because ssize is at most 65536, segmentShift can be as small as 16, and segmentMask is at most 65535, a 16-bit value with every bit set to 1.

Initializing each segment. The parameter initialCapacity is the initial capacity of the ConcurrentHashMap and loadFactor is the load factor of each segment; the constructor uses these two parameters to initialize every segment in the array.

if (initialCapacity > MAXIMUM_CAPACITY)
    initialCapacity = MAXIMUM_CAPACITY;
int c = initialCapacity / ssize;
if (c * ssize < initialCapacity)
    ++c;
int cap = 1;
while (cap < c)
    cap <<= 1;
for (int i = 0; i < this.segments.length; ++i)
    this.segments[i] = new Segment<K,V>(cap, loadFactor);

The variable cap in the code above is the length of the HashEntry array inside a segment. It is derived from c, the per-segment element count (initialCapacity divided by ssize, rounded up): cap is the smallest power of two greater than or equal to c, so it is always either 1 or a power of two. A segment's capacity threshold is (int) (cap * loadFactor). With the defaults, initialCapacity is 16 and loadFactor is 0.75, so cap works out to 1 and threshold to 0.
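The same calculation, extracted into standalone helpers (hypothetical names, not the JDK source) so the default case can be verified:

```java
// Reproduces the per-segment capacity calculation from the snippet above.
class SegmentCap {
    static int capFor(int initialCapacity, int ssize) {
        int c = initialCapacity / ssize;   // elements per segment, ...
        if (c * ssize < initialCapacity)
            ++c;                           // ... rounded up
        int cap = 1;
        while (cap < c)
            cap <<= 1;                     // round up to a power of two
        return cap;
    }

    static int thresholdFor(int cap, float loadFactor) {
        return (int) (cap * loadFactor);
    }
}
```

With the defaults (initialCapacity = 16, ssize = 16, loadFactor = 0.75f), capFor returns 1 and thresholdFor returns 0, matching the text above.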

Positioning segment

Since ConcurrentHashMap protects the data of different segments with segmented locks, an insert or get operation must first locate the segment with a hashing step. ConcurrentHashMap first re-hashes the element's hashCode using a variant of the Wang/Jenkins hash:

private static int hash(int h) {
    h += (h << 15) ^ 0xffffcd7d;
    h ^= (h >>> 10);
    h += (h << 3);
    h ^= (h >>> 6);
    h += (h << 2) + (h << 14);
    return h ^ (h >>> 16);
}

The purpose of the re-hashing is to reduce hash collisions so that elements distribute evenly across the segments, improving the container's access efficiency. If the hash quality were as poor as possible, all elements would land in a single segment: not only would access be slow, the segmented locking would lose its meaning. As a test, I located segments directly with raw values, skipping the re-hash:

System.out.println(Integer.parseInt("0001111", 2) & 15);
System.out.println(Integer.parseInt("0011111", 2) & 15);
System.out.println(Integer.parseInt("0111111", 2) & 15);
System.out.println(Integer.parseInt("1111111", 2) & 15);

Every computed index is 15. This shows that without re-hashing the collisions would be severe: as long as the low bits match, the index is the same no matter what the high bits are. Re-hashing the four binary values above yields the results below (zero-padded to 32 bits, with a vertical bar every four bits for readability):

0100|0111|0110|0111|1101|1010|0100|1110
1111|0111|0100|0011|0000|0001|1011|1000
0111|0111|0110|1001|0100|0110|0011|1110
1000|0011|0000|0000|1100|1000|0001|1010

The data is now scattered across every bit, so every bit of the number participates in the hash, which reduces collisions. ConcurrentHashMap then locates the segment with the following algorithm:

final Segment<K,V> segmentFor(int hash) {
    return segments[(hash >>> segmentShift) & segmentMask];
}

By default, segmentShift is 28 and segmentMask is 15. The re-hashed value is at most 32 bits of binary data; an unsigned right shift by 28 lets the high 4 bits participate in the segment-locating computation. For the four values above, (hash >>> segmentShift) & segmentMask yields 4, 15, 7 and 8 respectively: the hash values no longer collide.
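The whole pipeline can be replayed end to end. This sketch combines the re-hash and segmentFor logic above into a standalone class (the class and method names are hypothetical) to confirm the segment indices 4, 15, 7 and 8 reported above:

```java
// Replays the JDK 6 re-hash plus segment location for the four test values.
class SegmentLocate {
    private static int hash(int h) {
        h += (h << 15) ^ 0xffffcd7d;
        h ^= (h >>> 10);
        h += (h << 3);
        h ^= (h >>> 6);
        h += (h << 2) + (h << 14);
        return h ^ (h >>> 16);
    }

    // segmentShift = 28 and segmentMask = 15, the defaults from the text.
    static int segmentIndex(int rawHash) {
        return (hash(rawHash) >>> 28) & 15;
    }
}
```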

The get operation of ConcurrentHashMap

Segment's get operation is simple and efficient: the key's hashCode is re-hashed, the result is used to locate the segment, and the same hash then locates the element:

public V get(Object key) {
    int hash = hash(key.hashCode());
    return segmentFor(hash).get(key, hash);
}

What makes get efficient is that the whole operation needs no locking; it only locks and re-reads if the value it read was null. We know that Hashtable's get method must lock, so how does ConcurrentHashMap's get avoid it? The shared variables used in get are declared volatile, such as the count field that tracks the current segment's size and the value field of HashEntry. Volatile variables maintain visibility across threads and can be read concurrently without ever returning a stale value, but they may be written by only a single thread at a time (or by multiple threads when the written value does not depend on the previous one). The get operation only reads the shared variables count and value and never writes them, so it does not need to lock. It never reads a stale value because of the happens-before rule of the Java memory model: a write to a volatile field happens before any subsequent read of it, so even when two threads simultaneously modify and read the volatile variable, get obtains the latest value. This is a classic scenario of replacing a lock with volatile.

transient volatile int count;
volatile V value;
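A minimal sketch of this pattern, under stated assumptions and not the JDK source: writers mutate the shared count only while holding a lock, while readers rely solely on volatile visibility and never lock.

```java
// Hypothetical illustration of "locked writes, lock-free volatile reads".
class VolatileCount {
    private transient volatile int count;   // shared, read without locking
    private final Object lock = new Object();

    void increment() {
        synchronized (lock) {   // single writer at a time, as in put()
            count = count + 1;  // the volatile write publishes the new value
        }
    }

    int size() {
        return count;           // volatile read: no lock, never stale
    }
}
```

Because the write happens under the lock and the field is volatile, a reader calling size() after an increment completes is guaranteed by happens-before to see the new value.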

In the element-locating code we can see that locating a HashEntry and locating a segment both AND the hash with (array length - 1), but they use different bits of the value: locating a segment uses the high bits of the re-hashed hashCode, while locating a HashEntry uses the re-hashed value directly. If both steps used the same bits, elements that spread evenly across segments would nonetheless cluster into the same HashEntry slots within each segment.

(hash >>> segmentShift) & segmentMask   // hash algorithm used to locate the segment
int index = hash & (tab.length - 1);    // hash algorithm used to locate the HashEntry
The put operation of ConcurrentHashMap

Because put writes to shared variables, thread safety requires locking while they are manipulated. The put method first locates the segment, then performs the insert inside it. The insert takes two steps: first decide whether the HashEntry array in the segment needs to grow, then locate the slot for the element and place it in the HashEntry array.

Whether to grow. Before inserting an element, the segment checks whether its HashEntry array has exceeded its capacity threshold; if so, the array is expanded. It is worth noting that the segment's expansion check is more appropriate than HashMap's: HashMap checks capacity only after inserting an element and expands if the threshold has been reached, but if no further element is ever inserted, that expansion was wasted.

How to grow. Expansion first creates an array twice the size of the original, then re-hashes the elements of the original array into the new one. For efficiency, ConcurrentHashMap does not expand the entire container; only the one segment is expanded.
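A minimal sketch of the doubling step, with hypothetical names and a simplified table of raw hash values rather than real HashEntry nodes: the new table is twice the old length, and each value is re-bucketed with hash & (newLength - 1).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of per-segment doubling and re-bucketing.
class SegmentResize {
    static List<List<Integer>> resize(List<List<Integer>> oldTab) {
        int newLen = oldTab.size() * 2;            // double the capacity
        List<List<Integer>> newTab = new ArrayList<>();
        for (int i = 0; i < newLen; i++)
            newTab.add(new ArrayList<>());
        for (List<Integer> bucket : oldTab)
            for (int h : bucket)
                newTab.get(h & (newLen - 1)).add(h);  // re-bucket by new mask
        return newTab;
    }
}
```

For a table of length 2 holding hashes 0 through 7, resizing to length 4 sends hash 6 to bucket 6 & 3 = 2 and hash 7 to bucket 3, illustrating how entries redistribute under the wider mask.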

The size operation of ConcurrentHashMap

To count the number of elements in the whole ConcurrentHashMap, we must sum the element counts of all segments. Each Segment's count field is volatile, so in a multithreaded scenario, can we simply add up every segment's count to get the total? No. Although each read of count yields its latest value, a count that has already been summed may change before the summation finishes, making the accumulated total wrong. The safest approach would be to lock every segment's put, remove and clean methods while counting, but that is obviously very inefficient. Because the counts rarely change during a summation, ConcurrentHashMap first tries twice to total the segments without locking them; only if the container's counts change during these attempts does it lock all segments and count again.

So how does ConcurrentHashMap detect that the container changed during the count? With the modCount variable: modCount is incremented before every put, remove and clean operation, so comparing the summed modCount before and after summing the sizes reveals whether the container changed.
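The try-then-lock strategy can be sketched as follows, with hypothetical class names and a stand-in put; the real JDK code differs in detail, but the shape is the same: sum counts and modCounts without locking, re-check the modCounts, and fall back to locking every segment only if they moved.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the optimistic size() strategy.
class SegmentedSize {
    static class Seg extends ReentrantLock {
        volatile int count;
        volatile int modCount;
    }

    final Seg[] segments;

    SegmentedSize(int n) {
        segments = new Seg[n];
        for (int i = 0; i < n; i++) segments[i] = new Seg();
    }

    void put(int segIndex) {           // stand-in for a real put()
        Seg s = segments[segIndex];
        s.lock();
        try {
            s.modCount++;              // bump before mutating, as in the JDK
            s.count++;
        } finally {
            s.unlock();
        }
    }

    int size() {
        for (int attempt = 0; attempt < 2; attempt++) {
            int sum = 0, mcBefore = 0, mcAfter = 0;
            for (Seg s : segments) { mcBefore += s.modCount; sum += s.count; }
            for (Seg s : segments) { mcAfter += s.modCount; }
            if (mcBefore == mcAfter)   // nothing changed mid-count
                return sum;
        }
        // Fall back: lock every segment, count, then unlock.
        int sum = 0;
        for (Seg s : segments) s.lock();
        try {
            for (Seg s : segments) sum += s.count;
        } finally {
            for (Seg s : segments) s.unlock();
        }
        return sum;
    }
}
```

In the common case where no writer interleaves with the summation, size() returns after the first unlocked pass and never touches a lock.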
