Two kinds of common concurrent containers in Java multithreading programming


ConcurrentHashMap Concurrent Container
ConcurrentHashMap can read data without locking, and its internal structure allows write operations to keep the scope of the lock as small as possible: a write does not have to lock the entire ConcurrentHashMap.

The internal structure of ConcurrentHashMap

To improve its concurrency, ConcurrentHashMap internally uses a structure called Segment. A Segment is essentially a small hash table: it maintains an array of linked lists. Let's look at the internal structure of ConcurrentHashMap with the following picture:

From the structure above, we can see that locating an element requires two hash operations: the first hash locates the Segment, and the second locates the head of the linked list where the element lives. The side effect of this structure is that the hashing process is longer than in an ordinary HashMap, but the benefit is that a write operation only needs to lock the Segment that holds the element and does not affect the other Segments. In the ideal case, ConcurrentHashMap can therefore support as many simultaneous write operations as there are Segments (provided those writes are evenly distributed across the Segments), so this structure greatly improves ConcurrentHashMap's concurrency.

Segment

Let's take a concrete look at the data structure of segment:

static final class Segment<K,V> extends ReentrantLock implements Serializable {
    transient volatile int count;
    transient int modCount;
    transient int threshold;
    transient volatile HashEntry<K,V>[] table;
    final float loadFactor;
}

Explain in detail the meaning of the member variables within Segment:

    • count: the number of elements in the Segment
    • modCount: the number of operations that affect the size of the table (for example, put or remove operations)
    • threshold: the threshold; once the number of elements in the Segment exceeds this value, the Segment is resized
    • table: an array of linked lists, in which each element of the array represents the head of a linked list
    • loadFactor: the load factor, used to compute threshold

HashEntry

The elements in a Segment are stored as HashEntry objects in the array of linked lists. Let's look at the structure of HashEntry:

static final class HashEntry<K,V> {
    final K key;
    final int hash;
    volatile V value;
    final HashEntry<K,V> next;
}

You can see one of the characteristics of HashEntry: apart from value, all the other fields are final. This prevents the linked-list structure from being broken while other threads are traversing it, avoiding a ConcurrentModificationException.

Initialization of ConcurrentHashMap

Here we combine the source code to analyze the implementation of ConcurrentHashMap concretely. First look at the initialization method:

public ConcurrentHashMap(int initialCapacity, float loadFactor, int concurrencyLevel) {
    if (!(loadFactor > 0) || initialCapacity < 0 || concurrencyLevel <= 0)
        throw new IllegalArgumentException();

    if (concurrencyLevel > MAX_SEGMENTS)
        concurrencyLevel = MAX_SEGMENTS;

    // Find power-of-two sizes best matching arguments
    int sshift = 0;
    int ssize = 1;
    while (ssize < concurrencyLevel) {
        ++sshift;
        ssize <<= 1;
    }
    segmentShift = 32 - sshift;
    segmentMask = ssize - 1;
    this.segments = Segment.newArray(ssize);

    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    int c = initialCapacity / ssize;
    if (c * ssize < initialCapacity)
        ++c;
    int cap = 1;
    while (cap < c)
        cap <<= 1;

    for (int i = 0; i < this.segments.length; ++i)
        this.segments[i] = new Segment<K,V>(cap, loadFactor);
}

ConcurrentHashMap's initialization takes three parameters: initialCapacity, the initial capacity; loadFactor, the load factor; and concurrencyLevel, the number of Segments inside the ConcurrentHashMap. Once specified, concurrencyLevel cannot be changed. If the number of elements later grows to the point where the ConcurrentHashMap needs to expand, it does not increase the number of Segments; it only increases the capacity of the linked-list array inside the affected Segment. The advantage is that the expansion process never needs to rehash the entire ConcurrentHashMap, only the elements inside one Segment.

The whole initialization method is very simple. First the Segments are created based on concurrencyLevel; the number of Segments is the smallest power of two that is not less than concurrencyLevel (capped at MAX_SEGMENTS). In other words, the number of Segments is always a power of two, which makes it possible to use shift operations for hashing and speeds up the hash process. The next step is to determine each Segment's capacity from initialCapacity; each Segment's capacity is also a power of two, which again makes hashing faster.

Pay special attention to two variables here: segmentShift and segmentMask. These will play a large role later. Assuming the constructor determines the number of Segments to be 2 to the n-th power, segmentShift equals 32 minus n, and segmentMask equals 2 to the n-th power minus one.

ConcurrentHashMap's get operation

As mentioned earlier, ConcurrentHashMap's get operation does not need to lock in the common case. Let's look at its implementation:

public V get(Object key) {
    int hash = hash(key.hashCode());
    return segmentFor(hash).get(key, hash);
}

Look at the third line: the segmentFor function determines which Segment the operation should be performed in, and almost all operations on ConcurrentHashMap need this function. Let's look at its implementation:

final Segment<K,V> segmentFor(int hash) {
    return segments[(hash >>> segmentShift) & segmentMask];
}

This function uses bit operations to determine the Segment: it shifts the incoming hash value right (unsigned) by segmentShift bits, then ANDs the result with segmentMask. Combined with the segmentShift and segmentMask values described earlier, we can draw the following conclusion: assuming the number of Segments is 2 to the n-th power, the high n bits of an element's hash value determine exactly which Segment it belongs to.
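As a quick check of this bit trick, here is a small, self-contained sketch (the class name `SegmentMathDemo` and the sample hash values are my own, not from the original source) that recomputes segmentShift and segmentMask exactly as the constructor does, then extracts the high n bits of a hash:

```java
public class SegmentMathDemo {

    // Mirrors the constructor's power-of-two sizing, then uses the
    // resulting shift and mask to pick a segment index from the hash.
    static int segmentIndex(int hash, int concurrencyLevel) {
        int sshift = 0;
        int ssize = 1;
        while (ssize < concurrencyLevel) {
            ++sshift;
            ssize <<= 1;
        }
        int segmentShift = 32 - sshift; // 28 when concurrencyLevel = 16
        int segmentMask = ssize - 1;    // 15 (binary 1111)
        return (hash >>> segmentShift) & segmentMask;
    }

    public static void main(String[] args) {
        // The top 4 bits of 0xABCD1234 are 0xA, i.e. segment 10.
        System.out.println(segmentIndex(0xABCD1234, 16));
    }
}
```

With 16 Segments, only the top 4 bits of the hash matter, so keys whose hashes differ only in the low bits still land in the same Segment.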

After determining which Segment to work with, the next thing to do is call that Segment's get method:

V get(Object key, int hash) {
    if (count != 0) { // read-volatile
        HashEntry<K,V> e = getFirst(hash);
        while (e != null) {
            if (e.hash == hash && key.equals(e.key)) {
                V v = e.value;
                if (v != null)
                    return v;
                return readValueUnderLock(e); // recheck
            }
            e = e.next;
        }
    }
    return null;
}

First look at the second line of code: it checks count, which represents the number of elements in the Segment. Let's look at the definition of count:

transient volatile int count;

You can see that count is volatile; in fact the implementation relies on volatile's semantics:

A write to a volatile field happens-before every subsequent read of that same field.
Because put, remove and other operations also update the value of count, volatile's semantics guarantee that when contention occurs, the write to count is visible to the reads that follow it, so the subsequent steps of get can see the up-to-date element contents.

Then, on the third line, getFirst() is called to get the head of the linked list:

HashEntry<K,V> getFirst(int hash) {
    HashEntry<K,V>[] tab = table;
    return tab[hash & (tab.length - 1)];
}
Similarly, a bit operation determines the head of the linked list: the hash value is ANDed with the table length minus one. The result is the low n bits of the hash value, where the table length is 2 to the n-th power.
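For a power-of-two table length, this mask is equivalent to an unsigned modulo by the length. A tiny sketch (class and values are illustrative, not from the source):

```java
public class BucketIndexDemo {

    // For a power-of-two table length, masking with length - 1 keeps
    // exactly the low n bits of the hash, i.e. hash mod length (unsigned).
    static int bucketIndex(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    public static void main(String[] args) {
        // The low 4 bits of 0xABCD1234 are 0x4, so the entry lands
        // in bucket 4 of a 16-slot table.
        System.out.println(bucketIndex(0xABCD1234, 16));
    }
}
```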

After determining the head of the linked list, the whole list can be traversed. When an entry with a matching key is found, its value is read; if the value is null, that key/value pair may be in the middle of a put, so readValueUnderLock is called to take the lock and guarantee a complete value is read. If the value is not null, it is returned directly.

ConcurrentHashMap's put operation

After reading the get operation, let's look at put. The front part of put is the same Segment-locating process, so it is not repeated here; let's go straight to the key Segment put method:

V put(K key, int hash, V value, boolean onlyIfAbsent) {
    lock();
    try {
        int c = count;
        if (c++ > threshold) // ensure capacity
            rehash();
        HashEntry<K,V>[] tab = table;
        int index = hash & (tab.length - 1);
        HashEntry<K,V> first = tab[index];
        HashEntry<K,V> e = first;
        while (e != null && (e.hash != hash || !key.equals(e.key)))
            e = e.next;

        V oldValue;
        if (e != null) {
            oldValue = e.value;
            if (!onlyIfAbsent)
                e.value = value;
        }
        else {
            oldValue = null;
            ++modCount;
            tab[index] = new HashEntry<K,V>(key, hash, first, value);
            count = c; // write-volatile
        }
        return oldValue;
    } finally {
        unlock();
    }
}

First, the put operation locks the Segment. Then, on line 5, if the number of elements in the Segment exceeds the threshold (computed from loadFactor in the constructor), the Segment must be expanded and rehashed. The rehash process is left for the reader to study; it is not detailed here.

Lines 8 and 9 determine the position of the head of the linked list, the same as the getFirst process.

The while loop on line 11 looks in the list for an element with the same key as the one being put. If one is found, its value is updated directly; if not, execution enters the else branch, where a new HashEntry is created and added at the head of the list, and then count is updated.
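Because next is final, a new entry can only be linked in front of the existing head; the nodes already in the bucket are never touched. Here is a standalone sketch of that head insertion (the simplified `Node` class is my own stand-in for HashEntry, not from the source):

```java
public class HeadInsertDemo {

    // Simplified stand-in for HashEntry: the next pointer is final,
    // so existing nodes can never be re-linked.
    static final class Node {
        final String key;
        final Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    // Collect the keys of a list, head first.
    static String keys(Node first) {
        StringBuilder sb = new StringBuilder();
        for (Node p = first; p != null; p = p.next)
            sb.append(p.key);
        return sb.toString();
    }

    public static void main(String[] args) {
        Node bucket = new Node("b", new Node("a", null)); // existing list: b -> a
        bucket = new Node("c", bucket);                   // put links the new entry at the head
        System.out.println(keys(bucket));                 // cba
    }
}
```

Readers traversing the old head at the same moment simply never see the new entry, which is why get can proceed without a lock.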

ConcurrentHashMap's remove operation

The first part of remove, like the get and put operations before it, locates the Segment and then invokes that Segment's remove method:

V remove(Object key, int hash, Object value) {
    lock();
    try {
        int c = count - 1;
        HashEntry<K,V>[] tab = table;
        int index = hash & (tab.length - 1);
        HashEntry<K,V> first = tab[index];
        HashEntry<K,V> e = first;
        while (e != null && (e.hash != hash || !key.equals(e.key)))
            e = e.next;

        V oldValue = null;
        if (e != null) {
            V v = e.value;
            if (value == null || value.equals(v)) {
                oldValue = v;
                // All entries following removed node can stay
                // in list, but all preceding ones need to be
                // cloned.
                ++modCount;
                HashEntry<K,V> newFirst = e.next;
                for (HashEntry<K,V> p = first; p != e; p = p.next)
                    newFirst = new HashEntry<K,V>(p.key, p.hash,
                                                  newFirst, p.value);
                tab[index] = newFirst;
                count = c; // write-volatile
            }
        }
        return oldValue;
    } finally {
        unlock();
    }
}

The remove operation also first locates the element to be deleted. However, deletion here is not as simple as pointing the preceding element's next at the element after the deleted one: as we said earlier, next in HashEntry is final and cannot be modified after assignment. So after locating the element to delete, the program copies every element that precedes it and rebuilds them one by one at the front of the remaining list. Look at the following picture to understand this process:

Assuming the original elements in the list are as shown in the previous illustration, after deleting element 3 the list looks as shown in the following illustration:
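The clone-the-predecessors trick can be shown in a standalone sketch (again a simplified `Node` instead of HashEntry; the values are illustrative). Note the visible side effect: the cloned predecessors end up in reverse order in the new list, exactly as the illustrations above depict:

```java
public class RemoveDemo {

    // Simplified stand-in for HashEntry with a final next pointer.
    static final class Node {
        final int key;
        final Node next;
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    // Mirrors Segment.remove: the tail after the removed node is reused
    // as-is; each predecessor is cloned in front of it, reversing them.
    static Node remove(Node first, int key) {
        Node e = first;
        while (e != null && e.key != key)
            e = e.next;
        if (e == null)
            return first;           // key not present, list unchanged
        Node newFirst = e.next;     // tail is shared, not copied
        for (Node p = first; p != e; p = p.next)
            newFirst = new Node(p.key, newFirst);
        return newFirst;
    }

    // Collect the keys of a list, head first, space-separated.
    static String keys(Node first) {
        StringBuilder sb = new StringBuilder();
        for (Node p = first; p != null; p = p.next)
            sb.append(p.key).append(' ');
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // list: 1 -> 2 -> 3 -> 4 -> 5, then remove 3
        Node list = new Node(1, new Node(2, new Node(3, new Node(4, new Node(5, null)))));
        System.out.println(keys(remove(list, 3))); // 2 1 4 5
    }
}
```

The original list object is untouched, so concurrent readers that started before the remove still traverse a consistent (if stale) chain.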

CopyOnWriteArrayList Concurrent Container
Copy-on-write, abbreviated COW, is an optimization strategy used in programming. The basic idea is that everyone initially shares the same content; when someone wants to modify it, the content is really copied out to form a new version, which is then modified. It is a lazy, deferred strategy. Starting with JDK 1.5, the Java concurrency package provides two concurrent containers implemented with the copy-on-write mechanism: CopyOnWriteArrayList and CopyOnWriteArraySet. Copy-on-write containers are useful and can be applied in a great many concurrent scenarios.

What is a copy-on-write container

A copy-on-write container is a container that is copied when it is written to. The intuitive understanding is that when we add an element, we do not add it directly to the current container; instead we copy the current container into a new one, add the element to the new container, and then point the original container reference at the new container. The advantage is that the container can be read concurrently without any locking, because the container being read never changes. A copy-on-write container thus embodies the idea of read/write separation: reads and writes operate on different containers.

The implementation principle of CopyOnWriteArrayList

Before using CopyOnWriteArrayList, let's read its source code to see how it is implemented. The following code is the implementation of CopyOnWriteArrayList's add method (which adds an element to the list). Note that adding requires a lock; otherwise several concurrent writers would each make their own copy.

/**
 * Appends the specified element to the end of this list.
 *
 * @param e element to be appended to this list
 * @return <tt>true</tt> (as specified by {@link Collection#add})
 */
public boolean add(E e) {
    final ReentrantLock lock = this.lock;
    lock.lock();
    try {
        Object[] elements = getArray();
        int len = elements.length;
        Object[] newElements = Arrays.copyOf(elements, len + 1);
        newElements[len] = e;
        setArray(newElements);
        return true;
    } finally {
        lock.unlock();
    }
}

Reading does not require a lock. If other threads are adding data to the CopyOnWriteArrayList while a read is in progress, the read still sees the old data, because the writer never modifies the old array.

public E get(int index) {
    return get(getArray(), index);
}
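A small runnable demo of this read-sees-old-data behavior (my own example, not from the original article): an iterator obtained before a write keeps walking the old array even after a subsequent add:

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class SnapshotDemo {

    static String iterateAcrossWrite() {
        List<String> list = new CopyOnWriteArrayList<String>();
        list.add("a");
        list.add("b");
        Iterator<String> it = list.iterator(); // snapshot of the current array
        list.add("c");                         // writer swaps in a new array
        StringBuilder sb = new StringBuilder();
        while (it.hasNext())
            sb.append(it.next());              // still sees only "a" and "b"
        return sb + "/" + list.size();
    }

    public static void main(String[] args) {
        System.out.println(iterateAcrossWrite()); // ab/3
    }
}
```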

The JDK does not provide a CopyOnWriteMap, but we can implement one by analogy with CopyOnWriteArrayList. The basic code is as follows:

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class CopyOnWriteMap<K, V> implements Map<K, V>, Cloneable {
    private volatile Map<K, V> internalMap;

    public CopyOnWriteMap() {
        internalMap = new HashMap<K, V>();
    }

    public V put(K key, V value) {
        synchronized (this) {
            Map<K, V> newMap = new HashMap<K, V>(internalMap);
            V val = newMap.put(key, value);
            internalMap = newMap;
            return val;
        }
    }

    public V get(Object key) {
        return internalMap.get(key);
    }

    public void putAll(Map<? extends K, ? extends V> newData) {
        synchronized (this) {
            Map<K, V> newMap = new HashMap<K, V>(internalMap);
            newMap.putAll(newData);
            internalMap = newMap;
        }
    }

    // ... the remaining Map methods are omitted here
}

The implementation is very simple. As long as we understand the copy-on-write mechanism, we can implement all kinds of copy-on-write containers and use them in different scenarios.

Copy-on-write application scenarios

Copy-on-write concurrent containers suit read-heavy, write-light concurrent scenarios, for example reading and updating whitelists, blacklists, or product categories. Suppose we have a search website: users enter keywords in the site's search box, but some keywords are not allowed to be searched. Those forbidden keywords are placed in a blacklist, and the blacklist is updated every night. When a user searches, the current keyword is checked against the blacklist; if it is there, the user is told the search is not allowed. The implementation code is as follows:

package com.ifeve.book;

import java.util.Map;

import com.ifeve.book.forkjoin.CopyOnWriteMap;

/**
 * Blacklist service
 *
 * @author fangtengfei
 */
public class BlackListServiceImpl {

    private static CopyOnWriteMap<String, Boolean> blackListMap =
            new CopyOnWriteMap<String, Boolean>(1000);

    public static boolean isBlackList(String id) {
        return blackListMap.get(id) == null ? false : true;
    }

    public static void addBlackList(String id) {
        blackListMap.put(id, Boolean.TRUE);
    }

    /**
     * Add to the blacklist in bulk
     *
     * @param ids
     */
    public static void addBlackList(Map<String, Boolean> ids) {
        blackListMap.putAll(ids);
    }

}

The code is simple, but there are two things you need to be aware of when using CopyOnWriteMap:

1. Reduce expansion overhead. Initialize the CopyOnWriteMap with an appropriate size, according to actual needs, to avoid the cost of expanding it during writes.

2. Use bulk adds. Because the container is copied on every add, reducing the number of adds reduces the number of copies, as in the addBlackList method in the code above.

The disadvantages of copy-on-write

Copy-on-write containers have many advantages, but they also have two problems: memory footprint and data consistency. You need to pay attention to both during development.

Memory footprint. Because of the copy-on-write mechanism, two copies of the container sit in memory during a write: the old one and the newly written one. (Note: the copy only duplicates the references inside the container; a new object is created only for the element being written, and the objects in the old container are still in use, so it is the two containers of references that coexist.) If these occupy a lot of memory, say around 200 MB, then writing 100 MB of data momentarily occupies 300 MB, which can easily cause frequent young GCs and full GCs. We once used a service in our system that updated a large object every night through the copy-on-write mechanism, causing a 15-second full GC each night, with the application's response time growing accordingly.

For the memory footprint problem, you can compress the elements in the container: for example, if the elements are all 10-digit numbers, consider compressing them into a base-36 or base-64 representation. Or avoid the copy-on-write container and use another concurrent container, such as ConcurrentHashMap, instead.

Data consistency. A copy-on-write container only guarantees eventual consistency of the data; it cannot guarantee real-time consistency. So if you need written data to be readable immediately, do not use a copy-on-write container.
