I. Background
Containers are among the most heavily used components in Java programming, but the basic containers that Java provides by default (ArrayList, HashMap, etc.) are not thread-safe. What should a programmer do when containers meet multithreaded concurrent programming?
There are usually two options:
1. Use the synchronized keyword to serialize operations on the container, so that only one operation runs against a given container at any time. Wrapped containers such as Vector and Hashtable are in essence the same solution; the only difference is that we do not have to write the synchronized keyword ourselves.
2. Use the concurrent containers provided in the java.util.concurrent package, such as the commonly used ConcurrentHashMap and CopyOnWriteArrayList.
The advantage of the first option is that it is quick to pick up, simple and direct, and easy to debug. If performance is not a concern, it has almost no restrictions on where it can be used, and it guarantees strong consistency of data operations. Its drawback is equally obvious: every operation locks the entire container, so operating on the container under high concurrency causes a sharp drop in performance.
The advantage of the second option is that the containers in the concurrent package are usually heavily optimized for performance under high concurrency. The drawback is that their implementations are relatively complex, their usage scenarios are somewhat restricted, and they generally guarantee only weak consistency of data operations.
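The two options can be compared side by side. The following is only a minimal sketch: the class name TwoOptionsDemo, the key "hits", and the thread counts are made up for illustration. It counts increments from several threads, first through a synchronized wrapper and then through ConcurrentHashMap.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class; names and sizes are illustrative assumptions.
class TwoOptionsDemo {
    static final int THREADS = 4;
    static final int INCREMENTS = 1_000;

    // Option 1: a synchronized wrapper. Each single call is locked, but the
    // get-then-put below is a compound operation, so we also hold the
    // wrapper's monitor around it. Every thread contends for this one lock.
    static int countWithSynchronizedMap() {
        Map<String, Integer> map = Collections.synchronizedMap(new HashMap<>());
        runInParallel(() -> {
            for (int i = 0; i < INCREMENTS; i++) {
                synchronized (map) {
                    Integer old = map.get("hits");
                    map.put("hits", old == null ? 1 : old + 1);
                }
            }
        });
        return map.get("hits");
    }

    // Option 2: a concurrent container. merge() performs the
    // read-modify-write atomically without locking the whole map.
    static int countWithConcurrentMap() {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        runInParallel(() -> {
            for (int i = 0; i < INCREMENTS; i++) {
                map.merge("hits", 1, Integer::sum);
            }
        });
        return map.get("hits");
    }

    // Runs the same task on THREADS threads and waits for all of them.
    static void runInParallel(Runnable task) {
        Thread[] workers = new Thread[THREADS];
        for (int i = 0; i < THREADS; i++) {
            workers[i] = new Thread(task);
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(countWithSynchronizedMap()); // 4000
        System.out.println(countWithConcurrentMap());   // 4000
    }
}
```

Both variants produce the same count; they differ in how much of the container each update has to lock.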
This article focuses on the typical design ideas and implementation principles behind the concurrent containers. Once readers understand these ideas, they will also better understand the limits on where concurrent containers can be used.
II. Design Ideas Behind ConcurrentHashMap
ConcurrentHashMap is implemented differently before JDK 1.8 and in JDK 1.8. Many excellent articles on the internet already cover the implementation details, for example:
1. "JDK1.7 ConcurrentHashMap Principle Analysis"
2. "JDK1.8 ConcurrentHashMap Principle Analysis"
3. "ConcurrentHashMap in JDK1.7 and JDK1.8 Compared"
We will not repeat them here.
This article instead uses plain, easy-to-follow language to help readers quickly grasp the principles behind the high concurrency of ConcurrentHashMap in JDK 1.8.
2.1 Review of How an Ordinary HashMap Works
First, we briefly review the implementation principle of an ordinary HashMap.
As shown in the figure, we abstract each entry stored in the map as a node. Each node is mapped to a slot of the table (an array of nodes) according to the hash of its key, and is stored there. If there is a hash collision (that is, the keys of two nodes hash to the same slot), the new node is appended to a linked list hanging off that slot. If one slot stores too many nodes in its linked list (more than 8), the list is converted to a red-black tree, which avoids traversing a long list on lookup and reduces the time complexity of querying a node. (In JDK 1.8 this conversion also requires the table to have at least 64 slots; otherwise the table is resized instead.) When the total number of nodes in the map exceeds the table length multiplied by the load factor (default 0.75), the map doubles the table length to reduce the probability of hash collisions.
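The slot mapping and the resize threshold described above can be sketched in a few lines. The class HashSlotDemo and its method names are hypothetical; the bit-mixing step mirrors what java.util.HashMap does in JDK 1.8, and the rest is a simplification rather than the real implementation.

```java
// Hypothetical helper class illustrating slot selection and the resize threshold.
class HashSlotDemo {
    // Mix the high 16 bits of hashCode() into the low 16 bits,
    // as java.util.HashMap does in JDK 1.8.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // The table length is a power of two, so masking with (length - 1)
    // selects a slot in the range [0, length).
    static int slotFor(Object key, int tableLength) {
        return (tableLength - 1) & spread(key.hashCode());
    }

    // Number of nodes above which the table is doubled.
    static int resizeThreshold(int tableLength, float loadFactor) {
        return (int) (tableLength * loadFactor);
    }

    public static void main(String[] args) {
        // "Aa" and "BB" have the same hashCode (2112), so they collide.
        System.out.println(slotFor("Aa", 16));          // 0
        System.out.println(slotFor("BB", 16));          // 0
        System.out.println(resizeThreshold(16, 0.75f)); // 12
    }
}
```

With a table of 16 slots and the default load factor, the table is doubled once it holds more than 12 nodes.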
2.2 ConcurrentHashMap Concurrency Optimization Idea One: Narrow the Scope of the Lock (Lock Segmentation)
The traditional Hashtable performs poorly under concurrency because its lock is too coarse: updating any piece of data locks the whole map.
In fact, it is easy to see from how HashMap is implemented that it naturally stores data in segments with clear boundaries: each slot of the table can be regarded as one storage segment. If we narrow the lock granularity to a single storage segment, then updating a piece of data only needs to lock the segment that the data belongs to. The head node of each storage segment can serve as the lock object.
The core source code in JDK1.8 is as follows:
    Node<K,V> f = tabAt(tab, i); // take the head node of the specified slot in tab
    synchronized (f) {           // lock the head node
        // ... ...
    }
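To make the idea of one lock per storage segment concrete, here is a small sketch. SlotLockedMap is a hypothetical class written for this article, not the JDK implementation: each slot is a plain HashMap that doubles as its own lock object, much as the head node does above. Unlike ConcurrentHashMap, this simplified version also locks reads.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of per-slot locking; not how the JDK class is written.
class SlotLockedMap<K, V> {
    private final Map<K, V>[] slots;

    @SuppressWarnings("unchecked")
    SlotLockedMap(int slotCount) {
        slots = (Map<K, V>[]) new Map[slotCount];
        for (int i = 0; i < slotCount; i++) {
            slots[i] = new HashMap<>();
        }
    }

    // Pick the slot for a key; threads touching different slots never contend.
    private Map<K, V> slotFor(Object key) {
        int h = key.hashCode();
        h ^= (h >>> 16);
        return slots[(h & 0x7fffffff) % slots.length];
    }

    V put(K key, V value) {
        Map<K, V> slot = slotFor(key);
        synchronized (slot) { // lock only the slot this key belongs to
            return slot.put(key, value);
        }
    }

    V get(Object key) {
        Map<K, V> slot = slotFor(key);
        synchronized (slot) {
            return slot.get(key);
        }
    }

    // Sums the slots one at a time; the total is exact only when no writer is active.
    int size() {
        int total = 0;
        for (Map<K, V> slot : slots) {
            synchronized (slot) {
                total += slot.size();
            }
        }
        return total;
    }

    // Four threads insert 1000 distinct keys each.
    static int demo() {
        SlotLockedMap<Integer, Integer> map = new SlotLockedMap<>(16);
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            final int base = t * 1_000;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 1_000; i++) {
                    map.put(base + i, i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 4000
    }
}
```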
If a slot has no head node yet (that is, the head node is null), there is nothing to lock, because we cannot synchronize on null. How, then, do we avoid the concurrency conflicts that can occur when the first node is inserted into a slot?
We can use CAS (Compare and Swap, also called Compare and Set) for the first insertion into a slot. The core principle of CAS is to check, before updating a piece of data, whether its value is still the old value read earlier. If no other thread has modified it, the value is changed directly to the new value; if another thread has modified it, the update fails. The two steps, checking the old value and setting the new one, are carried out by a single instruction provided by the CPU, which guarantees atomicity.
Combining CAS with repeated retries on failure lets us update data without locking. Under low contention a CAS rarely fails, so the retries usually do not consume much CPU. (This is the idea behind optimistic locking and spin locks.)
The core source code in JDK1.8 is as follows:
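    for (Node<K,V>[] tab = table;;) {
        // ... ...
        if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
            if (casTabAt(tab, i, null, new Node<K,V>(hash, key, value, null)))
                break; // CAS succeeded: the node is inserted, exit the loop
            // CAS failed: start the next iteration and re-read the head node
        }
    }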
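The same pattern can be tried outside the JDK source with the atomic classes in java.util.concurrent.atomic. The sketch below is only an illustration under assumed names (CasRetryDemo and its methods are hypothetical): it shows a CAS retry loop on a counter, and several threads racing to insert the first value into an empty slot, where exactly one CAS succeeds.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.IntConsumer;

// Hypothetical demo of CAS plus retry; not taken from the JDK source.
class CasRetryDemo {
    // Optimistic update: read the old value, try to swap in old + 1,
    // and retry if another thread changed the value in between.
    static int incrementWithCas(AtomicInteger counter) {
        for (;;) {
            int old = counter.get();
            if (counter.compareAndSet(old, old + 1)) {
                return old + 1;
            }
            // CAS failed: loop again and re-read the current value
        }
    }

    // Several threads try to insert into the same empty slot.
    // Returns how many of them succeeded.
    static int raceForEmptySlot(int threadCount) {
        AtomicReferenceArray<String> table = new AtomicReferenceArray<>(16);
        AtomicInteger winners = new AtomicInteger();
        runAll(threadCount, id -> {
            // Succeeds only if slot 0 is still null at the moment of the swap.
            if (table.compareAndSet(0, null, "node-" + id)) {
                incrementWithCas(winners);
            }
        });
        return winners.get();
    }

    static int countWithCas(int threadCount, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        runAll(threadCount, id -> {
            for (int i = 0; i < perThread; i++) {
                incrementWithCas(counter);
            }
        });
        return counter.get();
    }

    // Starts threadCount threads, passes each its index, and waits for all.
    static void runAll(int threadCount, IntConsumer body) {
        Thread[] workers = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            workers[t] = new Thread(() -> body.accept(id));
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(raceForEmptySlot(8));   // 1
        System.out.println(countWithCas(4, 1000)); // 4000
    }
}
```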
2.3 ConcurrentHashMap Concurrency Optimization Idea Two: Lock Only Updates, Never Reads (Weak Consistency)
ConcurrentHashMap does not lock read operations. It guarantees that reading the value of a given key returns the result of the most recently completed update to that key. More formally, the last completed update to key A happens-before any subsequent read of key A.
Note: happens-before is a partial-order relation that the Java Memory Model defines between two actions (action A and action B). If action A happens-before action B, the result of action A is guaranteed to be visible to action B, even though the compiler and the CPU are allowed to reorder instructions.
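A small experiment can make the happens-before guarantee tangible. This is a sketch under assumptions (the class HappensBeforeDemo and the key "A" are illustrative): the writer fills a plain, non-volatile array and then publishes it with put(); a reader that obtains the array through get() is guaranteed to see the value written before the put().

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo: visibility of writes made before put() to a reader after get().
class HappensBeforeDemo {
    static int publishAndRead() {
        ConcurrentHashMap<String, int[]> map = new ConcurrentHashMap<>();

        Thread writer = new Thread(() -> {
            int[] payload = new int[1];
            payload[0] = 42;       // plain write, no volatile and no lock
            map.put("A", payload); // the update to key A publishes the array
        });
        writer.start();

        // Lock-free read: poll until the update to key A has completed.
        int[] seen;
        while ((seen = map.get("A")) == null) {
            Thread.yield();
        }
        // The put() happens-before this get(), so the earlier plain write is visible.
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead()); // 42
    }
}
```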
Because reads are not locked, a read may overlap with writes from other threads, and ConcurrentHashMap may expose the intermediate state of another thread's write operation. For example, if get operations run concurrently with a putAll, a get may see only part of the data being inserted. Likewise, the result of a concurrent size operation is not exact; it can be used only as an estimate and not for precise control-flow decisions. As another example, when one thread traverses the map with an iterator while another thread is deleting entries, entries that have not yet been deleted when the iterator reaches them are returned, and entries that have already been deleted are not (no ConcurrentModificationException is thrown).
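The iterator behavior is easy to observe even in a single thread. The following sketch (WeakIterationDemo is a hypothetical class) removes an entry while a traversal is in progress: the fail-fast iterator of HashMap throws ConcurrentModificationException, while the weakly consistent iterator of ConcurrentHashMap simply carries on.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo contrasting fail-fast and weakly consistent iterators.
class WeakIterationDemo {
    // Puts the keys 0..4 into the given map.
    static Map<Integer, String> fill(Map<Integer, String> map) {
        for (int i = 0; i < 5; i++) {
            map.put(i, "v" + i);
        }
        return map;
    }

    // Removes key 4 from the map while iterating over it.
    // Returns true if the traversal threw ConcurrentModificationException.
    static boolean removeWhileIterating(Map<Integer, String> map) {
        try {
            for (Integer key : map.keySet()) {
                map.remove(4);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(removeWhileIterating(fill(new HashMap<>())));           // true
        System.out.println(removeWhileIterating(fill(new ConcurrentHashMap<>()))); // false
    }
}
```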
III. Design Ideas Behind CopyOnWriteArrayList
3.1 CopyOnWriteArrayList Concurrency Optimization Idea: Copy-on-Write and Weak Consistency
Copy-on-write means that any operation that changes the CopyOnWriteArrayList (add, set, and so on) is implemented internally by first making a copy of the list's underlying array and then applying the modification to that copy. The copy is a new array holding the same element references; the elements themselves are not cloned. Once the modification is complete, the new copy replaces the original underlying array of the CopyOnWriteArrayList.
The core code in JDK1.8 is as follows:
    public boolean add(E e) {
        final ReentrantLock lock = this.lock;
        lock.lock();
        try {
            Object[] elements = getArray();
            int len = elements.length;
            Object[] newElements = Arrays.copyOf(elements, len + 1); // copy the underlying array
            newElements[len] = e;                                    // make the modification on the copy
            setArray(newElements);                                   // replace the underlying array with the copy once the modification is complete
            return true;
        } finally {
            lock.unlock();
        }
    }
The benefit of copy-on-write is that no read operation needs a lock, and every read is guaranteed to see a complete snapshot of the list as of that moment. For example, once a CopyOnWriteArrayList iterator has been created, it sees the state of the list at the moment of its creation no matter how the list changes afterwards; changes made to the list by any other thread are invisible to this iterator. Unlike a ConcurrentHashMap iterator, it never observes an intermediate state of the container while another thread is modifying it. Because a CopyOnWriteArrayList read in progress cannot see the latest changes, CopyOnWriteArrayList is also weakly consistent.
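The snapshot behavior of the iterator can be checked with a short sketch (SnapshotIteratorDemo is a hypothetical class name): the list is modified after the iterator has been created, yet the iterator still returns the elements it started with.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo of the snapshot iterator of CopyOnWriteArrayList.
class SnapshotIteratorDemo {
    static List<String> iterateAfterChanges() {
        CopyOnWriteArrayList<String> list =
                new CopyOnWriteArrayList<>(Arrays.asList("a", "b"));

        Iterator<String> it = list.iterator(); // pinned to the array [a, b]

        // Each change below swaps in a new underlying array;
        // the iterator keeps reading the old one.
        list.add("c");
        list.remove("a");

        List<String> seen = new ArrayList<>();
        while (it.hasNext()) {
            seen.add(it.next());
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(iterateAfterChanges()); // [a, b]
    }
}
```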
CopyOnWriteArrayList also guarantees that a read operation sees the result of the most recently completed update.
Because every modification requires a full copy of the underlying array, copy-on-write carries extra performance overhead on writes. It is therefore best suited to data access scenarios with many reads and few writes.
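A listener registry is a typical read-mostly case: listeners are registered rarely and notified often. The sketch below is an assumption-laden illustration (ListenerRegistryDemo and its methods are hypothetical), showing that a listener can even unregister itself during notification, because the traversal runs over a snapshot.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical read-mostly use case for CopyOnWriteArrayList.
class ListenerRegistryDemo {
    private final List<Runnable> listeners = new CopyOnWriteArrayList<>();

    void register(Runnable listener) {
        listeners.add(listener); // rare write: copies the array
    }

    void unregister(Runnable listener) {
        listeners.remove(listener); // rare write: copies the array
    }

    void fire() {
        // Frequent read: lock-free traversal over a snapshot of the array.
        for (Runnable listener : listeners) {
            listener.run();
        }
    }

    static int demo() {
        ListenerRegistryDemo registry = new ListenerRegistryDemo();
        AtomicInteger calls = new AtomicInteger();

        // A one-shot listener that removes itself while being notified.
        Runnable[] oneShot = new Runnable[1];
        oneShot[0] = () -> {
            calls.incrementAndGet();
            registry.unregister(oneShot[0]);
        };
        registry.register(oneShot[0]);
        registry.register(calls::incrementAndGet);

        registry.fire(); // both listeners run
        registry.fire(); // only the permanent listener runs
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 3
    }
}
```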
IV. Summary
1. Reads on both ConcurrentHashMap and CopyOnWriteArrayList are lock-free, so a read does not wait for writes that other threads currently have in progress and is not guaranteed to see them. Neither container suits scenarios that require strong data consistency.
2. Both ConcurrentHashMap and CopyOnWriteArrayList guarantee that a read sees the write operations that have already completed.
3. A ConcurrentHashMap read may observe the intermediate state of a write that another thread is performing on the container at the same time. A CopyOnWriteArrayList read only ever sees a snapshot of the container as of the moment of the read.
4. ConcurrentHashMap uses lock segmentation to narrow the scope of each lock and increase write concurrency. CopyOnWriteArrayList uses copy-on-write so that concurrent writes never interfere with read operations that have already started.
5. ConcurrentHashMap suits high-concurrency scenarios that do not require strongly consistent data access. CopyOnWriteArrayList suits high-concurrency scenarios that read often, write rarely, and can tolerate reading a historical snapshot.
Java Advanced Knowledge Point 6: Design Ideas Behind Concurrent Containers: Lock Segmentation, Copy-on-Write, and Weak Consistency