In the context of concurrency, a non-blocking algorithm is an algorithm that allows threads to access shared state without blocking the other threads involved. In more general terms, an algorithm is said to be non-blocking if the suspension of one thread cannot lead to the suspension of other threads in the algorithm.
To better understand the difference between blocking and non-blocking algorithms, I will first explain blocking algorithms and then move on to non-blocking algorithms.
Blocking concurrency algorithm
A blocking concurrency algorithm generally consists of the following two steps:
Perform the action requested by the thread, or
Block the thread until the action can be performed safely.
Many algorithms and concurrent data structures are blocking. For example, the different implementations of java.util.concurrent.BlockingQueue are all blocking data structures. If a thread tries to insert an element into a BlockingQueue and there is not enough space in the queue, the inserting thread is blocked until the queue has room for the new element.
The following figure shows how a blocking algorithm guards a shared data structure:
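To make this concrete in code, here is a minimal sketch using ArrayBlockingQueue, one of the standard BlockingQueue implementations (the class and thread names are my own, for illustration). The second put() call blocks until the consumer thread frees up space:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BlockingInsertDemo {
    public static void main(String[] args) throws InterruptedException {
        // A bounded queue with capacity 1, so the second put() must wait.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);

        queue.put("first");   // succeeds immediately; the queue is now full

        Thread consumer = new Thread(() -> {
            try {
                Thread.sleep(1000);   // simulate a slow consumer
                queue.take();         // frees up space in the queue
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // Blocks until the consumer removes an element.
        queue.put("second");
        System.out.println("second element inserted");
    }
}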
Non-blocking concurrency algorithm
A non-blocking concurrency algorithm generally consists of the following two steps:
Perform the action requested by the thread, or
Notify the requesting thread that the action could not be carried out.
Java also contains several non-blocking data structures. AtomicBoolean, AtomicInteger, AtomicLong and AtomicReference are all examples of non-blocking data structures.
The following figure shows how a non-blocking algorithm guards a shared data structure:
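As a small code illustration of this behaviour, the sketch below uses AtomicInteger: instead of blocking, compareAndSet() simply reports whether the operation could be carried out, and the caller decides what to do next:

import java.util.concurrent.atomic.AtomicInteger;

public class NonBlockingDemo {
    public static void main(String[] args) {
        AtomicInteger value = new AtomicInteger(5);

        // Attempt to change 5 -> 6. No thread is ever blocked:
        // the call either succeeds or reports failure immediately.
        boolean swapped = value.compareAndSet(5, 6);
        System.out.println("first attempt: " + swapped);   // true

        // A second attempt with the stale expected value 5 fails
        // immediately instead of suspending the thread.
        swapped = value.compareAndSet(5, 7);
        System.out.println("second attempt: " + swapped);  // false
    }
}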
Non-blocking algorithms vs. blocking algorithms
The main difference between blocking and non-blocking algorithms lies in the second step described in the two sections above. In other words, the difference lies in what they do when the requested action cannot be performed right away:
A blocking algorithm blocks the thread until the requested action can be performed. A non-blocking algorithm notifies the requesting thread that the action cannot be performed and returns.
A thread using a blocking algorithm may remain blocked until it is possible to carry out the requested action. Usually it is the actions of other threads that make the first thread's request possible. If such another thread is blocked somewhere else in the program, and therefore cannot perform the action the first thread is waiting for, the first thread remains blocked, possibly until the other thread finally performs the necessary action.
For example, if a thread inserts an element into a full blocking queue, the thread blocks until another thread removes an element from the queue. If the thread that takes elements from the queue is, for some reason, blocked elsewhere in the program, the thread trying to add the new element stays blocked too, either indefinitely or until the consuming thread finally removes an element from the queue.
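For contrast, BlockingQueue also offers a non-blocking style of insert: offer() reports failure immediately instead of suspending the caller. A minimal sketch:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class OfferVsPutDemo {
    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);

        // offer() never blocks: it reports success or failure right away.
        System.out.println(queue.offer("first"));   // true  - the queue had room
        System.out.println(queue.offer("second"));  // false - the queue is full

        // The caller is now free to retry later, drop the element,
        // or do other work - it is not suspended.
    }
}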
Non-blocking concurrent data structure
In a multi-threaded system, threads usually communicate with each other through some kind of data structure. This can be anything from a simple variable to more advanced data structures such as queues and stacks. To ensure correctness, concurrent access to these data structures must be guarded by some concurrency algorithm. The concurrency algorithm is what makes these data structures concurrent data structures.
If the algorithm guarding a concurrent data structure blocks threads, it is called a blocking algorithm, and the data structure is called a blocking, concurrent data structure.
If the algorithm is non-blocking, it is called a non-blocking algorithm, and the data structure is called a non-blocking, concurrent data structure.
Each concurrent data structure is designed to support a certain method of communication, so which concurrent data structure to use depends on your communication needs. In the following sections I will introduce some non-blocking concurrent data structures and explain their respective application scenarios. The working principles of these concurrent data structures should provide some inspiration for the design and implementation of other non-blocking data structures.
Volatile variable
Java volatile variables are variables whose values are always read directly from main memory. When a new value is assigned to a volatile variable, the value is always written back to main memory immediately. This guarantees that the latest value of a volatile variable is always visible to threads running on other CPUs: each thread reads the value of the variable from main memory, rather than from the CPU cache of the CPU the thread is running on.
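A classic use of this visibility guarantee is a stop flag written by one thread and read by another. Here is a minimal sketch (the class and field names are my own, for illustration):

public class StopFlagDemo {
    // Written by one thread, read by another. volatile guarantees the
    // reader sees the new value, because it is always read from main memory.
    private static volatile boolean stopRequested = false;

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (!stopRequested) {
                // do some work
            }
            System.out.println("worker observed the stop flag");
        });
        worker.start();

        Thread.sleep(100);
        stopRequested = true;   // immediately written back to main memory
        worker.join();
    }
}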
Volatile variables are non-blocking. Writing a value to a volatile variable is an atomic operation; it cannot be interrupted. However, a read-update-write sequence on a volatile variable is not atomic. Therefore, the following code may still lead to race conditions if executed by more than one thread:
volatile int myVar = 0;

...

int temp = myVar;
temp++;
myVar = temp;
First, the value of the volatile variable myVar is read from main memory and assigned to the temp variable. Then the temp variable is incremented by 1. Finally, the value of temp is assigned back to myVar, which means it is written back to main memory.
If two threads execute this code, and both read the value of myVar, add one to it, and write the result back to main memory, myVar risks being incremented by only 1 instead of 2.
You may think you would never write code like the above, but in practice the code above is equivalent to the following:
myVar++;
When this code is executed, the value of myVar is read into a CPU register or the local CPU cache, incremented by 1, and the value in the CPU register or CPU cache is then written back to main memory.
The single-writer scenario
In some scenarios only a single thread ever writes to a shared variable, while multiple threads read it. When only one thread updates the variable, no race conditions can occur, no matter how many threads read it. Therefore, you can simply declare the variable volatile.
A race condition occurs only when multiple threads perform a read-update-write sequence on a shared variable. If only one thread performs a read-update-write sequence, and all other threads only perform reads, no race conditions occur.
The following is an example of a single-writer counter. It does not use synchronization, yet it is safe to use concurrently:
public class SingleWriterCounter {

    private volatile long count = 0;

    /**
     * Only one thread may ever call this method,
     * or it will lead to race conditions.
     */
    public void inc() {
        this.count++;
    }

    /**
     * Many reading threads may call this method.
     * @return the current count
     */
    public long count() {
        return this.count;
    }
}
Multiple threads can access the same Counter instance, as long as only one thread ever calls inc(). And I do not mean one thread at a time; I mean that only the same, single thread is ever allowed to call inc(). Multiple threads may call count(). In such a scenario, no race conditions will occur.
The following figure shows how the threads access the count volatile variable:
More advanced data structures based on volatile variables
It is perfectly possible to create data structures out of multiple volatile variables, with each volatile variable written by a single thread (but possibly a different thread per variable) and read by multiple threads. Threads can use such data structures to send messages to each other in a non-blocking way.
The following is a simple example:
public class DoubleWriterCounter {

    private volatile long countA = 0;
    private volatile long countB = 0;

    /**
     * Only one (and always the same) thread may ever call this method,
     * or it will lead to race conditions.
     */
    public void incA() {
        this.countA++;
    }

    /**
     * Only one (and always the same) thread may ever call this method,
     * or it will lead to race conditions.
     */
    public void incB() {
        this.countB++;
    }

    /**
     * Many reading threads may call this method.
     */
    public long countA() {
        return this.countA;
    }

    /**
     * Many reading threads may call this method.
     */
    public long countB() {
        return this.countB;
    }
}
As you can see, DoubleWriterCounter now contains two volatile variables and two pairs of increment and read methods. At any given time, only one single thread may call incA(), and only one single thread may call incB(), but they may be different threads. countA() and countB(), however, may be called by any number of threads. This will not cause race conditions.
A DoubleWriterCounter can be used for inter-thread communication. For instance, countA and countB could store the number of produced tasks and the number of consumed tasks respectively. The following figure shows two threads communicating via a data structure like the one above:
The smart reader will have noticed that the same effect could be achieved with two SingleWriterCounter instances instead of one DoubleWriterCounter. You could even use multiple threads and multiple SingleWriterCounter instances if needed, as sketched below.
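Here is a minimal sketch of that idea (the wrapper class is hypothetical): each counter still has exactly one writing thread, as the SingleWriterCounter contract requires.

public class TwoCounters {
    // Each counter is written by exactly one thread, as the
    // SingleWriterCounter contract requires.
    private final SingleWriterCounter produced = new SingleWriterCounter();
    private final SingleWriterCounter consumed = new SingleWriterCounter();

    // Called only by the producer thread.
    public void onProduced() { produced.inc(); }

    // Called only by the consumer thread.
    public void onConsumed() { consumed.inc(); }

    // May be called by any number of reading threads.
    public long pending() { return produced.count() - consumed.count(); }
}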
Optimistic locking with compare-and-swap (CAS)
If you really need more than one thread to write to the same shared variable, a volatile variable is not suitable. You will need some kind of exclusive lock (a pessimistic lock) to access the variable. The following code demonstrates exclusive access using a Java synchronized block:
public class SynchronizedCounter {

    long count = 0;

    public void inc() {
        synchronized (this) {
            count++;
        }
    }

    public long count() {
        synchronized (this) {
            return this.count;
        }
    }
}
Notice how both the inc() and count() methods contain a synchronized block. This is exactly what we would like to avoid: synchronized blocks and wait()-notify() calls.
Instead of the two synchronized blocks, we can use one of Java's atomic variables, in this case AtomicLong. Below is an AtomicLong-based version of the SynchronizedCounter class, here called AtomicCounter:
import java.util.concurrent.atomic.AtomicLong;

public class AtomicCounter {

    private AtomicLong count = new AtomicLong(0);

    public void inc() {
        boolean updated = false;
        while (!updated) {
            long prevCount = this.count.get();
            updated = this.count.compareAndSet(prevCount, prevCount + 1);
        }
    }

    public long count() {
        return this.count.get();
    }
}
Notice the implementation of the inc() method. It no longer contains a synchronized block. Instead, it has been replaced by the following code:
boolean updated = false;
while (!updated) {
    long prevCount = this.count.get();
    updated = this.count.compareAndSet(prevCount, prevCount + 1);
}
The above code is not an atomic operation. That means it is possible for two different threads to call inc() and both execute long prevCount = this.count.get(), so both obtain the previous count of the counter. Yet, the code above contains no race conditions.
The secret lies in the second line of code inside the while loop. The compareAndSet() method call is an atomic operation. It compares the internal value of the AtomicLong to an expected value, and if the two values are equal, it sets the internal value of the AtomicLong to a new value. compareAndSet() is typically supported directly by compare-and-swap instructions in the CPU. Therefore no synchronization is necessary, and no thread suspension is necessary.
Assume that the internal value of the AtomicLong is 20. Both threads then try to call compareAndSet(20, 20 + 1). Although compareAndSet() is an atomic operation, the method will be executed by the two threads one after another (one at a time).
The first thread compares the expected value of 20 (the previous value of the counter) to the internal value of the AtomicLong. Since the two values are equal, the AtomicLong updates its internal value to 21 (20 + 1). The updated variable is set to true, and the while loop stops.
Now the second thread calls compareAndSet(20, 20 + 1). Since the internal value of the AtomicLong is no longer 20, this call fails. The internal value of the AtomicLong is not set to 21 again. The updated variable is set to false, and the thread spins through the while loop once more. This time it reads the value 21 and attempts to update it to 22. If no other thread has called inc() in the meantime, the second iteration succeeds in updating the AtomicLong to 22.
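To see the counter in action, here is a small usage sketch: two threads incrementing the AtomicCounter concurrently still produce the correct total, without any thread ever being suspended on a lock.

public class AtomicCounterDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicCounter counter = new AtomicCounter();

        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter.inc();   // retries internally if a CAS fails
            }
        };

        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        System.out.println(counter.count());   // always 200000
    }
}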
Why is it called optimistic locking?
The code shown in the previous section is called optimistic locking. Optimistic locking is different from traditional locking, also called pessimistic locking. Traditional locking blocks access to the critical section with a synchronized block or some other kind of lock, which may result in threads being suspended.
Optimistic locking allows all threads to create a copy of the shared memory without any blocking. The threads may then modify their copies and attempt to write the modified version back into shared memory. If no other thread has modified the shared memory, the compare-and-swap operation allows the thread to write its changes back. If another thread has already modified the shared memory, the thread has to obtain a new copy, make its changes, and attempt to write them back to shared memory again.
The reason this is called optimistic locking is that threads obtain a copy of the data they want to modify and apply their modifications under the optimistic assumption that no other thread will have modified the shared memory in the meantime. When this optimistic assumption holds, the thread gets to update the shared memory without locking. When the assumption fails, the work done by the thread is discarded, but still without any locking.
Optimistic locking tends to work best when contention on the shared memory is low to moderate. If contention is very high, threads waste a lot of CPU cycles copying and modifying data only to have their updates fail. However, if you have that much contention on the shared memory, you should in any case consider redesigning your code to lower the contention.
Optimistic locking is non-blocking
The optimistic locking mechanism shown here is non-blocking. If a thread obtains a copy of the shared memory and gets blocked (for whatever reason) while trying to modify it, no other threads are blocked from accessing the shared memory.
With a traditional lock/unlock paradigm, when a thread holds a lock, all other threads are blocked until the thread holding the lock releases it again. If the thread holding the lock is blocked somewhere, the lock is not released for a very long time, possibly never.
Non-swappable data structures
Simple CAS-based optimistic locking works for shared data structures where the entire data structure can be swapped for a new one in a single CAS operation. However, swapping out the whole data structure for a modified copy is not always feasible.
Imagine the shared data structure is a queue. Each thread trying to insert or take an element would have to copy the whole queue and make its desired modification on the copy. This could be achieved with an AtomicReference: copy the reference, copy and modify the queue, and then try to swap the reference in the AtomicReference so it points to the new queue.
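Here is a minimal sketch of that copy-and-swap approach (the class is illustrative, and it assumes the queues installed in the AtomicReference are never mutated after being published):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicReference;

public class CopySwapQueue<E> {

    private final AtomicReference<Deque<E>> ref =
            new AtomicReference<>(new ArrayDeque<>());

    public void add(E element) {
        while (true) {
            Deque<E> current = ref.get();               // 1. read the reference
            Deque<E> copy = new ArrayDeque<>(current);  // 2. copy the queue
            copy.add(element);                          // 3. modify the copy
            // 4. try to swap in the modified copy; retry if another
            //    thread swapped in a different queue in the meantime.
            if (ref.compareAndSet(current, copy)) {
                return;
            }
        }
    }
}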
However, a large data structure may require a lot of memory and CPU cycles to copy. This makes your program consume much more memory and waste a lot of time on copying, which reduces performance, especially if contention on the data structure is high. Furthermore, the longer a thread spends copying and modifying the data structure, the higher the probability that some other thread modifies it in the meantime. As you know, if another thread modifies the data structure after it has been copied, all the other threads have to perform the copy-modify cycle all over again. This amplifies the performance and memory penalties even further.
The following sections describe a way to implement non-blocking data structures that can be modified concurrently, rather than just copied and replaced.
Sharing expected modifications
Instead of copying and modifying the entire data structure, a thread can share its expected modification of the shared data structure. The process for a thread wanting to modify the data structure becomes:
Check whether another thread has already submitted an expected modification to the data structure.
If no other expected modification has been submitted, create an expected modification and submit it to the data structure.
Carry out the modification of the shared data structure.
Remove the reference to the expected modification to signal to other threads that the expected modification has been carried out.
As you can see, the second step can block other threads from submitting an expected modification. Thus the second step effectively works as a lock on the data structure. If one thread successfully submits an expected modification, no other thread can submit one until the first expected modification has been carried out.
If a thread submits an expected modification and then gets stuck doing some other work, the shared data structure is effectively locked. The other threads can detect that they cannot submit an expected modification and go back to doing something else. Obviously, this needs to be addressed.
Completable expected modifications
To avoid a submitted expected modification locking the shared data structure, the submitted expected modification object must contain enough information for another thread to complete the modification. Thus, if the thread that submitted the expected modification never completes it, another thread can complete the modification on its behalf, keeping the shared data structure available to other threads.
The following figure shows the blueprint of the non-blocking algorithm described above:
The modifications must be carried out as one or more CAS operations. Thus, if two threads try to complete the same expected modification, only one thread gets to carry out each CAS operation. As soon as a CAS operation has been completed, any further attempt to complete that same CAS operation fails.
The A-B-A problem
The algorithm demonstrated above can suffer from the A-B-A problem. The A-B-A problem refers to the situation where a variable is changed from A to B and then back to A again, making it impossible for another thread to detect that the variable was indeed changed.
If thread A checks for ongoing updates, copies the data structure, and is then suspended by the thread scheduler, a thread B may access the shared data structure in the meantime. If thread B performs a complete update of the data structure and removes its expected modification, it will look to thread A as if no modification has taken place since it copied the data structure. However, a modification did take place. When thread A goes on to perform its update based on its now out-of-date copy of the data structure, thread B's modification will be undone.
The following figure illustrates the A-B-A problem mentioned above:
Solutions to the A-B-A problem
A common solution to the A-B-A problem is to swap not just a pointer to an expected modification object, but the pointer combined with a counter, where both are swapped in a single CAS operation. This is feasible in languages with pointers, such as C and C++. Thus, even if the current-modification pointer is swapped back to point to "no ongoing modification", the counter part of the pointer+counter will have been incremented, making the change visible to other threads.
In Java, you cannot merge a reference and a counter into a single variable. However, Java provides the AtomicStampedReference class, which can swap a reference and a stamp atomically using a single CAS operation.
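Here is a minimal sketch of how AtomicStampedReference detects an A-B-A change: the stamp is incremented on every swap, so even though the reference is back to its original value, the stale stamp makes the final compareAndSet() fail.

import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaDemo {
    public static void main(String[] args) {
        AtomicStampedReference<String> ref =
                new AtomicStampedReference<>("A", 0);

        int[] stampHolder = new int[1];
        String seen = ref.get(stampHolder);     // reads "A" with stamp 0
        int seenStamp = stampHolder[0];

        // Another thread changes A -> B -> A, bumping the stamp each time.
        ref.compareAndSet("A", "B", 0, 1);
        ref.compareAndSet("B", "A", 1, 2);

        // The reference is "A" again, but the stale stamp exposes the change.
        boolean swapped = ref.compareAndSet(seen, "C", seenStamp, seenStamp + 1);
        System.out.println(swapped);            // false - A-B-A detected
    }
}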
A non-blocking algorithm template
The following code template is intended to provide some inspiration for implementing non-blocking algorithms. The template is based on the descriptions given earlier in this tutorial.
Note: I am not an expert in non-blocking algorithms, so the template below may well have errors in it. Do not base your own non-blocking algorithm implementation on my template. It is only intended to give you an idea of what a non-blocking algorithm could look like. If you want to implement your own non-blocking algorithms, study some real, working non-blocking algorithm implementations first, to learn more about how non-blocking algorithms are implemented in practice.
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicStampedReference;

public class NonblockingTemplate {

    public static class IntendedModification {
        public AtomicBoolean completed = new AtomicBoolean(false);
    }

    private AtomicStampedReference<IntendedModification> ongoingMod =
            new AtomicStampedReference<IntendedModification>(null, 0);

    // declare the state of the data structure here.

    public void modify() {
        while (!attemptModifyASR());
    }

    public boolean attemptModifyASR() {
        boolean modified = false;

        IntendedModification currentlyOngoingMod = ongoingMod.getReference();
        int stamp = ongoingMod.getStamp();

        if (currentlyOngoingMod == null) {
            // copy data structure - for use in intended modification

            // prepare intended modification
            IntendedModification newMod = new IntendedModification();

            boolean modSubmitted =
                    ongoingMod.compareAndSet(null, newMod, stamp, stamp + 1);

            if (modSubmitted) {
                // complete modification via a series of compare-and-swap operations.
                // note: other threads may assist in completing the compare-and-swap
                // operations, so some CAS may fail.
                modified = true;
            }
        } else {
            // attempt to complete the ongoing modification, so the data structure
            // is freed up to allow access from this thread.
            modified = false;
        }

        return modified;
    }
}
Non-blocking algorithms are not easy to implement
Designing and implementing non-blocking algorithms correctly is not easy. Before attempting to design your own non-blocking algorithm, check whether someone has already designed a non-blocking algorithm that meets your needs.
Java already comes with some non-blocking implementations (e.g. ConcurrentLinkedQueue), and I believe more non-blocking algorithms will be implemented in future Java versions.
In addition to Java's built-in non-blocking data structures, there are many open source non-blocking data structures available, for example the LMAX Disruptor and the non-blocking HashMap implemented by Cliff Click. Check my Java concurrency references page for more resources.
Benefits of using non-blocking algorithms
Non-blocking algorithms have several advantages over blocking algorithms. Let's take a look at each of them below.
Choice
The first benefit of non-blocking algorithms is that the thread gets a choice about what to do when the requested action cannot be performed. Instead of being blocked, the requesting thread can choose what to do next. Sometimes there is nothing it can do; in that case it can choose to block or wait itself, freeing up the CPU for other tasks. But at least the requesting thread is given the choice.
On a single-CPU system it perhaps makes sense to suspend a thread that cannot perform its requested action, so that other threads can get the CPU. But even on a single-CPU system, blocking may lead to deadlock, thread starvation and other concurrency problems.
No deadlock
The second benefit of non-blocking algorithms is that the suspension of one thread cannot lead to the suspension of other threads. This also means that deadlock cannot occur: two threads can never end up waiting for locks held by each other, because threads are never blocked waiting when they cannot perform their requested actions. Non-blocking algorithms may still result in livelock, though, where two threads keep attempting some action but are repeatedly told that it cannot be performed (because of the other thread's actions).
No threads suspended
Suspending and reactivating a thread is costly. Yes, operating systems and thread libraries have become more efficient over time, so the cost of thread suspension and reactivation has gone down. However, there is still a high price to pay whenever a thread has to be suspended and reactivated.
With blocking algorithms, a thread is suspended whenever it cannot perform its requested action, incurring this suspension-and-reactivation overhead. Since threads are never suspended in non-blocking algorithms, this overhead does not occur. The CPU can spend more time executing actual business logic and less time on context switching.
On a system with multiple CPUs, blocking algorithms can have an even more significant impact. A thread running on CPU A may be blocked waiting for a thread running on CPU B, which lowers the level of parallelism the program is inherently capable of. Of course, CPU A can schedule another thread to run, but suspending and reactivating threads (context switches) is expensive. The fewer threads that need to be suspended, the better.
Reduced thread latency
Latency here means the time between a request being made and the thread actually executing it. Since threads are not suspended in non-blocking algorithms, they do not have to pay the expensive, slow cost of reactivation. That means requests can be responded to faster, with lower latency.
Non-blocking algorithms often busy-wait until the requested action can be performed, which lowers the latency. Of course, in a system with high thread contention on the non-blocking data structure, CPUs may end up burning a lot of cycles in these busy waits. This is worth keeping in mind: non-blocking algorithms may not be the best choice if your data structure is under very high thread contention. However, there are often ways to lower the contention by redesigning your program.