http://www.ibm.com/developerworks/cn/java/j-jtp11234/
Fifteen years ago, a multiprocessor system was a highly specialized machine costing hundreds of thousands of dollars (and most had two to four processors). Today, multiprocessor systems are cheap and plentiful: nearly every major microprocessor has built-in multiprocessing support, and many support dozens or hundreds of processors.
To exploit a multiprocessor system, applications are usually structured using multiple threads. But as anyone who writes concurrent applications can tell you, simply dividing the work across threads is not enough to achieve good hardware utilization; you must also ensure that threads spend most of their time actually doing work, rather than waiting for more work or waiting for locks on shared data structures.
The problem: coordination between threads
Few tasks can be truly parallelized in a way that requires no coordination between threads at all. Consider a thread pool, where the tasks executed are usually independent of one another. If the pool feeds off a common work queue, then the process of removing elements from or adding elements to the work queue must be thread-safe, which means coordinating access to the head, tail, and the link pointers between nodes. And it is this coordination that causes all the trouble.
The standard approach: locking
In Java, the traditional way to coordinate access to shared fields is to use synchronization, ensuring that all access to shared fields is done while holding the appropriate lock. With synchronization, you are assured (assuming the class is written correctly) that whichever thread holds the lock protecting a given set of variables has exclusive access to those variables, and that changes to those variables will be visible to other threads when they subsequently acquire the lock. The downside is that if the lock is heavily contended (threads frequently ask for it while another thread holds it), throughput suffers, because contended synchronization is quite expensive. (Public service announcement: uncontended synchronization is now quite inexpensive on modern JVMs.)
Another problem with lock-based algorithms is that if a thread holding a lock is delayed (due to a page fault, a scheduling delay, or another unexpected delay), then no thread requiring that lock can make progress.
Volatile variables can also be used to store shared values at a lower cost than synchronization, but they have limitations. While writes to volatile variables are guaranteed to be immediately visible to other threads, there is no way to render a read-modify-write sequence of operations atomic, which means, for example, that a volatile variable cannot be used to reliably implement a mutex (mutual-exclusion lock) or a counter.
Implementing counters and mutexes with locking
Consider developing a thread-safe counter class, which would expose get(), increment(), and decrement() operations. Listing 1 shows how this class might be implemented using locking (synchronization). Note that all the methods, even get(), need to be synchronized for the class to be thread-safe, to ensure that no updates are lost and that all threads see the most recent value of the counter.
Listing 1. Synchronized counter class
```java
public class SynchronizedCounter {
    private int value;

    public synchronized int getValue() { return value; }
    public synchronized int increment() { return ++value; }
    public synchronized int decrement() { return --value; }
}
```
The increment() and decrement() operations are atomic read-modify-write operations: to safely increment the counter, you must take the current value, add one to it, and write the new value out, all as a single operation that cannot be interrupted by another thread. Otherwise, if two threads tried to execute the increment at the same time, an unlucky interleaving of operations could result in the counter being incremented only once instead of twice. (Note that you cannot accomplish this reliably by making the value instance variable volatile.)
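To make the unlucky interleaving concrete, the following sketch (names hypothetical, not from the article) replays it deterministically in a single thread: both "threads" read the counter before either writes back, so one increment is lost.

```java
// Deterministic replay of the lost-update interleaving: two increments,
// but both readers see the old value before either writes, so the
// counter ends at 1 rather than 2.
public class LostUpdateDemo {
    private int value = 0;

    public int read() { return value; }        // the "read" step
    public void write(int v) { value = v; }    // the "write" step

    public static int simulateRace() {
        LostUpdateDemo counter = new LostUpdateDemo();
        int seenByA = counter.read();   // thread A reads 0
        int seenByB = counter.read();   // thread B reads 0 -- before A writes!
        counter.write(seenByA + 1);     // A writes 1
        counter.write(seenByB + 1);     // B also writes 1: A's update is lost
        return counter.read();          // 1, not 2
    }

    public static void main(String[] args) {
        System.out.println(simulateRace());  // prints 1
    }
}
```

With synchronized increment(), the read and write steps cannot be interleaved this way, which is exactly what Listing 1 buys you.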
This atomic read-modify-write combination shows up in many concurrent algorithms. The code in Listing 2 implements a simple mutex, whose acquire() method is also an atomic read-modify-write operation. To acquire the mutex, you must see that no one else owns it (curOwner == null), and then record the fact that you own it (curOwner = Thread.currentThread()), all without the possibility of another thread coming along in the middle and modifying the curOwner field.
Listing 2. Synchronized mutex
```java
public class SynchronizedMutex {
    private Thread curOwner = null;

    public synchronized void acquire() throws InterruptedException {
        if (Thread.interrupted()) throw new InterruptedException();
        while (curOwner != null)
            wait();
        curOwner = Thread.currentThread();
    }

    public synchronized void release() {
        if (curOwner == Thread.currentThread()) {
            curOwner = null;
            notify();
        } else
            throw new IllegalStateException("not owner of mutex");
    }
}
```
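Here is a short usage sketch for the mutex in Listing 2 (the harness class and its names are hypothetical, and modern lambda syntax is used for brevity): several threads each bump a plain int counter, but only while holding the mutex, so no increment is lost.

```java
// Usage sketch: SynchronizedMutex (repeated verbatim from Listing 2 so the
// example is self-contained) guarding a plain int counter.
public class MutexDemo {
    static class SynchronizedMutex {
        private Thread curOwner = null;
        public synchronized void acquire() throws InterruptedException {
            if (Thread.interrupted()) throw new InterruptedException();
            while (curOwner != null)
                wait();
            curOwner = Thread.currentThread();
        }
        public synchronized void release() {
            if (curOwner == Thread.currentThread()) {
                curOwner = null;
                notify();
            } else
                throw new IllegalStateException("not owner of mutex");
        }
    }

    static final SynchronizedMutex mutex = new SynchronizedMutex();
    static int counter = 0;

    public static int run(int threads, int increments) {
        counter = 0;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < increments; j++) {
                    try {
                        mutex.acquire();
                        counter++;          // protected by the mutex
                        mutex.release();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return counter;   // threads * increments when nothing is lost
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000));  // prints 4000
    }
}
```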
The counter class in Listing 1 works reliably, and performs well in the presence of little or no contention. Under heavy contention, however, performance suffers dramatically, because the JVM spends more time dealing with thread scheduling, contention management, and queues of waiting threads, and less time doing real work, such as incrementing the counter. You may recall the graphs from last month's column showing how badly throughput can degrade once multiple threads contend for a built-in monitor lock using synchronization. While that column showed how the new ReentrantLock class is an improvement, for some kinds of problems there are even better solutions.
Problems with locking
With locking, if one thread attempts to acquire a lock that is already held by another thread, the thread blocks until the lock becomes available. This approach has some obvious drawbacks, including the fact that a thread blocked waiting for a lock cannot do anything else in the meantime. If the blocked thread is performing a high-priority task, the result can be disastrous (a hazard known as priority inversion).
Using locks has some other hazards as well, such as deadlock (which can happen when multiple locks are acquired in an inconsistent order). Even without such hazards, locks are simply a relatively coarse-grained coordination mechanism, quite heavyweight for managing simple operations such as incrementing a counter or updating the owner of a mutex. It would be nice to have a finer-grained mechanism for reliably managing concurrent updates to individual variables; and on most modern processors, such a mechanism exists.
Hardware synchronization primitives
As mentioned earlier, most modern processors include support for multiprocessing. That support of course includes the ability for multiple processors to share external devices and main memory, but it usually also includes additions to the instruction set to support the special requirements of multiprocessing. In particular, nearly every modern processor has instructions for updating shared variables in a way that can either detect or prevent concurrent access from other processors.
Compare-and-swap (CAS)
The first processors that supported concurrency provided atomic test-and-set operations, which generally operated on a single bit. The most common approach taken by current processors, including Intel and SPARC processors, is to implement a primitive called compare-and-swap, or CAS. (On Intel processors, compare-and-swap is implemented by the cmpxchg family of instructions. PowerPC processors have a pair of instructions called "load and reserve" and "store conditional" that accomplish the same goal; MIPS is similar, except that the first instruction is called "load linked".)
A CAS operation includes three operands: a memory location (V), the expected old value (A), and a new value (B). The processor will atomically update the location to the new value if the value that is there matches the expected old value; otherwise it does nothing. In either case, it returns the value that was at that location prior to the CAS instruction. (Some variants of CAS simply return whether the CAS succeeded, rather than fetching the current value.) CAS effectively says, "I think location V should contain the value A; if it does, put B there, otherwise don't change it, but tell me what value is there now."
The natural way to use CAS for synchronization is to read a value A from an address V, perform a multi-step computation to derive a new value B, and then use CAS to change the value of V from A to B. The CAS succeeds only if the value at V has not been changed in the meantime.
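The read-compute-CAS pattern just described can be sketched as follows, using the AtomicInteger class that is covered later in this article (the AtomicMax class itself is a hypothetical example, not from the article): a running maximum is maintained by reading the current value, computing whether an update is needed, and retrying the CAS until it succeeds.

```java
import java.util.concurrent.atomic.AtomicInteger;

// The canonical CAS retry loop: read A from V, compute B, CAS V from A to B,
// and retry if another thread changed V in between.
public class AtomicMax {
    private final AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);

    public void update(int candidate) {
        for (;;) {
            int current = max.get();                    // read A from V
            if (candidate <= current)                   // compute: nothing to do
                return;
            if (max.compareAndSet(current, candidate))  // CAS V from A to B
                return;                                 // success
            // CAS failed: another thread changed max; re-read and retry
        }
    }

    public int get() { return max.get(); }

    public static void main(String[] args) {
        AtomicMax m = new AtomicMax();
        m.update(3);
        m.update(7);
        m.update(5);
        System.out.println(m.get());  // prints 7
    }
}
```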
Instructions like CAS allow an algorithm to execute a read-modify-write sequence without fear of another thread modifying the variable in the meantime, because if another thread did modify the variable, the CAS would detect it (and fail), and the algorithm could retry the computation. Listing 3 illustrates the behavior (though not the performance characteristics) of a CAS operation; but the value of CAS is that it is implemented in hardware and is extremely lightweight (on most processors):
Listing 3. Code illustrating the behavior (but not performance) of compare-and-swap
```java
public class SimulatedCAS {
    private int value;

    public synchronized int getValue() { return value; }

    public synchronized int compareAndSwap(int expectedValue, int newValue) {
        int oldValue = value;
        if (value == expectedValue)
            value = newValue;
        return oldValue;
    }
}
```
Implementing counters with CAS
Concurrent algorithms based on CAS are called lock-free, because threads never have to wait for a lock (sometimes called a mutex or critical section, depending on the terminology of your threading platform). A CAS operation either succeeds or fails, but in either case it completes in a predictable amount of time; if the CAS fails, the caller can retry the CAS operation or take other appropriate action. Listing 4 shows the counter class rewritten to use CAS instead of locking:
Listing 4. Implementing a counter with compare-and-swap
```java
public class CasCounter {
    private SimulatedCAS value;

    public int getValue() {
        return value.getValue();
    }

    public int increment() {
        int oldValue = value.getValue();
        while (value.compareAndSwap(oldValue, oldValue + 1) != oldValue)
            oldValue = value.getValue();
        return oldValue + 1;
    }
}
```
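To see that a CAS-based counter really loses no updates under contention, here is a sketch that exercises the same retry loop as Listing 4 across several real threads. It uses AtomicInteger (introduced later in this article) as a hardware-backed stand-in for SimulatedCAS; the harness class and its names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Several threads hammer a CAS-based counter concurrently; every
// increment survives, so the final count is exactly threads * increments.
public class CasCounterStress {
    public static int run(int threads, int increments) {
        final AtomicInteger counter = new AtomicInteger(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < increments; j++) {
                    // the same retry loop as Listing 4, via the library CAS
                    int old;
                    do {
                        old = counter.get();
                    } while (!counter.compareAndSet(old, old + 1));
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10000));  // prints 40000: no lost updates
    }
}
```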
Lock-free and wait-free algorithms
An algorithm is said to be wait-free if every thread continues to make progress in the face of arbitrary delay (or even failure) of other threads. By contrast, a lock-free algorithm requires only that some thread always makes progress. (Another way of defining wait-free is that each thread is guaranteed to correctly compute its operations in a bounded number of its own steps, regardless of the actions, timing, interleaving, or speed of the other threads. This bound may be a function of the number of threads in the system; for example, if ten threads each execute CasCounter.increment() once, in the worst case each thread will have to retry at most nine times before the increment completes.)
Over the past 15 years, a great deal of research has been done on wait-free and lock-free algorithms (also called non-blocking algorithms), and non-blocking algorithms have been discovered for many common data structures. Non-blocking algorithms are used extensively at the operating system and JVM level for tasks such as thread and process scheduling. While they are more complicated to implement, they have a number of advantages over lock-based alternatives: hazards like priority inversion and deadlock are avoided, contention is less expensive, and coordination occurs at a finer level of granularity, enabling a higher degree of parallelism.
Atomic variable classes
Until JDK 5.0, it was not possible to write wait-free, lock-free algorithms in the Java language without using native code. With the addition of the atomic variable classes in the java.util.concurrent.atomic package, that has changed. The atomic variable classes all expose a compare-and-set primitive (similar to compare-and-swap), which is implemented using the fastest native construct available on the platform (compare-and-swap, load linked/store conditional, or, in the worst case, spin locks). Nine flavors of atomic variables are provided in the java.util.concurrent.atomic package (AtomicInteger; AtomicLong; AtomicReference; AtomicBoolean; array forms of atomic integer, long, and reference; and atomic marked reference and stamped reference classes, which atomically update a pair of values).
The atomic variable classes can be thought of as a generalization of volatile variables, extending the concept to support atomic conditional compare-and-set updates. Reads and writes of atomic variables have the same memory access semantics as reads and writes of volatile variables.
While the atomic variable classes might look superficially like the SynchronizedCounter example in Listing 1, the similarity is only skin deep. Under the hood, operations on atomic variables get turned into the hardware primitives that the platform provides for concurrent access, such as compare-and-swap.
Finer grained means lighter weight
A common technique for tuning the scalability of a contended concurrent application is to reduce the granularity of the lock objects used, in the hope that more lock acquisitions will go from contended to uncontended. The conversion from locking to atomic variables achieves the same end: by switching to a finer-grained coordination mechanism, fewer operations become contended, improving throughput.
The ABA problem. Because CAS basically asks "is the value of V still A" before changing V, a CAS-based algorithm can be confused if the value is changed from A to B and back to A between the time V was first read and the time the CAS on V is attempted. In such cases, the CAS operation would succeed, but in some situations the result might not be what is desired. (Note that the counter and mutex examples in Listings 1 and 2 are immune to this problem, but not all algorithms are.) This class of problem is called the ABA problem, and it is generally dealt with by associating a tag, or version number, with each value subject to CAS, and atomically updating both the value and the tag. The AtomicStampedReference class provides support for this approach.
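The stamp-based defense can be sketched as follows (the demo class and scenario names are hypothetical): a slow thread records the value and stamp, other threads change the value from "A" to "B" and back to "A" while bumping the stamp, and the slow thread's CAS then fails even though the value matches, because the stamp does not.

```java
import java.util.concurrent.atomic.AtomicStampedReference;

// Demonstrates AtomicStampedReference detecting an A -> B -> A change
// through its version stamp.
public class AbaDemo {
    public static boolean staleCasSucceeds() {
        AtomicStampedReference<String> ref =
            new AtomicStampedReference<String>("A", 0);

        int oldStamp = ref.getStamp();          // slow thread records stamp 0
        String oldValue = ref.getReference();   // ...and value "A"

        // meanwhile, other threads change A -> B -> A, bumping the stamp
        ref.compareAndSet("A", "B", 0, 1);
        ref.compareAndSet("B", "A", 1, 2);

        // the slow thread's CAS: the value matches ("A"), but stamp 0 is stale
        return ref.compareAndSet(oldValue, "A-modified", oldStamp, oldStamp + 1);
    }

    public static void main(String[] args) {
        System.out.println(staleCasSucceeds());  // prints false: ABA detected
    }
}
```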
Atomic variables in java.util.concurrent
Nearly all the classes in the java.util.concurrent package use atomic variables instead of synchronization, either directly or indirectly. Classes like ConcurrentLinkedQueue use atomic variables to directly implement wait-free algorithms, while classes like ConcurrentHashMap use ReentrantLock for locking where needed. ReentrantLock, in turn, uses atomic variables to maintain the queue of threads waiting for the lock.
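To give a feel for how a lock can be built on top of an atomic variable, here is a minimal spinlock sketch. This is emphatically not how ReentrantLock is actually implemented (ReentrantLock parks waiting threads rather than spinning); it is a hypothetical illustration of the idea of claiming an owner field with compareAndSet.

```java
import java.util.concurrent.atomic.AtomicReference;

// A toy spinlock: the owner field is claimed by CAS-ing it from null to
// the acquiring thread, and released by CAS-ing it back to null.
public class SpinLock {
    private final AtomicReference<Thread> owner = new AtomicReference<Thread>();

    public void lock() {
        Thread me = Thread.currentThread();
        // spin until we CAS the owner field from null to ourselves
        while (!owner.compareAndSet(null, me)) {
            Thread.yield();  // give the current owner a chance to run
        }
    }

    public void unlock() {
        Thread me = Thread.currentThread();
        if (!owner.compareAndSet(me, null))
            throw new IllegalStateException("not lock owner");
    }

    static int counter = 0;

    public static int run(int threads, int increments) {
        counter = 0;
        final SpinLock lock = new SpinLock();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < increments; j++) {
                    lock.lock();
                    counter++;      // protected by the spinlock
                    lock.unlock();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(run(2, 5000));  // prints 10000
    }
}
```

Spinning wastes CPU under contention, which is exactly why real lock implementations queue waiting threads instead; the point here is only that the coordination bottoms out in a CAS on a single variable.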
These classes could not have been constructed without the JVM improvements in JDK 5.0, which exposed (to the class libraries, but not to user classes) an interface for accessing hardware-level synchronization primitives. The atomic variable classes, and the other classes in java.util.concurrent, in turn expose these features to user classes.
Achieving higher throughput with atomic variables
Last month, I showed how ReentrantLock offers scalability advantages over synchronization, and constructed a simple, high-contention example benchmark that simulates rolling dice using a pseudorandom number generator, showing results for synchronization, ReentrantLock, and fair ReentrantLock. This month, I add another implementation to that benchmark, one that uses AtomicLong to update the PRNG state.
Listing 5 shows the synchronized PRNG implementation and the CAS-based alternative. Note that the CAS must be executed in a loop, because it may fail one or more times before succeeding, which is always the case with CAS-based code.
Listing 5. Implementing a thread-safe PRNG with synchronization and atomic variables
```java
public class PseudoRandomUsingSynch implements PseudoRandom {
    private int seed;

    public PseudoRandomUsingSynch(int s) { seed = s; }

    public synchronized int nextInt(int n) {
        int s = seed;
        seed = Util.calculateNext(seed);
        return s % n;
    }
}

public class PseudoRandomUsingAtomic implements PseudoRandom {
    private final AtomicInteger seed;

    public PseudoRandomUsingAtomic(int s) {
        seed = new AtomicInteger(s);
    }

    public int nextInt(int n) {
        for (;;) {
            int s = seed.get();
            int nexts = Util.calculateNext(s);
            if (seed.compareAndSet(s, nexts))
                return s % n;
        }
    }
}
```
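Listing 5 depends on a Util.calculateNext method and a PseudoRandom interface that the article does not show. The following self-contained sketch substitutes a classic 32-bit xorshift step for calculateNext (an assumption, not the article's actual function) and drops the interface, to demonstrate that given the same seed, the synchronized and atomic implementations produce identical sequences.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Runnable adaptation of Listing 5; calculateNext is a hypothetical
// stand-in (Marsaglia xorshift) for the article's unshown Util method.
public class PrngDemo {
    static int calculateNext(int s) {
        s ^= (s << 13);
        s ^= (s >>> 17);
        s ^= (s << 5);
        return s;
    }

    static class SyncPrng {
        private int seed;
        SyncPrng(int s) { seed = s; }
        public synchronized int nextInt(int n) {
            int s = seed;
            seed = calculateNext(seed);
            return s % n;
        }
    }

    static class AtomicPrng {
        private final AtomicInteger seed;
        AtomicPrng(int s) { seed = new AtomicInteger(s); }
        public int nextInt(int n) {
            for (;;) {
                int s = seed.get();
                int nexts = calculateNext(s);
                if (seed.compareAndSet(s, nexts))  // retry on contention
                    return s % n;
            }
        }
    }

    // Both implementations advance the same state function, so their
    // outputs match draw-for-draw from the same seed.
    public static boolean sequencesMatch(int seedValue, int count) {
        SyncPrng a = new SyncPrng(seedValue);
        AtomicPrng b = new AtomicPrng(seedValue);
        for (int i = 0; i < count; i++)
            if (a.nextInt(100) != b.nextInt(100))
                return false;
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sequencesMatch(42, 1000));  // prints true
    }
}
```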
Figures 1 and 2 below are similar to those shown last month, with the addition of one more line for the atomic-variable approach. They show the throughput of random number generation using varying numbers of threads on an 8-way UltraSPARC III and a single-processor Pentium 4. The numbers of threads in the tests are deceptive; these threads exhibit far more contention than is typical, so they show the break-even point between ReentrantLock and atomic variables at a much lower number of threads than a more realistic program would. You will see that while ReentrantLock has a considerable advantage over synchronization, atomic variables offer a further improvement over ReentrantLock. (Because so little work is done in each unit of work, these figures probably understate the scalability advantage of atomic variables over ReentrantLock.)
Figure 1. Benchmark throughput of synchronization, ReentrantLock, fair lock, and AtomicLong on an 8-way UltraSPARC III
Figure 2. Benchmark throughput of synchronization, ReentrantLock, fair lock, and AtomicLong on a single-processor Pentium 4
Most users are unlikely to develop their own non-blocking algorithms using atomic variables; they are more likely to use the versions provided in java.util.concurrent, such as ConcurrentLinkedQueue. But in case you are wondering where the performance boost of those classes comes from, compared to their analogues in previous JDKs, it is the use of the fine-grained, hardware-level concurrency primitives exposed through the atomic variable classes.
Developers may also find direct uses for atomic variables as a higher-performance replacement for shared counters, sequence number generators, and other independent shared variables that would otherwise have to be protected by synchronization.
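The sequence-number-generator use case mentioned above can be sketched in a few lines (the class name is hypothetical): AtomicLong.incrementAndGet replaces a synchronized method with a single atomic read-modify-write.

```java
import java.util.concurrent.atomic.AtomicLong;

// A thread-safe sequence number generator with no locking:
// incrementAndGet is an atomic read-modify-write.
public class SequenceGenerator {
    private final AtomicLong next = new AtomicLong(0);

    public long nextId() {
        return next.incrementAndGet();  // atomic, no lock needed
    }

    public static void main(String[] args) {
        SequenceGenerator gen = new SequenceGenerator();
        System.out.println(gen.nextId());  // prints 1
        System.out.println(gen.nextId());  // prints 2
    }
}
```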
Conclusion
JDK 5.0 is a huge step forward for the development of high-performance concurrent classes. By exposing new low-level coordination primitives internally, and by providing a set of public atomic variable classes, it now becomes practical, for the first time, to develop wait-free, lock-free algorithms in the Java language. The classes in java.util.concurrent are in turn built on these low-level atomic variable facilities, which gives them their significant scalability advantage over previous classes that performed similar functions. While you may never use atomic variables directly in your own classes, there is good reason to cheer for them.
References
- Read the original English version of this article on the developerWorks global site.
- Read Brian Goetz's complete Java theory and practice series of articles.
- The atomic variable classes are well described in the package documentation for the java.util.concurrent.atomic package.
- Wikipedia has definitions of lock-free and wait-free synchronization.
- The C2 wiki also offers definitions of wait-free and lock-free synchronization.
- Keir Fraser and Tim Harris's paper "Concurrent programming without locks" describes alternatives to locking, including compare-and-swap, for building concurrent algorithms.
- The WARPing group (Wait-free techniques in Real-time Processing) site summarizes research in wait-free algorithms.
- "More flexible, scalable locking in JDK 5.0" (developerWorks, October 2004) introduces ReentrantLock and describes the random number generation benchmark used in this column.
- Doug Lea's Concurrent Programming in Java, Second Edition (Addison-Wesley Professional, 1999) is the authoritative book on the subtle issues surrounding multithreaded programming in Java.