http://www.ibm.com/developerworks/cn/java/j-jtp11234/
Fifteen years ago, a multiprocessor system was a highly specialized machine costing hundreds of thousands of dollars (and most had two to four processors). Today, multiprocessor systems are cheap and plentiful: nearly every major microprocessor has built-in multiprocessing support, and many support dozens or hundreds of processors.
To exploit a multiprocessor system, applications are usually structured using multiple threads. But as anyone who writes concurrent applications can tell you, simply dividing the work across threads is not enough to achieve good hardware utilization; you must also ensure that threads spend most of their time actually doing work, rather than waiting for more work or waiting for locks on shared data structures.
The problem: coordination between threads
Few tasks can be truly parallelized in a way that requires no coordination between threads at all. Consider a thread pool, where the tasks executed are usually independent of one another. If the pool feeds off a common work queue, then the process of removing elements from or adding elements to the work queue must be thread-safe, which means coordinating access to the head, tail, and the link pointers between nodes. And it is this coordination that causes all the trouble.
The standard approach: locking
In Java, the traditional way to coordinate access to shared fields is to use synchronization, ensuring that all access to shared fields is done while holding the appropriate lock. With synchronization, you are assured (assuming the class is written correctly) that whichever thread holds the lock protecting a given set of variables has exclusive access to those variables, and that changes to those variables will be visible to other threads when they subsequently acquire the lock. The downside is that if the lock is heavily contended (threads frequently ask for it while another thread holds it), throughput suffers, because contended synchronization is quite expensive. (Public service announcement: uncontended synchronization is now quite inexpensive on modern JVMs.)
Another problem with lock-based algorithms is that if a thread holding a lock is delayed (due to a page fault, a scheduling delay, or another unexpected delay), then no thread requiring that lock can make progress.
Volatile variables can also be used to store shared values at a lower cost than synchronization, but they have limitations. While writes to volatile variables are guaranteed to be immediately visible to other threads, there is no way to render a read-modify-write sequence of operations atomic, which means, for example, that a volatile variable cannot be used to reliably implement a mutex (mutual-exclusion lock) or a counter.
Implementing counters and mutexes with locking
Consider developing a thread-safe counter class, which would expose get(), increment(), and decrement() operations. Listing 1 shows how this class might be implemented using locking (synchronization). Note that all the methods, even get(), need to be synchronized for the class to be thread-safe, to ensure that no updates are lost and that all threads see the most recent value of the counter.
Listing 1. Synchronized counter class
```java
public class SynchronizedCounter {
    private int value;

    public synchronized int getValue() { return value; }
    public synchronized int increment() { return ++value; }
    public synchronized int decrement() { return --value; }
}
```
The increment() and decrement() operations are atomic read-modify-write operations: to safely increment the counter, you must take the current value, add one to it, and write the new value out, all as a single operation that cannot be interrupted by another thread. Otherwise, if two threads tried to execute the increment at the same time, an unlucky interleaving of operations could result in the counter being incremented only once instead of twice. (Note that you cannot accomplish this reliably by making the value instance variable volatile.)
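To make the unlucky interleaving concrete, the following sketch (names hypothetical, not from the article) replays it deterministically in a single thread: both "threads" read the counter before either writes back, so one increment is lost.

```java
// Deterministic replay of the lost-update interleaving: two increments,
// but both readers see the old value before either writes, so the
// counter ends at 1 rather than 2.
public class LostUpdateDemo {
    private int value = 0;

    public int read() { return value; }        // the "read" step
    public void write(int v) { value = v; }    // the "write" step

    public static int simulateRace() {
        LostUpdateDemo counter = new LostUpdateDemo();
        int seenByA = counter.read();   // thread A reads 0
        int seenByB = counter.read();   // thread B reads 0 -- before A writes!
        counter.write(seenByA + 1);     // A writes 1
        counter.write(seenByB + 1);     // B also writes 1: A's update is lost
        return counter.read();          // 1, not 2
    }

    public static void main(String[] args) {
        System.out.println(simulateRace());  // prints 1
    }
}
```

With synchronized increment(), the read and write steps cannot be interleaved this way, which is exactly what Listing 1 buys you.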
This atomic read-modify-write combination shows up in many concurrent algorithms. The code in Listing 2 implements a simple mutex, whose acquire() method is also an atomic read-modify-write operation. To acquire the mutex, you must see that no one else owns it (curOwner == null), and then record the fact that you own it (curOwner = Thread.currentThread()), all without the possibility of another thread coming along in the middle and modifying the curOwner field.
Listing 2. Synchronized mutex
```java
public class SynchronizedMutex {
    private Thread curOwner = null;

    public synchronized void acquire() throws InterruptedException {
        if (Thread.interrupted()) throw new InterruptedException();
        while (curOwner != null)
            wait();
        curOwner = Thread.currentThread();
    }

    public synchronized void release() {
        if (curOwner == Thread.currentThread()) {
            curOwner = null;
            notify();
        } else
            throw new IllegalStateException("not owner of mutex");
    }
}
```
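Here is a short usage sketch for the mutex in Listing 2 (the harness class and its names are hypothetical, and modern lambda syntax is used for brevity): several threads each bump a plain int counter, but only while holding the mutex, so no increment is lost.

```java
// Usage sketch: SynchronizedMutex (repeated verbatim from Listing 2 so the
// example is self-contained) guarding a plain int counter.
public class MutexDemo {
    static class SynchronizedMutex {
        private Thread curOwner = null;
        public synchronized void acquire() throws InterruptedException {
            if (Thread.interrupted()) throw new InterruptedException();
            while (curOwner != null)
                wait();
            curOwner = Thread.currentThread();
        }
        public synchronized void release() {
            if (curOwner == Thread.currentThread()) {
                curOwner = null;
                notify();
            } else
                throw new IllegalStateException("not owner of mutex");
        }
    }

    static final SynchronizedMutex mutex = new SynchronizedMutex();
    static int counter = 0;

    public static int run(int threads, int increments) {
        counter = 0;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < increments; j++) {
                    try {
                        mutex.acquire();
                        counter++;          // protected by the mutex
                        mutex.release();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return counter;   // threads * increments when nothing is lost
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000));  // prints 4000
    }
}
```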
The counter class in Listing 1 works reliably, and performs well in the presence of little or no contention. Under heavy contention, however, performance suffers dramatically, because the JVM spends more time dealing with thread scheduling, contention management, and queues of waiting threads, and less time doing real work, such as incrementing the counter. You may recall the graphs from last month's column showing how badly throughput can degrade once multiple threads contend for a built-in monitor lock using synchronization. While that column showed how the new ReentrantLock class is an improvement, for some kinds of problems there are even better solutions.
Problems with locking
With locking, if one thread attempts to acquire a lock that is already held by another thread, the thread blocks until the lock becomes available. This approach has some obvious drawbacks, including the fact that a thread blocked waiting for a lock cannot do anything else in the meantime. If the blocked thread is performing a high-priority task, the result can be disastrous (a hazard known as priority inversion).
Using locks has some other hazards as well, such as deadlock (which can happen when multiple locks are acquired in an inconsistent order). Even without such hazards, locks are simply a relatively coarse-grained coordination mechanism, quite heavyweight for managing simple operations such as incrementing a counter or updating the owner of a mutex. It would be nice to have a finer-grained mechanism for reliably managing concurrent updates to individual variables; and on most modern processors, such a mechanism exists.
Hardware synchronization primitives
As mentioned earlier, most modern processors include support for multiprocessing. That support of course includes the ability for multiple processors to share external devices and main memory, but it usually also includes additions to the instruction set to support the special requirements of multiprocessing. In particular, nearly every modern processor has instructions for updating shared variables in a way that can either detect or prevent concurrent access from other processors.
Compare-and-swap (CAS)
The first processors that supported concurrency provided atomic test-and-set operations, which generally operated on a single bit. The most common approach taken by current processors, including Intel and SPARC processors, is to implement a primitive called compare-and-swap, or CAS. (On Intel processors, compare-and-swap is implemented by the cmpxchg family of instructions. PowerPC processors have a pair of instructions called "load and reserve" and "store conditional" that accomplish the same goal; MIPS is similar, except that the first instruction is called "load linked".)
A CAS operation includes three operands: a memory location (V), the expected old value (A), and a new value (B). The processor will atomically update the location to the new value if the value that is there matches the expected old value; otherwise it does nothing. In either case, it returns the value that was at that location prior to the CAS instruction. (Some variants of CAS simply return whether the CAS succeeded, rather than fetching the current value.) CAS effectively says, "I think location V should contain the value A; if it does, put B there, otherwise don't change it, but tell me what value is there now."
The natural way to use CAS for synchronization is to read a value A from an address V, perform a multi-step computation to derive a new value B, and then use CAS to change the value of V from A to B. The CAS succeeds only if the value at V has not been changed in the meantime.
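The read-compute-CAS pattern just described can be sketched as follows, using the AtomicInteger class that is covered later in this article (the AtomicMax class itself is a hypothetical example, not from the article): a running maximum is maintained by reading the current value, computing whether an update is needed, and retrying the CAS until it succeeds.

```java
import java.util.concurrent.atomic.AtomicInteger;

// The canonical CAS retry loop: read A from V, compute B, CAS V from A to B,
// and retry if another thread changed V in between.
public class AtomicMax {
    private final AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);

    public void update(int candidate) {
        for (;;) {
            int current = max.get();                    // read A from V
            if (candidate <= current)                   // compute: nothing to do
                return;
            if (max.compareAndSet(current, candidate))  // CAS V from A to B
                return;                                 // success
            // CAS failed: another thread changed max; re-read and retry
        }
    }

    public int get() { return max.get(); }

    public static void main(String[] args) {
        AtomicMax m = new AtomicMax();
        m.update(3);
        m.update(7);
        m.update(5);
        System.out.println(m.get());  // prints 7
    }
}
```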
Instructions like CAS allow an algorithm to execute a read-modify-write sequence without fear of another thread modifying the variable in the meantime, because if another thread did modify the variable, the CAS would detect it (and fail), and the algorithm could retry the computation. Listing 3 illustrates the behavior (though not the performance characteristics) of a CAS operation; but the value of CAS is that it is implemented in hardware and is extremely lightweight (on most processors):
Listing 3. Code illustrating the behavior (but not performance) of compare-and-swap
```java
public class SimulatedCAS {
    private int value;

    public synchronized int getValue() { return value; }

    public synchronized int compareAndSwap(int expectedValue, int newValue) {
        int oldValue = value;
        if (value == expectedValue)
            value = newValue;
        return oldValue;
    }
}
```
Implementing counters with CAS
Concurrent algorithms based on CAS are called lock-free, because threads never have to wait for a lock (sometimes called a mutex or critical section, depending on the terminology of your threading platform). A CAS operation either succeeds or fails, but in either case it completes in a predictable amount of time; if the CAS fails, the caller can retry the CAS operation or take other appropriate action. Listing 4 shows the counter class rewritten to use CAS instead of locking:
Listing 4. Implementing a counter with compare-and-swap
```java
public class CasCounter {
    private SimulatedCAS value;

    public int getValue() {
        return value.getValue();
    }

    public int increment() {
        int oldValue = value.getValue();
        while (value.compareAndSwap(oldValue, oldValue + 1) != oldValue)
            oldValue = value.getValue();
        return oldValue + 1;
    }
}
```
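To see that a CAS-based counter really loses no updates under contention, here is a sketch that exercises the same retry loop as Listing 4 across several real threads. It uses AtomicInteger (introduced later in this article) as a hardware-backed stand-in for SimulatedCAS; the harness class and its names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Several threads hammer a CAS-based counter concurrently; every
// increment survives, so the final count is exactly threads * increments.
public class CasCounterStress {
    public static int run(int threads, int increments) {
        final AtomicInteger counter = new AtomicInteger(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < increments; j++) {
                    // the same retry loop as Listing 4, via the library CAS
                    int old;
                    do {
                        old = counter.get();
                    } while (!counter.compareAndSet(old, old + 1));
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10000));  // prints 40000: no lost updates
    }
}
```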
Lock-free and wait-free algorithms
An algorithm is said to be wait-free if every thread continues to make progress in the face of arbitrary delay (or even failure) of other threads. By contrast, a lock-free algorithm requires only that some thread always makes progress. (Another way of defining wait-free is that each thread is guaranteed to correctly compute its operations in a bounded number of its own steps, regardless of the actions, timing, interleaving, or speed of the other threads. This bound may be a function of the number of threads in the system; for example, if ten threads each execute CasCounter.increment() once, in the worst case each thread will have to retry at most nine times before the increment completes.)
Over the past 15 years, a great deal of research has been done on wait-free and lock-free algorithms (also called non-blocking algorithms), and non-blocking algorithms have been discovered for many common data structures. Non-blocking algorithms are used extensively at the operating system and JVM level for tasks such as thread and process scheduling. While they are more complicated to implement, they have a number of advantages over lock-based alternatives: hazards like priority inversion and deadlock are avoided, contention is less expensive, and coordination occurs at a finer level of granularity, enabling a higher degree of parallelism.
Atomic variable classes
Until JDK 5.0, it was not possible to write wait-free, lock-free algorithms in the Java language without using native code. With the addition of the atomic variable classes in the java.util.concurrent.atomic package, that has changed. The atomic variable classes all expose a compare-and-set primitive (similar to compare-and-swap), which is implemented using the fastest native construct available on the platform (compare-and-swap, load linked/store conditional, or, in the worst case, spin locks). Nine flavors of atomic variables are provided in the java.util.concurrent.atomic package (AtomicInteger; AtomicLong; AtomicReference; AtomicBoolean; array forms of atomic integer, long, and reference; and atomic marked reference and stamped reference classes, which atomically update a pair of values).
The atomic variable classes can be thought of as a generalization of volatile variables, extending the concept to support atomic conditional compare-and-set updates. Reads and writes of atomic variables have the same memory access semantics as reads and writes of volatile variables.
While the atomic variable classes might look superficially like the SynchronizedCounter example in Listing 1, the similarity is only skin deep. Under the hood, operations on atomic variables get turned into the hardware primitives that the platform provides for concurrent access, such as compare-and-swap.
Finer grained means lighter weight
A common technique for tuning the scalability of a contended concurrent application is to reduce the granularity of the lock objects used, in the hope that more lock acquisitions will go from contended to uncontended. The conversion from locking to atomic variables achieves the same end: by switching to a finer-grained coordination mechanism, fewer operations become contended, improving throughput.
The ABA problem. Because CAS basically asks "is the value of V still A" before changing V, a CAS-based algorithm can be confused if the value is changed from A to B and back to A between the time V was first read and the time the CAS on V is attempted. In such cases, the CAS operation would succeed, but in some situations the result might not be what is desired. (Note that the counter and mutex examples in Listings 1 and 2 are immune to this problem, but not all algorithms are.) This class of problem is called the ABA problem, and it is generally dealt with by associating a tag, or version number, with each value subject to CAS, and atomically updating both the value and the tag. The AtomicStampedReference class provides support for this approach.
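The stamp-based defense can be sketched as follows (the demo class and scenario names are hypothetical): a slow thread records the value and stamp, other threads change the value from "A" to "B" and back to "A" while bumping the stamp, and the slow thread's CAS then fails even though the value matches, because the stamp does not.

```java
import java.util.concurrent.atomic.AtomicStampedReference;

// Demonstrates AtomicStampedReference detecting an A -> B -> A change
// through its version stamp.
public class AbaDemo {
    public static boolean staleCasSucceeds() {
        AtomicStampedReference<String> ref =
            new AtomicStampedReference<String>("A", 0);

        int oldStamp = ref.getStamp();          // slow thread records stamp 0
        String oldValue = ref.getReference();   // ...and value "A"

        // meanwhile, other threads change A -> B -> A, bumping the stamp
        ref.compareAndSet("A", "B", 0, 1);
        ref.compareAndSet("B", "A", 1, 2);

        // the slow thread's CAS: the value matches ("A"), but stamp 0 is stale
        return ref.compareAndSet(oldValue, "A-modified", oldStamp, oldStamp + 1);
    }

    public static void main(String[] args) {
        System.out.println(staleCasSucceeds());  // prints false: ABA detected
    }
}
```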
Atomic variables in java.util.concurrent
Nearly all the classes in the java.util.concurrent package use atomic variables instead of synchronization, either directly or indirectly. Classes like ConcurrentLinkedQueue use atomic variables to directly implement wait-free algorithms, while classes like ConcurrentHashMap use ReentrantLock for locking where needed. ReentrantLock, in turn, uses atomic variables to maintain the queue of threads waiting for the lock.
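To give a feel for how a lock can be built on top of an atomic variable, here is a minimal spinlock sketch. This is emphatically not how ReentrantLock is actually implemented (ReentrantLock parks waiting threads rather than spinning); it is a hypothetical illustration of the idea of claiming an owner field with compareAndSet.

```java
import java.util.concurrent.atomic.AtomicReference;

// A toy spinlock: the owner field is claimed by CAS-ing it from null to
// the acquiring thread, and released by CAS-ing it back to null.
public class SpinLock {
    private final AtomicReference<Thread> owner = new AtomicReference<Thread>();

    public void lock() {
        Thread me = Thread.currentThread();
        // spin until we CAS the owner field from null to ourselves
        while (!owner.compareAndSet(null, me)) {
            Thread.yield();  // give the current owner a chance to run
        }
    }

    public void unlock() {
        Thread me = Thread.currentThread();
        if (!owner.compareAndSet(me, null))
            throw new IllegalStateException("not lock owner");
    }

    static int counter = 0;

    public static int run(int threads, int increments) {
        counter = 0;
        final SpinLock lock = new SpinLock();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < increments; j++) {
                    lock.lock();
                    counter++;      // protected by the spinlock
                    lock.unlock();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(run(2, 5000));  // prints 10000
    }
}
```

Spinning wastes CPU under contention, which is exactly why real lock implementations queue waiting threads instead; the point here is only that the coordination bottoms out in a CAS on a single variable.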
These classes could not have been constructed without the JVM improvements in JDK 5.0, which exposed (to the class libraries, but not to user classes) an interface for accessing hardware-level synchronization primitives. The atomic variable classes, and the other classes in java.util.concurrent, in turn expose these features to user classes.
Achieving higher throughput with atomic variables
Last month, I showed how ReentrantLock offers scalability advantages over synchronization, and constructed a simple, high-contention example benchmark that simulates rolling dice using a pseudorandom number generator, showing results for synchronization, ReentrantLock, and fair ReentrantLock. This month, I add another implementation to that benchmark, one that uses AtomicLong to update the PRNG state.
Listing 5 shows the synchronized PRNG implementation and the CAS-based alternative. Note that the CAS must be executed in a loop, because it may fail one or more times before succeeding, which is always the case with CAS-based code.
Listing 5. Implementing a thread-safe PRNG with synchronization and atomic variables
```java
public class PseudoRandomUsingSynch implements PseudoRandom {
    private int seed;

    public PseudoRandomUsingSynch(int s) { seed = s; }

    public synchronized int nextInt(int n) {
        int s = seed;
        seed = Util.calculateNext(seed);
        return s % n;
    }
}

public class PseudoRandomUsingAtomic implements PseudoRandom {
    private final AtomicInteger seed;

    public PseudoRandomUsingAtomic(int s) {
        seed = new AtomicInteger(s);
    }

    public int nextInt(int n) {
        for (;;) {
            int s = seed.get();
            int nexts = Util.calculateNext(s);
            if (seed.compareAndSet(s, nexts))
                return s % n;
        }
    }
}
```
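Listing 5 depends on a Util.calculateNext method and a PseudoRandom interface that the article does not show. The following self-contained sketch substitutes a classic 32-bit xorshift step for calculateNext (an assumption, not the article's actual function) and drops the interface, to demonstrate that given the same seed, the synchronized and atomic implementations produce identical sequences.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Runnable adaptation of Listing 5; calculateNext is a hypothetical
// stand-in (Marsaglia xorshift) for the article's unshown Util method.
public class PrngDemo {
    static int calculateNext(int s) {
        s ^= (s << 13);
        s ^= (s >>> 17);
        s ^= (s << 5);
        return s;
    }

    static class SyncPrng {
        private int seed;
        SyncPrng(int s) { seed = s; }
        public synchronized int nextInt(int n) {
            int s = seed;
            seed = calculateNext(seed);
            return s % n;
        }
    }

    static class AtomicPrng {
        private final AtomicInteger seed;
        AtomicPrng(int s) { seed = new AtomicInteger(s); }
        public int nextInt(int n) {
            for (;;) {
                int s = seed.get();
                int nexts = calculateNext(s);
                if (seed.compareAndSet(s, nexts))  // retry on contention
                    return s % n;
            }
        }
    }

    // Both implementations advance the same state function, so their
    // outputs match draw-for-draw from the same seed.
    public static boolean sequencesMatch(int seedValue, int count) {
        SyncPrng a = new SyncPrng(seedValue);
        AtomicPrng b = new AtomicPrng(seedValue);
        for (int i = 0; i < count; i++)
            if (a.nextInt(100) != b.nextInt(100))
                return false;
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sequencesMatch(42, 1000));  // prints true
    }
}
```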
Figures 1 and 2 below are similar to those shown last month, with the addition of one more line for the atomic-variable approach. They show the throughput of random number generation using varying numbers of threads on an 8-way UltraSPARC III and a single-processor Pentium 4. The numbers of threads in the tests are deceptive; these threads exhibit far more contention than is typical, so they show the break-even point between ReentrantLock and atomic variables at a much lower number of threads than a more realistic program would. You will see that while ReentrantLock has a considerable advantage over synchronization, atomic variables offer a further improvement over ReentrantLock. (Because so little work is done in each unit of work, these figures probably understate the scalability advantage of atomic variables over ReentrantLock.)
Figure 1. Benchmark throughput of synchronization, ReentrantLock, fair lock, and AtomicLong on an 8-way UltraSPARC III
Figure 2. Benchmark throughput of synchronization, ReentrantLock, fair lock, and AtomicLong on a single-processor Pentium 4
Most users are unlikely to develop their own non-blocking algorithms using atomic variables; they are more likely to use the versions provided in java.util.concurrent, such as ConcurrentLinkedQueue. But in case you are wondering where the performance boost of those classes comes from, compared to their analogues in previous JDKs, it is the use of the fine-grained, hardware-level concurrency primitives exposed through the atomic variable classes.
Developers may also find direct uses for atomic variables as a higher-performance replacement for shared counters, sequence number generators, and other independent shared variables that would otherwise have to be protected by synchronization.
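The sequence-number-generator use case mentioned above can be sketched in a few lines (the class name is hypothetical): AtomicLong.incrementAndGet replaces a synchronized method with a single atomic read-modify-write.

```java
import java.util.concurrent.atomic.AtomicLong;

// A thread-safe sequence number generator with no locking:
// incrementAndGet is an atomic read-modify-write.
public class SequenceGenerator {
    private final AtomicLong next = new AtomicLong(0);

    public long nextId() {
        return next.incrementAndGet();  // atomic, no lock needed
    }

    public static void main(String[] args) {
        SequenceGenerator gen = new SequenceGenerator();
        System.out.println(gen.nextId());  // prints 1
        System.out.println(gen.nextId());  // prints 2
    }
}
```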
Conclusion
JDK 5.0 is a huge step forward for the development of high-performance concurrent classes. By exposing new low-level coordination primitives internally, and by providing a set of public atomic variable classes, it now becomes practical, for the first time, to develop wait-free, lock-free algorithms in the Java language. The classes in java.util.concurrent are in turn built on these low-level atomic variable facilities, which gives them their significant scalability advantage over previous classes that performed similar functions. While you may never use atomic variables directly in your own classes, there is good reason to cheer for them.
References
- Read the original English version of this article on the developerWorks global site.
- Read Brian Goetz's complete Java theory and practice series of articles.
- The atomic variable classes are well described in the package documentation for the java.util.concurrent.atomic package.
- Wikipedia has definitions of lock-free and wait-free synchronization.
- The C2 wiki also offers definitions of wait-free and lock-free synchronization.
- Keir Fraser and Tim Harris's paper "Concurrent programming without locks" describes alternatives to locking, including compare-and-swap, for building concurrent algorithms.
- The WARPing group (Wait-free techniques in Real-time Processing) site summarizes research in wait-free algorithms.
- "More flexible, scalable locking in JDK 5.0" (developerWorks, October 2004) introduces ReentrantLock and describes the random number generation benchmark used in this column.
- Doug Lea's Concurrent Programming in Java, Second Edition (Addison-Wesley Professional, 1999) is the authoritative book on the subtle issues surrounding multithreaded programming in Java.