Talking About Concurrency (5): The Implementation Principles of Atomic Operations

Source: Internet
Author: User

Introduction

The word "atom" originally means "the smallest particle that cannot be further divided", and an atomic operation is "an operation, or a series of operations, that cannot be interrupted". Implementing atomic operations on multiprocessor systems is somewhat complicated. This article discusses how atomic operations are implemented in Intel processors and in Java.

Term definitions

Cache line: the minimum unit that the cache operates on.
Compare and Exchange (CAS): a CAS operation takes two values, an old value (the value expected before the operation) and a new value. During the operation it compares the old value against the current value; if the current value has not changed, it is replaced with the new value, otherwise nothing is exchanged.
CPU pipelining: a CPU pipeline works like an assembly line in industrial production. Inside the CPU, 5 to 6 circuit units with different functions form an instruction-processing pipeline; an x86 instruction is split into 5 to 6 steps that are executed by these units in turn, so that one instruction can complete per CPU clock cycle, increasing the CPU's throughput.
Memory order violation: a memory order violation is generally caused by false sharing, where multiple CPUs simultaneously modify different parts of the same cache line and invalidate one another's copies; when such a violation occurs, the CPU must flush its pipeline.
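The compare-and-exchange semantics defined above can be observed directly in Java through AtomicInteger.compareAndSet; a minimal sketch:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasDemo {
    public static void main(String[] args) {
        AtomicInteger v = new AtomicInteger(1);
        // Succeeds: the current value matches the expected old value (1).
        System.out.println(v.compareAndSet(1, 2)); // prints true
        // Fails: the value is now 2, not the expected 1, so nothing is swapped.
        System.out.println(v.compareAndSet(1, 3)); // prints false
        System.out.println(v.get());               // prints 2
    }
}
```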

How the processor implements atomic operations

On 32-bit IA-32 processors, atomic operations across multiple processors are implemented by means of cache locking or bus locking.

The processor automatically guarantees the atomicity of basic memory operations

First, the processor automatically guarantees the atomicity of basic memory operations. The processor guarantees that reading or writing a single byte of system memory is atomic: while one processor reads a byte, no other processor can access that byte's memory address. Pentium 6 and newer processors can additionally guarantee that 16-, 32- and 64-bit operations within a single cache line are atomic. However, the processor cannot automatically guarantee the atomicity of complex memory operations, such as accesses that cross bus widths, span multiple cache lines, or cross page tables. Instead, the processor provides two mechanisms, bus locking and cache locking, to ensure the atomicity of complex memory operations.

Guaranteed atomicity with bus lock

The first mechanism is to ensure atomicity through a bus lock. If multiple processors simultaneously perform read-modify-write operations on a shared variable (i++ is the classic read-modify-write operation), the shared variable is manipulated by several processors at once, so the read-modify-write is not atomic, and the value of the shared variable after the operation may differ from what is expected. For example, if i = 1 and we perform i++ twice, we expect the result to be 3, but it may turn out to be 2.

The reason is that multiple processors may read the variable i from their respective caches at the same time, each increment it separately, and then write the result back to system memory. To make a read-modify-write on a shared variable atomic, we must guarantee that while CPU1 is performing the read-modify-write, CPU2 cannot operate on the cache that holds the shared variable's memory address.
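The lost update can be simulated deterministically in plain Java by interleaving the two "processors'" reads and writes by hand (a sketch of the scenario, not real parallel code):

```java
public class LostUpdateDemo {
    public static void main(String[] args) {
        int i = 1;
        int cpu1 = i + 1; // CPU1 reads i = 1 from its cache and computes 2
        int cpu2 = i + 1; // CPU2 also reads the stale i = 1 and computes 2
        i = cpu1;         // CPU1 writes 2 back to memory
        i = cpu2;         // CPU2 overwrites with 2 as well
        System.out.println(i); // prints 2, although 3 was expected
    }
}
```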

The processor uses a bus lock to solve this problem. A bus lock is a LOCK# signal provided by the processor; when one processor asserts this signal on the bus, requests from other processors are blocked, so that processor can use the shared memory exclusively.

Using cache locks to ensure atomicity

The second mechanism is to ensure atomicity through cache locking. Often we only need the operation on one particular memory address to be atomic, but a bus lock locks the communication between the CPU and memory entirely, so no other processor can operate on data at any other memory address while the lock is held; the overhead of bus locking is therefore relatively high. Recent processors use cache locking instead of bus locking in certain situations as an optimization.

Frequently used memory is cached in the processor's L1, L2 and L3 caches, so an atomic operation can be performed directly in the processor's internal cache without asserting a bus lock; on Pentium 6 and most recent processors, complex atomicity can be achieved this way using "cache locking". With cache locking, if the memory area being operated on is cached in the processor's cache line and locked for the duration of the lock operation, then when the processor writes the result back to memory it does not assert the LOCK# signal on the bus; instead, it modifies the memory address internally and relies on its cache coherency mechanism to guarantee the atomicity of the operation. The cache coherency mechanism prevents data in a memory region cached by two or more processors from being modified simultaneously: when another processor writes back the data of a cache line that has been locked, that cache line becomes invalid. In the example above, when CPU1 modifies i in its cache line using cache locking, CPU2 cannot simultaneously cache i's cache line.

However, there are two cases in which the processor does not use cache locking. The first is when the data being operated on cannot be cached inside the processor, or spans multiple cache lines; in that case, the processor invokes a bus lock. The second is that some processors do not support cache locking: on Intel 486 and Pentium processors, a bus lock is invoked even when the locked memory area lies within a single cache line.

Both of the above mechanisms are exposed by Intel processors through many instructions that accept the LOCK prefix, for example the bit test-and-modify instructions BTS, BTR and BTC, the exchange instructions XADD and CMPXCHG, and some other operand and logic instructions such as ADD and OR. The memory area operated on by these instructions is locked, so other processors cannot access it at the same time.

How Java implements atomic operations

Atomic operation with looping CAS

CAS operations in the JVM are implemented using the CMPXCHG instruction provided by the processor, mentioned in the previous section. The basic idea of spin CAS is to loop, retrying the CAS operation until it succeeds. The following code implements a CAS-based thread-safe counter method, safeCount, and a non-thread-safe counter, count.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class Counter {

    private AtomicInteger atomicI = new AtomicInteger(0);
    private int i = 0;

    public static void main(String[] args) {
        final Counter cas = new Counter();
        List<Thread> ts = new ArrayList<Thread>(600);
        long start = System.currentTimeMillis();
        for (int j = 0; j < 100; j++) {
            Thread t = new Thread(new Runnable() {
                @Override
                public void run() {
                    for (int i = 0; i < 10000; i++) {
                        cas.count();
                        cas.safeCount();
                    }
                }
            });
            ts.add(t);
        }
        for (Thread t : ts) {
            t.start();
        }
        // wait for all threads to finish
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        System.out.println(cas.i);
        System.out.println(cas.atomicI.get());
        System.out.println(System.currentTimeMillis() - start);
    }

    /**
     * Thread-safe counter implemented with CAS
     */
    private void safeCount() {
        for (;;) {
            int i = atomicI.get();
            boolean suc = atomicI.compareAndSet(i, ++i);
            if (suc) {
                break;
            }
        }
    }

    /**
     * Non-thread-safe counter
     */
    private void count() {
        i++;
    }
}

Starting with Java 1.5, the JDK's concurrency package provides classes to support atomic operations, such as AtomicBoolean (a boolean updated atomically), AtomicInteger (an int value updated atomically) and AtomicLong (a long value updated atomically). These atomic wrapper classes also provide useful utility methods, such as atomically incrementing or decrementing the current value by 1.
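For instance, the atomic increment and decrement utilities look like this (a small sketch):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class AtomicUtilDemo {
    public static void main(String[] args) {
        AtomicInteger ai = new AtomicInteger(0);
        System.out.println(ai.incrementAndGet()); // atomically +1, prints 1
        System.out.println(ai.decrementAndGet()); // atomically -1, prints 0

        AtomicLong al = new AtomicLong(10L);
        System.out.println(al.addAndGet(5L));     // atomically adds 5, prints 15
    }
}
```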

Some concurrency frameworks in Java and its libraries also use spin CAS to implement atomic operations, such as the xfer method of the LinkedTransferQueue class. CAS is an efficient solution for atomic operations, but it still has three major problems: the ABA problem, long spin-time overhead, and the fact that it can only guarantee atomic operations on a single shared variable.

1. The ABA problem
Because CAS checks whether the value has changed before updating it, and updates it only if it has not, a problem arises if a value changes from A to B and then back to A: a CAS check will conclude that the value never changed, when in fact it did. The solution to the ABA problem is to use a version number: append a version number to the variable and increment it on every update, so A-B-A becomes 1A-2B-3A.

Starting with Java 1.5, the JDK's atomic package provides the class AtomicStampedReference to solve the ABA problem. Its compareAndSet method first checks whether the current reference equals the expected reference and whether the current stamp equals the expected stamp; only if both are equal does it atomically set the reference and the stamp to the given new values.

public boolean compareAndSet(
        V   expectedReference, // expected reference
        V   newReference,      // new reference
        int expectedStamp,     // expected stamp
        int newStamp)          // new stamp
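A short sketch of how the stamp defeats A-B-A (interned string literals are used so that the reference comparison works as intended):

```java
import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaDemo {
    public static void main(String[] args) {
        AtomicStampedReference<String> ref =
                new AtomicStampedReference<String>("A", 0);

        int stamp = ref.getStamp(); // stamp = 0
        // Another thread changes A -> B -> A, bumping the stamp each time.
        ref.compareAndSet("A", "B", stamp, stamp + 1);
        ref.compareAndSet("B", "A", stamp + 1, stamp + 2);

        // A plain CAS on the value alone would succeed here, but the stamped
        // CAS fails because the stamp (now 2) no longer matches the old one.
        boolean ok = ref.compareAndSet("A", "C", stamp, stamp + 1);
        System.out.println(ok);                 // prints false
        System.out.println(ref.getReference()); // prints A
    }
}
```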

2. Long spin-time overhead
If spin CAS is unsuccessful for a long time, it imposes a very large execution overhead on the CPU. If the JVM can use the pause instruction provided by the processor, efficiency improves. The pause instruction has two effects: first, it delays the pipelined instructions (de-pipelines), so the CPU does not consume excessive execution resources; the delay depends on the implementation, and on some processors it is zero. Second, it prevents the CPU pipeline from being flushed when the loop is exited due to a memory order violation, which improves CPU execution efficiency.
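Since Java 9, this pause hint is exposed to Java code as Thread.onSpinWait(), intended for exactly this kind of spin loop; a minimal sketch (requires Java 9 or later):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinWaitDemo {
    public static void main(String[] args) {
        final AtomicBoolean done = new AtomicBoolean(false);
        Thread worker = new Thread(new Runnable() {
            @Override
            public void run() {
                done.set(true);
            }
        });
        worker.start();
        while (!done.get()) {
            // Hints to the CPU that we are busy-waiting (maps to PAUSE on x86).
            Thread.onSpinWait();
        }
        System.out.println("done"); // prints done
    }
}
```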

3. Only atomic operations on a single shared variable can be guaranteed
When operating on one shared variable, we can use looping CAS to guarantee atomicity, but for multiple shared variables, looping CAS cannot guarantee that the operation as a whole is atomic. In that case we can use locks, or use the trick of combining multiple shared variables into a single one: for example, given two shared variables i = 2 and j = a, merge them into ij = 2a and then use CAS to operate on ij. Starting with Java 1.5, the JDK provides the AtomicReference class to guarantee atomicity between reference objects, so multiple variables can be placed in one object for CAS operations.
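Combining two variables into one object and swapping it with a single CAS can be sketched with AtomicReference (the Pair holder here is a hypothetical helper for illustration, not a JDK class):

```java
import java.util.concurrent.atomic.AtomicReference;

public class PairDemo {
    // Hypothetical immutable holder bundling two values into one object.
    static final class Pair {
        final int i;
        final String j;
        Pair(int i, String j) { this.i = i; this.j = j; }
    }

    public static void main(String[] args) {
        AtomicReference<Pair> ref = new AtomicReference<Pair>(new Pair(2, "a"));

        Pair old = ref.get();
        // Both fields are replaced atomically by one CAS on the wrapper object.
        boolean ok = ref.compareAndSet(old, new Pair(old.i + 1, "b"));

        System.out.println(ok);          // prints true
        System.out.println(ref.get().i); // prints 3
        System.out.println(ref.get().j); // prints b
    }
}
```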

Atomic operation with lock mechanism

The lock mechanism ensures that only the thread holding the lock can operate on the locked memory area. The JVM implements many kinds of locks internally: biased locks, lightweight locks and mutexes. Interestingly, apart from biased locks, the JVM implements locking using looping CAS: a thread uses looping CAS to acquire the lock when entering a synchronized block, and uses looping CAS to release it when exiting.
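The acquire/release-by-looping-CAS idea can be sketched in user code with a minimal spin lock built on AtomicBoolean (a simplified illustration of the idea, not the JVM's actual lock implementation):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal spin lock: acquiring loops on CAS until it flips the flag from
// false to true; releasing simply sets it back to false.
public class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        while (!locked.compareAndSet(false, true)) {
            // busy-wait until the CAS succeeds
        }
    }

    public void unlock() {
        locked.set(false);
    }
}
```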

