[Translation] Atomic vs. non-atomic operations

Source: Internet
Author: User

Original link: atomic-vs-non-atomic-operations

Many articles on the internet cover atomic operations, but they usually focus on atomic read-modify-write (RMW) operations. Those are not the only atomic operations, however. Atomic loads (reads) and stores (writes) are just as important. In this article, I will compare atomic and non-atomic loads and stores at both the processor level and the C/C++ language level. Along the way, we will also clarify the C++11 concept of a "data race".

An operation on a shared variable is atomic if it completes in a single step relative to other threads. When an atomic store is performed on a shared variable, no other thread can observe the modification half-complete. When an atomic load is performed on a shared variable, it reads the entire value as it appeared at a single moment in time. Non-atomic loads and stores make neither guarantee.

Without these guarantees, lock-free programming would be impossible, because you could never let multiple threads operate on the same shared variable at the same time. We can state this explicitly as a rule:

Any time two threads operate on a shared variable concurrently, and one of those operations performs a write, both threads must use atomic operations.

If you violate this rule and either thread uses a non-atomic operation, you have what the C++11 standard calls a data race (not to be confused with Java's concept of a data race, or with the more general race condition). The C++11 standard does not tell programmers why data races are bad, only that if you trigger one, the result is "undefined behavior". The real reason data races are bad is simple: they lead to torn reads and torn writes, that is, reads and writes that are only partially complete.

A memory operation may be non-atomic because it uses multiple CPU instructions. It may be non-atomic even when it uses a single CPU instruction. Or it may be non-atomic because you are writing portable code and simply cannot assume otherwise. Let's look at a few examples.

Non-atomic due to multiple CPU instructions

Suppose there is a 64-bit global variable initialized to 0.

    uint64_t sharedvalue = 0;

At some point, you assign a 64-bit value to this variable:

    void storevalue()
    {
        sharedvalue = 0x100000002;
    }

On a 32-bit x86 platform, compiling this function with GCC produces the following assembly code:

    $ gcc -O2 -S -masm=intel test.c
    $ cat test.s
            ...
            mov     DWORD PTR sharedvalue, 2
            mov     DWORD PTR sharedvalue+4, 1
            ret
            ...

As you can see, the compiler implemented the 64-bit assignment using two separate machine instructions. The first instruction sets the lower 32 bits to 0x00000002, and the second sets the upper 32 bits to 0x00000001. Clearly, this assignment is not atomic. If sharedvalue is accessed concurrently by different threads, several things can go wrong:

1. If the thread calling storevalue is preempted between the two instructions, it leaves the value 0x0000000000000002 in memory: a torn write. If another thread reads sharedvalue at this point, it receives a completely bogus value that nobody intended to store.
2. Worse, if the thread is preempted between the two instructions and another thread modifies sharedvalue before the first thread resumes, the result is a permanently torn write: the upper 32 bits come from one thread and the lower 32 bits from another.
3. On multicore devices, no thread even needs to be preempted for a torn write to occur. While one thread is calling storevalue, any thread running on a different core may read sharedvalue at a moment when only half of the change is visible.
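
The first two scenarios can be replayed deterministically in a single thread. The following is only an illustrative sketch: `SplitValue`, `observe`, `store_low_half`, `store_high_half` and `value_seen_mid_store` are hypothetical names that model the two `mov` instructions, not anything GCC or the hardware exposes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a 64-bit variable that is written 32 bits at a time,
// mirroring the two mov instructions in the listing above.
struct SplitValue {
    uint32_t low;   // stands in for "sharedvalue"
    uint32_t high;  // stands in for "sharedvalue+4"
};

// Reassemble the 64-bit value a reader would see at this instant.
uint64_t observe(const SplitValue& v)
{
    return (static_cast<uint64_t>(v.high) << 32) | v.low;
}

// First half of the store: writes only the lower 32 bits.
void store_low_half(SplitValue& v, uint64_t value)
{
    v.low = static_cast<uint32_t>(value & 0xFFFFFFFFu);
}

// Second half of the store: writes only the upper 32 bits.
void store_high_half(SplitValue& v, uint64_t value)
{
    v.high = static_cast<uint32_t>(value >> 32);
}

// Scenario 1: a reader looks at the variable between the two halves.
uint64_t value_seen_mid_store()
{
    SplitValue v = {0, 0};
    store_low_half(v, 0x100000002ULL);
    uint64_t seen = observe(v);             // the reader runs here
    store_high_half(v, 0x100000002ULL);
    assert(observe(v) == 0x100000002ULL);   // the store is complete only now
    return seen;
}
```

Calling `value_seen_mid_store()` returns 0x2, the bogus intermediate value from scenario 1. Scenario 2 follows the same pattern: if a second writer stores both halves of its own value between the two calls, the final result mixes the upper half of one value with the lower half of the other.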

Reading sharedvalue concurrently brings its own problems:

    uint64_t loadvalue()
    {
        return sharedvalue;
    }

    $ gcc -O2 -S -masm=intel test.c
    $ cat test.s
            ...
            mov     eax, DWORD PTR sharedvalue
            mov     edx, DWORD PTR sharedvalue+4
            ret
            ...

Here too, the compiler implemented the load with two machine instructions: the first reads the lower 32 bits into the eax register, and the second reads the upper 32 bits into the edx register. If a concurrent store to sharedvalue becomes visible between these two instructions, the result is a torn read, even if the concurrent store itself was atomic.

These problems are not just theoretical. The Mintomic test suite includes a test case called test_load_store_64_fail. One thread repeatedly stores 64-bit values to a variable using the plain assignment operator, while another thread repeatedly loads from the same variable and validates each result. On a multicore x86 machine, this test fails consistently, as expected.

Non-atomic CPU instructions

A memory operation can be non-atomic even when it is performed by a single CPU instruction. For example, the ARMv7 instruction set includes the strd instruction, which stores the contents of two 32-bit registers to a single 64-bit value in memory.

    strd r0, r1, [r2]

On some ARMv7 processors, this instruction is not atomic. When the processor encounters it, it actually performs two separate 32-bit stores. Once again, a thread running on another core might observe a torn write. Interestingly, a torn write can even occur on a single-core device: a system interrupt (for a scheduled thread context switch, say) can arrive between the two internal 32-bit stores. In that case, when the thread resumes from the interrupt, it restarts the strd instruction from the beginning.

Another example comes from the familiar x86 platform. A 32-bit mov instruction is atomic only if the memory operand is naturally aligned; otherwise it is non-atomic. In other words, atomicity is guaranteed only when the 32-bit integer is located at an address that is an exact multiple of 4. Mintomic has another test case, test_load_store_32_fail, which verifies this guarantee. At the time of writing (June 2013), this test always passes on x86 as written. However, if you force the test variable sharedint to an unaligned address, the test fails. On my Core 2 Quad Q6600 machine, the test fails when sharedint crosses a cache line boundary:

    // Force sharedint to cross a cache line boundary:
    #pragma pack(2)
    MINT_DECL_ALIGNED(static struct, 64)
    {
        char padding[62];
        mint_atomic32_t sharedint;
    }
    g_wrapper;
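
To see why 62 bytes of padding produce the failing case, the two conditions from the paragraph above can be written as small address checks. This is a minimal sketch, not part of Mintomic: `is_naturally_aligned` and `crosses_cache_line` are hypothetical helper names, and the 64-byte cache line size is an assumption that varies by processor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if an object of `size` bytes at `address` is naturally aligned,
// i.e. its address is an exact multiple of its size.
bool is_naturally_aligned(std::uintptr_t address, std::size_t size)
{
    assert(size != 0);
    return address % size == 0;
}

// True if the object's first and last bytes fall in different cache lines.
// The 64-byte default is an assumption; the line size depends on the CPU.
bool crosses_cache_line(std::uintptr_t address, std::size_t size,
                        std::size_t line_size = 64)
{
    assert(size != 0 && line_size != 0);
    return address / line_size != (address + size - 1) / line_size;
}
```

A 4-byte integer at offset 62 of a 64-byte-aligned block is neither naturally aligned nor contained in one cache line, since it occupies bytes 62 through 65. At offset 60 it is aligned and stays inside the line. At offset 2 it is unaligned but does not cross a line.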


That is enough processor-specific detail. Now let's look at atomicity at the C/C++ language level.

All C/C++ operations are presumed non-atomic

In C and C++, every operation is presumed non-atomic, even a plain 32-bit integer assignment, unless the compiler or hardware vendor specifies otherwise.

    uint32_t foo = 0;

    void storefoo()
    {
        foo = 0x80286;
    }

The language standards say nothing about atomicity in this case. Maybe integer assignment is atomic, maybe it isn't. Since non-atomic operations make no guarantees, plain integer assignment in C is non-atomic by definition.

In practice, we usually know more about our target platforms than that. For example, on all modern x86, x64, Itanium, SPARC, ARM and PowerPC processors, a plain 32-bit integer assignment is atomic as long as the target variable is naturally aligned. You can verify this by consulting the processor manual or the compiler documentation. In the games industry, many 32-bit integer assignments rely on this particular guarantee.

Nonetheless, when writing truly portable C and C++ code, there is a long-standing tradition of pretending that we know nothing beyond what the language standards tell us. Portable C and C++ is meant to run on every possible computing device: past, present and imaginary. Personally, I like to imagine a machine whose memory can only be changed by scrambling it first.


On such a machine, you would definitely not want a concurrent read to happen at the same time as a plain assignment. You could end up reading a completely random value.

In C++11, there is finally a way to perform truly portable atomic loads and stores: the C++11 atomic library. Atomic loads and stores performed through the C++11 atomic library would work even on the imaginary machine above, even if that means the library must secretly lock a mutex to make each operation atomic. There is also the Mintomic library, which I released last month (that was June 2013; the library is now obsolete). It does not support as many platforms, but it works on several older compilers, is hand-optimized, and is guaranteed to be lock-free.
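
Whether the C++11 atomic library falls back to a hidden lock for a given type can be queried at run time with the standard `is_lock_free()` member function. The sketch below is illustrative only: `atomic64_is_lock_free` and `round_trip` are hypothetical names, and the answer depends on the platform and compiler it is built for.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Reports whether std::atomic<uint64_t> is implemented without a hidden lock
// on the platform this code was compiled for. The answer is platform-dependent.
bool atomic64_is_lock_free()
{
    std::atomic<uint64_t> probe(0);
    return probe.is_lock_free();
}

// Lock-free or not, the store and the load below are each atomic.
uint64_t round_trip(uint64_t value)
{
    std::atomic<uint64_t> v(0);
    v.store(value, std::memory_order_relaxed);
    uint64_t loaded = v.load(std::memory_order_relaxed);
    assert(loaded == value);  // single-threaded, so we read back our own store
    return loaded;
}
```

If `atomic64_is_lock_free()` returns false, the operations are still atomic, but they are no longer suitable for lock-free programming.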

Relaxed atomic operations

Let's return to the original sharedvalue example and rewrite it using Mintomic, so that all operations are atomic on every platform Mintomic supports. First, sharedvalue must be declared as one of Mintomic's atomic data types.

    #include <mintomic/mintomic.h>

    mint_atomic64_t sharedvalue = { 0 };

The mint_atomic64_t type guarantees correct memory alignment for atomic access on each platform. This matters because some compilers make no such guarantee for plain types. For example, the GCC 4.2 for ARM bundled with Xcode 3.2.5 does not guarantee that a plain uint64_t is 8-byte aligned.
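
In standard C++11, a similar alignment guarantee can be requested with `alignas`. The sketch below only illustrates the idea and makes no claim about how mint_atomic64_t is actually defined: `Aligned64`, `Holder`, `make_aligned` and `counter_is_aligned` are hypothetical names. Note that alignment alone does not make an access atomic; it is only a precondition.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper that forces 8-byte alignment on a 64-bit integer,
// whatever the compiler's default for a plain uint64_t happens to be.
struct alignas(8) Aligned64 {
    uint64_t value;
};

static_assert(alignof(Aligned64) == 8, "wrapper must be 8-byte aligned");
static_assert(sizeof(Aligned64) == 8, "wrapper must not add padding");

// Even when placed after a 1-byte member, the wrapper stays aligned,
// because the compiler inserts padding after `tag`.
struct Holder {
    char tag;
    Aligned64 counter;
};

// Builds a wrapper and checks its address at run time.
Aligned64 make_aligned(uint64_t v)
{
    Aligned64 a = {v};
    assert(reinterpret_cast<std::uintptr_t>(&a) % 8 == 0);
    return a;
}

// True if the `counter` member of `h` sits on an 8-byte boundary.
bool counter_is_aligned(const Holder& h)
{
    return reinterpret_cast<std::uintptr_t>(&h.counter) % 8 == 0;
}
```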

To modify sharedvalue, we call mint_store_64_relaxed instead of performing a plain, non-atomic assignment.

    void storevalue()
    {
        mint_store_64_relaxed(&sharedvalue, 0x100000002);
    }

Similarly, to read the current value of sharedvalue, we call mint_load_64_relaxed:

    uint64_t loadvalue()
    {
        return mint_load_64_relaxed(&sharedvalue);
    }

In C++11 terms, these functions are now free of data races. There is absolutely no possibility of a torn read or torn write when they execute concurrently, whether the code runs on ARMv6/ARMv7, x86, x64 or PowerPC.

Here is the C++11 version:

    #include <atomic>
    #include <cstdint>

    std::atomic<uint64_t> sharedvalue(0);

    void storevalue()
    {
        sharedvalue.store(0x100000002, std::memory_order_relaxed);
    }

    uint64_t loadvalue()
    {
        return sharedvalue.load(std::memory_order_relaxed);
    }
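
The absence of torn reads can be checked with a small stress test in the spirit of test_load_store_64_fail, but using atomic operations. This is a rough sketch rather than Mintomic's actual test: `count_torn_reads` is a hypothetical name, and the two 64-bit values are arbitrary choices whose upper and lower halves differ.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Runs one writer and one reader against an atomic 64-bit variable and
// returns how many loads observed a value that was never stored.
int count_torn_reads(int iterations)
{
    assert(iterations > 0);
    const uint64_t a = 0x0000000100000002ULL;
    const uint64_t b = 0x0000000300000004ULL;
    std::atomic<uint64_t> shared(a);
    std::atomic<bool> done(false);
    int torn = 0;

    // Writer: alternates between the two valid values.
    std::thread writer([&] {
        for (int i = 0; i < iterations; ++i)
            shared.store((i & 1) ? a : b, std::memory_order_relaxed);
        done.store(true, std::memory_order_relaxed);
    });

    // Reader: every value it loads must be exactly a or b.
    std::thread reader([&] {
        while (!done.load(std::memory_order_relaxed)) {
            uint64_t v = shared.load(std::memory_order_relaxed);
            if (v != a && v != b)
                ++torn;
        }
    });

    writer.join();
    reader.join();
    return torn;  // safe to read: join() synchronizes with the reader thread
}
```

With `std::atomic<uint64_t>`, the function should always return 0. Replacing the atomic variable with a plain `uint64_t` would turn the program into a data race, with the undefined behavior described earlier.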

You may notice that both the Mintomic and C++11 versions use relaxed atomic operations, as indicated by the _relaxed suffix on the function names and on the memory ordering argument.

In particular, a relaxed atomic operation may be reordered with respect to the instructions that come before or after it, either because the compiler reorders instructions or because the processor reorders memory operations. The compiler may also optimize redundant relaxed atomic operations, just as it does with non-atomic operations. In all cases, the operation itself remains atomic.

When working with shared variables concurrently, it is good practice to use the C++11 atomic library or Mintomic consistently, even if you know that a plain load or store is already atomic on your target platform. An atomic library function also serves as a reminder that the variable is accessed concurrently.
