Atomicity of operations in multithreaded programs


(Reposted from http://www.parallellabs.com/2010/04/15/atomic-operation-in-multithreaded-application/)

0. Background

An atomic operation is an operation that cannot be divided. Atomic operations are an important concept in multithreaded programs: they are used to implement synchronization mechanisms, and they are also the source of some common multithreading bugs. This article discusses three questions: 1. Are reads and writes of variables in multithreaded programs atomic? 2. Are reads and writes of bit fields in multithreaded programs thread-safe? 3. How should programmers use atomic operations?

1. Are reads and writes of variables atomic in a multithreaded environment?

Let's start with a once-popular question from Baidu's written interview test. Many people do not understand the reasoning behind it, so let's analyze it here (in fact the question is a bit ambiguous, as we will see later):

Which of the following operations on an int variable x needs to be synchronized in a multithreaded program? ( )
A. x = y;  B. x++;  C. ++x;  D. x = 1;

To understand this problem thoroughly, we need to start from the hardware. Taking the common x86 CPU as an example, according to Intel's reference manual (Section 8.1), it relies on the following three mechanisms to guarantee atomic operations on multicore systems:
(1) Guaranteed atomic operations (described in detail in Section 8.1.1);
(2) Bus locking, using the LOCK# signal and the LOCK instruction prefix;
(3) Cache coherency protocols that ensure atomic operations can be carried out on cached data structures (cache lock); this mechanism is present in the Pentium 4, Intel Xeon, and P6 family processors.

These three mechanisms are independent of each other and complementary. Simply put:
(1) Some basic memory reads and writes are guaranteed atomic by the hardware itself (for example, reading or writing a single byte);
(2) Operations that need atomicity but are not covered by mechanism (1) (for example, read-modify-write operations) can lock the bus via the LOCK# signal to guarantee atomicity;
(3) Because much of the data in memory already lives in the L1/L2 caches, an atomic operation on such data only needs to touch the local cache rather than the bus; the CPU therefore provides the cache coherency mechanism to ensure that other processors caching the same data read the latest value (for cache coherency, see one of my earlier blog posts).

So which basic operations does mechanism (1) guarantee to be atomic? According to Section 8.1.1 of the Intel manual:

Starting with the Intel486 processor, the following basic memory operations are atomic:
reading or writing a byte
reading or writing a word aligned on a 16-bit boundary
reading or writing a doubleword aligned on a 32-bit boundary

Starting with the Pentium processor, the following were added to the atomic operations above:
reading or writing a quadword aligned on a 64-bit boundary
16-bit accesses to uncached memory locations that fit within a 32-bit data bus

Starting with the P6 family processors, the following was added to the atomic operations above:
unaligned 16-, 32-, and 64-bit accesses to cached memory that fit within a cache line

So which operations are non-atomic? Per the manual:

Accesses to cacheable memory that are split across bus widths, cache lines, and page boundaries are not guaranteed to be atomic by the Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors.

(Simply put, accesses to memory that straddle the bus width, a cache line, or a page boundary are not atomic; if you need such an operation to be atomic, you must fall back on mechanism (2) and have the corresponding control signal asserted on the bus.)

Note that although some unaligned reads and writes have had atomicity guarantees since the P6 family, unaligned accesses hurt performance badly and should be avoided whenever possible. Of course, an ordinary programmer rarely needs to worry about this, because most compilers align memory automatically.

Back to the interview question at the beginning. Let's disassemble the four options and see what they do:

x = y;
    mov eax, dword ptr [y]
    mov dword ptr [x], eax

x++;
    mov eax, dword ptr [x]
    add eax, 1
    mov dword ptr [x], eax

++x;
    mov eax, dword ptr [x]
    add eax, 1
    mov dword ptr [x], eax

x = 1;
    mov dword ptr [x], 1

(1) Clearly, x = 1 is an atomic operation.
Because x is of type int and int is 32 bits on a 32-bit CPU, x86 provides atomicity for this store directly in hardware. No matter how many threads execute an assignment like x = 1 concurrently, x always ends up holding a value that some thread assigned (you will never see, say, one thread update only the low 16 bits of x and then get blocked while another thread updates a different part of x, leaving x corrupted).
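As a concrete sketch of this point (the variable and function names are illustrative): on x86 a plain aligned 32-bit store is atomic in hardware, but the C++ language only makes that guarantee through std::atomic, which is the portable way to express it:

```cpp
#include <atomic>

// Portable version of "x = 1": std::atomic guarantees the store is
// never torn, on any platform, not just on x86.
std::atomic<int> x{0};

int set_one() {
    x.store(1);       // atomic store
    return x.load();  // atomic load
}
```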

(2) Now consider x++ and ++x.
Operations like x++, x += 2, and ++x all require synchronization in a multithreaded environment, because x86 implements such a statement with three instructions: read the value of x from memory into a register, add 1 to the register, and write the new value back to x's memory address (see the disassembly above).

For example, suppose two threads execute in the following order (note that reading x and writing x back are each atomic, so the two threads cannot perform them simultaneously):

Time    Thread 1            Thread 2
0       load eax, x
1                           load eax, x
2       add eax, 1          add eax, 1
3       store x, eax
4                           store x, eax

We find that the final value of x is 1 instead of 2, because thread 1's result is overwritten. In such cases we must either protect operations like x++ with a lock (for example, a pthread mutex) to ensure synchronization, or use a library that provides atomic operations: the Interlocked APIs on Windows, atomic.h in the Linux kernel, java.util.concurrent.atomic in Java, atomic_int in C++0x, and so on. These libraries wrap the hardware mechanisms provided by the CPU and expose APIs that guarantee atomicity.
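As a minimal sketch of the library approach (using the C++0x/C++11 std::atomic mentioned above; the function name is illustrative), the lost update disappears once the increment becomes a single atomic read-modify-write:

```cpp
#include <atomic>
#include <thread>

// ++ on std::atomic<int> performs load/add/store as one indivisible
// read-modify-write (a lock-prefixed add on x86), so no increment is lost.
int safe_counter(int iters) {
    std::atomic<int> c{0};
    auto worker = [&c, iters] {
        for (int i = 0; i < iters; ++i)
            ++c;                  // atomic increment
    };
    std::thread t1(worker), t2(worker);
    t1.join();
    t2.join();
    return c.load();              // always exactly 2 * iters
}
```

With a plain int instead of std::atomic<int>, the same two-thread loop could return anything between iters and 2 * iters.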

(3) Finally, consider x = y.
On x86 it consists of two operations: read y into a register, then write the register's value to x. Reading y is atomic, and writing to x is atomic, but are the two together atomic? I personally think x = y is not an atomic operation, because it is not indivisible. But does it need synchronization? The crux of the question lies in the program's context.

For example, suppose thread 1 executes {y = 1; x = y;} and thread 2 executes {y = 2; y = 3;}, and they happen to run in the following order:

Time    Thread 1            Thread 2
0       store y, 1
1                           store y, 2
2       load eax, y
3                           store y, 3
4       store x, eax

Then thread 1 ends up with x equal to 2, not the 1 it intended. We need to add synchronization to ensure that y = 2 cannot execute between thread 1's two statements. (The y = 3 store, although it executes between the load of y and the store to x, does not affect the semantics of x = y itself.) So you could say x = y needs synchronization, or that it does not; it depends on how you read the question. The same goes for x = 1: the store itself is atomic, but if another thread reads x after the assignment, synchronization is still required, otherwise that thread may read the old value of x instead of 1.
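A minimal sketch of that synchronization, assuming a C++11 std::mutex (the struct and function names are illustrative): one lock held around both of thread 1's statements guarantees that thread 2's writes to y cannot land between them, so x always ends up as 1:

```cpp
#include <mutex>
#include <thread>

struct Shared {
    int x = 0, y = 0;
    std::mutex m;
};

int run_once() {
    Shared s;
    std::thread t1([&s] {
        std::lock_guard<std::mutex> lk(s.m);  // y = 1 and x = y run as a unit
        s.y = 1;
        s.x = s.y;   // always sees the 1 written just above
    });
    std::thread t2([&s] {
        std::lock_guard<std::mutex> lk(s.m);
        s.y = 2;
        s.y = 3;
    });
    t1.join();
    t2.join();
    return s.x;      // 1, regardless of which thread ran first
}
```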

2. Are reads and writes of bit fields thread-safe?

Bit fields are commonly used to store variables with a small number of bits efficiently, and are handy in kernel and low-level development. In general, concurrent access by multiple threads to different bit-field members of the same struct is not guaranteed to be thread-safe.

For example, consider this example from Wikipedia:

struct foo {
    int flag : 1;
    int counter : 15;
};

struct foo my_foo;

/* ... */

/* in thread 1 */
pthread_mutex_lock(&my_mutex_for_flag);
my_foo.flag = !my_foo.flag;
pthread_mutex_unlock(&my_mutex_for_flag);

/* in thread 2 */
pthread_mutex_lock(&my_mutex_for_counter);
++my_foo.counter;
pthread_mutex_unlock(&my_mutex_for_counter);

The two threads read and write my_foo.flag and my_foo.counter respectively, yet even with the locking shown above thread safety is not guaranteed. One reason is that the exact layout of the members in memory depends on byte order, bit order, alignment, and other platform- and compiler-specific issues; portable code cannot assume any fixed bit-field layout [3]. Moreover, the smallest unit in which the CPU accesses memory is generally a word (16 bits on x86), not a single bit. If my_foo.flag and my_foo.counter are stored in the same word, the CPU reads or writes both values together whenever either bit-field member is accessed, so the two threads' accesses conflict. The correct fix is to protect my_foo.flag and my_foo.counter with one and the same mutex, which makes the reads and writes thread-safe.
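A sketch of that fix in C++ (using std::mutex rather than pthread; the helper names are illustrative): a single mutex covers every access to the struct, since flag and counter may share one machine word:

```cpp
#include <mutex>

struct foo {
    int flag    : 1;
    int counter : 15;
};

struct foo my_foo;
std::mutex my_foo_mutex;          // one lock covering the whole struct

int toggle_flag() {
    std::lock_guard<std::mutex> lk(my_foo_mutex);
    my_foo.flag = !my_foo.flag;
    return my_foo.flag != 0;      // 1 after an odd number of toggles
}

int bump_counter() {
    std::lock_guard<std::mutex> lk(my_foo_mutex);
    return ++my_foo.counter;
}
```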

The C++0x draft defines this for bit fields:
A sequence of adjacent non-zero-width bit fields belongs to the same memory location, while a zero-width bit field forces the next bit field into a separate memory location. Concurrent reads and writes to the same memory location are not thread-safe; reads and writes to different memory locations are.
For example, bf1 and bf2 may share one memory location, while bf3 occupies a separate memory location and bf4 another.
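A hypothetical struct matching that description (the names bf1 through bf4 are illustrative, not taken from the draft) might look like this:

```cpp
// bf1 and bf2 are adjacent non-zero-width bit fields, so they share one
// memory location; each zero-width bit field closes the current location,
// so bf3 and bf4 each get their own.
struct S {
    int bf1 : 4;   // same memory location as bf2
    int bf2 : 4;
    int     : 0;   // zero-width: the next bit field starts a new location
    int bf3 : 4;   // its own memory location
    int     : 0;
    int bf4 : 4;   // its own memory location
};

int demo() {
    S s{};
    s.bf1 = 1; s.bf2 = 2; s.bf3 = 3; s.bf4 = 4;
    return s.bf1 + s.bf2 + s.bf3 + s.bf4;   // fields do not clobber each other
}
```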

The Linux kernel once had a bug caused precisely by bit fields not being thread-safe.

To quote a summary by pongba:

So if you have several contiguous bit fields and want to access them without conflicts, there are two approaches. One is to separate them with zero-width bit fields; but this defeats the memory-saving purpose of bit fields in the first place, because any two fields that must not conflict can no longer share a byte. The other, of course, is to use locks.

3. How should programmers use atomic operations?

In general, programmers do not deal directly with the atomic operations provided by the CPU; they only need to use the atomic APIs provided by their language or platform. An advantage of the encapsulated APIs is that they also wrap more complex read-modify-write operations such as compare_and_swap and fetch_and_add.

The common APIs are as follows:

The Interlocked APIs on Windows
atomic_32.h in the Linux kernel
Atomic builtins in GCC (__sync_fetch_and_add(), etc.)
java.util.concurrent.atomic in Java
Atomic operations in C++0x
Atomic operations in Intel TBB
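As a minimal sketch of the read-modify-write primitives named above (fetch_and_add and compare_and_swap), using the C++0x/C++11 std::atomic API; the demo function is illustrative:

```cpp
#include <atomic>

int demo() {
    std::atomic<int> a{5};

    int old = a.fetch_add(1);   // atomic a += 1; returns the old value (5)

    // compare-and-swap: store 10 only if a still holds the expected value.
    int expected = old + 1;     // 6, the value a holds now
    bool swapped = a.compare_exchange_strong(expected, 10);

    return swapped ? a.load() : -1;   // 10 on success
}
```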

4. Reference documents:

[1] FAQ on the atomicity of variable operations
[2] http://en.wikipedia.org/wiki/Atomic_operation
[3] On memory alignment and bit fields: "Linux C Programming One-Stop Learning"
[4] Do you need mutexes to protect an 'int'?
[5] C++ Concurrency in Action
[6] Multithreaded simple data type access and atomic variables

