Atomic operations in C++11 development

Nicol's Blog. Original: https://taozj.org/2016/09/C-11%E5%BC%80%E5%8F%91%E4%B8%AD%E7%9A%84Atomic%E5%8E%9F%E5%AD%90%E6%93%8D%E4%BD%9C/
Atomic operations are common in multi-threaded development, for example in counters and sequence generators. Data like this is at risk under concurrency, yet protecting it with a lock feels wasteful, so atomic types are very convenient.
Atomic operations are simple to use, but what lies behind them is far more complex than we might think. The main reason is that modern computing systems are complicated: multiple processors, multi-core processors, and multi-level caches both private to a core and shared between cores. When one core modifies a variable, the question of when other cores see the change becomes a very serious one. At the same time, in an era that chases every last bit of performance, processors and compilers are very clever, applying aggressive optimizations such as out-of-order execution and instruction reordering. These work well in a single-threaded context, but in a multi-core environment they often create new problems, so we need some way to hint to the compiler and the processor that the execution order of certain code must not be changed.
Hence atomic operations, which essentially carry the three kinds of semantics we care about: the operation itself is indivisible (atomicity); when one thread's write to a piece of data becomes observable by another thread (visibility); and whether the surrounding execution order may be rearranged (ordering).
First, Legacy GCC __sync
It is said that before the C++11 standard came out, everyone criticized C++ for lacking a well-defined memory model, and as multi-threaded development spread, the problem became more and more pressing. Each C++ compiler implemented its own fragmented solution; GCC, pragmatic as ever, followed Intel's developer manual and produced the __sync family of atomic built-in functions, probably the ones most programmers are familiar with, listed below:
```cpp
type __sync_fetch_and_op (type *ptr, type value, ...)
type __sync_op_and_fetch (type *ptr, type value, ...)
bool __sync_bool_compare_and_swap (type *ptr, type oldval, type newval, ...)
type __sync_val_compare_and_swap (type *ptr, type oldval, type newval, ...)
__sync_synchronize (...)
type __sync_lock_test_and_set (type *ptr, type value, ...)
void __sync_lock_release (type *ptr, ...)
```
The op in the names above covers the common operations add, sub, or, and, xor, and nand, while type is the data type: Intel officially allows signed and unsigned int, long, and long long, but GCC extends this to any 1/2/4/8-byte scalar type. The compare-and-swap operation comes in two versions: one returns a bool indicating success, the other returns the value stored at ptr before the operation. __sync_synchronize emits a full memory barrier (you may also often see asm volatile("" ::: "memory"); used as a compiler barrier). All the atomic operations above are full-barrier operations, meaning no memory-access instruction is allowed to be reordered across them in either direction.
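As a quick illustration, a shared counter bumped with one of these builtins might look like the following minimal sketch (the names counter and hit are invented for the example):

```cpp
// Legacy __sync example: an atomic counter. The builtin is a full
// barrier, so no memory access may be reordered across it.
static long counter = 0;

void hit() {
    __sync_fetch_and_add(&counter, 1);  // atomically adds 1, returns the old value
}
```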
__sync_lock_test_and_set writes value to the location at ptr and returns the value previously stored there. Its memory model is an acquire barrier: memory accesses after the operation may not be reordered before it, but memory stores issued before the operation may still move after it. __sync_lock_release is more like releasing the lock taken by the previous operation; it usually means writing 0 to ptr, and it is a release barrier: all previous memory stores are globally visible and all previous memory loads have completed, but subsequent memory reads may be reordered before the operation. The two mirror each other; the description is a bit of a mouthful, but as far as I have seen, they are mostly used for spinlock-like constructs, for example:
```cpp
static volatile int _sync;

static void lock_sync() {
    while (__sync_lock_test_and_set(&_sync, 1))
        ;  // spin until the lock is acquired
}

static void unlock_sync() {
    __sync_lock_release(&_sync);
}
```
In fact, the 1 here can be any non-zero value; it mainly serves as a bool.
Second, the memory model in the new C++11 standard
The full-barrier operations from GCC above do work, but they are as crude as the big coarse-grained kernel lock was when operating systems first moved from single core to multi-core. Quite apart from leaving the compiler and processor no room to optimize, merely making a variable's update visible to other processors requires hardware-level synchronization between them, which is clearly expensive. The memory models specified in the new C++11 standard are finer grained; if you become familiar with them, you can keep your program correct while minimizing the impact on performance.
The generic interface of an atomic variable is accessed through store() and load(), which accept an additional memory-order parameter; if it is omitted, the default is the strongest mode, sequentially consistent.
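A minimal sketch of that interface (the variable name is invented for the example):

```cpp
#include <atomic>

std::atomic<int> flag{0};

void example() {
    flag.store(1);                                 // order omitted: defaults to
                                                   // std::memory_order_seq_cst
    int v = flag.load(std::memory_order_acquire);  // explicit, weaker order
    (void)v;
}
```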
According to the strength of the synchronization required between the executing threads, the memory models under the new standard fall into the following categories:
2.1 Sequentially consistent
This model is the strongest synchronization mode; it is specified with the parameter std::memory_order_seq_cst and is the default. Consider:
```cpp
// Thread 1:
y = 1;
x.store(2);

// Thread 2:
if (x.load() == 2)
    assert(y == 1);
```
In the example above, even though x and y are unrelated and the processor or compiler would normally be free to reorder accesses to them, under seq_cst all memory accesses before x.store(2) happen-before that store, so the assert in thread 2 holds.
From another angle: for an operation in seq_cst mode, no memory-access operation may be reordered across it, and this restriction is bidirectional.
2.2 Acquire/release
The GCC wiki may not explain this very clearly, so look at the following typical acquire/release example:
```cpp
std::atomic<int> a{0};
int b = 0;

// Thread 1:
b = 1;
a.store(1, std::memory_order_release);

// Thread 2:
while (a.load(std::memory_order_acquire) != 1)
    /* waiting */;
std::cout << b << '\n';
```
Without a doubt, if seq_cst were used, the above would be guaranteed to succeed (b prints as 1); but acquire/release is already sufficient, for the following reasons:
a. memory_order_release guarantees that memory accesses before the operation will not be reordered after it, though memory accesses after the operation may be reordered before it. It is usually used to prepare some resource beforehand and then "release" it to other threads via a store with memory_order_release;
b. memory_order_acquire guarantees that memory accesses after the operation will not be reordered before it, though memory accesses before the operation may be reordered after it. It is commonly used with a load to test or wait for a resource; once the condition is met, it is safe to "acquire" and consume that resource.
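To make the pattern concrete, here is a self-contained, runnable version of the example above (the names ready and payload are assumptions for illustration):

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> ready{false};
int payload = 0;

int main() {
    std::thread producer([] {
        payload = 42;                                  // prepare the resource...
        ready.store(true, std::memory_order_release);  // ...then publish ("release") it
    });
    std::thread consumer([] {
        while (!ready.load(std::memory_order_acquire))
            /* waiting */;                             // wait for the release
        assert(payload == 42);                         // safe: the store happens-before the load
    });
    producer.join();
    consumer.join();
}
```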
2.3 Consume
This is a more relaxed memory model than acquire/release: compared with acquire, it also drops the happens-before constraint for variables that carry no dependency, reducing the amount of data that must be synchronized and speeding up execution. Consider the following example:
```cpp
std::atomic<int*> p;
int n = 0;
int m = 0;

// Thread 1:
n = 1;
m = 1;
p.store(&n, std::memory_order_release);

// Thread 2:
int *t2 = p.load(std::memory_order_acquire);
assert(*t2 == 1 && m == 1);

// Thread 3:
int *t3 = p.load(std::memory_order_consume);
assert(*t3 == 1 && m == 1);
```
The assert in thread 2 always passes, while the assert in thread 3 may fail: n appears in the store expression, so it is a dependency of the stored value, and accesses to it are guaranteed to happen-before the store. But m carries no dependency, so it is not synchronized and its value is not guaranteed.
Because the consume mode reduces the amount of synchronization the hardware must perform, it should in theory execute faster than the acquire model above, and the difference ought to be more visible when a large amount of shared data is involved.
At this point, the mechanism by which acquire/consume and release coordinate threads is fully exposed: an acquire or consume side typically waits for a state update from a release side. Note that such communication only makes sense between a pair of threads; it has no effect on third-party threads that do not use this memory model.
2.4 Relaxed
In the most relaxed mode, memory_order_relaxed imposes no happens-before constraint at all: the compiler and processor may reorder memory accesses freely, so other threads can make no assumptions about the surrounding ordering. The only guarantee this mode offers is that once a thread has read the latest value of a variable, it will never again see a value older than that one.
This mode is typically used when a variable must be atomic but is not used to synchronize shared data between threads. After one thread performs a relaxed store, another thread may need some time before its relaxed load observes the new value, and on architectures without cache coherence the cache must be flushed first. During development, if the atomic variable in your context is not used to synchronize other shared state between threads, choose relaxed.
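A typical sketch of such a case is a statistics counter that must be atomic but never guards other shared data (the names are invented for the example):

```cpp
#include <atomic>

std::atomic<unsigned long> hits{0};

void on_request() {
    hits.fetch_add(1, std::memory_order_relaxed);  // atomicity only, no ordering
}

unsigned long snapshot() {
    // The value observed here may lag behind the most recent increments.
    return hits.load(std::memory_order_relaxed);
}
```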
2.5 Summary
By now, your understanding of atomic operations should no longer stop at the level of indivisibility, since every memory model guarantees that modifications to the variable are atomic. Under the new C++11 standard, atomics rise to the level of data synchronization and collaboration between threads, and they are closely related to lock-free programming.
The manual also warns novice programmers: unless you know what you are doing and genuinely need to reduce synchronization coupling between threads to improve performance, do not reach for these memory models to optimize your program. Honestly using the default memory_order_seq_cst may be slower, but it is safer; if an immature optimization introduces a problem here, it is very hard to debug.
Third, C++11 GCC __atomic
After GCC implemented C++11, the __sync series became legacy and is no longer recommended; the new atomic operation interface, modeled on C++11, uses the __atomic prefix.
For the ordinary mathematical operations, the function interfaces take the form:
```cpp
type __atomic_op_fetch (type *ptr, type val, int memorder);
type __atomic_fetch_op (type *ptr, type val, int memorder);
```
In addition, a number of new interfaces are provided according to the new standard:
```cpp
type __atomic_load_n (type *ptr, int memorder);
void __atomic_store_n (type *ptr, type val, int memorder);
type __atomic_exchange_n (type *ptr, type val, int memorder);
bool __atomic_compare_exchange_n (type *ptr, type *expected, type desired,
                                  bool weak, int success_memorder, int failure_memorder);
bool __atomic_test_and_set (void *ptr, int memorder);
void __atomic_clear (bool *ptr, int memorder);
void __atomic_thread_fence (int memorder);
bool __atomic_always_lock_free (size_t size, void *ptr);
bool __atomic_is_lock_free (size_t size, void *ptr);
```
The function names are largely self-explanatory. The versions above carry the _n suffix; GCC also provides counterparts without _n that pass values through pointer arguments instead of return values, so they work on arbitrary types. The last two functions determine whether atomic operations on objects of a given size are lock-free: 8-byte types such as long long are generally no problem, and on architectures that support 128-bit integers, lock-free operation can extend to 16-byte objects.
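As an illustration, the earlier spinlock can be rewritten with these builtins; this is a sketch assuming GCC or Clang, using acquire on lock and release on unlock:

```cpp
// __atomic_test_and_set works best on a bool or char operand.
static bool _sync = false;

static void lock_sync() {
    // Atomically set the flag and test its previous value; spin while it was held.
    while (__atomic_test_and_set(&_sync, __ATOMIC_ACQUIRE))
        ;
}

static void unlock_sync() {
    __atomic_clear(&_sync, __ATOMIC_RELEASE);  // clear the flag with release semantics
}
```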
Boost.Atomic is not covered here, but some of its examples are quite good, such as a wait-free ring buffer and a producer-consumer example built on these memory models; they are worth a look.
Reference documents
Chapter 45. Boost.Atomic (Boost documentation)
Chapter 5. Boost.Atomic (Boost documentation)
6.52 Built-in Functions for Memory Model Aware Atomic Operations (GCC manual)
6.51 Legacy __sync Built-in Functions for Atomic Memory Access (GCC manual)
Concurrent programming the fast and dirty way!
N3337.pdf (C++11 standard working draft)
GCC Wiki on atomic synchronization