Compare and Swap (CAS) for a lock-free multi-producer queue

Source: Internet
Author: User

1. CAS principle

Compare and swap (CAS) is a mechanism that avoids the performance loss caused by using locks when multiple threads run in parallel. A CAS operation takes three operands: a memory location (V), the expected original value (A), and a new value (B). If the value at the memory location matches the expected original value, the processor atomically updates the location to the new value. Otherwise, the processor does nothing. In either case, it returns the value the location held before the CAS instruction. In effect, CAS says: "I think location V should contain value A; if it does, put B there; otherwise, leave the location unchanged and just tell me its current value."
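
Here is a minimal sketch of these semantics in C++11, which already offers them through std::atomic's compare_exchange_strong. That function reports success as a bool and writes the value it observed back into its expected argument, so the return value described above is easy to recover. The wrapper name cas_value is hypothetical and is not part of the original article.

```cpp
#include <atomic>

// Hypothetical helper: the CAS contract described above, on a std::atomic<int>.
// Stores `desired` only if the location currently holds `expected`, and in
// both cases returns the value the location held before the operation.
int cas_value(std::atomic<int>& loc, int expected, int desired) {
    // On failure, compare_exchange_strong copies the current value into
    // `expected`; on success `expected` already equals the old value.
    loc.compare_exchange_strong(expected, desired);
    return expected;
}
```

For example, with a location holding 5, cas_value(v, 5, 7) stores 7 and returns 5. A second call, cas_value(v, 5, 9), changes nothing and returns 7.
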
When reader and writer threads access shared data at the same time, thread safety is not guaranteed by default. The threads must therefore be synchronized with primitives such as critical sections, mutexes, and semaphores, and the operating system provides the corresponding APIs. However, synchronization is not always satisfactory or efficient: it can cost performance, cause deadlocks and livelocks, and waste resources when threads trap into the kernel.

This led to the ideas of lock-free and wait-free programming. Reader and writer threads no longer synchronize with each other, so the readers keep running while the writer runs (on a multi-core machine the two threads can be scheduled on different cores). Less synchronization code is needed, and the program runs faster. The idea is implemented with the CAS mechanism, as follows:

    template <typename T>
    bool CAS(T* ptr, T expected, T fresh)
    {
        if (*ptr != expected)
            return false;
        *ptr = fresh;
        return true;
    }

The principle of CAS is to compare the current value with an expected value and, only if they are equal, overwrite it with the new value. T can be char, short, int, __int64, and so on, as well as a pointer to any type.
Note that this CAS only illustrates the principle. It is not atomic as written and is not the actual implementation; for real implementations, consult your operating system's documentation.
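
The illustrative template can be made truly atomic by letting std::atomic<T> perform the comparison and the store together. This sketch assumes C++11. It keeps the name CAS from the template, but it takes a std::atomic<T>* instead of a plain T*, and that works for the integer and pointer types just listed.

```cpp
#include <atomic>

// Atomic counterpart of the illustrative CAS template (a sketch, not an OS
// implementation). T must be trivially copyable: char, short, int,
// long long, or any pointer type all work with std::atomic<T>.
template <typename T>
bool CAS(std::atomic<T>* ptr, T expected, T fresh) {
    // `expected` is a local copy, so the value written back on failure is
    // discarded; the caller only learns whether the swap happened.
    return ptr->compare_exchange_strong(expected, fresh);
}
```

Callers keep the same shape as before. CAS(&n, 1, 2) succeeds only while n still holds 1, and CAS(&p, &a, &b) swaps an atomic pointer from &a to &b.
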
The Windows API provides many atomic operations (Atomic Operations), such as InterlockedCompareExchange and the rest of the Interlocked family of functions. At the assembly level, Intel processors provide the XCHG and CMPXCHG instructions, which exchange data between a register and memory as a single atomic operation. XCHG with a memory operand is implicitly locked, while CMPXCHG needs the LOCK prefix to be atomic across processors. You can confirm this by looking at the disassembly of InterlockedCompareExchange.
Consider a case with several reader threads and one writer thread. With lock-based synchronization, the writer may not acquire the lock immediately, and in the worst case it never acquires it at all (it starves). With CAS, the readers and the writer run in parallel. As soon as the writer publishes new shared data, any reader that subsequently loads the shared pointer sees the update.

    class Widget {
        Data* p_;
        ...
    public:
        void Update() {
            Data* pOld;
            Data* pNew = new Data;
            do {
                pOld = p_;
                ...
            } while (!CAS(&p_, pOld, pNew));
        }
    };
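
Here is a sketch of the same copy-on-write update, using C++11 std::atomic<Data*> in place of the CAS template. Data and its version field are hypothetical stand-ins for the elided contents. The replaced object is handed back to the caller instead of being deleted, because deciding when deletion is safe is exactly the open question raised next.

```cpp
#include <atomic>

// Hypothetical payload; the article leaves Data unspecified.
struct Data {
    int version = 0;
};

class Widget {
    std::atomic<Data*> p_{new Data};

public:
    ~Widget() { delete p_.load(); }

    // Readers take a snapshot of the current pointer and never block.
    const Data* Read() const { return p_.load(std::memory_order_acquire); }

    // Copy-on-write update: derive a new Data from the current one and
    // publish it with CAS. On failure, compare_exchange_weak reloads pOld,
    // so the loop re-derives pNew from the latest version and retries.
    // Returns the replaced Data WITHOUT deleting it: readers may still hold it.
    Data* Update() {
        Data* pNew = new Data;
        Data* pOld = p_.load(std::memory_order_acquire);
        do {
            pNew->version = pOld->version + 1;
        } while (!p_.compare_exchange_weak(pOld, pNew,
                                           std::memory_order_acq_rel,
                                           std::memory_order_acquire));
        return pOld;
    }
};
```

In a single-threaded test, the pointer returned by Update can be deleted right away. With concurrent readers it cannot be deleted so simply, because some reader may still hold it.
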

This raises the question of when Update can delete the old data, since other reader threads may well still be using it. In languages with automatic garbage collection (GC), such as Java, this is not a problem. In environments without GC, such as C++, reclaiming the old data is tricky.
There are, of course, many solutions. Reclamation is the most interesting and most discussed issue around the CAS mechanism, and the appropriate method differs from situation to situation.

2. Implementing a lock-free multi-producer queue
    struct node {
        struct node *next;
        int data;
    };

    struct queue {
        struct node *head;   /* head of the queue */
    };

    struct queue *queue;

Multiple producers (threads) need to insert data into this queue.

To see why this is hard, first consider the case with only one producer, where inserting at the head of the queue is very simple:
STEP1) new_head->next = queue->head;
STEP2) queue->head = new_head;
With multiple threads, the problem becomes complex. In step 2, for example, several threads may execute the assignment at the same time, so the result is unpredictable. Suppose two producers both read the same queue->head in step 1. Whichever one performs step 2 second overwrites the other's node, and that node is lost from the queue.

Workaround 1) Each thread acquires a lock before step 1 and releases it after step 2 completes. This is the simplest method, but the lock's performance overhead leaves room for improvement.
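
Here is a sketch of Workaround 1. It assumes C++11 and one std::mutex per queue, and the locked_queue name is hypothetical. STEP1 and STEP2 simply run inside a single critical section:

```cpp
#include <mutex>

struct node {
    node* next;
    int data;
};

// Hypothetical lock-based queue: one mutex guards the head pointer.
struct locked_queue {
    node* head = nullptr;
    std::mutex m;

    void push(node* new_head) {
        std::lock_guard<std::mutex> guard(m);  // released when push returns
        new_head->next = head;                 // STEP1
        head = new_head;                       // STEP2
    }
};
```

Because no other producer can run between STEP1 and STEP2, no node can be lost. The cost is that every producer serializes on the mutex.
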

A better idea is the following:

Before each operation, check that no other producer is changing the head of the queue; if no other producer is operating on it, the current producer may proceed.

    do {
        old_head = queue->head;
        new_head->next = old_head;
        if (old_head == queue->head) {
            queue->head = new_head;
        }
    } while (queue->head != new_head);

The loop terminates as follows:

When queue->head equals new_head, this producer has successfully updated the queue.

Otherwise, another producer operated on the queue during this round, so the producer tries again until it succeeds.

This seems to ensure that only one producer operates on the queue at a time while the others retry. The problem is that lines 4 and 5 (the if comparison and the assignment queue->head = new_head inside it) are not executed atomically. Several threads can all pass the comparison on line 4 and then each execute line 5 one after another, which corrupts the queue.

How can this be solved? If lines 4 and 5 could execute as one atomic step, the problem would be largely solved. Fortunately, CPUs of various architectures provide CAS/cmpxchg-style instructions that guarantee exactly this atomicity.

The following C code illustrates the meaning of CAS (it is schematic; the real instruction executes atomically):

    int compare_and_swap(int *reg, int oldval, int newval)
    {
        int old_reg_val = *reg;
        if (old_reg_val == oldval)
            *reg = newval;
        return old_reg_val;
    }
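
GCC and Clang provide a real counterpart to this schematic function: the builtin __sync_val_compare_and_swap(ptr, oldval, newval). It performs the same comparison and store atomically, acts as a full memory barrier, and returns the previous value. This sketch assumes one of those compilers, and the wrapper name atomic_compare_and_swap is hypothetical.

```cpp
// Assumes GCC or Clang, which provide the __sync_* builtins.
// Same contract as the schematic compare_and_swap: returns the old value of
// *reg and stores newval only if that old value equalled oldval, but the
// compare and the store happen as one atomic, fully fenced operation.
int atomic_compare_and_swap(int* reg, int oldval, int newval) {
    return __sync_val_compare_and_swap(reg, oldval, newval);
}
```

Callers use the return value exactly as in the loop that follows. If the returned value equals oldval, the swap happened; otherwise, the return value is the current contents of *reg.
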

With this instruction, the loop above can be rewritten so that it is safe for multiple producers:

    do {
        old_head = queue->head;
        new_head->next = old_head;
        val = cmpxchg(&queue->head, old_head, new_head);
    } while (val != old_head);
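
Here is a self-contained sketch of this loop in C++11. It assumes a std::atomic<node*> head, and the lf_queue type and push function are hypothetical names. compare_exchange_strong plays the role of cmpxchg: it leaves val equal to old_head on success and stores the head it actually found on failure. The termination test is therefore the same val != old_head.

```cpp
#include <atomic>

struct node {
    node* next;
    int data;
};

// Hypothetical queue type: just an atomic head pointer.
struct lf_queue {
    std::atomic<node*> head{nullptr};
};

// Lock-free multi-producer push mirroring the cmpxchg loop above.
void push(lf_queue* queue, node* new_head) {
    node* old_head;
    node* val;
    do {
        old_head = queue->head.load(std::memory_order_relaxed);  // read current head
        new_head->next = old_head;                                // link in front of it
        val = old_head;
        // Like cmpxchg: if head still equals val, store new_head; otherwise
        // copy the head actually found into val. Release ordering on success
        // publishes new_head->next to threads that load head with acquire.
        queue->head.compare_exchange_strong(val, new_head,
                                            std::memory_order_release,
                                            std::memory_order_relaxed);
    } while (val != old_head);  // unchanged val means the CAS succeeded
}
```

Any number of threads may call push on the same lf_queue concurrently, as long as each node is pushed by only one thread.
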

Note the loop's termination condition:
When val == old_head, no other producer changed the queue head between reading it into old_head and the cmpxchg, so the operation has succeeded.
When val != old_head, another producer modified the queue in that window, so the current producer must retry.
Why must steps 2 and 3 (reading queue->head into old_head and setting new_head->next) be inside the loop? Consider what happens if they are moved outside it. Suppose another producer modifies the queue during the first round, so we have to retry. By then, queue->head has already been updated by that other thread. With the read outside the loop, old_head is never refreshed, so cmpxchg keeps returning the new head. val never equals old_head, and the loop spins forever.
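
std::atomic offers another way around this problem. In this C++11 sketch (the push name is hypothetical), compare_exchange_weak writes the head it actually found back into old_head whenever it fails. The initial read can therefore sit outside the loop, and only the linking step has to stay inside. The weak form may also fail spuriously, and the retry loop absorbs that.

```cpp
#include <atomic>

struct node {
    node* next;
    int data;
};

// The head is read once, outside the loop. This is correct only because a
// failed compare_exchange_weak refreshes old_head with the current head,
// which is the refresh the manual version had to do by re-reading.
void push(std::atomic<node*>& head, node* new_head) {
    node* old_head = head.load(std::memory_order_relaxed);
    do {
        new_head->next = old_head;  // re-link against the latest head seen
    } while (!head.compare_exchange_weak(old_head, new_head,
                                         std::memory_order_release,
                                         std::memory_order_relaxed));
}
```

This is the usual idiom for a CAS retry loop with std::atomic: keep the expected value in a local variable and let the failed CAS update it.
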

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
