Reprint: Implementation of the lock-free queue (CAS synchronization)

Source: Internet
Author: User
Tags: lock, queue, ibm, developerworks

Transferred from: http://coolshell.cn/articles/8239.html

There are many articles on the Internet about implementing lock-free queues. Although this article may overlap with them, I still want to string the important points of those articles together in my own way and walk you through this technique. The text begins below.

About atomic operations such as CAS

Before we start talking about lock-free queues, we need to know about one very important technique: the CAS operation (Compare & Set, also called Compare & Swap). Almost all modern CPU instruction sets support CAS as an atomic operation; on x86 the corresponding instruction is CMPXCHG. With this atomic operation we can implement all kinds of lock-free data structures.

Described in C, the operation looks like this (the code comes from Wikipedia's Compare-and-swap entry). The meaning: check whether the value at *reg equals oldval, and if so, assign newval to it.

int compare_and_swap (int *reg, int oldval, int newval)
{
    int old_reg_val = *reg;
    if (old_reg_val == oldval)
        *reg = newval;
    return old_reg_val;
}

This operation has a variant that returns a bool value (the benefit of returning a bool is that the caller knows immediately whether the update succeeded):

bool compare_and_swap (int *accum, int *dest, int newval)
{
    if ( *accum == *dest ) {
        *dest = newval;
        return true;
    }
    return false;
}

Similar to CAS, there are the following atomic operations (you can look these up on Wikipedia yourself):

    • Fetch-and-Add, typically used to do an atomic +1 on a variable
    • Test-and-Set, which writes a value to a memory location and returns its old value; on x86 this corresponds to the BTS instruction
    • Test and Test-and-Set, used to lower the resource contention of Test-and-Set

Note: in actual C/C++ programs, the various implementations of CAS are as follows:

1) CAS in GCC

GCC supports CAS atomic operations from version 4.1 onward (for the full set of atomic operations, see GCC Atomic Builtins):

bool __sync_bool_compare_and_swap (type *ptr, type oldval, type newval, ...)
type __sync_val_compare_and_swap (type *ptr, type oldval, type newval, ...)

2) CAS on Windows

Under Windows, you can use the following Windows API to perform a CAS (for the full set of Windows atomic operations, see the MSDN Interlocked Functions):

LONG InterlockedCompareExchange(
    __inout LONG volatile *Target,
    __in    LONG Exchange,
    __in    LONG Comparand
);

3) CAS in C++11

The std::atomic facilities in the C++11 STL let you write portable, cross-platform code (for the full set of C++11 atomic operations, see the Atomic operations library):

template< class T >
bool atomic_compare_exchange_weak( std::atomic<T>* obj,
                                   T* expected, T desired );
template< class T >
bool atomic_compare_exchange_weak( volatile std::atomic<T>* obj,
                                   T* expected, T desired );
Implementing a lock-free queue with a linked list

The following mainly comes from John D. Valois' paper "Implementing Lock-Free Queues", presented in October 1994 at the International Conference on Parallel and Distributed Systems in Las Vegas.

Let's first look at how enqueueing is implemented with CAS:

EnQueue(x) // enqueue
{
    // prepare the new node
    q = new record();
    q->value = x;
    q->next = NULL;

    do {
        p = tail; // take a snapshot of the tail pointer
    } while( CAS(p->next, NULL, q) != TRUE); // if the node was not linked to the tail, retry
    CAS(tail, p, q); // update the tail pointer
}

Notice the do-while retry loop in the program. That is, while I am preparing to attach my node to the end of the queue, another thread may already have succeeded, so the tail has changed and my CAS returns FALSE; the program retries until it succeeds. It is a lot like redialing a busy hotline over and over.

You may wonder why the "set tail node" operation on the last line, CAS(tail, p, q), does not check whether it succeeded. Because:

    1. If some thread T1's CAS inside the while loop succeeds, then the same CAS in every other thread will fail, and those threads will loop back and retry.
    2. At this point, if T1 has not yet updated the tail pointer, the other threads keep failing, because tail->next is no longer NULL.
    3. Once T1 finishes updating the tail pointer, one of the other threads picks up the new tail pointer and proceeds.

There is a potential problem: if thread T1 stalls or hangs before it updates the tail pointer with CAS, the other threads spin forever in an infinite loop. Below is an improved version of EnQueue():

EnQueue(x) // enqueue, improved version
{
    q = new record();
    q->value = x;
    q->next = NULL;

    p = tail;
    oldp = p;
    do {
        while (p->next != NULL)
            p = p->next;
    } while( CAS(p->next, NULL, q) != TRUE); // if the node was not linked to the tail, retry
    CAS(tail, oldp, q); // update the tail pointer
}

Here we let each thread chase the real end of the list with its own pointer p. But this chasing can hurt performance. In practice, the case where a thread stalls mid-update is exceedingly rare, so a better approach is to combine the two versions: use the first version, and only if the CAS has been retried more than some threshold (say, 3 times) fall back to chasing the pointer yourself.

Well, enqueueing is solved; now let's look at DeQueue's code (it is very simple, so I won't explain it):

DeQueue() // dequeue
{
    do {
        p = head;
        if (p->next == NULL) {
            return ERR_EMPTY_QUEUE;
        }
    } while( CAS(head, p, p->next) != TRUE );
    return p->next->value;
}

As you can see, DeQueue operates on head->next, not on head itself. This is because of a boundary condition: we need a dummy head node, otherwise when the list holds only one element, head and tail point to the same node, and EnQueue and DeQueue would have to exclude each other.

Note: the tail pointer may lag behind, i.e. be in a not-yet-updated state.

The ABA problem with CAS

The so-called ABA problem (see Wikipedia's ABA entry) goes roughly like this:

    1. Process P1 reads the value A from a shared variable.
    2. P1 is preempted, and process P2 runs.
    3. P2 changes the shared variable's value from A to B, then back to A, and is then preempted by P1.
    4. P1 resumes, sees that the value in the shared variable has not changed, and continues.

Although P1 thinks the value has not changed and continues executing, this raises potential problems. The ABA problem occurs most easily in lock-free algorithms, with CAS bearing the brunt, because CAS compares only the pointer's value, i.e. the address. If that address is reused, the problem becomes serious. (Address reuse happens all the time: a block of memory is freed and then reallocated, very likely at the original address.)

Take the DeQueue() function above: because we want to keep head and tail separate, we introduced a dummy node for head. When we do the CAS, if head's memory has been recycled and reused, and the reused memory came back in through EnQueue(), that is a big problem. (Reusing memory is an extremely common behavior in memory management.)

If this example is hard to follow, Wikipedia gives a vivid one:

You are at the airport with a suitcase full of money. A hot, sexy beauty walks up and flirts with you warmly, and while you are distracted she swaps your money-filled suitcase for an identical one, then leaves. You glance over, see that "your" suitcase is still there, pick it up and board your plane.

This is the ABA problem.

Solving the ABA problem

Wikipedia gives one solution: use Double-CAS (a CAS with double insurance). For example, on a 32-bit system, we check 64 bits of content at once:

1) Use CAS to check a double-length value in one shot: the first half is the pointer, the second half is a counter.

2) Only when both halves match does the check pass and the new value get assigned, and the counter is incremented by 1.

This way, when ABA occurs, although the value is the same, the counter differs. (But on a 32-bit system this counter will eventually overflow and wrap around, so the ABA problem can still occur, just far less often.)

Of course, for our queue the real issue is just keeping that memory from being reused, and with the concrete business problem in view it is easier to solve. The paper "Implementing Lock-Free Queues" gives a method: keep a reference count, refcnt, on each node's memory!

SafeRead(q)
{
    loop:
        p = q->next;
        if (p == NULL) {
            return p;
        }

        Fetch&Add(p->refcnt, 1);

        if (p == q->next) {
            return p;
        } else {
            Release(p);
        }
    goto loop;
}

Fetch&Add and Release respectively increment and decrement the reference count; both are atomic operations. This prevents the memory from being recycled out from under us.

Implementing a lock-free queue with arrays

This implementation also comes from the paper "Implementing Lock-Free Queues".

Using an array to implement the queue is a very common approach, because without memory allocation and deallocation everything becomes simpler. The idea of the implementation is as follows:

1) The array behind the queue should be a ring buffer (a circular array).

2) An array element can hold one of three special values: HEAD, TAIL, or EMPTY (and, of course, actual data).

3) Initially, all cells are initialized to EMPTY, and two adjacent cells are initialized to HEAD and TAIL, which represents an empty queue.

4) EnQueue operation. Assuming data x is entering the queue, locate the TAIL position and use Double-CAS to update (TAIL, EMPTY) to (x, TAIL). Note that if no (TAIL, EMPTY) pair can be found, the queue is full.

5) DeQueue operation. Locate HEAD, update (HEAD, x) to (EMPTY, HEAD), and return x. Also note that if x is TAIL, the queue is empty.

One key point of the algorithm: how do we locate HEAD or TAIL?

1) We can declare two counters, one counting the number of EnQueues, and one counting the number of DeQueues.

2) The two counters are incremented atomically with Fetch&Add, accumulating whenever an EnQueue or DeQueue completes.

3) Taking each count modulo the array length then tells you the position of TAIL and HEAD.


Summary

The above covers essentially all the technical details of lock-free queues; these techniques can also be used in other lock-free data structures.

1) Lock-free queues are implemented mainly with CAS and FAA (Fetch-and-Add) atomic operations, plus retry loops.

2) As for the retry loop, I personally feel it is not that different from a lock. It is just that the granularity of this "lock" is smaller: it mainly "locks" the two key resources, head and tail, rather than the entire data structure.

There are also some other articles on lock-free techniques you can look at:

    • Xiong Wen's "Yet another implementation of a lock-free circular array queue" on CodeProject
    • Herb Sutter's "Writing Lock-Free Code: A Corrected Queue", a C++11 template using std::atomic
    • IBM developerWorks' "Design of concurrent data structures without mutexes"
Note: the original article includes a picture of a "lock-free" bicycle, the joke being: if you don't have a dedicated lock, then lock yourself to yourself!

(End of full text)

