Using RCU technology to implement read-write threads without locks

In a system there is one write thread and several read threads, which share a data structure through a pointer: the write thread replaces the structure, and the read threads read it. While the write thread is rewriting the data structure, the read threads are blocked waiting for the lock, which adds latency.

You can take advantage of the idea of RCU (Read-Copy-Update; see What's RCU) to remove this lock. The main implementation code mentioned in this article is available as a Gist.

RCU

RCU can be seen as an alternative to read-write locks. It builds on the fact that when the write thread replaces a pointer, a read thread fetching that pointer gets either the old value or the new one, never a torn mix. The basic idea of RCU is actually very simple, and a toy RCU implementation is easy to understand. A simple RCU flow can be described as:

Write Thread:

old_ptr = _ptr
tmp_ptr = copy(_ptr)     // copy
change(tmp_ptr)          // change
_ptr = tmp_ptr           // update
synchronize(old_ptr)     // wait until no reader uses old_ptr, then free it

When the write thread updates _ptr, it first makes a new copy of the structure, applies the change to that copy, then points _ptr at the new copy, and finally releases the old memory.

Read thread:

tmp_ptr = _ptr
use(tmp_ptr)
dereference(tmp_ptr)

Read threads use _ptr directly, and when done they need to tell the write thread that they no longer use it. When a read thread fetches _ptr, it may get the old value or the new one; either way, RCU must guarantee that the memory remains valid while it is in use. The emphasis is on synchronize and dereference: synchronize waits until every reader of the old _ptr has called dereference, while readers of the new _ptr need not be waited for. Put plainly, the question is how the write thread knows that no read thread is still using old_ptr, so that it can be freed safely.

This problem has a number of solutions across the various wait-free implementations; HOW-WHEN-TO-RELEASE-MEMORY-IN-WAIT-FREE-ALGORITHMS sums up several of them, such as hazard pointers and quiescence-period-based reclamation.
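Of these, the hazard-pointer idea can be sketched in a few lines. The sketch below is a toy of my own (the names, the busy-waiting, and the fixed thread count are all assumptions), and it glosses over the publish-then-revalidate step a real hazard-pointer scheme needs; it only illustrates how the writer answers the release question:

#include <stdlib.h>

#define MAX_THREADS 4

/* Each reader publishes the pointer it is currently dereferencing in its
 * own slot, and clears the slot when done. */
static void *volatile in_use[MAX_THREADS];

/* Writer side: spin until no reader still publishes old_ptr, then free it.
 * Real code would also need memory barriers, plus a validation step on the
 * reader side after it publishes its pointer. */
void synchronize_and_free(void *old_ptr) {
    for (int t = 0; t < MAX_THREADS; ++t)
        while (in_use[t] == old_ptr)
            ;    /* busy-wait keeps the toy short */
    free(old_ptr);
}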

Simply using reference-counted smart pointers does not solve this problem, because the smart pointers themselves are not thread-safe. For example:

tmp_ptr = _ptr        // 1
tmp_ptr->addref()     // 2
use(tmp_ptr)
tmp_ptr->release()

Lines 1 and 2 are not atomic: between fetching tmp_ptr and calling addref, another thread may release the object, so tmp_ptr can already be dangling when addref runs.

The quiescence-period-based reclamation method means that each read thread declares when it is in a quiescence period, i.e. when it is not using _ptr; conversely, using _ptr amounts to entering a logical critical section. When every read thread has stopped using an old _ptr, the write thread can safely release that memory.

This article describes one quiescence-period-based reclamation implementation. It can be used in scenarios where one write thread and multiple read threads share several pieces of data.

Implementation

This method essentially reduces sharing the data to reads and writes of a basic memory unit, the pointer. Its usage can be described as:

Read thread:

tmp_ptr = _ptr
use(tmp_ptr)
update()    // declare that this thread no longer uses any shared data

Write Thread:

old_ptr = _ptr
tmp_ptr = copy(_ptr)
change(tmp_ptr)
_ptr = tmp_ptr
gc()
defer_free(old_ptr)

The following describes the write-thread and read-thread sides of the implementation.

Write thread

The write thread is responsible for recording which memory needs to be freed and for checking when that memory can actually be freed. It maintains a queue of memory pending release:

void *_pending[8]
uint64_t _head, _tail

void defer_free(void *p) {
    _head++
    _pending[pending_pos(_head)] = p
}

gc() {
    for (pos in [_tail, find_free_pos()))
        free(_pending[pending_pos(pos)])
}

find_free_pos finds a position up to which it is safe to free: every slot in [_tail, find_free_pos()) can be released.

The queue positions _head/_tail increase monotonically, with pending_pos mapping them into the array by taking them modulo the queue size (capping them at the queue size would also work). Either way, _head is logically >= _tail, but once the counters wrap around, its numeric value may be smaller than _tail. The implementation therefore avoids comparing magnitudes and instead does:

gc() {
    pos = find_free_pos()
    while (_tail != pos) {
        free(_pending[pending_pos(_tail)])
        _tail++
    }
}
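pending_pos itself only needs to map the ever-growing positions into the array. A minimal sketch, assuming the queue size is a power of two (the 8-slot _pending above qualifies):

#include <stdint.h>

#define PENDING_SIZE 8    /* must be a power of two for the mask trick */

static inline uint64_t pending_pos(uint64_t logical_pos) {
    return logical_pos & (PENDING_SIZE - 1);    /* same as logical_pos % PENDING_SIZE */
}
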
Read thread

When a read thread is no longer using any shared memory, it declares so:

update() {
    static __thread int tid
    _tmark[tid] = _head
}
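In real code this store should carry publication semantics, so the write thread only observes the new _tmark value after the reader has genuinely stopped touching shared memory. A sketch with C11 atomics (the ordering choice and the explicit tid parameter are my assumptions; the article's pseudocode leaves both implicit):

#include <stdatomic.h>
#include <stdint.h>

extern _Atomic uint64_t _head;
extern _Atomic uint64_t _tmark[];    /* one slot per read thread */

void update(int tid) {
    uint64_t h = atomic_load_explicit(&_head, memory_order_acquire);
    /* release: all of this thread's earlier reads of shared data
     * happen-before the writer seeing this mark and freeing below it */
    atomic_store_explicit(&_tmark[tid], h, memory_order_release);
}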

A read thread's state affects the write thread's reclamation logic. The states are:

    • Initial
    • Active: calls update from time to time
    • Paused: synchronizing elsewhere, or suspended
    • Exited

While a read thread is active, it keeps publishing in _tmark[tid] the position below which it no longer uses anything. The write thread checks every read thread's _tmark[tid]: the range [_tail, min(_tmark[])) is used by no read thread and can be safely freed. For example, with _tail = 3 and _tmark = {7, 5, 9}, the slots in [3, 5) are freeable.

find_free_pos() {
    min = max_integer
    pos = 0
    for (tid = 0; tid < max_threads; ++tid) {
        tpos = _tmark[tid]
        offset = tpos - _tail    // wrap-safe distance from _tail
        if (offset < min) {
            min = offset
            pos = tpos
        }
    }
    return pos
}

When a read thread pauses, its _tmark[tid] may not be updated for a long time, which blocks the write thread from releasing memory. A way to identify paused read threads is therefore required. _tfreeds[tid] records, for each thread, the position up to which memory was last freed. If a thread stays paused, then after some time _tfreeds[tid] == _tmark[tid]. When looking for a freeable position, paused read threads must be ignored:

find_free_pos() {
    min = max_integer
    pos = _head
    for (tid = 0; tid < max_threads; ++tid) {
        tpos = _tmark[tid]
        if (tpos == _tfreeds[tid]) continue    // paused thread: ignore it
        offset = tpos - _tail
        if (offset < min) {
            min = offset
            pos = tpos
        }
    }
    for (tid = 0; tid < max_threads; ++tid) {
        if (_tfreeds[tid] != _tmark[tid])
            _tfreeds[tid] = pos
    }
    return pos
}

However, when all read threads are paused, the write thread may still be working; in that case the implementation above returns _head, so the write thread can still free memory normally.
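Putting the write side together, one concrete single-writer cycle could look like the sketch below. data_t, writer_update, and the use of C11 atomics are my assumptions; defer_free and gc stand for the pending-queue routines shown above:

#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int value; } data_t;    /* hypothetical shared structure */

static _Atomic(data_t *) _ptr;           /* shared pointer, initialized elsewhere */

void defer_free(void *p);                /* pending-queue routines from above */
void gc(void);

void writer_update(int new_value) {
    data_t *old_ptr = atomic_load(&_ptr);
    data_t *tmp_ptr = malloc(sizeof *tmp_ptr);
    memcpy(tmp_ptr, old_ptr, sizeof *tmp_ptr);    /* copy   */
    tmp_ptr->value = new_value;                   /* change */
    atomic_store(&_ptr, tmp_ptr);                 /* update: readers switch over */
    gc();                                         /* free whatever is already safe */
    defer_free(old_ptr);                          /* the old copy waits its turn */
}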

To summarize, the principle of the method can be pictured as follows:
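(My own rendering of the queue state; the original post illustrates it with a figure.)

_tail                min(_tmark[]) over active readers         _head
  |                                  |                            |
  v                                  v                            v
  [ ..... freeable by gc() ........ ][ ... possibly still in use  ]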

Dynamic thread creation and exit

If read threads may be created dynamically and may exit midway, the slots of _tmark[] need to be reused, so tid allocation must become dynamic:

class ThreadIdPool {
public:
    // dynamically obtain a tid; repeated calls from the same thread return the same value
    int get()
    // called when a thread exits, to reclaim its tid
    void put(int id)
}

The implementation of ThreadIdPool is nothing more than TLS plus a notification on thread exit to reclaim the tid. The read thread's update then becomes:

update() {
    tid = _idpool->get()
    _tmark[tid] = _head
}

When a thread exits, _tmark[tid] and _tfreeds[tid] need no special handling; when a newly created thread reuses that tid, it can do so immediately, because at that moment the two values _tmark[tid] and _tfreeds[tid] are necessarily equal.
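A minimal sketch of such a pool, using a pthread TLS destructor as the exit notification (every name and detail here is an assumption, not the article's code):

#include <pthread.h>
#include <stdint.h>

#define MAX_THREADS 16

static pthread_key_t tid_key;
static int tid_used[MAX_THREADS];
static pthread_mutex_t tid_lock = PTHREAD_MUTEX_INITIALIZER;

static void tid_reclaim(void *v) {    /* runs automatically at thread exit */
    pthread_mutex_lock(&tid_lock);
    tid_used[(intptr_t)v - 1] = 0;    /* put(): the slot becomes reusable */
    pthread_mutex_unlock(&tid_lock);
}

static void tid_key_init(void) {
    pthread_key_create(&tid_key, tid_reclaim);
}

int thread_id_get(void) {             /* get(): same thread, same value */
    static pthread_once_t once = PTHREAD_ONCE_INIT;
    pthread_once(&once, tid_key_init);
    void *v = pthread_getspecific(tid_key);
    if (v != NULL)
        return (int)(intptr_t)v - 1;  /* ids are stored shifted by 1, since 0 means unset */
    pthread_mutex_lock(&tid_lock);
    intptr_t id = 0;
    while (tid_used[id]) ++id;        /* assumes a free slot always exists */
    tid_used[id] = 1;
    pthread_mutex_unlock(&tid_lock);
    pthread_setspecific(tid_key, (void *)(id + 1));
    return (int)id;
}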

The above is the complete implementation of the method.

All threads both read and write

The method above is not general: it fits only the single-writer scenario. The nbds project (a toy project that implements some lock-free data structures) contains a simple but instructive implementation (rcu.c). That implementation allows any thread to call defer_free, and all threads call update. Besides declaring that the calling thread no longer uses any shared memory, update may also reclaim memory. Any thread may hold some memory waiting to be freed, and any piece of memory may be in use by any other thread. So how does memory get reclaimed?

The method described in this article has all read threads declare their own state and lets the write thread actively check them. The nbds implementation takes a different approach, based on notification diffusion. It works as follows:

When a thread wants to reclaim memory, it needs to know the quiescent position of every other thread (the equivalent of _tmark[tid]). So it notifies the next thread of the range it needs to release. When that next thread calls update (leaving its critical section), it forwards the previous thread's notification to the thread after it, and so on, until the notification travels all the way around back to the initiating thread. At that point the release request has passed through every thread and has, in effect, been approved by all of them, so the memory can be released safely. Every thread works in this way.

void rcu_defer_free(void *x) {
    ...
    rcu_[next_thread_id][tid_] = rcu_last_posted_[tid_][tid_] = pending_[tid_]->head;
    ...
}

void rcu_update(void) {
    ...
    for (i = 0; i < num_threads_; ++i) {
        ...
        uint64_t x = rcu_[tid_][i];    // notifications other threads posted to me
        rcu_[next_thread_id][i] = rcu_last_posted_[tid_][i] = x;    // pass them on
        ...
    }
    ...
    while (q->tail != rcu_[tid_][tid_]) {
        free(...)
    }
    ...
}

This implementation is comparatively simple; it supports neither thread pauses nor threads being created and exiting dynamically.

Original address: http://codemacro.com/2015/04/19/rw_thread_gc/
Written by Kevin Lynx, posted at http://codemacro.com
