Linux kernel RCU mechanism detailed


RCU (Read-Copy Update) is a synchronization mechanism that plays an important role in the current Linux kernel. RCU mainly targets linked data structures such as lists, and its purpose is to make read-side traversal efficient: with RCU, readers traverse the list without any time-consuming lock operation. This allows multiple threads to read the list simultaneously while one thread modifies it (the modifying thread still needs a lock to serialize against other writers). RCU suits scenarios where the data is read frequently but modified rarely. For example, a file system constantly looks up directory entries while directories are modified comparatively seldom; this is an ideal scenario for RCU.


In the Linux kernel source tree, the documentation on RCU is quite complete; you can find it under the Documentation/RCU/ directory. Paul E. McKenney is the main author of the RCU code in the kernel and has written many articles about it. He has collected these articles, together with links to RCU papers, at http://www2.rdrop.com/users/paulmck/RCU/


The RCU implementation mainly has to solve the following problems:


1. During a read, another thread deletes a node. The deleting thread can remove the node from the linked list, but it cannot destroy the node immediately; it must wait until all pre-existing readers have finished before performing the destroy operation. RCU calls this waiting interval the grace period.


2. During a read, another thread inserts a new node, and the reader happens to read that node. It must be guaranteed that the node the reader sees is completely initialized. This is handled by the publish-subscribe mechanism.


3. The integrity of list traversal must be preserved: adding or deleting a node must never cause a traversal to break off in the middle. However, RCU does not guarantee that a reader will see a newly added node, nor that it will avoid seeing a node that is about to be deleted.


Grace period


This is easiest to understand through an example. The following example is adapted from one of Paul's articles.


struct foo {
    int a;
    char b;
    long c;
};

DEFINE_SPINLOCK(foo_mutex);

struct foo *gbl_foo;

void foo_read(void)
{
    struct foo *fp = gbl_foo;

    if (fp != NULL)
        dosomething(fp->a, fp->b, fp->c);
}

void foo_update(struct foo *new_fp)
{
    struct foo *old_fp;

    spin_lock(&foo_mutex);
    old_fp = gbl_foo;
    gbl_foo = new_fp;
    spin_unlock(&foo_mutex);
    kfree(old_fp);
}


The code above operates on the global variable gbl_foo. Consider the following scenario: two threads run foo_read and foo_update concurrently. foo_read executes the assignment fp = gbl_foo and is then preempted; the other thread runs foo_update to completion. When foo_read resumes and calls dosomething, fp points to memory that has already been freed, which can corrupt the system. To prevent such events, RCU introduces a new concept called the grace period, illustrated in the following diagram:



Each row in the diagram represents a thread; the bottom row is the deleting thread. When it finishes the delete operation, it enters the grace period. The meaning of the grace period is this: after a delete happens, the deleter must wait until every reader that started before the grace period began has finished, and only then may it perform the destroy operation, because those readers may still be accessing the element being deleted. In the figure, the grace period must wait for readers 1 and 2 to end. Reader 5 finished before the grace period started, so it need not be considered; readers 3, 4 and 6 need not be considered either, because any reader that starts after the grace period begins can no longer reach the deleted element. The RCU mechanism provides the appropriate APIs to implement this behavior.


void foo_read(void)
{
    rcu_read_lock();
    struct foo *fp = gbl_foo;

    if (fp != NULL)
        dosomething(fp->a, fp->b, fp->c);
    rcu_read_unlock();
}

void foo_update(struct foo *new_fp)
{
    struct foo *old_fp;

    spin_lock(&foo_mutex);
    old_fp = gbl_foo;
    gbl_foo = new_fp;
    spin_unlock(&foo_mutex);
    synchronize_rcu();
    kfree(old_fp);
}


rcu_read_lock() and rcu_read_unlock() are added in foo_read to mark the beginning and end of an RCU read-side critical section; they are what allows RCU to detect whether the grace period has ended.


foo_update adds a call to synchronize_rcu(). Calling this function marks the start of a grace period, and it does not return until the grace period has ended. Looking at the diagram again: threads 1 and 2 started before synchronize_rcu and may hold the old gbl_foo, i.e. old_fp in foo_update. If kfree(old_fp) were called before they finished, it could very likely crash the system. Threads 3, 4 and 6 start after synchronize_rcu, so they can no longer obtain old_fp, and the kfree does not affect them.


The grace period is the most complex part of the RCU implementation, because the update and delete side must not become too slow while the read side is being made fast.
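The deferred-destruction idea can be sketched as a toy, single-threaded userspace model. This is an illustration only, not the kernel's algorithm: the names (toy_rcu_read_lock, grace_demo, the reader counter, pending_free) are invented here, and classic RCU detects grace periods from per-CPU quiescent states rather than by counting readers. The point the sketch makes is only the lifecycle: removal is immediate, destruction waits until no pre-existing reader remains.

```c
#include <stddef.h>
#include <stdlib.h>

struct foo { int a; char b; long c; };

static struct foo *gbl_foo;
static int readers;               /* readers currently inside a critical section */
static struct foo *pending_free;  /* removed node waiting for the "grace period" */

static void toy_rcu_read_lock(void) { readers++; }

static void toy_rcu_read_unlock(void)
{
    readers--;
    /* Last pre-existing reader left: the grace period ends and the
     * deferred destruction can finally run. */
    if (readers == 0 && pending_free != NULL) {
        free(pending_free);
        pending_free = NULL;
    }
}

static void toy_foo_update(struct foo *new_fp)
{
    struct foo *old_fp = gbl_foo;

    gbl_foo = new_fp;          /* removal is immediate ... */
    pending_free = old_fp;     /* ... destruction is deferred */
    if (readers == 0) {        /* no readers at all: grace period is empty */
        free(pending_free);
        pending_free = NULL;
    }
}

int grace_demo(void)
{
    gbl_foo = calloc(1, sizeof(*gbl_foo));
    gbl_foo->a = 1;

    toy_rcu_read_lock();            /* a reader starts ... */
    struct foo *fp = gbl_foo;       /* ... and grabs the current pointer */

    struct foo *new_fp = calloc(1, sizeof(*new_fp));
    new_fp->a = 2;
    toy_foo_update(new_fp);         /* update runs while the reader is active */

    int old_still_valid = (fp->a == 1);  /* old node not freed yet: safe */
    toy_rcu_read_unlock();               /* reader done -> old node freed here */

    int ok = old_still_valid && gbl_foo->a == 2 && pending_free == NULL;
    free(gbl_foo);
    gbl_foo = NULL;
    return ok;
}
```

Note how the reader keeps using its stale fp safely across the update; the old node only disappears once the reader leaves its critical section.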


Publish-subscribe mechanism


Most compilers in use today optimize code to some extent, and the CPU also reorders instructions to improve execution efficiency, but such optimizations can sometimes produce undesirable results. As an example:


void foo_update(struct foo *new_fp)
{
    struct foo *old_fp;

    spin_lock(&foo_mutex);
    old_fp = gbl_foo;

    new_fp->a = 1;
    new_fp->b = 'b';
    new_fp->c = 100;

    gbl_foo = new_fp;
    spin_unlock(&foo_mutex);
    synchronize_rcu();
    kfree(old_fp);
}


In this code we expect the three assignments to new_fp->a, new_fp->b and new_fp->c to execute before the assignment gbl_foo = new_fp. Optimized code, however, does not guarantee this order. A reader may then obtain new_fp through gbl_foo while the member assignments have not yet completed; when it executes dosomething(fp->a, fp->b, fp->c), indeterminate values are passed in, which can produce wrong results or even crash the program. The problem is solved with a memory barrier; the RCU mechanism wraps the barrier and provides a dedicated API. The pointer publication is then no longer a plain assignment, but instead reads:


rcu_assign_pointer(gbl_foo, new_fp);


The implementation of rcu_assign_pointer is relatively simple:


#define rcu_assign_pointer(p, v) \
    __rcu_assign_pointer((p), (v), __rcu)

#define __rcu_assign_pointer(p, v, space) \
    do { \
        smp_wmb(); \
        (p) = (typeof(*v) __force space *)(v); \
    } while (0)


As we can see, the implementation simply issues a write memory barrier, smp_wmb(), before the assignment to enforce the required ordering. The __rcu annotation in the macro is only a marker used by the sparse static checker at build time.
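In userspace C11 the same initialize-then-publish discipline can be expressed with a release store, which forbids the compiler and CPU from reordering the field initializations past the pointer publication, much as smp_wmb() does in the kernel macro. This is a sketch under that analogy; the names publish_foo and publish_demo are invented for the example.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct foo { int a; char b; long c; };

static _Atomic(struct foo *) gbl_foo;

/* Userspace analogue of rcu_assign_pointer(): the release store keeps
 * all earlier stores (the field initializations) ordered before the
 * pointer becomes visible to other threads. */
static void publish_foo(struct foo *new_fp)
{
    atomic_store_explicit(&gbl_foo, new_fp, memory_order_release);
}

int publish_demo(void)
{
    struct foo *fp = malloc(sizeof(*fp));
    fp->a = 1;                 /* initialize every field first ... */
    fp->b = 'b';
    fp->c = 100;
    publish_foo(fp);           /* ... then make the pointer visible */

    /* The acquire load pairs with the release store, so a reader that
     * sees the pointer also sees fully initialized fields. */
    struct foo *seen = atomic_load_explicit(&gbl_foo, memory_order_acquire);
    int ok = (seen->a == 1 && seen->b == 'b' && seen->c == 100);
    free(seen);
    return ok;
}
```

The release/acquire pairing is the C11 rendering of the publish-subscribe contract described above.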


On DEC Alpha CPUs there is an even more aggressive reordering problem, as the following shows:


void foo_read(void)
{
    rcu_read_lock();
    struct foo *fp = gbl_foo;

    if (fp != NULL)
        dosomething(fp->a, fp->b, fp->c);
    rcu_read_unlock();
}


On Alpha, the loads of fp->a, fp->b and fp->c may be performed before the load of the pointer in fp = gbl_foo has completed. If this foo_read runs concurrently with foo_update, part of the values passed to dosomething may belong to the old gbl_foo and part to the new one, producing a wrong result. To avoid this class of problem, RCU provides a macro:


#define rcu_dereference(p) rcu_dereference_check(p, 0)

#define rcu_dereference_check(p, c) \
    __rcu_dereference_check((p), rcu_read_lock_held() || (c), __rcu)

#define __rcu_dereference_check(p, c, space) \
    ({ \
        typeof(*p) *_________p1 = (typeof(*p) *__force)ACCESS_ONCE(p); \
        rcu_lockdep_assert(c, "suspicious rcu_dereference_check() " \
                              "usage"); \
        rcu_dereference_sparse(p, space); \
        smp_read_barrier_depends(); \
        ((typeof(*p) __force __kernel *)(_________p1)); \
    })



static inline int rcu_read_lock_held(void)
{
    if (!debug_lockdep_rcu_enabled())
        return 1;
    if (rcu_is_cpu_idle())
        return 0;
    if (!rcu_lockdep_current_cpu_online())
        return 0;
    return lock_is_held(&rcu_lock_map);
}


The code above mostly adds debugging support. With the debug checks removed, it reduces to the following form (this is in fact the code from an older kernel version):


#define rcu_dereference(p) ({ \
    typeof(p) _________p1 = p; \
    smp_read_barrier_depends(); \
    (_________p1); \
})


A read memory barrier, smp_read_barrier_depends(), is inserted after the pointer is fetched.


The pointer read in our earlier foo_read is changed to struct foo *fp = rcu_dereference(gbl_foo), which prevents the problem described above.


Integrity of data reads


Let's illustrate the problem with an example:



When we add a new node new in front of node a in the original list, the first step is to point new's next pointer at node a, and the second step is to point the head pointer at new. Done in this order, the insertion has no effect on readers while only the first step is complete, and once the second step finishes, a reader that reaches the new node can continue traversing the rest of the list. If the order were reversed and head pointed at new first, a reader could reach new while new's next pointer is still NULL and would then fail to reach the subsequent nodes such as a and b. This also shows that RCU does not guarantee that a reader will see the new node. If that matters to the program, the caller must compensate: in a file system, for example, if an RCU lookup fails to find the entry, another form of lookup is used; the details will be covered together with the file system analysis.
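The two-step insertion order can be shown concretely. This is a single-threaded sketch of the ordering only (the names insert_head and insert_demo are invented here); in the kernel the "publish" step would go through rcu_assign_pointer so the ordering also holds against a concurrent reader.

```c
#include <stddef.h>

struct node { int val; struct node *next; };

/* Insert at the head in the RCU order: fill in the new node's next
 * pointer first, then swing the head pointer. A concurrent reader thus
 * sees either the old list or a fully linked new node - never a node
 * whose next pointer is still NULL. */
static void insert_head(struct node **head, struct node *n)
{
    n->next = *head;   /* step 1: new node points at the old first node */
    *head = n;         /* step 2: publish (rcu_assign_pointer in the kernel) */
}

int insert_demo(void)
{
    struct node b = { 2, NULL };
    struct node a = { 1, &b };
    struct node *head = &a;

    struct node n = { 5, NULL };
    insert_head(&head, &n);            /* list is now 5 -> 1 -> 2 */

    int sum = 0;
    for (struct node *p = head; p != NULL; p = p->next)
        sum += p->val;
    return sum;                        /* 5 + 1 + 2 = 8 */
}
```

Reversing the two steps inside insert_head is exactly the broken ordering the paragraph above warns about.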


Now let's look at an example of deleting a node:



To delete b, all we have to do is point a's next pointer at c while keeping b's own pointer to c intact, and then let the deleting thread enter grace-period detection. Because b's contents are not changed, a reader currently standing on b can still continue to b's successors. b cannot be destroyed immediately; the destroy operation must wait until the grace period has finished. Since a already points at c, every traversal that starts after the grace period passes from a directly to c: b is hidden, and later readers cannot reach it. This guarantees that freeing b after the grace period does not affect the system.
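The unlink step can likewise be sketched in plain C. This single-threaded illustration (unlink_node and delete_demo are invented names) shows the two properties the text relies on: the victim's own next pointer is left intact for readers already standing on it, while new traversals no longer reach it. The deferred free after the grace period is not modeled here.

```c
#include <stddef.h>

struct node { int val; struct node *next; };

/* Unlink victim from the list without touching victim->next: a reader
 * already standing on the victim can still walk to its successor, while
 * readers arriving later can no longer reach it. The actual free would
 * be deferred until after the grace period. */
static void unlink_node(struct node **head, struct node *victim)
{
    struct node **pp = head;

    while (*pp != victim)
        pp = &(*pp)->next;
    *pp = victim->next;
}

int delete_demo(void)
{
    struct node c = { 3, NULL };
    struct node b = { 2, &c };
    struct node a = { 1, &b };
    struct node *head = &a;

    unlink_node(&head, &b);              /* list is now 1 -> 3 */

    int still_linked = (b.next == &c);   /* old readers on b still reach c */

    int sum = 0;
    for (struct node *p = head; p != NULL; p = p->next)
        sum += p->val;                   /* 1 + 3 = 4 */
    return still_linked ? sum : -1;
}
```

In the kernel this is what list_del_rcu does: it unlinks the entry but deliberately leaves its forward pointer usable for pre-existing readers.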


Summary


The principle of RCU is not complicated, and it is simple to apply. But the implementing code is not so easy; the difficulty is concentrated in grace-period detection. When we analyze the source code later, we will see some very skillful implementation techniques.

