On the RCU mechanism in Linux kernel


Transferred from: http://blog.chinaunix.net/uid-23769728-id-3080134.html

The design idea of RCU is fairly clear: by swapping old and new pointers, it protects shared data without locks. At the code level, however, it is much harder to follow. Chapter 4 of *Understanding the Kernel Mechanisms of Linux Device Drivers* describes the rules behind RCU quite clearly, and from a higher vantage point, because too much code analysis makes it easy for readers to get lost in detail. Having recently received the book, I carefully reread the RCU section, and I think a little content should be supplemented here, since some of it may not have been suitable for the book itself.

On the RCU read side, entry into a critical section is marked by rcu_read_lock(); its code is:

    <include/linux/rcupdate.h>
    static inline void rcu_read_lock(void)
    {
            __rcu_read_lock();
            __acquire(RCU);
            rcu_read_acquire();
    }
The implementation appears to make three function calls, but the substantive work is done by the first, __rcu_read_lock(), which disables kernel preemption by calling preempt_disable(). Interrupts, however, remain enabled. Suppose a reader inside an RCU critical section has just read a pointer p to the shared data area (but has not yet accessed the data members through p) when an interrupt occurs, and the interrupt service routine (ISR) needs to modify the data area p points to. Following the RCU design principle, the ISR allocates a new data area new_p of the same size, copies the data from the old area p into it, and then performs the modification on new_p. Because the modification happens in the private new_p space, there is no concurrent access to p; this is precisely why RCU is a lock-free mechanism. When the ISR finishes updating the data, it assigns new_p to p (p = new_p) and finally registers a callback function to release the old pointer at the appropriate time, namely once all references to the old pointer have finished, so that freeing p cannot cause a problem. When the interrupt handler returns, the interrupted process can still access the data in the old p space, i.e., the old data; this outcome is allowed by the RCU mechanism. RCU's rules permit a short-lived inconsistency between the reader's and the writer's views of the resource, caused by the pointer switch.
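The copy-modify-publish sequence described above can be illustrated with a small userspace analogy. This is not kernel RCU: it uses C11 atomics for the pointer publish, runs single-threaded, and frees the old area immediately instead of after a grace period; struct foo, gp, and the function names are illustrative.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct foo { int a; int b; };

static _Atomic(struct foo *) gp;   /* shared pointer, like p in the text */

/* Reader: take one snapshot of the pointer and use only that version. */
static int reader_get_a(void)
{
    struct foo *p = atomic_load_explicit(&gp, memory_order_acquire);
    return p->a;
}

/* Publish the initial data area. */
static void publish_initial(int a, int b)
{
    struct foo *p = malloc(sizeof *p);
    p->a = a; p->b = b;
    atomic_store_explicit(&gp, p, memory_order_release);
}

/* Writer: copy the old area, modify the private copy, then publish it.
   Because the modification happens in the private copy, readers never
   see a half-updated struct. */
static void writer_set_a(int new_a)
{
    struct foo *old  = atomic_load_explicit(&gp, memory_order_relaxed);
    struct foo *newp = malloc(sizeof *newp);

    *newp = *old;          /* copy old contents */
    newp->a = new_a;       /* modify the private copy */
    atomic_store_explicit(&gp, newp, memory_order_release);  /* p = new_p */
    free(old);  /* kernel RCU would defer this until after a grace period */
}
```

Readers that loaded gp before the store keep using the old snapshot, which is exactly the short-lived view inconsistency the text says RCU allows.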

The next interesting question about RCU is: when can the old pointer be released? I have seen many answers to this in books: after a process switch has occurred on every processor in the system. This formulaic answer often leaves readers of the RCU mechanism confused: why must we wait for a process switch on all processors before invoking the callback that frees the old pointer? This is determined by RCU's design rules: all references to the old pointer may occur only inside a critical section delimited by rcu_read_lock() and rcu_read_unlock(); a process switch cannot occur inside such a critical section; and once outside the critical section there must be no further reference of any kind to the old pointer p. Obviously, this rule requires that a reader undergo no process switch inside the critical section: once a process switch occurs, the callback that releases the old pointer is likely to be invoked, the old pointer freed, and the switched-out process may then reference freed memory when it is rescheduled.
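The reader rules above can be sketched in kernel-style C. This is an illustrative fragment, not compiled here; struct foo and gp are hypothetical names standing in for the shared data area and the pointer p from the text.

```c
/* Hypothetical kernel-style reader: every reference to the shared
   pointer stays between rcu_read_lock() and rcu_read_unlock(), and
   nothing in between may sleep. */
struct foo {
    int a;
    struct rcu_head rcu;        /* needed later for deferred freeing */
};

struct foo __rcu *gp;           /* the shared pointer p from the text */

int read_a(void)
{
    int val;

    rcu_read_lock();                /* disables kernel preemption */
    val = rcu_dereference(gp)->a;   /* reference p only inside here */
    rcu_read_unlock();              /* no reference to p past this point */
    return val;
}
```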

Now we see why rcu_read_lock() only needs to disable kernel preemption: it makes it impossible for the current process to be switched out, even if an interrupt arrives inside the critical section. The kernel developers, or more precisely RCU's designers, can only go this far. The rest is the user's responsibility: if a function called inside the RCU critical section may sleep, the RCU design rule is violated and the system enters an unstable state.

This again shows that if you want to use something, you should understand its internal mechanism. As in the example above, even if the current program shows no problem, a hidden danger is left in the system like a time bomb that can go off at any moment, especially when the problem suddenly erupts after a long time. In most cases, tracking down such a problem costs far more time than carefully understanding RCU would have in the first place.

Readers under RCU have more freedom than readers under rwlock, because an RCU reader need not take the writer into account when accessing a shared resource, whereas an rwlock reader must ensure that no writer is manipulating the resource while it reads. The difference stems from the fact that RCU lets readers and writers use different copies of the shared resource, while rwlock readers and writers use a single copy from beginning to end. This also means the RCU writer bears more responsibility: multiple writers updating the same shared resource must still use some form of mutual exclusion among themselves, so RCU's claim to be a "lock-free mechanism" holds only between readers and writers. Hence RCU should be used where reads are frequent and updates relatively rare. There it can greatly improve system performance, because an RCU read, compared with other locking mechanisms, carries almost no locking cost.
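The writer-side responsibility described above can be sketched as follows: a spinlock serializes writers against each other while readers stay lock-free. This is an illustrative kernel-style fragment (not compiled here), reusing the hypothetical struct foo and gp names; the lock name is also an assumption.

```c
/* Writers still need mutual exclusion among themselves. */
static DEFINE_SPINLOCK(gp_lock);

void update_a(int new_a)
{
    struct foo *new_p, *old_p;

    new_p = kmalloc(sizeof(*new_p), GFP_KERNEL);
    spin_lock(&gp_lock);            /* writer vs. writer only */
    old_p = rcu_dereference_protected(gp, lockdep_is_held(&gp_lock));
    *new_p = *old_p;                /* copy the old data area */
    new_p->a = new_a;               /* modify the private copy */
    rcu_assign_pointer(gp, new_p);  /* publish: p = new_p */
    spin_unlock(&gp_lock);
    kfree_rcu(old_p, rcu);          /* free only after a grace period */
}
```

Readers never take gp_lock; only concurrent writers contend on it, which is why the lock-free claim applies between readers and writers but not among writers.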

In practice, shared resources often take the form of linked lists, and the kernel implements several interface functions for operating on lists in RCU mode, such as list_add_tail_rcu(), list_add_rcu(), and hlist_replace_rcu(); users of RCU should use these kernel functions. Their specific usage can be found in kernel programming or device driver references.
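A short sketch of how these list interfaces are typically paired, with lock-free traversal on the read side and a serialized update on the write side. This is an illustrative kernel-style fragment (not compiled here); struct item, item_list, and item_lock are hypothetical names.

```c
/* An RCU-protected list: readers traverse without locks,
   writers serialize with a spinlock and publish with _rcu helpers. */
struct item {
    int key;
    struct list_head node;
    struct rcu_head rcu;
};

static LIST_HEAD(item_list);
static DEFINE_SPINLOCK(item_lock);

/* reader: traversal inside an RCU critical section */
bool has_key(int key)
{
    struct item *it;
    bool found = false;

    rcu_read_lock();
    list_for_each_entry_rcu(it, &item_list, node) {
        if (it->key == key) {
            found = true;
            break;
        }
    }
    rcu_read_unlock();
    return found;
}

/* writer: mutual exclusion among writers, RCU-safe insertion */
void add_item(struct item *it)
{
    spin_lock(&item_lock);
    list_add_rcu(&it->node, &item_list);
    spin_unlock(&item_lock);
}
```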

For releasing the old pointer, the Linux kernel provides two interfaces: call_rcu() and synchronize_rcu(). The former is asynchronous: call_rcu() packages the callback that releases the old pointer into a node and adds that node to a local list on the processor currently running call_rcu(). In the softirq part of the clock interrupt (RCU_SOFTIRQ), the RCU softirq handler rcu_process_callbacks() checks whether the current processor has passed through a quiescent state (which involves kernel process scheduling). When the RCU kernel implementation determines that every processor in the system has passed through a quiescent state (meaning a process switch has occurred on every processor, so the old pointer can now be freed safely), the callback supplied to call_rcu() is invoked.
The implementation of synchronize_rcu() uses a wait queue. Internally it also adds a node to the current processor's local list, just as call_rcu() does, except that the callback in the node is wakeme_after_rcu(). synchronize_rcu() then sleeps on the wait queue until a process switch has occurred on every processor in the system, at which point wakeme_after_rcu() is invoked by rcu_process_callbacks() to wake the sleeping synchronize_rcu(). Once awakened, synchronize_rcu() knows it can now release the old pointer.

So we see that when call_rcu() returns, its registered callback may not yet have been invoked, which means the old pointer may not yet have been freed; after synchronize_rcu() returns, the old pointer has definitely been freed. Whether to call call_rcu() or synchronize_rcu() therefore depends on the specific requirements and the current context; for example, synchronize_rcu() must never be used in interrupt context.
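The two release styles can be sketched side by side for the hypothetical struct foo used earlier. This is an illustrative kernel-style fragment (not compiled here); free_foo_cb and the retire_* names are assumptions.

```c
/* Asynchronous release: register a callback and return at once.
   The old area is freed later, after every CPU has passed through
   a quiescent state. Safe from interrupt context. */
static void free_foo_cb(struct rcu_head *head)
{
    kfree(container_of(head, struct foo, rcu));
}

void retire_async(struct foo *old_p)
{
    call_rcu(&old_p->rcu, free_foo_cb);
}

/* Synchronous release: sleep until the grace period ends, then the
   old pointer is certainly unreferenced. Must never be called from
   interrupt context, because it may sleep. */
void retire_sync(struct foo *old_p)
{
    synchronize_rcu();
    kfree(old_p);
}
```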

(Originally posted on the www.embexperts.com forum; slightly revised here.)
