Analysis of Linux RCU lock mechanism

Openvswitch (OVS) source code analysis: the Linux RCU lock mechanism

Category: Linux kernel | Tags: cloud computing, Openvswitch, Linux kernel, RCU lock mechanism | Author: Yuzhihui_no1 | Release date: 2014-10-19 | Views: 1044

Objective

I had wanted to continue analyzing the Upcall invocation along the packet-processing path, but it turns out that analyzing Upcall first requires understanding Netlink, the Linux mechanism for communication between the kernel and user space, so the Upcall analysis has been postponed. If you know Openvswitch at all, you will have noticed that it runs on top of Linux, and that many of its mechanisms and modules call directly into the Linux kernel. The RCU lock mechanism, the Upcall invocation, and the definitions of some structures are taken straight from the kernel. So if you find that some structure (or module, or mechanism) used in the source is not defined anywhere in the Openvswitch tree (I use Source Insight to browse and analyze the code, which makes it easy to check whether something is defined), it is most likely one of the Linux kernel definitions that Openvswitch pulls in through kernel header files.

RCU is a relatively new Linux locking mechanism (it entered formal use in the Linux 2.6 kernel), and I have been meaning to write a post about it for some time. Because RCU locks are used in many places in Openvswitch, I originally planned to cover the mechanism only in passing (as the earlier post on Openvswitch (OVS) source code analysis of the data structures shows, the RCU lock mechanism appears in many places). Later I found that many places also use RCU-protected linked-list insertion and deletion operations, and RCU keeps showing up in the code still to be analyzed, so here is a short write-up of the mechanism's features and operations.

RCU Operating principle

Let us first recall how a read-write lock (rwlock) works, so that we have something to compare against when analyzing RCU. Read-write locks come in two kinds: read locks (also called shared locks) and write locks (also called exclusive locks). Consider the two cases:

First, suppose the data area to be operated on currently holds read locks. 1. If the request is to read the data, a read lock is taken; multiple read locks do not exclude one another (that is, while readers are accessing the data, other readers can still take a read lock on the same data area). 2. If the request is to write the data, the write lock cannot be taken immediately; instead, the writer must wait until all locks on the data area, read and write alike, have been released before write access can begin.

Second, suppose a write lock is held on the data area. Then any request, read or write, must wait for that write lock to be released before it can take a lock of its own.

Now analyze the RCU lock mechanism in the same way. RCU is short for Read-Copy-Update; as the name says, it is a lock mechanism that protects data by reading, copying, and then modifying the copy. The principle is:

First, when writing data there is no need to wait for all locks to be released, as there is with a read-write lock. Instead, the writer makes a copy of the data area, modifies the copy, and when the modification is complete, replaces the original data area with the copy. In place of taking the write lock of a read-write lock, the writer only has to wait until every reader still accessing the original data area has exited before the old version can be reclaimed. From this feature you can infer that with an RCU lock there can be multiple writers, each copying the data area and modifying its own copy; after modifying, each writer replaces the contents of the original data area in turn.

Second, reading the data requires no lock at all and involves almost no waiting (with a read-write lock, a reader must wait if the data area holds a write lock); the reader can access the data directly. Why "almost" no waiting? Because writing replaces the data simply by swapping a pointer, which takes essentially no time at all, reading incurs no extra overhead.

To summarize the RCU lock mechanism's features: it allows multiple readers and multiple writers to access the contents of the shared data area at the same time, and it is very efficient when data is read often and written rarely, sparing the CPU some extra overhead. If writes are frequent, the mechanism is no better than a read-write lock, because an RCU write is still expensive: copy the data, modify the copy, and finally wait before the old version can be reclaimed. The mechanism is rather like putting a file on a shared server that many people use together. If you just want to look at the file's contents, you cat it directly on the server. But if you want to modify it, you cannot edit it in place, because doing so would affect everyone currently reading or writing it. So you copy it to your own machine, make your changes, and once you have confirmed they are correct, replace the original on the server.

RCU write operation diagram

The following diagram shows how the RCU mechanism modifies data (using a linked list as an example):


As the figure shows, the update only needs to swap a pointer; once the pointer is changed, the original data area's contents are effectively replaced, and the old version is later reclaimed by the deferred-free ("garbage collection") mechanism.

Linux kernel RCU mechanism API

Having understood how RCU works, let us look at the RCU lock operations most commonly used in the Linux kernel. Note that this post will not dig deeply into RCU's underlying implementation: the point of covering the RCU mechanism is only to better understand the parts of the Openvswitch code that use it, not to analyze the Linux kernel source itself; let's not put the cart before the horse. If you dig desperately into every knowledge point you meet, reading through one piece of source code will take you months.

rcu_read_lock(): some readers may feel there is a contradiction here: didn't we just say readers need no lock? In fact this is not a lock in the read-write-lock sense; it merely marks the start of a read-side critical section. Inside the critical section the reader must not block or sleep, and the writer must not reclaim the data being read (the function does a little more than that, but this is the essence). rcu_read_unlock() pairs with rcu_read_lock() to delimit the critical section (the data area protected by the lock).

synchronize_rcu(): when this function is called on one CPU (usually by a writer that has just replaced the data) while other CPUs are reading data inside RCU-protected critical sections, synchronize_rcu() blocks the caller, and the block is lifted only once every other CPU reading the data has exited its critical section; only then may the writer free the old data. Its job, in other words, is to ensure that every reading CPU has safely left the critical section before the old data is reclaimed. There is also a similar function, call_rcu(): if call_rcu() is called on one CPU while other CPUs are inside RCU-protected read-side critical sections, the invocation of the corresponding RCU callback is deferred until all those CPUs have safely left their critical sections (you can see this in the comments in the Linux kernel source, in the rcupdate.h file, before the rcu_read_lock() function).

rcu_dereference(): fetches an RCU-protected pointer for use inside an RCU read-side critical section. The pointer obtained this way can then be safely dereferenced; after all, it is an RCU-protected pointer.

list_add_rcu(): adds a node to an RCU-protected data structure. It is much like adding a node to an ordinary linked list; the essential difference is this extra line: rcu_assign_pointer(prev->next, new);. The kernel comment says roughly: assign the specified value to the pointer so that a newly initialized structure may be dereferenced by RCU read-side critical sections. (To be honest, I don't fully understand that annotation either.) The approximate meaning is that the next pointer of the insertion point is made to point at the newly added node. Why a dedicated statement rather than plain prev->next = new;? Because prev->next already points at other data, and some CPU may be going through prev->next at this very moment to reach other RCU-protected data. So when inserting into an RCU-protected data structure you must use this statement: it handles the details for you (for example, it inserts the needed memory barrier so that a CPU concurrently reading through prev->next never sees a half-initialized node and gets into trouble) and puts the newly joined node under RCU protection. There are many variants of this kind of insertion, such as inserting at the head or at the tail; the implementations are similar, so I will not go into detail here.

list_for_each_entry_rcu(): traverses an RCU-protected list, much like ordinary list traversal. The difference is that it must be invoked from inside an RCU read-side critical section (that is, on a CPU that has called rcu_read_lock()), and the traversal can then proceed concurrently with other CPUs modifying the list. Likewise there are other forms of traversal, including hash-list traversal, which I will not cover in detail.

If, in the Openvswitch source code analysis posts, you find anything about RCU that contradicts what is said here, feel free to point it out, and of course I will correct it.
