I. Preface
This write-up on RCU has two parts: one on basic principles (this article) and one on the Linux kernel implementation. Chapter II explains why the RCU synchronization mechanism exists, and in particular how it addresses synchronization performance as CPU core counts keep growing. Every tool has its proper uses, so that chapter also describes the limits on where RCU can be applied. Section 1 of Chapter III describes the design concept of RCU. The concept itself is fairly simple and easy to understand; the hard part is a production-quality implementation, which the next document covers. Section 2 of Chapter III describes the RCU operations, which correspond to RCU's external API. The last chapter lists the references. Perfbook is a remarkable book that anyone interested in parallel programming should not miss; I strongly recommend it. Compared with Perfbook, this article is rather rough, mainly because my understanding of some aspects of RCU is not yet deep. I may need to study the Linux kernel implementation more closely to grasp their true meaning. Its only advantage is that it is written in Chinese, so readers comfortable with English can go straight to the book.
II. Why does the RCU synchronization mechanism exist?
Earlier we discussed spin lock, RW spin lock, and seqlock. Why do we also need a synchronization mechanism like RCU? The question is like asking why we still need other weapons when the meteor hammer already exists. Every weapon has its own uses, and so does every kernel synchronization mechanism. In certain scenarios RCU solves problems that earlier synchronization mechanisms could not, and that is the foundation of its existence. This chapter covers two things: how RCU solves the problems of the other kernel mechanisms, and the scenarios to which RCU is restricted.
1. Performance issues
Let's recall the fundamentals of spin lock, RW spin lock, and seqlock. A spin lock protects its critical section with two shared variables, next and owner. A thread that calls spin_lock to enter the critical section performs three actions:
(1) It reads its own ticket (the current value of next) and the ticket currently allowed into the critical section (owner).
(2) It increments next (next++), so the next thread to arrive gets the following ticket.
(3) It checks whether its own ticket is the one allowed in (my ticket == owner). If so, it enters the critical section; if not, it spins, repeatedly rereading owner until owner equals its ticket. On ARM64 processors, WFE can be used while waiting to reduce power consumption.
Note: step (1) reads the value and step (2) updates it and writes it back, so (1) and (2) must form a single atomic operation; no other operation may be interleaved between them.
A thread calls spin_unlock to leave the critical section, which executes owner++ so that the thread holding the next ticket can enter.
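The three steps and the unlock above can be sketched as a userspace ticket lock. This is a minimal illustration that assumes C11 <stdatomic.h> and POSIX threads. struct ticket_lock, ticket_spin_lock(), ticket_spin_unlock(), and the demo helpers are hypothetical names, not the kernel's arch_spinlock_t code. One atomic fetch-and-add performs steps (1) and (2) together, which is exactly the atomicity requirement in the note.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical userspace ticket lock modeled on the next/owner description. */
struct ticket_lock {
    atomic_uint next;   /* next ticket to hand out */
    atomic_uint owner;  /* ticket currently allowed into the critical section */
};

static void ticket_lock_init(struct ticket_lock *l)
{
    atomic_init(&l->next, 0);
    atomic_init(&l->owner, 0);
}

static void ticket_spin_lock(struct ticket_lock *l)
{
    /* Steps (1) and (2) as one atomic read-modify-write: take a ticket, bump next. */
    unsigned int my_ticket = atomic_fetch_add_explicit(&l->next, 1, memory_order_relaxed);

    /* Step (3): spin until owner reaches our ticket. */
    while (atomic_load_explicit(&l->owner, memory_order_acquire) != my_ticket)
        sched_yield();   /* stand-in for WFE so the demo also works with few CPUs */
}

static void ticket_spin_unlock(struct ticket_lock *l)
{
    /* owner++ admits the next ticket holder; release publishes our writes to it. */
    atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
}

/* Demo: several threads increment one counter under the lock. */
#define DEMO_ITERS 10000
static struct ticket_lock demo_lock;
static long demo_counter;

static void *demo_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < DEMO_ITERS; i++) {
        ticket_spin_lock(&demo_lock);
        demo_counter++;               /* protected by the ticket lock */
        ticket_spin_unlock(&demo_lock);
    }
    return NULL;
}

/* Runs nthreads workers (at most 8) and returns the final counter value. */
static long run_ticket_demo(int nthreads)
{
    pthread_t tids[8];
    ticket_lock_init(&demo_lock);
    demo_counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, demo_worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return demo_counter;
}
```

With four workers, run_ticket_demo(4) should return 4 * DEMO_ITERS, because every increment happens while holding the lock. Using sched_yield() instead of a busy spin is a concession to oversubscribed machines. A kernel implementation spins with a CPU-specific relax or wait-for-event hint instead.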
RW spin lock and seqlock are similar to spin lock: all of them are built on a shared variable in memory, and accesses to that variable are atomic. Assume the following system architecture: each CPU has its own L1 cache, and the next cache level is shared.
When threads on multiple CPUs compete for a critical section, they all operate on the lock variable (the rose-colored block in the figure), which is shared by all CPUs. When CPU 0 modifies lock, cache coherence invalidates the copies of lock in the other CPUs' L1 caches. The next access to lock from any other CPU therefore misses in L1 (more precisely, a communication cache miss) and must fetch the line from the next cache level. Likewise, when another CPU then modifies lock, the copies in the remaining CPUs' L1 caches are invalidated, causing further communication cache misses on those CPUs.
The RCU read side does not need to access such shared lock data, which greatly improves reader-side performance.
2. Readers and writers can execute concurrently
A spin lock is mutually exclusive: at any time only one thread, reader or writer, can be in the critical section. RW spin lock does better by allowing multiple readers to execute concurrently, which improves performance. However, readers and an updater still cannot execute concurrently. RCU lifts this limitation and allows one updater to run concurrently with multiple readers. Multiple updaters still may not be in the critical section at the same time; a spin lock can guarantee that. The following figure compares RW spin lock and RCU:
Rwlock allows multiple readers to run concurrently, so in the figure three rwlock readers happily execute in parallel. When an rwlock writer tries to enter (the red dashed line), it can only spin until all readers have left the critical section. Once the writer is in the critical section, no reader can enter until the writer finishes updating the data and leaves. Only then can the green readers run again. One characteristic of rwlock is determinism: the white readers are guaranteed to read the old data, and the green readers are guaranteed to read the new data written by the writer.
RCU differs from traditional locking mechanisms. When an RCU updater enters the critical section, it proceeds immediately regardless of any readers and never needs to spin. Likewise, an updater working in the critical section does not stop RCU readers. RCU therefore scales better than rwlock, especially on systems with many CPUs: CPUs spinning on an rwlock waste cycles, and as the number of CPUs grows, rwlock performance keeps falling. Because RCU readers and the updater execute concurrently, two versions of the protected data exist at the same time, an old one and a new one. A white RCU reader may read either the old or the new data, depending on when it accesses the data. Once the RCU updater has finished updating, however, any RCU reader that starts afterwards (the green blocks) is guaranteed to read the new data.
3. Applicable scenarios
As mentioned before, each lock has its own applicable scenarios. Spin lock does not distinguish readers from writers, so it is ill-suited to workloads whose read and write intensities are asymmetric. RW spin lock and seqlock address this: seqlock favors writers, while RW spin lock favors readers. Everything seems perfect, but as computer hardware evolved, CPU speed increased much faster than memory speed. In this context, acquiring a counter-based lock such as spin lock or rwlock, which requires accessing a shared memory location, is relatively expensive, and the speed gap between the CPU and memory keeps widening. Lock mechanisms based on a counter shared among processors therefore no longer meet performance requirements, and RCU was developed in response. (More precisely, RCU is a kernel synchronization mechanism but not a lock; it is essentially lock-free.) It overcomes the drawbacks of other locking mechanisms, but you cannot have it both ways: RCU applies to a narrower set of scenarios, mainly the following:
(1) RCU can only protect dynamically allocated data structures, and they must be accessed through pointers.
(2) Code in an RCU read-side critical section must not sleep. (SRCU, which allows sleeping, is outside the scope of this article.)
(3) Reads and writes are asymmetric: writer performance is not critical, but reader performance requirements are very high.
(4) Readers are not sensitive to whether they see old or new data.
III. The basic idea of RCU
1. Principle
The basic idea of RCU is illustrated in the following figure:
RCU involves two kinds of data. One is the pointer to the protected data, which we call the RCU protected pointer. The other is the shared data accessed through that pointer, which we call RCU protected data; it must be dynamically allocated. Shared data is accessed in two ways: writers update the data, and readers read it. With traditional lock-based synchronization, a reader accessing the data in its critical section blocks the writer from entering (for example, spin lock and RW spin lock; seqlock does not, so seqlock is essentially lock-free too). The reason is that if the writer modified the data directly while readers were accessing it, it would corrupt what they read. So what do we do? We let the writer enter only after the readers have dropped their access to the shared data (a little hard on the writer). RCU follows a similar principle: before the writer can proceed, reader access to the shared data must first be removed. How can it be removed? Creating a new copy is a good choice. An RCU writer's work is therefore divided into two steps:
(1) Removal. The writer allocates a new version of the shared data and performs the update on it. When the update is complete, it makes the RCU protected pointer point to the new version. Once the RCU protected pointer points to the new data, that data is published, because readers access the data through the pointer. In this way the old version is removed from the readers' view: readers 0, 1, and 2 in the figure, which obtained the pointer earlier, still reference the previous version of the RCU protected data, but no reader that starts afterwards can reach it.
(2) Reclamation. The old version cannot be kept forever, so it must be freed at an appropriate time. It must not be freed too early, though: freeing the old version while reader threads are still accessing it would crash those readers. Reclamation may happen only after every reader that may be accessing the old version has left its critical section. This waiting period is called the grace period.
Note that reclamation does not need to wait for readers 3 and 4. The writer's store to the RCU protected pointer is atomic, so a concurrent reader sees either the old data or the new data, never a mixture. Readers 3 and 4 access the new version of the shared data and never reference the old one, so reclamation does not need to wait for them to leave their critical sections.
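The removal, grace period, and reclamation sequence can be sketched in userspace. This is a toy under stated assumptions, not the kernel's implementation: it assumes C11 atomics and POSIX threads, a fixed number of reader slots, and a single updater (updaters serialized externally). It detects the grace period with per-reader counters that are odd while a reader is inside a critical section. All toy_* names are hypothetical. The writer copies the data, publishes the copy with an atomic pointer store, waits for readers that were already inside, and then frees the old copy.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Toy userspace RCU; all toy_* names are hypothetical, not the kernel API.
 * Each reader thread owns one slot whose counter is odd while that reader
 * is inside a read-side critical section. Updaters are assumed serialized. */
#define TOY_MAX_READERS 8

static atomic_ulong toy_reader_ctr[TOY_MAX_READERS];

static void toy_rcu_read_lock(int id)   { atomic_fetch_add(&toy_reader_ctr[id], 1); } /* even -> odd */
static void toy_rcu_read_unlock(int id) { atomic_fetch_add(&toy_reader_ctr[id], 1); } /* odd -> even */

struct toy_config { int a; int b; };          /* RCU protected data; invariant b == 2 * a */
static _Atomic(struct toy_config *) toy_gp;   /* RCU protected pointer */

/* seq_cst load: stronger (and slower) than what rcu_dereference() requires. */
static struct toy_config *toy_rcu_dereference(void) { return atomic_load(&toy_gp); }

/* Grace period: for each reader that was inside a critical section when we
 * looked, wait until its counter moves on (it has left that section).
 * Readers that start later already see the new pointer and are not waited for. */
static void toy_synchronize_rcu(void)
{
    for (int i = 0; i < TOY_MAX_READERS; i++) {
        unsigned long snap = atomic_load(&toy_reader_ctr[i]);
        if (snap & 1)
            while (atomic_load(&toy_reader_ctr[i]) == snap)
                sched_yield();
    }
}

/* Writer: copy, update the copy, publish it (removal), wait, free (reclamation). */
static void toy_update(int new_a)
{
    struct toy_config *old = atomic_load(&toy_gp);
    struct toy_config *fresh = malloc(sizeof *fresh);
    if (!fresh)
        abort();
    *fresh = *old;
    fresh->a = new_a;
    fresh->b = 2 * new_a;
    atomic_store(&toy_gp, fresh);   /* removal: the role of rcu_assign_pointer() */
    toy_synchronize_rcu();          /* the role of synchronize_rcu() */
    old->a = -1;                    /* poison: after the grace period no reader can see this */
    free(old);                      /* reclamation */
}

/* Demo harness: readers check the invariant while the writer keeps updating. */
static atomic_int toy_stop, toy_bad;

static void *toy_reader(void *arg)
{
    int id = (int)(long)arg;
    while (!atomic_load(&toy_stop)) {
        toy_rcu_read_lock(id);
        struct toy_config *p = toy_rcu_dereference();
        if (p->a < 0 || p->b != 2 * p->a)   /* counts poisoned or inconsistent snapshots */
            atomic_fetch_add(&toy_bad, 1);
        toy_rcu_read_unlock(id);
    }
    return NULL;
}

/* Runs nreaders (<= TOY_MAX_READERS) readers against nupdates updates and
 * returns the final value of a. Intended to be called once per process. */
static int toy_run_demo(int nreaders, int nupdates)
{
    pthread_t tids[TOY_MAX_READERS];
    struct toy_config *init = malloc(sizeof *init);
    if (!init)
        abort();
    init->a = 0;
    init->b = 0;
    atomic_store(&toy_gp, init);
    for (long i = 0; i < nreaders; i++)
        pthread_create(&tids[i], NULL, toy_reader, (void *)i);
    for (int i = 1; i <= nupdates; i++)
        toy_update(i);
    atomic_store(&toy_stop, 1);
    for (int i = 0; i < nreaders; i++)
        pthread_join(tids[i], NULL);
    struct toy_config *last = atomic_load(&toy_gp);
    int result = last->a;
    free(last);
    return result;
}
```

toy_synchronize_rcu() only waits for slots that were odd when it looked. A reader that enters after the atomic_store() (like readers 3 and 4) is guaranteed to load the new pointer, so it never holds up reclamation. Sequentially consistent atomics keep the ordering argument simple at the cost of speed; the kernel's read side is far cheaper than this.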
2. Basic RCU operations
RCU provides the following operations for readers:
(1) rcu_read_lock(): marks the beginning of an RCU read-side critical section.
(2) rcu_dereference(): obtains the RCU protected pointer. To access RCU protected shared data, a reader first obtains the RCU protected pointer through this interface and then dereferences it.
(3) rcu_read_unlock(): marks the reader's exit from the RCU read-side critical section.
RCU provides the following operations for writers:
(1) rcu_assign_pointer(): used by the writer to perform removal. After the writer has allocated and updated the new version of the data, calling this interface makes the RCU protected pointer point to the new RCU protected data.
(2) synchronize_rcu(): lets the writer wait synchronously. After the update, the writer calls this function to wait until all reader threads that may be accessing the old version have left their critical sections. Once the function returns, no references to the old shared data remain, and reclamation can proceed directly.
(3) call_rcu(): in some cases (for example, in softirq context) the writer cannot block. It can then call call_rcu(), which only registers a callback and returns immediately; the callback is invoked at an appropriate time, after a grace period, to complete reclamation. This effectively splits removal and reclamation between two different threads of execution: the updater and the reclaimer.
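The split between updater and reclaimer can be illustrated with a single-threaded model. toy_call_rcu() queues a callback and returns at once, and a reclaimer that calls toy_rcu_poll() runs the callback only after every reader that might still hold the old data has finished. Its shape loosely follows the kernel's call_rcu(), which takes a struct rcu_head and a callback. The grace-period detection used here is an assumption chosen for brevity: two reader counters plus a phase flip, similar in spirit to SRCU. It is not how the kernel's call_rcu() works, and every toy_* name is hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Single-threaded toy model of call_rcu(); all toy_* names are hypothetical. */
struct toy_head {
    struct toy_head *next;
    void (*func)(struct toy_head *);   /* reclamation callback */
};

static int toy_phase;                    /* counter that new readers join */
static int toy_readers[2];               /* active readers per phase */
static struct toy_head *toy_next_batch;  /* queued, grace period not yet started */
static struct toy_head *toy_wait_batch;  /* waiting for the current grace period */
static bool toy_gp_active;

/* Reader side: returns a token identifying the phase it joined. */
static int toy_read_lock(void) { toy_readers[toy_phase]++; return toy_phase; }
static void toy_read_unlock(int token) { toy_readers[token]--; }

/* Updater side: register the callback and return immediately (never blocks). */
static void toy_call_rcu(struct toy_head *h, void (*func)(struct toy_head *))
{
    h->func = func;
    h->next = toy_next_batch;
    toy_next_batch = h;
}

/* Reclaimer side: call periodically to advance the grace-period state. */
static void toy_rcu_poll(void)
{
    /* The grace period ends once every reader that joined before the flip has left. */
    if (toy_gp_active && toy_readers[toy_phase ^ 1] == 0) {
        while (toy_wait_batch) {
            struct toy_head *h = toy_wait_batch;
            toy_wait_batch = h->next;   /* read next before the callback frees h */
            h->func(h);                 /* reclamation */
        }
        toy_gp_active = false;
    }
    /* Start a new grace period for callbacks queued since the last one. */
    if (!toy_gp_active && toy_next_batch) {
        toy_wait_batch = toy_next_batch;
        toy_next_batch = NULL;
        toy_phase ^= 1;                 /* later readers count in the other phase */
        toy_gp_active = true;
    }
}

/* Example protected object: head is the first member, so a cast recovers it. */
struct toy_obj { struct toy_head head; int value; };
static int toy_reclaimed;

static void toy_free_obj(struct toy_head *h)
{
    free((struct toy_obj *)h);
    toy_reclaimed++;
}
```

In a typical sequence, reader A enters, the updater calls toy_call_rcu() on the old object, the reclaimer polls (starting the grace period), and reader B enters. The object is freed on the first poll after A leaves, even if B is still inside. This matches the earlier point that reclamation need not wait for readers that started after removal.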
IV. Reference documentation
1. Perfbook (Paul E. McKenney, "Is Parallel Programming Hard, And, If So, What Can You Do About It?")
2. linux-4.1.10/Documentation/RCU/*
Linux kernel sync-RCU Basics