RCU principle:
RCU (read-copy update) is named after the way it works. Readers can access a shared data structure protected by RCU without acquiring any lock. A writer does not modify the data in place. It first makes a copy, modifies the copy, and then re-points the pointer from the original data to the modified copy. The original data cannot be released right away, because readers that started earlier may still be using it. A callback mechanism releases it at an appropriate time, namely once every CPU that might have been referencing it has finished its operations on the shared data.
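The read-copy-update sequence described above can be sketched as a small single-threaded Python model. Everything in it is illustrative: the names `RcuPointer`, `Config` and `update_timeout` are hypothetical, a "pointer" is modeled as an object reference, a single writer is assumed, and nothing is actually reclaimed. It only shows that a reader holding the old version keeps seeing a consistent copy after the writer publishes the new one.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Config:
    # Hypothetical shared data structure protected by RCU.
    timeout_ms: int
    servers: list = field(default_factory=list)


class RcuPointer:
    """Hypothetical model of an RCU-protected pointer (single writer assumed)."""

    def __init__(self, obj):
        self._obj = obj

    def read(self):
        # Reader side: just load the current reference; no lock is taken.
        return self._obj

    def publish(self, new_obj):
        # Writer side: re-point the shared reference to the new version and
        # hand back the old one, which must not be reclaimed until readers
        # that may still hold it are finished.
        old = self._obj
        self._obj = new_obj
        return old


def update_timeout(ptr: RcuPointer, new_timeout: int) -> Config:
    old = ptr.read()
    new = copy.deepcopy(old)      # read + copy
    new.timeout_ms = new_timeout  # update the copy, never the shared original
    return ptr.publish(new)       # publish; the caller defers reclaiming the old version


shared = RcuPointer(Config(timeout_ms=100, servers=["a"]))
snapshot = shared.read()          # a reader that started before the update
old = update_timeout(shared, 250)
print(snapshot.timeout_ms, shared.read().timeout_ms)
```

The returned `old` object is exactly what a real implementation would hand to a deferred-release callback; deciding when that callback may run is the job of the grace-period machinery.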
RCU can therefore be viewed as an improved rwlock. Readers have almost no synchronization overhead: they take no locks and use no atomic instructions, and on every architecture except Alpha they need no memory barriers either, so reads cause no lock contention, no extra memory latency and no pipeline stalls. Because readers take no locks, RCU is also easier to use: deadlock does not have to be considered on the read side.
The writer's synchronization overhead is comparatively high. It must copy the data structure it modifies and delay the release of the old version, and it must still use a lock to synchronize with other writers modifying the data in parallel. Readers must give the writer a signal so that the writer can determine when the old data can safely be released or modified. A dedicated garbage collector watches for these signals; once every reader has signaled that it is no longer using the RCU-protected data structure, the garbage collector invokes the callback that completes the final release or modification.
Like rwlock, RCU allows multiple readers to access the protected data at the same time; unlike rwlock, it also allows readers and writers to access it at the same time. (Whether multiple writers can access it concurrently depends on the synchronization mechanism used among writers.) Readers incur no synchronization overhead, while the writer's overhead depends on that inter-writer mechanism. RCU cannot replace rwlock in general, however: when writes are frequent, the performance gained by readers cannot compensate for the cost imposed on writers.
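The division of labor described above (lock-free readers, writers that lock only against each other, and readers signaling a garbage collector that runs deferred callbacks) can be modeled as follows. This is a toy sketch under stated assumptions, not the kernel implementation: `ToyRcu`, `reader_signal` and the fixed set of reader ids are hypothetical, the protected data is a plain `dict`, and "releasing" an old version is modeled by appending it to a list.

```python
import threading


class ToyRcu:
    """Hypothetical toy model of RCU readers, writers and the 'garbage collector'.

    `reader_ids` is an assumed fixed set of readers that must each signal
    before an old version may be reclaimed.
    """

    def __init__(self, data: dict, reader_ids):
        self._data = data
        self._writer_lock = threading.Lock()  # writers synchronize only with each other
        self._gc_lock = threading.Lock()      # protects the pending-callback list
        self._readers = frozenset(reader_ids)
        self._pending = []                    # [(callback, readers that have not signaled)]
        self.reclaimed = []                   # old versions whose callbacks have run

    def read(self) -> dict:
        # Reader side: no lock, no atomic read-modify-write.
        return self._data

    def update(self, key, value) -> None:
        with self._writer_lock:               # writers exclude each other, never readers
            old = self._data
            new = dict(old)                   # copy
            new[key] = value                  # modify the copy
            self._data = new                  # re-point the shared reference
            self._defer(lambda: self.reclaimed.append(old))

    def _defer(self, callback) -> None:
        # Run `callback` once every reader has signaled after this point.
        with self._gc_lock:
            if self._readers:
                self._pending.append((callback, set(self._readers)))
                return
        callback()                            # no readers to wait for

    def reader_signal(self, reader_id) -> None:
        # A reader reports that it no longer holds references to RCU-protected data.
        ready = []
        with self._gc_lock:
            still_waiting = []
            for callback, waiting in self._pending:
                waiting.discard(reader_id)
                if waiting:
                    still_waiting.append((callback, waiting))
                else:
                    ready.append(callback)
            self._pending = still_waiting
        for callback in ready:
            callback()                        # every reader has signaled: safe to reclaim


rcu = ToyRcu({"mode": "a"}, reader_ids=[0, 1])
view = rcu.read()                             # a reader still using the old version
rcu.update("mode", "b")
rcu.reader_signal(0)
rcu.reader_signal(1)                          # last signal: the old version is reclaimed
print(view, rcu.read(), rcu.reclaimed)
```

Only signals sent after a callback is registered count toward it, which is why each callback gets its own fresh set of readers to wait for.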
In classic (non-preemptible) RCU, a reader must not block while accessing shared data protected by RCU; this is a basic prerequisite of the mechanism. In other words, while a reader holds a reference to RCU-protected data, its CPU must not perform a context switch. spinlock and rwlock impose the same requirement. A writer accessing RCU-protected data does not compete with readers for any lock; only when there is more than one writer does it need to take a lock to synchronize with the other writers. Before modifying an element, the writer copies it, modifies the copy, and re-points the pointer to the copy; it then registers a callback with the garbage collector to release the old element at the right time. The period spent waiting for that right time is called the grace period. A CPU that performs a context switch is said to have passed through a quiescent state, and the grace period is the time needed for every CPU to pass through at least one quiescent state. After the grace period, the garbage collector invokes the callback registered by the writer to complete the real data modification or release.
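Grace-period detection based on quiescent states can be sketched as follows. It is a hypothetical, single-threaded model: `GracePeriodTracker`, `defer` and the per-CPU counters are illustrative names, and only context switches are counted as quiescent states. In the Linux kernel the role of `defer` is played by `call_rcu()`, whose real implementation is far more involved. The model also enforces the rule that a CPU may not context-switch inside a read-side critical section.

```python
class GracePeriodTracker:
    """Hypothetical single-threaded model of classic RCU grace-period detection.

    Each CPU counts its context switches (quiescent states). A callback deferred
    at time t runs once every CPU has passed through a quiescent state after t.
    """

    def __init__(self, ncpus: int):
        self.qs_count = [0] * ncpus            # quiescent states seen per CPU
        self.in_read_side = [False] * ncpus    # CPU inside a read-side critical section?
        self._callbacks = []                   # [(qs_count snapshot, callback)]

    def read_lock(self, cpu: int) -> None:
        self.in_read_side[cpu] = True

    def read_unlock(self, cpu: int) -> None:
        self.in_read_side[cpu] = False

    def context_switch(self, cpu: int) -> None:
        # Readers must not block: switching away inside a read-side critical
        # section would let a grace period end while the reader still holds data.
        if self.in_read_side[cpu]:
            raise RuntimeError(f"CPU {cpu} switched inside an RCU read-side critical section")
        self.qs_count[cpu] += 1                # this CPU passed a quiescent state
        self._run_expired()

    def defer(self, callback) -> None:
        # Register a callback; its grace period starts now.
        self._callbacks.append((list(self.qs_count), callback))

    def _grace_period_over(self, snapshot) -> bool:
        # Over when every CPU has advanced past its count at registration time.
        return all(now > then for now, then in zip(self.qs_count, snapshot))

    def _run_expired(self) -> None:
        ready = [cb for snap, cb in self._callbacks if self._grace_period_over(snap)]
        self._callbacks = [(snap, cb) for snap, cb in self._callbacks
                           if not self._grace_period_over(snap)]
        for cb in ready:
            cb()                               # grace period elapsed: reclaim old data


freed = []
gp = GracePeriodTracker(ncpus=2)
gp.read_lock(0)                                # CPU 0 is reading the old version
gp.defer(lambda: freed.append("old"))          # writer published a new version
gp.context_switch(1)                           # CPU 1 passes a quiescent state
gp.read_unlock(0)
gp.context_switch(0)                           # last CPU passes one: grace period over
print(freed)
```

Snapshotting the counters at registration time means a quiescent state that happened before `defer` does not shorten the grace period. A real implementation must also treat idle CPUs and user-mode execution as quiescent, which this sketch leaves out.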