Linux Kernel RCU (Read-Copy-Update) Lock Analysis: a Forwarding-Table Example
If you profile hot spots with the Linux perf tool's top command, you will often find that several of the top ten suspects are locks!
When running in parallel on multiple processors, locks are hard to avoid: often they are the only way to protect shared data, and the protected region is the critical section. As we know, lock overhead is large, because a lock inevitably makes someone wait or makes someone else wait. But waiting is not the essence of the overhead. The essence is that most locks are built on atomic operations, and an atomic operation hits the bus or the cache-coherence machinery hard.

Adding 1 to a variable looks trivial, but a lot happens behind it. On processors of some architectures you must first LOCK the bus, which means that while the LOCK is held, other processors cannot access memory (or at least some region of it). It may also involve flushing caches or triggering cache-coherence traffic. And that is still not the heaviest blow: some architectures require a memory fence, which flushes the CPU pipeline and defeats nearly every out-of-order optimization. That is the price. The benefit is that the critical section is protected.
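To make the point concrete, here is a minimal sketch (my own names, not from the article) contrasting the atomic increment the text describes, which on x86 compiles down to a LOCK-prefixed instruction, with a plain increment that looks identical in source code:

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int shared_counter = 0;
static int plain_counter = 0;

/* Safe under concurrency: the atomic read-modify-write serializes
 * access to the cache line. This serialization is exactly the
 * bus/coherence overhead the text mentions. */
void safe_increment(void)
{
    atomic_fetch_add_explicit(&shared_counter, 1, memory_order_seq_cst);
}

/* NOT safe under concurrency: load, add, and store are separate
 * steps and can interleave between threads, losing updates. */
void unsafe_increment(void)
{
    plain_counter++;
}

int read_shared(void)
{
    return atomic_load(&shared_counter);
}
```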
The critical section must be protected, and a price must be paid, but paying with a heavyweight lock is paying too much. Is that really necessary? Perhaps the data structure is poorly designed, or the code flow is. For example, when two threads share data with one reading and one writing, a circular buffer can reduce the contention. In fact, many drivers for shared peripherals, such as NICs and hard disks, work exactly this way: the code only has to guarantee that the read pointer and the write pointer never overtake each other, so locking can be minimized. This is, of course, only a very simple example.
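The single-producer/single-consumer ring the text alludes to can be sketched as follows; this is my own illustrative version, not the driver code the author refers to. The only invariant is the one stated above: the indices never pass each other.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 8  /* must be a power of two */

struct ring {
    int buf[RING_SIZE];
    atomic_uint head;  /* next slot the producer writes */
    atomic_uint tail;  /* next slot the consumer reads  */
};

/* Producer side: returns false when the ring is full. */
bool ring_push(struct ring *r, int v)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return false;                      /* would overtake the reader */
    r->buf[head & (RING_SIZE - 1)] = v;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false when the ring is empty. */
bool ring_pop(struct ring *r, int *out)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;                      /* nothing to read */
    *out = r->buf[tail & (RING_SIZE - 1)];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

With exactly one producer thread and one consumer thread, neither side ever spins or blocks on the other; each only publishes its own index.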
Well-designed data structures and code flow are one side of it, but that layer is not abstract enough. A better way is to design a more optimized lock. Asymmetric locks such as read/write locks are an optimization for scenarios with many readers and few writers. Readers get preferential treatment: as long as no writer holds the lock, a reader proceeds immediately without waiting; otherwise it waits. A writer, however, must wait for all readers to finish. Such a read/write lock can be built on top of another primitive, the spinlock. One implementation of mine is as follows:
typedef struct {
    spinlock_t *spinlock;   /* held by the writer, or by the readers as a group */
    atomic_t    readers;    /* number of active readers */
} rwlock_t;

static inline void rdlock(rwlock_t *lock)
{
    spinlock_t *lck = lock->spinlock;
    /* The first reader takes the spinlock on behalf of all readers. */
    if (likely(atomic_inc_return(&lock->readers) == 1))
        spin_lock(lck);
}

static inline void rdunlock(rwlock_t *lock)
{
    spinlock_t *lck = lock->spinlock;
    /* The last reader out releases it. */
    if (likely(atomic_dec_return(&lock->readers) == 0))
        spin_unlock(lck);
}

static inline void wrlock(rwlock_t *lock)
{
    spin_lock(lock->spinlock);
}

static inline void wrunlock(rwlock_t *lock)
{
    spin_unlock(lock->spinlock);
}
Not bad, is it? But the best solution is to throw the lock away and never use one at all.
When I designed my forwarding table, I gave each CPU its own local copy of the table to reduce lock overhead. The copies are identical and are generated from the routing table, so I thought this would eliminate competition. But forwarding tables always face updates; how do we update them? My first approach used an IPI (inter-processor interrupt): in the IPI handler, the processing thread is stopped, the data is updated, and the thread is resumed, so no lock is needed on the processing path. Perfectly reasonable, isn't it? But it felt too complicated.
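The shape of that design can be sketched in a userspace analogue (all names here are illustrative, not the author's kernel code): one replica of the table per CPU so lookups never contend, and an updater that rewrites every replica. The IPI quiescing step from the article is only noted as a comment, since it has no userspace equivalent here.

```c
#include <assert.h>
#include <string.h>

#define NCPU     4
#define TABLE_SZ 16

/* fwd_table[cpu][dst] = next hop for destination `dst` on that CPU. */
static int fwd_table[NCPU][TABLE_SZ];

/* Lookup touches only the current CPU's private copy: no lock needed. */
int fwd_lookup(int cpu, int dst)
{
    return fwd_table[cpu][dst % TABLE_SZ];
}

/* Route change: regenerate a master table, then copy it into every
 * per-CPU replica. (In the kernel version, each CPU's processing
 * thread is stopped via IPI around this copy, so readers never see
 * a half-updated table.) */
void fwd_update_all(const int master[TABLE_SZ])
{
    for (int cpu = 0; cpu < NCPU; cpu++)
        memcpy(fwd_table[cpu], master, sizeof(fwd_table[cpu]));
}
```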
Look closely at the write side of the read/write lock: it performs an unconditional, standard lock operation, and the first reader in performs one as well. Can the waiting caused by these lock operations be avoided? Compare with my original IPI scheme: there, the processing threads are stopped so that readers cannot read inconsistent data; in effect, the readers actively yield the execution flow to the writer. The writer in a read/write lock, by contrast, never preempts while readers exist; it just waits passively. And that kind of waiting is wasted time!
Can my IPI approach be combined with the read/write lock?
How? Following the line of thought so far, the only choice seems to be whether the writer is passive or preemptive. But there is a third option: let the writer proceed at its own pace, writing not to the original data but to a copy of it (the great copy-on-write). The pending update is then linked onto a list of unfinished transactions, and once the system determines that all readers are done, the updates on that list are applied to the original data one by one. This combination is the great RCU lock. The reader's only cost is marking that someone is reading; the writer never waits for a lock. It writes the copy directly, finishes, and walks away, handing the rest over to the system.
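The copy-update-publish half of that scheme can be sketched in userspace (all names are mine): the writer builds a new copy and publishes it with a single atomic pointer swap, so readers are never blocked. Real kernel RCU (rcu_read_lock()/rcu_dereference()/rcu_assign_pointer()/synchronize_rcu()) also tracks the grace period so the old copy is freed only after all pre-existing readers exit; that "wait for readers" step is elided here and only marked in a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

struct config {
    int ttl;
    int mtu;
};

/* The pointer readers follow; always points at a complete version. */
static _Atomic(struct config *) live_config;

void config_init(int ttl, int mtu)
{
    struct config *c = malloc(sizeof(*c));
    c->ttl = ttl;
    c->mtu = mtu;
    atomic_store(&live_config, c);
}

/* Reader: a single atomic load. No lock, no wait. */
struct config *config_read(void)
{
    return atomic_load_explicit(&live_config, memory_order_acquire);
}

/* Writer: copy the current version, modify the copy, publish it
 * atomically. Returns the old version; in real RCU it would be freed
 * only after a grace period (synchronize_rcu()), once every reader
 * that might still hold it has finished. */
struct config *config_update_mtu(int new_mtu)
{
    struct config *old = atomic_load(&live_config);
    struct config *fresh = malloc(sizeof(*fresh));
    memcpy(fresh, old, sizeof(*fresh));  /* copy ...           */
    fresh->mtu = new_mtu;                /* ... update ...      */
    atomic_store_explicit(&live_config, fresh, memory_order_release);
    return old;                          /* ... defer the free. */
}
```

A reader that loaded the old pointer keeps seeing a consistent old version; a reader arriving after the store sees the new one. Neither ever waits on the writer.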