The big kernel lock, a simple and unusual kernel locking mechanism, has always been a controversial topic among kernel developers. It was widely used in early Linux versions and has been gradually displaced by various spin locks since the 2.4 kernel, yet to this day it has never been completely abandoned. It was originally implemented with a spin lock; in version 2.6.11 it was changed to a semaphore; but in 2.6.26-rc2 it went back down the old road of the spin lock. It even sparked a dispute between Linux founder Linus Torvalds and Ingo Molnar, the well-known contributor of the Completely Fair Scheduler (CFS). What exactly is this lock?
1. One of a Kind
Anyone who has used kernel mutual-exclusion mechanisms such as spin locks or semaphores will find the big kernel lock hard to imagine. Like a spin lock or a semaphore, the big kernel lock protects resources inside a critical section and prevents processes on multiple processors from accessing the same region at the same time. What makes it unique is that, unlike spin locks or semaphores, it does not come in many instances or objects, each guarding one specific critical section. The entire kernel has exactly one such lock. Once a process acquires the big kernel lock and enters a protected critical section, not only is that critical section locked: every other critical section protected by the big kernel lock becomes inaccessible as well, until the process releases the lock. This seems incredible: how can a process operating on a global linked list on one processor prevent other processes from accessing an unrelated global array? Wouldn't two spin locks, one protecting the list and the other protecting the array, solve the problem? Yet with the big kernel lock, this is exactly what happens.
The big kernel lock exists for historical reasons. Early versions of Linux had very limited support for symmetric multiprocessing (SMP). To ensure correctness, mutual exclusion between processors took the approach of "better to kill three thousand by mistake than to let one slip through": install one huge lock at the entrance to the kernel. The moment a processor enters kernel mode it takes the lock, and any other processor that wants to enter kernel mode must wait at the door. This guarantees that only one process runs in kernel mode at a time; that lock is the big kernel lock. A system protected this way can certainly run safely on multiple processors: since only one processor executes kernel code at a time, kernel execution is essentially no different from the uniprocessor case, and user-mode processes running simultaneously on multiple processors are safe because each process has its own address space. But the drawback of such a crude lock is equally obvious: the performance gain from multiple processors shows up only in user-mode parallelism, while kernel mode executes single-file, so the power of the machine is never fully used. Kernel developers therefore began looking for ways to gradually narrow the scope of the lock's protection. In fact, most kernel code is already multiprocessor-safe; only a few global resources need mutual exclusion, so there is no need to limit how many processors run in kernel mode at once. Any number of processors may be in the kernel at any time; only the shared resources need to be picked out, with access to them restricted. In this way the big kernel lock shrank from protecting the whole of kernel mode to protecting a few key fragments of it. This was progress, but the step was not big enough.
As mentioned above, the problem of one lock locking both the bedroom and the kitchen remained unsolved. With the wide adoption of spin locks, no new kernel code uses the big kernel lock anymore.
2. Tasteless to Eat, Yet a Pity to Throw Away
Now that a substitute exists, the big kernel lock should be able to "retire with honor". In reality it is not that simple. If the big kernel lock were merely a spin lock with "only one instance", wise kernel developers would have replaced it long ago: create one spin lock for each kind of resource under the big lock's protection, and replace each lock/unlock of the big kernel lock with the lock/unlock of the corresponding spin lock. But today's big kernel lock is like a spoiled child: the kernel gives it extra care at several key points, which makes replacing it rather troublesome. Here is the complaint from Ingo Molnar, in an email titled "kill the Big Kernel Lock (BKL)":
The biggest technical complication is that the BKL is unlike any other lock: it "self-releases" when schedule() is called. This makes the BKL spinlock very "sticky", "invisible" and viral: it's very easy to add it to a piece of code (even unknowingly) and you never really know whether it's held or not. preempt_bkl made it even more invisible, because it made its effects even less visible to ordinary users.
In translation: the biggest technical difficulty is that the big kernel lock is unlike any other lock: it is "automatically released" when schedule() is called. This makes it very sticky and very hidden: it is easy to add to a piece of code, even unknowingly, and you almost never know whether it is currently held. The preempt_bkl option makes it even more hidden, because it makes its effects even less visible to ordinary users.
Translating what Linux developers write is harder than understanding the code they write, but one thing is clear: the automatic release of the big kernel lock inside the schedule() function complicates matters. Let us look at how schedule() handles the big kernel lock:
linux-2.6.34/kernel/sched.c

  1  /*
  2   * schedule() is the main scheduler function.
  3   */
  4  asmlinkage void __sched schedule(void)
  5  {
     ...
 19      release_kernel_lock(prev);
     ...
 55      context_switch(rq, prev, next); /* unlocks the rq */
     ...
 67      if (unlikely(reacquire_kernel_lock(current) < 0)) {
 68          prev = rq->curr;
 69          switch_count = &prev->nivcsw;
 70          goto need_resched_nonpreemptible;
 71      }
     ...
At line 19, release_kernel_lock(prev) releases the big kernel lock held by the current process prev. The kernel then switches from the current process prev to the next process next. context_switch() can be thought of as a super-function: calling it does not so much run a piece of code as run another process. The system's multitasking relies on this super-function to switch from one process to another, and then from that process to the next, round after round. As long as a switched-out process remains runnable, it will sooner or later be scheduled back in and continue running, so the effect looks as if context_switch() returned and schedule() resumed. Execution then reaches line 67 and calls reacquire_kernel_lock(). This is the counterpart of release_kernel_lock(): it re-takes the big kernel lock released earlier. If the test in the if statement is true, the attempt to take the lock failed, and an optimization kicks in. A normal lock acquisition would "stay put", querying the lock's state over and over in place until another process released it. But that wastes precious processor time, especially when other processes in the run queue are waiting to run. So reacquire_kernel_lock() only performs a "try lock": if nobody holds the big kernel lock, it takes the lock and returns 0 for success; if the lock is held, it returns -1 immediately, indicating failure. On failure, the main body of schedule() is executed again: the run queue is re-examined and a suitable process is chosen to run; perhaps by the time this process is next scheduled, the lock will be free. In this way another process (if one is waiting in the queue) replaces the original one, and processor utilization improves.
Besides the special care in schedule(), the big kernel lock has another privilege: the same process may lock it repeatedly, and as long as the numbers of lock and unlock operations match, nothing goes wrong. This is beyond what a spin lock can do: nested locking of a spin lock within one process deadlocks immediately. To support this, the process control block (task_struct) carries a lock counter for the big kernel lock: the lock_depth field. Its initial value is -1, meaning the process holds no big kernel lock. Each lock operation first increments lock_depth, and only if lock_depth is then 0 is the real locking operation performed. This ensures that all nested lock operations after the first are ignored, avoiding deadlock. Unlocking is the mirror image: each unlock decrements lock_depth, and only when the value returns to -1 is the real unlock performed.
Precisely because the kernel indulges the big kernel lock in these ways, a developer can take the lock, enter a critical section it protects, and do things there that should never be done, without ever noticing:
First, a process should leave a critical section as quickly as possible after locking it; otherwise other processes wanting to enter will be blocked. The schedule() function therefore must not be called inside a critical section: once the process is switched out, the moment of unlocking recedes into the distance. Worse, switching processes inside a spin-lock-protected critical section easily leads to deadlock. For example, if a process takes a spin lock and then calls schedule() to switch to another process that must acquire the same lock, the system dies in the spin: the second process spins waiting for a lock that only the switched-out process can release. Critical sections protected by the big kernel lock do not have this problem, because schedule() automatically releases any big kernel lock held before switching to the new process and automatically re-takes it when the process is switched back in. After taking the big kernel lock, you can hardly tell whether schedule() has run in the meantime. This is the "technical complication" Ingo Molnar mentions above: if the big kernel lock is replaced with a spin lock and schedule() is called while it is held, the consequences are unpredictable and catastrophic. Of course, a well-trained programmer will not knowingly call schedule() inside a critical section, even though the big kernel lock relaxes the constraint. But once the code calls into an unfamiliar module, even a superb programmer cannot guarantee that the function is never called.
Second, as mentioned above, a lock that is already held must not be taken again to protect a nested critical section; otherwise it deadlocks. But thanks to its lock counter, the big kernel lock can be nested however you like. This too is a "technical complication": after replacing the big kernel lock with a spin lock, the same nesting would be disastrous. As with schedule(), a well-trained programmer will not knowingly lock the big kernel lock twice; but after taking the lock, code that calls into unfamiliar modules cannot guarantee that those modules do not take the big kernel lock again. This situation is very common in large-system development: everyone carefully avoids deadlocks within their own module, but nobody can rule out the deadlocks introduced when calling other people's modules.
Ingo Molnar also mentions another drawback of the big kernel lock: it is not covered by lockdep. Lockdep is a debugging module of the Linux kernel that checks the kernel's mutual-exclusion mechanisms, particularly for potential deadlocks involving spin locks. Because a spin lock waits by polling and never yields the processor, it deadlocks more easily than other mutual-exclusion mechanisms. Lockdep was introduced to check for possible deadlocks in situations such as the following (lockdep deserves a detailed article of its own; here is just a brief list):
- The same process recursively takes the same lock;
- A lock that is sometimes taken with interrupts (or bottom halves) enabled is also taken in interrupt (or bottom-half) context, so an interrupt arriving on the same processor may try to take a lock that processor already holds;
- The lock-acquisition dependency graph contains a cycle, the classic deadlock.
Because the big kernel lock lives outside lockdep, the dependencies between it and the other mutual-exclusion mechanisms go unmonitored, potential deadlocks go unrecorded, and the situation grows ever messier and more out of control.
At this point the big kernel lock has become a chicken rib: tasteless to eat, yet a pity to throw away. It cannot keep pace with the times, but neither has it reached the point of being beyond rescue. Removing it from the kernel outright faces many challenges: nobody dares to touch the big kernel locks scattered through old code that has gone unmaintained for years. Since there is little hope of removing it completely, trying to optimize it is a reasonable second best.
3. Changed Again and Again: A Helpless Choice
In early versions the big kernel lock was implemented with a spin lock. Spin locks are the usual tool for mutual exclusion between processors, and when the critical section is very short, say changing the values of a few variables, spinning is a simple and efficient way to wait. But spinning has a cost: the waiting process keeps occupying the processor, and all of that waiting time is wasted. This is especially true for the big kernel lock, where every critical section it guards collides on the one lock. Moreover, so that the holder can dash through the critical section at full speed, the spin lock disables kernel preemption while held. Taking a spin lock thus creates a scheduling "forbidden zone" on that processor: during it, the running process can neither be preempted by other processes nor voluntarily switch away by calling schedule(). In other words, once a process on a processor takes a spin lock, the processor can run nothing but that process; even a high-priority real-time process that becomes ready can only queue up and wait. These scheduling forbidden zones increase scheduling latency and reduce the system's real-time responsiveness, which runs against the real-time improvements the kernel community had been striving for. So in Linux 2.6.11 the big kernel lock was thoroughly reworked: the spin lock was dropped in favor of a semaphore. Semaphores have neither of the problems above: while waiting for a semaphore the process blocks and does not occupy the processor, and after acquiring it kernel preemption remains enabled, so no scheduling blind spots appear. Such a change ought to have been uncontroversial. But everything has its pros and cons.
The biggest defect of the semaphore is that it is too heavyweight. Every time a process blocks, a time-consuming context switch is needed; when the semaphore becomes free and wakes the waiting process, there is another context switch. Beyond the switches themselves, the TLB flushes and cache cool-down that come with process switching are all costly. If the blocking time is long, on the millisecond scale, the switch is worthwhile. But in most cases the process would only have had to wait at the entrance of the critical section for a few dozen to a few hundred instruction cycles before the holder handed the critical section over; for that, a full switch is overkill. It is like visiting a hospital for a routine outpatient appointment: while the doctor is seeing another patient, the others wait a moment at the door. Nobody leaves a phone number, goes home to sleep, and rushes back to the hospital when the doctor calls.
Because the semaphore implementation caused frequent process switching, the big kernel lock developed severe performance problems in some cases, and Linus Torvalds decided to change the implementation back to a spin lock, whereupon the scheduling-latency problem naturally returned as well. That displeased Ingo Molnar, a self-described "latency junkie". But Linux belongs to Linus Torvalds, and so from 2.6.26-rc2 to this day the big kernel lock has been a spin lock again. On the whole, the change was justified: using a cumbersome, heavyweight semaphore to protect short-lived critical sections is not worth the cost. Moreover, Linux is not a hard real-time operating system, and there is no need to sacrifice overall performance in the pursuit of real-time response.
4. The Sun Sets Over the Western Hills: The Curtain Call Draws Near
Switching back to the spin lock does not mean Linus Torvalds is indifferent to scheduling latency. On the contrary, his real wish is to remove the big kernel lock permanently one day, which is exactly the same as Ingo Molnar's. But because removing the big kernel lock is difficult and risky, Ingo Molnar believes "new game rules" are needed to solve it once and for all. He created a version branch named kill-the-BKL, on which he replaces the big kernel lock with a new mutual-exclusion mechanism and dismantles the problem step by step:
- Fix all known critical sections that take advantage of the big kernel lock's automatic-release mechanism; that is, eliminate the dependency of code using the big kernel lock on automatic release, bringing it closer to an ordinary mutual-exclusion mechanism;
- Add extensive debugging facilities that warn about assumptions no longer valid under the new mechanism;
- Convert the big kernel lock into an ordinary mutex, and delete the automatic-release code left behind in the scheduler;
- Put it under lockdep's monitoring;
- Greatly simplify the big kernel lock code, and finally delete it from the kernel.
That was two years ago. The work is not finished yet, but it is still moving forward steadily. We look forward to the day, in the near future, when the big kernel lock and all the discord around it fade out of the Linux kernel.