Linux process scheduling

Original address: http://cchxm1978.blog.163.com/blog/static/35428253201092910491682/

To support multiple processes, process scheduling is essential.
Some people say that process scheduling is one of the most important parts of an operating system. I think that claim is a bit too absolute; like the often-heard "function A is xx times more efficient than function B", such conclusions are rather one-sided once divorced from their actual environment.

So how important is process scheduling, really? First, be clear that process scheduling is about scheduling processes in the TASK_RUNNING (runnable) state (see "Analysis of Linux process states"). A process that is not runnable (sleeping or otherwise) has little to do with process scheduling.
So, if your system's load is very low, and a runnable process appears only once in a blue moon, then process scheduling matters little: whichever process is runnable simply runs, and there is nothing more to consider.
Conversely, if the system load is very high, there are always many processes in the runnable state waiting to be scheduled. The scheduler must then work hard to coordinate their execution; if it coordinates poorly, system performance suffers greatly. In that case, process scheduling matters a lot.
Although many of the machines we normally deal with (desktop systems, small Web servers, and so on) carry low loads, Linux, as a general-purpose operating system, cannot assume low load; it must be carefully designed to handle process scheduling under high load.
Of course, these designs see little use in low-load environments without real-time requirements. In the extreme case where the CPU load is always 0 or 1 (at most one process ever needs the CPU), they are largely futile.

Priority level
The most basic way to coordinate multiple "simultaneously" running processes is to assign each process a priority. Once priorities are defined, if several processes are runnable at the same time, the one with the highest priority runs; there is nothing to agonize over.
So how is a process's priority determined? In two ways: specified by the user program, or dynamically adjusted by the kernel scheduler (both discussed below).
The Linux kernel divides processes into two classes: normal processes and real-time processes. Real-time processes have higher priority than normal processes, and the two classes have different scheduling strategies.
Scheduling of real-time processes
"Real time" originally means "a given operation must be completed within a deterministic amount of time". The point is not how fast the operation is handled, but that the time is bounded and controllable (even in the worst case it must not exceed the given limit).
Such "real time" is called "hard real time" and is used in very demanding systems (rockets, missiles, and the like). Hard real-time systems tend to be rather specialized.
A general-purpose operating system like Linux clearly cannot meet such requirements: mechanisms such as interrupt handling and virtual memory introduce considerable uncertainty into processing times, as do hardware caches, disk seeks, and bus contention.
Consider, for example, a single line of C code such as "i++;". Most of the time it executes very quickly, but in extreme cases the following can happen:
1. The memory page holding i has not been allocated, so the CPU raises a page fault. In the page-fault handling code, Linux tries to allocate memory; the allocation may fail because the system is short of memory, putting the process to sleep;
2. A hardware interrupt arrives while the code is executing; Linux enters the interrupt handler and sets the current process aside. New hardware interrupts may arrive while the handler runs, and interrupts can nest without end...;
And so on...
A general-purpose operating system like Linux that calls itself "real-time" actually implements only "soft real time": it satisfies processes' real-time needs as far as it can.
If a process has real-time requirements (it is a real-time process), then as long as it is runnable, the kernel keeps it running, satisfying its need for the CPU as far as possible, until it finishes what it has to do and then sleeps or exits (becomes non-runnable).
If more than one real-time process is runnable, the kernel first satisfies the CPU needs of the highest-priority real-time process, until it becomes non-runnable.
Thus, as long as a higher-priority real-time process is runnable, lower-priority real-time processes never get the CPU; and as long as any real-time process is runnable, normal processes never get the CPU.
So what if multiple real-time processes of the same priority are runnable? There are two scheduling policies to choose from:
1. SCHED_FIFO: first in, first out. Later processes are scheduled only after the earlier process becomes non-runnable. Under this policy, the earlier process can call the sched_yield system call to give up the CPU voluntarily in favor of later processes;
2. SCHED_RR: round robin. The kernel allocates a time slice to each real-time process; when a process's slice is exhausted, the next process gets the CPU;
To be clear, these two policies only matter when multiple real-time processes of the same priority are runnable at the same time.
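The static priority range of these two policies can be queried from user space without any special privileges. The following sketch uses Python's os module wrappers around the corresponding Linux system calls (assuming a Linux system; on Linux both policies typically span priorities 1..99):

```python
import os

# Query the valid static priority range for each real-time policy.
for name, policy in [("SCHED_FIFO", os.SCHED_FIFO), ("SCHED_RR", os.SCHED_RR)]:
    lo = os.sched_get_priority_min(policy)
    hi = os.sched_get_priority_max(policy)
    print(f"{name}: priorities {lo}..{hi}")

# For SCHED_RR, the kernel can also report the round-robin time slice
# it would grant a given process (0 = the calling process).
interval = os.sched_rr_get_interval(0)
print(f"SCHED_RR time slice: {interval} seconds")
```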
Under Linux, a user program can set a process's scheduling policy and related scheduling parameters through the sched_setscheduler system call; the sched_setparam system call sets only the scheduling parameters. Both calls require the capability to set process priorities (CAP_SYS_NICE, which generally means root privileges) (see articles on capabilities).
By setting the policy to SCHED_FIFO or SCHED_RR, a process becomes a real-time process. Its priority is specified through the scheduling parameters of the two system calls above.
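A minimal sketch of how a program would switch itself to a real-time policy, again via Python's os wrappers over these system calls. The priority value 50 is an arbitrary choice for illustration; without CAP_SYS_NICE the kernel refuses with EPERM, which the sketch tolerates:

```python
import os

# A normal process starts out under the default policy, SCHED_OTHER.
policy = os.sched_getscheduler(0)          # 0 = the calling process
print("currently SCHED_OTHER:", policy == os.SCHED_OTHER)

# Attempt to become a real-time SCHED_FIFO process with priority 50.
try:
    os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(50))
    print("now SCHED_FIFO, priority", os.sched_getparam(0).sched_priority)
except PermissionError:
    # No CAP_SYS_NICE: the kernel will not let us become real-time.
    print("no CAP_SYS_NICE: cannot become a real-time process")
```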
For real-time processes, the kernel makes no attempt to adjust their priority. Whether a process is real-time, and how real-time it needs to be, depends on the application scenario of the user program; only the user can answer that, and the kernel cannot presume to.
In short, the scheduling of real-time processes is very simple: the priority and scheduling policy are decided by the user, and the kernel only ever needs to select the highest-priority runnable real-time process to run. The only slightly troublesome part is the two policies to consider when choosing among real-time processes of the same priority.
Scheduling of normal processes
The central idea of real-time scheduling is to let the highest-priority runnable real-time process occupy the CPU as much as possible, because it has real-time requirements. Normal processes are considered to have no real-time requirements, so the scheduler instead tries to let all runnable normal processes share the CPU peacefully, giving users the impression that these processes run simultaneously.
Scheduling normal processes is much more complex than scheduling real-time processes. The kernel has to handle two troublesome matters:
First, dynamically adjusting process priority
By its behavior, a process can be classified as an "interactive process" or a "batch process":
The main job of an interactive process (a desktop program, a server, and so on) is interacting with the outside world. Such processes should have high priority. They usually sleep waiting for external input; when input arrives, the kernel wakes them, and they should be scheduled quickly to respond. For instance, if a desktop program still has not reacted half a second after a mouse click, the user will feel the system is "laggy";
The main job of a batch process (a compiler, for example) is continuous computation, so it stays runnable. Such processes generally do not need high priority; if a compilation takes a few seconds longer, the user will probably not mind much;
If the user knows exactly what priority a process deserves, the priority can be set through the nice and setpriority system calls. (Raising a process's priority requires the user process to have the CAP_SYS_NICE capability.)
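For normal processes these calls are available unprivileged as long as the priority is only lowered (the nice value raised). A small sketch using Python's os wrappers:

```python
import os

# Read the current nice value of this process (default 0).
before = os.getpriority(os.PRIO_PROCESS, 0)   # 0 = the calling process

# Lowering our own priority (raising the nice value) needs no privileges.
os.nice(1)
after = os.getpriority(os.PRIO_PROCESS, 0)
print(f"nice value: {before} -> {after}")

# Raising priority back (lowering nice) would require CAP_SYS_NICE.
try:
    os.setpriority(os.PRIO_PROCESS, 0, before)
except PermissionError:
    print("no CAP_SYS_NICE: cannot raise priority back")
```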
However, applications are not always as clear-cut as a desktop program or a compiler. A program's behavior can vary: it may act like an interactive process for a while and like a batch process later, making it hard for the user to set one proper priority.
Furthermore, even when the user knows for sure whether a process is interactive or batch, permission limits or plain laziness usually keep them from setting its priority. (Have you ever set a priority for one of your programs?)
As a result, the task of distinguishing interactive from batch processes falls to the kernel scheduler.
The scheduler watches a process's behavior over a recent period (mainly its sleep time and run time), applies some empirical formulas to judge whether it is currently interactive or batch, and to what degree, and then decides how to adjust its priority.
With dynamic adjustment in place, a process has two priorities:
1. The priority set by the user program (or a default value if none was set), called the static priority. This is the baseline of the process's priority and usually does not change while the process runs;
2. The effective priority after dynamic adjustment. This value may change at any time;
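As a concrete illustration of the static baseline: inside the Linux kernel, the 40 nice levels (-20..19) of a normal process map onto internal static priorities 100..139, sitting just below the 0..99 real-time range (a well-known kernel convention; the arithmetic below is a sketch of that mapping, not code from this article):

```python
# Internal kernel priority scale (smaller number = higher priority):
#   0..99    real-time priorities
#   100..139 normal processes, derived from the nice value
MAX_RT_PRIO = 100

def static_prio(nice: int) -> int:
    """Map a nice value (-20..19) to the kernel's internal static priority."""
    assert -20 <= nice <= 19
    return MAX_RT_PRIO + 20 + nice

print(static_prio(-20), static_prio(0), static_prio(19))  # 100 120 139
```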
Second, scheduling fairness
In a system that supports multiple processes, each process should, ideally, get a share of the CPU commensurate with its priority. There should be no uncontrolled situation in which "whoever is lucky grabs the most".
Linux implements fair scheduling basically along two lines of thought:
1. Allocate time slices to runnable processes (in proportion to priority); a process that uses up its slice is moved to an expired queue. Once every runnable process has expired, time slices are redistributed;
2. Dynamically adjust process priority: as a process runs on the CPU, its priority is continually lowered, so that other, lower-priority processes get a chance to run;
The latter approach has finer scheduling granularity and folds "fairness" and "dynamic priority adjustment" into one mechanism, greatly simplifying the scheduler code. It has therefore become the kernel scheduler's favorite.
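The first idea can be sketched as a toy model: an "active" queue of processes holding fresh time slices and an "expired" queue of processes that have used theirs up, swapped when the active queue drains (names and tick counts here are invented for illustration):

```python
from collections import deque

def run(processes, slice_ticks=2):
    """processes: dict name -> remaining work (ticks). Returns run order."""
    active = deque(processes)            # names holding a fresh time slice
    expired = deque()
    work = dict(processes)
    order = []
    while active or expired:
        if not active:                   # everyone expired: swap queues,
            active, expired = expired, active  # i.e. refill all slices
        name = active.popleft()
        ran = min(slice_ticks, work[name])
        work[name] -= ran
        order.extend([name] * ran)
        if work[name] > 0:
            expired.append(name)         # slice used up, wait for refill
    return order

print(run({"A": 3, "B": 3}))  # → ['A', 'A', 'B', 'B', 'A', 'B']
```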
Note that both points apply only to normal processes. For real-time processes, the kernel neither adjusts priority dynamically in this way nor offers any notion of fairness.
The concrete scheduling algorithm for normal processes is very complex and keeps changing (not just by small adjustments) as the Linux kernel evolves, so this article goes no further. Interested readers can consult the following links:
"Linux Scheduler Development Brief"
"Mouse See Linux Scheduler"
"Mouse eye again see Linux Scheduler [1]"
"Mouse eye again see Linux Scheduler [2]"
Efficiency of the scheduler
"Priority" settles which process should run, but the scheduler must also care about efficiency. Like many kernel paths, the scheduler runs frequently; if it is inefficient, it wastes a great deal of CPU time and drags down system performance.
In Linux 2.4, runnable processes hung on a single linked list. At each schedule, the scheduler scanned the entire list to find the best process to run: complexity O(n);
In early Linux 2.6, runnable processes hung on N (N=140) linked lists, each representing one priority; the system supported as many priority levels as there were lists. At each schedule, the scheduler only had to take the process at the head of the first non-empty list. This greatly improved efficiency: complexity O(1);
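The O(1) trick can be sketched like this: keep one list per priority plus a bitmap marking which lists are non-empty, so "first non-empty list" becomes a constant-time bit scan instead of a walk over all 140 lists (a toy model, not kernel code):

```python
from collections import deque

NUM_PRIO = 140  # 0 = highest priority, as in the early-2.6 scheduler

class O1RunQueue:
    def __init__(self):
        self.queues = [deque() for _ in range(NUM_PRIO)]
        self.bitmap = 0              # bit i set <=> queues[i] is non-empty

    def enqueue(self, prio, task):
        self.queues[prio].append(task)
        self.bitmap |= 1 << prio

    def pick_next(self):
        if not self.bitmap:
            return None              # no runnable process
        # Isolate the lowest set bit: the first non-empty priority list.
        prio = (self.bitmap & -self.bitmap).bit_length() - 1
        task = self.queues[prio].popleft()
        if not self.queues[prio]:
            self.bitmap &= ~(1 << prio)
        return task

rq = O1RunQueue()
rq.enqueue(120, "batch job")
rq.enqueue(100, "interactive app")
print(rq.pick_next())  # → interactive app (lower number = higher priority)
```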
In recent Linux 2.6 versions, runnable processes are kept, ordered by priority, in a red-black tree (which you can picture as a balanced binary tree). At each schedule, the scheduler finds the highest-priority process in the tree: complexity O(log n).
Why, then, did the complexity of process selection grow from early Linux 2.6 to the recent 2.6 releases?
Because, at the same time, the scheduler's approach to fairness moved from the first idea above to the second (dynamic priority adjustment). The O(1) algorithm rests on a small, fixed set of linked lists; as I understand it, that keeps the range of priority values very small, too coarse to meet the needs of fairness. A red-black tree places no limit on the priority value (32 bits, 64 bits, or more can represent it), and O(log n) is still quite efficient.
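The tree-based idea can be sketched with any ordered structure offering O(log n) insertion and minimum-extraction. Here a binary heap stands in for the kernel's red-black tree, and a wide integer key stands in for the fine-grained dynamic priority value, with the smallest key picked first (a toy model with invented process names, not the kernel's actual scheduler):

```python
import heapq

# Each runnable process carries a wide integer key; the scheduler always
# picks the process with the smallest key.
runqueue = []  # heap of (key, name) tuples

def enqueue(key, name):
    heapq.heappush(runqueue, (key, name))   # O(log n)

def pick_next():
    key, name = heapq.heappop(runqueue)     # O(log n)
    return name

enqueue(3_000_000, "encoder")
enqueue(1_500_000, "shell")
enqueue(2_250_000, "browser")
print(pick_next())  # → shell (smallest key)
```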
When scheduling is triggered
Scheduling is mainly triggered in the following situations:
1. The state of the current process (the one running on the CPU) becomes non-runnable.
The process makes a system call that itself leads to a non-runnable state: for example, nanosleep to sleep, or exit to terminate;
A resource the process requests is unavailable, forcing it to sleep: for example, during a read system call, the required data is not in the disk cache, so the process sleeps waiting for disk I/O;
The process responds to a signal and becomes non-runnable: for example, SIGSTOP suspends it, SIGKILL terminates it;
2. Preemption. The running process is deprived of the CPU against its expectation, in two cases: its time slice has run out, or a process of higher priority has appeared.
A higher-priority process is woken by an action of the process running on the CPU: for example, by a signal being sent to it, or by a mutex it was waiting on being released (such as unlocking a lock);
While handling the clock interrupt, the kernel finds that the current process's time slice is exhausted;
While handling a hardware interrupt, the kernel finds that an external resource a higher-priority process was waiting for has become available, and wakes it. For example, the CPU receives a network-card interrupt, the kernel handles it and finds that some socket has become readable, so it wakes the process waiting to read that socket; or, while handling the clock interrupt, the kernel fires a timer and thereby wakes the process sleeping in a nanosleep system call;
Other issues
1. Kernel preemption
Ideally, as soon as the condition "a higher-priority process exists" is met, the current process should be preempted immediately. However, just as multithreaded programs need locks to protect critical-section resources, the kernel has many critical sections of its own where preemption cannot be allowed at just any moment.
Linux 2.4 was designed simply: the kernel did not support preemption at all. A process could not be preempted while running in kernel mode (executing a system call, handling an exception, and so on); scheduling had to wait for the return to user mode (more precisely, the kernel checks whether a reschedule is needed just before returning to user mode);
Linux 2.6 implements kernel preemption, but in many places preemption must still be temporarily disabled to protect critical-section resources.
In some places preemption is also disabled for efficiency, spin_lock being the typical case. A spin lock is a lock where, if the acquisition attempt fails (the lock is already held by someone else), the current process keeps testing the lock's state in a tight loop until the lock is released.
Why busy-wait like this? Because the critical section is tiny, perhaps protecting nothing more than a statement like "i += j++;". Going through a full "sleep-wake" cycle on every failed acquisition would cost more than it saves.
Given that the current process busy-waits (without sleeping), who releases the lock? The holder is running on another CPU with kernel preemption disabled; it cannot be preempted away, so the waiting process can count on it finishing soon. (And if there is only one CPU? Then, with preemption disabled, no other process can be spinning on the lock in the first place.)
What if kernel preemption were not disabled? Then the lock holder could be preempted, the lock might not be released for a long time, and the waiting process could spin with no end in sight.
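The contrast between busy-waiting and sleeping can be illustrated in user space (a deliberately simplified model: a real spin lock relies on atomic test-and-set instructions and disabled preemption, neither of which this sketch reproduces):

```python
import threading
import time

flag = threading.Event()
result = []

def spinner():
    # Busy-wait: keep testing the flag in a loop instead of sleeping.
    # Cheap if the wait is short; wasteful if the holder is delayed.
    while not flag.is_set():
        pass
    result.append("saw flag")

t = threading.Thread(target=spinner)
t.start()
time.sleep(0.01)   # simulate a very short critical section
flag.set()         # "release the lock"
t.join()
print(result)
```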
For systems with stricter real-time requirements, things like spin_lock cannot be tolerated: better to pay for the heavier sleep-wake process than to let a higher-priority process wait because preemption is disabled. The embedded real-time distribution MontaVista Linux, for example, does exactly this.
This shows that real-time does not mean efficient; achieving "real time" often requires concessions in performance.
2. Load balancing on multiprocessors
We have not discussed the impact of multiprocessors on the scheduler specifically, and indeed there is nothing very special about it: at any given moment, several processes can run in parallel. So why is there such a thing as "multiprocessor load balancing"?
If the system had a single run queue, then whichever CPU became idle would simply pick the most suitable process from that queue. Wouldn't that be nice and balanced?
It would, but several processors sharing one run queue causes problems. Each CPU would have to lock the queue while running the scheduler, which makes it hard for schedulers to run in parallel and can degrade system performance. None of this arises if each CPU has its own run queue.
Per-CPU run queues bring another benefit as well: a process tends to keep executing on the same CPU for a while, so that CPU's caches at every level probably still hold the process's data, which is good for performance.
So under Linux, each CPU has its own run queue, and a runnable process sits on exactly one run queue at a time.
That is what makes "multiprocessor load balancing" the troublesome part. The kernel must watch how many processes sit on each CPU's run queue and make appropriate adjustments when the numbers become uneven. When to adjust, and how hard to try, are what the kernel has to worry about; and of course the less adjustment the better, since adjusting costs CPU time and requires locking the run queues, which is not cheap.
The kernel must also take the relationships between CPUs into account. Two CPUs may be fully independent, may share a cache, or may even be virtualized from the same physical CPU by Hyper-Threading... These relationships are an important input to load balancing: the closer two CPUs are, the more "imbalance" between them should be tolerated.
For more details, see articles on "scheduling domains".

3. Priority inheritance
Because of mutual exclusion, a process (call it A) may sleep while waiting to enter a critical section. A is not woken until the process occupying the corresponding resource (call it B) leaves the critical section.
A situation can arise where A has very high priority and B very low priority. B has entered the critical section but is preempted by another, higher-priority process (call it C); unable to run, B cannot leave the critical section, and so A cannot be woken.
A, despite its high priority, is now dragged down to B's level, effectively preempted by C, whose priority is not even that high, and its execution is delayed. This phenomenon is called priority inversion.
This is unreasonable. A better response: when A starts waiting for B to leave the critical section, B temporarily inherits A's priority (assuming A's priority is higher than B's), so that it can finish its work in the critical section promptly and get out. Afterward, B's priority is restored. This is the priority-inheritance approach.
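The bookkeeping just described can be sketched with a toy mutex that boosts its holder to the highest waiter's priority (all class and process names here are invented for illustration; real kernels implement this inside their rt-mutex/futex code, with real blocking):

```python
class Process:
    def __init__(self, name, priority):
        self.name = name
        self.base_priority = priority     # priority set by the user
        self.priority = priority          # effective priority

class PIMutex:
    """Toy priority-inheritance mutex: no real blocking, only the
    priority bookkeeping described in the text above."""
    def __init__(self):
        self.holder = None

    def acquire(self, proc):
        if self.holder is None:
            self.holder = proc
            return True
        # Contended: boost the holder to the waiter's priority if higher.
        if proc.priority > self.holder.priority:
            self.holder.priority = proc.priority
        return False                      # the caller would sleep here

    def release(self):
        self.holder.priority = self.holder.base_priority  # restore
        self.holder = None

a = Process("A", priority=90)   # high-priority waiter
b = Process("B", priority=10)   # low-priority holder
m = PIMutex()
m.acquire(b)                    # B enters the critical section
m.acquire(a)                    # A waits; B inherits A's priority
print(b.priority)               # → 90: C (priority < 90) cannot preempt B now
m.release()
print(b.priority)               # → 10: restored
```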
To implement priority inheritance, the kernel has to do quite a lot of work. For more details, see articles on "priority inversion" or "priority inheritance".
4. Threading of interrupt handlers
Under Linux, interrupt handlers run in a non-schedulable context: from the moment the CPU responds to a hardware interrupt and jumps automatically to the kernel's interrupt handler, until the handler exits, the whole path cannot be preempted.
When a process is preempted, it can be resumed later because its state is saved in its process control block (task_struct). An interrupt context has no task_struct; if it were preempted, it could never be resumed.
That interrupt handlers cannot be preempted means their "priority" is higher than that of any process (a process must wait for the handler to finish before it can run). In real application scenarios, however, some real-time processes ought to have higher priority than interrupt handlers.
Consequently, some systems with stricter real-time requirements give interrupt handlers a task_struct and a priority of their own, so that high-priority processes can preempt them when necessary. Obviously this costs the system something; it is yet another performance concession made to achieve "real time".
