Linux Process Scheduling

Source: Internet
Author: User

Scheduling among multiple processes is an essential job of the operating system.
Some people say that process scheduling is the most important part of the operating system. I think that is too absolute, much like the common claim that "function A is XX times more efficient than function B." Taken out of its actual environment, such a conclusion is one-sided.

How important is process scheduling? First, we need to make one thing clear: process scheduling only schedules processes in the TASK_RUNNING state (see Linux Process status analysis). If a process cannot run (it is sleeping, for example), it has little to do with process scheduling.
Therefore, if your system load is very low and an executable process shows up only once in a blue moon, process scheduling is not very important. Whenever a process becomes executable, just let it run. There is nothing to consider.
If the system load is very high and there are always more than N processes in the executable state waiting to be scheduled, the scheduler must do a lot of work to coordinate the execution of those N processes. If the coordination is poor, system performance suffers. In that case, process scheduling is very important.

Although many of the computers we usually deal with (desktop systems, network servers, and so on) carry relatively low loads, Linux is a general-purpose operating system and cannot assume that the load is low. Its process scheduling must be carefully designed to cope with high loads.
Of course, these designs are of little use in low-load environments that have no real-time requirements. In the extreme case where the CPU load is always 0 or 1 (at most one process ever needs to run on the CPU), these designs are basically wasted effort.

Priority
To coordinate multiple processes that want to run at the same time, the most basic tool the operating system has is to define a priority for each process. Once priorities are defined, if several processes are executable at the same time, the one with the higher priority simply runs first and there is nothing more to agonize over.
How is the priority of a process determined? There are two ways: it is specified by the user program, or it is adjusted dynamically by the kernel scheduler (both are discussed below).

The Linux kernel divides processes into two classes: normal processes and real-time processes. Real-time processes have higher priority than normal processes, and the two classes are also scheduled by different policies.

Scheduling of Real-Time Processes
The original meaning of "real-time" is that a given operation must be completed within a specified period of time. The point is not that the operation must be processed especially fast, but that its timing must be controllable (even in the worst case, the deadline cannot be exceeded).
This kind of "real-time" is called "hard real-time" and is mostly used in very precise systems (such as rockets and missiles). Generally speaking, hard real-time systems are relatively specialized.

A general-purpose operating system like Linux obviously cannot meet such requirements. Mechanisms such as interrupt handling and virtual memory introduce great uncertainty into processing time. Hardware caches, disk seeks, and bus contention add further uncertainty.
For example, consider a line of C code such as "i++;". In most cases it executes very quickly. In extreme cases, however, the following can happen:
1. The memory for i has not been allocated yet, so the CPU triggers a page fault. While Linux tries to allocate memory in the page fault handler, the allocation may fail because system memory is short, and the process is put to sleep;
2. A hardware interrupt arrives while the code is executing. Linux enters the interrupt handler and sets the current process aside. While that interrupt is being handled, a new hardware interrupt may arrive, and interrupts may keep nesting ...;
And so on ...
Therefore, a general-purpose operating system such as Linux that claims to implement "real-time" actually implements only "soft real-time", that is, it meets the real-time needs of processes as far as it can.

If a process has real-time requirements (it is a real-time process), then as long as it is executable the kernel keeps it running, to satisfy its need for CPU as far as possible, until it finishes what it has to do and sleeps or exits (becomes non-executable).
If several real-time processes are executable, the kernel first satisfies the CPU needs of the one with the highest priority, until it becomes non-executable.
Consequently, as long as a high-priority real-time process stays executable, lower-priority real-time processes never get the CPU; and as long as any real-time process stays executable, normal processes never get the CPU.
(Later, the kernel added the /proc/sys/kernel/sched_rt_runtime_us and /proc/sys/kernel/sched_rt_period_us parameters, which specify that within each period of sched_rt_period_us microseconds, real-time processes may run for at most sched_rt_runtime_us microseconds in total. This leaves normal processes a small chance to run even when a real-time process is always executable. See Linux group scheduling analysis.)
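As a rough illustration of the two parameters above, the following Python sketch reads them and computes the share of each period that real-time processes may consume. The helper names read_us and rt_share are made up for this example, and the fallback values (950000 and 1000000) are the usual kernel defaults, used here only when the files cannot be read.

```python
from pathlib import Path

PROC_DIR = Path("/proc/sys/kernel")

def read_us(name, default):
    """Read one of the two tunables; fall back to `default` if unreadable."""
    try:
        return int((PROC_DIR / name).read_text().strip())
    except (OSError, ValueError):
        return default

def rt_share(runtime_us, period_us):
    """Fraction of each period that real-time processes may consume."""
    if runtime_us < 0 or period_us <= 0:   # a runtime of -1 disables the limit
        return 1.0
    return min(1.0, runtime_us / period_us)

runtime = read_us("sched_rt_runtime_us", 950_000)
period = read_us("sched_rt_period_us", 1_000_000)
print(f"real-time share of each period: {rt_share(runtime, period):.0%}")
```

With the default values, real-time processes get at most 95% of each second, which leaves the remaining 5% for normal processes.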

So what happens when several real-time processes with the same priority are executable? Two scheduling policies are available:
1. SCHED_FIFO: first in, first out. A later process is scheduled only after the process that ran first becomes non-executable. Under this policy, the earlier process can call sched_yield to voluntarily give up the CPU to the processes behind it;
2. SCHED_RR: round-robin scheduling. The kernel allocates a time slice to each real-time process and lets the next process use the CPU when the current time slice is used up;
To stress the point: these two policies only matter when multiple real-time processes with the same priority are executable at the same time.
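The difference between the two policies can be seen in a toy simulation. The sketch below is not kernel code: the function run_same_priority, the job list, and the time slice of 2 units are all assumptions made for illustration.

```python
from collections import deque

def run_same_priority(jobs, policy, timeslice=2):
    """Simulate executable real-time processes that share one priority.

    jobs: list of (name, cpu_units_needed) in arrival order.
    Returns the order in which CPU units were handed out.
    """
    queue = deque(jobs)
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        if policy == "SCHED_FIFO":
            burst = remaining                  # runs until it stops on its own
        elif policy == "SCHED_RR":
            burst = min(remaining, timeslice)  # runs for one time slice at most
        else:
            raise ValueError(policy)
        timeline.extend([name] * burst)
        if remaining > burst:
            queue.append((name, remaining - burst))  # back of the queue
    return timeline

jobs = [("A", 3), ("B", 2)]
print("".join(run_same_priority(jobs, "SCHED_FIFO")))  # AAABB
print("".join(run_same_priority(jobs, "SCHED_RR")))    # AABBA
```

Under SCHED_FIFO, B waits until A is completely done; under SCHED_RR, A is sent to the back of the queue after each time slice.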

In Linux, the sched_setscheduler system call sets the scheduling policy and the related scheduling parameters of a process; the sched_setparam system call sets only the scheduling parameters. Both system calls require the calling process to have the capability to set process priority (CAP_SYS_NICE, which in practice usually means root permission) (see capability-related articles).
Setting the policy of a process to SCHED_FIFO or SCHED_RR turns it into a real-time process. Its priority is specified in the scheduling parameters passed to these two system calls.
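Python's os module wraps these system calls on Linux, so the request can be sketched in a few lines. The function name try_realtime and the priority value 10 are arbitrary choices for this example. The function restores the previous policy right away so that the experiment has no lasting effect.

```python
import os

def try_realtime(priority=10):
    """Switch the calling process to SCHED_FIFO, then switch it back.

    Returns True if the kernel accepted the request, False otherwise
    (typically because the process lacks CAP_SYS_NICE).
    """
    if not hasattr(os, "sched_setscheduler"):
        return False                            # platform without these calls
    try:
        old_policy = os.sched_getscheduler(0)   # 0 means "the calling process"
        old_param = os.sched_getparam(0)
        lo = os.sched_get_priority_min(os.SCHED_FIFO)
        hi = os.sched_get_priority_max(os.SCHED_FIFO)
        param = os.sched_param(max(lo, min(hi, priority)))
        os.sched_setscheduler(0, os.SCHED_FIFO, param)
    except OSError:                             # includes PermissionError
        return False
    os.sched_setscheduler(0, old_policy, old_param)  # restore the old policy
    return True

print("became real-time:", try_realtime())
```

Run as an ordinary user, this is expected to print False; with root permission it should print True.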

For real-time processes, the kernel does not try to adjust their priority. Is a process real-time or not? How real-time is it? The answers depend on the application scenario of the user program. Only the user can answer these questions; the kernel cannot guess.

To sum up, scheduling real-time processes is very simple. Both the priority and the scheduling policy are fixed by the user, and the kernel only needs to always pick the executable real-time process with the highest priority. The only slightly tricky part is honoring the two scheduling policies when choosing among real-time processes of the same priority.

Scheduling of Normal Processes
The central idea of real-time scheduling is to let the highest-priority executable real-time process occupy the CPU as much as possible, because it has real-time requirements. Normal processes are regarded as having no real-time requirements, so the scheduler tries to let all executable normal processes share the CPU peacefully, which gives the user the impression that they are running at the same time.
Compared with real-time processes, scheduling normal processes is much more complex. The kernel has to deal with two troublesome issues:

I. Dynamically adjusting the priority of processes
Based on their behavior, processes can be divided into "interactive processes" and "batch processes":
The main task of an interactive process (a desktop program, a server, and so on) is to interact with the outside world. Such processes should have a higher priority. They spend most of their time asleep waiting for external input, and when the input arrives and the kernel wakes them up, they should be scheduled quickly so that they can respond. For example, if a desktop program does not respond until half a second after a mouse click, the user will feel that the system is "stuck";
The main task of a batch process (a compiler, for example) is sustained computation, so it stays executable continuously. Such a process generally does not need a high priority. For example, if a compilation takes a few seconds longer, most users will not care much;

If the user knows exactly what priority a process should have, the priority can be set with the nice and setpriority system calls. (Raising the priority of a process requires the calling process to have the CAP_SYS_NICE capability.)
However, applications are not always as clear-cut as desktop programs and compilers. A program may behave in many different ways, looking like an interactive process at one moment and like a batch process at the next, which makes it hard for the user to choose a suitable priority.
Furthermore, even when the user knows whether a process is interactive or batch, the priority usually is not set, whether for lack of permission or out of laziness. (Have you ever set a priority for a program?)
So in the end, the job of telling interactive processes from batch processes falls to the kernel scheduler.
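For the manual route, here is a minimal Python sketch of the nice call mentioned above. It only lowers the priority of the calling process (raises its nice value), which needs no special permission. The function name lower_own_priority is made up for this example.

```python
import os

def lower_own_priority(step=1):
    """Raise this process's nice value by `step` and report before/after."""
    before = os.getpriority(os.PRIO_PROCESS, 0)  # 0 means "this process"
    after = os.nice(step)                        # returns the new nice value
    return before, after

before, after = lower_own_priority(1)
print(f"nice value: {before} -> {after}")
```

A larger nice value means a lower priority. Passing a negative step would ask for a higher priority, which normally fails without CAP_SYS_NICE.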

The scheduler watches how a process has behaved over the recent past (mainly its sleep time and run time) and uses empirical formulas to judge whether it is currently interactive or batch, and to what degree. It then adjusts the priority of the process accordingly.
Once the priority is adjusted dynamically, a process has two priorities:
1. The priority set by the user program (or the default if none is set), called the static priority. This is the baseline of the process priority and usually does not change while the process runs;
2. The priority that actually takes effect after dynamic adjustment, called the dynamic priority. This value may change all the time;
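To make the relation between the two priorities concrete, here is a deliberately simplified sketch of such an empirical formula. It is loosely modeled on the bonus scheme of the early 2.6 scheduler, but the function name, the constants, and the linear formula are assumptions chosen for illustration, not the kernel's actual code.

```python
MAX_RT_PRIO = 100   # normal priorities occupy 100..139 in this sketch
MAX_PRIO = 140
MAX_BONUS = 10      # illustrative width of the adjustment window

def dynamic_priority(static_prio, sleep_time, run_time):
    """Return the effective priority; a smaller number means higher priority."""
    total = sleep_time + run_time
    if total == 0:
        return static_prio                       # no history, no adjustment
    # Mostly-sleeping (interactive) processes gain up to 5 levels,
    # mostly-running (batch) processes lose up to 5 levels.
    bonus = (sleep_time * MAX_BONUS) // total - MAX_BONUS // 2
    prio = static_prio - bonus
    return max(MAX_RT_PRIO, min(MAX_PRIO - 1, prio))

print(dynamic_priority(120, sleep_time=90, run_time=10))   # 116
print(dynamic_priority(120, sleep_time=0, run_time=100))   # 125
```

The static priority (120 here) stays fixed, while the value the scheduler acts on drifts around it depending on recent behavior.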

II. Fairness of scheduling
In a system that supports multiple processes, each process should ideally receive CPU time fairly, in proportion to its priority. Uncontrollable outcomes such as "whoever is luckier gets more CPU" should not occur.
Linux has implemented fair scheduling in basically two ways:
1. Assign each executable process a time slice (sized by priority). A process that uses up its time slice is moved to an "expired queue". When every executable process has expired, time slices are handed out again;
2. Adjust the priority of processes dynamically. As a process runs on the CPU, its priority is lowered continuously so that other, lower-priority processes get their chance to run;
The second approach has a finer scheduling granularity and folds "fairness" and "dynamic priority adjustment" into a single mechanism, which greatly simplifies the scheduler code. It has therefore become the new favorite of the kernel scheduler.
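The second approach can be sketched with a small simulation. Each process carries a key that grows while it runs, and the process with the smallest key runs next. The names weights and vruntime are assumptions borrowed for illustration, and a binary heap stands in for whatever ordered structure a real scheduler would use.

```python
import heapq

def fair_schedule(weights, ticks):
    """Hand out `ticks` CPU units; return how many each process received.

    weights: dict name -> weight (a larger weight means higher priority).
    The key of a heavier process grows more slowly, so it is picked more often.
    """
    heap = [(0.0, name) for name in sorted(weights)]
    heapq.heapify(heap)
    received = dict.fromkeys(weights, 0)
    for _ in range(ticks):
        vruntime, name = heapq.heappop(heap)   # least-served process runs
        received[name] += 1
        # Running "costs" priority: the key grows, pushing it back in line.
        heapq.heappush(heap, (vruntime + 1.0 / weights[name], name))
    return received

print(fair_schedule({"editor": 2, "compiler": 1}, ticks=300))
# {'editor': 200, 'compiler': 100}
```

No time slices or expired queue are needed: the CPU share simply follows the weights, here 2 to 1.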

The two issues above apply only to normal processes. For real-time processes, the kernel neither adjusts priority dynamically nor attempts any fairness.

The specific scheduling algorithms for normal processes are very complex and have been replaced repeatedly as the Linux kernel evolved (replaced, not merely tuned), so this article does not go any further into them. If you are interested, see the following articles:
Brief Introduction to the Development of Linux schedulers
Mouse watch Linux Scheduler
Looking at the Linux scheduler [1]
Looking at the Linux scheduler [2]

Scheduler Efficiency
Priority determines which process should be scheduled, but the scheduler also has to care about efficiency. Like many other code paths in the kernel, the scheduler is executed very frequently. If it is inefficient, it wastes a lot of CPU time and degrades system performance.
In Linux 2.4, executable processes were kept on a single linked list. On every scheduling decision, the scheduler had to scan the entire list to find the best process to run. The complexity is O(n);
In early Linux 2.6, executable processes were kept on n (n = 140) linked lists, one per priority level. On every scheduling decision, the scheduler only had to take the process at the head of the first non-empty list. This greatly improved the efficiency of the scheduler, and the complexity is O(1);
In later versions of Linux 2.6, executable processes are kept in a red-black tree (think of it as a balanced binary tree), ordered by priority. On every scheduling decision, the scheduler has to find the highest-priority process in the tree. The complexity is O(log n).
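The constant-time pick of the early 2.6 design can be sketched as follows. The class PriorityArray and its method names are hypothetical. The bitmap that marks non-empty lists is what makes "find the first non-empty list" cheap, and the sketch assumes that priority 0 is the highest.

```python
from collections import deque

NUM_PRIOS = 140  # one list per priority level; 0 is the highest priority

class PriorityArray:
    def __init__(self):
        self.queues = [deque() for _ in range(NUM_PRIOS)]
        self.bitmap = 0          # bit p is set when queues[p] is non-empty

    def enqueue(self, prio, task):
        self.queues[prio].append(task)
        self.bitmap |= 1 << prio

    def pick_next(self):
        """Remove and return the head of the first non-empty list, or None."""
        if self.bitmap == 0:
            return None
        # Isolate the lowest set bit, then convert it to a list index.
        prio = (self.bitmap & -self.bitmap).bit_length() - 1
        task = self.queues[prio].popleft()
        if not self.queues[prio]:
            self.bitmap &= ~(1 << prio)
        return task

rq = PriorityArray()
rq.enqueue(120, "compiler")
rq.enqueue(100, "editor")
rq.enqueue(120, "backup")
print([rq.pick_next() for _ in range(4)])
# ['editor', 'compiler', 'backup', None]
```

The cost of pick_next does not depend on how many processes are queued, only on the fixed number of priority levels.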

So why did the complexity of picking a process go up between early and later versions of Linux 2.6?
Because over the same period, the way the scheduler achieves fairness changed from the first approach described above to the second (dynamic priority adjustment). The O(1) algorithm relies on a small, fixed number of linked lists, which, as I understand it, leaves a very narrow range of priority values (too coarse to tell processes apart) and cannot satisfy the demands of fairness. A red-black tree puts no restriction on the priority value (it can be represented with 32 bits, 64 bits, or more), and O(log n) is still very efficient.

When Scheduling Is Triggered
Scheduling is triggered mainly in the following situations:
1. The current process (the one running on the CPU) becomes non-executable.
The process makes a system call that voluntarily makes it non-executable, for example nanosleep to go to sleep, or exit to terminate;
A resource the process requests is not available, so it is forced to sleep. For example, during a read system call the required data is not in the disk cache, so the process sleeps while waiting for disk I/O;
The process becomes non-executable while responding to a signal. For example, it stops in response to SIGSTOP or exits in response to SIGKILL;

2. Preemption. The process is deprived of the CPU while it is running, against its will. There are two cases: the process has used up its time slice, or a process with a higher priority has appeared.
A higher-priority process is woken up by the action of the process running on the CPU, for example by being sent a signal that wakes it, or by the release of a mutual-exclusion object (such as a lock) it was waiting for;
While handling a clock interrupt, the kernel finds that the time slice of the current process is used up;
While handling an interrupt, the kernel finds that an external resource awaited by a higher-priority process has become available and wakes that process up. For example, the CPU receives a NIC interrupt, the kernel handles it, finds that a socket has become readable, and wakes up the process waiting to read from that socket. Or, while handling a clock interrupt, the kernel fires a timer and wakes up a process sleeping in a nanosleep system call;
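The two preemption cases can be sketched as a flag that is set by the clock tick or by a wake-up and acted on later. The classes Task and Cpu and the need_resched attribute are illustrative names for this example; the sketch assumes that a smaller priority number is more urgent.

```python
class Task:
    def __init__(self, name, prio, timeslice):
        self.name, self.prio, self.timeslice = name, prio, timeslice

class Cpu:
    def __init__(self, current):
        self.current = current
        self.need_resched = False   # set now, acted upon at a safe point

    def clock_tick(self):
        """Clock interrupt: charge one tick to the current process."""
        self.current.timeslice -= 1
        if self.current.timeslice <= 0:
            self.need_resched = True

    def wake_up(self, task):
        """A sleeping process became executable."""
        if task.prio < self.current.prio:   # smaller number = more urgent
            self.need_resched = True

cpu = Cpu(Task("compiler", prio=120, timeslice=2))
cpu.clock_tick()
print(cpu.need_resched)   # False: one tick of the slice is left
cpu.wake_up(Task("editor", prio=110, timeslice=5))
print(cpu.need_resched)   # True: a higher-priority process is waiting
```

Separating "decide that a switch is needed" from "perform the switch" lets the actual rescheduling happen only where it is safe to do so.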

Other problems
1. Kernel preemption
Ideally, the current process should be preempted the moment the condition "a higher-priority process has appeared" is met. However, just as a multi-threaded program uses locks to protect critical sections, the kernel contains many such critical sections and cannot accept preemption at any time and in any place.
Linux 2.4 keeps the design simple: the kernel does not support preemption. A process cannot be preempted while it runs in kernel mode (for example, while executing a system call or an exception handler). Scheduling is triggered only on return to user mode (more precisely, the kernel checks whether rescheduling is needed just before returning to user mode);
Linux 2.6 implements kernel preemption, but in many places preemption still has to be disabled temporarily to protect critical sections.

In addition, preemption is sometimes disabled for efficiency. The typical example is spin_lock. With this kind of lock, if the lock request cannot be satisfied (the lock is held by another process), the current process keeps checking the lock state in a busy loop until the lock is released.
Why busy-wait like this? Because the critical section is tiny, for example protecting only a statement like "i += j++;". Going through a "sleep, then wake up" cycle because the lock attempt failed would cost more than it gains.
Since the waiting process is spinning (not sleeping), who will release the lock? The process holding the lock is in fact running on another CPU with kernel preemption disabled. It will not be preempted by other processes, so the process waiting for the lock can only be running on a different CPU. (What if there is only one CPU? Then there can never be a process waiting for the lock.)
What if kernel preemption were not disabled? The lock holder could be preempted, and the lock might not be released for a long time. The process waiting for the lock would then spin for who knows how long.
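The busy-waiting idea can be imitated in user space. The sketch below is only an analogy: the SpinLock class is made up for this example, Python threads stand in for processes, and user-space code cannot disable preemption, so a spinning thread here may waste time exactly as the last paragraph describes.

```python
import threading

class SpinLock:
    """Busy-waiting lock built on a non-blocking try-acquire."""
    def __init__(self):
        self._flag = threading.Lock()

    def acquire(self):
        spins = 0
        while not self._flag.acquire(blocking=False):
            spins += 1            # keep polling instead of going to sleep
        return spins

    def release(self):
        self._flag.release()

counter = 0
lock = SpinLock()

def worker(rounds):
    global counter
    for _ in range(rounds):
        lock.acquire()
        counter += 1              # tiny critical section
        lock.release()

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 2000
```

The waiter never sleeps, so no wake-up is needed when the lock is released; that saving only pays off while the critical section stays very short.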

Systems with stricter real-time requirements cannot tolerate something like spin_lock. They would rather switch to the more expensive "sleep, then wake up" cycle than leave a higher-priority process waiting because preemption is disabled. The embedded real-time Linux from MontaVista, for example, does this.
This shows that real-time does not mean efficient. In many cases, achieving "real-time" behavior requires giving up some performance.

2. Multiprocessor load balancing
So far we have not discussed how multiple processors affect the scheduler. In fact there is nothing special about it: several processes can simply run in parallel at the same moment. So why is there such a thing as "multiprocessor load balancing"?
If the system had only one executable queue, any CPU that became idle would pick the most suitable process from that queue. Wouldn't that be perfectly balanced already?
It would, but sharing one executable queue among several processors has problems. Obviously, every CPU has to lock the queue while running the scheduler, which makes it hard for the scheduler to run in parallel and may reduce system performance. The problem disappears if each CPU has its own executable queue.
Per-CPU executable queues have another benefit: a process tends to stay on the same CPU for a while, so its data is likely to be present in that CPU's caches at every level, which helps system performance.
Therefore, in Linux each CPU has its own executable queue, and an executable process can be on only one executable queue at any given time.

And so the problem of "multiprocessor load balancing" arises. The kernel has to watch the number of processes on each CPU's executable queue and make appropriate adjustments when the numbers become unbalanced. It has to decide when to move processes and how aggressively. Of course, it is best to move as little as possible, since migration consumes CPU time and requires locking the executable queues, and that cost is not small.
The kernel also has to consider the relationship between CPUs. Two CPUs may be completely independent, may share a cache, or may even be two logical CPUs presented by one physical CPU through Hyper-Threading ... The relationship between CPUs is an important input to load balancing: the closer the relationship, the lower the cost of migrating a process between them. See Linux kernel SMP load balancing analysis.
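A very reduced picture of the balancing step is sketched below. It looks only at queue lengths and ignores the relationship between CPUs. The function balance and its threshold parameter are assumptions made for this example.

```python
def balance(runqueues, threshold=1):
    """Move processes from the busiest queue to the idlest one.

    runqueues: list of lists, one per CPU. Returns the number of migrations.
    Nothing is moved while the length difference is within `threshold`,
    because every migration has a cost.
    """
    moved = 0
    while True:
        busiest = max(runqueues, key=len)
        idlest = min(runqueues, key=len)
        if len(busiest) - len(idlest) <= threshold:
            return moved
        idlest.append(busiest.pop())   # migrate one process
        moved += 1

cpus = [["a", "b", "c", "d"], [], ["e"]]
print(balance(cpus), [len(q) for q in cpus])   # 2 [2, 2, 1]
```

The threshold expresses the "move as little as possible" rule: a small imbalance is tolerated rather than paid for with a migration.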

3. Priority inheritance
Because of mutual exclusion, a process (call it A) may sleep while waiting to enter a critical section, and is not woken up until the process that holds the corresponding resource (call it B) leaves the critical section.
Suppose A has a very high priority and B a very low one. B enters the critical section but is then preempted by another process with a higher priority than B (call it C). B cannot run, so it cannot leave the critical section, and A cannot be woken up.
A has a high priority, yet it is now dragged down to B's level and is in effect preempted by C, whose priority is lower than A's, so its execution is delayed. This phenomenon is called priority inversion.

This is unreasonable. A better approach: when A starts waiting for B to leave the critical section, B temporarily takes on A's priority (assuming A's priority is higher than B's) so that it can finish its work and leave the critical section without being held up. B's priority is restored afterwards. This is priority inheritance.
Implementing priority inheritance takes a lot of work in the kernel. For details, see articles on "priority inversion" or "priority inheritance".
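The A, B, C scenario can be replayed with a toy model. The classes Task and InheritingMutex and the function pick are hypothetical names for this example. The sketch treats a larger number as a higher priority and handles only a single waiter.

```python
class Task:
    def __init__(self, name, prio):
        self.name = name
        self.base_prio = prio   # larger number = higher priority here
        self.prio = prio        # effective priority seen by the scheduler

class InheritingMutex:
    def __init__(self):
        self.owner = None

    def lock(self, task):
        """Return True if acquired; otherwise boost the owner and block."""
        if self.owner is None:
            self.owner = task
            return True
        if task.prio > self.owner.prio:
            self.owner.prio = task.prio   # owner inherits the waiter's priority
        return False

    def unlock(self):
        self.owner.prio = self.owner.base_prio  # the boost ends with the lock
        self.owner = None

def pick(runnable):
    """Choose the executable task with the highest effective priority."""
    return max(runnable, key=lambda t: t.prio)

a, b, c = Task("A", 90), Task("B", 10), Task("C", 50)
m = InheritingMutex()
m.lock(b)                     # B enters the critical section
print(pick([b, c]).name)      # C: B is preempted, so A would be stuck too
m.lock(a)                     # A blocks on the mutex and B is boosted
print(pick([b, c]).name)      # B: it now runs ahead of C
m.unlock()
print(b.prio)                 # 10
```

A real implementation also has to deal with several waiters and with chains of locks, which is part of why the kernel-side work is substantial.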

4. Threaded interrupt handling
In Linux, interrupt handlers run in a context that cannot be scheduled. From the moment the CPU automatically jumps to the interrupt handler installed by the kernel, through the execution of the handler, until the handler exits, the whole sequence cannot be preempted.
A preempted process can be resumed later from the information saved in its process control block (task_struct). The interrupt context has no task_struct, so if it were preempted it could not be resumed.
Because interrupt handlers cannot be preempted, they effectively have a higher "priority" than any process (a process can run only after the interrupt handler finishes). In real application scenarios, however, some real-time processes may deserve a higher priority than interrupt handlers.
For this reason, some systems with stricter real-time requirements give interrupt handlers a task_struct and a priority, so that they can be preempted by high-priority processes when necessary. Clearly this adds some overhead to the system, which is again a concession made to achieve "real-time" behavior.
For more details, see articles on "threaded interrupts".
