Introduction to process scheduling in Linux systems

An operating system must support running multiple processes, and that requires process scheduling.

Process scheduling is often said to be one of the most important parts of an operating system. I think this statement is a bit too absolute; like the common claim that "function A is xx times faster than function B", such conclusions are one-sided once divorced from the actual environment.

So how important is process scheduling, really? First, we need to be clear: process scheduling is the scheduling of processes in the TASK_RUNNING (executable) state (see "Analysis of Linux process states"). A process that is not executable (because it is sleeping, for example) has little to do with process scheduling.

So, if your system's load is very low, and an executable process appears only once in a blue moon, process scheduling hardly matters: whichever process happens to be executable just runs, and there is nothing more to consider.

Conversely, if the system load is very high, with many processes in the executable state at all times waiting to be scheduled, the scheduler must do a lot of work to coordinate their execution. If it coordinates poorly, system performance suffers greatly. In that case, process scheduling is very important.

Although many of the computers we normally use, such as desktops and web servers, run at relatively low load, Linux, as a general-purpose operating system, cannot assume that system load is low; it must be carefully designed to handle process scheduling under high load.

Of course, these designs are of little use in environments with low load (and no real-time requirements). In the extreme case where the CPU load is always 0 or 1 (there is never more than a single process wanting the CPU), these designs are largely futile.

Priorities

The most basic way for modern operating systems to coordinate the "simultaneous" running of multiple processes is to assign each process a priority. With priorities defined, if more than one process is in the executable state, the one with the highest priority executes; there is nothing to agonize over.

So how is a process's priority determined? In two ways: specified by the user program, and dynamically adjusted by the kernel scheduler (more on this below).

The Linux kernel divides processes into two classes: normal processes and real-time processes. Real-time processes always have higher priority than normal processes, and the two classes use different scheduling policies.
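
As a quick illustration, the class a process belongs to can be read back from its scheduling policy. Here is a minimal sketch using Python's os module on Linux (the constants os.SCHED_OTHER, os.SCHED_FIFO and os.SCHED_RR map to the kernel's policies; pid 0 means the calling process):

```python
import os

# Map the Linux policy constants to readable names.
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER (normal, time-sharing)",
    os.SCHED_FIFO: "SCHED_FIFO (real-time, first in first out)",
    os.SCHED_RR: "SCHED_RR (real-time, round-robin)",
}

def describe_policy(pid=0):
    """Return a readable name for a process's scheduling policy (pid 0 = self)."""
    policy = os.sched_getscheduler(pid)
    return POLICY_NAMES.get(policy, f"unknown policy {policy}")

if __name__ == "__main__":
    print("current process:", describe_policy())
```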

Scheduling of real-time processes

"Real-time" originally means "a given operation must be completed within a given time". The point is not how fast the operation is handled, but that the time is controllable (in the worst case, it must not exceed the given bound).

Such "real-time" is called "hard real-time" and is used in very demanding systems (rockets, missiles, and so on). Hard real-time systems are generally specialized, proprietary systems.

A general-purpose operating system like Linux obviously cannot meet such requirements: mechanisms such as interrupt handling and virtual memory introduce great uncertainty into processing times. Hardware caches, disk seeks, and bus contention add uncertainty too.

Consider, for example, a single line of C code such as "i++;". In most cases it executes very quickly, but in extreme cases the following can happen:

1. The memory page holding i has not been allocated, so the CPU triggers a page fault. In the page-fault handling code, Linux tries to allocate memory, which may fail if the system is short of memory, putting the process to sleep;

2. A hardware interrupt occurs during execution; Linux enters the interrupt handler and shelves the current process. While the interrupt handler runs, new hardware interrupts may arrive, and interrupts can nest ...;

And so on ...

General-purpose operating systems such as Linux that claim to be "real-time" actually only implement "soft real-time": they satisfy the real-time needs of processes as far as possible.

If a process has real-time requirements (that is, it is a real-time process), then as long as it is executable, the kernel keeps letting it run to satisfy its CPU needs as far as possible, until it finishes what it needs to do and goes to sleep or exits (becomes non-executable).

If more than one real-time process is executable, the kernel first satisfies the CPU needs of the highest-priority real-time process, until it becomes non-executable.

Thus, as long as a high-priority real-time process is executable, lower-priority real-time processes cannot get the CPU; and as long as any real-time process is executable, normal processes cannot get the CPU.

So what happens when multiple real-time processes of the same priority are executable? Then there are two scheduling policies to choose from:

1. SCHED_FIFO: first in, first out. Later processes are only scheduled after the earlier-running process becomes non-executable. Under this policy, the running process can call the sched_yield system call to voluntarily give up the CPU and hand it to later processes;

2. SCHED_RR: round-robin scheduling. The kernel allocates time slices to these real-time processes and lets the next process use the CPU when the current time slice runs out;

To emphasize: these two scheduling policies, and the sched_yield system call, only matter when multiple real-time processes of the same priority are executable at the same time.
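
To make the difference concrete, here is a toy simulation of two equal-priority real-time tasks under each policy (pure Python, unrelated to actual kernel code; the task names and tick counts are invented):

```python
from collections import deque

def run(tasks, policy, slice_ticks=2):
    """Simulate equal-priority real-time tasks. tasks: list of (name, ticks needed).
    Returns the order in which CPU ticks were granted."""
    queue = deque(tasks)
    trace = []
    while queue:
        name, need = queue.popleft()
        if policy == "SCHED_FIFO":
            # FIFO: run until the task finishes (becomes non-executable).
            trace.extend([name] * need)
        else:
            # SCHED_RR: run one time slice, then rotate to the back of the queue.
            used = min(slice_ticks, need)
            trace.extend([name] * used)
            if need - used > 0:
                queue.append((name, need - used))
    return trace

tasks = [("A", 3), ("B", 3)]
print(run(tasks, "SCHED_FIFO"))  # A monopolizes the CPU until it finishes
print(run(tasks, "SCHED_RR"))    # A and B alternate by time slice
```

Under SCHED_FIFO the first task runs to completion before the second starts; under SCHED_RR the two alternate slice by slice.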

Under Linux, user programs can set a process's scheduling policy and related scheduling parameters through the sched_setscheduler system call; the sched_setparam system call sets only the scheduling parameters. Both system calls require that the user process have the capability to set process priorities (CAP_SYS_NICE, which generally requires root privileges) (see the capability-related articles).

By setting a process's policy to SCHED_FIFO or SCHED_RR, the process becomes a real-time process. Its priority is specified through the scheduling parameters of the two system calls above.
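
A sketch of switching the calling process to SCHED_FIFO via Python's wrapper for sched_setscheduler (Linux-only; without CAP_SYS_NICE the kernel refuses with EPERM, which surfaces as PermissionError):

```python
import os

def try_make_realtime(priority=10):
    """Try to switch the current process to SCHED_FIFO at the given
    real-time priority. Needs CAP_SYS_NICE (typically root); returns
    True on success, False if the kernel refused with EPERM."""
    lo = os.sched_get_priority_min(os.SCHED_FIFO)  # 1 on Linux
    hi = os.sched_get_priority_max(os.SCHED_FIFO)  # 99 on Linux
    priority = max(lo, min(hi, priority))          # clamp into the valid range
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except PermissionError:
        return False

if __name__ == "__main__":
    print("became real-time:", try_make_realtime())
```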

For real-time processes, the kernel does not attempt to adjust their priority. Is the process real-time? How real-time must it be? These questions depend on the user program's application scenario; only the user can answer them, and the kernel cannot presume to.

In summary, scheduling real-time processes is very simple: the priority and scheduling policy are determined by the user, and the kernel only needs to always select the highest-priority real-time process to run. The only slightly troublesome part is considering the two scheduling policies when choosing among real-time processes of the same priority.

Scheduling of normal processes

The central idea of real-time scheduling is to let the highest-priority executable real-time process occupy the CPU as much as possible, because it has real-time requirements. Normal processes are considered processes without real-time requirements, so the scheduler tries instead to share the CPU among all normal processes in the executable state, giving the user the feeling that these processes run concurrently.

Scheduling normal processes is much more complex than scheduling real-time processes. The kernel needs to consider two things:

First, dynamically adjusting process priorities

Based on their behavior, processes can be divided into interactive processes and batch processes:

The primary task of an interactive process (a desktop program, a server, and so on) is to interact with the outside world. Such processes should have higher priority: they sleep most of the time waiting for outside input, and when input arrives and the kernel wakes them, they should be scheduled quickly to respond. For example, if a desktop program has not responded half a second after a mouse click, the user will feel the system is laggy;

The primary task of a batch process, such as a compiler, is to compute continuously, so it stays in the executable state. Such processes generally do not need high priority; if the compiler runs a few seconds longer, the user will probably not care much;

If the user knows exactly what priority a process should have, the priority can be set with the nice and setpriority system calls. (Raising a process's priority requires that the user process have the CAP_SYS_NICE capability.)
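
A small sketch of the unprivileged direction: any process may raise its own nice value (lowering its priority) via the nice system call, while lowering the nice value back would require CAP_SYS_NICE:

```python
import os

def lower_priority(increment=1):
    """Raise this process's nice value by `increment`, i.e. lower its
    priority. Returns (nice before, nice after). Unprivileged processes
    may only increase nice; decreasing it needs CAP_SYS_NICE."""
    before = os.getpriority(os.PRIO_PROCESS, 0)  # 0 = the calling process
    after = os.nice(increment)                   # returns the new nice value
    return before, after

if __name__ == "__main__":
    before, after = lower_priority()
    print(f"nice went from {before} to {after}")
```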

However, applications are not always as typical as a desktop program or a compiler. A program may behave in various ways, acting like an interactive process for a while and like a batch process later, which makes it hard for the user to set one proper priority.

Furthermore, even when the user clearly knows whether a process is interactive or batch, the priority often goes unset, whether for lack of permissions or out of laziness. (Have you ever set a priority for a program?)

As a result, the task of distinguishing between the interactive process and the batch process falls to the kernel scheduler.

The scheduler looks at a process's behavior over a recent period (primarily its sleep time and run time) and, using some empirical formulas, judges whether it is currently interactive or batch-like, and to what degree; it then adjusts the process's priority accordingly.

Once priorities are dynamically adjusted, a process has two priorities:

1. The priority set by the user program (or the default, if none was set), called the static priority. This is the baseline of the process's priority and usually does not change while the process runs;

2. The dynamically adjusted, actually effective priority. This value may change all the time;

Second, fairness of scheduling

In a system that supports multiple processes, ideally each process should get a share of the CPU in proportion to its priority, with no uncontrollable situations like "whoever is lucky gets more".

Linux implements fair scheduling in basically two ways:

1. Assign each executable process a time slice (according to its priority); a process that uses up its slice is placed in an expired queue. Once all executable processes have expired, time slices are redistributed;

2. Dynamically adjust process priorities. As a process runs on the CPU, its priority is continuously lowered so that other, lower-priority processes get a chance to run;

The latter approach has finer scheduling granularity and unifies "fairness" and "dynamic priority adjustment" into a single mechanism, greatly simplifying the kernel scheduler's code. It has therefore become the kernel scheduler's new favorite.
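
The second idea can be sketched as a toy loop (invented numbers and formula, only to illustrate the mechanism, not the kernel's actual code): the running task's effective priority falls as it consumes CPU, so lower-priority tasks eventually catch up and run.

```python
def fair_schedule(static_prio, ticks):
    """Toy dynamic-priority scheduler: effective priority = static priority
    minus CPU ticks already consumed; each tick, the task with the highest
    effective priority runs. Illustrative only."""
    used = {name: 0 for name in static_prio}
    trace = []
    for _ in range(ticks):
        # Pick the highest effective priority; break ties by name.
        name = max(used, key=lambda n: (static_prio[n] - used[n], n))
        used[name] += 1
        trace.append(name)
    return trace, used

trace, used = fair_schedule({"A": 5, "B": 3}, ticks=8)
print(trace)
print(used)  # A gets more CPU than B, but B is not starved
```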

To emphasize: these two points apply only to normal processes. For real-time processes, the kernel neither dynamically adjusts priorities nor provides any notion of fairness.

The concrete scheduling algorithm for normal processes is very complex and keeps changing as the Linux kernel evolves (not just by simple adjustments), so this article will not drill down further.

Efficiency of the Scheduler

Priorities decide which process should be scheduled to execute, but the scheduler must also care about efficiency. Like many parts of the kernel, the scheduler runs frequently; if it is inefficient, it wastes a lot of CPU time and degrades system performance.

In Linux 2.4, executable processes were kept on a single linked list. On every schedule, the scheduler scanned the entire list to find the optimal process to run, with complexity O(n);

In early Linux 2.6, executable processes were kept on N (N=140) linked lists, one per priority: as many lists as the system has priorities. On every schedule, the scheduler only needed to take the process at the head of the first non-empty list. This greatly improved scheduler efficiency, with complexity O(1);

In recent Linux 2.6 versions, executable processes are kept, ordered by priority, in a red-black tree (which can be thought of as a balanced binary tree). On every schedule, the scheduler finds the highest-priority process in the tree, with complexity O(log n).

So why did the complexity of choosing a process increase from early Linux 2.6 to recent Linux 2.6?

Because, at the same time, the scheduler's implementation of fairness changed from the first idea above to the second (dynamic priority adjustment). The O(1) algorithm is based on a small set of linked lists, which, as I understand it, keeps the priority range very small (poorly differentiated) and cannot meet the demands of fairness. The red-black tree places no limit on priority values (32 bits, 64 bits, or more can represent a priority), and O(log n) complexity is still very efficient.
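
The "always run the leftmost task in an ordered structure" idea can be sketched with a heap standing in for the red-black tree (illustrative only; the real scheduler tracks a weighted virtual runtime per task):

```python
import heapq

def cfs_like(tasks, ticks):
    """tasks: dict name -> weight. Each tick, run the task with the smallest
    virtual runtime; its vruntime then advances by 1/weight, so heavier
    (higher-priority) tasks accumulate vruntime more slowly and run more often."""
    heap = [(0.0, name) for name in sorted(tasks)]
    heapq.heapify(heap)
    counts = {name: 0 for name in tasks}
    for _ in range(ticks):
        vruntime, name = heapq.heappop(heap)   # "leftmost" = smallest vruntime
        counts[name] += 1
        heapq.heappush(heap, (vruntime + 1.0 / tasks[name], name))
    return counts

print(cfs_like({"heavy": 2, "light": 1}, ticks=9))  # heavy gets twice the ticks
```

With weights 2 and 1, CPU time is split 2:1 without any explicit time-slice bookkeeping.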

Timing of scheduling triggers

Scheduling is mainly triggered in the following situations:

1. The current process (the one running on the CPU) becomes non-executable.

A process may enter a non-executable state through a system call it makes itself: for example, calling nanosleep to sleep, or exit to terminate;

A resource requested by the process may be unavailable, forcing it to sleep. For example, during a read system call, the required data is not in the disk cache, so the process sleeps waiting for disk I/O;

A process may become non-executable in response to a signal: for example, pausing in response to SIGSTOP, or exiting in response to SIGKILL;

2. Preemption. A running process is unexpectedly deprived of the CPU. This happens in two scenarios: the process has used up its time slice, or a higher-priority process has appeared.

A higher-priority process may be awakened by the action of the process running on the CPU, for example by a signal, or by the release of a mutex or other lock;

While handling a clock interrupt, the kernel may discover that the current process's time slice is exhausted;

While handling other interrupts, the kernel may discover that an external resource a higher-priority process was waiting for has become available, and wake that process. For example, the CPU receives a network card interrupt; while handling it, the kernel finds that a socket has become readable and wakes the process waiting to read that socket. Or, while handling a clock interrupt, the kernel fires a timer and thereby wakes a process sleeping in a nanosleep system call.

When all tasks use the Linux time-sharing scheduling policy:

1. A task is created with the time-sharing scheduling policy and a nice value (-20 to 19).

2. Each task's execution time on the CPU (its counter) is determined by its nice value.

3. If the task is not waiting for a resource, it is added to the ready queue.

4. The scheduler traverses the tasks in the ready queue, computes each task's dynamic priority (counter + 20 - nice), and selects the task with the largest result to run. When the time slice is used up (counter reaches 0) or the task voluntarily gives up the CPU, the task is placed at the end of the ready queue (time slice used up) or in the wait queue (gave up the CPU to wait for a resource).

5. The scheduler repeats the calculation above, returning to step 4.

6. When the scheduler finds that all ready tasks' counters have dropped to 0 (their weights are no longer positive), it repeats from step 2, recomputing the counters.
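
The steps above can be sketched as a toy loop (the refill rule counter = 20 - nice is a simplification assumed here for illustration; the task names and nice values are made up):

```python
def timeshare(nice_values, rounds):
    """Toy Linux-2.4-style time-sharing: weight = counter + 20 - nice;
    the largest weight runs one tick. When every counter hits 0,
    counters are refilled from the nice values."""
    def refill():
        # Simplified assumption: counter = 20 - nice (lower nice => bigger slice).
        return {name: 20 - nice for name, nice in nice_values.items()}

    counter = refill()
    cpu_ticks = {name: 0 for name in nice_values}
    for _ in range(rounds):
        runnable = [n for n in counter if counter[n] > 0]
        if not runnable:                # step 6: every counter reached 0
            counter = refill()
            runnable = list(counter)
        # step 4: the largest dynamic priority counter + 20 - nice runs one tick
        name = max(runnable, key=lambda n: (counter[n] + 20 - nice_values[n], n))
        counter[name] -= 1
        cpu_ticks[name] += 1
    return cpu_ticks

print(timeshare({"A": 0, "B": 10}, rounds=30))  # A (nice 0) outruns B (nice 10)
```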

When all tasks use the FIFO scheduling policy:

1. A task is created with SCHED_FIFO and a real-time priority rt_priority (1-99).

2. If the task is not waiting for a resource, it is added to the ready queue.

3. The scheduler traverses the ready queue and computes scheduling weights from real-time priority (1000 + rt_priority); the task with the highest weight uses the CPU. A FIFO task occupies the CPU until a higher-priority task becomes ready (a task of merely equal priority cannot displace it) or it voluntarily gives up the CPU (to wait for a resource).

4. When the scheduler finds that a higher-priority task has arrived (it may have been awakened by an interrupt, a timer, the currently running task, and so on), it immediately saves all CPU register data onto the current task's stack and loads the registers from the higher-priority task's stack; that task then runs. Repeat step 3.

5. If the current task voluntarily gives up the CPU because it is waiting for a resource, it is removed from the ready queue and added to the wait queue; then repeat step 3.

When all tasks use the RR scheduling policy:

1. A task is created with SCHED_RR, a real-time priority, and a nice value (the nice value is converted into the length of the task's time slice).

2. If the task is not waiting for a resource, it is added to the ready queue.

3. The scheduler traverses the ready queue, computes scheduling weights from real-time priority (1000 + rt_priority), and the task with the highest weight uses the CPU.

4. If an RR task in the ready queue has a time slice of 0, its time slice is reset from its nice value and the task is placed at the end of the ready queue. Repeat step 3.

5. If the current task voluntarily leaves the CPU because it is waiting for a resource, it is added to the wait queue. Repeat step 3.

When the system mixes time-sharing scheduling with round-robin and FIFO scheduling:

1. Processes scheduled with RR and FIFO are real-time processes; processes under time-sharing scheduling are not.

2. When a real-time process becomes ready while the CPU is running a non-real-time process, the real-time process immediately preempts it.

3. RR and FIFO processes both use real-time priority as the weight standard for scheduling, and RR is an extension of FIFO. Under FIFO, if two processes have the same priority, they execute in their (indeterminate) order in the queue, which leads to some unfairness (the priority is the same, so why do you get to run continuously?). If those two tasks' scheduling policy is set to RR instead, the two are guaranteed to execute in rotation, fairly.
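
The mixed rules boil down to comparing weights as the steps above describe: 1000 + rt_priority for real-time tasks versus counter + 20 - nice (always well below 1000) for time-sharing tasks. A sketch (task names and numbers invented):

```python
def pick_next(tasks):
    """tasks: list of dicts with 'name', 'policy' ('fifo'/'rr'/'other'),
    and 'rt_priority' or 'counter'/'nice'. Returns the task that would run.
    Real-time weights (1000 + rt_priority) always beat time-sharing
    weights (counter + 20 - nice), so a ready real-time task preempts
    any normal task."""
    def weight(t):
        if t["policy"] in ("fifo", "rr"):
            return 1000 + t["rt_priority"]
        return t["counter"] + 20 - t["nice"]
    return max(tasks, key=weight)

ready = [
    {"name": "editor", "policy": "other", "counter": 20, "nice": 0},
    {"name": "motor_ctl", "policy": "fifo", "rt_priority": 50},
    {"name": "audio", "policy": "rr", "rt_priority": 40},
]
print(pick_next(ready)["name"])  # the highest-priority real-time task wins
```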

Ingo Molnar's real-time patch

To be merged into the mainline kernel, Ingo Molnar's real-time patch adopts a very flexible strategy, supporting four preemption modes:

1. No Forced Preemption (Server): equivalent to a standard kernel with no preemption options; mainly suited to server environments such as scientific computing.

2. Voluntary Kernel Preemption (Desktop): enables voluntary preemption but leaves the preemptible-kernel option disabled. It reduces preemption latency by adding preemption points, so it suits environments that need better responsiveness, such as desktops. Of course, this responsiveness comes at the cost of some throughput.

3. Preemptible Kernel (Low-Latency Desktop): enables both voluntary preemption and the preemptible-kernel option, giving very good response latency; to a certain extent it already reaches soft real-time. It is mainly used on desktops and some embedded systems, but throughput is lower than in mode 2.

4. Complete Preemption (Real-Time): enables all real-time features and fully meets soft real-time requirements; it suits real-time systems with latency requirements of 100 microseconds or less.

Real-time behavior is achieved at the expense of system throughput: the better the real-time behavior, the lower the throughput.
