"Linux Kernel Design and Implementation": Chapter 4 reading notes

Last Update:2016-04-17 Source: Internet

Author: User

4.1 Multi-tasking

A multitasking operating system is one that can interleave the execution of multiple processes, so that they appear to run concurrently.

Multi-tasking systems can be divided into two categories:

Non-preemptive (cooperative) multitasking
A process keeps running until it voluntarily stops (yields).
Preemptive multitasking
The scheduler decides when a process stops running and another one starts. Linux and Unix use this approach; the act of forcibly suspending a running process is called preemption.

Like all Unix variants and many other modern operating systems, Linux provides a preemptive multitasking model.

Time slice: the amount of time a process may run before it is preempted. This time is usually set in advance.

4.2 Linux Process Scheduling

From the first version of Linux in 1991 through the 2.4 kernel series, the Linux scheduler was simple, almost primitive in design. It was easy to understand, but it scaled poorly with many runnable processes or with multiple processors. For that reason the scheduler was overhauled during the 2.5 development series, and a new scheduler called the O(1) scheduler was introduced, named after its algorithmic behavior.
It fixed many shortcomings of the earlier scheduler and added powerful new features and better performance, mainly thanks to a constant-time algorithm for time slice calculation and a separate run queue for each processor, which removed the design limitations of the earlier scheduler.
The O(1) scheduler performed and scaled almost perfectly on multiprocessor machines with tens (if not hundreds) of processors, but over time it proved to have inherent weaknesses when scheduling latency-sensitive programs. These programs are called interactive processes, which of course includes every program the user interacts with. As a result, the O(1) scheduler was ideal for large server workloads, which have few interactive processes, but performed poorly on desktop systems, where interactive programs are the main workload. Early in the 2.6 kernel series, developers introduced new process schedulers intended to improve interactive performance. The best known of these was the Rotating Staircase Deadline scheduler, which borrowed from queuing theory and brought the concept of fair scheduling to the Linux scheduler. It inspired the scheduler that finally replaced the O(1) scheduler in kernel version 2.6.23: the Completely Fair Scheduler, or CFS.

4.3 Policy

4.3.1 I/O-bound and processor-bound processes

Processes can be classified as I/O-bound or processor-bound: an I/O-bound process spends most of its time submitting and waiting on I/O requests, while a processor-bound process spends most of its time executing code. A scheduling policy usually has to balance two conflicting goals: fast process response (low latency) and maximal system utilization (high throughput). To meet both, schedulers often use complex algorithms to decide which process is most worth running, and they often cannot guarantee that low-priority processes are treated fairly. Unix schedulers tend to favor I/O-bound processes in order to provide good response times. Linux, aiming for good interactive response and desktop performance, also optimizes for process response (low latency) and therefore favors I/O-bound processes over processor-bound ones. As the following sections show, however, the scheduler does not neglect processor-bound processes.

4.3.2 Process Priority

The most common type of scheduling algorithm is priority-based scheduling: processes are ranked according to their worth and their need for processor time. Usually, processes with a higher priority run before those with a lower priority, and processes with the same priority are scheduled round-robin (one after another, repeatedly).
On some systems, higher-priority processes also receive a longer time slice. The scheduler always selects the runnable process that still has time slice left and has the highest priority. Both the user and the system can influence scheduling by setting process priorities.
Linux uses two separate priority ranges. The first is the nice value, which ranges from -20 to +19; a larger nice value means a lower priority. The second range is the real-time priority.
You can use ps -eo state,uid,pid,ppid,rtprio,time,comm to list the processes on your system and their real-time priority (in the RTPRIO column), where "-" indicates that the process is not a real-time process.

4.3.3 Time Slice

The time slice is a numeric value that indicates how long a process can run before it is preempted. The scheduling policy must specify a default time slice, which is not a simple matter. If the time slice is too long, the system responds poorly to interaction and no longer appears to run applications concurrently. If the time slice is too short, the processor overhead of process switching rises significantly, because a considerable share of system time is spent switching between processes that each run only briefly. The conflict between I/O-bound and processor-bound processes shows up here again: I/O-bound processes do not need long time slices, while processor-bound processes want them to be as long as possible (for example, to keep their cache hit rate high).
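
The trade-off can be made concrete with a little arithmetic. The sketch below is not from the book; it assumes plain round-robin scheduling, and the function names and the switch cost are hypothetical values chosen only for illustration.

```python
def switch_overhead(timeslice_ms: float, switch_cost_ms: float) -> float:
    """Fraction of processor time spent switching when every process
    uses its whole time slice (plain round-robin, an assumption)."""
    return switch_cost_ms / (timeslice_ms + switch_cost_ms)


def worst_case_wait_ms(timeslice_ms: float, switch_cost_ms: float,
                       runnable: int) -> float:
    """Longest time a process waits for its next turn: every other
    runnable process uses one full slice plus one switch."""
    return (runnable - 1) * (timeslice_ms + switch_cost_ms)


if __name__ == "__main__":
    SWITCH_COST_MS = 0.01  # hypothetical cost of one process switch
    for slice_ms in (100.0, 10.0, 0.1):
        overhead = switch_overhead(slice_ms, SWITCH_COST_MS)
        wait = worst_case_wait_ms(slice_ms, SWITCH_COST_MS, runnable=11)
        print(f"slice={slice_ms} ms  overhead={overhead:.2%}  wait={wait:.1f} ms")
```

A long slice keeps the overhead negligible but makes an interactive process wait a long time for its turn; a very short slice does the opposite.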

4.4 The Linux scheduling algorithm

4.4.1 Scheduler classes

The Linux scheduler is modular, so that different types of processes can be scheduled by different algorithms. This modular structure is called scheduler classes.
Scheduler classes allow multiple dynamically pluggable scheduling algorithms to coexist, each scheduling the processes that belong to its own class.
Each scheduler class has a priority. The base scheduler code, defined in kernel/sched.c, iterates over the scheduler classes in priority order; the highest-priority class that has a runnable process wins and selects which process runs next. The Completely Fair Scheduler (CFS) is the scheduler class for normal processes, called SCHED_NORMAL in Linux, and is defined in kernel/sched_fair.c.
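
The iteration over scheduler classes can be sketched as follows. This is a simplified model, not kernel code: the SchedClass type, its FIFO queue, and the class names "rt" and "fair" are assumptions made for the example.

```python
from collections import deque


class SchedClass:
    """Hypothetical scheduler class holding its own runnable tasks."""

    def __init__(self, name):
        self.name = name
        self.runnable = deque()

    def pick_next_task(self):
        # Each class applies its own policy; plain FIFO stands in for it here.
        return self.runnable.popleft() if self.runnable else None


def pick_next_task(classes):
    """Walk the classes from highest to lowest priority; the first class
    that has a runnable task decides what runs next."""
    for sched_class in classes:
        task = sched_class.pick_next_task()
        if task is not None:
            return sched_class.name, task
    return None  # nothing runnable (the kernel would run the idle task)


rt, fair = SchedClass("rt"), SchedClass("fair")
classes = [rt, fair]  # ordered by priority, highest first
fair.runnable.append("editor")
first = pick_next_task(classes)   # only the fair class has a task
fair.runnable.append("compiler")
rt.runnable.append("audio")
second = pick_next_task(classes)  # the real-time class now wins
```

Because the loop stops at the first class that returns a task, a lower-priority class is consulted only when all higher-priority classes have nothing runnable.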

4.4.2 Process scheduling in Unix systems

Traditional Unix schedulers map nice values directly onto absolute time slices, which causes several problems: the mapping leads to suboptimal switching behavior, the effect of a nice-value difference of one depends heavily on the starting value, and time slices must be a multiple of the timer tick. Most of these problems can be solved with substantial but non-structural changes to the traditional Unix scheduler. For example, the second problem can be solved by making nice values geometric rather than additive, and the third by using a new measurement mechanism that decouples the mapping from nice value to time slice from the timer tick. But such solutions sidestep the real problem: assigning absolute time slices yields a constant switching rate but variable fairness. CFS instead radically rethinks time slice allocation (from the process scheduler's point of view): it does away with time slices entirely and assigns each process a proportion of the processor. CFS thus yields constant fairness but a variable switching rate.
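
A geometric nice scale can be illustrated with a short sketch. It is not taken from the notes: it assumes a weight of 1024 for nice 0 and a factor of 1.25 per nice level, which only approximates the weight table used by the kernel, and the function names are hypothetical.

```python
NICE_0_WEIGHT = 1024   # assumed weight of a nice-0 process
STEP = 1.25            # assumed ratio between adjacent nice levels


def weight(nice: int) -> float:
    """Geometric mapping: each nice level divides the weight by STEP."""
    return NICE_0_WEIGHT / (STEP ** nice)


def cpu_shares(nices):
    """Proportion of the processor each process receives."""
    weights = [weight(n) for n in nices]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # The split depends only on the difference between the nice values,
    # not on where in the range they sit.
    print(cpu_shares([0, 1]))
    print(cpu_shares([18, 19]))
```

With an additive mapping, nice 0 versus 1 and nice 18 versus 19 would give very different splits; with the geometric mapping both pairs divide the processor in the same ratio.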

4.4.3 Fair scheduling

CFS starts from a simple idea: process scheduling should behave as if the system had an ideal, perfectly multitasking processor. On such a system, each of n runnable processes would receive 1/n of the processor's time, and processes could be scheduled for infinitely small durations, so that within any measurable period all n processes would have run for the same amount of time. For example, suppose there are two runnable processes. In the standard Unix model, we run one process for 5 ms and then the other for 5 ms; while running, each process has 100% of the processor. On an ideal, perfectly multitasking processor, both processes would run simultaneously for 10 ms, each using half of the processor's capacity.

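4.5 The Linux scheduling implementation

4.5.1 Time accounting

All process schedulers must account for the time that a process runs. Most Unix systems do so by assigning each process a time slice. On each tick of the system clock, the time slice is decremented by the tick period. When the time slice reaches zero, the process is preempted in favor of another runnable process whose time slice is not yet exhausted.

The scheduler entity structure: CFS has no notion of a time slice, but it must still track how long each process has run. It keeps this information in struct sched_entity, which is embedded in the process descriptor (struct task_struct) as the member se.
The virtual runtime: the vruntime field stores the virtual runtime of a process, which is its actual runtime after normalization (weighting), measured in nanoseconds.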
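
A minimal sketch of this accounting, loosely modeled on what the kernel's update_curr() does. The Python class, its field names, and the nice-0 weight of 1024 are assumptions for illustration, not the kernel's definitions.

```python
from dataclasses import dataclass

NICE_0_WEIGHT = 1024  # assumed load weight of a nice-0 task


@dataclass
class SchedEntity:
    """Hypothetical stand-in for the per-task accounting data."""
    weight: int = NICE_0_WEIGHT
    sum_exec_runtime_ns: int = 0   # real time spent on the processor
    vruntime_ns: int = 0           # weighted ("virtual") runtime


def update_curr(se: SchedEntity, delta_exec_ns: int) -> None:
    """Charge delta_exec_ns of real runtime to the entity.

    Virtual time advances more slowly for heavier (higher-priority)
    tasks, so they stay eligible to run for longer."""
    se.sum_exec_runtime_ns += delta_exec_ns
    se.vruntime_ns += delta_exec_ns * NICE_0_WEIGHT // se.weight


normal = SchedEntity()
heavy = SchedEntity(weight=2 * NICE_0_WEIGHT)
update_curr(normal, 4_000_000)
update_curr(heavy, 4_000_000)
```

Both entities ran for the same real time, but the heavier one accumulated only half as much virtual runtime.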

4.5.2 Process Selection

The previous section explained that on an ideal, perfectly multitasking processor the vruntime of all runnable processes would be identical. No such processor exists, so CFS tries to balance the virtual runtimes of processes with a simple rule: when CFS needs to choose the next process to run, it picks the process with the smallest vruntime. This is the core of the CFS scheduling algorithm: pick the task with the smallest vruntime. The rest of this section describes how that selection is implemented. CFS uses a red-black tree to manage the runnable processes and to find the process with the smallest vruntime efficiently. A red-black tree, called an rbtree in Linux, is a self-balancing binary search tree. It stores data in nodes, each identified by a key, and allows a node to be looked up quickly by its key (importantly, lookup time is logarithmic in the total number of nodes in the tree).

Picking the next task: the process with the smallest vruntime is the leftmost node in the tree, so CFS selects that node. The kernel caches the leftmost node, so no tree walk is needed.
Adding a process to the tree: this happens when a process becomes runnable (wakes up) or is first created via fork().
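
The selection rule can be simulated in a few lines. Note the substitution: this sketch uses a binary heap (Python's heapq) in place of the kernel's red-black tree, because both give cheap access to the smallest key. The fixed run length per step, the weights, and the function name are assumptions for the example.

```python
import heapq

NICE_0_WEIGHT = 1024  # assumed load weight of a nice-0 task


def simulate(weights, run_ns, steps):
    """Repeatedly run the task with the smallest vruntime.

    weights maps task name -> load weight. Returns the real time (ns)
    each task received. A heap stands in for the red-black tree."""
    heap = [(0, name) for name in sorted(weights)]  # (vruntime, task)
    heapq.heapify(heap)
    ran = {name: 0 for name in weights}
    for _ in range(steps):
        vruntime, name = heapq.heappop(heap)      # "leftmost node"
        ran[name] += run_ns
        vruntime += run_ns * NICE_0_WEIGHT // weights[name]
        heapq.heappush(heap, (vruntime, name))    # re-insert with new key
    return ran


equal = simulate({"a": 1024, "b": 1024}, run_ns=1_000_000, steps=10)
skewed = simulate({"a": 2048, "b": 1024}, run_ns=1_000_000, steps=30)
```

With equal weights the two tasks alternate and receive the same amount of processor time; when one task has twice the weight, it ends up with twice the time.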

4.5.3 Scheduler Entry

The main entry point into the process scheduler is the function schedule(), defined in kernel/sched.c.

4.5.4 Sleep and wake up

A sleeping (blocked) process is in a special non-runnable state. This matters: without such a state, the scheduler might select a process that does not want to run or, worse, sleeping would have to be implemented as busy looping. A process sleeps for many reasons, but always while waiting for some event. The event may be a specified amount of time, more data from file I/O, or another hardware event. A process can also be forced to sleep when it tries to acquire a kernel semaphore that is already held. A common reason for sleeping is file I/O: for example, the process issues a read() on a file that must be read from disk, or the process waits for keyboard input. In every case the kernel behaves the same way: the process marks itself as sleeping, removes itself from the red-black tree of runnable processes, puts itself on a wait queue, and calls schedule() to select and run a different process. Waking up is the reverse: the process is marked runnable, removed from the wait queue, and added back to the red-black tree.
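
The bookkeeping for sleeping and waking can be sketched as a toy model. This is not kernel code: the Scheduler class, its method names, and the use of a plain set for the runnable tasks are assumptions for illustration.

```python
class Scheduler:
    """Toy model: a set of runnable tasks plus one wait queue per event."""

    def __init__(self):
        self.runnable = set()   # stands in for the red-black tree
        self.wait_queues = {}   # event name -> list of sleeping tasks
        self.state = {}         # task -> "runnable" or "sleeping"

    def add_task(self, task):
        self.state[task] = "runnable"
        self.runnable.add(task)

    def sleep_on(self, task, event):
        """Mark the task as sleeping, take it out of the runnable set,
        and queue it on the event it waits for."""
        self.state[task] = "sleeping"
        self.runnable.discard(task)
        self.wait_queues.setdefault(event, []).append(task)

    def wake_up(self, event):
        """Reverse of sleep_on for every task waiting on the event."""
        for task in self.wait_queues.pop(event, []):
            self.state[task] = "runnable"
            self.runnable.add(task)


sched = Scheduler()
sched.add_task("reader")
sched.add_task("worker")
sched.sleep_on("reader", "disk-io")   # reader blocks in read()
runnable_while_blocked = set(sched.runnable)
sched.wake_up("disk-io")              # the data has arrived
```

While "reader" sleeps, the scheduler can only choose "worker"; no processor time is spent polling for the event.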

4.6 Preemption and context switching

4.6.1 User preemption

When the kernel is about to return to user space, it knows it is in a safe state: if it is safe to continue executing the current process, it is also safe to pick a new process to execute. Therefore, whenever the kernel returns to user space, whether from an interrupt handler or from a system call, it checks the need_resched flag. If the flag is set, the kernel invokes the scheduler to select a different (more suitable) process to run. The return paths from interrupt handlers and system calls are architecture-dependent and are implemented in assembly in entry.S (a file that contains both the kernel entry code and the kernel exit code). In short, user preemption can occur:

When returning to user space from a system call.
When returning to user space from an interrupt handler.
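
The check on the return path can be modeled in a few lines. This is only a sketch of the control flow described above; the Cpu class, the method name, and the task names are hypothetical, and the real logic lives in architecture-specific assembly.

```python
class Cpu:
    """Hypothetical model of one processor; not a kernel structure."""

    def __init__(self, current):
        self.current = current
        self.need_resched = False

    def return_to_user_space(self, pick_next):
        """Common tail of the system-call and interrupt return paths:
        reschedule only if the need_resched flag is set."""
        if self.need_resched:
            self.need_resched = False
            self.current = pick_next()
        return self.current


cpu = Cpu(current="compile job")
cpu.need_resched = True   # e.g. a more suitable task just became runnable
running = cpu.return_to_user_space(lambda: "text editor")
```

If the flag is clear, the current process simply continues in user space.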

4.6.2 Kernel preemption

Unlike most other Unix variants and many other operating systems, Linux is a fully preemptive kernel. In a kernel that does not support kernel preemption, kernel code runs until completion: the scheduler cannot reschedule a task while it is executing in the kernel, because kernel code is scheduled cooperatively rather than preemptively. Kernel code runs until it finishes (returns to user space) or explicitly blocks. The 2.6 kernel introduced kernel preemption: the kernel can now preempt a running task at any point, as long as rescheduling is safe, which is the case when the task holds no locks.

4.7 Real-Time scheduling strategy

The Linux real-time scheduling policies provide soft real-time behavior. Soft real-time means that the kernel tries to schedule processes within their timing deadlines, but does not promise that it will always succeed. Hard real-time systems, by contrast, guarantee that any scheduling requirement is met within certain limits. Linux makes no such guarantee for real-time tasks. Even so, its real-time scheduling performance is quite good, and the 2.6 kernel can meet stringent timing requirements.

4.8 Scheduler-related system calls

4.8.1 System calls related to scheduling policy and priority

sched_setscheduler() and sched_getscheduler() set and get, respectively, the scheduling policy and real-time priority of a process.

As with other system calls, their implementation consists largely of argument checking, setup, and cleanup.

The important work is merely to read or write the policy and rt_priority values in the process's task_struct.

sched_setparam() and sched_getparam() set and get, respectively, the real-time priority of a process.

These two system calls simply pass rt_priority wrapped in a special sched_param structure. The maximum priority for the real-time scheduling policies is MAX_USER_RT_PRIO minus 1; the minimum priority is 1.
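
From user space, these calls can be tried through Python's os module, which wraps them on Linux. The sketch below only reads values; the helper name describe_scheduling and the dictionary layout are made up for this example. Python exposes the normal policy under the name SCHED_OTHER.

```python
import os

# Map policy numbers to names for the constants this platform provides.
POLICY_NAMES = {
    getattr(os, name): name
    for name in ("SCHED_OTHER", "SCHED_FIFO", "SCHED_RR",
                 "SCHED_BATCH", "SCHED_IDLE")
    if hasattr(os, name)
}


def describe_scheduling(pid=0):
    """Return the policy, real-time priority, and SCHED_FIFO priority
    range for pid (0 means the calling process), or None if the calls
    are unavailable or not permitted here."""
    if not hasattr(os, "sched_getscheduler"):
        return None
    try:
        policy = os.sched_getscheduler(pid)
        param = os.sched_getparam(pid)
        fifo_range = (os.sched_get_priority_min(os.SCHED_FIFO),
                      os.sched_get_priority_max(os.SCHED_FIFO))
    except OSError:
        return None
    return {
        "policy": POLICY_NAMES.get(policy, str(policy)),
        "rt_priority": param.sched_priority,
        "fifo_range": fifo_range,
    }


if __name__ == "__main__":
    print(describe_scheduling())
```

Changing the policy with os.sched_setscheduler() to a real-time policy normally requires elevated privileges, so the sketch does not attempt it.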

For normal processes, the nice() function adds the given increment to the nice value (the static priority) of the process.

Only the superuser can pass a negative value, which lowers the nice value and thereby raises the priority of the process.

nice() calls the kernel's set_user_nice() function, which sets the static_prio value in the process's task_struct.
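
The nice() call is available from Python as os.nice() on Unix-like systems. The two helper functions below are hypothetical wrappers written for this example; only non-negative increments are used, since negative ones need superuser rights.

```python
import os


def current_nice() -> int:
    """An increment of 0 leaves the nice value unchanged and returns it."""
    return os.nice(0)


def lower_priority(increment: int) -> int:
    """Raise the nice value (lower the priority) of the calling process
    and return the new nice value."""
    if increment < 0:
        raise ValueError("a negative increment requires superuser rights")
    return os.nice(increment)


if __name__ == "__main__":
    before = current_nice()
    after = lower_priority(1)
    print(f"nice value changed from {before} to {after}")
```

An unprivileged process can generally not undo the change afterwards, because lowering the nice value again counts as raising its priority.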

4.8.2 System calls related to processor affinity

The Linux scheduler enforces hard processor affinity.

That is, although it tries to provide soft (natural) affinity by keeping a process on the same processor as much as possible, it also lets the user say "this process must run only on these processors, no matter what".

This hard affinity is stored as a bitmask in the process's task_struct, in the cpus_allowed field.

Each bit of the mask corresponds to one processor available on the system, and by default all bits are set. The mask can be set with sched_setaffinity() and read with sched_getaffinity().
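
On Linux, Python wraps these calls as os.sched_getaffinity() and os.sched_setaffinity(), which work with a set of processor numbers rather than a raw bitmask. The sketch below assumes a Linux system; the helper names are hypothetical.

```python
import os


def affinity_mask(pid=0) -> int:
    """Rebuild the bitmask view: bit n is set if processor n is allowed."""
    mask = 0
    for cpu in os.sched_getaffinity(pid):
        mask |= 1 << cpu
    return mask


def pin_to_one_cpu(pid=0):
    """Restrict pid (0 means the caller) to the lowest-numbered processor
    it may currently use. Returns (chosen processor, previous set) so the
    caller can restore the old affinity."""
    previous = os.sched_getaffinity(pid)
    target = min(previous)
    os.sched_setaffinity(pid, {target})
    return target, previous


if __name__ == "__main__":
    print(f"allowed processors, as a bitmask: {affinity_mask():#x}")
    cpu, previous = pin_to_one_cpu()
    print(f"pinned to processor {cpu}, mask is now {affinity_mask():#x}")
    os.sched_setaffinity(0, previous)  # restore the original affinity
```

A process may always shrink its own set of allowed processors, which is why pinning to one of the currently allowed processors needs no special privileges.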
