"Linux kernel design and implementation" Reading notes of the fourth chapter

Last Update:2016-04-13 Source: Internet

Author: User


Chapter 4: Process Scheduling

4.1 Multitasking

Multitasking systems come in two kinds: non-preemptive (cooperative) multitasking and preemptive multitasking.
Linux provides preemptive multitasking.

In preemptive multitasking, the scheduler decides when a process stops running so that another can run. This involuntary suspension is called preemption.

In non-preemptive multitasking, a process keeps running until it stops of its own accord. A process voluntarily suspending itself is called yielding.

4.2 Linux's Process Scheduler

1. The O(1) scheduler's algorithm has inherent weaknesses when scheduling response-time-sensitive programs.

Response-time-sensitive programs are called interactive processes.

2. O(1) scheduler trade-offs:

Pros: Ideal for large server workloads, which lack interactive processes
Cons: Performs poorly on desktop systems, where interactive programs are the main workload

3. The Completely Fair Scheduler (CFS)

This algorithm borrows from queuing theory and introduces the concept of fair scheduling to the Linux scheduler. It is known as the Completely Fair Scheduler, or simply CFS.

4.3 Policy

4.3.1 I/O-Bound Versus Processor-Bound Processes

1. Processes can be classified as either I/O-bound or processor-bound.

2. A scheduling policy usually tries to balance two conflicting goals: fast process response (low latency) and maximal system utilization (high throughput).

4.3.2 Process Priority

1. Linux uses two different priority ranges.

The first is the nice value, ranging from -20 to +19 with a default of 0. A larger nice value means a lower priority.

In Mac OS X, the nice value controls the absolute timeslice allotted to a process.

In Linux, the nice value controls the proportion of timeslice.

You can list the processes on the system with the ps -el command; the column labeled NI in the output is each process's nice value.

The second range is the real-time priority. Its values are configurable and by default range from 0 to 99. A higher real-time priority value means a higher priority.

2. To see the list of processes on your system and their real-time priorities, run:

ps -eo state,uid,pid,ppid,rtprio,time,comm

A process showing "-" in the RTPRIO column is not a real-time process.
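
The nice value that ps reports in the NI column can also be read from inside a program. The following is a minimal sketch, assuming a Unix-like system where Python's os.getpriority() is available; the helper names current_nice and is_valid_nice are made up for this illustration.

```python
import os

def current_nice() -> int:
    """Return the nice value of the calling process."""
    # PRIO_PROCESS with id 0 means "the calling process".
    return os.getpriority(os.PRIO_PROCESS, 0)

def is_valid_nice(value: int) -> bool:
    """Check that a value lies in the nice range described above (-20 to +19)."""
    return -20 <= value <= 19

if __name__ == "__main__":
    print("nice value of this process:", current_nice())
```

A process started normally from a shell usually reports the default of 0 unless it was launched through the nice command.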

4.3.3 Time Slice

1. The timeslice is a numeric value that indicates how long a process can run before it is preempted.

Too long a timeslice causes poor interactive response.

Too short a timeslice wastes a significant share of processor time on the overhead of switching processes.

2. Whether a process runs immediately is determined entirely by its priority and whether it has timeslice remaining.

4.3.4 The Scheduling Policy in Action

Consider a system with two runnable processes, a text editor and a video encoder, for which we have two goals.

First, we want the system to give the text editor more processor time.

Second, we want the text editor to preempt the video encoder the moment it wakes up.

Both goals are met by the system assigning the text editor a higher priority and a larger timeslice than the video encoder.

4.4 The Linux Scheduling Algorithm

4.4.1 Scheduler Classes

1. The Linux scheduler is modular.

This modularity is called scheduler classes. It allows different, pluggable scheduling algorithms to coexist, each scheduling the processes that belong to its own class.

Each scheduler class has a priority. The base scheduler code is defined in kernel/sched.c.

2. The Completely Fair Scheduler (CFS) is the scheduler class for normal processes. The CFS algorithm is defined in kernel/sched_fair.c.

It is called SCHED_NORMAL in Linux.

It is called SCHED_OTHER in POSIX.

4.4.2 Process Scheduling in Unix Systems

1. Modern process schedulers share two common concepts: process priority and timeslice.

The timeslice is how long a process runs; each process starts with a default timeslice.

Processes with a higher priority run more frequently and receive a larger timeslice.

2. Problems with this approach

First, mapping nice values onto timeslices requires deciding what absolute timeslice to allot each nice value. This leads to suboptimal switching behavior.

In practice, a process given a high nice value (low priority) is often a background, processor-intensive job, while normal-priority processes tend to be foreground user tasks. This timeslice allotment is therefore the opposite of what is wanted.

The second problem concerns relative nice values and again stems from the nice-value-to-timeslice mapping.

Workaround: make nice values geometric rather than additive.

The third problem is that a nice-value-to-timeslice mapping requires assigning an absolute timeslice, and that absolute value must fall within what the kernel can measure.

Workaround: introduce a new measurement mechanism that decouples the nice-value-to-timeslice mapping from the timer tick.

The fourth problem concerns process wake-up in a priority-based scheduler that tries to optimize for interactive tasks.

3. The real problem is that assigning absolute timeslices yields a constant switching rate but variable fairness.

4. CFS radically rethinks timeslice allotment: it does away with timeslices entirely and assigns each process a proportion of the processor.

CFS thus yields constant fairness but a variable switching rate.
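
The second problem (relative nice values) is easier to see with numbers. The sketch below uses a hypothetical additive mapping, 100 ms at nice 0 shrinking by 5 ms per nice level, chosen only for illustration; it is not the formula of any particular kernel.

```python
def linear_timeslice_ms(nice: int) -> int:
    """Hypothetical additive mapping: 100 ms at nice 0, minus 5 ms per nice level."""
    if not 0 <= nice <= 19:
        raise ValueError("this sketch only covers nice 0..19")
    return 100 - 5 * nice

def ratio(nice_a: int, nice_b: int) -> float:
    """How many times more processor time nice_a receives than nice_b."""
    return linear_timeslice_ms(nice_a) / linear_timeslice_ms(nice_b)

if __name__ == "__main__":
    print("nice 0 vs nice 1:  ", ratio(0, 1))
    print("nice 18 vs nice 19:", ratio(18, 19))
```

Under this mapping a one-step difference in niceness barely matters near nice 0 but doubles the timeslice near nice 19, which is why a geometric scale is suggested as the workaround.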

4.4.3 Fair Scheduling

1. CFS calculates how long a process should run as a function of the total number of runnable processes, rather than using the nice value to calculate a timeslice. CFS uses the nice value to weight the proportion of processor a process receives.

2. CFS sets a target for its approximation of the "infinitely small" scheduling duration of perfect multitasking.

This target is called the targeted latency. A smaller target yields better interactivity and a closer approximation to perfect multitasking, at the cost of higher switching overhead and therefore lower overall throughput.

3. Minimum granularity: CFS imposes a floor on the timeslice assigned to each process. This floor is called the minimum granularity.

4. Only relative nice values affect how processor time is divided.

5. The proportion of processor time any process receives is determined only by the relative difference in niceness between it and the other runnable processes.

CFS is called a fair scheduler because it gives each process a fair share of the processor. It reduces the unfairness of scheduling latency when many processes are runnable.
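
The proportional split described in this section can be sketched in a few lines. This is an illustration under stated assumptions, not the kernel's code: the weight formula (1024 at nice 0, divided by roughly 1.25 per nice level) approximates the kernel's lookup table, and the 20 ms targeted latency and 1 ms minimum granularity are example values. The function name cfs_timeslices is made up for this sketch.

```python
def cfs_timeslices(nice_values, target_latency_ms=20.0, min_granularity_ms=1.0):
    """Split the targeted latency among runnable tasks in proportion to weight.

    Assumption: weight ~ 1024 / 1.25**nice, an approximation of the kernel table.
    """
    weights = [1024 / (1.25 ** nice) for nice in nice_values]
    total = sum(weights)
    # Each task gets its weighted share, but never less than the floor.
    return [max(target_latency_ms * w / total, min_granularity_ms) for w in weights]

if __name__ == "__main__":
    print(cfs_timeslices([0, 5]))    # two tasks, five nice levels apart
    print(cfs_timeslices([10, 15]))  # same relative difference
```

With these assumptions, two tasks at nice 0 and nice 5 receive about 15 ms and 5 ms, and two tasks at nice 10 and nice 15 receive the same split, since only the relative difference matters.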

4.5 The Linux Scheduling Implementation

The CFS code lives in kernel/sched_fair.c. It has four components:

Time accounting

Process selection

The scheduler entry point

Sleeping and waking up

4.5.1 Time Accounting

1. The scheduler entity structure
CFS uses the scheduler entity structure, struct sched_entity, to keep track of process accounting.

2. Virtual runtime
The vruntime variable stores the virtual runtime of a process. CFS uses vruntime to account for how long a process has run and thus how much longer it ought to run.

The accounting is done by the update_curr() function, defined in kernel/sched_fair.c.

update_curr() calculates the execution time of the current process and stores it in the variable delta_exec.

update_curr() is invoked periodically by the system timer. In this way, vruntime accurately measures the runtime of a given process and indicates which process should run next.
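
The accounting step can be illustrated with a small model. This is a sketch, not the kernel's implementation: the Entity class is a hypothetical stand-in for the scheduler entity, and the weight formula is an approximation of the kernel's table in which nice 0 has weight 1024.

```python
NICE_0_WEIGHT = 1024  # weight of a nice-0 task

def weight_for_nice(nice: int) -> float:
    """Approximate load weight: each nice level changes the weight by about 1.25x."""
    return NICE_0_WEIGHT / (1.25 ** nice)

class Entity:
    """Hypothetical stand-in for a scheduler entity."""
    def __init__(self, name: str, nice: int = 0):
        self.name = name
        self.weight = weight_for_nice(nice)
        self.vruntime = 0.0  # virtual runtime in nanoseconds

def update_curr(entity: Entity, delta_exec_ns: int) -> None:
    """Charge delta_exec_ns of real runtime, scaled by the entity's weight."""
    # A nice-0 task is charged real time; lower-weight tasks are charged more.
    entity.vruntime += delta_exec_ns * NICE_0_WEIGHT / entity.weight
```

Because a low-priority task accumulates vruntime faster for the same real runtime, it moves away from the "smallest vruntime" position sooner and so runs less often.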

4.5.2 Process Selection

1. The core of the CFS scheduling algorithm: pick the task with the smallest vruntime.

CFS uses a red-black tree to manage the runnable processes and to find the process with the smallest vruntime quickly.

2. Concrete steps:

1. Picking the next task

CFS's process selection can be summed up as "run the process represented by the leftmost node in the rbtree." The function that does this is __pick_next_entity(), defined in kernel/sched_fair.c.

2. Adding processes to the tree

enqueue_entity() adds a process to the rbtree and caches the leftmost node.

The function updates the runtime and other statistics, then calls __enqueue_entity() to do the heavy lifting of actually inserting the entry into the red-black tree.

The basic rule when walking a balanced binary tree is that if the key is smaller than the current node's key, you go to the left child; if it is larger, you go to the right child.

3. Removing processes from the tree

The real work is done by the helper function __dequeue_entity(). Removal happens when a process blocks or terminates.
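
The "always run the smallest vruntime" rule can be tried out in user space. The sketch below deliberately swaps the red-black tree for Python's heapq binary heap, which also yields the minimum key efficiently; the RunQueue class and simulate function are hypothetical names, and every task is assumed to have equal weight.

```python
import heapq
import itertools

class RunQueue:
    """Runnable tasks ordered by vruntime (a binary heap, not a red-black tree)."""
    def __init__(self):
        self._heap = []
        self._tiebreak = itertools.count()  # keeps insertion order for equal keys

    def enqueue(self, name: str, vruntime: float) -> None:
        heapq.heappush(self._heap, (vruntime, next(self._tiebreak), name))

    def pick_next(self):
        """Remove and return (name, vruntime) with the smallest vruntime, or None."""
        if not self._heap:
            return None
        vruntime, _, name = heapq.heappop(self._heap)
        return name, vruntime

def simulate(tasks, slice_ns, steps):
    """Make `steps` scheduling decisions and return the order in which tasks ran."""
    rq = RunQueue()
    for name in tasks:
        rq.enqueue(name, 0.0)
    order = []
    for _ in range(steps):
        name, vruntime = rq.pick_next()
        order.append(name)
        # The task ran for one slice, so it re-enters with a larger vruntime.
        rq.enqueue(name, vruntime + slice_ns)
    return order
```

With equal weights the tasks simply take turns, since whichever task just ran has the largest vruntime when it is put back.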

4.5.3 Scheduler Entry

1. The main entry point into the process scheduler is the function schedule(), defined in kernel/sched.c.

It is the function the rest of the kernel uses to invoke the process scheduler, which decides which process runs and then runs it.

schedule() is generic with respect to scheduler classes: it finds the highest-priority scheduler class that has a runnable process and asks it what to run next.

The only important part of the function is its call to pick_next_task(), also defined in kernel/sched.c.

2. pick_next_task() goes through each scheduler class in priority order, from highest to lowest, and selects the highest-priority process in the highest-priority class that has one.
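
The class-by-class walk can be modeled as a simple loop. This is a sketch of the idea only: the SchedClass type and its list-based queue are hypothetical, and the caller is assumed to pass the classes already sorted from highest to lowest priority.

```python
class SchedClass:
    """Hypothetical scheduler class holding its own list of runnable tasks."""
    def __init__(self, name: str):
        self.name = name
        self.runnable = []

    def pick_next_task(self):
        """Return this class's next task, or None if nothing is runnable."""
        return self.runnable[0] if self.runnable else None

def pick_next_task(classes):
    """Walk classes from highest to lowest priority; the first runnable task wins."""
    for sched_class in classes:
        task = sched_class.pick_next_task()
        if task is not None:
            return task
    return None
```

For example, with a real-time class listed ahead of a fair class, a task in the fair class is chosen only while the real-time class has nothing runnable.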

4.5.4 Sleeping and Waking Up

1. A common reason for sleeping is file I/O.

2. Whatever the reason, the kernel behaves the same way:

The task marks itself as sleeping, puts itself on a wait queue, removes itself from the red-black tree of runnable processes, and calls schedule() to select a new process to execute.

Waking up is the reverse: the task is set as runnable, removed from the wait queue, and added back to the red-black tree.

3. Two process states are associated with sleeping: TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE.

They differ only in that a task in TASK_UNINTERRUPTIBLE ignores signals, whereas a task in TASK_INTERRUPTIBLE wakes up prematurely and responds to a signal if one is issued.

4. Wait queues
Sleeping is handled through wait queues.
A task adds itself to a wait queue with the following steps:

1. Create a wait queue entry with the macro DEFINE_WAIT().

2. Add itself to the wait queue with add_wait_queue().

3. Call prepare_to_wait() to change the process state to TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE.

4. If the state is set to TASK_INTERRUPTIBLE, a signal wakes the process up.

5. When the task wakes up, it checks again whether the condition is true.

6. Once the condition is true, the task sets itself to TASK_RUNNING and removes itself from the wait queue with finish_wait().

5. Waking up
Waking is handled by wake_up(), which wakes up all the tasks waiting on the given wait queue. It calls try_to_wake_up(), which sets the task's state to TASK_RUNNING.

Usually, whatever code causes the condition to become true is responsible for calling wake_up() afterward.
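
The same sleep, re-check, and wake pattern exists in user space. The sketch below is only an analogy built on Python's threading.Condition, not the kernel wait-queue API; the function wait_and_wake_demo and its event strings are invented for this illustration.

```python
import threading

def wait_and_wake_demo():
    """Return the events recorded while one thread sleeps and another wakes it."""
    cond = threading.Condition()
    state = {"ready": False}
    events = []

    def sleeper():
        with cond:
            # Re-check the condition after every wake-up, as in steps 5 and 6.
            while not state["ready"]:
                cond.wait()
            events.append("sleeper saw condition")

    def waker():
        with cond:
            state["ready"] = True   # make the condition true first ...
            events.append("waker set condition")
            cond.notify_all()       # ... then wake every waiter

    thread = threading.Thread(target=sleeper)
    thread.start()
    waker()
    thread.join()
    return events
```

The loop around wait() matters for the same reason the kernel code re-checks its condition: a wake-up alone does not prove that the awaited event has happened.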

4.6 Preemption and Context Switching

Context switching, the switching from one runnable task to another, is handled by the context_switch() function defined in kernel/sched.c.

1. schedule() calls this function, which does two basic jobs:

It calls switch_mm(), which switches the virtual memory mapping from the previous process's to that of the new process.

It calls switch_to(), declared in <asm/system.h>, which switches the processor state from the previous process's to that of the new process.

2. The need_resched flag: signifies whether a reschedule should be performed.

scheduler_tick() sets this flag when a process should be preempted.

try_to_wake_up() also sets this flag when a process with a higher priority than the currently running process becomes runnable.

4.6.1 User Preemption

User preemption can occur:

When returning to user space from a system call

When returning to user space from an interrupt handler

4.6.2 Kernel Preemption

Kernel preemption can occur:

When an interrupt handler exits, before returning to kernel space

When kernel code becomes preemptible again

If a task in the kernel explicitly calls schedule()

If a task in the kernel blocks

4.7 Real-Time Scheduling Policies

1. Linux provides two real-time scheduling policies: SCHED_FIFO and SCHED_RR. The normal, non-real-time scheduling policy is SCHED_NORMAL. The real-time policies are implemented in kernel/sched_rt.c.

2. SCHED_FIFO implements a simple first-in, first-out scheduling algorithm without timeslices.

Only a higher-priority SCHED_FIFO or SCHED_RR task can preempt a SCHED_FIFO task. It uses static priorities.

3. SCHED_RR is SCHED_FIFO with timeslices: a real-time round-robin scheduling algorithm. It also uses static priorities.

4. The real-time scheduling policies in Linux provide soft real-time behavior. Soft real-time means that the kernel tries to schedule applications within their timing deadlines but does not promise to always meet them. Hard real-time systems, by contrast, are guaranteed to meet any scheduling requirement within certain limits.
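
The difference between the two real-time policies shows up clearly in a toy simulation. The sketch below makes several simplifying assumptions: all tasks share one priority, none of them blocks or yields, and work and timeslices are counted in abstract units. The function run_realtime is a hypothetical name, not a kernel interface.

```python
from collections import deque

def run_realtime(tasks, policy, timeslice=2):
    """Simulate equal-priority real-time tasks and return the execution trace.

    tasks:  list of (name, work_units) in arrival order
    policy: "SCHED_FIFO" (run to completion) or "SCHED_RR" (rotate per timeslice)
    """
    if policy not in ("SCHED_FIFO", "SCHED_RR"):
        raise ValueError(f"unknown policy: {policy}")
    queue = deque(tasks)
    trace = []
    while queue:
        name, remaining = queue.popleft()
        # FIFO has no timeslice: the task runs until it has no work left.
        ran = remaining if policy == "SCHED_FIFO" else min(timeslice, remaining)
        trace.append((name, ran))
        if remaining - ran > 0:
            # Round-robin: go to the back of the queue for this priority.
            queue.append((name, remaining - ran))
    return trace
```

Under SCHED_FIFO the first task monopolizes the processor until it finishes, while under SCHED_RR tasks of the same priority alternate once each exhausts its timeslice.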

4.8 Scheduler-Related System Calls

4.8.1 Scheduling Policy and Priority-Related System Calls

sched_setscheduler() and sched_getscheduler() set and get a process's scheduling policy and real-time priority, respectively.

Their most important work is reading or writing the policy and rt_priority values in the process's task_struct.

sched_setparam() and sched_getparam() set and get a process's real-time priority, respectively.

These two system calls get and set the rt_priority value wrapped in a special sched_param structure.

sched_get_priority_max() and sched_get_priority_min() return the maximum and minimum priorities, respectively, for a given scheduling policy.

The maximum priority for the real-time policies is MAX_USER_RT_PRIO minus 1; the minimum is 1.

The nice() function calls the kernel's set_user_nice() function, which sets the static_prio and prio values in the task's task_struct.
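
These system calls can be exercised from user space without writing C. The following is a minimal sketch, assuming Linux and Python's os module, whose sched_* functions wrap the calls of the same names; the helper describe_scheduling is a made-up name, and it only reads values, so no special privileges are needed.

```python
import os

def describe_scheduling(pid: int = 0) -> dict:
    """Collect policy, real-time priority, and priority limits for a process.

    pid 0 means the calling process.
    """
    policy = os.sched_getscheduler(pid)   # wraps sched_getscheduler()
    param = os.sched_getparam(pid)        # wraps sched_getparam()
    return {
        "policy": policy,
        "rt_priority": param.sched_priority,
        "fifo_min": os.sched_get_priority_min(os.SCHED_FIFO),
        "fifo_max": os.sched_get_priority_max(os.SCHED_FIFO),
        "nice": os.nice(0),               # an increment of 0 only reads the value
    }

if __name__ == "__main__":
    for key, value in describe_scheduling().items():
        print(key, "=", value)
```

Changing the policy with os.sched_setscheduler() to a real-time policy typically requires elevated privileges, which is why this sketch is read-only.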

4.8.2 Processor Affinity System Calls

The Linux scheduler enforces hard processor affinity.

When a process is first created, it inherits its parent's affinity mask. Because the parent is running on an allowed processor, the child also runs on an allowed processor.

When a process's processor affinity changes, the kernel uses the migration threads to push the task onto a legal processor.

The load balancer pulls tasks only to an allowed processor, so a process only ever runs on a processor permitted by the cpus_allowed field of its process descriptor.
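
Processor affinity can be inspected and changed from user space. The sketch below assumes Linux and uses Python's os.sched_getaffinity() and os.sched_setaffinity(); the helper names pin_to_one_cpu and restore_affinity are invented for this example.

```python
import os

def pin_to_one_cpu(pid: int = 0) -> int:
    """Restrict a process to a single CPU from its current mask; return that CPU."""
    allowed = os.sched_getaffinity(pid)   # set of CPU numbers the process may use
    cpu = min(allowed)
    os.sched_setaffinity(pid, {cpu})
    return cpu

def restore_affinity(mask, pid: int = 0) -> None:
    """Put back a previously saved affinity mask."""
    os.sched_setaffinity(pid, mask)

if __name__ == "__main__":
    saved = os.sched_getaffinity(0)
    print("pinned to CPU", pin_to_one_cpu())
    restore_affinity(saved)
```

Child processes created while the mask is narrowed inherit it, which matches the inheritance behavior described above.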

4.8.3 Yielding Processor Time

Linux provides the sched_yield() system call as a mechanism for a process to explicitly yield the processor to other waiting processes. It works by moving the process from the active array to the expired array.

Kernel code can, as a convenience, call yield() directly.

User-space applications use the sched_yield() system call.

4.9 Summary

This chapter examined:

The basic principles of process scheduling

The specific implementation

The scheduling algorithms

The interfaces used by the current Linux kernel


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
