"Linux kernel design and Implementation" book fourth chapter study Summary


The Process Scheduler

4.1 Multitasking

A multitasking operating system is one that can interleave the execution of multiple processes, running them concurrently.

Multitasking systems come in two flavors:

    • Preemptive multitasking: Linux uses preemptive multitasking, in which the scheduler decides when a process stops running and a new one begins.

      Modern operating systems provide dynamic time-slice calculation and configurable calculation policies.

    • Cooperative (non-preemptive) multitasking: a process keeps executing until it voluntarily stops running.

      The scheduler cannot enforce uniform rules about how long each process runs, so a process may monopolize the processor for longer than the user expects.

4.2 Linux Process Scheduling

The O(1) scheduler: ideal for large server workloads, but performs poorly with interactive processes.

RSDL, the Rotating Staircase Deadline scheduler, introduced the idea of fair scheduling; this line of work led to CFS, the Completely Fair Scheduler, which replaced the O(1) scheduler.

4.3 Policy

Policy determines what the scheduler runs and when.

(i) Two Types of Processes

1. I/O-Bound Processes

Such a process spends most of its time submitting and waiting on I/O requests; it is runnable often but only for short bursts, and eventually blocks while waiting for more requests.

2. Processor-Bound Processes

Such a process spends most of its time executing code and tends to run continuously until it is preempted.

A scheduling policy usually has to balance two conflicting goals:

    • Fast process scheduling (short response times)
    • Maximum system utilization (high throughput)
(ii) Process priorities

The most basic class of scheduling algorithm is priority-based scheduling. The idea is to rank processes based on their worth and their need for processor time.

The scheduler always chooses to run the highest-priority process whose time slice has not been exhausted.

Linux uses two different priority ranges:

    • Nice

      Range: [-20, 19], with a default of 0.
      The higher the nice value, the lower the priority.
      On Linux, the nice value maps to a proportion of processor time (a time-slice proportion).
      The command ps -el lists the system's processes; the NI column shows each process's nice value.

    • Real-time priority

      The range is configurable; by default it is [0, 99].
      The higher the value, the higher the priority;
      Any real-time process has a higher priority than a normal process.

(iii) Time Slices

A time slice represents the time that a process can continue to run before it is preempted.

    • I/O-bound processes do not need long time slices.
    • Processor-bound processes want time slices that are as long as possible.

Linux's CFS scheduler does not allocate time slices to processes directly; instead, it allocates each process a proportion of the processor. The amount of processor time a process receives is therefore a function of the system load. That proportion is further affected by the nice value, which acts as a weight adjusting each process's share of processor time (a small sketch of this weighting follows the list below):

    • A high nice value (low priority) is given a low weight, so the process receives a smaller proportion of processor time;

    • A low nice value (high priority) is given a high weight, so the process receives a larger proportion of processor time.
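
As a rough, user-space illustration of that weighting (not kernel code): nice 0 conventionally maps to a weight of 1024, and each nice step changes the weight by roughly a factor of 1.25. The formula below is an approximation for illustration, not the kernel's actual lookup table; compile with -lm.

    #include <math.h>
    #include <stdio.h>

    /* Approximate a CFS-style load weight: nice 0 maps to 1024, and each
     * nice step changes the weight by roughly a factor of 1.25.  This is an
     * illustrative approximation, not the kernel's lookup table. */
    static double nice_to_weight(int nice)
    {
        return 1024.0 / pow(1.25, nice);
    }

    int main(void)
    {
        for (int nice = -20; nice <= 19; nice += 5)
            printf("nice %3d -> approx. weight %8.1f\n", nice, nice_to_weight(nice));
        return 0;
    }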

Linux processes are preemptible. Traditionally, whether a process is preempted has been determined entirely by its priority and whether it still has a time slice.

Preemption under CFS: the timing of preemption depends on how much of the processor the newly runnable process has consumed. If it has consumed a smaller proportion than the currently running process, it runs immediately and preempts the current process; otherwise, its run is deferred.

4.4 The Linux Scheduling Algorithm

(i) Scheduler Classes

Scheduler classes allow different, pluggable scheduling algorithms to coexist, each scheduling its own type of process. Every scheduler class has a priority; the base scheduler code iterates over the classes in priority order and selects the highest-priority class that has a runnable process.

CFS, the completely fair scheduler, is the scheduler class for normal processes.

(ii) process scheduling in Unix systems

The traditional Unix scheduling algorithm allocates absolute time slices, which yields a fixed switching rate and works against fairness.
The CFS used by Linux abandons absolute time slices entirely and assigns each process a weighted proportion of the processor, which guarantees constant fairness while allowing a variable switching rate.

(iii) Fair Scheduling in CFS

CFS lets each process run for a while, rotates to the next, and always selects the process that has run the least as the next process to run. How long each process should run is calculated as a function of the total number of runnable processes, and the nice value weights the proportion of processor time the process obtains.
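
To make the proportional idea concrete, here is a small user-space sketch under stated assumptions: four runnable tasks with hypothetical weights and a hypothetical 20 ms scheduling period. Each task's share of the period is its weight divided by the total weight.

    #include <stdio.h>

    int main(void)
    {
        /* Hypothetical values for illustration only. */
        const double period_ms = 20.0;                       /* assumed scheduling period */
        const double weights[] = { 1024.0, 1024.0, 820.0, 3121.0 };
        const int n = sizeof(weights) / sizeof(weights[0]);

        double total = 0.0;
        for (int i = 0; i < n; i++)
            total += weights[i];

        /* Each task's slice is proportional to its weight. */
        for (int i = 0; i < n; i++) {
            double share = weights[i] / total;
            printf("task %d: %4.1f%% of the processor, about %.2f ms per period\n",
                   i, 100.0 * share, period_ms * share);
        }
        return 0;
    }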

4.5 The Linux Scheduling Implementation

The implementation of the CFS scheduling algorithm has four components:

    • Time accounting
    • Process selection
    • The scheduler entry point
    • Sleeping and waking up
(i) Time accounting

Every scheduler must account for the time each process runs.

1. Scheduler Entity Structure

CFS has no concept of time slices, but it must still keep an accounting of each process's run time. It does so with the scheduler entity structure, struct sched_entity, which is embedded in the process descriptor as the member se.
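
For orientation, here is an abridged sketch of the scheduler entity structure from the kernel era the book covers; the types (u64, struct rb_node, struct load_weight) come from kernel headers, and several statistics and group-scheduling fields are omitted.

    /* Abridged sketch of struct sched_entity (see <linux/sched.h> for the
     * authoritative definition; many fields omitted). */
    struct sched_entity {
        struct load_weight load;              /* load weight derived from the nice value */
        struct rb_node     run_node;          /* node in the CFS red-black tree */
        unsigned int       on_rq;             /* is this entity on a run queue? */
        u64                exec_start;        /* timestamp of the last accounting update */
        u64                sum_exec_runtime;  /* total real time spent running */
        u64                vruntime;          /* weighted virtual runtime */
        u64                prev_sum_exec_runtime;
        /* ... statistics and group-scheduling fields ... */
    };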

2. Virtual Runtime

CFS uses the vruntime variable to record a process's virtual runtime, which reflects how long the process has run and, implicitly, how much longer it should run.

The virtual runtime is the actual runtime weighted by the number of runnable processes; it is measured in nanoseconds and is therefore decoupled from the timer tick. The update_curr() function, defined in kernel/sched_fair.c, implements this accounting.

update_curr() computes the execution time of the current process and stores it in the variable delta_exec. It then passes that runtime to __update_curr(), which weights it by the total number of runnable processes and finally adds the weighted value to the current process's vruntime.
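
A simplified sketch of that accounting path, closely following the kernel code of the era the book covers (kernel context assumed; not a line-for-line copy of current sources):

    /* Simplified sketch of CFS run-time accounting (kernel context assumed). */
    static void update_curr(struct cfs_rq *cfs_rq)
    {
        struct sched_entity *curr = cfs_rq->curr;
        u64 now = rq_of(cfs_rq)->clock;
        unsigned long delta_exec;

        if (unlikely(!curr))
            return;

        /* Real time executed since the last accounting update. */
        delta_exec = (unsigned long)(now - curr->exec_start);
        if (!delta_exec)
            return;

        /* __update_curr() weights delta_exec by the runnable load and
         * adds the result to curr->vruntime. */
        __update_curr(cfs_rq, curr, delta_exec);
        curr->exec_start = now;
    }
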
(ii) Process selection

The core of the CFS scheduling algorithm: select the task with the smallest vruntime.

CFS uses a red-black tree to organize the queue of runnable processes and to find the process with the smallest vruntime quickly.

In Linux the red-black tree is called an rbtree. It is a self-balancing binary search tree whose data is stored in nodes, each identified by a key; a node can be retrieved quickly from its key, in time logarithmic in the number of nodes in the tree.

1. Pick the next task

The node key is the virtual runtime of a runnable process. When CFS picks the next process to run, it picks the one with the smallest vruntime of all runnable processes, which corresponds to the leftmost node in the tree. The function that performs this selection is __pick_next_entity().

The function itself does not traverse the tree to find the leftmost node; that node is cached in the rb_leftmost field. The return value is the next process CFS will run. If it returns NULL, the tree is empty and there is no runnable process, so the idle task is scheduled instead.
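
A simplified sketch of that function, following the kernel code of the era the book covers (kernel context assumed):

    /* Simplified sketch: the leftmost node is cached, so no tree walk is needed. */
    static struct sched_entity *__pick_next_entity(struct cfs_rq *cfs_rq)
    {
        struct rb_node *left = cfs_rq->rb_leftmost;

        if (!left)
            return NULL;    /* empty tree: no runnable process */

        return rb_entry(left, struct sched_entity, run_node);
    }
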
2. Adding Processes to the Tree

This happens when a process wakes up or when it is first created via fork().

The function enqueue_entity() updates the runtime and other statistics and then calls __enqueue_entity(). The function __enqueue_entity() does the heavy lifting of the insertion, actually placing the entry into the red-black tree:
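
A simplified sketch of the insertion, again following the code of the era the book covers (kernel context assumed):

    /* Simplified sketch of the red-black tree insertion (kernel context assumed). */
    static void __enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
    {
        struct rb_node **link = &cfs_rq->tasks_timeline.rb_node;
        struct rb_node *parent = NULL;
        struct sched_entity *entry;
        s64 key = entity_key(cfs_rq, se);   /* essentially the entity's vruntime */
        int leftmost = 1;

        /* Walk down the tree to find the insertion point, ordered by key. */
        while (*link) {
            parent = *link;
            entry = rb_entry(parent, struct sched_entity, run_node);
            if (key < entity_key(cfs_rq, entry)) {
                link = &parent->rb_left;
            } else {
                link = &parent->rb_right;
                leftmost = 0;   /* went right at least once */
            }
        }

        /* Keep the cached pointer to the leftmost (smallest-vruntime) node. */
        if (leftmost)
            cfs_rq->rb_leftmost = &se->run_node;

        rb_link_node(&se->run_node, parent, link);
        rb_insert_color(&se->run_node, &cfs_rq->tasks_timeline);
    }
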
3. Removing Processes from the Tree

Deletion happens when a process blocks or terminates. The relevant functions are dequeue_entity() and __dequeue_entity():
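
A simplified sketch of the removal path (kernel context assumed):

    /* Simplified sketch of the red-black tree removal (kernel context assumed). */
    static void __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
    {
        /* If we are removing the cached leftmost node, advance the cache. */
        if (cfs_rq->rb_leftmost == &se->run_node)
            cfs_rq->rb_leftmost = rb_next(&se->run_node);

        rb_erase(&se->run_node, &cfs_rq->tasks_timeline);
    }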

(iii) The Scheduler Entry Point

The main entry point into the process scheduler is the function schedule().

schedule() calls pick_next_task(). pick_next_task() checks each scheduler class in priority order and selects the highest-priority runnable process; it returns a pointer to the next runnable process, or NULL if there is none. The CFS implementation of pick_next_task() calls pick_next_entity(), which in turn calls __pick_next_entity().
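
A simplified sketch of that loop over the scheduler classes (kernel context assumed; the real function also contains an optimization that calls the fair class directly when only CFS tasks are runnable):

    /* Simplified sketch: iterate the scheduler classes from highest priority
     * down and return the first runnable task found (kernel context assumed). */
    static struct task_struct *pick_next_task(struct rq *rq)
    {
        const struct sched_class *class = sched_class_highest;
        struct task_struct *p;

        for (;;) {
            p = class->pick_next_task(rq);
            if (p)
                return p;
            /* The idle class always returns a task, so this terminates. */
            class = class->next;
        }
    }
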
(iv) Sleep and wake-up

Sleeping: the process marks itself as sleeping, removes itself from the red-black tree of runnable processes, puts itself on a wait queue, and calls schedule() to select and run another process.

Waking up: the process is set to the runnable state and moved from the wait queue back into the red-black tree of runnable processes.

1. Waiting Queue

Sleeping is handled via wait queues. A wait queue is a simple list of processes waiting for some event to occur. The kernel represents wait queues with wait_queue_head_t. A wait queue can be created statically with DECLARE_WAITQUEUE() or dynamically with init_waitqueue_head().

A process adds itself to a wait queue by performing the following steps (a sketch of the resulting loop follows the list):

    1. Call the macro DEFINE_WAIT() to create a wait queue entry.
    2. Call add_wait_queue() to add itself to the queue.
    3. Call prepare_to_wait() to change the process state to TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE.
    4. If the state is TASK_INTERRUPTIBLE, a signal can wake the process (a spurious wakeup: a wakeup not caused by the awaited event).
    5. When the process is awakened, it checks whether the condition is true; if so, it exits the loop, otherwise it calls schedule() again and repeats.
    6. When the condition is met, the process sets itself to TASK_RUNNING and calls finish_wait() to remove itself from the wait queue. The function inotify_read(), which reads information from a notification file descriptor, follows this pattern.
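
A minimal sketch of that sleep loop, assuming a wait queue head pointer q and a condition that some other code will eventually make true; this is the pattern the chapter illustrates with inotify_read():

    /* Minimal sketch of the sleep pattern (kernel context assumed):
     * q is a wait_queue_head_t pointer, condition is the awaited event. */
    DEFINE_WAIT(wait);

    add_wait_queue(q, &wait);
    while (!condition) {
        prepare_to_wait(q, &wait, TASK_INTERRUPTIBLE);
        if (signal_pending(current))
            break;          /* spurious wakeup: deal with the signal */
        schedule();         /* sleep until woken */
    }
    finish_wait(q, &wait);
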
2. Wake Up

The wake operation is performed through the function wake_up (), which wakes up all processes on the specified wait queue.

wake_up() calls try_to_wake_up(), which sets the process to the TASK_RUNNING state, calls enqueue_task() to put the process back into the red-black tree, and sets the need_resched flag if the awakened process has a higher priority than the currently running process. Whatever code causes the awaited condition to become true is usually responsible for calling wake_up() afterwards.
4.6 Preemption and Context switching

Context switching is handled by the context_switch() function.
Whenever a new process is selected to run, schedule() calls context_switch().
It performs two basic jobs:

    • It calls switch_mm(), which switches the virtual memory mapping from the previous process to the new process.

    • It calls switch_to(), which switches the processor state from the previous process to the new process.
      This includes saving and restoring stack and register information, plus any other architecture-specific state that must be managed and saved on a per-process basis.

1. User preemption

When the kernel is about to return to user space, if the need_resched flag is set, schedule() is invoked and user preemption occurs.

User preemption occurs:

    • When returning to user space from a system call
    • When returning to user space from an interrupt handler
2. Kernel preemption

Linux fully supports kernel preemption. As long as rescheduling is safe, the kernel can preempt a task running in the kernel at any time.

Locks are the markers of non-preemptible regions. As long as no lock is held, the kernel can be preempted.

Changes made to support kernel preemption:

    1. A preempt_count counter was added to each process's thread_info. Its initial value is 0; it is incremented whenever a lock is acquired and decremented whenever a lock is released. When it is 0, the kernel is preemptible.
    2. On returning to kernel space from an interrupt, the kernel checks the need_resched flag. If it is set, rescheduling is needed, so the kernel then checks preempt_count: if it is 0, preemption is safe and the scheduler is invoked; otherwise the interrupt returns directly to the currently executing process.
    3. When the current process releases all of its locks and preempt_count drops back to 0, the lock-releasing code checks whether need_resched is set and, if so, invokes the scheduler.
    4. Kernel preemption also occurs explicitly when a task in the kernel blocks or explicitly calls schedule().

Kernel preemption can occur:

    • When an interrupt handler exits, before returning to kernel space
    • When kernel code becomes preemptible again
    • When a task in the kernel explicitly calls schedule()
    • When a task in the kernel blocks (which also results in a call to schedule())
4.7 Real-Time Scheduling Policies

Linux provides two real-time scheduling policies: SCHED_FIFO and SCHED_RR. The normal, non-real-time policy is SCHED_NORMAL.

SCHED_FIFO

A simple first-in, first-out algorithm that does not use time slices.

A runnable SCHED_FIFO task is always scheduled ahead of any SCHED_NORMAL task. Only a higher-priority SCHED_FIFO or SCHED_RR task can preempt it; SCHED_FIFO tasks of the same priority take turns, but each gives up the processor only when it chooses to.

SCHED_RR

SCHED_RR is SCHED_FIFO with time slices: a real-time round-robin scheduling algorithm.

When a SCHED_RR task exhausts its time slice, other real-time tasks at the same priority are scheduled in turn. The time slice is used only for rescheduling among tasks of the same priority.

Priority Ranges
    • Real-time: 0 to MAX_RT_PRIO - 1.
      By default MAX_RT_PRIO is 100, so the default real-time priority range is [0, 99].

    • SCHED_NORMAL: MAX_RT_PRIO to MAX_RT_PRIO + 40. By default, nice values from -20 to +19 map onto the priority range 100 to 139.

4.8 Scheduler-Related System Calls

1. System calls related to scheduling policies and priorities
    • getpriority()/setpriority(): set and get the priority (nice value)
    • sched_getscheduler()/sched_setscheduler(): get and set a process's scheduling policy and real-time priority
    • sched_getparam()/sched_setparam(): get and set a process's real-time priority
    • sched_get_priority_min()/sched_get_priority_max(): return the minimum and maximum priorities for a given scheduling policy (see the usage sketch after this list)
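
For illustration, a user-space sketch that queries the SCHED_FIFO priority range and then switches the calling process to that policy; it assumes sufficient privileges (typically root or CAP_SYS_NICE) and keeps error handling minimal:

    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        struct sched_param param;
        int max = sched_get_priority_max(SCHED_FIFO);
        int min = sched_get_priority_min(SCHED_FIFO);

        printf("SCHED_FIFO priority range: %d..%d\n", min, max);

        /* Switch this process to SCHED_FIFO at a mid-range priority
         * (usually requires root or CAP_SYS_NICE). */
        param.sched_priority = (min + max) / 2;
        if (sched_setscheduler(0, SCHED_FIFO, &param) == -1) {
            perror("sched_setscheduler");
            return 1;
        }
        printf("now running with policy %d\n", sched_getscheduler(0));
        return 0;
    }
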
2. System calls related to processor bindings
    • The Linux scheduler provides a mechanism for enforcing hard processor affinity (binding).

    • The affinity is kept in the cpus_allowed bitmask in each task's task_struct.

    • sched_setaffinity() sets the bitmask to any combination of one or more bits (see the usage sketch after this list).

    • sched_getaffinity() returns the current cpus_allowed bitmask.
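
A user-space sketch of the affinity calls, binding the calling process to CPU 0 and reading the mask back (CPU numbering and availability are assumptions of the example):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t mask;

        /* Bind the calling process to CPU 0 only. */
        CPU_ZERO(&mask);
        CPU_SET(0, &mask);
        if (sched_setaffinity(0, sizeof(mask), &mask) == -1) {
            perror("sched_setaffinity");
            return 1;
        }

        /* Read back the current affinity mask. */
        if (sched_getaffinity(0, sizeof(mask), &mask) == -1) {
            perror("sched_getaffinity");
            return 1;
        }
        printf("bound to %d CPU(s); CPU 0 allowed: %s\n",
               CPU_COUNT(&mask), CPU_ISSET(0, &mask) ? "yes" : "no");
        return 0;
    }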

3. Yielding Processor Time

sched_yield() lets a process explicitly yield the processor to other waiting processes. A normal process is moved to the expired queue, while a real-time process is only moved to the end of its priority list.

Kernel code calls yield(), which first ensures that the given process is actually in the runnable state and then calls sched_yield().

User-space applications can call sched_yield() directly.

"Linux kernel design and Implementation" book fourth chapter study Summary

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.