Process Scheduling
4.1 Multitasking
A multitasking operating system is one that can interleave the execution of more than one process concurrently.
Multitasking operating systems come in two flavors:
Preemptive multitasking: Linux implements preemptive multitasking, in which the scheduler decides when a process stops running and another process starts. Involuntarily suspending a running process is called preemption.
Modern operating systems typically compute timeslices dynamically and make the calculation policy configurable.
Cooperative (non-preemptive) multitasking: a process keeps running until it voluntarily stops itself.
The scheduler cannot make global decisions about how long each process runs, so a process can monopolize the processor for longer than the user expects, and a hung process that never yields can bring down the whole system.
4.2 The Linux Process Scheduler
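O(1) scheduler: performed well on large server workloads but was deficient for interactive processes.
RSDL, the Rotating Staircase Deadline scheduler, introduced the concept of fair scheduling to the Linux process scheduler. It inspired CFS, the Completely Fair Scheduler, which replaced the O(1) scheduler in kernel 2.6.23.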
4.3 Policy
Policy is the behavior of the scheduler that determines what runs when.
(i) Two typical kinds of processes
1. I/O-bound processes
Such a process spends most of its time submitting and waiting on I/O requests. It is often runnable, but only for short periods, because it eventually blocks waiting on more I/O.
2. Processor-bound processes
Such a process spends most of its time executing code and tends to run until it is preempted, because it rarely blocks on I/O.
A scheduling policy usually tries to balance two conflicting goals:
- Fast process response (low latency)
- Maximum system utilization (high throughput)
(ii) Process priorities
A common type of scheduling algorithm is priority-based scheduling. The idea is to rank processes by their worth and their need for processor time.
The runnable process with the highest priority whose timeslice is not yet exhausted always runs.
Linux uses two different priority ranges:
- Nice value
Range [-20, +19]; the default is 0.
A larger nice value means a lower priority.
In Linux, the nice value controls the proportion of processor time a process receives, rather than an absolute timeslice.
The command ps -el lists the processes in the system; the NI column shows their nice values.
- Real-time priority
The values are configurable; by default they range from 0 to 99 inclusive.
A higher value means a higher priority.
Every real-time process has a higher priority than any normal process.
(iii) Timeslices
A timeslice is how long a process can run before it is preempted.
- I/O-bound processes do not need long timeslices.
- Processor-bound processes want timeslices to be as long as possible.
Linux's CFS does not assign timeslices to processes directly. Instead it assigns each process a proportion of the processor, so the processor time a process receives depends on the system load. That proportion is further affected by the nice value, which acts as a weight that changes the share of processor time each process receives:
Processes with higher nice values (lower priority) receive a lower weight and therefore a smaller share of the processor;
Processes with lower nice values (higher priority) receive a higher weight and therefore a larger share of the processor.
Linux is preemptive. In most operating systems, whether a newly runnable process immediately preempts the running one depends on its priority and whether it has timeslice remaining.
CFS preemption: the decision depends on how much of its proportion of the processor the newly runnable process has consumed. If it has consumed less than the currently running process, it runs immediately, preempting the current process; otherwise it is scheduled to run later.
4.4 The Linux Scheduling Algorithm
(i) Scheduler classes
Scheduler classes allow different, pluggable scheduling algorithms to coexist, each scheduling its own type of processes. Each scheduler class has a priority: the base scheduler code iterates over the classes in priority order, and the highest-priority class that has a runnable process selects which process runs next.
CFS, the Completely Fair Scheduler, is the scheduler class for normal processes (SCHED_NORMAL).
(ii) Process scheduling in Unix systems
Traditional Unix schedulers assign absolute timeslices, which yields a constant switching rate but variable fairness.
The CFS used by Linux abandons timeslices entirely and instead assigns each process a weighted proportion of the processor, yielding constant fairness but a variable switching rate.
(iii) Fair scheduling in CFS
CFS lets each process run for a while, rotates round-robin, and selects the process that has run the least as the next one to run. Instead of assigning each process a timeslice, it calculates how long a process should run as a function of the total number of runnable processes. The nice value weights the proportion of processor time each process receives.
4.5 Linux Scheduling Implementation
The CFS implementation has four main components:
- Time accounting
- Process selection
- The scheduler entry point
- Sleeping and waking up
(i) Time accounting
All process schedulers must account for the time that a process runs.
1. The scheduler entity structure
CFS no longer has timeslices, but it must still keep account of how long each process runs. It uses the scheduler entity structure, struct sched_entity, for this accounting; it is embedded in the process descriptor (struct task_struct) as the member named se.
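2. The virtual runtime
CFS uses the vruntime variable to store the virtual runtime of a process: its actual runtime, weighted by the process's load weight. CFS uses it to account for how long a process has run and thus how much longer it ought to run.
The virtual runtime is measured in nanoseconds and is therefore decoupled from the timer tick. The function update_curr(), defined in kernel/sched_fair.c, implements this accounting.
It calculates the execution time of the current process and stores it in delta_exec, then passes that time to __update_curr(). __update_curr() weights the time according to the process's load weight (derived from its nice value) and adds the weighted value to the current process's vruntime.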
(ii) Process selection
The core of the CFS scheduling algorithm: pick the task with the smallest vruntime.
CFS uses a red-black tree to manage the queue of runnable processes and to find the one with the smallest vruntime efficiently.
In Linux the red-black tree is called rbtree. It is a self-balancing binary search tree that stores data in nodes, each identified by a key; data can be retrieved by key quickly, in time logarithmic in the number of nodes.
1. Picking the next task
Each node's key is the virtual runtime of a runnable process. CFS picks the process with the smallest vruntime to run next, which is the leftmost node in the tree. The function that does this is __pick_next_entity().
__pick_next_entity() does not actually traverse the tree to find the leftmost node, because that node is cached in the rb_leftmost field. Its return value is the process CFS runs next. If it returns NULL, the tree is empty and there are no runnable processes, so the idle task is scheduled.
2. Adding processes to the tree
This happens when a process wakes up or is first created via fork().
enqueue_entity() updates the runtime and other statistics, then calls __enqueue_entity(), which does the heavy lifting of actually inserting the entry into the red-black tree.
3. Removing processes from the tree
Removal happens when a process blocks (becomes unrunnable) or terminates. The relevant functions are dequeue_entity() and __dequeue_entity().
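(iii) The scheduler entry point
The main entry point into the process scheduler is schedule().
schedule() calls pick_next_task(), which checks each scheduler class in priority order and selects the highest-priority process from the highest-priority class that has one. pick_next_task() returns a pointer to the next runnable process, or NULL if there is none. The CFS class's implementation of pick_next_task() calls pick_next_entity(), which in turn calls __pick_next_entity().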
(iv) Sleeping and waking up
Sleeping: the process marks itself as sleeping, is removed from the red-black tree of runnable processes, is put on a wait queue, and calls schedule() to select and run another process.
Waking up: the process is set runnable, removed from the wait queue, and added back to the red-black tree.
1. Wait queues
A wait queue is a simple list of processes waiting for some event to occur; sleeping is handled through wait queues. The kernel represents a wait queue with wait_queue_head_t. Wait queues can be created statically via DECLARE_WAIT_QUEUE_HEAD() or dynamically via init_waitqueue_head().
The process adds itself to a wait queue by performing the following steps:
- Create a wait queue entry via the macro DEFINE_WAIT().
- Add itself to a wait queue via add_wait_queue().
- Call prepare_to_wait() to change the process state to TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE.
- If the state is TASK_INTERRUPTIBLE, a signal can wake the process. This is a spurious wakeup (one not caused by the awaited event), so the process must check for and handle signals.
- When the process is woken, it checks whether the condition is true. If it is, it exits the loop; otherwise it calls schedule() again and repeats.
- Once the condition is true, the process sets itself to TASK_RUNNING and removes itself from the wait queue via finish_wait().
An example of this pattern in the kernel is inotify_read(), which reads information from the inotify file descriptor.
2. Waking up
Waking is handled by wake_up(), which wakes up all the processes waiting on the given wait queue.
wake_up() calls try_to_wake_up(), which sets the process to TASK_RUNNING, calls enqueue_task() to add it to the red-black tree, and sets need_resched if the awakened process has a higher priority than the currently running one. Whichever code causes the awaited condition to occur is typically responsible for calling wake_up() afterward.
4.6 Preemption and Context Switching
Context switching, the switch from one runnable task to another, is handled by the context_switch() function.
schedule() calls context_switch() whenever a new process has been selected to run.
It performs two basic jobs:
- Calls switch_mm(), which switches the virtual memory mapping from the previous process to the new one.
- Calls switch_to(), which switches the processor state from the previous process to the new one.
This involves saving and restoring stack information and the processor registers, plus any other architecture-specific state that must be managed and restored per process.
1. User preemption
When the kernel is about to return to user space and need_resched is set, schedule() is called; this is user preemption.
User preemption occurs:
- When returning to user space from a system call
- When returning to user space from an interrupt handler
2. Kernel preemption
Linux supports full kernel preemption: the kernel can preempt a task at any point, as long as rescheduling is safe.
Locks mark regions of non-preemptibility: as long as the current task holds no lock, the kernel can be preempted.
To support kernel preemption, the kernel does the following:
- A preempt_count counter is added to each process's thread_info. It starts at 0, is incremented whenever a lock is acquired, and is decremented whenever one is released; when it is 0, the kernel is preemptible.
- On return from an interrupt to kernel space, the kernel checks need_resched. If it is set, rescheduling is needed, so the kernel then checks preempt_count: if it is 0, preemption is safe and the scheduler is invoked. Otherwise the interrupt returns directly to the currently executing task.
- When all locks held by the current task are released, preempt_count drops back to 0; the unlock code then checks whether need_resched is set and, if so, invokes the scheduler.
- Kernel preemption also occurs explicitly, when a task in the kernel blocks or explicitly calls schedule().
Kernel preemption can occur:
- When an interrupt handler exits, before returning to kernel space
- When kernel code becomes preemptible again
- If a task in the kernel explicitly calls schedule()
- If a task in the kernel blocks (which results in a call to schedule())
4.7 Real-Time Scheduling Policies
Linux provides two real-time scheduling policies, SCHED_FIFO and SCHED_RR. The normal, non-real-time policy is SCHED_NORMAL.
SCHED_FIFO
A simple first-in, first-out algorithm without timeslices.
A runnable SCHED_FIFO task is always scheduled over any SCHED_NORMAL task. Once running, it continues until it blocks or explicitly yields the processor. Only a higher-priority SCHED_FIFO or SCHED_RR task can preempt it. Two or more SCHED_FIFO tasks at the same priority run round-robin, but each gives up the processor only when it explicitly chooses to.
SCHED_RR
SCHED_FIFO with timeslices: a real-time round-robin scheduling algorithm.
When a SCHED_RR task exhausts its timeslice, the other real-time processes at the same priority are scheduled round-robin. The timeslice is used only to reschedule processes of the same priority; a higher-priority task still preempts immediately, and a lower-priority task can never preempt a SCHED_RR task.
Priority Range
Real-time: 0 to MAX_RT_PRIO - 1
By default MAX_RT_PRIO = 100, so the default real-time priority range is [0, 99].
SCHED_NORMAL: from MAX_RT_PRIO up to (but not including) MAX_RT_PRIO + 40. By default, nice values -20 to +19 map directly onto kernel priorities 100 to 139.
4.8 Scheduler-Related System Calls
1. System calls related to scheduling policies and priorities
- getpriority()/setpriority(): get and set the priority (nice value) of a process
- sched_getscheduler()/sched_setscheduler(): get and set the scheduling policy (and real-time priority) of a process
- sched_getparam()/sched_setparam(): get and set the real-time priority of a process
- sched_get_priority_min()/sched_get_priority_max(): return the minimum and maximum priorities for a given scheduling policy
2. Processor affinity system calls
The Linux scheduler enforces hard processor affinity.
Each task's affinity is stored in the cpus_allowed bitmask in its task_struct, one bit per processor.
sched_setaffinity() sets the bitmask to any combination of one or more bits.
sched_getaffinity() returns the current cpus_allowed bitmask.
3. Yielding processor time
sched_yield() lets a process explicitly yield the processor to other waiting processes. A normal process is moved to the expired array, whereas a real-time process is only moved to the end of its priority list.
Kernel code can, as a convenience, call yield(), which ensures that the task's state is TASK_RUNNING and then calls sys_sched_yield().
User-space applications simply call sched_yield().
"Linux kernel design and Implementation" book fourth chapter study Summary