The completely fair scheduling class is an instance of the generic scheduling class (struct sched_class):
static const struct sched_class fair_sched_class = {
	.next			= &idle_sched_class,
	.enqueue_task		= enqueue_task_fair,
	.dequeue_task		= dequeue_task_fair,
	.yield_task		= yield_task_fair,
	.check_preempt_curr	= check_preempt_wakeup,
	.pick_next_task		= pick_next_task_fair,
	.put_prev_task		= put_prev_task_fair,
	.set_curr_task		= set_curr_task_fair,
	.task_tick		= task_tick_fair,
	.task_new		= task_new_fair,
};
The main scheduler and the periodic scheduler invoke the functions of the completely fair or the real-time scheduling class depending on the type of the process concerned (a short dispatch sketch follows the list below).
@ next points to the idle scheduling class, while the next pointer of the real-time scheduling class points to the completely fair scheduling class. This ordering is established at compile time and is never built dynamically while the system is running.
@ enqueue_task: adds a process to the run queue. This happens, for instance, when a process changes from a sleeping to a runnable state.
@ dequeue_task: removes a process from the run queue. This happens when the process switches from a runnable state to a state in which it cannot run. Although the term queue is used, the internal representation is left entirely to the scheduler class.
@ yield_task: a process can voluntarily give up the CPU by issuing the sched_yield system call, in which case the kernel invokes the yield_task function of its scheduler class.
@ check_preempt_curr: called to preempt the current process with a newly woken one if that is necessary.
@ pick_next_task: selects the next process to run.
@ put_prev_task: called before the currently running process is replaced by another one.
@ set_curr_task: called when the scheduling policy of the current process changes, so that the class can set up the task as the CPU's current task again.
@ task_tick: called by the periodic scheduler each time it is activated.
@ task_new: used to notify the scheduler class whenever a new task has been created.
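How the core scheduler reaches these hooks can be illustrated with a minimal sketch. The function names sketch_periodic_tick and sketch_wakeup below are invented for illustration; in the kernel the corresponding calls live in the generic code of kernel/sched.c. The essential point is that the core scheduler always goes through the sched_class pointer of the task and never calls CFS functions directly.

/* Illustrative only: dispatching through the scheduling-class pointer.
 * sketch_periodic_tick and sketch_wakeup are invented names. */
static void sketch_periodic_tick(struct rq *rq)
{
	struct task_struct *curr = rq->curr;

	/* The periodic scheduler does not know (or care) which policy
	 * curr uses; for a CFS task this ends up in task_tick_fair(). */
	curr->sched_class->task_tick(rq, curr);
}

static void sketch_wakeup(struct rq *rq, struct task_struct *p)
{
	/* A task that becomes runnable is enqueued via its own class,
	 * so a CFS task ends up in enqueue_task_fair(). */
	p->sched_class->enqueue_task(rq, p, 1 /* wakeup */);
}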
Data Structure
An instance of the following structure is embedded in each run queue of the main scheduler:
230 /* CFS-related fields in a runqueue */
231 struct cfs_rq {
232 	struct load_weight load;
233 	unsigned long nr_running;
234 
235 	u64 exec_clock;
236 	u64 min_vruntime;
237 
238 	struct rb_root tasks_timeline;
239 	struct rb_node *rb_leftmost;
240 	struct rb_node *rb_load_balance_curr;
241 	/* 'curr' points to currently running entity on this cfs_rq.
242 	 * It is set to NULL otherwise (i.e when none are currently running).
243 	 */
244 	struct sched_entity *curr;
245 
246 	unsigned long nr_spread_over;
262 };
nr_running counts the number of runnable processes on the queue, while load maintains the cumulative load weight of all these processes; the load value is needed to compute the virtual clock.
exec_clock merely records, for statistics, the real time spent executing.
min_vruntime tracks the minimum virtual run time of all processes on the queue.
tasks_timeline is the root of the red-black tree in which all runnable processes are managed, sorted by their virtual run time. The leftmost node is the process with the smallest virtual run time, i.e. the one most urgently in need of being scheduled.
rb_leftmost caches a pointer to the leftmost node of the tree. Strictly speaking it is redundant, because the node could also be found by walking tasks_timeline, but the cache avoids that traversal.
curr points to the scheduling entity of the process currently running on this cfs_rq, and is NULL when none is running.
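To illustrate how tasks_timeline and rb_leftmost work together, the following is a condensed sketch of a sorted insert keyed by vruntime. It is modelled on __enqueue_entity() in kernel/sched_fair.c but is not the verbatim routine: the kernel additionally uses the vruntime relative to min_vruntime as the key and performs further bookkeeping, and the name sketch_enqueue_entity is invented.

/* Condensed illustration of inserting an entity into the CFS tree,
 * sorted by vruntime; modelled on __enqueue_entity(), not verbatim. */
static void sketch_enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
	struct rb_node **link = &cfs_rq->tasks_timeline.rb_node;
	struct rb_node *parent = NULL;
	int leftmost = 1;

	while (*link) {
		struct sched_entity *entry;

		parent = *link;
		entry = rb_entry(parent, struct sched_entity, run_node);
		if (se->vruntime < entry->vruntime) {
			link = &parent->rb_left;	/* smaller vruntime goes left */
		} else {
			link = &parent->rb_right;
			leftmost = 0;			/* went right at least once */
		}
	}

	/* Cache the leftmost node so that picking the next task is cheap. */
	if (leftmost)
		cfs_rq->rb_leftmost = &se->run_node;

	rb_link_node(&se->run_node, parent, link);
	rb_insert_color(&se->run_node, &cfs_rq->tasks_timeline);
}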
How CFS works
The completely fair scheduling algorithm relies on a virtual clock that measures how much CPU time a waiting process would have received on a completely fair system. The data structure contains no explicit virtual clock, however, because its value can always be computed from the real clock and the load weight associated with each process.
All calculations involving the virtual clock are performed in update_curr, which is called from various places in the system (one call path is sketched after the listing below).
336 static void update_curr(struct cfs_rq *cfs_rq)
337 {
338 	struct sched_entity *curr = cfs_rq->curr;
339 	u64 now = rq_of(cfs_rq)->clock;
340 	unsigned long delta_exec;
341 
342 	if (unlikely(!curr))
343 		return;
344 
345 	/*
346 	 * Get the amount of time the current task was running
347 	 * since the last time we changed load (this cannot
348 	 * overflow on 32 bits):
349 	 */
350 	delta_exec = (unsigned long)(now - curr->exec_start);
351 
352 	__update_curr(cfs_rq, curr, delta_exec);
353 	curr->exec_start = now;
354 
355 	if (entity_is_task(curr)) {
356 		struct task_struct *curtask = task_of(curr);
357 
358 		cpuacct_charge(curtask, delta_exec);
359 	}
360 }
339 rq_of(cfs_rq)->clock implements the clock of the run queue; its value is updated each time the periodic scheduler runs.
350 curr->exec_start stores the time at which the load statistics were last updated, not the time at which the process last started running; while it is running, the current process may pass through update_curr several times.
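One of the places from which update_curr is reached is the periodic tick. The sketch below condenses that path; it is a simplified rendering of the task_tick_fair()/entity_tick() path in kernel/sched_fair.c, not the verbatim source.

/* Simplified: how the periodic scheduler ends up in update_curr().
 * In the kernel, task_tick_fair() walks the scheduling entities and
 * calls entity_tick(), which updates the statistics first. */
static void sketch_entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
{
	/* Bring vruntime, min_vruntime etc. of the running entity up to date. */
	update_curr(cfs_rq);

	/* Afterwards the kernel decides whether the running entity has
	 * used up its share and should be preempted. */
}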
__update_curr
304 static inline void
305 __update_curr(struct cfs_rq *cfs_rq, struct sched_entity *curr,
306 	      unsigned long delta_exec)
307 {
308 	unsigned long delta_exec_weighted;
309 	u64 vruntime;
310 
311 	schedstat_set(curr->exec_max, max((u64)delta_exec, curr->exec_max));
312 
313 	curr->sum_exec_runtime += delta_exec;
314 	schedstat_add(cfs_rq, exec_clock, delta_exec);
315 	delta_exec_weighted = delta_exec;
316 	if (unlikely(curr->load.weight != NICE_0_LOAD)) {
317 		delta_exec_weighted = calc_delta_fair(delta_exec_weighted,
318 							&curr->load);
319 	}
320 	curr->vruntime += delta_exec_weighted;
321 
322 	/*
323 	 * maintain cfs_rq->min_vruntime to be a monotonic increasing
324 	 * value tracking the leftmost vruntime in the tree.
325 	 */
326 	if (first_fair(cfs_rq)) {
327 		vruntime = min_vruntime(curr->vruntime,
328 				__pick_next_entity(cfs_rq)->vruntime);
329 	} else
330 		vruntime = curr->vruntime;
331 
332 	cfs_rq->min_vruntime =
333 		max_vruntime(cfs_rq->min_vruntime, vruntime);
334 }
313 sum_exec_runtime is the CPU time consumed by the process so far; delta_exec is the time that has elapsed since the load statistics were last updated. Both are real-time values.
316 If the process runs at nice level 0 (priority 120, load weight NICE_0_LOAD), the virtual time equals the physical time; otherwise the weighted virtual execution time is computed by calc_delta_fair (which internally uses calc_delta_mine) to scale the real time according to the process's load weight.
326 first_fair checks whether the tree has a leftmost node, i.e. whether any process is waiting on the tree to be scheduled.
332 cfs_rq->min_vruntime is thereby guaranteed to increase monotonically.
While a process runs, the vruntime of its scheduling entity increases steadily. For processes with higher priority (a larger load weight) the virtual clock advances more slowly, so they drift to the right of the red-black tree more slowly and therefore get a better chance of being scheduled; a numeric example follows below.
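The effect of the weighting can be made concrete with a small self-contained program. It reproduces only the essential arithmetic (real time scaled by NICE_0_LOAD/weight); the helper name scaled_delta is invented, the weights 1024, 335 and 3121 are taken from the kernel's nice-to-weight table (nice 0, nice 5 and nice -5), and the real calc_delta_fair additionally works with pre-computed inverse weights and overflow handling.

#include <stdio.h>
#include <stdint.h>

#define NICE_0_LOAD 1024

/* Model of the vruntime update: real time scaled by NICE_0_LOAD / weight. */
static uint64_t scaled_delta(uint64_t delta_exec_ns, unsigned long weight)
{
	return delta_exec_ns * NICE_0_LOAD / weight;
}

int main(void)
{
	uint64_t delta = 10000000ULL;	/* 10 ms of real execution time */

	printf("nice  0: vruntime += %llu ns\n",
	       (unsigned long long)scaled_delta(delta, 1024));	/* 10 ms    */
	printf("nice  5: vruntime += %llu ns\n",
	       (unsigned long long)scaled_delta(delta, 335));	/* ~30.6 ms */
	printf("nice -5: vruntime += %llu ns\n",
	       (unsigned long long)scaled_delta(delta, 3121));	/* ~3.3 ms  */
	return 0;
}

For the same 10 ms of real execution, the nice -5 task thus accumulates virtual time almost an order of magnitude more slowly than the nice 5 task, which is exactly why it stays further to the left of the tree and is picked more often.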
Latency tracking
The kernel has a built-in notion of scheduling latency: a time interval during which every runnable process should run at least once.
sysctl_sched_latency
This parameter controls the behavior. It defaults to 20 ms and can be set via /proc/sys/kernel/sched_latency_ns.
sched_nr_latency
Controls the maximum number of active processes handled within one latency period. If the number of active processes exceeds this limit, the latency period is stretched proportionally (linearly).
__sched_period
The length of the latency period is normally sysctl_sched_latency. If more processes are runnable than sched_nr_latency allows, however, the length is computed by the following formula:
__sched_period = sysctl_sched_latency * nr_running / sched_nr_latency
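Written out as code, the period computation looks roughly like the sketch below. It is rendered directly from the formula above rather than copied from the kernel, so treat __sched_period() in kernel/sched_fair.c as the authoritative version; the name sketch_sched_period is invented.

/* Sketch of the latency-period computation, derived from the formula above. */
static u64 sketch_sched_period(unsigned long nr_running)
{
	u64 period = sysctl_sched_latency;	/* 20 ms by default */

	/* Stretch the period only when more processes are runnable than
	 * fit into one latency interval. */
	if (nr_running > sched_nr_latency) {
		period *= nr_running;
		do_div(period, sched_nr_latency);
	}
	return period;
}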
Within one latency period, the period is distributed among the active processes according to their load weights. For the process represented by a given scheduling entity, its share is computed in sched_slice as follows:
static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
	u64 slice = __sched_period(cfs_rq->nr_running);

	slice *= se->load.weight;
	do_div(slice, cfs_rq->load.weight);

	return slice;
}
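As a worked example, assume the 20 ms default latency and a sched_nr_latency of 5 (the latter value is an assumption for this illustration): with 10 runnable nice-0 tasks of weight 1024 each, the period is stretched to 20 ms * 10 / 5 = 40 ms, and each task's slice is 40 ms * 1024 / 10240 = 4 ms. The small self-contained program below reproduces this arithmetic; it only mirrors the kernel logic.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* Assumptions for the example: 20 ms latency, sched_nr_latency = 5,
	 * 10 runnable nice-0 tasks (weight 1024 each, 10240 in total). */
	uint64_t period = 20000000ULL * 10 / 5;		/* period formula: 40 ms */
	uint64_t slice  = period * 1024 / (10 * 1024);	/* sched_slice: 4 ms */

	printf("period = %llu ns\n", (unsigned long long)period);
	printf("slice  = %llu ns\n", (unsigned long long)slice);
	return 0;
}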