The completely fair scheduling class is an instance of the generic scheduling class (struct sched_class):
static const struct sched_class fair_sched_class = {
	.next			= &idle_sched_class,
	.enqueue_task		= enqueue_task_fair,
	.dequeue_task		= dequeue_task_fair,
	.yield_task		= yield_task_fair,
	.check_preempt_curr	= check_preempt_wakeup,
	.pick_next_task		= pick_next_task_fair,
	.put_prev_task		= put_prev_task_fair,
	.set_curr_task		= set_curr_task_fair,
	.task_tick		= task_tick_fair,
	.task_new		= task_new_fair,
};
The main scheduler and the periodic scheduler invoke the functions of the completely fair or the real-time scheduling class depending on the type of the process concerned (a short dispatch sketch follows the list below).
@ next points to the idle scheduling class, while the next pointer of the real-time scheduling class points to the completely fair scheduling class. This ordering is established at compile time and is never built dynamically while the system is running.
@ enqueue_task: adds a process to the run queue. This happens, for instance, when a process changes from a sleeping to a runnable state.
@ dequeue_task: removes a process from the run queue. This happens when the process switches from a runnable state to a state in which it cannot run. Although the term queue is used, the internal representation is left entirely to the scheduler class.
@ yield_task: a process can voluntarily give up the CPU by issuing the sched_yield system call, in which case the kernel invokes the yield_task function of its scheduler class.
@ check_preempt_curr: called to preempt the current process with a newly woken one if that is necessary.
@ pick_next_task: selects the next process to run.
@ put_prev_task: called before the currently running process is replaced by another one.
@ set_curr_task: called when the scheduling policy of the current process changes, so that the class can set up the task as the CPU's current task again.
@ task_tick: called by the periodic scheduler each time it is activated.
@ task_new: used to notify the scheduler class whenever a new task has been created.
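How the core scheduler reaches these hooks can be illustrated with a minimal sketch. The function names sketch_periodic_tick and sketch_wakeup below are invented for illustration; in the kernel the corresponding calls live in the generic code of kernel/sched.c. The essential point is that the core scheduler always goes through the sched_class pointer of the task and never calls CFS functions directly.

/* Illustrative only: dispatching through the scheduling-class pointer.
 * sketch_periodic_tick and sketch_wakeup are invented names. */
static void sketch_periodic_tick(struct rq *rq)
{
	struct task_struct *curr = rq->curr;

	/* The periodic scheduler does not know (or care) which policy
	 * curr uses; for a CFS task this ends up in task_tick_fair(). */
	curr->sched_class->task_tick(rq, curr);
}

static void sketch_wakeup(struct rq *rq, struct task_struct *p)
{
	/* A task that becomes runnable is enqueued via its own class,
	 * so a CFS task ends up in enqueue_task_fair(). */
	p->sched_class->enqueue_task(rq, p, 1 /* wakeup */);
}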
Data Structure
An instance of the following structure is embedded in each run queue of the main scheduler:
230 /* CFS-related fields in a runqueue */
231 struct cfs_rq {
232 	struct load_weight load;
233 	unsigned long nr_running;
234 
235 	u64 exec_clock;
236 	u64 min_vruntime;
237 
238 	struct rb_root tasks_timeline;
239 	struct rb_node *rb_leftmost;
240 	struct rb_node *rb_load_balance_curr;
241 	/* 'curr' points to currently running entity on this cfs_rq.
242 	 * It is set to NULL otherwise (i.e when none are currently running).
243 	 */
244 	struct sched_entity *curr;
245 
246 	unsigned long nr_spread_over;
262 };
nr_running counts the number of runnable processes on the queue, while load maintains the cumulative load weight of all these processes; the load value is needed to compute the virtual clock.
exec_clock merely records, for statistics, the real time spent executing.
min_vruntime tracks the minimum virtual run time of all processes on the queue.
tasks_timeline is the root of the red-black tree in which all runnable processes are managed, sorted by their virtual run time. The leftmost node is the process with the smallest virtual run time, i.e. the one most urgently in need of being scheduled.
rb_leftmost caches a pointer to the leftmost node of the tree. Strictly speaking it is redundant, because the node could also be found by walking tasks_timeline, but the cache avoids that traversal.
curr points to the scheduling entity of the process currently running on this cfs_rq, and is NULL when none is running.
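To illustrate how tasks_timeline and rb_leftmost work together, the following is a condensed sketch of a sorted insert keyed by vruntime. It is modelled on __enqueue_entity() in kernel/sched_fair.c but is not the verbatim routine: the kernel additionally uses the vruntime relative to min_vruntime as the key and performs further bookkeeping, and the name sketch_enqueue_entity is invented.

/* Condensed illustration of inserting an entity into the CFS tree,
 * sorted by vruntime; modelled on __enqueue_entity(), not verbatim. */
static void sketch_enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
	struct rb_node **link = &cfs_rq->tasks_timeline.rb_node;
	struct rb_node *parent = NULL;
	int leftmost = 1;

	while (*link) {
		struct sched_entity *entry;

		parent = *link;
		entry = rb_entry(parent, struct sched_entity, run_node);
		if (se->vruntime < entry->vruntime) {
			link = &parent->rb_left;	/* smaller vruntime goes left */
		} else {
			link = &parent->rb_right;
			leftmost = 0;			/* went right at least once */
		}
	}

	/* Cache the leftmost node so that picking the next task is cheap. */
	if (leftmost)
		cfs_rq->rb_leftmost = &se->run_node;

	rb_link_node(&se->run_node, parent, link);
	rb_insert_color(&se->run_node, &cfs_rq->tasks_timeline);
}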
How CFS works
The completely fair scheduling algorithm relies on a virtual clock that measures how much CPU time a waiting process would have received on a completely fair system. The data structure contains no explicit virtual clock, however, because its value can always be computed from the real clock and the load weight associated with each process.
All calculations involving the virtual clock are performed in update_curr, which is called from various places in the system (one call path is sketched after the listing below).
336 static void update_curr(struct cfs_rq *cfs_rq)
337 {
338 	struct sched_entity *curr = cfs_rq->curr;
339 	u64 now = rq_of(cfs_rq)->clock;
340 	unsigned long delta_exec;
341 
342 	if (unlikely(!curr))
343 		return;
344 
345 	/*
346 	 * Get the amount of time the current task was running
347 	 * since the last time we changed load (this cannot
348 	 * overflow on 32 bits):
349 	 */
350 	delta_exec = (unsigned long)(now - curr->exec_start);
351 
352 	__update_curr(cfs_rq, curr, delta_exec);
353 	curr->exec_start = now;
354 
355 	if (entity_is_task(curr)) {
356 		struct task_struct *curtask = task_of(curr);
357 
358 		cpuacct_charge(curtask, delta_exec);
359 	}
360 }
339 rq_of(cfs_rq)->clock implements the clock of the run queue; its value is updated each time the periodic scheduler runs.
350 curr->exec_start stores the time at which the load statistics were last updated, not the time at which the process last started running; while it is running, the current process may pass through update_curr several times.
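One of the places from which update_curr is reached is the periodic tick. The sketch below condenses that path; it is a simplified rendering of the task_tick_fair()/entity_tick() path in kernel/sched_fair.c, not the verbatim source.

/* Simplified: how the periodic scheduler ends up in update_curr().
 * In the kernel, task_tick_fair() walks the scheduling entities and
 * calls entity_tick(), which updates the statistics first. */
static void sketch_entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
{
	/* Bring vruntime, min_vruntime etc. of the running entity up to date. */
	update_curr(cfs_rq);

	/* Afterwards the kernel decides whether the running entity has
	 * used up its share and should be preempted. */
}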
__update_curr
304 static inline void
305 __update_curr(struct cfs_rq *cfs_rq, struct sched_entity *curr,
306 	      unsigned long delta_exec)
307 {
308 	unsigned long delta_exec_weighted;
309 	u64 vruntime;
310 
311 	schedstat_set(curr->exec_max, max((u64)delta_exec, curr->exec_max));
312 
313 	curr->sum_exec_runtime += delta_exec;
314 	schedstat_add(cfs_rq, exec_clock, delta_exec);
315 	delta_exec_weighted = delta_exec;
316 	if (unlikely(curr->load.weight != NICE_0_LOAD)) {
317 		delta_exec_weighted = calc_delta_fair(delta_exec_weighted,
318 							&curr->load);
319 	}
320 	curr->vruntime += delta_exec_weighted;
321 
322 	/*
323 	 * maintain cfs_rq->min_vruntime to be a monotonic increasing
324 	 * value tracking the leftmost vruntime in the tree.
325 	 */
326 	if (first_fair(cfs_rq)) {
327 		vruntime = min_vruntime(curr->vruntime,
328 				__pick_next_entity(cfs_rq)->vruntime);
329 	} else
330 		vruntime = curr->vruntime;
331 
332 	cfs_rq->min_vruntime =
333 		max_vruntime(cfs_rq->min_vruntime, vruntime);
334 }
313 sum_exec_runtime is the CPU time consumed by the process so far; delta_exec is the time that has elapsed since the load statistics were last updated. Both are real-time values.
316 If the process runs at nice level 0 (priority 120, load weight NICE_0_LOAD), the virtual time equals the physical time; otherwise the weighted virtual execution time is computed by calc_delta_fair (which internally uses calc_delta_mine) to scale the real time according to the process's load weight.
326 first_fair checks whether the tree has a leftmost node, i.e. whether any process is waiting on the tree to be scheduled.
332 cfs_rq->min_vruntime is thereby guaranteed to increase monotonically.
While a process runs, the vruntime of its scheduling entity increases steadily. For processes with higher priority (a larger load weight) the virtual clock advances more slowly, so they drift to the right of the red-black tree more slowly and therefore get a better chance of being scheduled; a numeric example follows below.
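The effect of the weighting can be made concrete with a small self-contained program. It reproduces only the essential arithmetic (real time scaled by NICE_0_LOAD/weight); the helper name scaled_delta is invented, the weights 1024, 335 and 3121 are taken from the kernel's nice-to-weight table (nice 0, nice 5 and nice -5), and the real calc_delta_fair additionally works with pre-computed inverse weights and overflow handling.

#include <stdio.h>
#include <stdint.h>

#define NICE_0_LOAD 1024

/* Model of the vruntime update: real time scaled by NICE_0_LOAD / weight. */
static uint64_t scaled_delta(uint64_t delta_exec_ns, unsigned long weight)
{
	return delta_exec_ns * NICE_0_LOAD / weight;
}

int main(void)
{
	uint64_t delta = 10000000ULL;	/* 10 ms of real execution time */

	printf("nice  0: vruntime += %llu ns\n",
	       (unsigned long long)scaled_delta(delta, 1024));	/* 10 ms    */
	printf("nice  5: vruntime += %llu ns\n",
	       (unsigned long long)scaled_delta(delta, 335));	/* ~30.6 ms */
	printf("nice -5: vruntime += %llu ns\n",
	       (unsigned long long)scaled_delta(delta, 3121));	/* ~3.3 ms  */
	return 0;
}

For the same 10 ms of real execution, the nice -5 task thus accumulates virtual time almost an order of magnitude more slowly than the nice 5 task, which is exactly why it stays further to the left of the tree and is picked more often.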
Latency tracking
The kernel has a built-in notion of scheduling latency: a time interval during which every runnable process should run at least once.
sysctl_sched_latency
This parameter controls the behavior. It defaults to 20 ms and can be set via /proc/sys/kernel/sched_latency_ns.
sched_nr_latency
Controls the maximum number of active processes handled within one latency period. If the number of active processes exceeds this limit, the latency period is stretched proportionally (linearly).
__sched_period
The length of the latency period is normally sysctl_sched_latency. If more processes are runnable than sched_nr_latency allows, however, the length is computed by the following formula:
__sched_period = sysctl_sched_latency * nr_running / sched_nr_latency
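Written out as code, the period computation looks roughly like the sketch below. It is rendered directly from the formula above rather than copied from the kernel, so treat __sched_period() in kernel/sched_fair.c as the authoritative version; the name sketch_sched_period is invented.

/* Sketch of the latency-period computation, derived from the formula above. */
static u64 sketch_sched_period(unsigned long nr_running)
{
	u64 period = sysctl_sched_latency;	/* 20 ms by default */

	/* Stretch the period only when more processes are runnable than
	 * fit into one latency interval. */
	if (nr_running > sched_nr_latency) {
		period *= nr_running;
		do_div(period, sched_nr_latency);
	}
	return period;
}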
Within one latency period, the period is distributed among the active processes according to their load weights. For the process represented by a given scheduling entity, its share is computed in sched_slice as follows:
static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
	u64 slice = __sched_period(cfs_rq->nr_running);

	slice *= se->load.weight;
	do_div(slice, cfs_rq->load.weight);

	return slice;
}
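As a worked example, assume the 20 ms default latency and a sched_nr_latency of 5 (the latter value is an assumption for this illustration): with 10 runnable nice-0 tasks of weight 1024 each, the period is stretched to 20 ms * 10 / 5 = 40 ms, and each task's slice is 40 ms * 1024 / 10240 = 4 ms. The small self-contained program below reproduces this arithmetic; it only mirrors the kernel logic.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* Assumptions for the example: 20 ms latency, sched_nr_latency = 5,
	 * 10 runnable nice-0 tasks (weight 1024 each, 10240 in total). */
	uint64_t period = 20000000ULL * 10 / 5;		/* period formula: 40 ms */
	uint64_t slice  = period * 1024 / (10 * 1024);	/* sched_slice: 4 ms */

	printf("period = %llu ns\n", (unsigned long long)period);
	printf("slice  = %llu ns\n", (unsigned long long)slice);
	return 0;
}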