Linux Process Scheduling

Source: Internet
Author: User

CFS scheduler class

First, note that CFS is not a standalone scheduler but a scheduler class.

Traditional schedulers compute a time slice for each process in the system and let each process run until its slice is used up. When all processes have exhausted their slices, the slices are recalculated. CFS abandons time slices entirely and instead focuses on how long each process has been waiting.

The goal of CFS is to treat every runnable process in the system as fairly as possible, so that no process is disadvantaged. Note that CFS applies only to processes with the SCHED_NORMAL, SCHED_IDLE, and SCHED_BATCH scheduling policies. It does not handle real-time processes with the SCHED_RR and SCHED_FIFO policies.

CFS introduces the concept of a virtual clock, which runs more slowly than the real-time clock. How much slower depends on the number of processes waiting to be selected by the scheduler. With four processes on the queue, the virtual clock runs at 1/4 of real-time speed, so if 20 seconds of real time elapse, the virtual clock advances by only 5 seconds. If the CPU were shared perfectly fairly, each process would receive 5 seconds of CPU time in that interval, so these 5 seconds of virtual time serve as the benchmark.

Assume that the virtual clock of the ready queue is stored in fair_clock and the waiting time of a process in wait_runtime. fair_clock gives the CPU time a process would have received under completely fair conditions. wait_runtime measures how unfairly the process has been treated. The smaller the value of fair_clock - wait_runtime, the more unfairly the process has been treated and the more urgently it needs to run.

CFS uses a red-black tree to manage the processes in the ready queue, keyed by fair_clock - wait_runtime. The leftmost node therefore represents the process that has been treated most unfairly. While a process runs, its wait_runtime is reduced by the time it has run. This increases its key and moves it toward the right of the tree. Another process then becomes the leftmost node and is selected by the scheduler next time.

The model above describes an ideal situation. In practice, many more factors must be taken into account:

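  • Processes with different priorities must be treated differently: more important processes should receive more CPU time.
  • Processes must not be switched too frequently. Every switch requires a context switch, and switching to a process with a different address space also means switching page tables (which may require flushing the TLB), which is costly.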

Scheduler Data Structures

Scheduler

The scheduler uses a series of data structures to sort and manage processes in the system. The way the scheduler works is closely related to the design of these structures.

The scheduler can be activated in two ways:

1. Direct activation, for example when the current process is about to go to sleep or gives up the CPU for some other reason.

2. Periodic activation. At a fixed frequency, the kernel checks whether a process switch is necessary.

task_struct Members

The task_struct of each process contains several scheduling-related members.

struct task_struct {
    ...
    int prio, static_prio, normal_prio;
    struct list_head run_list;
    const struct sched_class *sched_class;
    struct sched_entity se;
    unsigned int policy;
    cpumask_t cpus_allowed;
    unsigned int time_slice;
    unsigned int rt_priority;
    ...
};

static_prio is the static priority of a process, assigned when the process is started. It can be changed with the nice system call.

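prio and normal_prio are the dynamic priorities of a process. normal_prio is computed from the static priority and the scheduling policy. prio is the priority the scheduler actually uses, and the kernel may raise it temporarily (for example, for priority inheritance).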

rt_priority is the priority of real-time processes. The lowest real-time priority is 0 and the highest is 99: the higher the value, the higher the priority. This is the opposite of the static priority of normal processes, where a lower value means a higher priority.

sched_class is the scheduler class to which the process belongs.

policy holds the scheduling policy of the process. Five scheduling policies are currently supported:

  • SCHED_NORMAL is used for normal processes, which are the main subject of this article. They are handled by the CFS scheduler class.
  • SCHED_BATCH and SCHED_IDLE processes are also handled by CFS.
  • SCHED_RR and SCHED_FIFO are used to implement soft real-time processes. They are not handled by the CFS scheduler class but by the real-time scheduler class.

Scheduling Classes

A scheduling class decides which process runs next. The kernel supports different scheduling policies (completely fair scheduling, real-time scheduling, and scheduling of the idle task when there is nothing to do). Each process belongs to exactly one scheduling class, and each scheduling class manages its own processes.

Scheduler classes provide the link between the generic scheduler and the individual scheduling methods. A scheduler class is represented by a data structure that holds a set of function pointers. Each operation the generic scheduler may request is expressed as one of these pointers. This lets the generic scheduler work without knowing the internals of the individual scheduler classes.

Each scheduling class must provide an instance of struct sched_class. The hierarchy between scheduling classes is flat. Real-time processes are handled before completely fair processes, and completely fair processes take precedence over the idle task, which runs only when the CPU has nothing else to do.

User processes cannot interact with scheduler classes directly. They only specify a policy of the form SCHED_xyz, and the scheduler finds the corresponding scheduler class based on that policy. SCHED_NORMAL, SCHED_BATCH, and SCHED_IDLE are mapped to fair_sched_class, while SCHED_RR and SCHED_FIFO are associated with rt_sched_class. fair_sched_class and rt_sched_class are both instances of struct sched_class and represent the completely fair scheduler and the real-time scheduler, respectively.

Ready queue

The main data structure used by the core scheduler to manage active processes is called the ready queue. Each CPU has its own ready queue, and each active process appears on exactly one ready queue. A process cannot run on several CPUs at the same time.

The ready queue is the starting point for many operations of the global scheduler. Note, however, that processes are not managed directly by the general members of the ready queue. That is the job of the individual scheduler classes, so each ready queue embeds class-specific sub-ready queues.

The ready queue is implemented using the following data structure.

All ready queues are held in the runqueues array, with one element per CPU in the system. On a uniprocessor system, the array therefore has only one element.

/*
 * This is the main, per-CPU runqueue data structure.
 *
 * Locking rule: those places that want to lock multiple runqueues
 * (such as the load balancing or the thread migration code), lock
 * acquire operations must be ordered by ascending &runqueue.
 */
struct rq {
    /* runqueue lock: */
    spinlock_t lock;

    /*
     * nr_running and cpu_load should be in the same cacheline because
     * remote CPUs use both these fields when doing load calculation.
     */
    unsigned long nr_running;
    #define CPU_LOAD_IDX_MAX 5
    unsigned long cpu_load[CPU_LOAD_IDX_MAX];
    unsigned char idle_at_tick;
#ifdef CONFIG_NO_HZ
    unsigned char in_nohz_recently;
#endif
    /* capture load from *all* tasks on this cpu: */
    struct load_weight load;
    unsigned long nr_load_updates;
    u64 nr_switches;

    struct cfs_rq cfs;
#ifdef CONFIG_FAIR_GROUP_SCHED
    /* list of leaf cfs_rq on this cpu: */
    struct list_head leaf_cfs_rq_list;
#endif
    struct rt_rq rt;

    /*
     * This is part of a global counter where only the total sum
     * over all CPUs matters. A task can increase this counter on
     * one CPU and if it got migrated afterwards it may decrease
     * it on another CPU. Always updated under the runqueue lock:
     */
    unsigned long nr_uninterruptible;

    struct task_struct *curr, *idle;
    unsigned long next_balance;
    struct mm_struct *prev_mm;

    u64 clock, prev_clock_raw;
    s64 clock_max_delta;

    unsigned int clock_warps, clock_overflows;
    u64 idle_clock;
    unsigned int clock_deep_idle_events;
    u64 tick_timestamp;

    atomic_t nr_iowait;

#ifdef CONFIG_SMP
    struct sched_domain *sd;

    /* For active balancing */
    int active_balance;
    int push_cpu;
    /* cpu of this runqueue: */
    int cpu;

    struct task_struct *migration_thread;
    struct list_head migration_queue;
#endif
    ...
};

nr_running is the number of runnable processes on the queue, counted across all priorities and scheduling classes.

load is the current load measure of the ready queue. The load is essentially proportional to the number of active processes on the queue, with each process weighted by its priority. The virtual clock of each ready queue is based on this value.

cpu_load tracks the past load behavior of the CPU. It is updated by update_cpu_load, which is called from the periodic scheduler_tick. The array is used when balancing load between CPUs.

idle points to the task_struct of the idle process.

cfs and rt are the embedded sub-ready queues for the completely fair and real-time scheduler classes, respectively.

clock and prev_clock_raw implement the ready queue's own clock. The periodic scheduler calls __update_rq_clock to update both values. clock holds the ready-queue clock as of the latest update, and prev_clock_raw holds the raw clock value read during the previous call of __update_rq_clock.

tick_timestamp is set to the current ready-queue clock value each time the periodic scheduler runs.


Less important members

clock_warps and clock_overflows: because of hardware quirks, the raw clock may appear to go backward (a warp) or jump forward when the ready-queue clock is updated. These two counters exist for statistical purposes and record the number of backward warps and forward jumps.

nr_load_updates counts how many times update_cpu_load has been called.

Because the scheduler can work with entities that are more general than processes, it needs a data structure to describe such a schedulable entity. It is defined as follows:

struct sched_entity {
    struct load_weight      load;           /* for load-balancing */
    struct rb_node          run_node;
    unsigned int            on_rq;
    u64                     exec_start;
    u64                     sum_exec_runtime;
    u64                     vruntime;
    u64                     prev_sum_exec_runtime;
};

load specifies the weight of the entity, which determines its share of the total load of the queue. Computing load weights is an important task of the scheduler, because the speed of the virtual clock required by CFS ultimately depends on the load.

run_node is a red-black tree node. The scheduling entity is attached to the red-black tree through this node.

on_rq indicates whether the entity is currently enqueued on a ready queue.

While a process runs, the CPU time it consumes is recorded in sum_exec_runtime for use by CFS. This is real time, not virtual time. The runtime is accumulated continuously by update_curr. Each call computes the difference between the current time and exec_start, adds it to sum_exec_runtime, and then sets exec_start to the current time.

vruntime records how much virtual time has elapsed while the entity was running.

prev_sum_exec_runtime: when a process is taken off the CPU, its sum_exec_runtime is saved in prev_sum_exec_runtime. This value is used later when deciding whether to preempt the process.

Every task_struct embeds a sched_entity, so every process is a schedulable entity. The converse does not hold: a schedulable entity is not necessarily a process. With group scheduling, for example, an entity can represent a whole group of tasks.
