CFS Scheduling in the Linux Kernel

Linux supports three process scheduling policies: SCHED_FIFO, SCHED_RR, and SCHED_NORMAL. Linux supports two types of processes: real-time processes and ordinary processes. Real-time processes can use the SCHED_FIFO and SCHED_RR scheduling policies, while ordinary processes use the SCHED_NORMAL scheduling policy.

This article mainly discusses the scheduling algorithm for ordinary processes. For ease of description, "process" in the following sections refers to an ordinary process.

From the Linux 2.6.23 kernel up to the current Linux 3.3.5 kernel, ordinary processes (using the SCHED_NORMAL scheduling policy) are scheduled with the Completely Fair Scheduler (CFS). CFS borrows the completely fair idea from RSDL/SD: it no longer tracks a process's sleep time and no longer distinguishes interactive processes; it treats all processes uniformly, and that is what fairness means here. In CFS scheduling, the dynamic-priority member prio in the process data structure remains in effect, but the kernel no longer dynamically adjusts the process's dynamic priority.

An ordinary process has a priority of 100-139, and the corresponding nice value is -20 to 19, following the same priority rules as previous versions. The relationship between nice and priority is: priority = nice + 120.

How is fair scheduling achieved? The kernel maintains a virtual runtime (vruntime) for each process. Each time a process runs for a while, its virtual runtime increases, but different processes accumulate different increments for the same amount of actual running time. For example, when a process with a nice value of 0 runs 10 ms, its virtual runtime might increase by 1 vms (a vms is a "virtual millisecond", defined here for ease of description), while a nice 19 process that runs 10 ms has its virtual runtime increase by about 68 vms (1024/15, from the weights discussed below). A process's virtual runtime keeps increasing throughout its lifecycle. The kernel treats the virtual runtime as if it were the actual running time and, for fairness, selects the process that has run for the shortest time. So when scheduling, the kernel always chooses the process with the smallest virtual runtime. As far as the kernel is concerned, this is fair. o(∩_∩)o

If two processes both run 10 ms, how do we determine how many vms each should gain? The virtual-runtime increment depends on the process's nice value: each nice value corresponds to a weight, stored in the kernel's prio_to_weight[] table.

The amount by which a process's virtual runtime increases is proportional to NICE_0_LOAD / nice_n_weight, where NICE_0_LOAD is 1024, the weight of nice value 0, and nice_n_weight is the weight of a process with a nice value of n; for example, a nice -20 process (the highest-priority ordinary process) has a weight of 88761. From this formula it can be seen that the virtual runtime of a high-priority process grows slowly, so its actual running time is long. Likewise, the algorithm guarantees that low-priority processes get a chance to run, although their actual running time is relatively short.

The kernel wants to keep all runnable processes together and quickly pull out the one with the smallest virtual runtime. Because scheduling happens very frequently, the algorithm for finding the right process matters a great deal. In CFS scheduling, the kernel uses a red-black tree, keyed by the process's virtual runtime, to hook up every runnable process. For red-black trees, refer to the red-black tree of the Linux kernel.

First, we introduce an important concept, the scheduling entity (sched_entity): the object being scheduled. Each process's task_struct contains a scheduling-entity member variable se. Why introduce a scheduling entity instead of using the process's task_struct directly? Because CFS supports group scheduling, and a group may contain one or more processes; the group cannot be scheduled through the task_struct of any single process in it, so the concept of the scheduling entity is introduced. For consistency, both process groups and processes use a scheduling entity to hold their scheduling information. CFS group scheduling is described later.

In a multi-core system, each CPU (here meaning a core) corresponds to an instance of the per-CPU variable runqueues, whose data structure is struct rq; this is the top-level data structure of scheduling. It contains a cfs member, of type struct cfs_rq, which is the top-level structure for CFS scheduling on that CPU, in other words the entry point for CFS scheduling. In fact, struct rq also contains an rt member variable; rt is the top-level structure for real-time process scheduling.

struct cfs_rq {  /* for ease of illustration, only some member variables are shown */
    struct rb_root tasks_timeline;
    struct rb_node *rb_leftmost;
    struct sched_entity *curr, *next, *last, *skip;
};

The member variable tasks_timeline points to the root of the red-black tree, and all runnable processes are hooked onto this tree (some of them indirectly). For a single process, as in the figure below, the process data structure task_struct contains the member variable se, the scheduling entity; the scheduling entity se contains a run_node node, through which it is attached to the red-black tree. When selecting the process to schedule next, the kernel searches the red-black tree, finds the process with the smallest virtual runtime, and takes it off the tree. When a running process is switched out, it is inserted back into the red-black tree, now with a larger virtual runtime. Because of the properties of red-black trees, insertion, removal, and lookup are all very efficient, which guarantees the efficiency of CFS scheduling.

Because the image below is not clear, the original Word document is attached directly: CFS scheduling and group scheduling in the Linux kernel.doc

CFS Group Scheduling

Why introduce CFS group scheduling? Suppose users A and B share a machine. We may want A and B to share the CPU resources fairly, but if user A runs 8 processes and user B creates only 1 process, then under plain CFS scheduling (assuming all the user processes have the same priority), user A will consume 8 times the CPU time of user B.
How can we ensure that users A and B share the CPU fairly? Group scheduling can do this. Put the processes belonging to users A and B into separate groups; the scheduler first selects one of the two groups, then selects a process from the chosen group to run. If the two groups are selected with the same probability, users A and B will each get about 50% of the CPU.

Related data structures
In the Linux kernel, the task_group structure is used to manage the groups of group scheduling. All existing task_group instances form a tree structure.
A task_group can contain processes of any scheduling class (specifically, real-time processes and ordinary processes), so task_group includes the scheduling entities and run queues for real-time processes as well as the scheduling entities and run queues for ordinary processes. See the following structure definition:

struct task_group {  /* unrelated member variables omitted */
#ifdef CONFIG_FAIR_GROUP_SCHED
    struct sched_entity **se;       /* ordinary-process scheduling entities, one per CPU */
    struct cfs_rq **cfs_rq;         /* ordinary-process run queues, one per CPU */
#endif
#ifdef CONFIG_RT_GROUP_SCHED
    struct sched_rt_entity **rt_se; /* real-time scheduling entities, one per CPU */
    struct rt_rq **rt_rq;           /* real-time run queues, one per CPU */
#endif
};

Only CFS group scheduling is described below. On a multi-core platform, when the kernel creates a group, it creates a scheduling entity and a run queue for that group on each CPU, as shown outside the red box in the figure above. The group's scheduling entity se on each CPU is scheduled independently. The red box marks the scheduling data structures on one CPU: if the group has a process that can run on that CPU, the group's scheduling entity se on that CPU is hooked onto that CPU's cfs_rq red-black tree, and the group's processes that can run on that CPU are hooked onto the red-black tree pointed to by my_q inside the group's scheduling entity se, as shown in the lower-right corner of the figure above.

The priority of a CFS group: a group is given a fixed priority when it is created, with a nice value of 0, so a group gets the same running time on a CPU as a single nice 0 process would. After the group's se obtains some running time, it distributes that actual running time among the processes on its my_q, using the same CFS algorithm. This is what lets users A and B occupy the same amount of CPU time.

Question: regarding the use of the process weight table, calculating a process's virtual runtime always requires one multiplication and one division, and we all know that multiplication and division cost more CPU cycles. Why not directly modify the values in the weight table prio_to_weight[] so that each value equals the current value divided by 15 and rounded (with the manually computed results substituted back into the table)? The weight of the lowest-priority process would then be 1, and the virtual runtime could be obtained by multiplying directly by the weight value, saving one division. Would that be better?

In addition, the code shows that for a nice 0 process, the kernel simply adds the real time difference when calculating the virtual runtime, skipping the multiplication and division entirely, as shown in the figure below. Presumably this is because the design assumed that most ordinary processes have a nice value of 0.


