Analysis of the Linux 2.6 Completely Fair Scheduler (CFS)

Reposted from http://www.ibm.com/developerworks/cn/linux/l-completely-fair-scheduler/index.html?ca=drs-cn-0125

A Brief History of the Linux Scheduler

The early Linux scheduler used a minimal design and was clearly not aimed at large architectures with many processors, let alone hyper-threading. The 1.2 Linux scheduler used a circular queue to manage runnable tasks and a round-robin scheduling policy. Adding and removing processes was efficient (with a lock protecting the structure). In short, the scheduler was not complex; it was simple and fast.

Linux version 2.2 introduced the concept of scheduling classes, allowing scheduling policies for real-time tasks, non-preemptible tasks, and non-real-time tasks. The 2.2 scheduler also added support for symmetric multiprocessing (SMP).

The 2.4 kernel included a relatively simple scheduler that ran in O(N) time, because it iterated over every task during each scheduling event. The 2.4 scheduler divided time into epochs, and within each epoch every task was allowed to run until its time slice was used up. If a task did not use all of its time slice, half of the remaining slice was added to its new time slice so that it could run longer in the next epoch. The scheduler simply iterated over the tasks, applying a goodness function (metric) to decide which task to run next. Although this approach was relatively simple, it was inefficient, lacked scalability, and was poorly suited to real-time systems. It also lacked the means to exploit new hardware architectures such as multicore processors.

The early 2.6 scheduler, called the O(1) scheduler, was designed to solve the main problem with the 2.4 scheduler: it did not need to iterate over the entire task list to determine the next task to dispatch (hence the name O(1), which meant it was far more efficient and far more scalable). The O(1) scheduler kept track of runnable tasks in run queues (in fact, two run queues for each priority level, one for active tasks and one for expired tasks), so to determine which task to run next, the scheduler simply dequeued the next task from the highest-priority non-empty active run queue. The O(1) scheduler scaled much better and included interactivity metrics, with numerous heuristics for deciding whether a task was I/O-bound or processor-bound. But the O(1) scheduler became unwieldy in the kernel. The large amount of code needed to calculate the heuristics was hard to manage and, for purists, lacked algorithmic substance.

Something had to change to solve the problems of the O(1) scheduler and to respond to other external pressures. The change came from Con Kolivas's kernel patches, including his Rotating Staircase Deadline Scheduler (RSDL), which built on his earlier work on the staircase scheduler. The result of this work was a simply designed scheduler that combined fairness with bounded latency. Kolivas's scheduler impressed many people (and prompted many calls to include it in the then-current 2.6.21 mainline kernel), and it was clear that a scheduler change was on the way. Ingo Molnar, the creator of the O(1) scheduler, then developed CFS around some of Kolivas's ideas. Let's dissect CFS and see how it works at a high level.

CFS Overview

The main idea behind CFS is to maintain balance (fairness) in providing processor time to tasks. This means that each process should be given a fair share of the processor. When tasks fall out of balance (meaning that one or more tasks have not been given a fair amount of time relative to the others), the tasks that are out of balance should be given time to run.

To determine the balance, CFS tracks the amount of processor time provided to each task in what is called its virtual runtime. The smaller a task's virtual runtime (meaning the less time the task has been allowed on the processor), the greater its need for the processor. CFS also includes the concept of sleeper fairness, which ensures that tasks that are not currently runnable (for example, those waiting for I/O) receive a fair share of the processor when they eventually need it.

Unlike earlier Linux schedulers, however, CFS does not keep tasks in a run queue; instead it maintains a time-ordered red-black tree (see Figure 1). A red-black tree has several interesting and useful properties. First, it is self-balancing, which means that no path in the tree is ever more than twice as long as any other. Second, operations on the tree run in O(log n) time (where n is the number of nodes in the tree). This means that tasks can be inserted or deleted quickly and efficiently.

Figure 1: Example of a red-black tree
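The O(log n) claim follows from the two red-black invariants: every root-to-leaf path contains the same number of black nodes (the black-height, bh), and a red node never has a red child, so at least half of the nodes on any path are black. Writing h for the height and n for the number of internal nodes, the standard textbook bound is:

```latex
% bh(x): number of black nodes on any path from x down to a leaf (x excluded)
n \;\ge\; 2^{\mathrm{bh}(\mathrm{root})} - 1
  \qquad\text{and}\qquad
\mathrm{bh}(\mathrm{root}) \;\ge\; \frac{h}{2}
\quad\Longrightarrow\quad
n \;\ge\; 2^{h/2} - 1
\quad\Longrightarrow\quad
h \;\le\; 2\log_2(n+1) = O(\log n)
```

The same two invariants give the "twice as long" property: the shortest possible path is all black, and the longest alternates red and black nodes, so it contains at most twice as many nodes.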

With tasks (represented by sched_entity objects) stored in the time-ordered red-black tree, the tasks with the greatest need for the processor (lowest virtual runtime) are stored toward the left side of the tree, and the tasks with the least need for the processor (highest virtual runtime) are stored toward the right side. To be fair, the scheduler picks the leftmost node of the red-black tree to run next. The task accounts for its time on the CPU by adding its execution time to its virtual runtime and is then inserted back into the tree if it is still runnable. In this way, tasks on the left side of the tree are given time to run, and the contents of the tree migrate from right to left to maintain fairness. Each runnable task therefore chases the others, keeping execution balanced across the whole set of runnable tasks.

CFS Internals

All tasks in Linux are represented by a task structure called task_struct. This structure (along with others associated with it) fully describes the task, including its current state, its stack, its process identity, its priority (static and dynamic), and so on. You can find these and related structures in ./linux/include/linux/sched.h. But because not all tasks are runnable, you will not find any CFS-related fields directly in task_struct. Instead, a new structure named sched_entity was created to track scheduling information (see Figure 2).

Figure 2: Hierarchy of tasks and red-black trees

The relationships of the various structures are shown in Figure 2. The root of the tree is referenced through the rb_root element of the cfs_rq structure (in ./kernel/sched.c). Leaves of the red-black tree contain no information, but internal nodes represent one or more runnable tasks. Each node of the red-black tree is represented by an rb_node, which contains only the child references and the color of the parent. The rb_node is contained in the sched_entity structure, which holds the rb_node reference, the load weight, and various statistics. Most importantly, sched_entity contains vruntime (a 64-bit field), which records the amount of time the task has run and serves as the index for the red-black tree. Finally, task_struct sits at the top; it fully describes the task and contains the sched_entity structure.

As far as the CFS part is concerned, the scheduling function is quite simple. In ./kernel/sched.c you will find the generic schedule() function, which preempts the currently running task (unless the task gives up the processor first through yield()). Note that CFS has no real notion of time slices for preemption, because the preemption time is variable. The currently running task (now preempted) is returned to the red-black tree through a call to put_prev_task (via the scheduling class). When the schedule function needs to determine the next task to dispatch, it calls the pick_next_task function. This function is also generic (in ./kernel/sched.c), but it calls into the CFS scheduler through the scheduling class. The CFS version of pick_next_task can be found in ./kernel/sched_fair.c (as pick_next_task_fair()). This function simply takes the leftmost task from the red-black tree and returns the associated sched_entity. With this reference, a simple call to task_of() yields the corresponding task_struct. The generic scheduler finally gives the processor to this task.

Priority and CFS

CFS does not use priorities directly; instead, it uses them as a decay factor for the time a task is allowed to run. Lower-priority tasks have a higher decay factor, and higher-priority tasks have a lower one. This means the time a task is allowed to run is used up more quickly for a low-priority task than for a high-priority task. It is an elegant way to avoid maintaining a separate run queue per priority level.

CFS Group Scheduling

Another interesting aspect of CFS is group scheduling (introduced in the 2.6.24 kernel). Group scheduling is another way to bring fairness to scheduling, particularly for tasks that spawn many other tasks. Consider a server that spawns many tasks to parallelize incoming connections (a typical architecture for HTTP servers). Instead of treating all tasks uniformly, CFS introduces groups to account for this behavior. The server processes that spawn the tasks share their virtual runtimes across the group (in a hierarchy), while a single task maintains its own independent virtual runtime. In this way, the single task receives roughly the same scheduling time as the whole group. You will find a /proc interface for managing the process hierarchies, giving you full control over how groups are formed. With this configuration, you can assign fairness across users, across processes, or variations of each.

Scheduling Classes and Domains

Also introduced with CFS is the concept of scheduling classes (see Figure 2). Each task belongs to a scheduling class, which determines how the task will be scheduled. A scheduling class defines a common set of functions (through sched_class) that define the behavior of the scheduler. For example, each scheduler provides a way to add a task to be scheduled, to pull the next task to run, to yield to the scheduler, and so on. The scheduling classes are linked to one another in a singly linked list, which allows them to be iterated over (for example, when a given processor is enabled or disabled). The general structure is shown in Figure 3. Note that the enqueue and dequeue task functions simply add or remove a task from the particular scheduling structure. The pick_next_task function selects the next task to run (according to the specific policy of the scheduling class).

Figure 3. Graphical view of scheduling classes

But don't forget that the scheduling class is part of the task structure itself (see Figure 2). This simplifies operations on a task, regardless of its scheduling class. For example, the following function in ./kernel/sched.c checks whether a newly runnable task should preempt the currently running task (where curr is the currently running task, rq is the per-CPU run queue that holds the CFS red-black tree, and p is the task being considered):

    /* Delegate the preemption check to the current task's scheduling class. */
    static inline void check_preempt(struct rq *rq, struct task_struct *p)
    {
        rq->curr->sched_class->check_preempt_curr(rq, p);
    }

If this task is using the fair scheduling class, check_preempt_curr() resolves to check_preempt_wakeup(). You can see these relationships in ./kernel/sched_rt.c, ./kernel/sched_fair.c, and ./kernel/sched_idle.c.

Scheduling classes are another interesting aspect of the scheduling changes, but the functionality grows further with the addition of scheduling domains. These domains allow you to group one or more processors hierarchically for load balancing and isolation. One or more processors can share scheduling policies (and balance load among themselves) or implement independent scheduling policies to deliberately isolate tasks.

Other Schedulers

As you continue to study scheduling, you will find schedulers in development that push the boundaries of performance and scalability. Undeterred by his Linux experience, Con Kolivas has developed another Linux scheduler, abbreviated BFS. The scheduler is reported to perform better on NUMA systems and mobile devices, and it has been adopted in a derivative of the Android operating system.



