Inside the Linux Scheduler

The latest version of this critical kernel component improves scalability

M. Tim Jones
Posted September 07, 2006
https://www.ibm.com/developerworks/cn/linux/l-scheduler/

This article reviews the Linux 2.6 task scheduler and its most important attributes. Before diving into the details of the scheduler, let's first look at its basic objectives.

What is a scheduler?

Generally speaking, the operating system is the intermediary between applications and the available resources. Typical resources include memory and physical devices, but the CPU can also be considered a resource that the scheduler temporarily assigns to a task for execution (in units of a time slice). The scheduler makes it possible to execute multiple programs concurrently, so the CPU can be shared among users with a variety of needs.

An important goal of the scheduler is to allocate CPU time slices efficiently while providing a good user experience. The scheduler also faces conflicting goals, such as minimizing response times for critical real-time tasks while maximizing overall CPU utilization. Let's look at how the Linux 2.6 scheduler achieves these goals and compare it with earlier schedulers.

Problems with earlier Linux schedulers

Before the 2.6 kernel, the scheduler had an obvious limitation when many tasks were active: it was implemented with an algorithm of O(n) complexity, so the time needed to schedule a task was a function of the number of tasks in the system. In other words, the more active tasks there were, the longer scheduling took. Under heavy workloads the processor spent a great deal of time on scheduling itself, leaving little time for the tasks. The algorithm therefore lacked scalability.

The importance of O-notation
O-notation describes how an algorithm's running time grows with the size of its input. The time required by an O(n) algorithm grows linearly with the input size n, and that of an O(n^2) algorithm grows with the square of the input size. An O(1) algorithm is independent of the input size and completes in constant time.
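
To make the O(n) cost concrete, here is a minimal sketch in C (not the actual 2.4-era kernel code; the list layout and the "weight" field are simplified assumptions standing in for the old scheduler's goodness calculation) of a scheduler that must walk every runnable task to pick the next one:

    #include <stddef.h>

    /* Hypothetical, simplified task descriptor -- not the kernel's task_struct. */
    struct task {
        int weight;          /* desirability of running this task next */
        struct task *next;   /* singly linked list of runnable tasks   */
    };

    /*
     * O(n) selection: every runnable task is examined on every scheduling
     * decision, so the cost grows linearly with the number of runnable tasks.
     */
    struct task *pick_next(struct task *runnable)
    {
        struct task *best = NULL;

        for (struct task *t = runnable; t != NULL; t = t->next)
            if (best == NULL || t->weight > best->weight)
                best = t;

        return best;   /* NULL if nothing is runnable */
    }

The 2.6 scheduler's data structures, described below, are designed so that this kind of walk is never necessary.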

In symmetric multiprocessing (SMP) systems, schedulers prior to 2.6 used a single run queue shared by all processors. This meant a task could be scheduled on any processor, which is good for load balancing but bad for the memory caches. For example, suppose a task is executing on CPU-1 and its data is in that processor's cache. If the task is rescheduled onto CPU-2, its data must be invalidated on CPU-1 and brought into CPU-2's cache.

The earlier scheduler also used a single run-queue lock, so in an SMP system the act of selecting a task to execute locked the run queue against all other processors. The result was that idle processors could only wait for the lock to be released, reducing efficiency.

Finally, preemption was not possible in the earlier kernel: if a low-priority task was executing, a high-priority task had to wait for it to finish.

Introducing the Linux 2.6 scheduler

The 2.6 scheduler was designed and implemented by Ingo Molnar, who has been involved in Linux kernel development since 1995. His motivation for writing the new scheduler was to make it completely O(1) for wakeups, context switches, and timer interrupt overhead. One of the problems that triggered the need for a new scheduler was the Java™ virtual machine (JVM). The Java programming model uses many threads of execution, which creates a lot of scheduling overhead in an O(n) scheduler. An O(1) scheduler is largely unaffected by this kind of high load, so the JVM can execute efficiently.

The 2.6 scheduler addresses the three major issues found in its predecessor (the O(n) algorithm and the SMP scalability problems), as well as other problems. Let's now explore the basic design of the 2.6 scheduler.

The main scheduling structures

First, let's review the structure of the 2.6 scheduler. Each CPU has a run queue containing 140 priority lists that are serviced in first-in, first-out order. Tasks that are scheduled for execution are added to the tail of the list for their priority in their CPU's run queue. Each task has a time slice that determines how long it is permitted to execute. The first 100 priority lists of a run queue are reserved for real-time tasks, and the last 40 are used for user tasks (see Figure 1). We'll see later why this distinction is important.

Figure 1. The run queue structure of the Linux 2.6 scheduler

In addition to each CPU's run queue (called the active runqueue), there is also an expired runqueue. When a task on the active runqueue uses up its time slice, it is moved to the expired runqueue. During the move its time slice is recalculated (and its priority along with it; more on that later). When the active runqueue has no more tasks awaiting execution, the pointers to the active and expired runqueues are swapped, making the expired priority lists the active ones.
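
As a rough illustration of this layout, here is a simplified sketch in C (not the kernel's actual definitions; the names prio_array, task_list, and runqueue and their fields are assumptions modeled on the description above):

    #include <stdint.h>

    #define MAX_RT_PRIO  100           /* priorities 0..99: real-time tasks   */
    #define MAX_PRIO     140           /* priorities 100..139: user tasks     */

    struct task;                       /* opaque task descriptor (sketch)     */

    struct task_list {                 /* FIFO list of tasks at one priority  */
        struct task *head, *tail;
    };

    struct prio_array {
        uint32_t bitmap[5];            /* one bit per priority: 5 x 32 >= 140 */
        struct task_list queue[MAX_PRIO];
        unsigned int nr_active;        /* number of tasks on this array       */
    };

    struct runqueue {                  /* one of these per CPU                */
        struct prio_array *active;     /* tasks with time slice remaining     */
        struct prio_array *expired;    /* tasks that used up their time slice */
        struct prio_array arrays[2];   /* storage for the two arrays          */
    };

    /*
     * When the active array runs out of tasks, the two pointers are simply
     * swapped: the expired array (whose tasks received fresh time slices
     * when they were moved) becomes the active one.  The swap itself is O(1).
     */
    static void swap_arrays(struct runqueue *rq)
    {
        struct prio_array *tmp = rq->active;
        rq->active  = rq->expired;
        rq->expired = tmp;
    }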

The scheduler's job is simple: choose the task on the highest-priority list to execute. To make this process more efficient, the kernel maintains a bitmap that records which priority lists contain runnable tasks. On most architectures, a find-first-bit-set instruction is then used to find the highest-priority set bit across five 32-bit words (covering the 140 priorities). The time needed to find a task therefore depends not on the number of active tasks but on the number of priorities. This makes the 2.6 scheduler an O(1) algorithm: scheduling time is constant and unaffected by the number of active tasks.
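
A minimal, self-contained sketch of that lookup (using the GCC/Clang builtin __builtin_ctz to stand in for the architecture's find-first-bit-set instruction; the five-word bitmap mirrors the 140 priorities described above):

    #include <stdint.h>

    #define MAX_PRIO     140                     /* 140 priorities             */
    #define BITMAP_WORDS ((MAX_PRIO + 31) / 32)  /* five 32-bit words          */

    /*
     * Return the numerically smallest (that is, highest) priority that has at
     * least one runnable task, or MAX_PRIO if the bitmap is empty.  At most
     * BITMAP_WORDS words are inspected, so the cost does not depend on the
     * number of runnable tasks.
     */
    static int highest_ready_priority(const uint32_t bitmap[BITMAP_WORDS])
    {
        for (int w = 0; w < BITMAP_WORDS; w++)
            if (bitmap[w] != 0)
                return w * 32 + __builtin_ctz(bitmap[w]);
        return MAX_PRIO;   /* no runnable task */
    }

The scheduler then dequeues the first task on the active array's list for that priority; since at most five words are ever inspected, the lookup cost does not grow with the number of runnable tasks.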

Better support for SMP systems

So what is SMP? SMP is an architecture in which multiple CPUs are available to execute individual tasks simultaneously, unlike traditional asymmetric processing, in which a single CPU executes all tasks. The SMP architecture is beneficial for multithreaded applications.

Although priority scheduling worked on SMP systems, its big-lock architecture meant that while one CPU was choosing a task to dispatch, the run queue was locked by that CPU and every other CPU had to wait. The 2.6 scheduler does not schedule with a single lock; instead, each run queue has its own lock. This allows all CPUs to schedule tasks without contending with the others.
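
The difference can be pictured with a small user-space analogy (POSIX threads mutexes here stand in for kernel spinlocks; the structure and function names are illustrative assumptions, not kernel code):

    #include <pthread.h>

    struct cpu_runqueue {
        pthread_mutex_t lock;     /* per-runqueue lock, one per CPU (2.6 style) */
        /* ... priority arrays as sketched earlier ...                          */
    };

    /* Pre-2.6 style: one lock serializes scheduling decisions on every CPU.    */
    static pthread_mutex_t global_runqueue_lock = PTHREAD_MUTEX_INITIALIZER;

    static void schedule_big_lock(void)
    {
        pthread_mutex_lock(&global_runqueue_lock);  /* all other CPUs wait here */
        /* ... pick the next task ... */
        pthread_mutex_unlock(&global_runqueue_lock);
    }

    /* 2.6 style: each CPU locks only its own run queue, so scheduling on one
     * CPU never blocks scheduling on another.                                  */
    static void schedule_per_cpu(struct cpu_runqueue *this_rq)
    {
        pthread_mutex_lock(&this_rq->lock);
        /* ... pick the next task from this_rq ... */
        pthread_mutex_unlock(&this_rq->lock);
    }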

In addition, because each processor has its own run queue, a task generally stays associated with one CPU and can better exploit that CPU's hot cache.

Task preemption

Another advantage of the Linux 2.6 scheduler is that it allows preemption: a low-priority task will not keep executing while a high-priority task is ready to run. The scheduler preempts the low-priority process, puts it back on its priority list, and reschedules. But wait, there's more.

As if the O(1) behavior and preemption of the 2.6 scheduler were not enough, it also provides dynamic task priorities and SMP load balancing. Let's look at what these features are and what they provide.

Dynamic task priority

To prevent tasks from monopolizing the CPU and thereby starving other tasks that need CPU access, the Linux 2.6 scheduler can dynamically modify a task's priority. It does so by rewarding I/O-bound tasks and penalizing CPU-bound tasks. I/O-bound tasks typically use the CPU only to set up an I/O operation and then sleep while waiting for it to complete; this behavior gives other tasks access to the CPU.

Because I/O-bound tasks are considered altruistic with CPU access, they are rewarded by having their priority decreased by up to five levels; CPU-bound tasks are penalized by having their priority increased by up to five levels. (Recall that a numerically lower priority is scheduled first, so decreasing the value is favorable.)

Better user responsiveness
Tasks that communicate with the user are interactive, so their responsiveness should be better than that of non-interactive tasks. Because communicating with the user, whether sending data to standard output or waiting for input on standard input, is I/O bound, boosting the priority of these tasks yields better interactive responsiveness.

Whether a task is I/O-bound or CPU-bound is determined with an interactivity heuristic: a task's interactivity metric is calculated from how much time it spends executing compared to how much time it spends sleeping. Because I/O-bound tasks schedule an I/O operation and then sleep, they spend more of their time sleeping and waiting for I/O to complete, which raises their interactivity metric.
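
A hedged sketch of how such an adjustment could be computed (loosely modeled on the behavior of effective_prio and recalc_task_prio in sched.c; the constants and the sleep_avg bookkeeping here are simplified assumptions, not the kernel's exact formula):

    #define MAX_RT_PRIO    100    /* user (non-real-time) priorities start here */
    #define MAX_PRIO       140
    #define MAX_BONUS        5    /* reward or penalty is bounded to 5 levels   */
    #define MAX_SLEEP_AVG 1000    /* assumed scale of the sleep average, in ms  */

    /*
     * sleep_avg_ms grows while the task sleeps (e.g. waiting on I/O) and
     * shrinks while it runs, so it serves as the interactivity metric
     * described above.  static_prio is the task's base, nice-derived
     * priority in the range 100..139.
     */
    static int effective_priority(int static_prio, int sleep_avg_ms)
    {
        /* Map the sleep average onto a bonus in the range -5 .. +5. */
        int bonus = (2 * MAX_BONUS * sleep_avg_ms) / MAX_SLEEP_AVG - MAX_BONUS;

        /* Lower numbers mean higher priority, so subtracting the bonus
         * rewards I/O-bound tasks and penalizes CPU-bound ones. */
        int prio = static_prio - bonus;

        if (prio < MAX_RT_PRIO)  prio = MAX_RT_PRIO;   /* never enter RT range */
        if (prio > MAX_PRIO - 1) prio = MAX_PRIO - 1;
        return prio;
    }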

Note that this priority tuning is performed only on user tasks; the priority of real-time tasks is never adjusted.

SMP load balancing

When tasks are created on an SMP system, they are placed on a given CPU's run queue. In general there is no way to know in advance whether a task will be short-lived or will run for a long time, so the initial assignment of tasks to CPUs may not be ideal.

To keep the workload balanced across CPUs, tasks can be redistributed, moving them from a heavily loaded CPU to a more lightly loaded one. The Linux 2.6 scheduler provides this functionality through load balancing: every 200 ms, a processor checks whether the CPU loads are unbalanced and, if they are, performs a balancing of tasks across the CPUs.
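
A much-simplified sketch of the balancing decision (the real load_balance in sched.c works with weighted loads and scheduling domains; the names, the threshold, and the use of a plain task count here are assumptions made only to show the idea):

    #define NR_CPUS 4                       /* assumed CPU count for this sketch */

    /* Runnable-task count per CPU, standing in for the kernel's weighted load. */
    static unsigned int cpu_load[NR_CPUS];

    /*
     * Invoked periodically (roughly every 200 ms) on each CPU: find the busiest
     * CPU and, if the imbalance is large enough, pull some tasks over.
     */
    static void balance_tick(int this_cpu)
    {
        int busiest = this_cpu;

        for (int cpu = 0; cpu < NR_CPUS; cpu++)
            if (cpu_load[cpu] > cpu_load[busiest])
                busiest = cpu;

        /* Only migrate when the imbalance outweighs the cold-cache penalty. */
        if (busiest != this_cpu && cpu_load[busiest] > cpu_load[this_cpu] + 1) {
            unsigned int to_move = (cpu_load[busiest] - cpu_load[this_cpu]) / 2;
            cpu_load[busiest]  -= to_move;
            cpu_load[this_cpu] += to_move;
            /* ... actually dequeue to_move tasks from the busiest run queue
             *     and enqueue them on this CPU's run queue ... */
        }
    }

As the next paragraph notes, every such migration leaves the moved task with a cold cache on its new CPU, which is why the imbalance threshold matters.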

A negative side effect of this process is that the new CPU's cache is cold for the migrated task (its data must be brought into the cache again).

Remember that a CPU cache is local (on-chip) memory that offers much faster access than system memory. If a task has been executing on a CPU, its data is held in that CPU's local cache, which is said to be hot for the task. If none of the task's data is in the CPU's local cache, the cache is said to be cold for that task.

Unfortunately, keeping the CPUs busy in this way can leave the cache cold for migrated tasks.

Digging deeper

The source code for the 2.6 scheduler is well encapsulated in the /usr/src/linux/kernel/sched.c file. Table 1 summarizes some of the more useful functions found in this file.

Table 1. Functions of the Linux 2.6 scheduler

Function            Description
schedule            The main scheduler function. Dispatches the highest-priority task for execution.
load_balance        Checks the CPUs for an imbalance and, if one is found, attempts to migrate tasks.
effective_prio      Returns the effective priority of a task (based on its static priority, but including any rewards or penalties).
recalc_task_prio    Determines a task's reward or penalty based on its idle (sleep) time.
source_load         Conservatively calculates the load of the source CPU (the CPU from which a task could be migrated).
target_load         Liberally calculates the load of a target CPU (a CPU to which a task could be migrated).
migration_thread    High-priority system thread that migrates tasks between CPUs.


The structure of the run queue can also be found in the /usr/src/linux/kernel/sched.c file. The 2.6 scheduler can also provide statistics if CONFIG_SCHEDSTATS is enabled. These statistics are visible in /proc/schedstat in the /proc file system and include a great deal of data for every CPU in the system, including load-balancing and process-migration statistics.
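
Because the statistics are exposed as an ordinary text file, inspecting them requires nothing more than a file read. The exact fields vary between kernel versions, so this small sketch simply dumps the raw contents:

    #include <stdio.h>

    int main(void)
    {
        char line[512];
        FILE *f = fopen("/proc/schedstat", "r");

        if (f == NULL) {
            perror("/proc/schedstat (is CONFIG_SCHEDSTATS enabled?)");
            return 1;
        }
        /* One "cpuN ..." line per processor, plus "domainN ..." lines on
         * SMP kernels with scheduling domains. */
        while (fgets(line, sizeof line, f) != NULL)
            fputs(line, stdout);

        fclose(f);
        return 0;
    }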

Looking ahead

The Linux 2.6 scheduler is a big step forward from earlier Linux schedulers. It greatly improves the ability to maximize CPU utilization while still providing a responsive experience for users. Preemption and better support for multiprocessor architectures move the kernel closer to an operating system that is useful for both the desktop and real-time systems. It is still too early to talk about the 2.8 kernel, but judging from the changes in 2.6, we can expect more good things.
