Linux 2.6 task scheduler and its important attributes


The Linux kernel continues to evolve and adopt new technologies, making great strides in reliability, scalability, and performance. One of the most important features of the 2.6 kernel is the scheduler implemented by Ingo Molnar. This scheduler is dynamic, supports load balancing, and runs in constant time, O(1). This article introduces these attributes of the Linux 2.6 scheduler and more.

This article will review the task scheduler in Linux 2.6 and its most important attributes. Before going into the details of the scheduler, let's first understand the basic goal of the scheduler.

  What is a scheduler?

Generally, the operating system is the intermediary between applications and available resources. Typical resources include memory and physical devices. The CPU can also be considered a resource, one that the scheduler temporarily allocates to a task (in units called time slices). The scheduler makes it possible to execute multiple programs at the same time, thus sharing the CPU among users with varying needs.

An important goal of the scheduler is to allocate CPU time slices efficiently while providing a responsive user experience. The scheduler must also balance conflicting goals, such as minimizing response times for critical real-time tasks while maximizing overall CPU utilization. Next, let's look at how the Linux 2.6 scheduler achieves these goals and how it compares with earlier schedulers.

  Early Linux schedulers

Before the 2.6 kernel, the scheduler had a significant limitation when many tasks were active, because it used an algorithm with O(n) complexity. In such a scheduler, the time it takes to schedule a task is a function of the number of tasks in the system. In other words, the more active tasks there are, the longer scheduling takes. Under very heavy task loads, the processor can spend a large share of its time on scheduling and little on the tasks themselves. The algorithm therefore lacked scalability.
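To make the scaling problem concrete, here is a minimal sketch of a linear-scan task picker. It is not the pre-2.6 kernel code: the `Task` class, its `weight` score, and `pick_next_linear` are hypothetical names used only to show that every scheduling decision touches every runnable task.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    weight: int  # hypothetical score: larger means "more deserving of the CPU"


def pick_next_linear(tasks):
    """Scan every runnable task; return (best task, number of tasks examined)."""
    best = None
    examined = 0
    for task in tasks:
        examined += 1
        if best is None or task.weight > best.weight:
            best = task
    return best, examined


small = [Task(f"t{i}", i % 7) for i in range(10)]
large = [Task(f"t{i}", i % 7) for i in range(10_000)]

_, small_cost = pick_next_linear(small)
_, large_cost = pick_next_linear(large)

# The cost of one scheduling decision grows linearly with the task count.
print(small_cost, large_cost)
```

With a thousand times more tasks, each decision examines a thousand times more entries, which is the behavior the O(1) design is meant to avoid.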

  

The importance of O-notation: O-notation tells us how much time an algorithm will take. The time required by an O(n) algorithm grows linearly with the size of the input n, while the time required by an O(n^2) algorithm grows with the square of the input size. An O(1) algorithm is independent of the input and completes in a constant amount of time.

In a symmetric multiprocessing (SMP) system, the pre-2.6 scheduler used a single run queue for all processors. This meant that a task could be scheduled on any processor, which is good for load balancing but bad for memory caches. For example, suppose a task is executing on CPU-1 and its data is in that processor's cache. If the task is then rescheduled onto CPU-2, its data must be invalidated in CPU-1's cache and brought into CPU-2's cache.

The earlier scheduler also used a single run queue lock. In an SMP system, therefore, when one processor was choosing a task to execute, the other processors were locked out of the run queue. Idle processors could only wait for the lock to be released, which reduced efficiency.

Finally, preemption was not possible in the earlier scheduler: a lower-priority task could keep executing while a higher-priority task waited for it to complete.

  Introduction to the Linux 2.6 Scheduler

The 2.6 scheduler was designed and implemented by Ingo Molnar, who has been involved in Linux kernel development since 1995. His motivation for writing the new scheduler was to create a completely O(1) scheduler for wakeup, context-switch, and timer-interrupt overhead. One of the issues that triggered the need for a new scheduler was the use of Java virtual machines (JVMs). The Java programming model uses many threads of execution, which creates a lot of scheduling overhead in an O(n) scheduler. An O(1) scheduler is not much affected under such high loads, so JVMs execute efficiently.

The 2.6 scheduler resolves the primary problems found in the earlier scheduler (the O(n) algorithm and the SMP scalability issues) and solves other problems as well. Now we will explore the basic design of the 2.6 scheduler.

  Main scheduling structure

First, let's review the structure of the 2.6 scheduler. Each CPU has a run queue (runqueue) made up of 140 priority lists, each serviced in FIFO order. A task that is scheduled to execute is added to the end of the list for its priority in its CPU's run queue. Each task has a time slice that determines how long it is permitted to execute. The first 100 priority lists of the run queue are reserved for real-time tasks, and the last 40 are used for user tasks (see Figure 1). We will see later why this distinction is important.
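The layout described above can be sketched as a small data structure. This is an illustrative model only, not kernel code: `PriorityArray` and `is_realtime` are hypothetical names, and the constants `MAX_RT_PRIO` and `MAX_PRIO` are simply given the values 100 and 140 stated in the text.

```python
from collections import deque

MAX_RT_PRIO = 100  # priorities 0..99 are reserved for real-time tasks
MAX_PRIO = 140     # priorities 100..139 are used for user tasks


class PriorityArray:
    """Toy model of one run queue: 140 FIFO lists, one per priority."""

    def __init__(self):
        self.queues = [deque() for _ in range(MAX_PRIO)]

    def enqueue(self, prio, task):
        """Add a task to the END of the list for its priority."""
        if not 0 <= prio < MAX_PRIO:
            raise ValueError(f"priority {prio} is outside 0..{MAX_PRIO - 1}")
        self.queues[prio].append(task)

    def dequeue(self, prio):
        """Remove the task at the FRONT of the list (FIFO order)."""
        return self.queues[prio].popleft()


def is_realtime(prio):
    """True if the priority falls in the real-time range."""
    return prio < MAX_RT_PRIO


array = PriorityArray()
array.enqueue(120, "editor")
array.enqueue(120, "compiler")
print(array.dequeue(120), is_realtime(120), is_realtime(50))
```

Tasks of equal priority leave a list in the order they entered it, which is all that FIFO servicing means here.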

  

  

  

(Figure 1)

In addition to the CPU's run queue, called the active runqueue, there is also an expired runqueue. When a task on the active runqueue uses up its time slice, it is moved to the expired runqueue. During the move, its time slice is recalculated (and so reflects its priority; more on this later). When no runnable tasks remain on the active runqueue, the pointers to the active and expired runqueues are swapped, so that the expired priority lists become the active ones.
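The active/expired arrangement can be sketched as follows. This is a simplified model under stated assumptions: the `RunQueue` class and its methods are hypothetical, time-slice recalculation is left out, the swap is assumed to happen once the active array holds no tasks at all, and the priority search is a plain loop for readability.

```python
from collections import deque


class RunQueue:
    """Toy per-CPU run queue with an active and an expired priority array."""

    def __init__(self, levels=140):
        self.active = [deque() for _ in range(levels)]
        self.expired = [deque() for _ in range(levels)]

    def add(self, prio, task):
        self.active[prio].append(task)

    def expire(self, prio, task):
        """Call when `task` has used up its time slice."""
        self.active[prio].remove(task)
        self.expired[prio].append(task)

    def pick_next(self):
        """Return (prio, task) for the best runnable task, or None."""
        if not any(self.active):
            # Nothing left to run: swap the two references. No task is
            # copied, so the swap costs the same however many tasks exist.
            self.active, self.expired = self.expired, self.active
        for prio, queue in enumerate(self.active):
            if queue:
                return prio, queue[0]
        return None


rq = RunQueue()
rq.add(120, "a")
rq.add(130, "b")
print(rq.pick_next())   # "a" runs first (lower number, higher priority)
rq.expire(120, "a")
print(rq.pick_next())   # then "b"
rq.expire(130, "b")
print(rq.pick_next())   # arrays are swapped and "a" is runnable again
```

Swapping two references instead of moving tasks back one by one is what keeps the start of a new round cheap.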

The job of the scheduler is simple: it chooses the task on the highest-priority list to execute. To make this efficient, the kernel uses a bitmap to record which priority lists contain tasks. On most architectures, a find-first-bit-set instruction is therefore used to find the highest-priority set bit in one of five 32-bit words (covering the 140 priorities). The time it takes to find a task to execute depends not on the number of active tasks but on the number of priorities. This makes the 2.6 scheduler an O(1) process, because the scheduling time is fixed and independent of the number of active tasks.
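A small sketch of the bitmap idea follows. It is illustrative rather than kernel code: `PriorityBitmap` is a hypothetical class, and a single Python integer stands in for the five 32-bit words. The bit trick `bits & -bits` isolates the lowest set bit, playing the role of the find-first-bit-set instruction.

```python
class PriorityBitmap:
    """One bit per priority level; bit p is set while list p holds a task."""

    def __init__(self, levels=140):
        self.levels = levels
        self.bits = 0

    def set(self, prio):
        if not 0 <= prio < self.levels:
            raise ValueError(f"priority {prio} is out of range")
        self.bits |= 1 << prio

    def clear(self, prio):
        self.bits &= ~(1 << prio)

    def first_set(self):
        """Return the lowest set bit index (the best priority), or None."""
        if self.bits == 0:
            return None
        lowest = self.bits & -self.bits  # isolate the lowest set bit
        return lowest.bit_length() - 1


bitmap = PriorityBitmap()
bitmap.set(135)
bitmap.set(101)
print(bitmap.first_set())  # 101: the lowest number is the highest priority
bitmap.clear(101)
print(bitmap.first_set())  # 135
```

The lookup inspects a fixed number of bits no matter how many tasks sit on the lists, which is why the cost depends on the number of priorities rather than the number of tasks.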

  Better support for SMP Systems

So what is SMP? It is an architecture in which multiple CPUs are available to execute individual tasks simultaneously, unlike traditional asymmetric processing, in which a single CPU executes all tasks. The SMP architecture is particularly beneficial to multithreaded applications.

Although the earlier priority scheduler could work on SMP systems, its big-lock architecture meant that while one CPU was choosing a task to dispatch, the run queue was locked by that CPU and the other CPUs had to wait. The 2.6 scheduler does not use a single lock for scheduling; instead, it has a lock on each run queue. This allows all CPUs to schedule tasks without contending with one another.
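The difference between one global lock and one lock per run queue can be sketched with ordinary threads. This is only an analogy under stated assumptions: `CpuRunQueue`, `drain`, and `run_demo` are hypothetical names, Python threads stand in for CPUs, and the point is simply that each "CPU" takes only its own lock.

```python
import threading
from collections import deque


class CpuRunQueue:
    """Toy run queue that owns its own lock."""

    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.lock = threading.Lock()
        self.tasks = deque()

    def enqueue(self, task):
        with self.lock:
            self.tasks.append(task)

    def pick_next(self):
        with self.lock:  # only this queue is locked, never the others
            return self.tasks.popleft() if self.tasks else None


def drain(rq, picked):
    """Simulate one CPU scheduling tasks from its own run queue."""
    while True:
        task = rq.pick_next()
        if task is None:
            return
        picked.append(task)


def run_demo(num_cpus=4, tasks_per_cpu=100):
    runqueues = [CpuRunQueue(cpu) for cpu in range(num_cpus)]
    for rq in runqueues:
        for i in range(tasks_per_cpu):
            rq.enqueue((rq.cpu_id, i))
    results = [[] for _ in runqueues]
    threads = [
        threading.Thread(target=drain, args=(rq, out))
        for rq, out in zip(runqueues, results)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


print([len(picked) for picked in run_demo()])
```

Because no thread ever waits on another thread's lock, adding more "CPUs" does not add contention in this model.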

In addition, with a run queue per processor, a task generally stays on the same CPU (CPU affinity) and can make better use of that CPU's hot cache.

  Task Preemption

Another advantage of the Linux 2.6 scheduler is that it allows preemption. This means that a lower-priority task will not keep executing while a higher-priority task is ready to run. The scheduler preempts the lower-priority process, places it back on its priority list, and then reschedules.
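The preemption rule can be sketched in a few lines. The `Cpu` class and `wake_up` method are hypothetical, and placing the preempted task at the front of its list is an assumption of this sketch, chosen so that the task resumes before its peers of equal priority.

```python
from collections import deque


class Cpu:
    """Toy CPU: one running task plus per-priority waiting lists."""

    def __init__(self):
        self.lists = {}      # priority -> deque of waiting task names
        self.current = None  # (priority, name) of the running task

    def wake_up(self, prio, name):
        """A task becomes ready; lower numbers mean higher priority."""
        if self.current is None:
            self.current = (prio, name)
        elif prio < self.current[0]:
            # Preempt: put the running task back on its priority list
            # (front of the list is an assumption of this sketch).
            old_prio, old_name = self.current
            self.lists.setdefault(old_prio, deque()).appendleft(old_name)
            self.current = (prio, name)
        else:
            self.lists.setdefault(prio, deque()).append(name)


cpu = Cpu()
cpu.wake_up(120, "batch job")
cpu.wake_up(110, "editor")      # higher priority: preempts the batch job
cpu.wake_up(130, "background")  # lower priority: waits its turn
print(cpu.current, dict(cpu.lists))
```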

But wait, there's more!

As if the O(1) behavior and preemption of the 2.6 scheduler were not enough, the scheduler also provides dynamic task prioritization and SMP load balancing. Let's look at what these features are and the benefits they provide.

  Dynamic task priority

To prevent tasks from hogging the CPU and starving other tasks that need CPU access, the Linux 2.6 scheduler can dynamically alter a task's priority. It does so by penalizing CPU-bound tasks and rewarding I/O-bound tasks. An I/O-bound task typically uses the CPU only to set up an I/O operation and then sleeps while waiting for it to complete, which gives other tasks access to the CPU.

Because I/O-bound tasks are viewed as altruistic about CPU access, their priority value is decreased (a reward, since a lower value means a higher priority) by up to five levels. CPU-bound tasks are penalized by having their priority value increased by up to five levels.
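The reward and penalty arithmetic can be sketched as below. This is a simplified illustration, not the kernel's implementation: `effective_priority` and `MAX_ADJUST` are hypothetical names, the bonus is passed in as a ready-made number, and the result is clamped so that a user task never leaves the user range 100..139.

```python
MAX_RT_PRIO = 100  # first user priority
MAX_PRIO = 140     # one past the last user priority
MAX_ADJUST = 5     # largest reward or penalty, in priority levels


def effective_priority(static_prio, bonus):
    """Apply a bonus in -5..+5 (positive = reward) to a static priority.

    A reward LOWERS the number, because lower numbers run first.
    """
    if static_prio < MAX_RT_PRIO:
        return static_prio  # real-time priorities are never adjusted
    bonus = max(-MAX_ADJUST, min(MAX_ADJUST, bonus))
    prio = static_prio - bonus
    return max(MAX_RT_PRIO, min(MAX_PRIO - 1, prio))


print(effective_priority(120, 5))   # rewarded I/O-bound task
print(effective_priority(120, -5))  # penalized CPU-bound task
print(effective_priority(50, 5))    # real-time task, unchanged
```

The clamp matters at the edges: a rewarded task at priority 101 cannot cross into the real-time range, and a penalized task at 139 cannot fall off the end.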

Whether a task is I/O-bound or CPU-bound is determined by an interactivity heuristic. A task's interactivity metric is calculated from how much time the task spends executing compared with how much time it spends sleeping. Because I/O-bound tasks schedule an I/O operation and then sleep, they spend more time sleeping and waiting for I/O to complete, which increases their interactivity metric.
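One way to picture such a heuristic is to map the share of time a task spends sleeping onto a bonus between -5 and +5. The formula below is an assumption made purely for illustration; the kernel's actual calculation is more involved, and `interactivity_bonus` is a hypothetical name.

```python
def interactivity_bonus(run_time, sleep_time, max_bonus=5):
    """Map the fraction of time spent sleeping onto -max_bonus..+max_bonus.

    Illustrative formula only: a task that always sleeps gets +max_bonus,
    a task that never sleeps gets -max_bonus.
    """
    total = run_time + sleep_time
    if total <= 0:
        return 0  # no history yet: neither reward nor penalty
    sleep_fraction = sleep_time / total
    return round((2 * sleep_fraction - 1) * max_bonus)


# An I/O-bound task: runs briefly, then sleeps waiting for I/O.
print(interactivity_bonus(run_time=10, sleep_time=90))
# A CPU-bound task: runs almost all the time.
print(interactivity_bonus(run_time=90, sleep_time=10))
```

Under this mapping, the more a task sleeps, the larger its reward, which matches the direction of the heuristic described above.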

  

Tasks that communicate with the user are classified as interactive, so they should be more responsive than non-interactive tasks. Because communicating with a user (whether sending data to standard output or waiting for input on standard input) is I/O-bound, boosting the priority of these tasks leads to better interactive responsiveness.

It is worth noting that priority adjustment only applies to user tasks and does not apply to real-time tasks.

  SMP load balancing

When tasks are created in an SMP system, they are placed on a given CPU's run queue. In general, we cannot know in advance whether a task will be short-lived or will run for a long time. Therefore, the initial assignment of tasks to CPUs is likely to be suboptimal.

To keep the task load balanced among CPUs, tasks can be redistributed by moving them from heavily loaded CPUs to lightly loaded ones. The Linux 2.6 scheduler provides this functionality through load balancing. Every 200 ms, a processor checks whether the CPU loads are unbalanced; if they are, it performs a cross-CPU balancing of tasks.
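A minimal sketch of the balancing step follows. It makes several simplifying assumptions: load is measured as the plain number of queued tasks, the `balance` function and its `min_imbalance` threshold are hypothetical, the task to migrate is chosen arbitrarily, and the periodic 200 ms trigger is not modeled.

```python
def balance(runqueues, min_imbalance=2):
    """Move tasks from the busiest queue to the idlest until nearly even.

    `runqueues` is a list of lists, one per CPU. Returns the number of
    tasks migrated.
    """
    moved = 0
    while True:
        busiest = max(runqueues, key=len)
        idlest = min(runqueues, key=len)
        if len(busiest) - len(idlest) < min_imbalance:
            return moved  # loads are close enough; nothing to do
        # Which task to migrate is arbitrary here (the last one queued).
        idlest.append(busiest.pop())
        moved += 1


cpus = [["t1", "t2", "t3", "t4", "t5", "t6"], [], ["t7", "t8"]]
migrated = balance(cpus)
print(migrated, [len(queue) for queue in cpus])
```

Each migration narrows the gap between the busiest and idlest queue, so the loop stops once no queue differs from another by two or more tasks.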

One negative aspect of this process is that the new CPU's cache is cold for a migrated task (its data must be pulled into the cache).

Remember that a CPU's cache is local (on-chip) memory that offers faster access than system memory. If a task has been executing on a CPU and its data has been brought into that CPU's local cache, the cache is said to be hot. If the CPU's local cache holds no data for a task, the cache is said to be cold for that task.

Unfortunately, keeping all CPUs busy means that a migrated task will sometimes find the CPU cache cold.

  Explore more potential

The source code of the 2.6 scheduler is well encapsulated in the file /usr/src/linux/kernel/sched.c. Table 1 summarizes some of the more useful functions found in this file.

Table 1. Functions of the Linux 2.6 scheduler

Function            Description
schedule            The main scheduler function. Schedules the highest-priority task for execution.
load_balance        Checks the CPUs for an imbalance and, if one is found, attempts to migrate tasks.
effective_prio      Returns the effective priority of a task (based on its static priority, but including any rewards or penalties).
recalc_task_prio    Determines a task's reward or penalty based on its idle time.
source_load         Conservatively calculates the load of the source CPU (the CPU from which a task could be migrated).
target_load         Liberally calculates the load of the target CPU (the CPU to which a task could be migrated).
migration thread    High-priority system thread that migrates tasks between CPUs.

The run queue structure can also be found in the /usr/src/linux/kernel/sched.c file. The 2.6 scheduler can also provide statistics (if CONFIG_SCHEDSTATS is enabled). These statistics are available from /proc/schedstat in the /proc file system and provide a wealth of data for each CPU in the system, including load balancing and process migration statistics.
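As a starting point for exploring these statistics, the sketch below collects the per-CPU counter lines from /proc/schedstat. It assumes that per-CPU lines begin with a label such as `cpu0` followed by whitespace-separated integers; the number and meaning of the counters depend on the kernel version, so the sketch does not try to interpret them. `parse_schedstat` is a hypothetical helper name.

```python
def parse_schedstat(text):
    """Return {"cpu0": [counters...], ...} from the text of /proc/schedstat.

    Only lines whose first field starts with "cpu" are kept; other lines
    (such as version or domain lines) are skipped.
    """
    per_cpu = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu"):
            per_cpu[fields[0]] = [int(value) for value in fields[1:]]
    return per_cpu


if __name__ == "__main__":
    try:
        with open("/proc/schedstat") as handle:
            stats = parse_schedstat(handle.read())
    except OSError:
        stats = {}  # file absent, for example when schedstats is disabled
    for cpu, counters in stats.items():
        print(cpu, counters)
```

Consult the documentation shipped with your kernel version before assigning meaning to individual counters.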

  Outlook

The Linux 2.6 scheduler has come a long way from the earlier Linux schedulers. It greatly improves CPU utilization while providing a responsive user experience. Preemption and better support for multiprocessor architectures move the whole system closer to an operating system that is useful for both desktop and real-time systems. It is too early to talk about a Linux 2.8 kernel, but judging from the changes in 2.6, we can expect more good things to come.

Address: http://os.yesky.com/lin/181/2576181.shtml
