Linux scheduler Overview

Source: http://www.ibm.com/developerworks/cn/linux/l-scheduler/

Level: Intermediate

M. Tim Jones (mtj@mtjones.com), Consultant Engineer, Emulex

September 07, 2006

The Linux kernel continues to evolve and adopt new technologies, making great strides in reliability, scalability, and performance. One of the most important features of the 2.6 kernel is the scheduler implemented by Ingo Molnar. This scheduler is dynamic, supports load balancing, and operates in constant time, O(1). This article introduces these attributes of the Linux 2.6 scheduler and more.

This article will review the task scheduler in Linux 2.6 and its most important attributes. Before going into the details of the scheduler, let's first understand the basic goal of the scheduler.

What is a scheduler?

In general, the operating system is the intermediary between applications and the available resources. Typical resources include memory and physical devices. The CPU can also be considered a resource, one that the scheduler temporarily allocates to a task (in units called time slices). The scheduler makes it possible to run multiple programs at what appears to be the same time, and so to share the CPU among users with varying needs.

An important goal of the scheduler is to allocate CPU time slices efficiently while providing a responsive user experience. The scheduler also has to balance conflicting goals, such as minimizing response time for critical real-time tasks while maximizing overall CPU utilization. Next, let's look at how the Linux 2.6 scheduler meets these goals, compared with the earlier schedulers.


Early Linux schedulers

Importance of O-notation
O-notation tells us how the running time of an algorithm grows with its input. The time required by an O(n) algorithm grows linearly with the number of inputs n, while the time required by an O(n^2) algorithm grows with the square of the number of inputs. An O(1) algorithm is independent of the input size and completes in a fixed, bounded amount of time.

Before the 2.6 kernel, the scheduler had a significant limitation when many tasks were active, because it was designed around an algorithm of O(n) complexity. In this kind of scheduler, the time it takes to schedule a task is a function of the number of tasks in the system: the more tasks are active, the longer each scheduling decision takes. Under very high load, the processor can spend much of its time on scheduling itself and little on the tasks. The algorithm therefore lacked scalability.
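To make the scaling problem concrete, here is a minimal Python sketch of a linear-scan pick. It is not the code of the old scheduler: the `Task` record and the `pick_next_linear` helper are hypothetical names chosen for illustration, and the only point is that the work done per scheduling decision grows with the number of runnable tasks.

```python
from collections import namedtuple

# Hypothetical task record: a name and a priority (lower value = more urgent).
Task = namedtuple("Task", ["name", "prio"])


def pick_next_linear(tasks):
    """Scan every runnable task and return (best_task, tasks_examined).

    The number of tasks examined equals the number of runnable tasks,
    which is what makes this kind of pick O(n).
    """
    best = None
    examined = 0
    for task in tasks:
        examined += 1
        if best is None or task.prio < best.prio:
            best = task
    return best, examined
```

Calling `pick_next_linear` on a list of 10 tasks examines 10 entries, and on a list of 1000 tasks it examines 1000, for every single scheduling decision.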

On symmetric multiprocessing (SMP) systems, the pre-2.6 scheduler used a single run queue for all processors. This meant that a task could be scheduled on any processor, which is good for load balancing but bad for memory caches. For example, suppose a task is executing on CPU-1 and its data is in that processor's cache. If the task is then rescheduled onto CPU-2, its data has to be invalidated in CPU-1's cache and brought into CPU-2's cache.

The earlier scheduler also used a single run queue lock, so on an SMP system, one processor choosing a task to execute blocked every other processor from manipulating the run queue. Idle processors could only wait for the lock to be released, which reduced efficiency.

Finally, preemption was not possible in the earlier scheduler; a lower-priority task could keep executing while a higher-priority task waited for it to finish.


Introduction to the Linux 2.6 Scheduler

The 2.6 scheduler was designed and implemented by Ingo Molnar, who has been involved in Linux kernel development since 1995. His motivation for the new scheduler was to create a completely O(1) scheduler for wakeup, context switch, and timer interrupt overhead. One of the issues that triggered the need for a new scheduler was the use of Java virtual machines (JVMs). The Java programming model uses many threads of execution, which results in a lot of scheduling overhead in an O(n) scheduler. An O(1) scheduler does not suffer under high loads, so JVMs execute efficiently.

The 2.6 scheduler resolves the three primary issues found in the previous scheduler (the O(n) algorithm and the SMP scalability issues), as well as other problems. Let's now explore the basic design of the 2.6 scheduler.

Main scheduling Structure

First, let's review the structure of the 2.6 scheduler. Each CPU has a run queue made up of 140 priority lists that are serviced in FIFO order. A task that is scheduled to execute is added to the end of the list for its priority in its run queue. Each task has a time slice that determines how long it is permitted to execute. The first 100 priority lists of the run queue are reserved for real-time tasks, and the last 40 are used for user tasks (see Figure 1). You'll see later why this distinction is important.
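The layout just described can be modeled in a few lines of Python. This is a simplified sketch, not kernel code: the constants `MAX_RT_PRIO` and `MAX_PRIO` borrow the kernel's names for the 100 and 140 boundaries, while the `PrioArray` class and its methods are illustrative assumptions.

```python
from collections import deque

MAX_RT_PRIO = 100   # priorities 0..99 are reserved for real-time tasks
MAX_PRIO = 140      # priorities 100..139 are used for user tasks


class PrioArray:
    """One FIFO list per priority level, 140 lists in total."""

    def __init__(self):
        self.queues = [deque() for _ in range(MAX_PRIO)]
        self.nr_active = 0

    def enqueue(self, name, prio):
        if not 0 <= prio < MAX_PRIO:
            raise ValueError("priority out of range")
        self.queues[prio].append(name)   # new tasks go to the tail of the list
        self.nr_active += 1

    def dequeue_highest(self):
        """Return (prio, name) for the head of the most urgent non-empty list."""
        for prio, queue in enumerate(self.queues):
            if queue:
                self.nr_active -= 1
                return prio, queue.popleft()   # FIFO within one priority
        return None


def is_realtime(prio):
    """True for the first 100 priority lists."""
    return prio < MAX_RT_PRIO
```

Note that `dequeue_highest` walks at most 140 lists no matter how many tasks are queued, so its cost is bounded by the number of priorities rather than the number of tasks.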

Figure 1. Run queue structure of the Linux 2.6 scheduler
 

Besides the CPU's run queue (called the active run queue), there is also an expired run queue. When a task on the active run queue uses up its time slice, it is moved to the expired run queue. During the move, its time slice is recalculated (and so reflects its priority; more on this later). When no tasks remain on the active run queue, the pointers to the active and expired run queues are swapped, so that the expired priority lists become the active ones.
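The following Python sketch models the active and expired queues and the pointer swap. It rests on stated assumptions: the `RunQueue` class is hypothetical, the dictionaries stand in for the priority lists, and `timeslice_ms` is a placeholder formula invented for this example (the article does not give the kernel's formula).

```python
from collections import deque


class RunQueue:
    """Toy per-CPU run queue with an active and an expired set of lists."""

    def __init__(self):
        self.active = {}    # prio -> deque of (name, timeslice_ms)
        self.expired = {}

    @staticmethod
    def timeslice_ms(prio):
        # Placeholder assumption: more urgent (lower) priority, longer slice.
        return max(5, (140 - prio) * 5)

    def add(self, name, prio):
        entry = (name, self.timeslice_ms(prio))
        self.active.setdefault(prio, deque()).append(entry)

    def expire(self, name, prio):
        """The task used up its slice: recalculate it and park the task."""
        entry = (name, self.timeslice_ms(prio))
        self.expired.setdefault(prio, deque()).append(entry)

    def pick_next(self):
        """Return (name, prio, timeslice_ms), or None if nothing is runnable."""
        if not self.active:
            # Swap the two references; no task is copied or re-sorted.
            self.active, self.expired = self.expired, self.active
        if not self.active:
            return None
        prio = min(self.active)          # at most 140 keys to look at
        queue = self.active[prio]
        name, slice_ms = queue.popleft()
        if not queue:
            del self.active[prio]
        return name, prio, slice_ms
```

The swap is a constant-time exchange of two references, which is why refilling the active queue does not depend on how many tasks have expired.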

The job of the scheduler is simple: choose the task on the highest-priority list to execute. To make this efficient, the kernel uses a bitmap that records which priority lists contain tasks. A find-first-bit-set operation is then used to locate the highest-priority set bit among five 32-bit words (covering the 140 priorities). The time needed to find a task to run therefore depends not on the number of active tasks but on the number of priorities. This makes the 2.6 scheduler an O(1) process: the scheduling time is fixed and unaffected by the number of active tasks.
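A small Python model shows why the bitmap lookup is bounded. The `PrioBitmap` class is an illustrative assumption rather than the kernel's data structure, and the lowest-set-bit trick stands in for a hardware find-first-bit-set instruction; bit 0 represents the most urgent priority.

```python
MAX_PRIO = 140
WORD_BITS = 32
NUM_WORDS = (MAX_PRIO + WORD_BITS - 1) // WORD_BITS   # five 32-bit words


class PrioBitmap:
    """One bit per priority list; a set bit means the list is non-empty."""

    def __init__(self):
        self.words = [0] * NUM_WORDS

    def set(self, prio):
        self.words[prio // WORD_BITS] |= 1 << (prio % WORD_BITS)

    def clear(self, prio):
        self.words[prio // WORD_BITS] &= ~(1 << (prio % WORD_BITS))

    def first_set(self):
        """Return the lowest set bit index (most urgent priority), or None."""
        for i, word in enumerate(self.words):
            if word:
                # word & -word isolates the lowest set bit of the word;
                # bit_length() - 1 converts that single bit to its position.
                return i * WORD_BITS + (word & -word).bit_length() - 1
        return None
```

However many tasks are queued, `first_set` inspects at most five words, so the lookup cost is fixed by the number of priorities.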

Better support for SMP Systems

So what is SMP? It is an architecture in which multiple CPUs are available to execute individual tasks simultaneously. It differs from traditional asymmetric processing, in which a single CPU executes all tasks. The SMP architecture is particularly beneficial to multithreaded applications.

Although the earlier priority scheduler also worked on SMP systems, its big-lock architecture meant that while one CPU was choosing a task to dispatch, the run queue was locked by that CPU and the others had to wait. The 2.6 scheduler does not use a single lock for scheduling; instead, it has a lock on each run queue. This allows all CPUs to schedule tasks without contending with other CPUs.

In addition, with a run queue per processor, a task generally stays on one CPU (CPU affinity) and can make better use of that CPU's hot cache.
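The per-queue locking idea can be sketched with Python threads. This is only an analogy under stated assumptions: `CpuRunQueue` and `run_cpu` are hypothetical names, Python threads stand in for CPUs, and `threading.Lock` stands in for the kernel's locking primitive.

```python
import threading


class CpuRunQueue:
    """A run queue that carries its own lock instead of sharing a global one."""

    def __init__(self, cpu):
        self.cpu = cpu
        self.lock = threading.Lock()   # one lock per run queue
        self.tasks = []

    def enqueue(self, task):
        with self.lock:
            self.tasks.append(task)

    def pick(self):
        with self.lock:
            return self.tasks.pop(0) if self.tasks else None


def run_cpu(rq, count, picked):
    """Simulated CPU: queue some tasks, then drain only its own run queue."""
    for i in range(count):
        rq.enqueue("cpu%d-task%d" % (rq.cpu, i))
    while True:
        task = rq.pick()
        if task is None:
            break
        picked.append(task)
```

Because each simulated CPU only ever takes the lock of its own queue, no thread waits on a lock held by another thread's scheduling work.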

Task Preemption

Another advantage of the Linux 2.6 scheduler is that it allows preemption. This means that a lower-priority task will not execute while a higher-priority task is ready to run. The scheduler preempts the lower-priority process, places it back on its priority list, and then reschedules.
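As a rough illustration of that sequence, the Python sketch below marks a CPU for rescheduling when a more urgent task wakes up. The `Cpu` class is hypothetical; the `need_resched` attribute is loosely named after the kernel's reschedule flag, and a sorted list stands in for the priority lists.

```python
class Cpu:
    """Toy CPU that preempts the running task when a more urgent one wakes."""

    def __init__(self):
        self.current = None        # (name, prio) of the running task
        self.ready = []            # tasks waiting to run
        self.need_resched = False

    def wake_up(self, name, prio):
        """A task becomes runnable; lower prio value means more urgent."""
        self.ready.append((name, prio))
        if self.current is None or prio < self.current[1]:
            self.need_resched = True

    def schedule(self):
        """Put the running task back on the ready list and pick the most urgent."""
        if self.current is not None:
            self.ready.append(self.current)
        # sort() is stable, so a preempted task lands behind equal-priority peers.
        self.ready.sort(key=lambda task: task[1])
        self.current = self.ready.pop(0) if self.ready else None
        self.need_resched = False
        return self.current
```

Waking a task that is less urgent than the running one leaves `need_resched` unset, so the running task simply continues.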


But wait, there's more!

As if the O(1) nature of the 2.6 scheduler and preemption weren't enough, the scheduler also offers dynamic task prioritization and SMP load balancing. Let's discuss what these are and the benefits they provide.

Dynamic task priority

To prevent tasks from hogging the CPU and starving other tasks that need it, the Linux 2.6 scheduler can dynamically alter a task's priority. It does so by penalizing CPU-bound tasks and rewarding I/O-bound tasks. An I/O-bound task typically uses the CPU only to set up an I/O operation and then sleeps while waiting for it to complete, which gives other tasks access to the CPU.

Better user responsiveness
Tasks that communicate with the user are interactive, so they should be more responsive than non-interactive tasks. Because communication with a user (whether sending data to standard output or waiting for input on standard input) is I/O-bound, raising the priority of these tasks yields better interactive responsiveness.

Because I/O-bound tasks are viewed as altruistic in their use of the CPU, their priority value is decreased (a reward, since a lower value means a higher priority) by up to five levels. CPU-bound tasks are penalized by having their priority value increased by up to five levels.

Whether a task is I/O-bound or CPU-bound is determined by an interactivity heuristic. A task's interactivity metric is calculated from how much time the task spends executing compared with how much time it spends sleeping. Because an I/O-bound task schedules its I/O and then sleeps, it spends more of its time sleeping and waiting for I/O to complete, which increases its interactivity metric.

It is worth noting that priority adjustment only applies to user tasks and does not apply to real-time tasks.
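The reward-and-penalty scheme can be sketched as follows. This is a simplified model, not the kernel's calculation: the linear mapping from sleep share to bonus is an assumption made for this example, and `interactivity_bonus` and `dynamic_prio` are hypothetical names. Only the range of five levels in either direction and the exemption of real-time tasks come from the article.

```python
MAX_RT_PRIO = 100   # priorities below this are real-time
MAX_PRIO = 140
MAX_BONUS = 5       # up to five levels of reward or penalty


def interactivity_bonus(sleep_ms, run_ms):
    """Map the share of time spent sleeping to a bonus in -5..+5.

    Assumed linear mapping: always sleeping gives +5, always running gives -5.
    """
    total = sleep_ms + run_ms
    if total == 0:
        return 0
    sleep_share = sleep_ms / total
    return round((2 * sleep_share - 1) * MAX_BONUS)


def dynamic_prio(static_prio, sleep_ms, run_ms):
    """Return the adjusted priority; a lower value means a higher priority."""
    if static_prio < MAX_RT_PRIO:
        return static_prio   # real-time tasks are never adjusted
    prio = static_prio - interactivity_bonus(sleep_ms, run_ms)
    # Keep the result inside the user-task range 100..139.
    return max(MAX_RT_PRIO, min(MAX_PRIO - 1, prio))
```

With a static priority of 120, a task that only sleeps ends up at 115, while a task that only computes ends up at 125.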

SMP Load Balancing

When tasks are created on an SMP system, they are placed on a given CPU's run queue. In general, you cannot know in advance whether a task will be short-lived or will run for a long time, so the initial assignment of tasks to CPUs is likely to be suboptimal.

To maintain a balanced workload across CPUs, work can be redistributed by moving tasks from a heavily loaded CPU to a lightly loaded one. The Linux 2.6 scheduler provides this functionality through load balancing. Every 200 ms, a processor checks whether the CPU loads are unbalanced; if they are, it performs a cross-CPU balancing of tasks.

One negative aspect of this process is that the new CPU's cache is cold for a migrated task (its data has to be pulled into the cache).

Remember that a CPU's cache is local (on-chip) memory that offers faster access than system memory. If a task has been executing on a CPU and its data has been brought into that CPU's local cache, the cache is considered hot. If no data for the task is in the CPU's local cache, then for that task the cache is considered cold.

This is unfortunate, but the benefit of keeping all the CPUs busy makes up for the cost of a cold cache for the migrated task.
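The periodic check can be sketched as a pull-style balancer in Python. The sketch makes several assumptions: load is measured simply as queue length, the helper names `busiest_cpu`, `pull_tasks`, and `tick` are hypothetical, and each migrated task would start with a cold cache on its new CPU, which this toy model does not track. Only the 200 ms interval comes from the article.

```python
BALANCE_INTERVAL_MS = 200   # check period quoted in the article


def busiest_cpu(runqueues):
    """Index of the run queue holding the most tasks."""
    return max(range(len(runqueues)), key=lambda cpu: len(runqueues[cpu]))


def pull_tasks(runqueues, this_cpu):
    """Move tasks from the busiest CPU to this_cpu until they differ by at most one."""
    busiest = busiest_cpu(runqueues)
    moved = []
    if busiest == this_cpu:
        return moved
    while len(runqueues[busiest]) - len(runqueues[this_cpu]) > 1:
        task = runqueues[busiest].pop()   # take from the tail of the busy queue
        runqueues[this_cpu].append(task)
        moved.append(task)
    return moved


def tick(now_ms, runqueues):
    """Run the balance check on every CPU, but only on 200 ms boundaries."""
    if now_ms % BALANCE_INTERVAL_MS != 0:
        return {}
    migrations = {}
    for cpu in range(len(runqueues)):
        moved = pull_tasks(runqueues, cpu)
        if moved:
            migrations[cpu] = moved
    return migrations
```

Starting from queues of five tasks and one task, a call to `tick(200, ...)` leaves three tasks on each CPU, while calls between the 200 ms boundaries change nothing.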


Explore more potential

The source for the 2.6 scheduler is well encapsulated in the file /usr/src/linux/kernel/sched.c. Table 1 summarizes some of the more useful functions found in this file.

Table 1. Functions of the Linux 2.6 scheduler

Function name      Description
schedule           The main scheduler function. Schedules the highest-priority task for execution.
load_balance       Checks the CPUs for an imbalance and, if one is found, attempts to migrate tasks.
effective_prio     Returns the effective priority of a task (based on the static priority, but including any rewards or penalties).
recalc_task_prio   Determines a task's reward or penalty based on its idle time.
source_load        Conservatively calculates the load of the source CPU (the CPU from which a task could be migrated).
target_load        Liberally calculates the load of the target CPU (the CPU to which a task could be migrated).
migration_thread   High-priority system thread that migrates tasks between CPUs.

The run queue structure can also be found in /usr/src/linux/kernel/sched.c. The 2.6 scheduler can also provide statistics (if CONFIG_SCHEDSTATS is enabled). These statistics are visible in /proc/schedstat in the /proc file system and include a large amount of data for each CPU in the system, including load-balancing and process-migration statistics.
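If the statistics are enabled, they can be read like any other text file. The Python sketch below is a minimal reader built on assumptions: it expects a `version` line and one whitespace-separated `cpuN` line of integer counters per CPU, and it deliberately leaves the counters uninterpreted because their number and meaning depend on the schedstat version. The `parse_schedstat` helper is a hypothetical name.

```python
def parse_schedstat(text):
    """Split schedstat text into a version number and raw per-CPU counters."""
    stats = {"version": None, "cpus": {}}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "version" and len(fields) > 1:
            stats["version"] = int(fields[1])
        elif fields[0].startswith("cpu"):
            # Keep the counters as a plain list; their meaning is version-specific.
            stats["cpus"][fields[0]] = [int(f) for f in fields[1:]]
    return stats


if __name__ == "__main__":
    try:
        with open("/proc/schedstat") as f:
            print(parse_schedstat(f.read()))
    except (OSError, ValueError) as exc:
        # The file is absent unless the kernel was built with CONFIG_SCHEDSTATS.
        print("schedstat not available:", exc)
```

Lines that start with anything other than `version` or `cpu` are skipped, so extra per-domain lines do not break the parse.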


Outlook

The Linux 2.6 scheduler has come a long way from the earlier Linux schedulers. It greatly improves CPU utilization while also providing a responsive experience for the user. Preemption and better support for multiprocessor architectures move the whole system closer to an operating system that is useful both on the desktop and in real-time systems. It is too early to talk about the Linux 2.8 kernel, but judging from the changes in 2.6, we can look forward to more good things.

About the author

Tim Jones is an embedded software engineer and the author of GNU/Linux Application Programming, AI Application Programming, and BSD Sockets Programming from a Multilanguage Perspective. His engineering background ranges from kernel development for geosynchronous spacecraft to embedded systems architecture and networking protocol development. Tim is a Consultant Engineer at Emulex Corp.
