Kernel notes--process scheduling

Source: Internet
Author: User

The scheduler performs the following tasks:

    • On each clock interrupt (or similar timer event), update the running process's time slice and set the reschedule flag when the process should be switched out
    • Check the reschedule flag when a system call returns or an interrupt handler completes
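
As a rough illustration of the two tasks above, the following user-space sketch models a timer tick that uses up the running task's time slice and raises a reschedule flag, plus the check made on the way back from a system call or interrupt. The names toy_task, toy_tick and toy_should_reschedule are hypothetical and only loosely mirror what the kernel does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a task descriptor. */
struct toy_task {
    int  time_slice;    /* ticks left before the task should be switched out */
    bool need_resched;  /* the "dispatch flag": set when a reschedule is due */
};

/* Called on every simulated clock interrupt for the running task. */
void toy_tick(struct toy_task *t)
{
    assert(t != NULL);
    if (t->time_slice > 0)
        t->time_slice--;
    if (t->time_slice == 0)
        t->need_resched = true;   /* slice used up: ask for a reschedule */
}

/* Called when a system call returns or an interrupt handler completes. */
bool toy_should_reschedule(const struct toy_task *t)
{
    assert(t != NULL);
    return t->need_resched;
}
```

A task that starts with a slice of 3 ticks is flagged on the third call to toy_tick, and the flag is then seen by the next toy_should_reschedule check.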

Schedule function

The kernel function that performs process switching is schedule(), which makes the following calls:

    put_prev_task(rq, prev);
    next = pick_next_task(rq);
    context_switch(rq, prev, next);

schedule() first puts the currently running process back on the run queue, then selects the next process to run (how that choice is made is the job of the scheduling algorithm), and finally performs the context switch.
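
The order of those three calls can be sketched in user space as follows. This is a toy model, not kernel code: the run queue is a fixed-size FIFO ring, plain FIFO order stands in for the scheduling algorithm, and the "context switch" only records which task is current. All toy_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_RQ_CAP 8

struct toy_task { int pid; };

/* Hypothetical run queue: a fixed-size FIFO ring of runnable tasks. */
struct toy_rq {
    struct toy_task *slots[TOY_RQ_CAP];
    size_t head, len;
    struct toy_task *curr;   /* task currently "on the CPU" */
};

/* Step 1: put the previously running task back on the run queue. */
static void toy_put_prev_task(struct toy_rq *rq, struct toy_task *prev)
{
    assert(rq->len < TOY_RQ_CAP);
    rq->slots[(rq->head + rq->len) % TOY_RQ_CAP] = prev;
    rq->len++;
}

/* Step 2: choose the next task; FIFO order stands in for the policy. */
static struct toy_task *toy_pick_next_task(struct toy_rq *rq)
{
    assert(rq->len > 0);
    struct toy_task *next = rq->slots[rq->head];
    rq->head = (rq->head + 1) % TOY_RQ_CAP;
    rq->len--;
    return next;
}

/* Step 3: a real context switch swaps registers and address space;
 * here it only records which task is current. */
static void toy_context_switch(struct toy_rq *rq, struct toy_task *prev,
                               struct toy_task *next)
{
    (void)prev;
    rq->curr = next;
}

/* The three calls, in the same order as in schedule(). */
struct toy_task *toy_schedule(struct toy_rq *rq)
{
    struct toy_task *prev = rq->curr;
    toy_put_prev_task(rq, prev);
    struct toy_task *next = toy_pick_next_task(rq);
    toy_context_switch(rq, prev, next);
    return next;
}
```

With task 1 running and task 2 queued, two successive calls to toy_schedule hand the CPU to task 2 and then back to task 1.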

Process A enters schedule(), and by the time the function returns the CPU is running process B. Calling schedule() therefore amounts to giving up the CPU, and a process that moves from the R state to S or another state always goes through schedule(). In a vmcore, we can see that the last function called by a process in the S state is schedule:

    crash> ps | grep sshd
    1300 1 0 ffff81024fba1810 IN 0.0 26088 1380 sshd
    crash> bt 1300
    PID: 1300 TASK: ffff81024fba1810 CPU: 0 COMMAND: "sshd"
    #0 [ffff81026a2e9cd8] schedule at ffffffff802d9025
    #1 [ffff81026a2e9da0] schedule_timeout at ffffffff802d9a6b
    #2 [ffff81026a2e9df0] do_select at ffffffff801935f2
    #3 [ffff81026a2e9ed0] sys_select at ffffffff80193880
    #4 [ffff81026a2e9f80] system_call at ffffffff8010ad3e
    ......

A process can be switched out involuntarily (preempted), while some processes or threads, such as the ksoftirqd kernel threads, call schedule() themselves to give up the CPU.

Process priority

Early scheduling algorithms computed each process's time slice from its priority and chose the next process to run accordingly. For a normal process, the nice value indicates its priority: the range is -20 to 19, and the lower the nice value, the higher the priority. For a real-time process, the kernel priority range is 0 to 99, and likewise the lower the value, the higher the priority.

In the kernel code, nice values correspond to the priority range 100 to 139. The MAX_RT_PRIO macro defines the boundary between real-time priorities and nice values, and real-time processes take precedence over normal processes.
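
The mapping described above can be written down in a few lines. This is a sketch with function names of my own choosing (nice_to_prio, is_rt_prio); the kernel uses a macro of similar shape, but the code below is a standalone illustration and not taken from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RT_PRIO 100   /* real-time priorities occupy 0..99 */

/* Map a nice value (-20..19) onto the kernel priority range 100..139. */
int nice_to_prio(int nice)
{
    assert(nice >= -20 && nice <= 19);
    return MAX_RT_PRIO + nice + 20;
}

/* A priority below MAX_RT_PRIO belongs to a real-time process. */
bool is_rt_prio(int prio)
{
    return prio >= 0 && prio < MAX_RT_PRIO;
}
```

Under this mapping a nice value of 0 gives priority 120, and the two ends of the nice range give 100 and 139.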

Commands such as top and ps show the priority of a process. The following is the process list from the lower half of top's output:

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    root 0 0 0 0 R 1.9 0.0 60:23.34 ksoftirqd/9
    8689 root 0 273m 67m 11m S 0.0 0.4 2:04.01 java
    11058 root 0 0 0 S 0.0 0.0 1:45.68 kipmi0
    11771 root -98 0 20388 19m 7256 S 0.0 0.1 0:16.06 had
    3 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0

The PR column value plus 100 gives the process priority. In the output above, the java process has priority 120 and is a normal process; the had process has priority 2 (-98 + 100) and is a real-time process. RT in the PR column indicates a real-time process with priority 0.

You can use the renice command to set the priority of a normal process:

    linux # renice 10 -p 21691
    21691: old priority -20, new priority 10
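
A program can also change its own nice value with the POSIX setpriority() and getpriority() calls. The wrappers below (set_own_nice and get_own_nice are names chosen for this sketch) show one way to do it. Raising the nice value needs no privileges, but lowering it usually does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/resource.h>

/* Set the nice value of the calling process.
 * Returns 0 on success, -1 on error (errno is set by setpriority). */
int set_own_nice(int nice)
{
    assert(nice >= -20 && nice <= 19);
    return setpriority(PRIO_PROCESS, 0, nice);   /* who == 0: this process */
}

/* Read the nice value back into *nice_out.
 * getpriority() can legitimately return -1, so errno is cleared first
 * to tell a real value apart from an error. */
int get_own_nice(int *nice_out)
{
    assert(nice_out != NULL);
    errno = 0;
    int value = getpriority(PRIO_PROCESS, 0);
    if (value == -1 && errno != 0)
        return -1;
    *nice_out = value;
    return 0;
}
```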

Scheduling algorithm

The scheduling algorithm has been improved continuously, from the original O(n) algorithm that traversed every runnable process, to the O(1) algorithm in the 2.5 kernel, to CFS (Completely Fair Scheduler) in the 2.6 kernel.

The O(1) scheduling algorithm makes the time needed to select the next process constant, independent of the number of runnable processes, but it has the following drawbacks:

    1. The algorithms that dynamically adjust process priorities are complex, which makes the code hard to maintain
    2. It does not favor interactive programs. If the process that responds to mouse movement has the same priority as a background process, the background process may be scheduled more often while the mouse is moving, so the pointer responds slowly or leaves ghost trails

As a result, the O(1) scheduling algorithm performs acceptably on servers but poorly on desktops, where applications are more interactive.

CFS solves these problems with the O(1) algorithm. Its core data structure is a red-black tree keyed by each process's virtual run time (vruntime). Every time schedule() selects a process, it picks the one with the smallest vruntime, that is, the process at the leftmost node of the red-black tree.

Under CFS, an interactive process gets more chances to run, because it spends most of its time sleeping and its vruntime grows slowly. Suppose there is an interactive process with a vruntime of 1 ms and a CPU-bound process with a vruntime of 10 ms: the scheduler picks the interactive process about 10 times in a row before it schedules the CPU-bound process. Android also uses CFS for scheduling.
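
The selection rule and the 1 ms versus 10 ms example can be tried out with a small model. It makes several simplifying assumptions that are not from the kernel: a linear scan replaces the red-black tree, ties go to the lower array index, and each run adds a fixed amount to the runner's vruntime. The toy_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct toy_entity {
    const char   *name;
    unsigned long vruntime_ms;   /* virtual run time accumulated so far */
};

/* Return the index of the entity with the smallest vruntime.
 * A linear scan stands in for the leftmost-node lookup of the
 * red-black tree; ties go to the lower index. */
size_t toy_pick_min_vruntime(const struct toy_entity *e, size_t n)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (e[i].vruntime_ms < e[best].vruntime_ms)
            best = i;
    return best;
}

/* Count how often entity 0 runs in a row before another entity is picked,
 * assuming every run adds slice_ms to the runner's vruntime. */
unsigned toy_runs_before_switch(struct toy_entity *e, size_t n,
                                unsigned long slice_ms)
{
    assert(n > 1 && slice_ms > 0);
    unsigned runs = 0;
    while (toy_pick_min_vruntime(e, n) == 0) {
        e[0].vruntime_ms += slice_ms;
        runs++;
    }
    return runs;
}
```

With vruntime values of 1 ms and 10 ms and 1 ms added per run, toy_runs_before_switch returns 10: the interactive entity runs until its vruntime passes 10 ms, and the tie at exactly 10 ms goes to it because of the lower-index rule.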

In earlier scheduling algorithms, the nice priority determined the size of a process's time slice. In CFS, the nice value is mapped to a process weight, and a process with a larger weight gets more run time. The prio_to_weight array defines the correspondence between nice values and weights, which are defined in.
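
To make the effect of the weight concrete, the sketch below charges virtual run time in inverse proportion to the weight, so a heavier process accumulates vruntime more slowly and is therefore picked more often. The table reproduces the nice-to-weight values as I recall them from mainline kernels and should be checked against your kernel source. The toy_ function names and the exact form of the division are assumptions of this sketch.

```c
#include <assert.h>

#define NICE_0_LOAD 1024UL   /* weight of a nice-0 task */

/* Nice-to-weight table; index 0 corresponds to nice -20. */
static const unsigned long toy_prio_to_weight[40] = {
    /* -20 */ 88761, 71755, 56483, 46273, 36291,
    /* -15 */ 29154, 23254, 18705, 14949, 11916,
    /* -10 */  9548,  7620,  6100,  4904,  3906,
    /*  -5 */  3121,  2501,  1991,  1586,  1277,
    /*   0 */  1024,   820,   655,   526,   423,
    /*   5 */   335,   272,   215,   172,   137,
    /*  10 */   110,    87,    70,    56,    45,
    /*  15 */    36,    29,    23,    18,    15,
};

/* Look up the weight for a nice value in the range -20..19. */
unsigned long toy_weight_of(int nice)
{
    assert(nice >= -20 && nice <= 19);
    return toy_prio_to_weight[nice + 20];
}

/* Virtual time charged for delta_exec_ns of real run time:
 * scaled by NICE_0_LOAD / weight, so nice 0 is charged 1:1. */
unsigned long long toy_vruntime_delta(unsigned long long delta_exec_ns,
                                      int nice)
{
    return delta_exec_ns * NICE_0_LOAD / toy_weight_of(nice);
}
```

For the same real run time, a process at nice -5 is charged less virtual time than one at nice 0, and a process at nice 5 is charged more.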

Scheduling policy

The kernel provides 5 scheduling policies. The following 3 are for normal processes:

    1. SCHED_OTHER
    2. SCHED_BATCH
    3. SCHED_IDLE

Processes using these 3 policies are all scheduled by CFS, and their priority decreases in the order listed.

For real-time processes, the following 2 policies are available:

    1. SCHED_FIFO
    2. SCHED_RR

FIFO means first come, first served: a process with this policy runs until it blocks on an I/O request or is preempted by a real-time process with a higher priority.

RR stands for round-robin scheduling. It works the same way as FIFO, except that it sets a maximum time slice for which the process may run.

Because real-time processes have higher priority than normal processes, a process with either of these policies preempts normal processes scheduled by CFS as soon as it becomes runnable.
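
From user space, the priority range that each policy accepts can be queried with the POSIX calls sched_get_priority_min() and sched_get_priority_max(). The helper below (policy_prio_range is a name made up for this sketch) wraps the two calls. Note that this user-space value counts upward, so a larger sched_priority means a more urgent real-time process.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

struct prio_range { int min, max; };

/* Query the static priority range the system allows for a policy.
 * Returns 0 on success, -1 if the policy is not recognised. */
int policy_prio_range(int policy, struct prio_range *out)
{
    assert(out != NULL);
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);
    if (lo == -1 || hi == -1)
        return -1;
    out->min = lo;
    out->max = hi;
    return 0;
}
```

On Linux this typically reports 1 to 99 for SCHED_FIFO and SCHED_RR.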

The CFS code is in sched_fair.c, and the scheduling algorithms for real-time processes are defined in sched_rt.c.

Two function calls deal with the scheduling policy of a process, one to get it and one to set it:

    1. sched_getscheduler
    2. sched_setscheduler
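
As a small illustration of the first call, the sketch below asks for the policy of the calling process and turns it into a readable name. The helper names policy_name and own_policy are hypothetical. SCHED_BATCH and SCHED_IDLE are Linux-specific, so they are guarded in case the headers do not expose them.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Translate a policy constant into its name. */
const char *policy_name(int policy)
{
    switch (policy) {
    case SCHED_OTHER: return "SCHED_OTHER";
    case SCHED_FIFO:  return "SCHED_FIFO";
    case SCHED_RR:    return "SCHED_RR";
#ifdef SCHED_BATCH
    case SCHED_BATCH: return "SCHED_BATCH";
#endif
#ifdef SCHED_IDLE
    case SCHED_IDLE:  return "SCHED_IDLE";
#endif
    default:          return "unknown";
    }
}

/* Store the name of the caller's policy in *name_out.
 * Returns 0 on success, -1 if sched_getscheduler() fails. */
int own_policy(const char **name_out)
{
    assert(name_out != NULL);
    int policy = sched_getscheduler(0);   /* pid 0 means the caller */
    if (policy == -1)
        return -1;
    *name_out = policy_name(policy);
    return 0;
}
```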

The following example program uses the sched_setscheduler call to give itself the highest real-time priority:

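    /* sched_test.c */
    #include <stdio.h>
    #include <sched.h>
    #include <unistd.h>

    int main(void)
    {
        struct sched_param param;

        /* Ask for the highest SCHED_FIFO priority; this needs root privileges. */
        param.sched_priority = sched_get_priority_max(SCHED_FIFO);
        if (sched_setscheduler(getpid(), SCHED_FIFO, &param) == -1)
            perror("sched_setscheduler");
        printf("running...\n");
        while (1)
            ;   /* spin forever so the process stays visible in top */
        return 0;
    }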

After starting the program, top shows:

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    24484 root RT 0 3704 R 0.0 1:32.51 sched_test

Unlike a nice value, the priority of a real-time process cannot be changed with the renice command; it is set through a function call such as sched_setscheduler.

Load Balancing

On a multicore CPU, the kernel scheduler has another problem to solve: how to balance load across the cores. If one core is busy running programs while the other cores sit idle, the benefit of having multiple cores is lost.

On symmetric multiprocessing (SMP) systems, the cores within one physical CPU can share caches, but caches are not shared between physical CPUs. Moving a process between cores of the same physical CPU is therefore cheaper than moving it between physical CPUs.

The migration kernel threads do the work of balancing load across cores. A migration thread is woken up as follows:

    1. The clock interrupt handler timer_interrupt calls the scheduler_tick function
    2. scheduler_tick calls the trigger_load_balance function
    3. If the CPU load condition is met, trigger_load_balance raises a softirq through the raise_softirq call
    4. When the softirq is later handled, the migration thread is woken up to carry out the load balancing
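
The chain of steps above can be imitated in a toy model. Everything here is an invented simplification: the "load condition" is that the busiest and idlest queues differ by more than one task, a boolean stands in for the raised softirq, and balancing just moves task counts between per-CPU counters. The function names echo the kernel ones but carry a toy_ prefix to mark them as hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS 4

struct toy_system {
    int  nr_running[TOY_NR_CPUS];  /* runnable tasks queued on each CPU */
    bool softirq_pending;          /* stands in for the raised softirq */
};

/* Index of the CPU with the most queued tasks (first one on ties). */
static int toy_busiest(const struct toy_system *s)
{
    int best = 0;
    for (int c = 1; c < TOY_NR_CPUS; c++)
        if (s->nr_running[c] > s->nr_running[best])
            best = c;
    return best;
}

/* Index of the CPU with the fewest queued tasks (first one on ties). */
static int toy_idlest(const struct toy_system *s)
{
    int best = 0;
    for (int c = 1; c < TOY_NR_CPUS; c++)
        if (s->nr_running[c] < s->nr_running[best])
            best = c;
    return best;
}

/* Step 3: raise the "softirq" only when the load condition is met. */
void toy_trigger_load_balance(struct toy_system *s)
{
    if (s->nr_running[toy_busiest(s)] - s->nr_running[toy_idlest(s)] > 1)
        s->softirq_pending = true;
}

/* Steps 1 and 2: the tick handler calls the load-balance trigger. */
void toy_scheduler_tick(struct toy_system *s)
{
    assert(s != NULL);
    toy_trigger_load_balance(s);
}

/* Step 4: handle the softirq by moving tasks from the busiest queue
 * to the idlest one until they differ by at most one task. */
void toy_run_softirq(struct toy_system *s)
{
    assert(s != NULL);
    if (!s->softirq_pending)
        return;
    s->softirq_pending = false;
    for (;;) {
        int from = toy_busiest(s), to = toy_idlest(s);
        if (s->nr_running[from] - s->nr_running[to] <= 1)
            break;
        s->nr_running[from]--;
        s->nr_running[to]++;
    }
}
```

Starting with 8 tasks on CPU 0 and none elsewhere, one tick followed by the softirq handler leaves 2 tasks on each of the 4 CPUs.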

In addition, the taskset command can bind a process to a specific CPU so that it does not take part in load balancing:

    linux # taskset -c 0 ./sched_test

The command above runs sched_test on CPU 0. ps and top show the effect:

    linux # ps -elf | grep test | grep -v grep
    4 R root 24502 24053 99 -40 - - 633 - 12:49 pts/7 00:00:15 ./sched_test
    linux # top -p 24502
    top - 12:50:19 up 5 days, 3:04, 7 users, load average: 1.55, 1.86, 2.16
    Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie
    Cpu0 : 100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu1 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu2 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu3 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Mem: 23980M total, 375M used, 23604M free, 37M buffers
    Swap: 2055M total, 0M used, 2055M free, 221M cached
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    24502 root RT 0 2532 364 292 R 0.0 0:40.73 sched_test
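
The pinning that taskset performs from the shell can also be done from inside a program with the Linux-specific sched_setaffinity() and sched_getaffinity() calls. The sketch below assumes glibc, which needs _GNU_SOURCE defined before the first include for the CPU_* macros. The three helper names are made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Lowest-numbered CPU the calling process may run on, or -1 on error. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) == -1)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Number of CPUs the calling process may run on, or -1 on error. */
int allowed_cpu_count(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) == -1)
        return -1;
    return CPU_COUNT(&set);
}

/* Restrict the calling process to a single CPU.
 * Returns 0 on success, -1 on error (for example if the CPU is not
 * available to this process). */
int pin_self_to_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);   /* pid 0: the caller */
}
```

Calling pin_self_to_cpu(0) at the start of sched_test would have a similar effect to launching it through taskset, provided CPU 0 is available to the process.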

Reference: Chapter 4, Process Scheduling, Linux Kernel Development, 3rd Edition
