Linux real-time task scheduling algorithm analysis

Source: Internet
Author: User
Tags: high CPU usage

Original address: http://blog.csdn.net/imtgj/article/details/7107489

Recent CPU-usage problems have touched on the Linux kernel's scheduling algorithm, so it is worth understanding properly; hence this article. Linux tasks fall into real-time tasks and non-real-time tasks. Real-time tasks are scheduled by the familiar priority preemption, optionally combined with time slices; the guiding idea is efficiency first. Non-real-time tasks are scheduled by CFS (the Completely Fair Scheduler); as the name suggests, its idea is fairness first, with efficiency also taken into account. Since our system uses real-time tasks heavily, this article focuses on the scheduling mechanism of real-time tasks.


I. Priority Scheduling under UP (Uniprocessor)
Anyone familiar with VxWorks knows that it uses priority preemption, with priorities from 0 to 255 where a smaller number means a higher priority. If time-slice (round-robin) scheduling is enabled as well, a task is rescheduled once it has used up its time slice, and another ready task of the same priority, if any, gets to run. Note that even when the current task's time slice is exhausted, lower-priority tasks still cannot obtain the CPU; rescheduling will only pick a task of the same priority (or the same task again). A time slice simply means that tasks of equal priority take turns on the CPU, while tasks of different priorities interact only through preemption. If time-slice scheduling is not enabled, a task of a given priority can get the CPU only after the earlier tasks of the same priority have finished.
To put it simply: there is only one seat, and whoever has the bigger fist sits in it. If the fists are the same size, take turns.
In fact, under UP the scheduling mechanism in Linux is not much different from that in VxWorks; the differences lie in the internal implementation.
VxWorks 5.5.1 maintains a single ready queue sorted by priority; tasks of the same priority are ordered first-come, first-served. At each scheduling point, the next task to run is taken from the head of the queue.
Linux maintains multiple ready queues, one per priority level. When a task enters a queue, the corresponding bit is set in a queue bitmap, indicating that a task of that priority is ready. At each scheduling point, the bitmap is scanned and the next task is taken from the highest-priority non-empty queue.
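As a rough illustration, the per-priority queues plus bitmap can be sketched as below. This is a minimal sketch, not the kernel's actual code: all identifiers are invented, and the per-priority task lists are reduced to counters.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PRIO 140   /* Linux convention: 0..99 real-time, 100..139 normal */

/* One ready list per priority, plus a bitmap marking non-empty lists. */
struct rt_ready_queues {
    uint64_t bitmap[(NUM_PRIO + 63) / 64];
    int      count[NUM_PRIO];   /* stand-in for the per-priority task list */
};

static void enqueue(struct rt_ready_queues *q, int prio)
{
    q->count[prio]++;
    q->bitmap[prio / 64] |= (uint64_t)1 << (prio % 64);
}

static void dequeue(struct rt_ready_queues *q, int prio)
{
    if (--q->count[prio] == 0)
        q->bitmap[prio / 64] &= ~((uint64_t)1 << (prio % 64));
}

/* Scan the bitmap for the first set bit, i.e. the highest-priority
 * non-empty queue (priority 0 is highest). Returns -1 if nothing is ready. */
static int find_highest_prio(const struct rt_ready_queues *q)
{
    for (int w = 0; w < (NUM_PRIO + 63) / 64; w++)
        if (q->bitmap[w])
            return w * 64 + __builtin_ctzll(q->bitmap[w]);
    return -1;
}
```

The point of the bitmap is that finding the next task is a handful of word tests and a count-trailing-zeros instruction, independent of how many tasks are ready.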
Another difference: as a real-time operating system, VxWorks checks whether rescheduling is needed as soon as a task becomes ready, whereas Linux checks only when a scheduling point arrives.
Scheduling under UP is easy to understand. What about SMP?


II. Priority Scheduling under SMP
Now, with money, we have bought many stools, even two-seat and four-seat sofas. Seating the right person on the right stool is no longer so easy.
Under UP the goal is to grab the one and only CPU; under SMP it is to grab one CPU out of many. A common misconception: when SMP comes up, people immediately think of load balancing. In fact, real-time task scheduling does not consider load balancing; its algorithm is purely preemptive, and a contest is naturally one task against another. Load balancing for real-time tasks is something the system designers must arrange themselves: deciding the relationship between cores and tasks so that real-time tasks are scheduled well.
How to grab one CPU out of many? Naturally, take the seat closest to you (the current CPU) if you can; otherwise pick on the weakest. To that end, the system maintains the priorities of all CPUs: a global table of the priority of the task currently running on each CPU. This table is updated at every scheduling point on every CPU. When a task becomes ready and wants to preempt, it goes for the CPU running at the lowest priority.
It sounds simple, but this is only the broad principle. SMP also involves task binding (affinity), task migration, CPU state changes (active to inactive), and the related queue operations, so there are many details to consider.
1. CPU priority status table
The global status table is structured much like the ready queues: for each priority level it keeps a bitmask of the CPUs currently running at that priority, an overall bitmap indicating which priority levels have any CPU at all, and an array recording each CPU's current priority. The bitmaps are there to speed up the search.
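A minimal sketch of such a table follows. The kernel's real version lives in kernel/sched/cpupri.c and folds priorities into fewer levels; everything here is simplified and the identifiers are invented.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS   12
#define PRIO_IDLE 101   /* weakest level; 0 is the strongest, as in the kernel */

/* For every priority level, a bitmask of CPUs currently running at it,
 * plus each CPU's current level so updates are O(1). */
struct cpu_prio_table {
    uint64_t cpus_at[PRIO_IDLE + 1];
    int      cpu_prio[NR_CPUS];
};

static void table_init(struct cpu_prio_table *t)
{
    for (int p = 0; p <= PRIO_IDLE; p++)
        t->cpus_at[p] = 0;
    for (int c = 0; c < NR_CPUS; c++) {   /* all CPUs start out idle */
        t->cpu_prio[c] = PRIO_IDLE;
        t->cpus_at[PRIO_IDLE] |= (uint64_t)1 << c;
    }
}

/* Called whenever a CPU switches to a task of a different priority. */
static void cpu_set_prio(struct cpu_prio_table *t, int cpu, int prio)
{
    t->cpus_at[t->cpu_prio[cpu]] &= ~((uint64_t)1 << cpu);
    t->cpu_prio[cpu] = prio;
    t->cpus_at[prio] |= (uint64_t)1 << cpu;
}

/* Return the set of allowed CPUs running at the weakest level that is
 * still weaker than `prio`; 0 means no CPU can be preempted by this task. */
static uint64_t find_lowest_cpus(const struct cpu_prio_table *t,
                                 int prio, uint64_t allowed)
{
    for (int p = PRIO_IDLE; p > prio; p--) {
        uint64_t mask = t->cpus_at[p] & allowed;
        if (mask)
            return mask;
    }
    return 0;
}
```

Note how the task's affinity mask is intersected at every level: a CPU running at an even lower priority is useless if the task is not allowed to run there, which is exactly the point made in the wakeup discussion below.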


The following describes how real-time tasks are scheduled under SMP in three scenarios.


1. The task is awakened.
When a task becomes ready (for example, its sleep time expires, or the semaphore it was waiting on is released), it is woken up. At this point a suitable CPU must be selected for it.
The criteria for deciding which CPU is suitable are:
1) If the task currently running on this CPU is not a real-time task, there is nothing to think about: preempting the current CPU is appropriate.
2) If the woken task's priority is higher than the current task's, and the current task is allowed to run on multiple CPUs, again there is nothing to think about: preempt the current CPU (the current task can be pushed elsewhere).
3) If the woken task's priority is higher than the current task's, but the current task is bound to this CPU only, a suitable CPU must be searched for.
4) If the woken task can run on multiple CPUs but its priority is lower than the current task's, a suitable CPU must be searched for.


In the latter two cases, the CPU priority status table is consulted to find the set of lowest-priority CPUs on which the task may run (affinity matters here: if the task is bound to certain CPUs, an even lower-priority CPU outside that set cannot be used). One CPU is then chosen from the set. The rule is: if the CPU the task last ran on is in the set, pick it; else if the current CPU is in the set, pick it; otherwise pick one based on the scheduling domains. Roughly speaking, scheduling domains group CPUs by how close they are to one another; they matter more for load balancing, which we may look at another time.
In short, cases 1) and 2) respond as fast as possible (these are real-time tasks, after all); picking the current CPU directly can be thought of as the fastpath. The other cases have no choice but to search for the set of lowest-priority CPUs and weigh cache and memory locality before choosing one; that can be considered the slowpath.
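The selection rule just described can be sketched as follows. This is an assumed simplification, not kernel code: the scheduling-domain walk is replaced by "first CPU in the mask".

```c
#include <assert.h>
#include <stdint.h>

/* Pick a target CPU from the lowest-priority candidate mask: prefer the
 * CPU the task last ran on (warm cache), then the current CPU; otherwise
 * fall back to the first candidate, standing in for the domain walk. */
static int pick_target_cpu(uint64_t lowest_mask, int prev_cpu, int this_cpu)
{
    if (lowest_mask & ((uint64_t)1 << prev_cpu))
        return prev_cpu;
    if (lowest_mask & ((uint64_t)1 << this_cpu))
        return this_cpu;
    if (lowest_mask)
        return __builtin_ctzll(lowest_mask);
    return -1;  /* no suitable CPU */
}
```

The ordering encodes the cost model from the text: cache warmth first, avoiding a cross-CPU migration second, topology distance last.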
Once the target CPU is selected, the task enters that CPU's ready queue to wait for scheduling; if the task is not bound exclusively to the target CPU, it is also added to the queue of pushable tasks.


Then check whether the current task can be preempted; if so, set the rescheduling flag. There is a special case: if the running task has the same priority as the woken task, check whether the running task can run on other CPUs (for example, the woken task is bound to the current CPU while the running task is not, so the running task can move elsewhere; everyone is happy). If so, the woken task is placed at the head of the queue and the rescheduling flag is set.
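The check, including the equal-priority special case, reduces to something like the sketch below (an invented helper; priority numbers follow the kernel convention of lower = higher).

```c
#include <assert.h>

/* Preemption check at wakeup. On equal priority, the woken task can still
 * take over if the running task is free to migrate to another CPU; the
 * woken task then goes to the queue head and the resched flag is set. */
static int preempts(int woken_prio, int curr_prio, int curr_can_migrate)
{
    if (woken_prio < curr_prio)
        return 1;
    return woken_prio == curr_prio && curr_can_migrate;
}
```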


If the current task cannot be preempted, an attempt is made to push the woken task to another CPU to run: the so-called "push" mechanism. Choosing the CPU works much as it does at wakeup: find the set of lowest-priority CPUs, then pick a suitable one based on factors such as scheduling domains and cache utilization.
Correspondingly, the "pull" mechanism runs when the current task's status changes at scheduling time (its priority is lowered, or it turns from a real-time task into a non-real-time task): tasks are pulled over from other CPUs' pushable queues to run here.


At this point the task has been placed in an appropriate ready queue and is waiting to be scheduled.

2. The currently running task gives up the CPU
When a task exits, gives up the CPU voluntarily (e.g. it sleeps), or is forced to (e.g. it fails to get a semaphore), scheduling is triggered. In this case, a suitable next task must be selected.
If the current task is a real-time task and the ready queue holds no task of higher priority, other CPUs may still hold ready real-time tasks of lower priority than the current task; this triggers the pull operation. Pull mainly solves scenarios like the following. Suppose the system has two CPUs and four tasks with priorities T1 > T2 > T3 > T4. T1 is the current task of cpu1 with T3 in cpu1's ready queue; T2 is the current task of cpu2 with T4 in cpu2's ready queue. If T2 finishes quickly and triggers scheduling, cpu2 must not simply pick T4: T3 should be pulled over from cpu1 (provided T3 is pushable, i.e. not bound to cpu1). As this shows, the push and pull mechanisms exist because the queue a task first lands in may not be the best one. With a single system-wide ready queue, taking the task at the head would always be correct; but, simple as that is, all CPUs would then contend on the same queue, scheduling on each CPU would be serialized, and the performance would be unacceptable. So the design uses one ready queue per CPU instead, and since a task may then end up in the wrong queue, push and pull processing is introduced.
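The T1-T4 example reduces to a single comparison, which a sketch can make concrete (an invented helper, not kernel code; priorities are numbers, lower = higher).

```c
#include <assert.h>

/* When a CPU's task yields, compare the best task queued locally with the
 * best pullable task queued on the other CPU, and run whichever is
 * stronger. Without the pull step, the local task would always win. */
static int next_prio_after_yield(int this_queued, int other_queued,
                                 int other_pullable)
{
    if (other_pullable && other_queued < this_queued)
        return other_queued;  /* pull the stronger task over */
    return this_queued;
}
```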

Then, guided by the priority bitmap, the next task is taken from the highest-priority non-empty queue.


3. The task is created.
For a real-time task, the current CPU is taken as its initial CPU;
its status is set to ready, and it is placed into the current CPU's ready queue;
then a check is made whether the current task can be preempted, and if so the rescheduling flag is set.


Scheduling is the management and allocation of resources. The difference is that UP only has to manage task resources, whereas SMP must manage CPU resources as well, and the complexity of the two multiplies when seeking the best combination.


By now, the scheduling process of real-time tasks has been analyzed.
Next, let's look at the recent scheduling problem:


III. A Scheduling Problem
In the test, a Tesgine traffic generator sent equal traffic to eight packet-receiving tasks. It turned out that the CPU usage of the tasks was far from equal, differing by close to 20%, and the tasks with high CPU usage dropped packets. Moreover, two cores always showed higher CPU usage than the others, and usage was uneven across the cores in general. Our system has 6 cores and 12 vCPUs.

With the above analysis, this phenomenon is easy to explain:
There are six two-seat sofas in a room, each with a coffee table in front of it and candy on each table. The first six people to arrive each take a sofa to themselves; the last two have to squeeze in with someone, so two sofas end up full while the other four hold one person each (the cores are unbalanced). A person with a sofa to himself is quite comfortable: if he wants the candy on his table, he just takes it. Someone sharing a sofa must also watch his neighbor: if both want the same jelly, one must wait for the other to finish taking it. So, to eat the same amount of candy, the people sharing sofas take longer (higher CPU usage) than those sitting alone. Meanwhile the diligent maids keep delivering candy to everyone, so on the shared sofas the candy slowly piles up on the coffee table until it no longer fits (packet loss).
Why don't the people sharing a sofa move to another one? What would they gain? They have just warmed their seat (the cache is hot), and on any other sofa they would still be sitting with someone.
Another small question: the kernel manages seats (vCPUs), not sofas (cores), so why did the first six arrivals each get their own sofa, instead of two people filling one sofa first? In fact the kernel plays a little trick when numbering the seats. The six two-seat sofas are numbered like this: sofa 1 holds seats 0 and 6, sofa 2 holds seats 1 and 7, and so on; people then simply take seats in ascending number order.
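The numbering trick is easy to check (assuming, as in our system, 6 cores with the two sibling vCPUs of core i numbered i and i + 6):

```c
#include <assert.h>

#define NUM_CORES 6   /* our system: 6 cores, 12 vCPUs */

/* With siblings numbered i and i + NUM_CORES, filling seats in ascending
 * order puts the first six tasks on six different cores. */
static int core_of_vcpu(int vcpu)
{
    return vcpu % NUM_CORES;
}
```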
Were the tasks distributed like this from the start? Actually, no; this is a steady-state phenomenon (and in our system the steady state is reached very quickly). As mentioned above, when a task is created, its initial CPU is the CPU running the creating code at the time. The packet-receiving tasks have priority 30; in the test environment, only the timer tasks (priority 20) were higher. Suppose the creating code ran on vcpu0, the packet-receiving tasks are R1-R8, and the timer tasks are T1-T8. A timer task will always preempt a packet-receiving task, but timer tasks run very briefly and sleep most of the time, while the packet-receiving tasks were very busy; their CPU usage was observed at 70-90% at the time. Suppose R1 runs first and the other packet-receiving tasks then become ready. Since R1 is running, they can only migrate to other CPUs, so R2 goes to vcpu6 (because of the scheduling domains), R3 to vcpu1, R4 to vcpu2, R5 to vcpu3, R6 to vcpu4, R7 to vcpu5, and R8 to vcpu7 (assuming the packet-receiving tasks become ready and are scheduled in the order R1 through R8, each becoming ready while its predecessors are running, with no other tasks in the system; the actual sequence is slightly more complicated, but the end result is the same). Then a timer task wakes up and preempts R1, so R1 migrates to vcpu8. Because the timer tasks sleep most of the time, T1-T8 most likely all run on vcpu0; even if one timer task is running when another becomes ready, the second will migrate only to the lowest-priority vCPUs (vcpu9, vcpu10, vcpu11), never to a packet-receiving task's vCPU. After a few rounds of scheduling the system quickly stabilizes: each packet-receiving task has the highest priority on its own CPU and is never migrated again. That is the final picture we observed.
