A macro understanding of the CFS scheduler in Linux

Source: Internet
Author: User
I re-read the CFS scheduler today, so I can't help writing another article about CFS. The CFS scheduler runs in O(log n) time, whereas the previous scheduler ran in O(1). Does that mean CFS is less efficient than the O(1) scheduler? Not really. The run queues under CFS are organized as a red-black tree, and picking the next process is just a matter of taking the leftmost node of that tree, which is done in constant time. The O(log n) refers to insertion, and even there the statistical behavior of red-black trees is good: the coloring rules keep the tree reasonably balanced without spending much time on maintaining that balance, so in most cases, especially with well-distributed data, insertions complete quickly and the full O(log n) cost is rarely paid. In fact, CFS did not get into the kernel because it delivers better raw performance. Within a single architecture, performance gains have a limit; you cannot simply keep pushing in one direction. The CPU pipeline, for example, was considered a fine idea when it was designed, but Intel's NetBurst architecture simply kept lengthening the pipeline on the assumption that a longer pipeline means higher throughput. In engineering, however, nothing is a monotonically increasing function forever; there is always an inflection point.
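As a toy illustration of that cost split (a heap standing in for the red-black tree, not kernel code): inserting a task costs O(log n), but the smallest key, like the leftmost tree node, is always immediately at hand.

```python
import heapq

# Toy run queue ordered by virtual runtime. As with the leftmost node
# of CFS's red-black tree, the smallest key is always directly
# accessible; only insertion pays the O(log n) cost.
runqueue = []
for pid, vruntime in [(101, 300), (102, 120), (103, 540)]:
    heapq.heappush(runqueue, (vruntime, pid))   # O(log n) per insert

next_vruntime, next_pid = runqueue[0]           # O(1) peek at the minimum
```

Here pid 102, with the smallest virtual runtime, is the one the scheduler would pick next.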

The essence of CFS is that it changes the way the kernel looks at process scheduling. Is a finer time slice always better? Are more preemption points always better? If so, I'm sorry: by kernel 2.6.17 or so, the time slice was already very fine, and there was no room for more preemption points. If that line of reasoning were right, wouldn't the scheduler be unable to develop any further? Linux is not that kind of character. The designer of the O(1) scheduler, on solid grounds, provided a completely different scheduling method. This is CFS, which discards time slices and complex heuristics and starts the scheduler over from a new starting point. In its first version, in kernel 2.6.23, CFS is very simple and its idea is very clear. The essence of CFS is to provide a virtual clock that all processes share. CFS abstracts the notion of a clock away from the underlying hardware; the process scheduling module talks directly to this virtual-clock interface and never worries about hardware clock operations. As a result, the process scheduling module becomes self-contained, from the clock down to the scheduling algorithm, and the different policies for different processes are all built on the virtual clock. This kernel also introduced scheduling classes, so the new scheduler schedules processes with different characteristics under one unified virtual clock according to different policies.
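A rough sketch of that layering (class and field names here are illustrative, not the kernel's actual API): the scheduling core talks to an abstract clock and to a policy class behind a common interface, so hardware timekeeping stays below the abstraction.

```python
class VirtualClock:
    """Illustrative stand-in for the hardware-independent clock the
    scheduling code talks to; real timekeeping stays below it."""
    def __init__(self):
        self.now = 0
    def tick(self, delta_ns):
        self.now += delta_ns

class FairClass:
    """Sketch of a scheduling class: one policy behind a common interface."""
    def __init__(self):
        self.tasks = []
    def enqueue(self, task):
        self.tasks.append(task)
    def pick_next(self):
        # The fair policy runs whichever task's virtual clock lags most.
        return min(self.tasks, key=lambda t: t["vruntime"])

clock = VirtualClock()
clock.tick(1_000_000)            # the core advances one abstract tick

fair = FairClass()
fair.enqueue({"pid": 1, "vruntime": 900})
fair.enqueue({"pid": 2, "vruntime": 250})
chosen = fair.pick_next()        # pid 2 lags behind, so it runs
```

The real kernel dispatches through a table of function pointers per scheduling class; the Python classes above only mimic that shape.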

Let's look at the benefits of doing this. CFS abandons the time slice and adopts a smooth mechanism that pushes all processes forward together, each chasing the others. How, then, do we differentiate processes with different priorities? Can all processes advance at the same speed? Obviously not! We all believe that mutual pursuit, like competition, brings out the best; when I was in middle school, the teacher taught us to chase and overtake one another, though what the teacher really meant was that everyone should chase me! Granting that mutual pursuit is the best way, the next question is how to implement it. With the hardware clock of the traditional scheduler it is hard to manage processes of different priorities, so CFS proposes the concept of a virtual clock. Each process has its own virtual clock, and the system as a whole has one too. The system's virtual clock advances at a fixed pace, while each process's virtual clock advances at a pace determined by its priority: the virtual clock of a low-priority process runs fast, while that of a high-priority process runs slow. This is hard to achieve with a hardware clock. If we used the hardware clock, we would have to maintain per-process state in some global variable, presumably a linked list or an array, and search it on every hardware clock interrupt; deletion, selection, and the other operations would be very cumbersome, to say nothing of the time they take, and the memory they occupy would be unacceptable. The virtual clock proposed by CFS makes mutual pursuit easy.
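The different paces can be modeled with CFS's own rule of thumb: a task's virtual clock advances by roughly `delta_exec * (nice-0 weight / task weight)`, so a heavier (higher-priority) task accumulates virtual time more slowly. A simplified model (the kernel's fixed-point arithmetic and weight tables are omitted; the weights below are illustrative):

```python
NICE_0_WEIGHT = 1024  # the kernel's load weight for a nice-0 task

def advance_vclock(vruntime, delta_exec_ns, weight):
    """Advance a task's virtual clock after delta_exec_ns of real CPU time.
    High weight (high priority) -> the virtual clock ticks slowly;
    low weight -> it ticks fast, so the task falls due to yield sooner."""
    return vruntime + delta_exec_ns * NICE_0_WEIGHT // weight

# The same 1 ms of real CPU time, at three different priorities:
high_prio = advance_vclock(0, 1_000_000, weight=2048)  # advances only 0.5 ms
nice_0    = advance_vclock(0, 1_000_000, weight=1024)  # advances exactly 1 ms
low_prio  = advance_vclock(0, 1_000_000, weight=512)   # advances a full 2 ms
```

Since the scheduler always runs the process whose virtual clock lags furthest behind, the slow-ticking high-priority task naturally gets more CPU time.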
In the time-slice-based scheduling mechanism, process scheduling is passive: the scheduling module had to find a way to discover the characteristics of each process, set different policies for different characteristics, and then assign different time slices based on those policies. This inevitably leads to baroque prediction heuristics, complicated enough to look like serious research; but this world does not like complexity, so that line of research had to be given up. Now look at the CFS scheduler. Each process has its own priority, which is mapped to a weight, and processes with different weights have virtual clocks that run at different speeds. Everyone simply advances according to their own virtual clock, and the scheduling module selects the process that lags furthest behind the system's virtual clock for execution. That is what fairness means here. So how is the O(1) scheduler's prediction algorithm for interactive processes reflected in CFS? In fact, there is no need to consider the question. In the O(1) scheduler, interactive processes had to be singled out because O(1) schedules entirely on priority; any other scheduling basis requires dynamic prediction, and dynamic prediction requires complex algorithms. In the CFS scheduler, priority is not the basis of scheduling; instead, priority is mapped to a process weight, and then the processes chase one another. This guarantees that no process is delayed too long, and the problem of interactive processes solves itself. The O(1) scheduler cannot work this way, because its processes cannot chase one another. For example, in O(1), suppose an interactive process and a non-interactive process both have priority P. The scheduler only schedules them by priority and knows nothing else about them, so it has to detect their sleep time to judge which category each process belongs to.
If a process is judged to be interactive, then for the moment, instead of being put into the expired queue, it is allowed to keep running, to compensate it for its long sleep. This is not the case in the CFS scheduler. Even if an interactive process and a non-interactive process map to the same weight, the interactive process can sleep at will; as soon as it wakes up, it is inserted into the red-black tree with a key that is neither too large nor too small, computed from its own weight and the current system virtual clock, and as the system keeps running it is treated fairly. In O(1), by contrast, an interactive process that is not predicted as such suffers delayed responses, and there is a structural reason for that delay. (Starting a new paragraph here; this one is already too long.)

In O(1), the cause of the interactive process's delayed response is that two arrays exist: a running (active) queue and an expired queue. In general the system allocates time slices by process priority, and as soon as a process uses up its slice it is placed in the expired queue; the problem lies in this expired queue. In principle, only after the processes of all priorities have run out their slices, that is, only when the whole active array is drained, do the expired queue and the running queue swap roles. Without special treatment, the more processes run in the system, the more sluggish an interactive process appears, which is why O(1) uses sleep time to distinguish interactive from non-interactive processes. So in O(1), the latency of interactive processes depends mainly on the total number of processes. For a server workload this hardly matters, but for interactive processes it does: server processes emphasize fairness under priority, while interactive processes emphasize responsiveness. For fairness, it is fine if everyone waits together; for an interactive process, waiting too long is unacceptable. Either way, the O(1) scheduler is not very satisfying: processes in the running queue are handled fairly and efficiently, but processes in the expired queue face a long wait until the arrays are swapped. Some may enjoy an occasional long rest, but I believe most would rather take steady, frequent turns in the near future. The granularity of the O(1) scheduling cycle is effectively the sum of the running time of all processes; the algorithm guarantees fairness among the processes in the running queue, but it cannot guarantee global fairness, because no matter how briefly a process runs, it must then wait a long time. Over the whole lifecycle of a process, its execution looks intermittent; in CFS the execution of every process is continuous. That is the progress CFS makes.
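The two-array behavior and the resulting wait can be sketched as follows (a toy model of the O(1) design, not the real implementation):

```python
def run_epoch(active, expired, trace):
    """Toy model of one O(1)-scheduler epoch: each task in the active
    array uses up its time slice and drops into the expired array; the
    two arrays swap roles only once the active array is empty."""
    while active:
        task = active.pop(0)
        trace.append(task)       # the task runs its slice now
        expired.append(task)     # then waits out the rest of the epoch
    return expired, active       # swap: expired becomes the new active

trace = []
active, expired = ["A", "B", "C"], []
active, expired = run_epoch(active, expired, trace)
active, expired = run_epoch(active, expired, trace)
# "A" finishes its slice first, yet must wait for every other task
# before it runs again -- the intermittent gap that CFS does away with.
```

With many tasks, the gap between two consecutive runs of "A" grows with the total number of processes, which is exactly the latency problem described above.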
The earlier schedulers had to take the hardware clock as their standard, so they could not be very flexible; CFS takes the virtual clock as its standard, and is flexible.

In kernel 2.6.23, the freshly implemented CFS scheduler is very simple. On every clock tick, the current process is first re-queued, its virtual clock and the system virtual clock are advanced, and then the scheduler checks whether the process at the leftmost node of the red-black tree is the current process, and decides whether to reschedule accordingly. A process's key in the tree is computed as the current system virtual clock minus that process's accumulated wait time; for the running process, the wait time is negative. This way, the longer a process has waited, the smaller its key, and the more easily it is selected to run.
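In outline (the names follow the 2.6.23 idea of a fair clock and a per-task wait runtime; this is a simplified model, not the kernel source):

```python
def rbtree_key(fair_clock, wait_runtime):
    """2.6.23-style key: the system's fair clock minus the time the
    task has spent waiting. Long waiters get a small key and sort to
    the left of the red-black tree; the running task's wait_runtime
    goes negative, which pushes its key to the right."""
    return fair_clock - wait_runtime

fair_clock = 10_000
key_long_wait  = rbtree_key(fair_clock, wait_runtime=4_000)   # waited a lot
key_short_wait = rbtree_key(fair_clock, wait_runtime=500)     # waited a little
key_running    = rbtree_key(fair_clock, wait_runtime=-1_200)  # currently running
# key_long_wait < key_short_wait < key_running, so the longest waiter
# sits leftmost in the tree and is picked next.
```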

After kernel 2.6.25, a simpler approach is used: the run queue itself carries a virtual clock that increases monotonically and tracks the smallest virtual clock in the queue, and a process's key is the difference between its vruntime and the queue's virtual clock. This is mutual pursuit in its true form. It is simpler than the 2.6.23 scheme, yet clever: the current process no longer has to be dequeued and re-queued on every tick; instead, whether to reschedule is decided by comparing the current process's actual running time against its ideal running time.
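A condensed model of the post-2.6.25 approach (the constants and period below are illustrative; the kernel derives the ideal slice from each task's weight share of the scheduling period):

```python
def key(vruntime, min_vruntime):
    """Post-2.6.25 key: how far the task's own clock has run ahead of
    the queue's monotonically increasing minimum virtual clock."""
    return vruntime - min_vruntime

def ideal_runtime(period_ns, weight, total_weight):
    """A task's fair share of the scheduling period, by weight."""
    return period_ns * weight // total_weight

def should_resched(delta_exec_ns, period_ns, weight, total_weight):
    """Reschedule only once the current task has used up its ideal
    slice, instead of re-queueing it on every single tick."""
    return delta_exec_ns > ideal_runtime(period_ns, weight, total_weight)

# Two equal-weight tasks sharing a 20 ms period get 10 ms each:
slice_ns = ideal_runtime(20_000_000, 1024, 2048)
early   = should_resched(6_000_000, 20_000_000, 1024, 2048)   # still within slice
overrun = should_resched(12_000_000, 20_000_000, 1024, 2048)  # slice exhausted
```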
