BFS is a process scheduler whose name can be read as "brain fault scheduler". The odd name carries several meanings and is easy to accept: the design is so simple, yet performs so well, that people may begin to doubt their own thinking ability.
BFS will not be merged into the mainline Linux kernel maintained by Linus, and BFS itself does not aim to be. Nevertheless, BFS has a large number of fans, for one reason only: it is very good, and it makes the user's desktop environment smoother than ever before. In an era when hardware keeps advancing while systems still feel sluggish, that is exciting.
In 2010, Android used BFS as the standard scheduler of its operating system, which also proved the value of BFS.
BFS vs CFS: Performance Testing Competition
BFS has been well received by many users, with comments such as "Fast, fast" and "quick future of desktop". Such words are eye-catching, so I began looking for BFS test data, hoping to find numbers or curves that would back them up. The results were rather disappointing...
Jens Axboe's Test
Shortly after the release of BFS, in September 2009, Ingo Molnar published an evaluation report comparing CFS and BFS. As the author of CFS, he claimed that the results were unsurprising: CFS was superior to BFS in every respect. Reactions to his evaluation differed, however. Some people agreed with the results, while others were puzzled. Jens Axboe was one of the doubters. He wrote a program named latt.c to try to measure two elusive attributes of a scheduler: "interactivity" and "fluidness".
His test results pointed the other way, indicating that BFS was superior to CFS in interactivity and achieved higher CPU utilization. However, BFS was less stable, and in some cases it also showed poor interactivity.
Judging from Jens's test data, BFS is slightly better than CFS, but its advantage is not as dramatic as widely claimed. Interested readers can find the detailed data from Jens's tests on the LKML mailing list: http://thread.gmane.org/gmane.linux.kernel/886319/focus=887636
These results left me a little disappointed: I did not see BFS pulling far ahead. Rather, it was like a sprint final at the Olympic Games in which it is hard to tell who the champion is. It is worth noting, however, that this test unexpectedly exposed a serious problem in CFS itself.
The sleeper fairness feature of CFS causes serious scheduling latency in some cases. In Jens's xmodmap test, a latency of as much as 10 s occurred. In addition, other tests published in the wake of Jens's work reported many interactivity problems with CFS. For example, audio and video could stall badly while the kernel was being compiled, whereas BFS had no such problems. All of these CFS problems disappeared, however, once the sleeper fairness feature was disabled.
This forced the developers of the CFS scheduler to disable the sleeper fairness feature temporarily, and at one point they stated that the feature would be officially disabled in the upcoming 2.6.32 until the problem was resolved. Surprisingly, within a week Ingo produced a new patch named "Gentle Fairness". With this patch, the 10 s delay disappeared, as did the other negative reports about mouse lag and video pauses under CFS...
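The effect of sleeper fairness can be illustrated with a small model. The sketch below is loosely based on how CFS places a task on its timeline when it wakes up: the sleeper is credited with some virtual runtime so that it lands ahead of the tasks that kept running, and the gentle variant grants only half of that credit. The function name, parameter names, and the 20 ms constant are assumptions chosen for illustration; this is not kernel code.

```python
# Hypothetical, simplified model of CFS wakeup placement (not kernel code).
SCHED_LATENCY_NS = 20_000_000  # assumed scheduling latency target, illustration only


def place_woken_task(task_vruntime, min_vruntime, fair_sleepers=True, gentle=True):
    """Return the virtual runtime assigned to a task that wakes from sleep."""
    target = min_vruntime
    if fair_sleepers:
        credit = SCHED_LATENCY_NS
        if gentle:
            credit //= 2  # gentle variant: grant only half of the sleeper credit
        target -= credit
    # A task is never moved backwards: it keeps its own vruntime if that is larger.
    return max(task_vruntime, target)
```

A smaller virtual runtime means the task runs sooner. With full sleeper credit, a frequently sleeping task is placed far ahead of tasks that were running, which can keep those tasks waiting; halving the credit softens that effect.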
Phoronix Testing
You can see Phoronix's professional tests of BFS at http://www.phoronix.com/scan.php?page=article&item=bfs_scheduler_benchmarks&num=1 and http://global.phoronix-test-suite.com/?k=profile&u=zero-9274-28890-6247. These tests were also completed in September 2009. As mentioned above, both BFS and CFS have been updated since then, so the tests cannot fully reflect the latest state of the two schedulers. Still, coming from an authoritative evaluation organization, the results are worth a look.
According to Phoronix's results, BFS is slightly ahead in several tests, while CFS is slightly ahead in others. I could not help feeling a little sad.
The only test item in which BFS shows a decisive lead is the network server throughput test. Its histogram, the most convincing result of all, is reproduced here.
Figure 1. Network throughput test
Beyond that, the Phoronix results on the whole only show that BFS and CFS perform about equally.
University of New Mexico Computer System Evaluation
Taylor Groves, Jeff Knockel and Eric Schulte of the University of New Mexico released a BFS vs. CFS evaluation report in December 2009.
Their evaluation focuses on three aspects: latency, turnaround time, and interactivity. The following figures are excerpted from their test results.
Figure 2. Latency
Figure 3. Turnaround time
Figure 4. Interactivity
These three figures finally offered some consolation for my laborious search. Based on the evaluation results, I at last reached a conclusion:
In turnaround time, CFS is better than BFS, whereas the scheduling latency of BFS is lower than that of CFS. This suggests that BFS is better suited to interactive environments and CFS to batch-processing environments, which matches the experience of many users.
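To make the two metrics concrete, here is a minimal sketch using their common textbook definitions; the report's exact measurement method may differ. The function names and timestamp lists are hypothetical.

```python
def turnaround_time(arrival, completion):
    """Total time a job spends in the system, from submission to completion."""
    return completion - arrival


def scheduling_latencies(wakeups, dispatches):
    """Delay between each moment a task becomes runnable and the moment it gets a CPU."""
    return [run - wake for wake, run in zip(wakeups, dispatches)]
```

A batch job cares mainly about the first number, while an interactive task that wakes up many times (for example, on every mouse event) is sensitive to the worst value in the second list.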
Summary
The three evaluations above were all completed before the release of Linux 2.6.32, in which CFS introduced the GENTLE_FAIR_SLEEPERS feature. As mentioned earlier, this patch is said to improve interactivity greatly. Unfortunately, no one seems to have run comparative tests of CFS and BFS since then. So although Linux has now reached the 2.6.35 era, we still cannot conclude which of BFS and CFS is superior.
On the other hand, although the professional evaluations do not show an obvious advantage for BFS, the information gathered on the Internet suggests that most users find that BFS significantly improves the experience of interactive applications. This is a matter of personal perception, such as whether the mouse moves smoothly. In this kind of experience the difference between the two schedulers is quite large, and it cannot be captured by the test data above.
I therefore believe that people do not yet understand what really affects interactivity, and that the data measured by professional tests cannot accurately describe subjective impressions such as "smoothness". So, where BFS is concerned, we may as well give it the benefit of the doubt for once.
So what improvements has BFS made? And if these improvements are so effective, why is the mainline kernel reluctant to accept BFS?
BFS vs CFS, different designs
Con Kolivas worked as an anesthesiologist in a hospital by day, relieving people's pain. In his spare time, he used Linux to relieve his own. Well, perhaps Kolivas did not take up Linux to relieve pain; that is only my speculation. But by his own account, he had not even learned C when he first came into contact with the Linux kernel... This shows that a language is only a tool, and that a deep understanding of the nature of the problem is the key to programming. Persistence may matter too. The battle between CFS and RSDL led Kolivas to leave the Linux community. A year later, when he began reading the kernel code again, he immediately found the following design problems in CFS:
First, the goal of CFS is to support every application scenario from the desktop to the high-end server. Such an all-encompassing design leads to compromises in the implementation, and features needed only on high-end machines bring in unnecessarily complex code.
Second, to maintain fairness across multiple CPUs, CFS uses a load-balancing mechanism. Kolivas believes that this complex code offsets the benefits of per-CPU run queues.
Finally, CFS in the mainline kernel still favors sleeping processes to some extent, which is itself "unfair".
Different design objectives
In reality, a scheduling algorithm is like a housewife in an awkward position: satisfying the children's wishes for dinner may spoil the appetite of the elderly. The Linux kernel has been trying to cook a dish that children and elders alike will enjoy, and in this regard CFS has done well. But a dish that everyone can accept may be a little bland. BFS sets out to satisfy only one taste, so that it can push that flavor to its limit.
According to Linux Magazine, Con Kolivas started thinking about BFS after he saw the xkcd cartoon below.
Figure 5. xkcd cartoon of the Linux Scheduler
The cartoon stems from the observation of some Linux users that although Linux claimed to make full use of the computing power of systems with 4096 CPUs, it could not play YouTube videos smoothly on an ordinary laptop.
This made people wonder whether the complex features of CFS mean anything for the desktop environment. Do people really need a scheduler that supports 4096 CPUs in their personal computers?
BFS is a natural response to this challenge. It makes no attempt to support 4096 CPUs; it is intended for the desktop computers used by ordinary people. BFS also drops features that are needed only on servers. For example, it abandons the group scheduling feature of CFS, since features such as cgroups are superfluous for ordinary desktop users.
This is easy to understand: on a system with only one CPU, who would set up multiple cgroups? What use would NUMA domains and similar concepts be?
In addition, BFS uses a single run queue and needs no complex load-balancing mechanism. And since the cgroup concept is gone, load balancing between groups is no longer required either.
This simple pruning greatly simplifies the BFS code. Simpler code means that fewer instructions are needed to perform a scheduling operation, and the corresponding footprint shrinks accordingly.
Of course, simpler code is only the most obvious aspect. More importantly, this difference in philosophy has a deeper, less visible impact on the implementation of the final scheduler.
Multi-queue vs. single-queue
When the Linux kernel entered the 2.6 series, the scheduler adopted per-CPU run queues to overcome the limitation of a single run queue. On a multi-CPU system, a single run queue becomes a bottleneck: while one CPU is accessing the run queue, the other CPUs must wait, even if they are idle. With per-CPU run queues, each CPU can make scheduling decisions in parallel without taking one big lock.
However, many things are not as simple as they seem at first glance.
Kolivas found that the benefits of per-CPU run queues are offset by the load-balancing code needed to pursue fairness. In the current CFS scheduler, each CPU maintains fairness only among the processes in its local run queue. To achieve fairness across CPUs, CFS must periodically balance the load, migrating some processes from the run queue of a busy CPU to the run queues of idle ones.
The load-balancing process must take the locks of other run queues, which reduces the concurrency gained from having multiple run queues.
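The locking cost described above can be sketched as follows. This is a toy model under stated assumptions, not the kernel's balancing algorithm: `RunQueue` and `load_balance` are hypothetical names, and the balancing rule (move tasks until the queue lengths differ by at most one) is deliberately simplistic.

```python
import threading
from collections import deque


class RunQueue:
    """One run queue per CPU, each protected by its own lock."""

    def __init__(self, cpu):
        self.cpu = cpu
        self.lock = threading.Lock()
        self.tasks = deque()


def load_balance(busiest, idlest):
    """Migrate tasks from the busiest queue to the idlest one.

    Both queue locks are held during the migration, so neither CPU can
    schedule from its own queue in the meantime. The locks are always
    taken in CPU-id order so that two balancers cannot deadlock.
    """
    if busiest is idlest:
        return 0
    first, second = sorted((busiest, idlest), key=lambda rq: rq.cpu)
    moved = 0
    with first.lock, second.lock:
        while len(busiest.tasks) - len(idlest.tasks) > 1:
            idlest.tasks.append(busiest.tasks.pop())
            moved += 1
    return moved
```

Normal scheduling touches only the local lock, but every balancing pass briefly couples two queues, which is the loss of concurrency the text refers to.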
Moreover, in complex cases the footprint introduced by load balancing can be considerable.
Of course, the locking introduced by load balancing still costs less than a global lock, and the difference grows as the number of CPUs increases. But note that BFS is not meant for systems with 1024 CPUs. When the number of CPUs is limited, the advantage of multiple run queues is not obvious.
Because BFS uses a single queue, every newly runnable process can be placed on the most suitable CPU chosen from the whole system, without waiting for load-balancing code to decide as in CFS. This reduces the delay in moving work between CPUs, and the end result is a lower scheduling latency.
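The single-queue idea can be sketched in the same toy style. Everything here is an assumption made for illustration: the class name `GlobalRunQueue`, the use of a numeric `deadline` as the ordering key, and the heap used to find the most urgent task are not taken from the BFS source.

```python
import heapq
import threading


class GlobalRunQueue:
    """One queue shared by all CPUs, guarded by a single lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self._heap = []
        self._seq = 0  # tie-breaker so equal deadlines stay in FIFO order

    def enqueue(self, name, deadline):
        with self.lock:
            heapq.heappush(self._heap, (deadline, self._seq, name))
            self._seq += 1

    def pick_next(self):
        """Called by whichever CPU needs work; returns the most urgent task or None."""
        with self.lock:
            if not self._heap:
                return None
            return heapq.heappop(self._heap)[2]
```

Because every CPU draws from the same queue, the most urgent runnable task is taken by the first CPU that becomes free, with no separate balancing step. The price is that all CPUs contend for one lock, which is acceptable only when the number of CPUs is small.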