The Linux kernel uses a scheduling algorithm called CFS (Completely Fair Scheduler). Most descriptions found on the internet are unintuitive and hard to read, but I found one that is very easy to understand (simplicity at its finest!):
http://people.redhat.com/mingo/cfs-scheduler/sched-design-CFS.txt
To guard against the link going dead, the full text is pasted below:
This is the CFS scheduler.

80% of CFS's design can be summed up in a single sentence: CFS basically models an "ideal, precise multi-tasking CPU" on real hardware.

"Ideal multi-tasking CPU" is a (non-existent :-)) CPU that has 100% physical power and which can run each task at precise equal speed, in parallel, each at 1/nr_running speed. For example: if there are 2 tasks running then it runs each at 50% physical power - totally in parallel.

On real hardware, we can run only a single task at once, so while that one task runs, the other tasks that are waiting for the CPU are at a disadvantage - the current task gets an unfair amount of CPU time. In CFS this fairness imbalance is expressed and tracked via the per-task p->wait_runtime (nanosec-unit) value. "wait_runtime" is the amount of time the task should now run on the CPU for it to become completely fair and balanced.

( Small detail: on 'ideal' hardware, the p->wait_runtime value would always be zero - no task would ever get 'out of balance' from the 'ideal' share of CPU time. )

CFS's task picking logic is based on this p->wait_runtime value and it is thus very simple: it always tries to run the task with the largest p->wait_runtime value. In other words, CFS tries to run the task with the 'gravest need' for more CPU time. So CFS always tries to split up CPU time between runnable tasks as close to 'ideal multitasking hardware' as possible.

Most of the rest of CFS's design just falls out of this really simple concept, with a few add-on embellishments like nice levels, multiprocessing and various algorithm variants to recognize sleepers.

In practice it works like this: the system runs a task a bit, and when the task schedules (or a scheduler tick happens) the task's CPU usage is 'accounted for': the (small) time it just spent using the physical CPU is deducted from p->wait_runtime. [ Minus the 'fair share' of CPU time it would have gotten anyway. ]
Once p->wait_runtime gets low enough so that another task becomes the 'leftmost task' of the time-ordered rbtree it maintains (plus a small amount of 'granularity' distance relative to the leftmost task so that we do not over-schedule tasks and trash the cache), then the new leftmost task is picked and the current task is preempted.

The rq->fair_clock value tracks the 'CPU time a runnable task would have fairly gotten, had it been runnable during that time'. So by using rq->fair_clock values we can accurately timestamp and measure the 'expected CPU time' a task should have gotten. All runnable tasks are sorted in the rbtree by the "rq->fair_clock - p->wait_runtime" key, and CFS picks the 'leftmost' task and sticks to it. As the system progresses forward, newly woken tasks are put into the tree more and more to the right - slowly but surely giving every task a chance to become the 'leftmost task' and thus get on the CPU within a deterministic amount of time.

Some implementation details:

 - The introduction of Scheduling Classes: an extensible hierarchy of scheduler modules. These modules encapsulate scheduling policy details and are handled by the scheduler core without the core code assuming too much about them.

 - sched_fair.c implements the 'CFS desktop scheduler': it is a replacement for the vanilla scheduler's SCHED_OTHER interactivity code. I'd like to give credit to Con Kolivas for the general approach here: he has proven via RSDL/SD that 'fair scheduling' is possible and that it results in better desktop scheduling. Kudos Con!

   The CFS patch uses a completely different approach and implementation from RSDL/SD. My goal was to make CFS's interactivity quality exceed that of RSDL/SD, which is a high standard to meet :-) Testing feedback is welcome to decide this one way or another. All of SD's logic could be added via a kernel/sched_sd.c module as well, if Con were interested in such an approach.
   CFS's design is quite radical: it does not use runqueues, it uses a time-ordered rbtree to build a 'timeline' of future task execution, and thus has no 'array switch' artifacts (by which both the vanilla scheduler and RSDL/SD are affected).

   CFS uses nanosecond granularity accounting and does not rely on any jiffies or other HZ detail. Thus the CFS scheduler has no notion of 'timeslices' and has no heuristics whatsoever. There is only one central tunable: /proc/sys/kernel/sched_granularity_ns, which can be used to tune the scheduler from 'desktop' (low latencies) to 'server' (good batching) workloads. It defaults to a setting suitable for desktop workloads. SCHED_BATCH is handled by the CFS scheduler module too.

   Due to its design, the CFS scheduler is not prone to any of the 'attacks' that exist today against the heuristics of the stock scheduler: fiftyp.c, thud.c, chew.c, ring-test.c, massive_intr.c all work fine, do not impact interactivity and produce the expected behavior.

   The CFS scheduler has a much stronger handling of nice levels and SCHED_BATCH: both types of workloads should be isolated much more aggressively than under the vanilla scheduler.

   ( Another detail: due to nanosec accounting and timeline sorting, sched_yield() support is very simple under CFS, and in fact under CFS sched_yield() behaves much better than under any other scheduler I have tested so far. )

 - sched_rt.c implements SCHED_FIFO and SCHED_RR semantics, in a simpler way than the vanilla scheduler does. It uses 100 runqueues (for all 100 RT priority levels, instead of 140 in the vanilla scheduler) and it needs no expired array.

 - Reworked/sanitized SMP load-balancing: the runqueue-walking assumptions are gone from the load-balancing code now, and iterators of the scheduling modules are used. The balancing code got quite a bit simpler as a result.
When you have time, it is worth reading the books "Linux Kernel Development" and "Understanding the Linux Kernel"; both discuss this topic and include code examples.
Of course, there is far more to Linux scheduling than this one algorithm. To be continuously updated.
:)
Linux: the scheduling algorithm in the Linux kernel