As is well known, the Linux kernel uses the CFS scheduling mechanism. Many articles online analyze the CFS source code in detail, but most focus so heavily on details that they never summarize the concept of CFS as a whole. This article therefore describes the basic principles of CFS and the complete execution flow of fair scheduling.
CFS (Completely Fair Scheduler) is named "fair", so how does it actually achieve completely fair scheduling? To talk about fairness we first need a standard for judging it. Before that, let's introduce several important concepts.
Scheduling entity (sched_entity): the object being scheduled; for now it can be thought of as a process.
Virtual runtime (vruntime): the running time of each scheduling entity, scaled by its weight (explained below).
Fair scheduling run queue (cfs_rq): the run queue that holds the fair-scheduled entities.
1. How is the weight value of each process determined?
The previous section said that fairness needs a standard. For CFS, that standard is the weight of each scheduling entity, and the weight is determined by priority: the higher the priority (the lower the nice value), the higher the weight. The Linux kernel derives each entity's weight through a nice-to-prio-to-weight conversion. Recall that a newly created process inherits its priority from its parent. To change a process's priority, the kernel provides several system calls that change its nice value and therefore its weight. The most direct one is the nice() system call (sys_nice()). Let's look at how these values convert into one another:
Here MAX_RT_PRIO = 100 and the nice value ranges from -20 to 19. The kernel maps nice to priority as prio = MAX_RT_PRIO + nice + 20, so the priority of a normal process ranges from 100 to 139.
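Below is a minimal C sketch of the nice-to-prio step. It assumes the mapping prio = MAX_RT_PRIO + nice + 20 with MAX_RT_PRIO = 100. The function names nice_to_prio and prio_to_nice are illustrative; the kernel expresses the same arithmetic with its NICE_TO_PRIO and PRIO_TO_NICE macros.

```c
/* Constant quoted in the text: real-time priorities occupy 0..99,
 * so normal (CFS) priorities start at MAX_RT_PRIO. */
#define MAX_RT_PRIO 100

/* nice -20..19 maps to prio 100..139 */
int nice_to_prio(int nice)
{
    return MAX_RT_PRIO + nice + 20;
}

/* Inverse mapping: prio 100..139 maps back to nice -20..19 */
int prio_to_nice(int prio)
{
    return prio - MAX_RT_PRIO - 20;
}
```

For example, nice_to_prio(0) gives 120, which is the default priority of a normal process.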
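Next comes the conversion from prio to weight. This is not a closed-form formula but an empirically chosen table. Nice 0 maps to a weight of 1024, and each nice step changes the weight by a factor of about 1.25, so one nice level shifts a task's CPU share by roughly 10%. Since nice determines prio and prio determines weight, we can change a process's weight by changing its nice value. This answers the question of how the weight of each scheduling entity is determined.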
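The prio-to-weight step can be sketched as a table lookup. The array below reproduces the kernel's nice-to-weight table (sched_prio_to_weight, called prio_to_weight in older kernels). The lookup helpers prio_to_load_weight and nice_to_load_weight are hypothetical names used only for this sketch, and they do no range checking.

```c
#define MAX_RT_PRIO 100

/* Nice-to-weight table as found in the kernel.
 * Index 0 is nice -20, index 20 is nice 0 (weight 1024), index 39 is nice 19. */
static const int prio_weight_table[40] = {
    /* -20 */ 88761, 71755, 56483, 46273, 36291,
    /* -15 */ 29154, 23254, 18705, 14949, 11916,
    /* -10 */  9548,  7620,  6100,  4904,  3906,
    /*  -5 */  3121,  2501,  1991,  1586,  1277,
    /*   0 */  1024,   820,   655,   526,   423,
    /*   5 */   335,   272,   215,   172,   137,
    /*  10 */   110,    87,    70,    56,    45,
    /*  15 */    36,    29,    23,    18,    15,
};

/* Weight for a CFS static priority in 100..139 (no range checking). */
int prio_to_load_weight(int prio)
{
    return prio_weight_table[prio - MAX_RT_PRIO];
}

/* Weight directly from a nice value in -20..19 (no range checking). */
int nice_to_load_weight(int nice)
{
    return prio_weight_table[nice + 20];
}
```

Neighbouring entries differ by roughly 1.25x. For example, 1024 / 820 is about 1.25, which is where the "about 10% per nice level" behavior comes from.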
2. Given these weights, how does CFS achieve fairness?
CFS can implement several different fairness policies, distinguished by what the scheduling object is: an individual task or a group of tasks.
Let's first look at the case where group scheduling is not enabled, so the unit of scheduling is each individual scheduling entity. Here is how scheduling works in detail:
Assume the system has three processes A, B, and C with A.weight = 1, B.weight = 2, and C.weight = 3. The total weight of the fair run queue is then cfs_rq.weight = 6. The natural notion of fairness is that a process's importance equals its share of the total weight. So A's importance is 1/6, and B's and C's are 2/6 and 3/6 respectively. C is therefore the most important and should receive the most CPU time. If running A, B, and C once each takes six time units in total, A gets 1 unit, B gets 2, and C gets 3. This is the fairness policy of CFS.
The Linux kernel uses the formula:
$$\text{ideal\_time} = \text{sum\_runtime} \times \frac{\text{se.weight}}{\text{cfs\_rq.weight}}$$
ideal_time: the time each process should run in one scheduling period.
sum_runtime: the time it takes for every task in the run queue to run once (the scheduling period).
se.weight: the weight of the current process.
cfs_rq.weight: the total weight of the entire cfs_rq.
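Here is a small C sketch of this formula applied to the A/B/C example (weights 1, 2, 3, with sum_runtime = 6 time units). The helper names ideal_time and total_weight are illustrative. Integer arithmetic is used, and the multiplication is done before the division so that small weights do not truncate to zero.

```c
/* ideal_time = sum_runtime * se.weight / cfs_rq.weight */
unsigned long long ideal_time(unsigned long long sum_runtime,
                              unsigned long long se_weight,
                              unsigned long long rq_weight)
{
    return sum_runtime * se_weight / rq_weight;
}

/* cfs_rq.weight is the sum of the weights of all queued entities. */
unsigned long long total_weight(const unsigned long long *weights, int nr)
{
    unsigned long long sum = 0;
    for (int i = 0; i < nr; i++)
        sum += weights[i];
    return sum;
}
```

With weights {1, 2, 3}, total_weight returns 6. ideal_time(6, 1, 6), ideal_time(6, 2, 6) and ideal_time(6, 3, 6) then give 1, 2 and 3 time units, matching the shares described earlier.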
se.weight and cfs_rq.weight can be computed as explained above. How is sum_runtime calculated? In the Linux kernel it is an empirically tuned value, computed as follows:
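(1) sum_runtime = sysctl_sched_min_granularity * nr_running  (if nr_running > 5)
(2) sum_runtime = sysctl_sched_latency = 20 ms  (if nr_running <= 5)
Note: sysctl_sched_min_granularity = 4 ms. The threshold 5 is sysctl_sched_latency / sysctl_sched_min_granularity = 20 / 4. With more than five equally weighted tasks, a 20 ms period would give each task less than the 4 ms minimum, so the period is stretched instead. These are the defaults of the kernel version discussed here; newer kernels use smaller defaults that are scaled by the number of CPUs.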
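Both cases fit in one small C function, roughly what the kernel's __sched_period() computes. The sketch assumes the 20 ms and 4 ms values quoted above, expressed in nanoseconds. The macro and function names are hypothetical.

```c
/* Values quoted in the text, in nanoseconds (older-kernel defaults). */
#define SCHED_LATENCY_NS          20000000ULL  /* sysctl_sched_latency = 20 ms */
#define SCHED_MIN_GRANULARITY_NS   4000000ULL  /* sysctl_sched_min_granularity = 4 ms */

/* Task count at which each task would get exactly the minimum granularity. */
#define SCHED_NR_LATENCY (SCHED_LATENCY_NS / SCHED_MIN_GRANULARITY_NS)  /* = 5 */

/* sum_runtime (the scheduling period) for nr_running runnable tasks. */
unsigned long long sched_period(unsigned long nr_running)
{
    if (nr_running > SCHED_NR_LATENCY)
        return nr_running * SCHED_MIN_GRANULARITY_NS;  /* case (1) */
    return SCHED_LATENCY_NS;                           /* case (2) */
}
```

With three tasks the period stays at 20 ms. With ten tasks it grows to 40 ms, so that each equally weighted task still gets 4 ms.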
In the Linux kernel code, this principle is implemented with a per-entity variable named vruntime. Each process has its own vruntime, and whenever a scheduling decision is needed, the process with the smallest vruntime in the run queue runs next. vruntime is maintained from the clock interrupt. On every tick, the current process's vruntime is increased as follows:
(1) vruntime += delta * NICE_0_LOAD / se.weight;  (if se.weight != NICE_0_LOAD)
(2) vruntime += delta;  (if se.weight == NICE_0_LOAD)
After each vruntime update, the kernel checks whether to set the TIF_NEED_RESCHED flag, which marks the current process to be preempted. In fact, when there are no wake-ups and no process migration between CPUs, this check is the only way the current process gives up the CPU. In other words, each process runs for its own ideal_time before being switched out.
That is, this is where the preemption (need-resched) flag gets set.
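The per-tick work described above can be sketched in C. The code applies rules (1) and (2) to vruntime, then compares the time run since the entity was picked against its ideal_time. This is a hypothetical, simplified model of the roles played by the kernel's update_curr() and check_preempt_tick(). The struct, its fields, and the function names are illustrative, and times are in nanoseconds.

```c
#include <stdbool.h>

#define NICE_0_LOAD 1024ULL   /* weight of a nice-0 task */

/* Hypothetical, stripped-down scheduling entity: only what this sketch needs. */
struct sketch_entity {
    unsigned long long weight;      /* load weight, e.g. 1024 for nice 0 */
    unsigned long long vruntime;    /* virtual runtime, ns */
    unsigned long long slice_exec;  /* real time run since it was last picked, ns */
};

/* Rules (1) and (2): scale the real delta by NICE_0_LOAD / weight. */
static unsigned long long vruntime_delta(unsigned long long delta,
                                         unsigned long long weight)
{
    if (weight == NICE_0_LOAD)
        return delta;                       /* rule (2): no scaling needed */
    return delta * NICE_0_LOAD / weight;    /* rule (1) */
}

/* One clock tick for the running entity: charge `delta` ns of real time,
 * advance its vruntime, and report whether the need-resched flag should
 * be set (delta_exec > ideal_runtime). */
bool tick_update(struct sketch_entity *curr,
                 unsigned long long delta,
                 unsigned long long ideal_runtime)
{
    curr->vruntime += vruntime_delta(delta, curr->weight);
    curr->slice_exec += delta;
    return curr->slice_exec > ideal_runtime;
}
```

Take an entity with weight 2048 and an ideal time of 2.5 ms, ticked every 1 ms. Its vruntime advances by only 0.5 ms per tick, and tick_update first returns true on the third tick, because that is when 3 ms > 2.5 ms.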
With this analysis, we have covered the general scheduling principle for processes when group scheduling is not enabled. Wake-ups and process migration are not taken into account here; they will be covered in detail in a later article.
At this point, we may have several questions:
1. So far we have only set the TIF_NEED_RESCHED flag. Who checks this flag and actually performs the process switch?
This is also handled around the clock interrupt. On the way back from the interrupt, the kernel checks whether TIF_NEED_RESCHED is set. If it is set (and preemption is allowed at that point), the kernel calls schedule(), which performs the actual process switch.
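This hand-off from the flag to schedule() can be modeled in a few lines of C. The model is hypothetical: irq_return and pick_next_entity_idx are illustrative names, and the kernel actually keeps CFS entities in a red-black tree ordered by vruntime, whereas this sketch uses a linear scan to stay short.

```c
#include <stdbool.h>

/* Index of the runnable entity with the smallest vruntime
 * (linear scan; the kernel uses a red-black tree instead). */
int pick_next_entity_idx(const unsigned long long *vruntime, int nr)
{
    int best = 0;
    for (int i = 1; i < nr; i++)
        if (vruntime[i] < vruntime[best])
            best = i;
    return best;
}

/* Hypothetical interrupt-return path: if the need-resched flag is set,
 * clear it and "call schedule()", i.e. switch to the entity with the
 * smallest vruntime; otherwise keep running the current entity. */
int irq_return(bool *need_resched, int curr,
               const unsigned long long *vruntime, int nr)
{
    if (!*need_resched)
        return curr;
    *need_resched = false;   /* schedule() clears the flag */
    return pick_next_entity_idx(vruntime, nr);
}
```

As long as the flag stays clear, the current entity keeps running whatever its vruntime. Once the flag is set, the entity with the smallest vruntime is chosen next.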
2. Take A, B, and C again, and ignore wake-ups and process migration. C's ideal running time is three time units, and C keeps the CPU until the following check in the tick handler fires:
if (delta_exec > ideal_runtime)
    resched_task(rq_of(cfs_rq)->curr);
Once the flag is set at that point, has the process finished its work after running for this period? Let's take a closer look at the vruntime formula:
vruntime += delta * NICE_0_LOAD / se.weight;  (if se.weight != NICE_0_LOAD)
NICE_0_LOAD is a constant: the weight of a nice-0 process (1024), i.e. the default process weight.
se.weight is the weight of the current process.
delta is the time the current process has just run.
From this formula we can derive the following relationships:
vruntime grows in proportion to delta: the longer the current process runs, the more its vruntime increases.
vruntime grows in inverse proportion to se.weight: the greater the weight, the more slowly vruntime grows.
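Inverting the formula shows why equal vruntime means fairness. Advancing vruntime by the same amount costs each process real running time proportional to its weight. The sketch below assumes the A/B/C weights 1:2:3 scaled by NICE_0_LOAD (1024, 2048, 3072); these are not real nice weights. The function name is hypothetical.

```c
#define NICE_0_LOAD 1024ULL   /* weight of a nice-0 task */

/* Real running time an entity needs to advance its vruntime by `vdelta` ns,
 * obtained by inverting vruntime += delta * NICE_0_LOAD / se.weight. */
unsigned long long runtime_for_vruntime(unsigned long long vdelta,
                                        unsigned long long weight)
{
    return vdelta * weight / NICE_0_LOAD;
}
```

For the same 1 ms of vruntime, weights 1024, 2048 and 3072 need 1 ms, 2 ms and 3 ms of real time respectively. Always running the smallest-vruntime process therefore hands out CPU time in the ratio 1:2:3.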
Now let's consider an extreme case: no wake-ups, no process migration, and A, B, and C are all running for the first time.
Since their vruntimes are all equal at first, the system picks one of A, B, and C arbitrarily. Suppose B is picked. At the first clock interrupt after B has run slightly more than 2 time units, B finds that its running time exceeds its ideal running time (runtime > ideal_time). The TIF_NEED_RESCHED flag is then set and a process switch follows. Suppose C is picked second; it runs for slightly more than 3 time units. Finally, A runs for slightly more than 1 time unit. This raises a question: have A, B, and C finished running after this round? (After all, the ideal times are computed from empirical values.) If not, what does the next round look like? My understanding is that this is a matter of experience: we only find out by running it, and can only judge intuitively that the behavior is right. I hope readers will leave me a comment!
(How could this be evaluated quantitatively?)
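One way to evaluate this quantitatively is to simulate it. Below is a hypothetical C model of the rules described so far. It assumes a 1 ms tick, a 20 ms period (three tasks), and the A/B/C weights 1:2:3 scaled to 1024/2048/3072. Every tick it charges the running task, and once delta_exec > ideal_time it switches to the task with the smallest vruntime. It ignores wake-ups, migration, and every other kernel detail.

```c
#define NR_TASKS     3
#define NICE_0_LOAD  1024ULL
#define TICK_NS      1000000ULL   /* assumed 1 ms tick */
#define PERIOD_NS    20000000ULL  /* sum_runtime for <= 5 tasks, from the text */

/* Hypothetical weights for A, B, C: the article's 1:2:3 scaled by 1024. */
static const unsigned long long task_weight[NR_TASKS] = { 1024, 2048, 3072 };

/* Simulate `ticks` clock ticks; store each task's total real runtime (ns). */
void simulate_cfs(unsigned long ticks, unsigned long long runtime[NR_TASKS])
{
    unsigned long long vruntime[NR_TASKS] = { 0 };
    unsigned long long total_weight = 0;
    for (int i = 0; i < NR_TASKS; i++) {
        total_weight += task_weight[i];
        runtime[i] = 0;
    }

    int curr = 0;                       /* all vruntimes equal: start with A */
    unsigned long long slice_exec = 0;  /* real time since curr was picked */
    for (unsigned long t = 0; t < ticks; t++) {
        /* The current task runs for one tick. */
        runtime[curr] += TICK_NS;
        slice_exec += TICK_NS;
        vruntime[curr] += TICK_NS * NICE_0_LOAD / task_weight[curr];

        /* Tick check: has it run longer than its ideal_time? */
        unsigned long long ideal = PERIOD_NS * task_weight[curr] / total_weight;
        if (slice_exec > ideal) {
            /* Flag set; schedule() picks the smallest vruntime. */
            int next = 0;
            for (int i = 1; i < NR_TASKS; i++)
                if (vruntime[i] < vruntime[next])
                    next = i;
            curr = next;
            slice_exec = 0;
        }
    }
}
```

In this model each pick overshoots its ideal_time by up to one tick. A, for instance, runs 4 ms against an ideal of about 3.33 ms, which matches the "slightly more than" observation above. Because the next task is always the one with the smallest vruntime, the tasks that overshoot more are picked less often. Over 60,000 ticks the runtimes therefore settle very close to 1:2:3.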
3. Now take another extreme example with two users, A and B (note that these are users, not processes). User A has one process a with a.weight = 1; user B also has one process b with b.weight = 1000. Under the fairness rule above, user B would get 1000/1001 of the CPU and effectively occupy it all the time, and the situation gets worse with more users. To solve this problem, CFS introduces group scheduling. The scheduling object is no longer limited to individual scheduling entities; it can also be a per-user scheduling unit. A and B then each become a scheduling unit and each gets 50% of the CPU. Within a group, the processes are still scheduled fairly among themselves, but the group as a whole receives CPU time according to the user's share.
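With group scheduling, the same formula is applied at every level of the hierarchy, and the weight ratios from each level multiply together: ideal_time = sum_runtime * se.weight / cfs_rq.weight.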
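To see the effect on the two-user example, here is a hypothetical C sketch comparing flat scheduling with a two-level hierarchy. It assumes both user groups have equal group weights (1024 each here) in the root queue and a 20 ms sum_runtime. Each process then competes only inside its own group's queue. All function and parameter names are illustrative.

```c
/* Flat (no group scheduling): one cfs_rq holds every task. */
unsigned long long flat_ideal_time(unsigned long long sum_runtime,
                                   unsigned long long task_weight,
                                   unsigned long long rq_weight)
{
    return sum_runtime * task_weight / rq_weight;
}

/* Two levels: the user's group entity competes in the root cfs_rq, and the
 * task competes only inside its group's cfs_rq. The same weight ratio is
 * applied at each level. */
unsigned long long group_ideal_time(unsigned long long sum_runtime,
                                    unsigned long long group_weight,
                                    unsigned long long root_rq_weight,
                                    unsigned long long task_weight,
                                    unsigned long long group_rq_weight)
{
    unsigned long long group_slice = sum_runtime * group_weight / root_rq_weight;
    return group_slice * task_weight / group_rq_weight;
}
```

Without groups, flat_ideal_time(20000000, 1, 1001) gives process a only 19980 ns per period, while b gets about 19.98 ms. With groups, group_ideal_time(20000000, 1024, 2048, 1, 1) and group_ideal_time(20000000, 1024, 2048, 1000, 1000) both give 10 ms, the 50/50 split described above.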