First, preface
This article describes the concept of process priority. Seen from user space, process priority consists of the nice value and the scheduling priority. In the kernel, the corresponding concepts are static priority, real-time priority, normalized priority and dynamic priority. The second chapter aims to describe these concepts clearly, and to deepen the understanding, the third chapter analyzes several typical data flows.
Second, overview
1. Blueprint
2. View of user space
In user space, process priority has two meanings: the nice value and the scheduling priority. For a normal process, the priority is the nice value, which ranges from -20 (highest priority) to 19 (lowest priority); modifying the nice value changes the share of CPU resources that the normal process receives. With the advent of real-time requirements, processes were given another attribute, the scheduling priority, and a process that has one is called a real-time process. The range of real-time priorities can be obtained through sched_get_priority_min and sched_get_priority_max; on Linux, the scheduling priority of a real-time process ranges from 1 (lowest priority) to 99 (highest priority). A normal process also has a scheduling priority, which is fixed at 0.
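As a rough illustration of the user-space view described above, the following sketch queries the scheduling priority range of a policy with sched_get_priority_min and sched_get_priority_max, and reads the nice value of the calling process with getpriority. The helper names priority_range and own_nice are made up for this example and are not part of any library.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical helper: fetch the scheduling priority range of a policy.
 * Returns 0 on success, -1 if the policy is not supported. */
int priority_range(int policy, int *min, int *max)
{
    assert(min != NULL && max != NULL);

    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo == -1 || hi == -1)
        return -1;
    *min = lo;
    *max = hi;
    return 0;
}

/* Hypothetical helper: nice value of the calling process.
 * getpriority() may legitimately return -1, so errno is cleared first
 * and checked afterwards to tell a real error apart. */
int own_nice(int *nice_out)
{
    assert(nice_out != NULL);

    errno = 0;
    int n = getpriority(PRIO_PROCESS, 0);
    if (n == -1 && errno != 0)
        return -1;
    *nice_out = n;
    return 0;
}
```

Calling priority_range(SCHED_FIFO, &lo, &hi) on Linux should report the 1 to 99 range mentioned above, while SCHED_OTHER reports 0 for both ends.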
3. Kernel implementation
In the kernel, several members of the task struct are related to process priority, as follows:
struct task_struct {
......
    int prio, static_prio, normal_prio;
    unsigned int rt_priority;
......
    unsigned int policy;
......
};
The policy member records the scheduling policy of the thread, while the other members represent the various kinds of priority, as described in the following subsections.
4. Static priority
The static_prio member of the task struct is what we call the static priority. Its characteristics are as follows:
(1) The smaller the value, the higher the process priority.
(2) 0–99 is the range for real-time processes (where the value has no practical meaning), and 100–139 is the range for normal processes.
(3) The default value is 120.
(4) User space can modify it through nice() or setpriority(), and read it through getpriority().
(5) A newly created process inherits the static priority of its parent process.
The static priority is the starting point for all the other priority calculations. It is either inherited from the parent process or set by user space. Once the static priority has been modified, both the normalized priority and the dynamic priority need to be recalculated.
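The relationship between the nice value and the static priority (120 for nice 0, 100–139 for normal processes) can be sketched as a simple offset. The constants below mirror, as far as I recall, the ones in the 4.x kernel headers; treat them as assumptions and check them against your own tree. The functions are user-space stand-ins for the kernel's NICE_TO_PRIO and PRIO_TO_NICE macros.

```c
#include <assert.h>

/* Assumed to match the kernel's priority constants. */
#define MAX_RT_PRIO   100
#define NICE_WIDTH    40
#define MIN_NICE      (-20)
#define MAX_NICE      19
#define DEFAULT_PRIO  (MAX_RT_PRIO + NICE_WIDTH / 2)   /* 120 */

/* nice value -> static priority (stand-in for NICE_TO_PRIO) */
int nice_to_prio(int nice)
{
    assert(nice >= MIN_NICE && nice <= MAX_NICE);
    return nice + DEFAULT_PRIO;
}

/* static priority -> nice value (stand-in for PRIO_TO_NICE) */
int prio_to_nice(int static_prio)
{
    assert(static_prio >= MAX_RT_PRIO &&
           static_prio < MAX_RT_PRIO + NICE_WIDTH);
    return static_prio - DEFAULT_PRIO;
}
```

With these definitions, nice -20 maps to static priority 100 and nice 19 maps to 139, which matches the range given in item (2).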
5. Real-time priority
The rt_priority member of the task struct represents the real-time priority of the thread, that is, the scheduling priority as seen from user space. 0 denotes a normal process, 1~99 denotes a real-time process, and 99 is the highest priority.
6. Normalized priority
The normal_prio member of the task struct is what we call the normalized priority. It is calculated from the static priority, the scheduling priority and the scheduling policy, as follows:
static inline int normal_prio(struct task_struct *p)
{
    int prio;

    if (task_has_dl_policy(p))
        prio = MAX_DL_PRIO-1;
    else if (task_has_rt_policy(p))
        prio = MAX_RT_PRIO-1 - p->rt_priority;
    else
        prio = __normal_prio(p);
    return prio;
}
Here, let's talk about normalization, a somewhat obscure term. Anyone who has optimized fixed-point audio or video algorithms will be familiar with it. Fixed-point data come in different representations, such as Q31 and Q15, in which the position of the decimal point differs, so values cannot be compared, added or subtracted directly. They first need to be normalized, that is, converted to one common format (in practice, one common position of the decimal point). Likewise, in mathematics, 1 m and 1 mm must be converted to the same unit before they can be combined. For priorities, the scheduler has to consider several factors, such as the scheduling policy, the nice value and the scheduling priority. Taking all of these factors into account and mapping them to a single number on one axis gives the normalized priority. For a thread, the smaller its normalized priority value, the higher its priority.
Among the scheduling policies, a deadline process ranks above both RT processes and normal processes, so its normalized priority is negative: -1. If a real-time scheduling policy is used, the thread's normalized priority is derived from rt_priority. The rt_priority member of the task struct is the real-time priority as seen from user space (the scheduling priority). MAX_RT_PRIO-1 is 99, so MAX_RT_PRIO-1 - p->rt_priority inverts the scheduling priority of the real-time process: the highest priority becomes 0 and the lowest becomes 98. Incidentally, a normalized priority of 99 is meaningless, because it would correspond to an rt_priority of 0, which is not a real-time priority. For a normal process, the normalized priority is simply its static priority.
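To make the mapping onto a single axis concrete, here is a small user-space model of the calculation. It is only a sketch: struct task_model and enum policy_class are hypothetical stand-ins for the task struct and the kernel's policy checks, and MAX_DL_PRIO = 0 and MAX_RT_PRIO = 100 are assumed from the values quoted in the text (-1 and 99).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DL_PRIO 0
#define MAX_RT_PRIO 100

/* Hypothetical, simplified view of the scheduling policy. */
enum policy_class { CLASS_NORMAL, CLASS_RT, CLASS_DEADLINE };

/* Hypothetical stand-in for the priority fields of the task struct. */
struct task_model {
    enum policy_class cls;
    int static_prio;           /* 100..139, meaningful for normal tasks */
    unsigned int rt_priority;  /* 1..99, meaningful for real-time tasks */
};

/* Model of the normalized priority: smaller value means higher priority. */
int normal_prio_model(const struct task_model *p)
{
    assert(p != NULL);

    if (p->cls == CLASS_DEADLINE)
        return MAX_DL_PRIO - 1;                        /* always -1 */
    if (p->cls == CLASS_RT)
        return MAX_RT_PRIO - 1 - (int)p->rt_priority;  /* 99 -> 0, 1 -> 98 */
    return p->static_prio;                             /* 100..139 */
}
```

On this axis a deadline task sits at -1, real-time tasks occupy 0 to 98, and normal tasks occupy 100 to 139, so a plain integer comparison is enough to order tasks of different policies.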
7. Dynamic Priority
The prio member of the task struct represents the dynamic priority of the thread, which is the priority the scheduler actually uses when scheduling. The dynamic priority can be modified at run time; for example, when dealing with priority inversion, the system may temporarily raise the priority of a normal process. The code that sets the dynamic priority typically looks like this: p->prio = effective_prio(p). The code that calculates the dynamic priority is as follows:
static int effective_prio(struct task_struct *p)
{
    p->normal_prio = normal_prio(p);
    if (!rt_prio(p->prio))
        return p->normal_prio;
    return p->prio;
}
rt_prio is a function that determines, from the current priority, whether the process counts as real-time. This covers two cases. In the first, the process really is a real-time process whose scheduling policy is SCHED_FIFO or SCHED_RR. In the second, the process has been artificially boosted into the RT priority range (for example, when priority inheritance is used to solve a priority inversion problem in the system). In both cases we leave the dynamic priority unchanged, that is, effective_prio returns the current dynamic priority p->prio. In all other cases, the dynamic priority of the process follows the normalized priority.
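The two outcomes can be tried out with a small model. This is a sketch under assumptions: struct task_model is a hypothetical stand-in that carries an already computed normalized priority, and rt_prio_model assumes the kernel's test is simply "priority below MAX_RT_PRIO (100)".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_RT_PRIO 100

/* Hypothetical stand-in for the relevant task struct fields. */
struct task_model {
    int prio;         /* current dynamic priority */
    int normal_prio;  /* normalized priority, assumed already computed */
};

/* Assumed form of the check: anything below MAX_RT_PRIO is in the RT range. */
static bool rt_prio_model(int prio)
{
    return prio < MAX_RT_PRIO;
}

/* Model of the dynamic priority calculation. */
int effective_prio_model(const struct task_model *p)
{
    assert(p != NULL);

    if (!rt_prio_model(p->prio))
        return p->normal_prio;  /* normal case: follow normalized priority */
    return p->prio;             /* RT task or boosted task: keep as is */
}
```

A normal task whose static priority has just changed from 120 to 125 picks up 125 as its new dynamic priority, whereas a normal task temporarily boosted to priority 10 keeps 10 until the boost is removed.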
Third, typical data flow analysis
1. Setting the nice value from user space
Setting the nice value from user space is implemented in the kernel mainly by the set_user_nice function. Both sys_nice and sys_setpriority call set_user_nice after the parameter check and the permission check to complete the actual setting. The code is as follows:
void set_user_nice(struct task_struct *p, long nice)
{
    int old_prio, delta, queued;
    unsigned long flags;
    struct rq *rq;

    rq = task_rq_lock(p, &flags);
    if (task_has_dl_policy(p) || task_has_rt_policy(p)) {-----------(1)
        p->static_prio = NICE_TO_PRIO(nice);
        goto out_unlock;
    }
    queued = task_on_rq_queued(p);-------------------(2)
    if (queued)
        dequeue_task(rq, p, DEQUEUE_SAVE);
    p->static_prio = NICE_TO_PRIO(nice);----------------(3)
    set_load_weight(p);
    old_prio = p->prio;
    p->prio = effective_prio(p);
    delta = p->prio - old_prio;
    if (queued) {
        enqueue_task(rq, p, ENQUEUE_RESTORE);------------(2)
        if (delta < 0 || (delta > 0 && task_running(rq, p))) ------------(4)
            resched_curr(rq);
    }
out_unlock:
    task_rq_unlock(rq, p, &flags);
}
(1) For a real-time process or a deadline process, the nice value has no real meaning. We still set its static priority, but this setting has no effect and does not change the scheduler's behavior, so the function returns directly without any dequeue or enqueue operation.
(2) Step (1) has already handled processes whose scheduling policy belongs to the RT or deadline class, so execution reaches this point only for normal processes, which are scheduled by the CFS algorithm. If the task is on the run queue (queued is true), then, because the nice value is being modified, the scheduler needs to re-evaluate the task's position in the current runqueue. We therefore remove the task from the rq and, after recalculating the priority, insert it again into the red-black tree of runnable tasks of that runqueue.
(3) The core code is the single statement p->static_prio = NICE_TO_PRIO(nice); everything else is a side effect. Take the load weight as an example. At any moment when a CPU is running tasks, its load is 100% and the idle process gets no chance to be scheduled. When there are no real-time or deadline processes in the system, all the runnable processes divide up the CPU resources among themselves, and the proportion of CPU resources that each process is entitled to is what we call its load weight. Each nice value corresponds to a different CPU load weight, so when the nice value changes, the load weight of the process must also be updated through set_load_weight. Besides the load weight, the dynamic priority of the thread also needs to be updated, which is done by p->prio = effective_prio(p).
(4) delta records the difference between the new and the old dynamic priority of the thread. When the priority of the thread is raised (delta < 0), a scheduling point may arise, so resched_curr is called to mark the currently running task so that it can be rescheduled when returning to user space. In addition, if the task being modified is the one currently running and its priority is lowered (delta > 0), the process may need to yield the CPU, so resched_curr is also needed to mark the currently running task for rescheduling.
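The load weight mentioned in step (3) can be illustrated with a lookup table. The sketch below reproduces, from memory, the nice-to-weight table used by the 4.x kernels (nice 0 maps to 1024 and each nice step changes the weight by roughly 25%); the exact values should be treated as an assumption and checked against the kernel source. weight_for_nice and cpu_share_percent are hypothetical helper names.

```c
#include <assert.h>

#define MIN_NICE (-20)
#define MAX_NICE 19

/* Nice-to-weight table as recalled from the kernel source (assumption). */
static const int prio_to_weight[40] = {
    /* -20 */ 88761, 71755, 56483, 46273, 36291,
    /* -15 */ 29154, 23254, 18705, 14949, 11916,
    /* -10 */  9548,  7620,  6100,  4904,  3906,
    /*  -5 */  3121,  2501,  1991,  1586,  1277,
    /*   0 */  1024,   820,   655,   526,   423,
    /*   5 */   335,   272,   215,   172,   137,
    /*  10 */   110,    87,    70,    56,    45,
    /*  15 */    36,    29,    23,    18,    15,
};

/* Load weight of a normal task with the given nice value. */
int weight_for_nice(int nice)
{
    assert(nice >= MIN_NICE && nice <= MAX_NICE);
    return prio_to_weight[nice - MIN_NICE];
}

/* Percentage of CPU time task A is entitled to when it competes
 * with exactly one other runnable task B on the same CPU. */
double cpu_share_percent(int nice_a, int nice_b)
{
    double wa = weight_for_nice(nice_a);
    double wb = weight_for_nice(nice_b);

    return 100.0 * wa / (wa + wb);
}
```

Under these assumed weights, a nice 0 task competing with a nice 5 task is entitled to about 1024 / (1024 + 335), or roughly 75% of the CPU, which is why changing the nice value has to be followed by an update of the load weight.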
2. Default scheduling policy and scheduling parameters of a process
Let's consider this question: what is the default scheduling policy of a thread before user space sets its scheduling policy and scheduling parameters? To answer it, we need to go back to fork (the specific code is in the sched_fork function), and the answer depends on the sched_reset_on_fork flag in the task struct. If the flag is not set, the child process follows the scheduling policy of the parent process at fork time. If the flag is set, the child process cannot inherit the scheduling policy and scheduling parameters from the parent process, and they are set to the defaults instead. The code snippet is as follows:
int sched_fork(unsigned long clone_flags, struct task_struct *p)
{
...
    p->prio = current->normal_prio;-------------------(1)
    if (unlikely(p->sched_reset_on_fork)) {
        if (task_has_dl_policy(p) || task_has_rt_policy(p)) {----------(2)
            p->policy = SCHED_NORMAL;
            p->static_prio = NICE_TO_PRIO(0);
            p->rt_priority = 0;
        } else if (PRIO_TO_NICE(p->static_prio) < 0)
            p->static_prio = NICE_TO_PRIO(0);
        p->prio = p->normal_prio = __normal_prio(p); ------------(3)
        set_load_weight(p);
        p->sched_reset_on_fork = 0;
    }
......
}
(1) sched_fork is just one fragment of the fork process. At the beginning of fork, dup_task_struct has already copied a process descriptor (task struct) that is identical to the parent's, so if the reset in step (2) does not happen, the child process follows the scheduling policy and scheduling parameters (the various priorities) of the parent process. However, the dynamic priority of the parent process is sometimes raised temporarily to solve a PI problem, and that boost should not be passed to the child process at fork, so the dynamic priority is reset here.
(2) The default scheduling policy is SCHED_NORMAL, the default static priority is 120 (that is, a nice value of 0), and the default rt priority is 0 (normal process). Whatever the parent process is, even a deadline process, the child it forks has to revert to the default parameters. For a normal parent process, the code resets the static priority only when the nice value is negative.
(3) Since the scheduling policy and the static priority have been modified, the dynamic priority and the normalized priority need to be updated, and so does the load weight. Once the child process has reverted to the default scheduling policy and priority, the sched_reset_on_fork flag has fulfilled its purpose and can be cleared.
OK, at this point we understand how the scheduling policy and scheduling parameters are handled during fork. One question remains: why not simply inherit all the scheduling policies and parameters of the parent process? Why reset to the defaults at fork? In Linux, each process is subject to resource limits. For example, if a real-time process keeps consuming CPU resources without ever making a system call that can block, we suspect that it has run away and locked up the system. In that case we have to intervene, which is why the per-process resource limit RLIMIT_RTTIME was introduced. However, a real-time process in user space could bypass the RLIMIT_RTTIME limit through fork and grab CPU resources at will. Kernel developers anticipated this, and the sched_reset_on_fork flag was introduced to prevent a real-time process from "leaking" its real-time attributes into its child processes.
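The three steps above can be replayed in a user-space model. This is a simplified sketch, not the kernel code: struct task_model, enum policy_model and fork_model are hypothetical names, the static priority of 120 for nice 0 is taken from the text, and the normalized priority of a normal task is assumed to equal its static priority.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEFAULT_PRIO 120   /* static priority for nice 0 */

/* Hypothetical, simplified view of the scheduling policy. */
enum policy_model { POLICY_NORMAL, POLICY_RT, POLICY_DEADLINE };

/* Hypothetical stand-in for the scheduling fields of the task struct. */
struct task_model {
    enum policy_model policy;
    int prio, static_prio, normal_prio;
    unsigned int rt_priority;
    bool reset_on_fork;
};

/* Model of what happens to the scheduling fields of a child at fork. */
struct task_model fork_model(const struct task_model *parent)
{
    assert(parent != NULL);

    struct task_model child = *parent;   /* full copy of the parent */

    child.prio = parent->normal_prio;    /* (1) drop a temporary PI boost */
    if (child.reset_on_fork) {
        if (child.policy != POLICY_NORMAL) {          /* (2) RT or deadline */
            child.policy = POLICY_NORMAL;
            child.static_prio = DEFAULT_PRIO;
            child.rt_priority = 0;
        } else if (child.static_prio < DEFAULT_PRIO) { /* negative nice */
            child.static_prio = DEFAULT_PRIO;
        }
        /* (3) recompute dependent priorities and clear the flag */
        child.prio = child.normal_prio = child.static_prio;
        child.reset_on_fork = false;
    }
    return child;
}
```

In this model, a SCHED_FIFO parent with the flag set produces a normal child at priority 120, a normal parent at nice -5 produces a child at nice 0, and a normal parent at nice 5 produces a child that stays at nice 5.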
3. Setting the scheduling policy and scheduling parameters from user space
The sched_setparam interface modifies the RT priority scheduling parameter. sched_setscheduler is more powerful: it can set not only the RT priority but also the scheduling policy. sched_setattr is a combined interface that can set a thread's scheduling policy together with the scheduling parameters that apply under that policy. In the kernel, all of these interfaces use the __sched_setscheduler function to modify the scheduling policy and scheduling parameters of the specified thread.
__sched_setscheduler is divided into two parts: first the security check and parameter check, then the actual setting.
Let's look at the security check first. If user space were free to modify the scheduling policy and scheduling priority, chaos would follow: every process might want to raise its own scheduling policy and priority in order to obtain more CPU resources. Setting the scheduling policy and scheduling parameters from user space therefore has to obey certain rules. Without the CAP_SYS_NICE capability, essentially the only operation a thread is allowed is a downgrade, for example changing from SCHED_FIFO to SCHED_NORMAL, or keeping the scheduling policy but lowering the static priority (nice value) or the real-time priority (scheduling priority). The exception is SCHED_DEADLINE. In principle, a process whose scheduling policy is already SCHED_DEADLINE should be allowed to lower its "priority" ("priority" is not quite the right word here; what is meant is reducing the run time or increasing the period, which relaxes its claim on CPU resources), but the current 4.4.6 kernel does not allow this (perhaps later kernel versions will). In addition, without the CAP_SYS_NICE capability, the scheduling policy and scheduling parameters can be set only for threads that belong to the same user. With the CAP_SYS_NICE capability, there are far fewer restrictions: a normal process can be promoted to a real-time process (by modifying the policy), and the static priority or real-time priority can also be raised.
The actual modification is simple. It is done by the __setscheduler_params function, which essentially copies the parameters in sched_attr into the relevant members of the task struct. Readers can study the code on their own.
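From user space, the permission rule can be observed with a small experiment. The sketch below uses the standard sched_setscheduler, sched_getscheduler and sched_getparam calls; try_set_policy and get_policy are hypothetical helper names introduced only for this example.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical helper: try to switch the calling process to `policy`
 * with scheduling priority `prio`. Returns 0 or a negative errno. */
int try_set_policy(int policy, int prio)
{
    struct sched_param param = { .sched_priority = prio };

    if (sched_setscheduler(0, policy, &param) == -1)
        return -errno;
    return 0;
}

/* Hypothetical helper: read back the policy and scheduling priority
 * of the calling process. Returns 0 or a negative errno. */
int get_policy(int *policy, int *prio)
{
    struct sched_param param;
    int pol;

    assert(policy != NULL && prio != NULL);

    pol = sched_getscheduler(0);
    if (pol == -1)
        return -errno;
    if (sched_getparam(0, &param) == -1)
        return -errno;
    *policy = pol;
    *prio = param.sched_priority;
    return 0;
}
```

For an ordinary unprivileged process, try_set_policy(SCHED_FIFO, 10) will typically fail with EPERM, in line with the rule that promotion to a real-time policy needs CAP_SYS_NICE. When the call does succeed (for example when run as root), get_policy should report SCHED_FIFO and priority 10.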
Reference documentation:
1. The various Linux man pages
2. Linux 4.4.6 kernel source code
Linux Scheduler-Process priority