Why do processes have priorities?
This hardly needs explaining: ever since the birth of multitasking operating systems, how much CPU a process gets to execute has had to be controlled deliberately, because some processes are more important than others.
The way process priority works has remained largely unchanged since its invention. Whether in the single-CPU era or the multi-core era, it is implemented by controlling how much CPU time a process consumes. This means that within the same scheduling period, high-priority processes get more CPU time and low-priority processes get less.
Please do not confuse two concepts here: nice (NI) and priority (PR). They are closely related, but in current Linux systems they are not the same thing.
Let's look at the ps -l command: do you really understand what the PRI column means, and what specifically the NI column means?
Similarly, in the output of the top command: do you understand the difference between the PR value and the NI value? If not, let's first figure out what a nice value is.
What is the nice value?
The nice value should be a familiar concept to anyone who knows Linux/UNIX. It is a value that reflects a process's "priority" status, ranging from -20 to 19, for a total of 40 levels.
The smaller the value, the higher the process's priority; the larger the value, the lower the priority.
For example, we can use the nice command to set a nice value for a bash that is about to be executed:
- [root@localhost zorro]# nice -n 10 bash
This opens a new bash whose nice value is set to 10. By default, a process's nice value is inherited from its parent, and is typically 0.
We can view the nice value of the current shell directly with the nice command:
- [root@localhost zorro]# nice
- 10
Compare this with the normal case:
- [root@localhost zorro]# exit
We exit the bash whose nice value is 10, open an ordinary bash, and look at its nice value:
- [root@localhost zorro]# bash
- [root@localhost zorro]# nice
- 0
In addition, the renice command can adjust the nice value of a process that is already running, and commands such as top and ps can display a process's nice value. I won't go into the details here; please refer to the relevant man pages.
Note that I have deliberately been saying "nice value" rather than "priority" here.
The nice value is not the priority, but it does affect how a process is prioritized.
In English, describing a person as nice generally means they are well liked. And what kind of person is well liked? Usually one who is humble and polite.
Imagine going to lunch with a nice person: you order two of the same meal, and when the first one arrives, the nice person says, "You eat first!" That is why people like him; this man is nice! But if the other meal comes late, the nice person goes hungry.
What does this mean?
The nicer the person, the weaker their ability to grab resources; the less nice, the stronger. That is exactly what the nice value means: the lower the nice value, the less nice the process, the stronger its ability to preempt the CPU, and the higher its priority. (The author's explanation is so vivid that this editor can't help applauding!)
On Linux systems that originally used the O1 scheduler, the nice value is also called the static priority: it stays whatever it was set to and does not change as the process runs, unless we change it with renice.
The priority value, by contrast, was changed by the kernel's O1 scheduler as the process ran, and so is also called the dynamic priority.
What are the priority value and real-time processes?
Now let's look at what the priority value is: it is the PRI value seen in the ps command, or the PR value seen in top.
To keep these concepts distinct, from here on we will:
- use "nice value" for the NI value, also called the static priority, i.e. the priority adjusted with the nice and renice commands;
- use "priority value" for the PRI and PR values, also called the dynamic priority;
- use the bare term "priority" to mean the priority value.
In the kernel, the range of process priority values is defined by a macro named MAX_PRIO, whose value is 140.
This value is the sum of two other values: NICE_WIDTH, the macro describing the width of the nice value range (40), and MAX_RT_PRIO, the macro describing the priority range of real-time (realtime) processes (100).
To put it plainly, Linux actually implements 140 priority levels, with values ranging from 0 to 139, and the smaller the value, the higher the priority. Nice values from -20 to 19 map to the actual priority range 100-139.
The default priority of a newly created process is defined as:
- #define DEFAULT_PRIO (MAX_RT_PRIO + NICE_WIDTH / 2)
which actually corresponds to a nice value of 0.
Under normal circumstances, the priority of any process is this value. Even if we adjust a process's priority with the nice and renice commands, it will not leave the 100-139 range, unless the process is a real-time process, in which case its priority value falls somewhere in the 0-99 range.
This implies something important: current Linux is an operating system that already supports real-time processes.
What is a real-time operating system?
We will not explain its meaning and its applications in industry in detail here; interested readers can consult the Wikipedia article on real-time operating systems.
Simply put, a real-time operating system must guarantee that the relevant real-time processes respond within a short, bounded time, with no longer latency, and it requires interrupt latency and process-switching latency to be minimal.
No general-purpose process scheduling algorithm, whether O1 or CFS, can satisfy such a requirement. So when the kernel was designed, 100 priority levels were mapped out for real-time processes, all of them higher than the priorities (nice values) of ordinary processes, and real-time processes use a different, simpler scheduling algorithm to reduce scheduling overhead.
In general, the processes running in a Linux system can be divided into two categories:
- Real-time processes
- Non-real-time processes
The main difference between them is precisely the priority: every process whose priority value is in the 0-99 range is a real-time process, so this range can also be called the real-time process priority range, while processes in the 100-139 range are non-real-time processes.
On the system, the chrt command can view and set a process's real-time priority state. Let's first look at how chrt is used:
Pay particular attention to the policy options section of the output: the system provides five scheduling policies for processes.
What it does not say is that these five policies split across the two classes of processes: real-time processes can use the scheduling policies SCHED_FIFO and SCHED_RR, while non-real-time processes use SCHED_OTHER, SCHED_BATCH, and SCHED_IDLE.
The system's overall priority policy is:
- If there are real-time processes that need to run, real-time processes run first.
- Non-real-time processes are not scheduled until the real-time processes exit or voluntarily yield the CPU.
A real-time process can be assigned a priority in the range 1-99. A program can be executed as a real-time process like this:
- [root@localhost zorro]# chrt 10 bash
- [root@localhost zorro]# chrt -p $$
- pid 14840's current scheduling policy: SCHED_RR
- pid 14840's current scheduling priority: 10
As you can see, the newly opened bash is already a real-time process, with the default scheduling policy SCHED_RR and a priority of 10. If you want a different scheduling policy, add a parameter:
- [root@localhost zorro]# chrt -f 10 bash
- [root@localhost zorro]# chrt -p $$
- pid 14843's current scheduling policy: SCHED_FIFO
- pid 14843's current scheduling priority: 10
The SCHED_RR and SCHED_FIFO we just used are real-time scheduling policies and can only be set for real-time processes. Among all real-time processes, a process with a higher priority is always guaranteed to execute before lower-priority processes.
SCHED_RR and SCHED_FIFO only behave differently when two real-time processes have the same priority, and the difference is just what their names suggest:
SCHED_FIFO
Scheduled as a first-in, first-out queue: among processes of the same priority, whichever got the CPU first keeps running until it exits or voluntarily releases the CPU.
SCHED_RR
Multiple processes of the same priority share the CPU round-robin, in time slices. The time-slice length is 100ms.
That is how Linux handles priorities for real-time processes and the associated scheduling algorithms: simple overall, and very practical.
The more troublesome case is non-real-time processes, which are the main class of processes on Linux. To discuss how their priorities are handled, we first need to introduce the two relevant scheduling algorithms: O1 and CFS.
What is the O1 scheduler?
The O1 scheduling algorithm was introduced in Linux 2.6, and the kernel replaced it with CFS after Linux 2.6.23.
Although O1 is no longer the default scheduling algorithm in current kernels, a large number of production servers may still run older Linux versions, so I believe many servers still use the O1 scheduler, and a brief explanation of it is therefore still worthwhile.
The scheduler is called O1 because the time complexity of its algorithm is O(1).
The O1 scheduler is still designed around the classic idea of time-slice allocation.
In short, the idea of time slices is to divide CPU execution time into many small segments, say 5ms each. If multiple processes need to execute "simultaneously", each in fact occupies the CPU for 5ms in turn; viewed at the scale of one second, these processes appear to run "at the same time".
Of course, on a multi-core system, the same is simply done on every core. In this scheme, how are priorities supported?
By allocating time slices of different sizes: high-priority processes get large time slices, and low-priority processes get small ones. Over a period of time, a high-priority process therefore occupies more CPU time and receives the preferential treatment it deserves.
Another special aspect of the O1 algorithm is that even processes with the same nice value are divided into two types according to their CPU usage: CPU-bound and IO-bound.
A typical CPU-bound process always wants to occupy the CPU, and every time slice it is given is exhausted before the scheduler switches away. Programs doing all kinds of arithmetic computation are common examples.
An IO-bound process, by contrast, usually releases the CPU voluntarily before its time slice is used up. Editors such as vi and emacs are typical IO-bound processes.
Why make this distinction? Because IO-bound processes are often the processes people interact with, such as shells and editors.
When such a process coexists in the system with a CPU-bound process of the same nice value, and assuming their time slices are the same length, say 500ms, the user's interaction may be starved by the CPU-bound process.
Imagine bash waiting for keyboard input: it is not using the CPU, so the CPU-bound program keeps computing. With 500ms time slices, when the user types a character into bash, bash may have to wait several hundred milliseconds before it can respond, because at the moment the character is typed, the other process's time slice is probably not yet exhausted, so the system does not schedule bash.
To improve the responsiveness of IO-bound processes, the system distinguishes the two types and adjusts their priorities dynamically: it lowers the priority of CPU-bound processes and raises that of IO-bound ones, thereby shortening the effective time-slice length of the CPU-bound processes.
We know that nice values range from -20 to 19 and the corresponding priority values range from 100 to 139. For a process with the default nice value of 0, the initial priority value is 120. As the process keeps executing, the kernel observes its CPU consumption and adjusts its priority value dynamically, within a range of ±5.
That is, the priority value can be raised automatically to at most 115, or lowered to at most 125. This is why the nice value is called the static priority while the priority value is called the dynamic priority. However, this dynamic adjustment is no longer needed after the scheduler was replaced by CFS, because CFS switched to a different way of allocating CPU time, which we will discuss next.
What is CFS, the completely fair scheduler?
O1 is by now a previous-generation scheduler. Because it did not perform well on multi-core, multi-CPU systems, and because features such as cgroups were being added to the kernel, Linux switched after 2.6.23 to CFS as the scheduling method for ordinary-priority (SCHED_OTHER) processes.
In this redesigned scheduler, the notions of time slices, dynamic and static priority, and CPU-bound versus IO-bound are no longer important. CFS supports all of the above functionality in a new way.
The basic idea of its design: implement a scheduler that is completely fair to all processes.
The old question again: how can it be completely fair? The answer is similar to CFQ in IO scheduling: if there are currently n processes that need to be scheduled, then within a relatively small time window the scheduler should schedule all n of them once, splitting the CPU time among them, so that every process is scheduled fairly.
That relatively small window is the maximum delay before any runnable (R state) process gets scheduled; in other words, any R state process is guaranteed to be scheduled within this window. The window is also called the scheduling period, and its name in the kernel is sched_latency_ns.
Priorities in CFS
Of course, CFS also needs to support priorities. In the new system, priority is expressed through the speed at which consumed time (vruntime) grows.
That is, in CFS the CPU time a process consumes is recorded in its vruntime, but vruntime grows at different rates for processes of different priorities: it grows slowly for high-priority processes and quickly for low-priority ones.
For example, a process with nice value 19 that actually consumes 1 second of CPU records 1s in its vruntime. But a nice -20 process may actually occupy the CPU for 10s before 1s is recorded in its vruntime.
The ratios of CPU time between different nice values are set in the kernel according to the principle that "each step of nice value should differ by about 10% of CPU".
Roughly, this means: if two processes with nice value 0 compete for the CPU, each should get 50%. If one of them is adjusted to nice 1, then the lower-priority process should get about 10% less CPU than the other: the nice 0 process gets 55% and the nice 1 process 45%, so the ratio of the CPU time they consume is 55:45.
That ratio is approximately 1.25. In other words, the CPU time of two adjacent nice values should differ by a factor of about 1.25. Based on this principle, the kernel precomputes the time ratios for all 40 nice values, which are stored in the kernel as an array:
How does CFS work with multiple CPUs?
In the discussion above, we could assume there is only one CPU in the system, and therefore only one run queue.
In reality, systems are multi-core or even multi-CPU, and CFS took this into account from the beginning: it maintains one scheduling queue per CPU core, and each CPU schedules only the processes on its own queue.
This is a fundamental reason why CFS is more efficient than the O1 scheduling algorithm: one queue per CPU avoids using a big kernel lock on a global queue, which improves parallel efficiency.
The most direct side effect, of course, is that load may become unbalanced across CPUs. To maintain balance, CFS periodically performs load balancing over all CPUs, which can move processes between the scheduling queues of different CPUs.
This balancing requires locking the affected CPU queues, which somewhat reduces the parallelism of the multiple run queues.
Overall, however, CFS's parallel queues are still more efficient than O1's global queue, especially as the number of CPU cores grows and the efficiency loss from a global lock rises sharply.
In closing
This article set out from the priority of Linux processes, hoping that by working through the related concepts you gain an overall picture of process scheduling in the system.
We have also analyzed the CFS scheduling algorithm in some depth. In my experience, this knowledge is very useful when observing system state and doing related tuning.
For example: what do the NI and PR values in the top command mean? Likewise the NI and PRI values in the ps command, the difference between ulimit's -e and -r parameters, and so on. I hope that after reading this article, you can make better sense of what these commands display.
You may also notice that although the PR value in top has the same meaning as the PRI value in ps -l, they display different numbers for the same priority.
Linux process priority NI and PR