At the micro level, the kernel chops CPU time into many small pieces and rotates them among processes, so that at the macro level all processes appear to run at the same time. On a dual-core CPU, at most two processes can actually be executing at any instant; the processes that top or vmstat reports as "running" are not all really holding a CPU at that moment.
This is why well-designed high-performance servers such as Nginx run as many worker processes as there are CPUs. For example, if your server has 8 CPUs, there should be exactly 8 Nginx workers. Configure more than that and the kernel may put several Nginx workers on the same runqueue: on that CPU, time is divided fairly evenly among those workers, and every time a worker finishes its time slice the kernel performs a process switch and saves the outgoing process's context. Suppose the kernel hands out 100ms time slices and each process switch costs 5ms; the performance drop becomes quite noticeable, and the further the worker count exceeds the CPU count, the worse it gets. A rough model of this overhead is sketched below.
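To make that concrete, here is a minimal toy calculation (ordinary user-space C, not kernel code) using the 100ms slice and 5ms switch figures assumed above; the numbers are illustrative assumptions, not measurements.

    #include <stdio.h>

    /* Toy model: N workers share one CPU, each runs a full time slice,
     * and every slice boundary costs one context switch. The 100ms/5ms
     * figures are the assumptions from the paragraph above. */
    int main(void)
    {
        const double slice_ms  = 100.0;  /* assumed time slice  */
        const double switch_ms = 5.0;    /* assumed switch cost */

        for (int workers = 2; workers <= 8; workers *= 2) {
            double round    = workers * (slice_ms + switch_ms);  /* one full rotation */
            double share    = 100.0 * slice_ms / round;          /* per-worker CPU share */
            double overhead = 100.0 * workers * switch_ms / round;
            printf("%d workers on one CPU: each gets ~%.1f%% of the CPU, "
                   "~%.1f%% is burned on switching\n", workers, share, overhead);
        }
        return 0;
    }

In practice the degradation is worse than this simple model suggests, because each switch also disturbs the CPU caches and TLB.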
Of course, this is tied to Nginx's own design. Nginx is an event-driven, fully asynchronous server that by design almost never blocks, so its designers intend each Nginx worker to monopolize nearly all of one CPU's time slices. That is the rationale behind the Nginx worker count.
Of course, most processes in a running system are not like Nginx, wanting a whole CPU to themselves. Many processes, vi for example, spend most of their time waiting for user input; while vi waits for the I/O interrupt it consumes no time slice at all. Faced with such a mix of processes, the kernel needs some skill to distribute CPU time slices.
The kernel has a strategy, and a bias, when handing out time slices. In other words, the kernel plays favorites: it likes I/O-bound processes, because if such a process is not responded to promptly the user feels the lag, so the kernel instinctively gives this kind of process more CPU time. CPU-bound processes get far less attention. Does that make sense? It does: if a CPU-bound job runs a little slower the user barely notices, since the gap between the speed of electrical signals and biological signals is enormous. And although the kernel offers I/O-bound processes as generous a time slice as it can, those processes sleep most of the time and often never use up the slice anyway. Is that reasonable? It is.
So how does the kernel implement this bias? It does so by dynamically adjusting process priorities and by handing out CPU time slices of different lengths. Let's first look at how the kernel decides the length of a time slice.
Every process has an integer field static_prio that holds the static priority set by the user; it is the kernel's representation of the nice value. Look at the static_prio member in the process descriptor structure:
    struct task_struct {
        int prio, static_prio;
        ......
    };
What is the nice value? It is simply another way of expressing a user process's priority. Nice ranges from -20 to +19, where -20 is the highest priority and +19 the lowest. As mentioned in the previous article, the kernel has 140 priority levels; how does the nice value a user can set map onto those 140 levels? Look at the code:
    #define MAX_USER_RT_PRIO    100
    #define MAX_RT_PRIO         MAX_USER_RT_PRIO
    #define MAX_PRIO            (MAX_RT_PRIO + 40)
As you can see, MAX_PRIO is 140, the maximum priority defined by the kernel.
    #define USER_PRIO(p)        ((p) - MAX_RT_PRIO)
    #define MAX_USER_PRIO       (USER_PRIO(MAX_PRIO))
And MAX_USER_PRIO is 40, meaning an ordinary (non-realtime) process has 40 priority levels to choose from, which is exactly the -20 to +19 range mentioned above.
    #define NICE_TO_PRIO(nice)  (MAX_RT_PRIO + (nice) + 20)
A nice value of -20 is the highest; what static_prio does it correspond to? NICE_TO_PRIO(0) is 120, and NICE_TO_PRIO(-20) is 100. The small program below prints the whole mapping.
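As a quick sanity check of this mapping, here is a tiny user-space sketch built only from the macros quoted above (nothing here is kernel code beyond those definitions):

    #include <stdio.h>

    /* The NICE_TO_PRIO mapping quoted above, compiled as an ordinary
     * user-space program just to print the resulting table. */
    #define MAX_USER_RT_PRIO    100
    #define MAX_RT_PRIO         MAX_USER_RT_PRIO
    #define MAX_PRIO            (MAX_RT_PRIO + 40)
    #define NICE_TO_PRIO(nice)  (MAX_RT_PRIO + (nice) + 20)

    int main(void)
    {
        /* nice -20..+19 lands on static_prio 100..139, i.e. the 40
         * non-realtime levels out of the kernel's 140 priorities. */
        for (int nice = -20; nice <= 19; nice++)
            printf("nice %3d -> static_prio %3d\n", nice, NICE_TO_PRIO(nice));
        return 0;
    }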
When a process has just been forked, it splits the remaining time slice evenly with its parent. Once that slice is used up, a new time slice is computed from the process's static priority: at the lowest priority (nice +19) it gets the minimum slice of 5ms, at nice 0 it gets 100ms, and at the highest priority (nice -20) it gets the maximum slice of 800ms. Let's see how the kernel computes the slice length, in the task_timeslice() function:
    #define SCALE_PRIO(x, prio) \
        max(x * (MAX_PRIO - prio) / (MAX_USER_PRIO/2), MIN_TIMESLICE)

    /* Compute a task's time slice (in jiffies) from its static priority. */
    static unsigned int task_timeslice(task_t *p)
    {
        if (p->static_prio < NICE_TO_PRIO(0))
            return SCALE_PRIO(DEF_TIMESLICE * 4, p->static_prio);
        else
            return SCALE_PRIO(DEF_TIMESLICE, p->static_prio);
    }
This relies on several more macros; let's look at their values in turn:
    #define HZ              1000                   /* assuming a 1000Hz kernel */
    #define MIN_TIMESLICE   max(5 * HZ / 1000, 1)  /* the 5ms floor used by SCALE_PRIO */
    #define DEF_TIMESLICE   (100 * HZ / 1000)
So DEF_TIMESLICE is 100, i.e. 100ms at HZ=1000. Suppose the nice value is -20; then static_prio is 100, and SCALE_PRIO(100*4, 100) equals 800, meaning that at the highest priority (-20) a process can be given an 800ms time slice. With nice +19 it only gets the minimum 5ms slice, and with the default nice 0 it gets 100ms. The sketch below reproduces this arithmetic.
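Here is a user-space reproduction of the quoted macros (assuming HZ=1000, so one jiffy is one millisecond, with the kernel's max() replaced by a plain macro since we are outside the kernel):

    #include <stdio.h>

    /* Reproduce the task_timeslice() arithmetic quoted above, outside the
     * kernel, to confirm the 800/100/5 ms figures in the text. */
    #define HZ                1000
    #define MAX_RT_PRIO       100
    #define MAX_PRIO          (MAX_RT_PRIO + 40)
    #define MAX_USER_PRIO     (MAX_PRIO - MAX_RT_PRIO)
    #define NICE_TO_PRIO(n)   (MAX_RT_PRIO + (n) + 20)
    #define MIN_TIMESLICE     (5 * HZ / 1000)
    #define DEF_TIMESLICE     (100 * HZ / 1000)

    #define MAX(a, b)         ((a) > (b) ? (a) : (b))
    #define SCALE_PRIO(x, prio) \
        MAX((x) * (MAX_PRIO - (prio)) / (MAX_USER_PRIO / 2), MIN_TIMESLICE)

    static unsigned int task_timeslice(int static_prio)
    {
        if (static_prio < NICE_TO_PRIO(0))
            return SCALE_PRIO(DEF_TIMESLICE * 4, static_prio);
        return SCALE_PRIO(DEF_TIMESLICE, static_prio);
    }

    int main(void)
    {
        int nices[] = { -20, -10, 0, 10, 19 };
        for (unsigned i = 0; i < sizeof(nices) / sizeof(nices[0]); i++)
            printf("nice %3d -> timeslice %3u ms\n",
                   nices[i], task_timeslice(NICE_TO_PRIO(nices[i])));
        return 0;
    }

Running it prints 800ms for nice -20, 100ms for nice 0 and 5ms for nice +19, matching the values above.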
It looks as if the time slice depends only on the nice value. In fact, the kernel also applies a dynamic adjustment of -5 to +5 on top of the initial nice value. On what basis? Quite simply: a process that uses a lot of CPU has its nice value nudged up, i.e. its priority lowered a little; a process that uses little CPU is assumed to be interactive, so its nice value is nudged down, i.e. its priority raised. The benefits are obvious: 1) the initial priority given to a process is not necessarily accurate, so the kernel adjusts it according to how the process actually behaves; 2) a process does not behave the same way all the time. For example, a server that starts out merely listening on port 80 sleeps most of the time and barely touches its time slice, so its nice value drifts down and its priority rises; once clients start hitting port 80 and the process begins burning CPU, its nice value drifts back up and its priority drops.
With the idea clear, let's turn to the code. What exactly is the dynamic priority compensation based on? effective_prio() returns the priority after dynamic compensation, and its comment is quite detailed, so let's read it first.
    /*
     * effective_prio - return the priority that is based on the static
     * priority but is modified by bonuses/penalties.
     *
     * We scale the actual sleep average [0 .... MAX_SLEEP_AVG]
     * into the -5 ... 0 ... +5 bonus/penalty range.
     *
     * We use 25% of the full 0...39 priority range so that:
     *
     * 1) nice +19 interactive tasks do not preempt nice 0 CPU hogs.
     * 2) nice -20 CPU hogs do not get preempted by nice 0 tasks.
     *
     * Both properties are important to certain workloads.
     */
    static int effective_prio(task_t *p)
    {
        int bonus, prio;

        if (rt_task(p))
            return p->prio;

        bonus = CURRENT_BONUS(p) - MAX_BONUS / 2;

        prio = p->static_prio - bonus;
        if (prio < MAX_RT_PRIO)
            prio = MAX_RT_PRIO;
        if (prio > MAX_PRIO - 1)
            prio = MAX_PRIO - 1;
        return prio;
    }
As you can see, the bonus is what compensates the static priority. How is the bonus calculated?
    #define CURRENT_BONUS(p) \
        (NS_TO_JIFFIES((p)->sleep_avg) * MAX_BONUS / \
            MAX_SLEEP_AVG)
As you can see, the process descriptor also has a sleep_avg field, and the dynamic compensation is based entirely on its value. sleep_avg is the key: it reflects how much the process sleeps versus runs. When a process wakes up and becomes runnable, sleep_avg is credited with the time it just spent sleeping; while it runs, sleep_avg is decremented on every clock tick until it reaches 0. So the larger sleep_avg is, the larger the dynamic priority bonus, up to a compensation equivalent to nice -5 when sleep_avg reaches MAX_SLEEP_AVG. The sketch below shows the mapping.
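As a rough illustration of that mapping, here is a sketch of the bonus arithmetic. It assumes HZ=1000 and the 2.6-era constants MAX_BONUS=10 and MAX_SLEEP_AVG=DEF_TIMESLICE*MAX_BONUS, and it expresses sleep_avg directly in jiffies, whereas the kernel stores it in nanoseconds and converts with NS_TO_JIFFIES.

    #include <stdio.h>

    /* Sketch of the CURRENT_BONUS arithmetic with assumed 2.6-era constants.
     * sleep_avg is given in jiffies here for simplicity. */
    #define DEF_TIMESLICE   100
    #define MAX_BONUS       10
    #define MAX_SLEEP_AVG   (DEF_TIMESLICE * MAX_BONUS)   /* 1000 jiffies */

    static int bonus_for(unsigned long sleep_avg)
    {
        int current_bonus = (int)(sleep_avg * MAX_BONUS / MAX_SLEEP_AVG);
        return current_bonus - MAX_BONUS / 2;   /* ranges from -5 to +5 */
    }

    int main(void)
    {
        unsigned long samples[] = { 0, 250, 500, 750, 1000 };
        for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
            printf("sleep_avg = %4lu jiffies -> bonus %+d "
                   "(dynamic prio = static_prio - bonus)\n",
                   samples[i], bonus_for(samples[i]));
        return 0;
    }

A mostly-sleeping process ends up with a positive bonus (priority raised by up to 5 levels), while a pure CPU hog ends up with a negative bonus (priority lowered by up to 5 levels).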
The kernel's favoritism toward interactive processes can be seen from the priority and time slice allocation above, and it has yet another way of indulging them. As mentioned in the previous article, each runqueue has an active queue and an expired queue: an ordinary process that exhausts its time slice is moved to the expired queue, but an interactive, I/O-bound process is put straight back into the active queue, guaranteeing a snappy response. Truly, all the kernel's pampering is lavished on this one kind of process.
Linux kernel scheduling algorithm (2): how CPU time slices are allocated