"Linux Kernel Design and Implementation" Reading Notes (4): Process Scheduling

Main content:

- What is scheduling
- How scheduling works
- How Linux implements scheduling
- Scheduling-related system calls

1. What is scheduling

Modern operating systems are multitasking. To let many tasks run well on the same system at the same time, a management program is needed to manage all the tasks (processes) running concurrently on the computer. That management program is the scheduler. Put simply, it decides which processes run, which processes wait, and how long each process runs. In addition, to give users a better experience, a running process can be interrupted immediately by a more urgent one.

In short, scheduling is a balancing act. On the one hand, each running process should use the CPU as fully as possible (that is, processes should be switched as rarely as possible, because with too many switches CPU time is wasted on the switching itself). On the other hand, every process should get fair use of the CPU (that is, no single process may occupy the CPU for a long time).

2. How scheduling works

As described above, the scheduler decides which process runs and how long it runs. Both decisions depend on the priority of the process. To bound how long a process may run, scheduling introduces the concept of a time slice.

2.1 Priority

A process's priority is measured in two ways: the nice value and the real-time priority.

The nice value ranges from -20 to +19. The larger the value, the lower the priority, so a process with nice value -20 has the highest priority.

The real-time priority ranges from 0 to 99. It is defined the opposite way: the larger the value, the higher the priority.

Real-time processes are processes with strict response-time requirements.
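To make the two scales easier to compare, here is a minimal sketch that places both on the single internal scale used by 2.6-era kernels, where a smaller number means a higher priority. The constants MAX_RT_PRIO = 100 and MAX_PRIO = MAX_RT_PRIO + 40 come from include/linux/sched.h; the two helper function names are hypothetical and exist only for this illustration.

```python
# Constants as defined in include/linux/sched.h of 2.6-era kernels.
MAX_RT_PRIO = 100
MAX_PRIO = MAX_RT_PRIO + 40


def nice_to_kernel_prio(nice):
    """Map a nice value (-20..19) to the kernel's internal priority.

    Hypothetical helper; mirrors the kernel rule MAX_RT_PRIO + nice + 20.
    """
    if not -20 <= nice <= 19:
        raise ValueError("nice must be in -20..19")
    return MAX_RT_PRIO + nice + 20


def rt_to_kernel_prio(rt_priority):
    """Map a real-time priority (0..99) to the kernel's internal priority.

    Hypothetical helper; mirrors the kernel rule MAX_RT_PRIO - 1 - rt_priority.
    """
    if not 0 <= rt_priority <= 99:
        raise ValueError("real-time priority must be in 0..99")
    return MAX_RT_PRIO - 1 - rt_priority


# On the internal scale a smaller number wins.
print(nice_to_kernel_prio(-20), nice_to_kernel_prio(19))  # 100 139
print(rt_to_kernel_prio(99), rt_to_kernel_prio(0))        # 0 99
```

Every real-time value lands below 100 and every nice value lands in 100..139, so on this scale any real-time process outranks any normal process.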
If the run queue contains processes with a real-time priority, they take CPU time away from ordinary processes.

Having two kinds of priority can be confusing. Which one ranks higher? And what if a process has both at the same time? The Linux kernel settled both questions long ago.

First question: which priority ranks higher?

Real-time priority ranks above the nice value. In the kernel, real-time priorities occupy the range 0 to MAX_RT_PRIO-1. MAX_RT_PRIO is defined in include/linux/sched.h:

    #define MAX_USER_RT_PRIO        100
    #define MAX_RT_PRIO             MAX_USER_RT_PRIO

Nice values occupy the kernel range MAX_RT_PRIO to MAX_RT_PRIO + 40, that is, MAX_RT_PRIO to MAX_PRIO:

    #define MAX_PRIO                (MAX_RT_PRIO + 40)

Second question: what if a process has both priorities at the same time?

The answer is simple: a process cannot have both. A process with a real-time priority has no nice value, and a process with a nice value has no real-time priority.

The following command shows the real-time priority and nice value of each process (RTPRIO is the real-time priority, NI is the nice value):

    $ ps -eo state,uid,pid,ppid,rtprio,ni,time,comm
    S   UID   PID  PPID RTPRIO  NI     TIME COMMAND
    S     0     1     0      -   0 00:00:00 systemd
    S     0     2     0      -   0 00:00:00 kthreadd
    S     0     3     2      -   0 00:00:00 ksoftirqd/0
    S     0     6     2     99   - 00:00:00 migration/0
    S     0     7     2     99   - 00:00:00 watchdog/0
    S     0     8     2     99   - 00:00:00 migration/1
    S     0    10     2      -   0 00:00:00 ksoftirqd/1
    S     0    12     2     99   - 00:00:00 watchdog/1
    S     0    13     2     99   - 00:00:00 migration/2
    S     0    15     2      -   0 00:00:00 ksoftirqd/2
    S     0    16     2     99   - 00:00:00 watchdog/2
    S     0    17     2     99   - 00:00:00 migration/3
    S     0    19     2      -   0 00:00:00 ksoftirqd/3
    S     0    20     2     99   - 00:00:00 watchdog/3
    S     0    21     2      - -20 00:00:00 cpuset
    S     0    22     2      - -20 00:00:00 khelper

2.2 Time slice

Priority decides who runs first. For the scheduler, however, starting a process is not the end of the job: it also has to know when the next scheduling decision is due.
Hence the concept of a time slice. A time slice is a number that says how long a process may run continuously before it is preempted. It can also be seen as the time the process runs before the next scheduling decision (unless the process gives up the CPU voluntarily, or a real-time process preempts it).

Choosing the size of the time slice is not simple. If it is too large, the system responds slowly (the scheduling cycle is long). If it is too small, frequent process switches waste processor time. The default time slice is generally 10 ms.

2.3 How scheduling works (priority plus time slice)

Here is an intuitive example. Assume the system has only three processes, ProcessA (NI = +10), ProcessB (NI = 0) and ProcessC (NI = -10), where NI is the nice value of the process, and the time slice is 10 ms.

1) Before scheduling, each process's priority is converted into CPU time by some weight (here we assume that each step of higher priority is worth 5 ms more CPU time). If ProcessA is allocated one time slice, 10 ms, then ProcessB, whose priority is 10 steps higher (the smaller the nice value, the higher the priority), is allocated 10*5 + 10 = 60 ms. By the same rule ProcessC is allocated 20*5 + 10 = 110 ms.
2) When scheduling starts, the process with the most CPU time is scheduled first. With ProcessA (10 ms), ProcessB (60 ms) and ProcessC (110 ms), ProcessC is clearly scheduled first.
3) After 10 ms (one time slice), scheduling happens again with ProcessA (10 ms), ProcessB (60 ms), ProcessC (100 ms). ProcessC has just run for 10 ms, so its remaining time is 100 ms. ProcessC is still the one scheduled.
4) After four more rounds (four time slices), the state is ProcessA (10 ms), ProcessB (60 ms), ProcessC (60 ms). ProcessB and ProcessC now have the same CPU time.
When that happens, whichever of the two is nearer the front of the CPU run queue goes first. Assume ProcessB is ahead in the queue, so ProcessB is scheduled.
5) After 10 ms (one time slice), the state is ProcessA (10 ms), ProcessB (50 ms), ProcessC (60 ms). ProcessC is scheduled again.
6) ProcessB and ProcessC run alternately until the state is ProcessA (10 ms), ProcessB (10 ms), ProcessC (10 ms). Now whichever of ProcessA, ProcessB and ProcessC is nearest the front of the CPU run queue is scheduled. Assume ProcessA is scheduled.
7) After 10 ms (one time slice), the state is ProcessA (time slice used up, exits), ProcessB (10 ms), ProcessC (10 ms).
8) After two more time slices, ProcessB and ProcessC have also finished and exit.

This example is very simple and is only meant to illustrate the principle. Real scheduling algorithms are more complicated, but the basic idea is similar:

1) Determine how much CPU time each process may use (there are many algorithms for this, depending on the requirements).
2) Run the process with the most CPU time first.
3) When it has run, deduct the CPU time it used and return to step 1).

3. How Linux implements scheduling

The scheduling algorithm on Linux has kept evolving. Since kernel 2.6.23, Linux uses the "Completely Fair Scheduler" (CFS).

When CFS allocates CPU time, it does not give each process an absolute amount of CPU time. Instead it gives each process a percentage of the CPU time according to its priority.

For example, take ProcessA, ProcessB and ProcessC with relative priorities 1, 3 and 6. (These numbers are illustrative weights where a larger number means a higher priority; they are not nice values, for which larger means lower.) Under CFS the CPU shares are ProcessA (10%), ProcessB (30%), ProcessC (60%), because the total is 100%, ProcessB's priority is three times ProcessA's, and ProcessC's priority is six times ProcessA's.
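The percentages in this example can be checked with a few lines of arithmetic. This is only a sketch of the proportional-share idea: it takes the numbers 1, 3 and 6 from the example as relative weights (larger means more CPU), and the function name cpu_shares is hypothetical.

```python
from fractions import Fraction


def cpu_shares(weights):
    """Return each process's share of the CPU as weight / total weight."""
    total = sum(weights.values())
    return {name: Fraction(weight, total) for name, weight in weights.items()}


# Relative weights taken from the example in the text.
shares = cpu_shares({"ProcessA": 1, "ProcessB": 3, "ProcessC": 6})
for name, share in shares.items():
    print(f"{name}: {float(share):.0%}")
```

The real kernel does not use such numbers directly; it derives each weight from the nice value through a lookup table, but the share is still the weight divided by the total weight of the runnable processes.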
The CFS algorithm on Linux consists mainly of the following steps (using ProcessA (10%), ProcessB (30%), ProcessC (60%) as the example):

1) Compute the vruntime of each process (Note 1). A process's vruntime is updated by the update_curr() function.
2) Pick the process with the smallest vruntime and run it (Note 2).
3) After the process has run, update its vruntime and go back to step 2) (Note 3).

Note 1. vruntime is the accumulated virtual running time of the process. It is a field of struct sched_entity, which is declared in include/linux/sched.h; the CFS code that maintains it, including update_curr(), is in kernel/sched_fair.c.

Note 2. This step is a little hard to understand. Picking the process to run by vruntime does not seem to have anything to do with the percentage of CPU time each process is entitled to.
1) Suppose ProcessC runs first (vr is short for vruntime). After 10 ms: ProcessA (vr = 0), ProcessB (vr = 0), ProcessC (vr = 10).
2) The next scheduling decision can then only pick ProcessA or ProcessB, because the process with the smallest vruntime is chosen.
Over a long period, ProcessA, ProcessB and ProcessC would simply take turns fairly, with no regard to priority.

In fact, vruntime is not the actual running time. It is the actual running time after weighting. In the example above, ProcessA (10%) is entitled to only 10% of the total CPU time, so when ProcessA runs for 10 ms its vruntime grows by 100 ms. By the same rule, when ProcessB runs for 10 ms its vruntime grows by (100/3) ms, and when ProcessC runs for 10 ms its vruntime grows by (100/6) ms.

In actual operation, ProcessC's vruntime grows the slowest, so ProcessC gets the most CPU time.

The weighting above is my own simplification to make the idea easier to follow. For the weighting Linux really applies to vruntime, you have to read the source code ^-^

Note 3. Linux keeps all runnable processes in a red-black tree so that it can find the smallest vruntime quickly.
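The simplified weighting from Note 2 can be tried out with a small simulation. This is a sketch under the author's simplification, not the kernel's real arithmetic: each 10 ms run adds 10 / share to a process's vruntime, and the process with the smallest vruntime always runs next. A binary heap (Python's heapq) stands in for the kernel's red-black tree, and the function name simulate_cfs is hypothetical. Exact fractions are used so that ties are resolved predictably (by process name).

```python
import heapq
from fractions import Fraction


def simulate_cfs(shares, slice_ms=10, total_ms=1000):
    """Run the smallest-vruntime process one slice at a time.

    Returns the real time in ms that each process actually ran.
    """
    # Heap of (vruntime, name); the smallest vruntime is always on top.
    heap = [(Fraction(0), name) for name in shares]
    heapq.heapify(heap)
    ran_ms = {name: 0 for name in shares}

    for _ in range(total_ms // slice_ms):
        vruntime, name = heapq.heappop(heap)
        ran_ms[name] += slice_ms
        # Simplified weighting: a smaller share makes vruntime grow faster.
        vruntime += Fraction(slice_ms) / shares[name]
        heapq.heappush(heap, (vruntime, name))
    return ran_ms


shares = {
    "ProcessA": Fraction(1, 10),
    "ProcessB": Fraction(3, 10),
    "ProcessC": Fraction(6, 10),
}
ran = simulate_cfs(shares)
print(ran)  # {'ProcessA': 100, 'ProcessB': 300, 'ProcessC': 600}
```

Over 1000 ms the three processes receive 100 ms, 300 ms and 600 ms, which matches the 10% / 30% / 60% split even though the selection rule only ever looks at vruntime.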
In the red-black tree, the leftmost node is the process with the smallest vruntime. The tree is updated whenever a new process is added or an old process exits.

In fact, schedulers on Linux are provided as modules. Each scheduler has its own priority, so several scheduling algorithms can exist at the same time, and each process can choose its own scheduler. When Linux schedules, it first picks a scheduler according to scheduler priority and then picks a process belonging to that scheduler.

4. Scheduling-related system calls

The scheduling-related system calls fall into two groups:

1) Calls related to the scheduling policy and process priority (the parameters discussed above: priority, time slice and so on). These are the first 8 entries in the table below.
2) Calls related to the processor. These are the last 3 entries in the table below.

    System call                 Description
    nice()                      Sets the nice value of the process.
    sched_setscheduler()        Sets the scheduling policy of the process, that is,
                                which scheduling algorithm the process uses.
    sched_getscheduler()        Gets the scheduling algorithm of the process.
    sched_setparam()            Sets the real-time priority of the process.
    sched_getparam()            Gets the real-time priority of the process.
    sched_get_priority_max()    Gets the maximum real-time priority. Because of
                                user permissions, a non-root user cannot set the
                                real-time priority to 99.
    sched_get_priority_min()    Gets the minimum real-time priority, for the same
                                reason as above.
    sched_rr_get_interval()     Gets the time slice of the process.
    sched_setaffinity()         Sets the processor affinity of the process. This is
                                the cpus_allowed mask stored in task_struct. Each
                                bit of the mask corresponds to one processor
                                available in the system. By default all bits are
                                set, so the process may run on every processor in
                                the system. With this call a different mask can be
                                set so that the process runs only on one or a few
                                of the processors.
    sched_getaffinity()         Gets the processor affinity of the process.
    sched_yield()               Temporarily gives up the processor.
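Several of these calls can be tried from user space without writing C. The sketch below assumes Linux and Python 3.3 or later, whose os module wraps the calls under the same names. It only reads the current process's settings and then yields the CPU once; the helper names describe_current_process and pin_to_cpus are hypothetical, and pin_to_cpus is defined but deliberately not called.

```python
import os

# Readable names for the policies this sketch knows about.
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
}


def describe_current_process():
    """Collect the scheduling settings of the calling process (pid 0)."""
    return {
        # nice(0) adds 0 to the nice value and returns the result.
        "nice": os.nice(0),
        "policy": POLICY_NAMES.get(os.sched_getscheduler(0), "other"),
        "rt_priority": os.sched_getparam(0).sched_priority,
        "fifo_priority_range": (
            os.sched_get_priority_min(os.SCHED_FIFO),
            os.sched_get_priority_max(os.SCHED_FIFO),
        ),
        "affinity": sorted(os.sched_getaffinity(0)),
    }


def pin_to_cpus(cpus):
    """Restrict the calling process to the given CPU numbers."""
    os.sched_setaffinity(0, cpus)
    return sorted(os.sched_getaffinity(0))


info = describe_current_process()
for key, value in info.items():
    print(f"{key}: {value}")

# Give up the processor once; the process stays runnable.
os.sched_yield()
```

To restrict the process to a single processor, one could call pin_to_cpus({info["affinity"][0]}). Changing the policy or real-time priority with os.sched_setscheduler() or os.sched_setparam() normally requires elevated privileges, so those calls are left out here.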