I. Comparison of the Old and New Linux Schedulers
Before kernel version 2.6, the scheduler showed obvious limitations when many tasks were active, because it used an algorithm with O(n) complexity. In this scheduler, the time spent scheduling a task is a function of the number of tasks in the system: the more active tasks there are, the longer scheduling takes. Under a very heavy task load the processor spends a great deal of time on scheduling itself and very little on the tasks themselves. The algorithm therefore lacks scalability.
In a Symmetric Multi-Processing (SMP) system, the pre-2.6 scheduler used a single run queue shared by all processors. This meant a task could be scheduled on any processor, which is good for load balancing but a disaster for the memory caches. For example, suppose a task is executing on CPU-1 and its data is in that processor's cache. If the task is then scheduled to run on CPU-2, its data must be invalidated in CPU-1's cache and pulled into CPU-2's cache.
The old scheduler also used a single run-queue lock, so in an SMP system the act of selecting a task to execute blocked every other processor from touching the run queue. Idle processors could only wait for the lock to be released, which reduced efficiency. (See also: Linux kernel study notes on SMP, UMA, and NUMA.)
Finally, the early kernel did not support preemption: if a low-priority task was executing, a high-priority task could only wait for it to finish.
1.1 Introduction to the Linux 2.6 Scheduler
The 2.6 scheduler was designed and implemented by Ingo Molnar, who has been involved in Linux kernel development since 1995. His motivation for writing the new scheduler was to make wakeup, context switching, and timer-interrupt overhead fully O(1). One workload that drove this requirement was the Java Virtual Machine (JVM): the Java programming model uses many execution threads, which generates a great deal of scheduling load under an O(n) scheduler. An O(1) scheduler is not much affected by such high loads, so the JVM can execute efficiently.
The 2.6 scheduler solves the major problems found in its predecessor (the O(n) behaviour and the SMP scalability issues) and addresses other problems as well. We will now explore the basic design of the 2.6 scheduler.
1.1.1 Main Scheduling Structures
First, let's review the 2.6 scheduler's structures. Each CPU has a run queue containing 140 priority lists, each serviced in FIFO order. A task that becomes runnable is added to the end of the list, in its own CPU's run queue, that matches its priority. Each task has a time slice that determines how long it is allowed to execute. The first 100 priority lists of a run queue are reserved for real-time tasks, and the last 40 are used for user tasks (see Figure 1). We will see later why this distinction matters.
Figure 1. Running queue structure of the Linux 2.6 Scheduler
In addition to the CPU's active run queue (the active runqueue), there is an expired run queue (the expired runqueue). When a task in the active run queue uses up its time slice, it is moved to the expired run queue; during the move its time slice is recalculated (so that it reflects the task's priority; more on this later). If there are no runnable tasks left at any priority in the active run queue, the pointers to the active and expired run queues are simply swapped, turning the expired priority lists into the active ones.
Scheduling itself is very simple: the scheduler picks a task from the highest-priority non-empty list and executes it. To make this efficient, the kernel keeps a bitmap indicating which priority lists contain tasks. On most architectures, a find-first-bit-set instruction is used to find the highest-priority set bit in five 32-bit words (covering the 140 priorities). The time needed to pick the next task therefore depends not on the number of active tasks but on the fixed number of priorities. This makes the 2.6 scheduler O(1): scheduling time is constant and unaffected by the number of active tasks.
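To make the lookup concrete, here is a minimal, self-contained user-space model of it (not the kernel's actual code): a 140-bit bitmap spread over five 32-bit words, with GCC's __builtin_ctz() standing in for the architecture's find-first-bit-set instruction.

#include <stdio.h>

#define MAX_PRIO 140
#define BITMAP_WORDS ((MAX_PRIO + 31) / 32)	/* five 32-bit words */

static unsigned int bitmap[BITMAP_WORDS];	/* bit i set => priority list i is non-empty */

static void mark_runnable(int prio) { bitmap[prio / 32] |=  (1u << (prio % 32)); }
static void mark_empty(int prio)    { bitmap[prio / 32] &= ~(1u << (prio % 32)); }

/* Scan at most five words; __builtin_ctz() models find-first-bit-set. */
static int highest_ready_prio(void)
{
	for (int w = 0; w < BITMAP_WORDS; w++)
		if (bitmap[w])
			return w * 32 + __builtin_ctz(bitmap[w]);
	return -1;				/* no runnable task anywhere */
}

int main(void)
{
	mark_runnable(120);			/* a user task */
	mark_runnable(3);			/* a real-time task */
	printf("next priority to run: %d\n", highest_ready_prio());	/* prints 3 */
	mark_empty(3);
	printf("next priority to run: %d\n", highest_ready_prio());	/* prints 120 */
	return 0;
}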
Although priority scheduling also worked on SMP systems, its big-lock architecture meant that while one CPU was selecting a task to dispatch, the run queue was locked and the other CPUs had to wait. The 2.6 scheduler does not schedule under a single global lock; instead, each run queue has its own lock, which lets all CPUs schedule tasks without contending with one another. In addition, because each processor has its own run queue, a task usually stays closely associated with one CPU and can make better use of that CPU's hot cache.
Another advantage of the 2.6 scheduler is that it allows preemption. When a higher-priority task is ready to run, a lower-priority task is not allowed to keep executing: the scheduler preempts the low-priority process, puts it back on its priority list, and reschedules.
As if the O(1) behaviour and preemption were not enough, the 2.6 scheduler also offers dynamic task priorities and SMP load balancing, described next.
1.1.2 Dynamic Task Priorities
To prevent tasks from monopolizing the CPU and starving other tasks that need it, the Linux 2.6 scheduler can dynamically adjust task priorities. It does so by penalizing CPU-bound tasks and rewarding I/O-bound tasks. An I/O-bound task typically uses the CPU only to set up an I/O operation and then sleeps waiting for it to complete; this behaviour gives other tasks access to the CPU.
Because I/O-bound tasks are generous with the CPU, they are rewarded by having their priority value lowered (that is, their priority raised) by up to five levels. CPU-bound tasks are penalized by having their priority value raised by up to five levels.
Whether a task is I/O-bound or CPU-bound is determined by an interactivity heuristic: a task's interactivity metric is calculated from the time it spends executing versus the time it spends sleeping. Because I/O-bound tasks run briefly and then sleep, they accumulate a lot of sleep time waiting for I/O to complete, which raises the metric. Note that this priority adjustment applies only to user tasks, not to real-time tasks.
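As a rough illustration of the bonus/penalty scheme, the sketch below is a simplified user-space model, not the kernel's effective_prio() code: the sleep average is mapped to a bonus of 0..10, and the dynamic priority is the static priority minus (bonus - 5), clamped to the user-priority range. The constants mirror the 140/100 split described above; MAX_SLEEP_AVG here is an arbitrary unit chosen for the model.

#define MAX_RT_PRIO	100
#define MAX_PRIO	140
#define MAX_BONUS	10		/* gives the +/-5 adjustment described above */
#define MAX_SLEEP_AVG	10000		/* arbitrary unit for this model */

/* sleep_avg grows while the task sleeps and shrinks while it runs. */
static int dynamic_prio(int static_prio, unsigned long sleep_avg)
{
	int bonus = (int)(sleep_avg * MAX_BONUS / MAX_SLEEP_AVG) - MAX_BONUS / 2;
	int prio  = static_prio - bonus;	/* I/O-bound: bonus > 0, priority value drops */

	if (prio < MAX_RT_PRIO)			/* never wander into the real-time range */
		prio = MAX_RT_PRIO;
	if (prio > MAX_PRIO - 1)
		prio = MAX_PRIO - 1;
	return prio;
}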
1.1.3 SMP Load Balancing
When tasks are created on an SMP system, they are placed on a given CPU's run queue. It is generally impossible to know in advance whether a task will be short-lived or will run for a long time, so the initial assignment of tasks to CPUs may not be ideal.
To keep the task load balanced across CPUs, tasks can be redistributed, moving them from heavily loaded CPUs to lightly loaded ones. The Linux 2.6 scheduler provides this through its load-balancing code: every 200 ms each processor checks whether the CPU loads are unbalanced and, if they are, migrates tasks between CPUs to even them out.
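The sketch below is an illustrative user-space model of that periodic check, not the kernel's load_balance() code; the 25% imbalance threshold and the rule of moving half the difference are assumptions made for the example.

#include <stdio.h>

#define NR_CPUS 4

static int nr_running[NR_CPUS] = { 6, 1, 3, 2 };	/* per-CPU counts of ready tasks */

/* Called periodically (the text says every 200 ms) on each CPU. */
static void balance(int this_cpu)
{
	int busiest = this_cpu;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (nr_running[cpu] > nr_running[busiest])
			busiest = cpu;

	/* Only act if the busiest queue is at least 25% heavier than ours. */
	if (busiest == this_cpu || nr_running[busiest] * 4 <= nr_running[this_cpu] * 5)
		return;

	int moved = (nr_running[busiest] - nr_running[this_cpu]) / 2;
	nr_running[busiest] -= moved;		/* "migrate" tasks toward the lightly loaded CPU */
	nr_running[this_cpu] += moved;
}

int main(void)
{
	balance(1);
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		printf("cpu%d: %d ready tasks\n", cpu, nr_running[cpu]);
	return 0;
}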
One negative side effect of this process is that the new CPU's cache is cold for a migrated task (its data must be read into the cache again).
Remember that a CPU cache is local (on-chip) memory that is much faster to access than system memory. If a task has been executing on a CPU, its data will be held in that CPU's local cache, and the cache is said to be hot for the task. If none of the task's data is in the CPU's local cache, the cache is said to be cold for it.
Unfortunately, keeping the CPUs busy means that migrated tasks start out with cold caches.
In summary, version 2.6 inherits and builds on the following features of the 2.4 scheduler:
(1) Interactive job prioritization
(2) High scheduling/wakeup performance under light load
(3) Fair sharing
(4) Priority-based scheduling
(5) High CPU utilization
(6) Efficient SMP affinity
(7) Real-time scheduling and CPU binding
New features added on top of this:
(1) An O(1) scheduling algorithm, with constant scheduler overhead (independent of the current system load) and better real-time behaviour
(2) High scalability, with greatly reduced lock granularity
(3) A newly designed SMP affinity mechanism
(4) Optimized scheduling of compute-intensive batch jobs
(5) Smoother scheduler behaviour under heavy load
(6) Child processes run before their parents, among other improvements
(7) Support for kernel preemption
II. Code Analysis
2.1 Main Files
include/linux/sched.h
2.2 File Code
To analyze the code, you can download the Linux 2.6 source and browse it with Source Insight, or view the Linux source online at http://lxr.oss.org.cn (the site is rich in information and carries every kernel version, which makes it easier than Source Insight for comparing code across versions, though working with it is slower).
task_struct contains a process's description, control information, and resource information; it is the static description of the process. The 2.6 kernel still uses task_struct to represent a process and, although threads have been optimized, a thread's kernel representation remains the same as a process's. As the scheduler improved, the contents of task_struct improved with it: new capabilities such as interactive-process priority support and kernel-preemption support are reflected in task_struct. Some fields are newly added, some changed their meaning, and some only changed their names. The structure can be regarded as the task control block (TCB); it mainly contains the process identifier, priority, stack space, and process state.
The definition of task_struct is in include/linux/sched.h:
struct task_struct {
	volatile long state;	/* -1 unrunnable, 0 runnable, >0 stopped */
	struct thread_info *thread_info;
	atomic_t usage;
	unsigned long flags;	/* per process flags, defined below */
	unsigned long ptrace;

	int lock_depth;		/* Lock depth */

	int prio, static_prio;
	struct list_head run_list;
	prio_array_t *array;

	unsigned long sleep_avg;
	long interactive_credit;
	unsigned long timestamp;
	int activated;

	unsigned long policy;
	cpumask_t cpus_allowed;
	unsigned int time_slice, first_time_slice;

	struct list_head tasks;
	/*
	 * ptrace_list/ptrace_children forms the list of my children
	 * that were stolen by a ptracer.
	 */
	struct list_head ptrace_children;
	struct list_head ptrace_list;

	struct mm_struct *mm, *active_mm;
	/* ... remaining fields omitted ... */
2.2.1 state
The process state is still represented by the state field.
#define TASK_RUNNING 0
#define TASK_INTERRUPTIBLE 1
#define TASK_UNINTERRUPTIBLE 2
#define TASK_STOPPED 4
#define TASK_ZOMBIE 8
#define TASK_DEAD 16
For comparison, the state values in the Linux 2.4 task_struct:
#define TASK_RUNNING 0
#define TASK_INTERRUPTIBLE 1
#define TASK_UNINTERRUPTIBLE 2
#define TASK_ZOMBIE 4
#define TASK_STOPPED 8
Version 2.6 adds two new states: TASK_TRACED and TASK_DEAD.
The newly added TASK_DEAD marks a process that has already exited and does not need to be reaped by its parent process; TASK_TRACED marks a process that is being traced, for debugging.
The original states:
TASK_ZOMBIE: a terminated process whose task_struct is still retained (the process is dead but has not yet been "written off" by its parent).
TASK_RUNNING: the ready/running state (strictly speaking, TASK_RUNNABLE would be a more accurate name).
TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE: the two sleep states.
TASK_STOPPED: a stopped process. A process enters this state when it receives certain signals, or when it is traced via the ptrace system call and control is handed over to the monitoring process.
In Linux 2.4, the kernel allocates two consecutive physical pages (8 KB) for each process. The low-address end of this area holds the process's task_struct (about 1 KB), and the remaining roughly 7 KB is the process's kernel-mode stack. Through the stack pointer register ESP, the kernel can quickly locate the current process.
2.2.2 thread_info
In Linux 2.6, the low end of those two pages no longer holds the process's entire task_struct, only a small thread_info structure. Most of the process's information lives in the task_struct, allocated outside the stack area, which can be reached conveniently through thread_info's task pointer.
thread_info is an important structure describing a task. Its definition (include/asm-i386/thread_info.h) is shown below:
struct thread_info {
	struct task_struct	*task;		/* main task structure */
	struct exec_domain	*exec_domain;	/* execution domain */
	unsigned long		flags;		/* low level flags */
	unsigned long		status;		/* thread-synchronous flags */
	__u32			cpu;		/* current CPU */
	__s32			preempt_count;	/* 0 => preemptable, <0 => BUG */

	mm_segment_t		addr_limit;	/* thread address space:
						   0-0xBFFFFFFF for user-thread
						   0-0xFFFFFFFF for kernel-thread
						 */
	struct restart_block	restart_block;

	unsigned long		previous_esp;	/* ESP of the previous stack in case
						   of nested (IRQ) stacks
						 */
	__u8			supervisor_stack[0];
};
The main fields are as follows:
task: points to the corresponding task control block (the task_struct).
preempt_count: indicates whether the kernel may currently be preempted. A value greater than 0 means the kernel must not be preempted; a value of 0 means the kernel is in a safe state (holding no locks) and may be preempted.
flags: contains the TIF_NEED_RESCHED bit; when this flag is set to 1, the scheduler should be invoked as soon as possible.
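Because thread_info sits at the low end of the fixed-size 8 KB kernel stack, the kernel can recover it, and from it the task_struct, simply by masking the stack pointer. The sketch below shows the idea in plain C rather than the i386 inline assembly the kernel actually uses; the helper names are invented for the example.

#define THREAD_SIZE (8 * 1024)		/* the two-page kernel stack described earlier */

/* Clear the low bits of the stack pointer to land on the bottom of the
 * stack area, which is exactly where thread_info lives. */
static inline struct thread_info *thread_info_of(unsigned long esp)
{
	return (struct thread_info *)(esp & ~(THREAD_SIZE - 1));
}

/* The current task is then one more pointer dereference away. */
#define current_task_of(esp)	(thread_info_of(esp)->task)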
2.2.3 timestamp
The time of the process's most recent scheduling event, in nanoseconds (see below). It is set to one of the following:
- the wake-up time (set in activate_task());
- the time the process was switched off the CPU (in schedule());
- the time the process was switched onto the CPU (in schedule());
- values assigned during load balancing (see the discussion of scheduler-related load balancing).
From the difference between this value and the current time, the scheduler can derive information used in priority calculation, such as how long the process waited in the ready queue and how long it has run (see "The optimized priority calculation method").
2.2.4 prio and static_prio
prio is the process's dynamic priority, equivalent to the result of the goodness() calculation in 2.4. It ranges from 0 to MAX_PRIO-1 (MAX_PRIO is defined as 140): 0 to MAX_RT_PRIO-1 (MAX_RT_PRIO is defined as 100) is the range for real-time processes, and MAX_RT_PRIO to MAX_PRIO-1 is the range for non-real-time processes. The larger the value, the lower the process's priority. prio is the main basis on which the scheduler selects the next candidate process. static_prio is the process's static priority, inherited from the parent when the process starts. The associated nice value follows Linux tradition and ranges from -20 to 19;
again, the larger the value, the lower the priority. nice is maintained by the user, but it only affects the priority of non-real-time processes. In 2.6 the nice value itself is no longer stored in the kernel; static_prio is stored instead. The size of a process's initial time slice is determined solely by its static priority, for real-time and non-real-time processes alike, although a real-time process's static_prio does not take part in priority calculation.
The relationship between nice and static_prio is: static_prio = MAX_RT_PRIO + nice + 20
The kernel defines two macros for this conversion: PRIO_TO_NICE() and NICE_TO_PRIO().
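A minimal sketch of those conversions, using the MAX_RT_PRIO and MAX_PRIO values given above (the kernel's exact macro bodies may differ slightly):

#define MAX_RT_PRIO	100
#define MAX_PRIO	(MAX_RT_PRIO + 40)		/* 140 */

#define NICE_TO_PRIO(nice)	(MAX_RT_PRIO + (nice) + 20)	/* -20..19 -> 100..139 */
#define PRIO_TO_NICE(prio)	((prio) - MAX_RT_PRIO - 20)
#define TASK_NICE(p)		PRIO_TO_NICE((p)->static_prio)	/* nice value of task p */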
76 #define NS_TO_JIFFIES(TIME) ((TIME) / (1000000000 / HZ))
77 #define JIFFIES_TO_NS(TIME) ((TIME) * (1000000000 / HZ))
Two time units are involved:
System time is measured in nanoseconds (billionths of a second), but this granularity is too fine to be meaningful: most kernel code can only obtain the absolute value of such a time and cannot actually perceive nanosecond precision.
Time-related kernel code usually works with the clock interrupt instead. In Linux 2.6 the system clock interrupts once every millisecond (the clock frequency, given by the HZ macro, is defined as 1000, i.e., 1000 interrupts per second; in 2.4 it was defined as 100, and many applications still assume the 100 Hz clock). This time unit is called a jiffy, and much kernel code uses jiffies as its unit of time, for example the process's running time slice.
With HZ = 1000, the conversion between jiffies and absolute time is: nanoseconds = jiffies * 1000000.
The kernel uses the two macros above, JIFFIES_TO_NS() and NS_TO_JIFFIES(), to convert between the two units, and many time constants exist in both forms, for example NS_MAX_SLEEP_AVG and MAX_SLEEP_AVG.
2.2.5 activated
activated records the reason the process entered the ready state, which influences the calculation of its scheduling priority. It takes one of four values:
- -1: the process was woken from the TASK_UNINTERRUPTIBLE state;
- 0: the default value; the process was already in the ready state;
- 1: the process was woken from the TASK_INTERRUPTIBLE state, but not from interrupt context;
- 2: the process was woken from the TASK_INTERRUPTIBLE state, from interrupt context.
The initial value of activated is 0. It is changed in two places: one is schedule(), the other is activate_task(), which is called from try_to_wake_up() to activate a sleeping process:
- If activate_task() is called from an interrupt service routine, i.e., the process is activated by an interrupt, the process is most likely interactive, so activated is set to 2; otherwise it is set to 1.
- If the process is woken from the TASK_UNINTERRUPTIBLE state, activated is set to -1 (in the try_to_wake_up() function).
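Schematically (this helper is invented for illustration; it is not the kernel's wake-up code):

/* Models how `activated` is set when a sleeping task is woken. */
static void note_activation(struct task_struct *p, long old_state, int in_irq_context)
{
	if (old_state == TASK_UNINTERRUPTIBLE)
		p->activated = -1;	/* woken from uninterruptible sleep */
	else if (in_irq_context)
		p->activated = 2;	/* woken by an interrupt: probably interactive */
	else
		p->activated = 1;	/* woken from ordinary process context */
}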
2.2.6 sleep_avg
The process's average wait time, in nanoseconds, ranging from 0 to NS_MAX_SLEEP_AVG, with an initial value of 0. It is essentially the difference between the process's waiting time and its running time. sleep_avg is rich in meaning: it is used both to judge how interactive the process is and to express how urgently it needs the CPU. It is the key factor in dynamic priority calculation: the larger a process's sleep_avg, the higher (numerically smaller) its computed priority. The specific functions are introduced later.
2.2.7 interactive_credit
This variable records the process's degree of interactivity, ranging from -CREDIT_LIMIT to CREDIT_LIMIT + 1. It is initialized to 0 when the process is created and is then incremented or decremented according to various conditions. Once the value exceeds CREDIT_LIMIT (it can only ever be exactly CREDIT_LIMIT + 1), it is never decremented again: the process is considered to have passed the "interactivity" test and is treated as an interactive process from then on.
2.2.8 time_slice
The process's remaining time slice, equivalent to counter in 2.4, although it no longer directly affects the process's dynamic priority. time_slice records how much of the process's running time slice is left. When the process is created it splits the remaining slice with its parent; the slice is decremented as the process runs, and once it reaches 0 it is re-assigned a baseline value derived from static_prio and a reschedule is requested. The decrement and reset of the time slice are done in the clock interrupt (scheduler_tick()). Beyond that, time_slice changes mainly when a process is created and when it exits (a sketch of both cases, together with the baseline calculation, follows case b below):
a) Process creation
As in 2.4, to prevent a process from stealing time slices by repeatedly forking, a child process is not allocated a time slice of its own at creation. Instead it splits the parent's remaining time slice with the parent; that is, after fork() finishes, the sum of the two time slices equals the parent's original time slice.
b) Process exit
When a process exits (sched_exit()), the kernel checks first_time_slice to see whether the process has ever been reassigned a time slice of its own. If it has not (it is still on the slice inherited from its parent), the remaining slice is returned to the parent process (but never allowed to exceed MAX_TIMESLICE). This prevents a process from being penalized for creating short-lived children (while also not rewarding it for creating children). If the process has already used up the slice inherited from its parent, there is nothing to return (2.4 did not consider this case).
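The sketch below models the three pieces just described: the baseline slice derived from static_prio, the fork-time split, and the exit-time refund. The linear interpolation between MIN_TIMESLICE and MAX_TIMESLICE follows the behaviour described in the text; the kernel's exact constants and formula may differ.

#define MIN_TIMESLICE	(10 * HZ / 1000)	/* about 10 ms */
#define MAX_TIMESLICE	(200 * HZ / 1000)	/* about 200 ms */
#define MAX_USER_PRIO	40			/* MAX_PRIO - MAX_RT_PRIO */

/* Baseline time slice: the higher the static priority (smaller value), the larger the slice. */
static unsigned int base_timeslice(int static_prio)
{
	return MIN_TIMESLICE + (MAX_TIMESLICE - MIN_TIMESLICE) *
		(MAX_PRIO - 1 - static_prio) / (MAX_USER_PRIO - 1);
}

/* a) fork: the child splits the parent's remaining slice instead of getting a fresh one. */
static void split_timeslice(struct task_struct *parent, struct task_struct *child)
{
	child->time_slice = (parent->time_slice + 1) >> 1;
	parent->time_slice >>= 1;
	child->first_time_slice = 1;	/* the child has never been given its own slice */
}

/* b) exit: an unused inherited slice is refunded to the parent, capped at MAX_TIMESLICE. */
static void refund_timeslice(struct task_struct *p, struct task_struct *parent)
{
	if (p->first_time_slice) {
		parent->time_slice += p->time_slice;
		if (parent->time_slice > MAX_TIMESLICE)
			parent->time_slice = MAX_TIMESLICE;
	}
}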
In 2.4, a process's remaining time slice was, after the nice value, the factor with the greatest influence on its dynamic priority, and the time slices of processes that slept frequently kept accumulating, giving them ever higher priority with the scheduler. In reality, however, lots of sleep does not prove that a process is interactive, only that it is I/O-intensive, so this approach was not very accurate: database applications that access the disk frequently could be mistaken for interactive processes, making genuinely interactive user terminals respond slowly.
The 2.6 scheduler divides ready processes into active and expired groups according to whether their time slice is exhausted; the groups correspond to different ready queues, and the former has absolute scheduling priority over the latter: only when the active processes have exhausted their time slices do the expired processes get a chance to run. When picking a process from the active group, however, the scheduler no longer treats the remaining time slice as a factor in scheduling priority. Instead, a non-real-time interactive process with an overly long time slice is artificially divided into segments (each called a running granularity, defined below); after each segment completes, the process is taken off the CPU and placed at the end of the active ready queue, giving other processes of the same priority a chance to run.
This is done after scheduler_tick() decrements the time slice. At that point, even if the process's time slice is not used up, it is forcibly taken off the CPU and requeued to wait for its next scheduling turn, provided the following four conditions all hold (a sketch of the check follows the list):
- the process is currently in the active ready queue;
- the process is an interactive process (TASK_INTERACTIVE() returns true; see "More precise interactive process priority". When nice is greater than 12, the macro returns a constant false);
- the time slice the process has consumed (the baseline slice minus the remaining slice) is an exact multiple of the running granularity;
- the remaining time slice is not smaller than the running granularity.
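A sketch of how these four conditions combine in the tick handler, modelled on the description above: rq denotes the CPU's run queue (described in 2.2.9), base_timeslice() is from the earlier sketch, and TIMESLICE_GRANULARITY() stands for the granularity macro defined next.

/* In scheduler_tick(), after p->time_slice has been decremented but found non-zero. */
if (p->array == rq->active &&				/* 1) still in the active queue */
    TASK_INTERACTIVE(p) &&				/* 2) judged interactive */
    !((base_timeslice(p->static_prio) - p->time_slice) %
      TIMESLICE_GRANULARITY(p)) &&			/* 3) consumed a whole number of granules */
    p->time_slice >= TIMESLICE_GRANULARITY(p)) {	/* 4) at least one granule still left */
	dequeue_task(p, rq->active);
	set_tsk_need_resched(p);			/* ask for a reschedule soon */
	p->prio = effective_prio(p);			/* refresh the dynamic priority */
	enqueue_task(p, rq->active);			/* back to the tail of its priority list */
}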
The running granularity, TIMESLICE_GRANULARITY, is defined as a macro that depends on the process's sleep_avg and on the total number of CPUs in the system. Since sleep_avg effectively represents the difference between the process's non-running time and its running time and is closely tied to the interactivity judgment, the definition of the running granularity reflects two scheduling policies of the kernel:
- the more interactive a process is, the smaller its running granularity, which the running pattern of interactive processes allows; CPU-bound processes, by contrast, should not be sliced up, so as to avoid cache flushing;
- the more CPUs there are, the larger the running granularity.
2.2.9 The new data structure: runqueue
The 2.4 ready queue was a simple doubly linked list headed by runqueue_head. In 2.6 the ready queue is defined as a far more complex data structure, struct runqueue, and, crucially, each CPU now maintains its own ready queue, which greatly reduces contention (this was introduced earlier; it is also the most important part of the scheduler and is examined in detail at the end).
Many of the key O(1) algorithms are tied to the runqueue.
1) prio_array_t *active, *expired, arrays[2]
The most critical data structures in the runqueue. Each CPU's ready queue is split into two parts according to whether the time slice has been used up, accessed through the active and expired pointers respectively: active points to the ready processes whose time slices are not exhausted and which are currently schedulable, while expired points to the ready processes whose time slices are exhausted. Each group of ready processes is represented by a struct prio_array:

struct prio_array {
	int nr_active;				/* number of processes in this group */
	struct list_head queue[MAX_PRIO];	/* lists indexed by priority, see below */
	unsigned long bitmap[BITMAP_SIZE];	/* bitmap used to speed up access to the lists, see below */
};
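For illustration, this is roughly how a task is added to and removed from a prio_array while keeping nr_active, the per-priority list, and the bitmap consistent (a sketch in the spirit of the kernel's enqueue_task()/dequeue_task(), not necessarily their exact bodies):

static void enqueue_task(struct task_struct *p, prio_array_t *array)
{
	list_add_tail(&p->run_list, array->queue + p->prio);	/* FIFO within one priority */
	__set_bit(p->prio, array->bitmap);			/* mark this list non-empty */
	array->nr_active++;
	p->array = array;
}

static void dequeue_task(struct task_struct *p, prio_array_t *array)
{
	array->nr_active--;
	list_del(&p->run_list);
	if (list_empty(array->queue + p->prio))			/* last task at this priority */
		__clear_bit(p->prio, array->bitmap);
	p->array = NULL;
}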
In the 2.4 kernel, finding the best candidate ready process was done inside the scheduler, schedule(), on every scheduling pass (goodness() was called in a for loop over the ready processes). The search is proportional to the number of currently ready processes, so it costs O(n), where n is the number of ready processes. Because of this, the execution time of a scheduling pass depended on the current system load and had no fixed upper bound, which conflicts with real-time requirements.
In the new O(1) scheduler this search is decomposed so that each step costs only O(1).
prio_array contains an array of ready lists indexed by process priority; processes with the same priority are placed on the list of the corresponding array element. At scheduling time, the first entry of the highest-priority non-empty list in the active queue is taken directly as the candidate process, while the priority computation itself is spread over each process's own execution.
To speed up the search for a list that actually contains ready processes, the 2.6 kernel maintains a bitmap with one bit per priority list: the bit is 1 if the list is non-empty and 0 otherwise. The kernel also requires each architecture to provide a sched_find_first_bit() function that performs this search and quickly locates the first non-empty list of ready processes.
This algorithm spreads out what used to be a centralized computation, bounding the scheduler's maximum running time, while the extra information kept in memory speeds up the location of candidate processes. The change is simple and efficient, and is one of the highlights of the 2.6 kernel.
The two-element array arrays is the container for the two groups of ready queues; active and expired each point to one of its elements. Once an active process uses up its time slice, it is moved to expired and given a new initial time slice. When active becomes empty, meaning every current process has consumed its time slice, active and expired are simply swapped, and the next round of time-slice consumption begins.
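The swap itself is just a pointer exchange inside schedule(), roughly as follows (a sketch, not a verbatim excerpt):

/* In schedule(): if the active array has run dry, swap it with expired. */
prio_array_t *array = rq->active;
if (unlikely(!array->nr_active)) {
	rq->active = rq->expired;	/* expired processes become schedulable again */
	rq->expired = array;		/* the now-empty array will collect future expirations */
	array = rq->active;
	rq->expired_timestamp = 0;	/* see expired_timestamp below */
}
idx = sched_find_first_bit(array->bitmap);	/* then pick the highest-priority non-empty list */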
Recall that in the 2.4 scheduler, recomputing process time slices was expensive. In earlier kernels, a time slice was recomputed in the clock interrupt as soon as it ran out; later, to improve efficiency and shorten clock-interrupt processing, 2.4 recomputed the time slices of all ready processes inside the scheduler once they had all been consumed, an O(n) operation. To guarantee the O(1) execution time of the scheduler, 2.6 instead recomputes each process's time slice individually at the moment that process exhausts it, and the rotation of time slices is accomplished through the simple pointer swap described above. This is another highlight of the 2.6 scheduling system.
2) spinlock_t lock
The runqueue's spin lock. It must be taken whenever the runqueue is manipulated, but it protects only the ready queue of a single CPU, so the probability of contention is much lower.
3) task_t *curr
The process the CPU is currently running.
4) task_t *idle
Points to the current CPU's idle process, equivalent to init_tasks[this_cpu()] in 2.4.
5) int best_expired_prio
Records the highest priority (smallest value) among the processes in the expired ready group. The variable is saved when a process enters the expired queue (in scheduler_tick()); its purpose is explained under expired_timestamp below.
6) unsigned long expired_timestamp
After a new round of time-slice consumption begins, this variable records the moment the earliest process exhausted its time slice (the absolute jiffies value, assigned in scheduler_tick()); it represents the longest time a ready process has waited in expired. Its use is embodied in the expired_starving(rq) macro.
As mentioned above, two ready queues, active and expired, are maintained on each CPU. Normally a process whose time slice has run out is moved from the active queue to the expired queue (in scheduler_tick()), but if the process is interactive, the scheduler keeps it in the active queue to improve its response time. This measure must not make the other ready processes wait too long; that is, once the processes in the expired queue have waited long enough, even interactive processes should be moved to the expired queue so that the active queue can drain. The threshold is embodied in expired_starving(rq):
Provided that expired_timestamp and STARVATION_LIMIT are both non-zero, expired_starving() returns true when both of the following conditions are satisfied (a sketch follows the list):
- (current absolute time - expired_timestamp) >= (STARVATION_LIMIT * the total number of ready processes in the queue + 1), i.e., at least one process in the expired queue has already waited long enough;
- the static priority of the currently running process is lower than the highest priority in the expired queue (best_expired_prio, i.e., its static_prio value is larger). In that case the active queue should be drained and swapped to expired as soon as possible.
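A sketch of that test, written directly from the two conditions above (the kernel's actual EXPIRED_STARVING() macro may combine the checks slightly differently):

/* Sketch of the starvation test described above. */
static int expired_starving(struct runqueue *rq)
{
	if (!STARVATION_LIMIT || !rq->expired_timestamp)
		return 0;
	return (jiffies - rq->expired_timestamp >=
			STARVATION_LIMIT * rq->nr_running + 1) &&	/* waited long enough */
	       (rq->curr->static_prio > rq->best_expired_prio);	/* curr has lower priority */
}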
7) struct mm_struct *prev_mm
Holds the active_mm of the process that was scheduled out (prev) after a process switch. In 2.6, prev's active_mm is released (mmdrop()) only after the switch has finished, and by that point prev->active_mm may already be NULL, so the runqueue needs to keep its own reference to it.
8) unsigned long nr_running
The number of ready processes on this CPU, i.e., the total number of processes in the active and expired queues. It is an important parameter describing the CPU load.
9) unsigned long nr_switches
Records the number of process switches that have occurred on this CPU since the scheduler started running.
10) unsigned long nr_uninterruptible
Records the number of processes on this CPU that are in the TASK_UNINTERRUPTIBLE state; it is part of the load information.
11) atomic_t nr_iowait
Records the number of processes on this CPU that are sleeping while waiting for I/O.
12) unsigned long timestamp_last_tick
The time of this ready queue's most recent scheduling event, used by the load-balancing code.
13) int prev_cpu_load[NR_CPUS]
Records the load of each CPU at the time of the last load-balancing pass (currently the nr_running value of its ready queue), for use in load analysis.
14) atomic_t *node_nr_running; int prev_node_load[MAX_NUMNODES]
These two fields exist only on NUMA architectures; they record the number of ready processes on each NUMA node and each node's load at the last load-balancing pass.
15) task_t *migration_thread
Points to the current CPU's migration thread. Each CPU has a kernel thread dedicated to performing process migration.
16) struct list_head migration_queue
List of processes to be migrated.
(The use of these fields will be covered in detail when we look at the specific scheduling functions, such as recalc_task_prio().)