Process-Process scheduling (1)

Last Update: 2016-05-04 Source: Internet

Author: User



Context switch

Processes can be scheduled, but each process must still be guaranteed to execute sequentially, and all the information a process needs in order to execute is kept in its PCB (task_struct). So when the scheduler switches processes, it stores the running state (a snapshot) of the current process in that process's PCB, so that the next time the scheduler selects it, it can continue from where it left off. It then restores the running state of the process that is about to run from that process's PCB. This is what allows scheduling to be completed correctly, and the whole procedure is called a context switch.
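A minimal sketch of the save/restore step, assuming a hypothetical `struct cpu_context` (a PC, a status word and a few registers) and a stripped-down hypothetical `struct pcb`. Real kernels save architecture-specific state and perform the switch in assembly, so this only illustrates the idea of snapshotting the CPU into one PCB and restoring it from another.

```c
#include <stdint.h>

/* Hypothetical register snapshot; real layouts are architecture-specific. */
struct cpu_context {
    uint64_t pc;        /* program counter */
    uint64_t psw;       /* program status word / flags */
    uint64_t regs[4];   /* a few general-purpose registers */
};

/* Hypothetical, heavily simplified PCB: only the saved context matters here. */
struct pcb {
    int pid;
    struct cpu_context saved;
};

/* Save the CPU state of prev into its PCB, then load next's saved state. */
void context_switch(struct cpu_context *cpu, struct pcb *prev, struct pcb *next)
{
    prev->saved = *cpu;   /* snapshot the outgoing process */
    *cpu = next->saved;   /* resume the incoming process where it stopped */
}
```

Calling `context_switch(&cpu, &a, &b)` leaves `a.saved` holding the state `a` was interrupted in, so a later `context_switch(&cpu, &b, &a)` resumes `a` exactly where it stopped.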

Interrupt

Context switching is done in the kernel and is transparent to the user, so a context switch must first trap into the kernel (usually through a clock interrupt or a system call). Context switching also needs hardware support. When an interrupt occurs while the current process is running, the interrupt hardware pushes the program counter, the program status word, and sometimes one or more other registers (the process's running state) onto the current process's kernel stack, and the PC jumps to the entry of the interrupt service routine (whose address is found either by hardware vectoring or by software polling) to run it. All of this is done by hardware, including a stack switch from the process's user stack to its kernel stack.

Control of the PC then passes to software, the interrupt service routine. In general the interrupt service routine has its own interrupt stack (just as each process has its own kernel stack), so to avoid corrupting the kernel stack, a stack switch from the kernel stack to the interrupt stack takes place while the interrupt service routine runs (this time done by software), entering interrupt context. The interrupt service routine then calls the interrupt handler for the specific interrupt request, which runs in interrupt context. When it finishes, the handler returns to the interrupt service routine, and this return involves another stack switch, from the interrupt stack back to the kernel stack (again done by software). Finally, the interrupt service routine executes the interrupt return instruction.

The interrupt hardware then restores the state saved on the kernel stack into the corresponding registers, removing it from the kernel stack as it does so, and the system returns from kernel mode to user mode. This involves a final stack switch, from the kernel stack to the user stack, performed by the interrupt hardware.
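The sequence of stack switches just described can be written down as data. This sketch models the description in this section, not any particular architecture, and the names `irq_path` and `path_is_consistent` are hypothetical. The check confirms that each switch starts on the stack the previous one ended on, and that the path ends back on the stack it started from.

```c
#include <stdbool.h>
#include <stddef.h>

enum stack { USER_STACK, KERNEL_STACK, IRQ_STACK };

struct stack_switch {
    enum stack from, to;
    bool by_hardware;      /* true: done by interrupt hardware; false: by software */
    const char *when;
};

/* The four stack switches described above, in order. Which of them happen,
 * and who performs them, varies by architecture; this follows the text. */
const struct stack_switch irq_path[] = {
    { USER_STACK,   KERNEL_STACK, true,  "interrupt entry: hardware pushes PC/PSW" },
    { KERNEL_STACK, IRQ_STACK,    false, "ISR switches to its interrupt stack" },
    { IRQ_STACK,    KERNEL_STACK, false, "handler done, back to the kernel stack" },
    { KERNEL_STACK, USER_STACK,   true,  "interrupt return: hardware pops state" },
};
const size_t irq_path_len = sizeof irq_path / sizeof irq_path[0];

/* Each switch must start where the previous one ended, and the whole
 * path must return to the stack it started from. */
bool path_is_consistent(const struct stack_switch *path, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (path[i].from != path[i - 1].to)
            return false;
    return n > 0 && path[0].from == path[n - 1].to;
}
```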

Interrupt handling when a context switch occurs

The interrupt service routine calls the interrupt handler, and while it runs the handler invokes the scheduler, which checks whether a context switch is needed (not every interrupt causes one; for example, a clock interrupt may not). If a switch is needed, the first step is to save the current process's running state into its PCB; this is done by the scheduler, in software.

The state of the current process is saved by a short assembly-language routine that pops (POP) the state data the hardware pushed onto the kernel stack and stores it in the process's PCB. Popping the data also clears the process's state information from the kernel stack, which prepares the stack for loading the state of the next process the scheduler selects.

Once the current process's running state has been saved, the interrupted process's state information has been cleared from its kernel stack, ready for the next process's running state to be loaded. The scheduler then selects a process (one from the highest-priority non-empty queue), pushes that process's previously saved running state (from its PCB) onto the kernel stack, updates the thread_info structure in the kernel stack, and returns to the interrupt service routine. The interrupt service routine then executes the interrupt return instruction (RETURN-FROM-TRAP), the interrupt hardware performs the rest of the restoration, and the system switches from kernel mode back to user mode. At that point the context switch is complete.

Linux Process Categories

1. Real-time processes: high priority, fast response; priority range [0,99].
2. Normal processes: divided into interactive processes (I/O-bound) and batch processes (CPU-bound); their priority is lower than that of real-time processes, with range [100,139].

In Linux, the scheduling algorithm explicitly identifies real-time processes but does not explicitly distinguish interactive processes from batch processes. Instead, the Linux 2.6 scheduler uses a heuristic based on a process's past behavior to decide whether it should currently be treated as interactive or batch. Interactive processes receive a higher scheduling priority than batch processes.

Scheduling-related system calls in Linux

System call	Description
nice()	Change the static priority of a normal process
getpriority()	Get the maximum static priority of a group of normal processes
setpriority()	Set the static priority of a group of normal processes
sched_getscheduler()	Get the scheduling policy of a process
sched_setscheduler()	Set the scheduling policy and real-time priority of a process
sched_getparam()	Get the real-time priority of a process
sched_setparam()	Set the real-time priority of a process
sched_yield()	Voluntarily give up the processor without blocking
sched_get_priority_min()	Get the minimum real-time priority for a policy
sched_get_priority_max()	Get the maximum real-time priority for a policy
sched_rr_get_interval()	Get the time slice value of the round-robin policy
sched_setaffinity()	Set the CPU affinity mask of a process
sched_getaffinity()	Get the CPU affinity mask of a process
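A small user-space example of several of these calls, assuming a Linux system with glibc. sched_get_priority_min() and sched_get_priority_max() report the priority range of a policy; on Linux this is 1 to 99 for SCHED_FIFO and SCHED_RR, and 0 for SCHED_OTHER, which is the user-space name of SCHED_NORMAL. The helper names `policy_priority_range`, `policy_name` and `report_current_process` are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <sys/resource.h>
#include <time.h>

/* Store the priority range the kernel allows for `policy`; 0 on success, -1 on error. */
int policy_priority_range(int policy, int *min, int *max)
{
    *min = sched_get_priority_min(policy);
    *max = sched_get_priority_max(policy);
    return (*min == -1 || *max == -1) ? -1 : 0;
}

const char *policy_name(int policy)
{
    switch (policy) {
    case SCHED_OTHER: return "SCHED_OTHER (SCHED_NORMAL)";
    case SCHED_FIFO:  return "SCHED_FIFO";
    case SCHED_RR:    return "SCHED_RR";
    default:          return "other";
    }
}

/* Print the calling process's policy, nice value and round-robin interval. */
int report_current_process(void)
{
    int policy = sched_getscheduler(0);          /* pid 0 = calling process */
    if (policy == -1) {
        perror("sched_getscheduler");
        return -1;
    }

    errno = 0;                                   /* -1 is also a valid nice value */
    int nice_value = getpriority(PRIO_PROCESS, 0);
    if (nice_value == -1 && errno != 0) {
        perror("getpriority");
        return -1;
    }

    struct timespec quantum;
    if (sched_rr_get_interval(0, &quantum) == -1) {
        perror("sched_rr_get_interval");
        return -1;
    }

    printf("policy=%s nice=%d rr_interval=%ld.%09lds\n",
           policy_name(policy), nice_value,
           (long)quantum.tv_sec, quantum.tv_nsec);
    return 0;
}
```

The sketch only reads values: switching a process to SCHED_FIFO or SCHED_RR with sched_setscheduler() normally requires root or CAP_SYS_NICE.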

Priority fields

nice
static_prio

nice and static_prio are the two priority fields used by normal processes. Real-time processes that use the round-robin policy (SCHED_RR) also use them, but only to recalculate the length of their round-robin time slice.

rt_priority

rt_priority is the priority field used by real-time processes.

prio

This is the field that actually determines a process's scheduling priority; that is, it determines which ready queue the process is placed in.

Linux Process scheduling policy

Users can set a process's scheduling policy with the sched_setscheduler() system call. The policy is stored in this task_struct field:

unsigned int policy;

Linux process scheduling is preemptive: a higher-priority process can preempt a lower-priority one. This is the basic mechanism that the algorithms behind each of the following scheduling policies must guarantee. In addition, scheduling takes place only among processes that are in the ready queue (runqueue).

The scheduling priority of a real-time process is determined solely and statically by its real-time priority (rt_priority, range [0, MAX_RT_PRIO-1]). Once set by the user it does not change dynamically, and it is not affected by the nice value (the nice value only affects normal processes, whose scheduling priority lies in [100,139]). The scheduling priority (prio) of a real-time process is computed from its real-time priority as prio = MAX_RT_PRIO - 1 - rt_priority, which lies in [0, MAX_RT_PRIO-1]. MAX_RT_PRIO defaults to 100, so by default a real-time process's prio lies in [0,99]. Although the scheduling priority of a real-time process does not change dynamically, the user can reset its real-time priority with the sched_setparam() and sched_setscheduler() system calls, which changes its scheduling priority. Real-time processes sit in the high-priority queues (the priority queues in the range [0, MAX_RT_PRIO-1]) and always stay in the active run queue, so as long as a runnable real-time process exists, normal processes never get to run.
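The mapping from rt_priority to prio, together with the rt_task() test (prio below MAX_RT_PRIO means real-time), can be written as two plain functions. A sketch assuming the default MAX_RT_PRIO of 100; the function names are hypothetical.

```c
#include <stdbool.h>

#define MAX_RT_PRIO 100   /* default value, per the text */

/* Scheduling priority of a real-time task: prio = MAX_RT_PRIO - 1 - rt_priority. */
int rt_prio_from_rt_priority(int rt_priority)
{
    return MAX_RT_PRIO - 1 - rt_priority;
}

/* Mirrors the rt_task() test: any prio below MAX_RT_PRIO is real-time. */
bool prio_is_realtime(int prio)
{
    return prio < MAX_RT_PRIO;
}
```

Note the inversion: rt_priority 99, the highest real-time priority a user can request, maps to prio 0, the first queue the scheduler looks at.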

The scheduling priority of a normal process is computed dynamically from its static priority static_prio and its average sleep time sleep_avg. static_prio is derived from the nice value, and the two can be converted into each other; see the comments in the task_struct code for details. By default, the scheduling priority (prio) of a normal process lies in [100,139].

1. SCHED_FIFO

Value 1: real-time processes, using a priority-based FIFO algorithm. A real-time process scheduled under SCHED_FIFO keeps the CPU while it runs unless a higher-priority real-time process is in the ready queue, it voluntarily calls a blocking primitive (such as interruptible_sleep_on()), it stops, it is killed, or it gives up the CPU by calling sched_yield().

2. SCHED_RR

Value 2: real-time processes, using a priority-based round-robin method. A reschedule happens when the interrupt returns if the current process voluntarily calls a blocking primitive (such as interruptible_sleep_on()), stops, is killed, gives up the CPU by calling sched_yield(), or uses up its time slice, or if a higher-priority real-time process is in the ready queue. When a SCHED_RR process uses up its time slice, the scheduler moves it to the end of its priority queue and runs the next real-time process in that queue. This is the basic algorithm that time-sharing systems use to achieve good interactivity, also known as round-robin (time-slice rotation) scheduling.
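To make the difference between the two real-time policies concrete, here is a toy model of a single priority level's queue. It assumes a hypothetical fixed-size `struct level_queue` whose head is the running task; the real scheduler uses a linked list per priority. Under SCHED_RR a task that uses up its time slice moves to the tail of its own level, while a SCHED_FIFO task has no time slice, so its position changes only when it blocks, yields or exits (not modeled here).

```c
#include <stddef.h>

enum policy { POLICY_FIFO = 1, POLICY_RR = 2 };  /* values as in the text */

/* One priority level's run queue, head at index 0 (hypothetical layout). */
struct level_queue {
    int task[8];
    size_t len;
};

/* Called when the running task (the head) exhausts its time slice.
 * SCHED_RR: rotate it to the tail of the same priority level.
 * SCHED_FIFO: no time slice applies, so the queue is left unchanged. */
void on_timeslice_expired(struct level_queue *q, enum policy policy)
{
    if (policy != POLICY_RR || q->len < 2)
        return;
    int head = q->task[0];
    for (size_t i = 1; i < q->len; i++)
        q->task[i - 1] = q->task[i];
    q->task[q->len - 1] = head;
}

/* The task that would run next at this level, or -1 if the level is empty. */
int current_task(const struct level_queue *q)
{
    return q->len ? q->task[0] : -1;
}
```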

3. SCHED_NORMAL

Value 0: non-real-time processes, using a priority-based multilevel feedback queue with round-robin. The scheduling priority is adjusted dynamically according to the process's past execution, and a process that uses up its time slice is generally placed in the expired run queue. This balances job turnaround time against interactivity and prevents starvation.

Scheduling policy for normal processes: SCHED_NORMAL

The scheduling priority (prio) of a normal process is determined by two factors: its static priority (static_prio, also called the base priority, default 120) and its average sleep time (sleep_avg). static_prio ranges over [100,139], and a new process always inherits its parent's static priority.

static_prio

static_prio is computed from the nice value (range [-20,19]) as static_prio = 120 + nice. A user can therefore change the static priority of processes they own by setting the nice value through the nice() and setpriority() system calls. The static priority changes only through these two system calls; otherwise it stays fixed, even though the dynamic priority changes.
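The conversion can be written as the two macros that the task_struct notes later in this article attribute to kernel/sched.c. The range-checking helper `static_prio_from_nice` is hypothetical and added only for illustration.

```c
#define MAX_RT_PRIO 100

/* Convert between nice values [-20, 19] and static priorities [100, 139]. */
#define NICE_TO_PRIO(nice) (MAX_RT_PRIO + (nice) + 20)
#define PRIO_TO_NICE(prio) ((prio) - MAX_RT_PRIO - 20)

/* Hypothetical helper: static priority for a valid nice value, or -1. */
int static_prio_from_nice(int nice)
{
    if (nice < -20 || nice > 19)
        return -1;
    return NICE_TO_PRIO(nice);
}
```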

Each time a normal process runs, it gets a limited round-robin time slice: the length of CPU time it may hold before being preempted. If no higher-priority process preempts it, the current process keeps the CPU until its time slice ends. The static priority is used to compute a normal process's time slice, and the basic rule is that the higher the static priority (the smaller static_prio), the longer the time slice. So for normal processes, a higher static priority means a longer continuous slice of CPU time; in this sense, the nice value determines the length of the time slice a normal process receives. For the exact rules, see task_timeslice() in kernel/sched.c, which computes and returns a process's time slice length, or the discussion of the base time quantum in the section on scheduling conventional processes in Chapter 7 of "Understanding the Linux Kernel".
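A sketch of the base time-slice rule in milliseconds, following the description of the 2.6 O(1) scheduler in "Understanding the Linux Kernel": (140 - static_prio) * 20 ms when static_prio < 120, and (140 - static_prio) * 5 ms otherwise. The 5 ms floor (MIN_TIMESLICE_MS) and the function name are assumptions of this sketch; task_timeslice() in a given kernel version works in jiffies and should be checked for the exact code.

```c
#define MAX_PRIO          140
#define NICE0_PRIO        120   /* static_prio for nice 0 */
#define DEF_TIMESLICE_MS  100
#define MIN_TIMESLICE_MS  5     /* assumed floor */

/* Base time slice in ms: the higher the static priority (smaller
 * static_prio), the longer the slice. Valid for static_prio in [100, 139]. */
unsigned int base_timeslice_ms(int static_prio)
{
    unsigned int base = static_prio < NICE0_PRIO ? DEF_TIMESLICE_MS * 4
                                                 : DEF_TIMESLICE_MS;
    /* 20 = MAX_USER_PRIO / 2 */
    unsigned int slice = base * (unsigned int)(MAX_PRIO - static_prio) / 20;
    return slice < MIN_TIMESLICE_MS ? MIN_TIMESLICE_MS : slice;
}
```

This gives 800 ms for nice -20 (static_prio 100), 100 ms for nice 0 (static_prio 120) and 5 ms for nice +19 (static_prio 139).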

prio

This task_struct field is the process's scheduling priority. It determines which priority queue the process is in, which is what makes O(1) scheduling possible. For normal processes this field is called the dynamic priority; it is computed from the static priority and the average sleep time, and its range is [100,139].
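The O(1) lookup can be sketched with a bitmap holding one bit per priority level (140 bits), set while that level's queue is non-empty. Finding the next process to run is then a find-first-set over a fixed number of words, independent of how many processes are runnable. This is a simplified, hypothetical stand-in for prio_array_t and sched_find_first_bit(), and it uses the GCC/Clang builtin `__builtin_ctzll`.

```c
#include <stdint.h>

#define MAX_PRIO     140
#define BITMAP_WORDS ((MAX_PRIO + 63) / 64)   /* 3 words */

/* Hypothetical stand-in for prio_array_t: one bit per priority level
 * (a real prio_array also keeps a list_head queue per level). */
struct prio_bitmap {
    uint64_t bits[BITMAP_WORDS];
};

void mark_nonempty(struct prio_bitmap *b, int prio)
{
    b->bits[prio / 64] |= UINT64_C(1) << (prio % 64);
}

void mark_empty(struct prio_bitmap *b, int prio)
{
    b->bits[prio / 64] &= ~(UINT64_C(1) << (prio % 64));
}

/* Lowest set bit = numerically smallest prio = highest priority.
 * The loop covers a fixed 3 words, so its cost does not depend on the
 * number of runnable tasks. Returns -1 if nothing is runnable. */
int first_runnable_prio(const struct prio_bitmap *b)
{
    for (int w = 0; w < BITMAP_WORDS; w++)
        if (b->bits[w])
            return w * 64 + __builtin_ctzll(b->bits[w]);
    return -1;
}
```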

The main idea of the normal scheduling policy (SCHED_NORMAL) is to judge from a process's past execution whether it is I/O-bound or CPU-bound, using its average sleep time sleep_avg as the measure. A process with a large sleep_avg is more likely to be I/O-bound, so it gets a higher priority, that is, a smaller prio.

Dynamic priority: prio = max(100, min(static_prio - bonus, 139))

bonus = 10 * sleep_avg / HZ - 5, which lies in [-5,5] because sleep_avg ranges over [0, HZ]. So when sleep_avg = HZ/2 the scheduling priority equals the static priority; when sleep_avg is above HZ/2 the scheduling priority rises (prio gets smaller); and when sleep_avg is below HZ/2 it falls (prio gets larger). Because the kernel uses integer arithmetic, the bonus only becomes positive once sleep_avg reaches 6HZ/10.
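The same arithmetic as a runnable user-space function, assuming HZ = 1000 and sleep_avg measured in jiffies and capped at MAX_SLEEP_AVG = HZ (the kernel's version first converts sleep_avg from nanoseconds to jiffies). The function names are hypothetical; the integer division matches the CURRENT_BONUS macro.

```c
#define MAX_RT_PRIO   100
#define MAX_PRIO      140
#define MAX_BONUS     10
#define ASSUMED_HZ    1000            /* assumption: 1000 ticks per second */
#define MAX_SLEEP_AVG ASSUMED_HZ      /* one second, in jiffies */

/* bonus = 10 * sleep_avg / HZ - 5, in [-5, 5], using integer division. */
int sleep_bonus(unsigned long sleep_avg)
{
    if (sleep_avg > MAX_SLEEP_AVG)
        sleep_avg = MAX_SLEEP_AVG;
    return (int)(sleep_avg * MAX_BONUS / MAX_SLEEP_AVG) - MAX_BONUS / 2;
}

/* Dynamic priority of a normal process: static_prio - bonus,
 * clamped to [MAX_RT_PRIO, MAX_PRIO - 1] = [100, 139]. */
int normal_effective_prio(int static_prio, unsigned long sleep_avg)
{
    int prio = static_prio - sleep_bonus(sleep_avg);
    if (prio < MAX_RT_PRIO)
        prio = MAX_RT_PRIO;
    if (prio > MAX_PRIO - 1)
        prio = MAX_PRIO - 1;
    return prio;
}
```

For static_prio 120, a sleep_avg of 0 gives prio 125, HZ/2 gives 120 and HZ gives 115.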

An interactive process does not use up its whole time slice in one go (that is typical of CPU-bound processes); instead, it runs briefly, performs an I/O operation and gives up the CPU. This keeps it in the active run queue for a long time, instead of having its time slice recalculated and being moved to the expired run queue, where it would wait a long time for another chance to run. Frequent I/O is the defining characteristic of interactive processes and the root reason they are treated as interactive: the system gives such processes a larger bonus, which raises their scheduling priority and gives them more chances to be scheduled.
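The active/expired mechanism can be sketched with two arrays and two pointers: a task that uses up its slice moves to the expired array unless it is considered interactive, and once the active array is empty the pointers are swapped, which costs O(1) however many tasks are waiting. This is a simplified sketch: `struct task`, `struct task_array` and the helpers are hypothetical, the arrays are unordered with no bounds checks, and the real scheduler_tick() also stops reinserting interactive tasks when expired tasks have waited too long.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task record: only what this sketch needs. */
struct task {
    int id;
    bool interactive;   /* would come from the sleep_avg heuristic */
};

/* Unordered, fixed-size array (no bounds checks in this sketch). */
struct task_array {
    struct task task[16];
    size_t n;
};

struct runqueue {
    struct task_array arrays[2];
    struct task_array *active, *expired;
};

void rq_init(struct runqueue *rq)
{
    rq->arrays[0].n = rq->arrays[1].n = 0;
    rq->active = &rq->arrays[0];
    rq->expired = &rq->arrays[1];
}

/* Active task i used up its time slice. */
void expire_task(struct runqueue *rq, size_t i)
{
    struct task_array *a = rq->active;
    if (a->task[i].interactive)
        return;                               /* stays in the active array */
    rq->expired->task[rq->expired->n++] = a->task[i];
    a->task[i] = a->task[--a->n];             /* fill the hole with the last task */
}

/* When no active task is left, swap the array pointers. */
void swap_if_active_empty(struct runqueue *rq)
{
    if (rq->active->n == 0) {
        struct task_array *tmp = rq->active;
        rq->active = rq->expired;
        rq->expired = tmp;
    }
}
```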

Scheduling strategies for real-time processes

Refer to the task_struct code comments and to the introduction of the Linux process scheduling policies above.

Timing of process scheduling

When it is time to schedule, the kernel or a driver calls schedule(). In Linux, scheduling mainly happens at the following points:
First, when current changes from the running state to some other state, for example:
1) The process terminates: the exit path (do_exit()) calls schedule() at the end.
2) The process enters a wait state for some reason (it is then removed from the ready queue and inserted into a wait queue).
Commonly this happens when the process calls nanosleep() or one of the wait-family system calls. In device drivers, the most common case is a driver that starts an I/O operation and then waits for it to finish; in most such cases the driver calls schedule() directly.

Second, when the time slice of the current process is exhausted.
Whether the time slice is used up is checked by the clock interrupt handler; if it is, the handler sets the current process's need_resched flag to 1. If current's need_resched flag is 1, schedule() is called when the interrupt is about to return to user mode.

Third, when the process returns from an interrupt, exception or system call (that is, from kernel mode).
The need_resched flag is checked on every return from kernel mode to user mode. If current's need_resched flag was set to 1 during an interrupt, exception or system call, a reschedule takes place; the clock interrupt case falls into this category.
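The split between setting need_resched and acting on it can be modeled in a few lines: the clock tick only sets the flag, and the check happens on the way back to user mode. A hypothetical sketch: `struct task`, `timer_tick` and `return_to_user` are illustrative names, and time_slice is counted in ticks.

```c
#include <stdbool.h>

/* Hypothetical, minimal task state for this sketch. */
struct task {
    unsigned int time_slice;   /* remaining ticks */
    bool need_resched;
};

/* Clock-interrupt side: charge one tick; when the slice runs out, only
 * set the flag. The switch itself is deferred. */
void timer_tick(struct task *curr)
{
    if (curr->time_slice > 0 && --curr->time_slice == 0)
        curr->need_resched = true;
}

/* Return-to-user side: check the flag on the way out of the kernel.
 * Returns true if schedule() would be called here; the flag is cleared
 * as a stand-in for schedule() clearing it. */
bool return_to_user(struct task *curr)
{
    if (!curr->need_resched)
        return false;
    curr->need_resched = false;
    return true;
}
```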

Calculating the scheduling priority: effective_prio(p)

effective_prio() computes the scheduling priority of process p. It is called from recalc_task_prio() when a process's scheduling priority is updated.

// ../kernel/sched.c

/*
 * Return the priority that is based on the static priority
 * but is modified by bonuses/penalties.
 *
 * We scale the actual sleep average [0 .... MAX_SLEEP_AVG]
 * into the -5 ... 0 ... +5 bonus/penalty range.
 */
static int effective_prio(task_t *p)
{
    int bonus, prio;

    if (rt_task(p))          /* real-time process: */
        return p->prio;      /* its scheduling priority is returned unchanged */

    /* normal process: 10 * sleep_avg / HZ - 5, a value in [-5, 5] */
    bonus = CURRENT_BONUS(p) - MAX_BONUS / 2;

    prio = p->static_prio - bonus;
    if (prio < MAX_RT_PRIO)
        prio = MAX_RT_PRIO;
    if (prio > MAX_PRIO - 1)
        prio = MAX_PRIO - 1;
    return prio;             /* prio = max(100, min(static_prio - bonus, 139)) */
}

The following macro definitions are included for reference.

// ../include/linux/sched.h
#define MAX_USER_RT_PRIO    100
#define MAX_RT_PRIO         MAX_USER_RT_PRIO                  /* 100 */
#define MAX_PRIO            (MAX_RT_PRIO + 40)                /* 140 */
#define rt_task(p)          (unlikely((p)->prio < MAX_RT_PRIO))   /* prio < 100? */

// ../kernel/sched.c
#define USER_PRIO(p)        ((p) - MAX_RT_PRIO)               /* p - 100 */
#define MAX_USER_PRIO       (USER_PRIO(MAX_PRIO))             /* 40 */
#define PRIO_BONUS_RATIO    25
#define MAX_BONUS           (MAX_USER_PRIO * PRIO_BONUS_RATIO / 100)   /* 10 */
#define DEF_TIMESLICE       (100 * HZ / 1000)                 /* 100 ms */
#define MAX_SLEEP_AVG       (DEF_TIMESLICE * MAX_BONUS)       /* HZ, i.e. 1 s */
#define NS_TO_JIFFIES(TIME) ((TIME) / (1000000000 / HZ))
#define CURRENT_BONUS(p) \
    (NS_TO_JIFFIES((p)->sleep_avg) * MAX_BONUS / MAX_SLEEP_AVG)   /* 10 * sleep_avg / HZ, in [0, 10] */

task_struct reference

/* ---------- Linux 2.6 process-scheduling-related fields in task_struct ---------- */

long nice;                  /* Initial priority of the process, range [-20, +19], default 0.
                               The higher the nice value, the lower the priority and the
                               shorter the time slice it may be given. The nice value can be
                               changed with the nice() system call. */

int static_prio;            /* Static priority, range [MAX_RT_PRIO, MAX_RT_PRIO + 39],
                               by default [100, 139]. A normal process's scheduling priority
                               prio is computed dynamically from static_prio and the average
                               sleep time sleep_avg. */
/*
 * static_prio = MAX_RT_PRIO + nice + 20
 * kernel/sched.c has two macros to convert between nice and static_prio:
 * #define NICE_TO_PRIO(nice)  (MAX_RT_PRIO + (nice) + 20)
 * #define PRIO_TO_NICE(prio)  ((prio) - MAX_RT_PRIO - 20)
 */

unsigned int rt_priority;   /* Real-time priority, [0, MAX_RT_PRIO - 1], by default [0, 99].
                               Set in setscheduler() and not changed afterwards. A real-time
                               process's scheduling priority prio is computed statically from
                               rt_priority.  0: normal, 1-99: real-time */

int prio;                   /* Priority used by the scheduler; corresponds to a level in the
                               priority bitmap. A larger value means a lower priority.
                               0-99: real-time processes, 100-139: normal processes.
                               Real-time: prio = MAX_RT_PRIO - 1 - rt_priority
                               Normal:    prio = max(100, min(static_prio - bonus, 139)),
                               where bonus is in [-5, 5]. The larger the bonus, the smaller
                               prio and the higher the priority. */

unsigned long sleep_avg;    /* Lets the scheduler judge whether the process is I/O-bound or
                               CPU-bound. The larger the value, the more the process sleeps and
                               the more I/O-bound it is, so it is rewarded with more chances to
                               run; otherwise it is treated as CPU-bound and penalized.
                               Range: 0 to MAX_SLEEP_AVG. */

unsigned long long timestamp;   /* Time the process was last inserted into a run queue, or
                                   time of the last process switch involving it. */
unsigned long long last_ran;    /* Time of the last process switch that switched this
                                   process out. */
cputime_t utime;                /* CPU time used in user mode. */
cputime_t stime;                /* CPU time used in kernel mode. */
unsigned long sleep_time;       /* Process sleep time. */

unsigned int time_slice;    /* Remaining time slice. When a normal process (or a SCHED_RR
                               real-time process) uses up its slice, a new one is computed from
                               its static priority static_prio; task_timeslice() returns the new
                               slice for a given task. A highly interactive process whose slice
                               runs out is put back into the active array instead of the expired
                               array; this is done in scheduler_tick(). */
unsigned int first_time_slice;  /* 1 if the process has not yet used up its first time slice. */

const struct sched_class *sched_class;  /* scheduling-related functions */
struct sched_entity se;                 /* scheduling entity */
struct sched_rt_entity rt;              /* real-time task scheduling entity */

#ifdef CONFIG_PREEMPT_NOTIFIERS
struct hlist_head preempt_notifiers;    /* list of struct preempt_notifier; related to preemption */
#endif

unsigned int policy;        /* Scheduling policy of the process:
                               SCHED_NORMAL 0: non-real-time, priority-based round-robin
                               SCHED_FIFO   1: real-time, FIFO
                               SCHED_RR     2: real-time, priority-based round-robin */

#if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
struct sched_info sched_info;   /* Scheduling statistics, such as time spent running on the CPU
                                   and waiting in the queue. */
#endif

struct list_head tasks;     /* Task list: a doubly linked circular list that links all PCBs
                               (task_struct) together through this field. */

volatile long need_resched; /* Reschedule flag: if nonzero, a reschedule happens when returning
                               from kernel mode to user mode or from an interrupt.
                               scheduler_tick() sets it when the time slice is used up;
                               try_to_wake_up() sets it when a higher-priority process becomes
                               runnable. */

struct list_head run_list;  /* Run queue the process is on. Each queue corresponds to a
                               priority k (0-139); all processes in it have priority k and are
                               scheduled among themselves round-robin. The run_list fields of
                               the host PCBs form a doubly linked circular list that, like a
                               thin rope, links together the PCBs (task_struct) of all runnable
                               processes with priority k. */

prio_array_t *array;        /* Points to the ready-process array of the CPU the process is on. */
cpumask_t cpus_allowed;     /* Bitmask of CPUs that may run the process. */

struct thread_info *thread_info;
/*
 * Scheduling-related fields of thread_info:
 * __u32 flags;  holds TIF_NEED_RESCHED, set if the scheduler must be called
 * __u32 cpu;    logical number of the CPU whose run queue the process is on
 */


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
