Linux scheduling Summary


Scheduling:

The scheduler of an operating system has two tasks:

1: Scheduling:

Implement the scheduling policy: decide the order in which ready processes and threads compete for the CPU. Put bluntly, decide when the running process or thread should give up the CPU, and which ready process or thread should be selected to run next.

2: Dispatching:

Implement the scheduling mechanism: how the CPU is multiplexed at a given moment, handling the details of context switching, binding the selected process or thread to the CPU, and releasing it.

Linux 2.4 scheduling:

1: policy: the process scheduling policy.

1) SCHED_FIFO: the first-in-first-out policy used by real-time processes. The process occupies the CPU until it gives it up voluntarily.

2) SCHED_RR: the round-robin policy for real-time processes. When a process's assigned time slice is used up, it is inserted at the end of the queue for its original priority.

3) SCHED_OTHER: priority-based time-slice round-robin scheduling for ordinary processes.

2: priority: the static priority of the process.

3: nice: a factor by which a process controls its priority: an integer between -20 and 19. Raising nice lowers the priority. The default value is 0.

4: rt_priority: Priority of the real-time process.

5: counter: the timer for the process's current time slice, used to calculate the process's dynamic priority. When time slices are recalculated, the system carries over the remaining counter of processes that sleep a lot, so they end up with a higher dynamic priority.

6: the schedule() procedure:

1) Check for pending soft interrupt requests; if there are any, handle them first.

2) If the current process's scheduling policy is RR and its counter is 0, move the process to the end of the run queue for its priority and recalculate its counter.

3) If the current process's state is TASK_INTERRUPTIBLE and it has received a signal, set its state to TASK_RUNNING. If the current process's state is not TASK_RUNNING, remove it from the runnable queue and set the need_resched field of its process descriptor to 0.

4) Select the process with the highest goodness() value in the runnable queue; store the value in variable c and the corresponding process descriptor in variable next.

5) Check whether c is 0. If c = 0, the time slices of all processes in the queue are used up; in that case, recalculate the time slices of all processes and repeat the selection.

6) If next is the current process, scheduling ends. Otherwise, switch to the new process.
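The O(n) selection in steps 4 to 6 can be sketched as a linear scan over the run queue. This is a simplified illustration, not the kernel's code: the real 2.4 goodness() also adds bonuses (for example for processes sharing the current address space or last CPU), and the struct and field names here are illustrative.

```c
#include <stddef.h>

/* Simplified process descriptor; the real 2.4 kernel uses struct task_struct. */
struct task {
    int counter;       /* remaining time slice (dynamic part) */
    int priority;      /* static priority contribution */
    struct task *next; /* next process in the single runnable queue */
};

/* Toy goodness(): remaining slice plus static priority; a process with
 * an exhausted slice scores 0 and is skipped. */
static int goodness(const struct task *p)
{
    return p->counter ? p->counter + p->priority : 0;
}

/* Step 4: linear scan over the single runnable queue -- O(n). */
struct task *pick_next(struct task *head)
{
    struct task *next = NULL;
    int c = -1;
    for (struct task *p = head; p != NULL; p = p->next) {
        int w = goodness(p);
        if (w > c) {
            c = w;
            next = p;
        }
    }
    return next;
}
```

With three queued tasks, the scan returns the one with the highest counter + priority sum, which is exactly why every scheduling decision costs a full traversal.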

Summary: the 2.4 scheduling algorithm organizes all ready processes into a single runnable queue. Whether on a single-core or an SMP machine, every CPU traverses this one queue until it selects the process to run. If the time slices of all processes are used up, the time slices of all processes are recalculated.

2.4 scheduled data structure:

Shortcomings of 2.4 scheduling:

1) The most obvious drawback is the O(n) time complexity: the whole queue must be traversed on every scheduling decision, which is inefficient. Although O(n) does not look terrible, and the number of processes a system holds is not necessarily large, O(n) complexity in the scheduler is hard to tolerate.

2) In an SMP environment, multiple CPUs share the same run queue, so processes bounce between CPUs; this hurts CPU cache efficiency and reduces system performance.

3) Because multiple CPUs share one run queue, each CPU must lock the queue while operating on it; an idle CPU that needs the run queue can only wait. From 2) and 3) we can see that the 2.4 scheduling algorithm scales poorly: it cannot support SMP environments well.

4) Kernel preemption is not supported, so the kernel cannot respond to real-time tasks promptly and cannot meet real-time requirements (Linux is not hard real-time; here it cannot even meet soft real-time requirements).

5) Apart from the nice value, a process's remaining time slice is the biggest factor affecting its dynamic priority. The system carries over the remaining time of processes that sleep a lot, giving them a larger dynamic priority; the system therefore prefers to run I/O-bound processes first. In this way the kernel raises the priority of interactive processes and lets them run first. But does a process that sleeps a lot have to be an interactive process? No; it only shows that it is an I/O-bound process. I/O-bound processes do I/O interaction: while reading or writing a disk, for example, a process is often asleep. If every such process is treated as interactive, the genuinely interactive processes suffer.

6) Load balancing is simplistic: when a CPU is idle, a ready process is scheduled onto it; or, if a process has a higher priority than a process running on another CPU, it is scheduled to run on that CPU. The drawbacks of such simple load balancing are self-evident: process migration is frequent, and the problems in 2) and 3) recur. Such load balancing does more harm than good.

Linux 2.6 O (1) scheduling:

1: policy: the scheduling policies are the same as in 2.4.

2: rt_priority: the priority of a real-time process, between 0 and 99 (MAX_RT_PRIO is 100). It does not take part in the dynamic priority calculation.

3: static_prio: the static priority of a non-real-time process, converted from the nice value (an integer between -20 and 19).

Fields of the priority array (struct prio_array):

1) nr_active: the number of runnable processes in the array.

2) DECLARE_BITMAP(...): the priority bitmap, used to find the highest-priority queue that contains a ready process.

3) list_head queue[]: an array of general-purpose linked lists, one queue per priority.

Active and expired queues:

Each CPU maintains its own run queue, and each run queue maintains its own active array and expired array. When a process's time slice is used up, the process is moved to the expired array. When the time slices of all processes in the active array are used up, the active and expired arrays are switched, so the expired array becomes the active one; only the two pointers need to be exchanged. To find the next process to run, the scheduler only needs to use the bitmap described above to find the highest-priority queue that contains a ready process. With this data organization, the time complexity of the scheduler drops from O(n) in 2.4 to O(1) in 2.6, and it scales well to SMP environments.
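The bitmap lookup and the pointer swap can be sketched as follows. This is a minimal illustration of the O(1) idea, not the kernel's implementation: the real prio_array also holds the per-priority list heads, and sched_find_first_set() here stands in for the kernel's sched_find_first_bit().

```c
#define MAX_PRIO 140
#define BITS_PER_LONG (8 * (int)sizeof(unsigned long))
#define BITMAP_LONGS ((MAX_PRIO + BITS_PER_LONG - 1) / BITS_PER_LONG)

struct prio_array {
    int nr_active;                     /* runnable processes in the array */
    unsigned long bitmap[BITMAP_LONGS]; /* bit set => that priority queue is non-empty */
    /* in the kernel: struct list_head queue[MAX_PRIO]; */
};

/* Find the highest-priority (lowest-numbered) non-empty queue in O(1):
 * a fixed number of words are scanned, then a count-trailing-zeros
 * instruction finds the first set bit. */
int sched_find_first_set(const struct prio_array *a)
{
    for (int i = 0; i < BITMAP_LONGS; i++)
        if (a->bitmap[i])
            return i * BITS_PER_LONG + __builtin_ctzl(a->bitmap[i]);
    return MAX_PRIO; /* no runnable process */
}

/* Record that a process of the given priority became runnable. */
void mark_runnable(struct prio_array *a, int prio)
{
    a->bitmap[prio / BITS_PER_LONG] |= 1UL << (prio % BITS_PER_LONG);
    a->nr_active++;
}

/* Switching the active and expired arrays is just a pointer swap. */
void swap_arrays(struct prio_array **active, struct prio_array **expired)
{
    struct prio_array *tmp = *active;
    *active = *expired;
    *expired = tmp;
}
```

Marking priorities 130 and 17 runnable and then querying the bitmap returns 17, the higher (numerically lower) priority, without traversing any process list.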

The data structure is organized as follows:

Linux 2.6 O (1) scheduling process priority:

The 2.6 scheduler has 140 priority levels, from 0 to 139: 0 to 99 are real-time priorities, and 100 to 139 are non-real-time priorities, as shown in the figure above.
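Assuming the mapping implied above (nice values -20 to 19 mapped onto static priorities 100 to 139), the conversion is a simple offset, in the spirit of the kernel's NICE_TO_PRIO macro:

```c
#define MAX_RT_PRIO 100

/* Map a nice value (-20..19) to a static priority (100..139).
 * nice 0 lands on priority 120, the middle of the non-real-time range. */
int nice_to_prio(int nice)
{
    return MAX_RT_PRIO + 20 + nice; /* i.e. 120 + nice */
}
```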

Features:

1) The dynamic priority is calculated from the static priority together with the process's running state and interactivity, so what really takes part in scheduling is the dynamic priority. A process's interactivity is judged from its sleep time (this is basically the same as in 2.4), so the dynamic priority is calculated from the static priority and the sleep time. Note that the dynamic priority determines which priority queue a non-real-time process is inserted into, while real-time processes are inserted according to rt_priority; a real-time process's priority never changes from creation to termination. Time slices, however, are calculated from the static priority.

2) The higher a process's priority, the longer the time slice it receives each time.

3) The TASK_INTERACTIVE() macro decides whether a process is interactive; it is based on nice. The higher the nice value, the lower the priority and the lower the assumed interactivity.

4) If a process is sufficiently interactive, it is not moved to the expired array when its time slice runs out but is reinserted at the end of its original queue. This lets interactive processes keep responding quickly to the user. If such a process were moved to the expired array, it might starve badly before the array pointers are switched, severely hurting interactivity.

5) When a new process is created, the child shares the parent's remaining time slice: after fork() ----> do_fork(), the time slices of the parent and the child add up to the parent's original time slice. This prevents a parent from stealing CPU time by repeatedly creating children. If a child exits before it is ever assigned a fresh time slice and still has some slice left, the remainder is returned to the parent, so the parent is not penalized for creating a child.
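A minimal sketch of the fork-time split described in 5). The halving rule and the names are illustrative assumptions, not the kernel's code; the invariant is that the two slices always sum to the parent's original slice.

```c
/* On fork, the parent's remaining time slice is split between parent
 * and child so that their sum equals the original slice; this stops a
 * parent from gaining CPU time by forking. */
struct slice {
    int parent; /* ticks the parent keeps */
    int child;  /* ticks the child receives */
};

struct slice split_on_fork(int parent_slice)
{
    struct slice s;
    s.child = parent_slice / 2;        /* child gets half (rounded down) */
    s.parent = parent_slice - s.child; /* parent keeps the rest */
    return s;
}

/* If the child exits before it is ever given a fresh slice, whatever
 * remains of its slice is handed back to the parent. */
int reclaim_on_exit(int parent_slice, int child_remaining)
{
    return parent_slice + child_remaining;
}
```

Splitting a 7-tick slice leaves the parent 4 ticks and the child 3; if the child exits immediately, the parent recovers all 7, so forking is time-slice neutral.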

Dynamic priority calculation in 2.6 O(1) scheduling (the relevant functions):

1) effective_prio(p):

2) normal_prio():

3) __normal_prio():

Scheduling and preemption in Linux 2.6 O(1) scheduling:

1: Direct scheduling: the current process calls schedule() directly to give up the CPU because it blocks.

1) The current process is put into the corresponding wait queue.

2) The state of the current process is changed from TASK_RUNNING to TASK_UNINTERRUPTIBLE or TASK_INTERRUPTIBLE.

3) A new process is selected to run: schedule() picks the next process to execute.

4) When the resource becomes available, the process is removed from the wait queue.

2: Passive scheduling: the current process is forced to give up the CPU because its time slice is used up or because it is preempted by a higher-priority process. In this case the current process is not switched out immediately; instead, the kernel is told that rescheduling is needed by setting the TIF_NEED_RESCHED flag to 1, and the kernel reschedules at a suitable time.

1) Question: why not reschedule immediately?

While a process runs in the kernel it may hold shared resources, such as spin locks. If a preempting process forced it to give up the CPU immediately, and the new process needed the same shared resource, a deadlock would result. Therefore only a flag is set, notifying the kernel that rescheduling is required.

2) Question: when is the right time?

The kernel checks TIF_NEED_RESCHED when it is about to return to user space; if the flag is set, schedule() is called. This is user preemption, and it happens:

A: when returning to user space from an interrupt handler.

B: when returning to user space from a system call.

Load balancing in Linux 2.6 O(1) scheduling:

Complicated!

The transition away from Linux 2.6 O(1) scheduling:

1: SD Scheduler:

The complexity of O(1) scheduling lies mainly in the dynamic priority calculation: the scheduler adjusts a process's priority with obscure empirical formulas based on average sleep time. This is a big (even fatal) drawback of O(1) scheduling. Features of the SD (staircase) scheduler:

1) The data structures are similar to those of O(1) scheduling, but there is no expired array.

2) When a process uses up its time slice, it is not put into an expired array but into the queue one priority level lower (which is why there is no expired array). When the time slice at the lowest level is used up, the process returns to its initial priority queue and starts stepping down again. Each demotion is like walking down a staircase, hence the name "staircase algorithm".

3) There are two granularities of time slice: coarse and fine. A coarse-grained slice consists of several fine-grained slices. When a fine-grained slice is used up, the process is rescheduled round-robin within its current level; when the whole coarse-grained slice is used up, the process is demoted. This way, CPU-bound processes at the same priority stay there for the same amount of time, while I/O-bound processes stay in the high-priority queues for a long time and do not necessarily fall to the low-priority queues.

4) No starvation, code simpler than O(1) scheduling, and, most importantly, it proved the feasibility of the completely fair idea.

5) The algorithm framework relative to O(1) scheduling is unchanged, but every priority demotion is a process switch, and the fine-grained time slice is usually much shorter than the O(1) time slice. The extra overhead is therefore unavoidable and throughput drops. This is the main reason the SD algorithm was not adopted.
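The staircase walk in 2) and 3) can be sketched as follows. The constants and the exact demotion rule here are illustrative assumptions for the idea described above, not Con Kolivas's actual code.

```c
/* Staircase sketch: each process has a coarse time slice made of
 * several fine slices. Using up a fine slice keeps the process at its
 * level; using up the whole coarse slice demotes it one level. From
 * the lowest level it returns to its initial priority. */
#define FINE_PER_COARSE 4  /* illustrative: fine slices per coarse slice */
#define LOWEST_LEVEL 39    /* illustrative: bottom of the staircase */

struct sd_task {
    int init_prio; /* the level the process started at */
    int prio;      /* current level */
    int fine_used; /* fine slices consumed at this level */
};

/* Called when the process consumes one fine-grained time slice. */
void sd_tick(struct sd_task *t)
{
    if (++t->fine_used < FINE_PER_COARSE)
        return;                 /* coarse slice not exhausted: stay here */
    t->fine_used = 0;
    if (t->prio < LOWEST_LEVEL)
        t->prio++;              /* one step down the staircase */
    else
        t->prio = t->init_prio; /* back to the top of the staircase */
}
```

A CPU-bound task ticks constantly and so walks steadily down the stairs; an I/O-bound task sleeps before exhausting its coarse slice and therefore lingers near the top.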

2: RSDL Scheduler:

RSDL improves the SD algorithm; its core idea is still "completely fair", with no complicated dynamic priority adjustment strategy. It introduces a "group time quota" Tg, the total time available to all processes in one priority queue, and a "priority time quota" Tp; Tp is not equal to a process's time slice but smaller. When a process uses up its Tp, it is demoted one level, as in the SD algorithm. When a queue's Tg is used up, all processes in that queue are forcibly demoted, whether or not they have used up their own Tp.

Shortcomings of Linux 2.6 O(1) scheduling:

1: Complex empirical formulas that are hard to understand.

2: Is it fair?

3: Is it real-time?

The outstanding scheduling algorithm in Linux → CFS:

According to the author of CFS: "80% of CFS's work can be summed up in one sentence: CFS models an ideal, precise multi-tasking CPU on real hardware." On such an ideal multi-tasking processor, every process gets CPU time simultaneously: when there are two processes in the system, each gets 50% of the CPU.

1: Virtual runtime. A process's virtual runtime (vruntime) grows in proportion to its actual running time and in inverse proportion to its weight. The weight is determined by the process priority, which is determined by the nice value. The higher a process's weight, the smaller its vruntime for the same actual running time. All non-real-time runnable processes are organized in a red-black tree, and the process with the smallest vruntime is selected at each scheduling decision. Because the keys in the left subtree of a red-black tree are smaller than those on the right, the scheduler picks the leftmost process (scheduling entity) in the tree each time.

2: The completely fair idea. CFS no longer tracks process sleep time and no longer distinguishes interactive processes; it treats all processes uniformly, and every process obtains its fair share of CPU time within a given period. That is the fairness in CFS.

3: CFS introduces modularity (scheduling classes), fair scheduling, group scheduling, and other features. Although the scheduling is "completely fair", processes are inherently unequal (some kernel threads handle emergencies), so absolute equality cannot be achieved. CFS uses weights to express the unequal standing of processes, and these weights are the basis on which CFS implements fairness. The weight is determined by the priority: the higher the priority, the larger the weight. The relationship between priority and weight is not a simple linear one, however; the kernel converts between them with a table of empirical values.

For example, if three processes a, b, and c have weights 1, 2, and 3, the total weight is 6. Allocating CPU time by the CFS fairness principle, a's share is 1/6, and b's and c's are 2/6 and 3/6. So if a, b, and c run for a total of 6 time units, a gets 1, b gets 2, and c gets 3.
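The arithmetic above in code form; the period and the weights are the example's values, and the integer division is exact for them.

```c
/* CFS-style proportional share: a task's share of a scheduling period
 * is weight / total_weight of the period. */
int share_of(int weight, int total_weight, int period)
{
    return period * weight / total_weight;
}
```

With weights 1, 2, 3 and a 6-unit period, the shares are 1, 2, and 3 units, matching the example.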

Cfs Scheduler:

The data structure diagram of each part is as follows:

Virtual runtime

On an ideal multi-tasking processor, every process would receive CPU time simultaneously. In reality, while one process occupies the CPU, the others must wait, which is unfair. Linux CFS therefore introduces virtual runtime. Virtual runtime is determined mainly by two factors: the actual running time and the process's weight. It grows with the actual running time but is not equal to it. As mentioned above, the kernel sorts processes by virtual runtime in a red-black tree, so the leftmost process (scheduling entity) in the tree is the one that has been treated most unfairly, and it must be the next process scheduled.

A process's virtual runtime is computed by calc_delta_fair() and is updated on every clock interrupt. The formula is:

if (se.load.weight != NICE_0_LOAD)
    vruntime += delta * NICE_0_LOAD / se.load.weight;
else
    vruntime += delta;

delta is the actual running time the process has accumulated since the last update, and NICE_0_LOAD is the weight of a nice-0 process. Virtual runtime is inversely proportional to weight: the larger a process's weight, the more slowly its virtual runtime grows, the further left the process sits in the tree, and the more likely it is to be scheduled.
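A runnable sketch of the update rule above. It uses plain integer division for clarity; the real kernel avoids the division with a 64-bit fixed-point inverse-weight multiplication.

```c
#define NICE_0_LOAD 1024 /* the kernel's weight for a nice-0 task */

/* Advance a task's virtual runtime by a real runtime delta, scaled by
 * NICE_0_LOAD / weight, mirroring the formula in the text. */
unsigned long long update_vruntime(unsigned long long vruntime,
                                   unsigned long delta,
                                   unsigned long weight)
{
    if (weight != NICE_0_LOAD)
        return vruntime + (unsigned long long)delta * NICE_0_LOAD / weight;
    return vruntime + delta; /* nice-0: virtual time == real time */
}
```

A task of twice the nice-0 weight accrues vruntime at half speed, so for the same real runtime it stays further left in the red-black tree.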

The best way to understand CFS is to read the source. The code below is annotated (someone on the Internet has already done a good job of this):

Call relationship diagram of each function:

(1)

Tick interrupt

The scheduler_tick() function is called from the tick interrupt handler. Its code is as follows:

void scheduler_tick(void)
{
	/* Obtain the current CPU */
	int cpu = smp_processor_id();
	/* Get the runqueue of the current CPU */
	struct rq *rq = cpu_rq(cpu);
	/* The currently running process */
	struct task_struct *curr = rq->curr;

	sched_clock_tick();

	spin_lock(&rq->lock);
	/* Update rq's timestamp, i.e. set rq->clock to the current time */
	update_rq_clock(rq);
	/* Update the rq load */
	update_cpu_load(rq);
	/* Call the scheduling class's task_tick function */
	curr->sched_class->task_tick(rq, curr, 0);
	spin_unlock(&rq->lock);

#ifdef CONFIG_SMP
	rq->idle_at_tick = idle_cpu(cpu);
	trigger_load_balance(rq, cpu);
#endif
}

As the code shows, after some common processing, control is handed to the scheduling class's task_tick() function.
For CFS, the corresponding sched_class structure is as follows:

static const struct sched_class fair_sched_class = {
	.next			= &idle_sched_class,
	.enqueue_task		= enqueue_task_fair,
	.dequeue_task		= dequeue_task_fair,
	.yield_task		= yield_task_fair,

	.check_preempt_curr	= check_preempt_wakeup,

	.pick_next_task		= pick_next_task_fair,
	.put_prev_task		= put_prev_task_fair,

#ifdef CONFIG_SMP
	.select_task_rq		= select_task_rq_fair,

	.load_balance		= load_balance_fair,
	.move_one_task		= move_one_task_fair,
#endif

	.set_curr_task		= set_curr_task_fair,
	.task_tick		= task_tick_fair,
	.task_new		= task_new_fair,

	.prio_changed		= prio_changed_fair,
	.switched_to		= switched_to_fair,

#ifdef CONFIG_FAIR_GROUP_SCHED
	.moved_group		= moved_group_fair,
#endif
};
The handler corresponding to task_tick is task_tick_fair().

(2)

 

Execution process of schedule()

The schedule() function is called when a process is to be preempted or when a process voluntarily yields the processor. To keep this short, the schedule() code is not analyzed here; only the main scheduling-class functions it calls are listed, as shown below:

schedule() ---->

sched_class->put_prev_task(rq, prev) ---->

sched_class->pick_next_task()

For CFS, the put_prev_task() function is put_prev_task_fair(); it puts the previous process back into the queue.

(3)

 

Scheduling of a new process

When a new process is created, do_fork() contains the following:

long do_fork(unsigned long clone_flags,
	     unsigned long stack_start,
	     struct pt_regs *regs,
	     unsigned long stack_size,
	     int __user *parent_tidptr,
	     int __user *child_tidptr)
{
	...

	if (unlikely(clone_flags & CLONE_STOPPED)) {
		/*
		 * We'll start up with an immediate SIGSTOP.
		 */
		sigaddset(&p->pending.signal, SIGSTOP);
		set_tsk_thread_flag(p, TIF_SIGPENDING);
		__set_task_state(p, TASK_STOPPED);
	} else {
		wake_up_new_task(p, clone_flags);
	}

	...
}

That is, unless the process is created with the CLONE_STOPPED flag, wake_up_new_task() is called for the new process at the end.
