Linux kernel--process management and scheduling

Process Management
Process descriptors and task structures

Processes are kept on a circular, doubly linked list called the task list (tasklist). Each element of the list is a process descriptor of type struct task_struct, which holds all the information about a specific process and is defined in <linux/sched.h>.

Linux allocates the task_struct structure through the slab allocator, which provides object reuse and cache coloring. Because the slab allocator manages task_struct, a separate small structure, struct thread_info, sits at the bottom of the kernel stack (it is defined in <asm/thread_info.h>). On register-poor architectures such as x86 this lets the kernel locate the current task_struct from the stack pointer alone, without dedicating a register to it.
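
As an illustration only, here is a conceptual sketch of how this works on older 32-bit x86 kernels (the THREAD_SIZE value and the structure layout are simplified assumptions, and the fragment is not buildable outside the kernel): thread_info lives at the bottom of the kernel stack, so masking the stack pointer with the stack size yields its address, and its first field points back to the task_struct.

    /* conceptual sketch: locate the current task from the stack pointer alone */
    struct thread_info {
        struct task_struct *task;   /* pointer back to the process descriptor */
        /* ... flags, cpu, preempt_count, ... */
    };

    #define THREAD_SIZE 8192        /* assumed 8 KB kernel stack (two pages on x86) */

    static inline struct thread_info *current_thread_info(void)
    {
        unsigned long sp;
        asm ("movl %%esp, %0" : "=r" (sp));
        return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
    }

    #define current (current_thread_info()->task)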

You can use the ps command to view information about all processes on a Linux system.

Process status

The state field in task_struct describes the current state of the process. There are five different process states, and a process is always in exactly one of them:

1) TASK_RUNNING (running): the process is runnable; it is either currently executing or sitting on a run queue waiting to execute. This is the only possible state for a process executing in user space; it also applies to a process that is actively running in kernel space.

2) TASK_INTERRUPTIBLE (interruptible): the process is sleeping (that is, blocked), waiting for some condition to be met. Once the condition is met, the kernel sets the process back to TASK_RUNNING. A process in this state is also woken up and made runnable if it receives a signal.

3) TASK_UNINTERRUPTIBLE (uninterruptible): identical to the interruptible state except that the process is not woken up and made runnable when it receives a signal. This state is used when the process must wait without being disturbed or when the awaited event is expected to occur soon. Because tasks in this state do not respond to signals, it is used less often than TASK_INTERRUPTIBLE.

4) TASK_ZOMBIE (zombie): the process has terminated, but its parent has not yet called the wait4() system call. The child's process descriptor is kept around so that the parent can still obtain information about it. Once the parent calls wait4(), the process descriptor is freed.

5) TASK_STOPPED (stopped): the process has stopped executing; it is neither running nor eligible to run. This typically happens when the process receives a signal such as SIGSTOP, SIGTSTP, SIGTTIN, or SIGTTOU. In addition, any signal received while the process is being debugged puts it into this state.

To change a process's state, it is best to use the set_task_state(task, state) function, which, where necessary (that is, on SMP systems), also inserts a memory barrier so that other processors observe the state change in the correct order.
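
For example, a kernel-side caller would write something like the following (set_current_state() is the related helper for changing the current task's own state):

    /* mark a task as sleeping; roughly task->state = TASK_INTERRUPTIBLE,
       plus a memory barrier on SMP so other processors see it in order */
    set_task_state(task, TASK_INTERRUPTIBLE);

    /* the common shorthand when a task changes its own state */
    set_current_state(TASK_INTERRUPTIBLE);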

The transitions between these states make up the entire life cycle of a process; a state-transition diagram can be found at http://www.cnblogs.com/wang_yb/archive/2012/08/20/2647912.html.

Creation of processes

In a Linux system, all processes are descendants of the init process, whose PID is 1. The kernel starts init in the final stage of the boot process. Init then reads the system initialization scripts (initscripts) and executes other programs, eventually completing the system startup process.

Linux provides two functions for creating and running processes: fork() and exec(). First, fork() creates a child process by duplicating the current process. The child differs from the parent only in its PID (which is unique per process), its PPID (the parent's PID), and certain resources and statistics (for example, pending signals, which are not inherited). The exec() family of functions then loads an executable file into the address space and starts running it.
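
A minimal user-space sketch of this fork-then-exec pattern (the choice of ls -l as the program to run is arbitrary):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();                 /* duplicate the current process */
        if (pid < 0) {
            perror("fork");
            exit(EXIT_FAILURE);
        }
        if (pid == 0) {
            /* child: replace the address space with a new program */
            execlp("ls", "ls", "-l", (char *)NULL);
            perror("execlp");               /* reached only if exec fails */
            _exit(EXIT_FAILURE);
        }
        /* parent: reap the child so no zombie is left behind */
        int status;
        waitpid(pid, &status, 0);
        printf("child %d exited with status %d\n", (int)pid, WEXITSTATUS(status));
        return 0;
    }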

fork() is implemented with copy-on-write pages. The kernel does not copy the whole process address space during fork; instead, the parent and child share a single copy, and data is duplicated only when one of them writes to it, at which point each process gets its own copy. If the pages are never written (for example, when exec() is called immediately after fork()), the only real overhead of fork() is copying the parent's page tables and creating a unique task_struct for the child.

The fork() function that creates a process ultimately calls the clone() function. Creating a thread and creating a process differ only in the flags that are eventually passed to clone(). For example, an ordinary fork() is equivalent to clone(SIGCHLD, 0), while creating a "process" that shares the address space, filesystem resources, file descriptors, and signal handlers with its parent, in other words a thread, is clone(CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND, 0).
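
Those calls are the kernel-internal, two-argument form. As a user-space illustration only, the glibc clone() wrapper can be handed the same thread-style flag set; the stack handling and the child_fn function are incidental details of this sketch, and SIGCHLD is added so the child can be reaped with waitpid():

    #define _GNU_SOURCE
    #include <sched.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static int child_fn(void *arg)
    {
        /* runs in the same address space as the parent */
        printf("child pid=%d sharing VM with parent\n", (int)getpid());
        return 0;
    }

    int main(void)
    {
        const size_t stack_size = 1024 * 1024;
        char *stack = malloc(stack_size);
        if (!stack) { perror("malloc"); return 1; }

        /* thread-like child: share address space, filesystem info,
           file descriptors and signal handlers with the parent */
        int flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD;
        pid_t pid = clone(child_fn, stack + stack_size, flags, NULL);
        if (pid == -1) { perror("clone"); return 1; }

        waitpid(pid, NULL, 0);
        free(stack);
        return 0;
    }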

The main difference between kernel threads, which are created in the kernel, and normal processes is that kernel threads do not have a separate address space; they run only in kernel space.
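
In current kernels such threads are usually created with the kthread API; a brief kernel-side sketch (the names my_thread_fn and my_kthread are hypothetical):

    #include <linux/kthread.h>
    #include <linux/delay.h>

    static int my_thread_fn(void *data)
    {
        /* kernel threads have no user address space; they run only in kernel space */
        while (!kthread_should_stop())
            msleep(1000);                   /* placeholder for the thread's real work */
        return 0;
    }

    /* elsewhere, e.g. in module init code:
     *     struct task_struct *t = kthread_run(my_thread_fn, NULL, "my_kthread");
     * and in module exit code:
     *     kthread_stop(t);
     */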

The difference between fork and vfork

fork() and vfork() both create a new process, so what is the difference? In summary there are three differences:
1. fork(): the child process gets its own copy of the parent's data segment and code segment. vfork(): the child process shares the data segment with the parent.
2. With fork(), the execution order of the parent and child is indeterminate. With vfork(), the child is guaranteed to run first; the parent and child share data until the child calls exec() or exit(), and only after that call may the parent be scheduled to run.
3. Because vfork() guarantees that the child runs first and the parent cannot run until the child calls exec() or exit(), a deadlock results if the child depends on some further action by the parent before it calls either of those functions (see the sketch below).
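
A minimal sketch of the only safe way to use vfork(), namely calling exec or _exit immediately in the child (the date command is an arbitrary choice):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = vfork();                /* child borrows the parent's address space */
        if (pid < 0) {
            perror("vfork");
            exit(EXIT_FAILURE);
        }
        if (pid == 0) {
            /* the child must only exec or _exit; anything else can corrupt the
               parent, since they share the data segment until exec/_exit */
            execlp("date", "date", (char *)NULL);
            _exit(EXIT_FAILURE);            /* reached only if exec fails */
        }
        waitpid(pid, NULL, 0);              /* the parent was suspended until here */
        return 0;
    }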

Process termination

A process terminates when it finishes running, when it receives a signal that it can neither handle nor ignore, or when it hits an exception it cannot handle. At that point the kernel relies on do_exit() (in kernel/exit.c) to release all resources associated with the process (assuming the process is the sole user of those resources). After do_exit() completes, the process is not runnable (it no longer has an address space to run in) and is in the TASK_ZOMBIE state. The only resources it still holds are its kernel stack, its thread_info structure, and its task_struct. The process continues to exist solely to provide information to its parent. Once the parent retrieves the information about its terminated child, or notifies the kernel that it is not interested, the remaining memory held by the child, including its task_struct, is freed.

Orphan process issues

If a parent exits before its children, there must be a mechanism to give those children a new parent; otherwise the orphaned processes would remain zombies forever when they exit, needlessly consuming memory. The solution is to reparent the child to another thread in its thread group or, if there is none, to make init its parent.

Process Scheduling

What is scheduling

Modern operating systems are multitasking. To let many tasks run well on the system at the same time, a management program is needed to coordinate all of the tasks (that is, processes) running concurrently on the machine.

That management program is the scheduler. Put simply, its job is to:

1. Decide which processes run and which wait

2. Decide how long each process runs

In addition, to deliver a good user experience, a running process must be able to be interrupted immediately by a more urgent one. In short, scheduling is a balancing act. On one hand, the scheduler should let each running process make the most of the CPU (that is, switch processes as rarely as possible, since excessive switching wastes CPU time on the switches themselves); on the other hand, it must ensure that processes share the CPU fairly (that is, prevent any process from monopolizing the CPU for a long time).

Policy

I/O-bound and processor-bound processes

I/O-bound process: spends most of its time submitting or waiting for I/O requests. Such a process is frequently runnable, but only for short periods at a time, because it soon blocks waiting for a request to complete. Interactive programs are a typical example.

Processor-bound process: spends most of its time executing code and tends to run continuously until it is preempted.

The scheduling policy seeks to strike a balance between fast process response (short response times) and maximum system utilization (high throughput).

To give interactive applications good response, Linux optimizes for process response and tends to favor I/O-bound processes.

Process priority

The most basic class of scheduling algorithms is priority-based scheduling: processes are ranked according to their worth and their need for processor time. Higher-priority processes run before lower-priority ones, and processes with the same priority are scheduled round-robin.

Linux implements a dynamic priority-based variant of this idea. It starts from a base priority but allows the scheduler to raise or lower the priority as needed. For example, if a process spends more time waiting on I/O than running, it is clearly I/O-bound and its priority is dynamically increased; conversely, the priority of a processor-bound process is dynamically decreased.

The Linux kernel provides two separate priority ranges. The first is the nice value, which ranges from -20 to +19 with a default of 0; the larger the nice value, the lower the priority. The second is the real-time priority, which ranges from 0 to 99 by default; any real-time process has a higher priority than any normal process.
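
A small user-space sketch of the two ranges (raising one's own nice value needs no privilege; setting a real-time priority normally requires root or CAP_SYS_NICE, and the value 50 is an arbitrary example):

    #include <errno.h>
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* normal range: raise our nice value by 10, i.e. lower our priority */
        errno = 0;
        int nice_val = nice(10);
        if (nice_val == -1 && errno != 0)
            perror("nice");
        else
            printf("nice value is now %d\n", nice_val);

        /* real-time range: request SCHED_FIFO at priority 50; any real-time
           task outranks every normal, nice-scheduled task */
        struct sched_param sp = { .sched_priority = 50 };
        if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
            perror("sched_setscheduler (needs root or CAP_SYS_NICE)");
        else
            printf("now running as SCHED_FIFO priority 50\n");

        return 0;
    }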

Time slices

A time slice is a numeric value that specifies how long a process may run before it is preempted. I/O-bound processes do not need long time slices, while processor-bound processes prefer them as long as possible. Choosing the size of the time slice is not simple: set it too large and the system feels sluggish (scheduling latency grows); set it too small and the processor wastes significant time on frequent switches.

The Linux scheduler raises the priority of interactive programs so that they run more frequently, and accordingly gives them a relatively long default time slice. In addition, the scheduler dynamically adjusts a process's time slice according to its priority. This ensures that high-priority processes, which are presumed to be more important, run both more often and for longer. By dynamically adjusting both priority and time-slice length, the scheduler keeps Linux scheduling performance stable and robust.

Note that a process need not consume its whole time slice in one go: a process with a 100 millisecond time slice can, for example, run on five separate occasions for 20 milliseconds each.

When a process exhausts its time slice, it is considered expired. A process with no time slice left does not run again until every other process has exhausted its own time slice; at that point, the time slices of all processes are recalculated.

Process preemption

Linux is preemptive. When a process enters the TASK_RUNNING state, the kernel checks whether its priority is higher than that of the currently executing process. If it is, the scheduler is invoked to preempt the currently running process and run the newly runnable one. Likewise, when a process's time slice drops to zero, it is preempted and the scheduler is invoked to select a new process.

The scheduling algorithm

Runqueues

The most basic data structure in the scheduler is the runqueue. A runqueue is the list of runnable processes on a given processor; there is one runqueue per processor, and each runnable process belongs to exactly one runqueue. The runqueue also holds per-processor scheduling information, which makes it the most important data structure for each processor.

To avoid deadlock, code that needs to lock multiple runqueues must always acquire the locks in the same order: by ascending runqueue address.
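
A sketch of that rule, modeled on the double_rq_lock() helper of the 2.6-era kernel/sched.c (the struct and function names here are illustrative):

    /* lock two runqueues without risking an ABBA deadlock:
       always take the lower-addressed lock first */
    static void double_rq_lock_sketch(struct runqueue *rq1, struct runqueue *rq2)
    {
        if (rq1 == rq2) {
            spin_lock(&rq1->lock);
        } else if (rq1 < rq2) {
            spin_lock(&rq1->lock);
            spin_lock(&rq2->lock);
        } else {
            spin_lock(&rq2->lock);
            spin_lock(&rq1->lock);
        }
    }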

Priority arrays

Each runqueue contains two priority arrays: one active and one expired. The priority array is the data structure that gives the scheduler its O(1) behavior. A priority array contains one queue per priority level, holding the list of runnable processes at that priority. It also maintains a priority bitmap, which makes it efficient to find the highest-priority runnable process in the system.
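
An illustrative sketch of the priority array and the O(1) selection it enables (field and helper names follow the 2.6-era sources, but the fragment is not buildable on its own):

    #define MAX_PRIO 140                   /* 100 real-time levels + 40 nice levels */

    struct prio_array {
        int              nr_active;        /* number of runnable tasks in this array */
        unsigned long    bitmap[BITS_TO_LONGS(MAX_PRIO)]; /* bit set => tasks queued at that priority */
        struct list_head queue[MAX_PRIO];  /* one run list per priority level */
    };

    /* O(1) selection: find the first set bit, then take the head of that list */
    static struct task_struct *highest_prio_task(struct prio_array *array)
    {
        int idx = sched_find_first_bit(array->bitmap);
        return list_entry(array->queue[idx].next, struct task_struct, run_list);
    }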

Recalculate time slices

Many operating systems use an explicit method to recalculate time slices once every process has exhausted its slice. A typical implementation loops over every process, which can take a considerable amount of time, O(n) for n tasks in the worst case; the task list and each process descriptor must be protected by locks, which worsens lock contention; and the time the recalculation takes is unpredictable.

Processes in the active array still have time slice remaining; processes in the expired array have exhausted theirs. When a process uses up its time slice, it is moved to the expired array, but its new time slice is recalculated before the move. Recalculating time slices then becomes trivial: when the active array empties, the active and expired arrays are simply swapped. This swap is the core of the O(1) scheduler.
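
The swap itself is tiny; a sketch modeled on schedule() in the 2.6-era kernel/sched.c, where rq is the per-processor runqueue:

    /* when the active array has drained, the expired array (whose tasks already
       carry freshly computed time slices) simply becomes the new active array */
    if (unlikely(!rq->active->nr_active)) {
        struct prio_array *tmp = rq->active;
        rq->active  = rq->expired;
        rq->expired = tmp;
    }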

schedule()

Selecting the next process and switching to it is done by the schedule() function. Kernel code calls it directly when it wants to sleep, and it is also invoked whenever a process is about to be preempted. schedule() runs independently on each processor.

First, schedule() finds the first set bit in the active priority array's bitmap; that bit identifies the highest priority level containing runnable processes. The scheduler then selects the first process in the list at that level, which is the highest-priority runnable process in the system. If the selected process is not the current process, a context switch is performed.

Calculate priority and time slices

The nice value is also called the static priority because it does not change once specified by the user. The dynamic priority is computed as a function of the static priority and the process's interactivity. The effective_prio() function returns a process's dynamic priority: it takes the nice value as the base and adds a bonus or penalty in the range -5 to +5 according to the process's interactivity.

How can the scheduler accurately tell whether a process is I/O-bound or processor-bound? It relies on inference, and the most telling clue is how long the process sleeps. If a process spends most of its time sleeping, it is I/O-bound; if it spends more time running than sleeping, it is processor-bound.

Recalculating the time slice, by contrast, is straightforward: it is based purely on the static priority. When a process is created, the new child and its parent split the parent's remaining time slice between them; this is fair and prevents a user from obtaining ever more time slice simply by creating new processes. The task_timeslice() function returns a new time slice for a given task. The calculation simply scales the static priority into the time-slice range: the higher a process's static priority, the longer the time slice it receives each time it runs.
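
As a purely illustrative example of such scaling, the sketch below maps the nice value linearly onto a time-slice range; the 5 ms and 800 ms bounds are the commonly quoted minimum and maximum for the O(1) scheduler, but the exact formula here is an assumption, not the kernel's task_timeslice():

    #define MIN_TIMESLICE_MS   5
    #define MAX_TIMESLICE_MS 800

    /* nice -20 (highest static priority) -> 800 ms, nice +19 (lowest) -> 5 ms */
    static unsigned int nice_to_timeslice_ms(int nice)
    {
        return MAX_TIMESLICE_MS -
               (nice + 20) * (MAX_TIMESLICE_MS - MIN_TIMESLICE_MS) / 39;
    }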

The scheduler also provides another mechanism to favor interactive processes: if a process is sufficiently interactive, then when it exhausts its time slice it is reinserted into the active array instead of the expired array.

Sleep and wake up

A sleeping (blocked) process is in a special non-runnable state. The process marks itself as sleeping, removes itself from the runqueue, puts itself on a wait queue, and then calls schedule() to select and run another process. Waking up is the reverse: the process is set to the runnable state and moved from the wait queue back onto the runqueue.

Two process states are associated with sleeping: TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE. Sleeping is handled via wait queues. A wait queue is a simple list of processes waiting for some event to occur; the kernel represents it with wait_queue_head_t. A wait queue can be created statically with DECLARE_WAITQUEUE() or dynamically with init_waitqueue_head(). Wake-ups are performed through the wake_up() function, which wakes all the processes on the specified wait queue.
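
The sleep pattern recommended for that era looks roughly as follows (q is a wait_queue_head_t initialized elsewhere, and condition stands for whatever event is being waited for):

    /* kernel-side sketch: sleep on wait queue q until `condition` becomes true */
    DEFINE_WAIT(wait);

    add_wait_queue(&q, &wait);
    while (!condition) {
        prepare_to_wait(&q, &wait, TASK_INTERRUPTIBLE);
        if (signal_pending(current))
            break;                  /* woken by a signal; let the caller handle it */
        schedule();                 /* sleep until wake_up(&q) or a signal arrives */
    }
    finish_wait(&q, &wait);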

Load Balancing

The Linux scheduler gives each processor in a symmetric multiprocessing system its own runqueue and lock, and provides a load balancer to keep the runqueues balanced. If the balancer finds an imbalance, it pulls processes from the busiest runqueue over to the current, less loaded one.

The load balancer is implemented by the load_balance() function in kernel/sched.c. It is invoked in two ways. schedule() calls it whenever the current runqueue is empty. It is also called by a timer: every 1 millisecond when the system is idle, and every 200 milliseconds otherwise. A load_balance() call requires that the current processor's runqueue be locked and that interrupts be disabled, to prevent the runqueue from being accessed concurrently.

Preemption and Context switching

A context switch is the switch from one runnable process to another. To perform it, schedule() calls the context_switch() function, which does the following:

1. Calls switch_mm(), declared in <asm/mmu_context.h>, which switches the virtual memory mapping from the previous process to the new process.

2. Calls switch_to(), declared in <asm/system.h>, which switches the processor state from the previous process to the new process. This includes saving and restoring stack information and register state.

We have seen several places where schedule() is called, but the kernel cannot rely entirely on code calling it explicitly; it needs to know when rescheduling is required. For that it provides the need_resched flag, which signals that schedule() should be invoked again:

1) When a process runs out of its time slice, scheduler_tick() sets this flag.

2) When a process with a higher priority than the current one becomes runnable, try_to_wake_up() sets this flag.

Each process carries its own need_resched flag because reading a value in the current process descriptor is faster than accessing a global variable.
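
Conceptually, the flag is consumed like this (the real check lives in architecture-specific return paths, so this fragment is only an illustration):

    /* on the way back to user space, or when preemption becomes possible again,
       the kernel checks the flag and reschedules if it is set */
    if (need_resched())
        schedule();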

User preemption

When the kernel is about to return to user space, it checks need_resched; if the flag is set, schedule() is called. This is user preemption.

User preemption occurs in the following cases:

1. When returning to user space from a system call.

2. When returning to user space from an interrupt handler.

Kernel preemption

As long as rescheduling is safe, the kernel can preempt the task it is running at any time.

When is it safe to reschedule? The kernel can be preempted whenever it does not hold a lock.

Locks therefore act as markers of non-preemptible regions. Because the kernel is SMP-safe, code that holds no locks is reentrant, and reentrant code can safely be preempted.

Kernel preemption can occur:

1. When an interrupt handler exits, before returning to kernel space.

2. When kernel code becomes preemptible again (for example, when it releases its last lock).

3. When a task in the kernel explicitly calls schedule().

4. When a task in the kernel blocks (which also results in a call to schedule()).

References

http://www.cnblogs.com/pennant/archive/2012/12/17/2818922.html

http://www.cnblogs.com/wang_yb/archive/2012/09/04/2670564.html

http://blog.csdn.net/cxf100900/article/details/5775252

Linux Kernel Design and Implementation
