Process management for the Linux kernel

Source: Internet
Author: User

Introduction:


Process management is one of the five major components of the Linux kernel. It is not as complex as memory management or the virtual file system, and not as self-contained as interprocess communication, but as one of the five core modules it is essential to understanding how the kernel operates, and it is useful background for later programming work. It is also the central module of the five, connected to each of the other four. This article introduces process management in four parts: first, the process and its related concepts; second, the basic operations of creating, switching, and terminating processes; third, how the Linux kernel schedules processes; and fourth, the functions the scheduler uses.


I. Process and its related concepts


Process: A process can be understood as an instance of a program in execution. It includes the executable program and the related system resources, such as open files, pending signals, kernel internal data, processor state, the memory address space, and the data segment that contains global variables. From the kernel's point of view, a process is also called a task.

Process descriptor: A great deal of information is associated with a process: its state, its priority, its address space, the files it is allowed to access, and so on. To hold it all, the Linux kernel defines a structure of type task_struct, called the process descriptor. The process descriptor contains all the information the kernel needs to manage a process, so once you have a process's descriptor you can find out everything about that process.

Process state: The state field of the task_struct process descriptor represents the current state of the process. Between its creation and its deletion, a process can pass through five different states: running (TASK_RUNNING), interruptible wait (TASK_INTERRUPTIBLE), uninterruptible wait (TASK_UNINTERRUPTIBLE), stopped (TASK_STOPPED), and traced (TASK_TRACED). In addition, when a process terminates it can enter the zombie state (EXIT_ZOMBIE) or the dead state (EXIT_DEAD). The kernel uses the set_current_state(state) macro to set the state of the current process, and set_task_state(task, state) to set the state of a given process.

Process identifier: The pid field of the task_struct structure uniquely identifies a process and is called the process ID (PID). When new processes are created, PIDs are assigned to them in increasing order. The kernel tracks which PIDs are currently assigned and which are free with the pidmap_array bitmap. Note: all threads in a multithreaded group share the same PID as seen from user space, because getpid() returns the tgid field, which holds the PID of the thread group leader. Apart from the PID, the kernel mostly refers to processes through pointers to their process descriptors.
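The bitmap idea can be sketched in a few lines of user-space Python. This is only a toy model, not the kernel's code: the class name PidBitmap, its methods, and the default limit of 32768 are assumptions made for illustration. It hands out PIDs in increasing order, remembers the last one assigned, and wraps around to reuse freed numbers only after reaching the top of the range.

```python
class PidBitmap:
    """Toy PID allocator backed by a bitmap (one bit per PID).

    Hypothetical model for illustration; valid PIDs are
    first_pid .. max_pid - 1.
    """

    def __init__(self, max_pid=32768, first_pid=1):
        self.max_pid = max_pid
        self.first_pid = first_pid
        self.bits = bytearray((max_pid + 7) // 8)
        self.last_pid = first_pid - 1  # nothing allocated yet

    def _in_use(self, pid):
        return bool(self.bits[pid >> 3] & (1 << (pid & 7)))

    def alloc(self):
        """Return the next free PID after last_pid, wrapping around."""
        span = self.max_pid - self.first_pid
        start = self.last_pid + 1 - self.first_pid
        for offset in range(span):
            pid = self.first_pid + (start + offset) % span
            if not self._in_use(pid):
                self.bits[pid >> 3] |= 1 << (pid & 7)
                self.last_pid = pid
                return pid
        raise RuntimeError("no free PID")

    def free(self, pid):
        """Mark a PID as free again by clearing its bit."""
        self.bits[pid >> 3] &= ~(1 << (pid & 7)) & 0xFF


if __name__ == "__main__":
    pids = PidBitmap(max_pid=8)
    print([pids.alloc() for _ in range(3)])  # first three PIDs
    pids.free(2)
    print(pids.alloc())  # continues upward instead of reusing 2 at once
```

Continuing upward after a free, rather than immediately reusing the lowest free number, keeps a recently freed PID from being handed to an unrelated process right away.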

Process relationships: Relationships between processes are either kinship relationships or non-kinship relationships. Kinship covers parent-child and sibling relationships, which are described by fields of the task_struct structure such as parent, children, real_parent, and sibling. Beyond kinship there are other relationships: a process may be the leader of a process group or of a login session, or the leader of a thread group. These are described by fields such as group_leader, tgid, and signal->pgrp.

Process resources: To prevent a process from using system resources excessively, the kernel limits the amount of each resource a process may use. The limits include the maximum size of the process address space, the maximum CPU time the process may consume, the maximum heap size, the maximum file size, the maximum number of file locks, the maximum number of bytes in message queues, the maximum number of open file descriptors, and the maximum number of page frames owned by the process.
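From user space, several of these limits can be inspected with the getrlimit() interface. The sketch below uses Python's standard resource module; the LIMITS table, show(), and read_limits() are names chosen here for illustration, and only limits that the module exposes portably are listed (Linux-specific ones such as file locks and message queue bytes are left out).

```python
import resource

# Subset of the per-process limits described above.
LIMITS = {
    "address space size (bytes)": resource.RLIMIT_AS,
    "CPU time (seconds)": resource.RLIMIT_CPU,
    "data segment / heap size (bytes)": resource.RLIMIT_DATA,
    "file size (bytes)": resource.RLIMIT_FSIZE,
    "open file descriptors": resource.RLIMIT_NOFILE,
}


def show(value):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def read_limits():
    """Return {description: (soft, hard)} for the current process."""
    return {name: resource.getrlimit(res) for name, res in LIMITS.items()}


if __name__ == "__main__":
    for name, (soft, hard) in read_limits().items():
        print(f"{name}: soft={show(soft)} hard={show(hard)}")
```

Each limit has a soft value, which the kernel enforces, and a hard value, which is the ceiling up to which an unprivileged process may raise its soft limit.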


II. Creation, switching and termination of processes

Process creation: When programming on Linux, the fork() function is generally used to create a new process. fork() is a user-space function; it invokes the clone() system call in the kernel, and clone() in turn calls do_fork() to complete the creation of the process.

In a traditional UNIX system, a newly created child process duplicates the resources owned by its parent. This is inefficient because the child has to copy the parent's entire address space, yet it rarely needs to read or modify all of those resources: in many cases the child immediately calls one of the exec() family of functions and wipes out the address space that was so carefully copied. Modern UNIX systems solve this problem in three ways:

1. Copy-on-write lets the parent and the child read the same physical pages; a page is copied only when one of them tries to write to it.

2. Lightweight processes let the parent and the child share many of the per-process kernel data structures.

3. The vfork() system call creates a process that shares the memory address space of its parent. To prevent the parent from overwriting data the child needs, the parent's execution is blocked until the child exits or executes a new program.

Process creation as a whole may involve the following functions:

fork() / vfork() / _clone -> clone() -> do_fork() -> copy_process()

After the creation steps above, a complete child process exists in the runnable state. The new child has its own PID, process descriptor, and other data structures, but before it actually runs, the scheduler still has to hand the CPU over to it.
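The visible effect of fork() can be sketched from user space with Python's os module. The function name fork_demo is hypothetical. Note that the example shows only the semantics (parent and child end up with logically separate copies of the data); whether the kernel copies pages eagerly or on first write, as copy-on-write does, is invisible to the program.

```python
import os


def fork_demo():
    """Fork a child that modifies its own copy of a variable.

    Returns (value seen by the parent, value reported by the child).
    """
    value = [100]
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: change the variable and report it through the pipe.
        try:
            os.close(read_fd)
            value[0] += 1
            os.write(write_fd, str(value[0]).encode())
        finally:
            os._exit(0)  # leave immediately, skipping the parent's cleanup
    # Parent: the child's write to `value` does not affect our copy.
    os.close(write_fd)
    child_value = int(os.read(read_fd, 32).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)  # reap the child so it does not stay a zombie
    return value[0], child_value


if __name__ == "__main__":
    parent_value, child_value = fork_demo()
    print("parent sees", parent_value, "child saw", child_value)
```

The waitpid() call matters: until the parent collects the child's exit status, the terminated child remains in the zombie state described earlier.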


Besides ordinary processes there are also kernel threads (created with kernel_thread()). In Linux, kernel threads differ from normal processes in the following two ways:

1. Kernel threads run only in kernel mode, whereas normal processes can run either in kernel mode or in user mode.

2. Because kernel threads run only in kernel mode, they use only linear addresses greater than PAGE_OFFSET. Normal processes, by contrast, can use the whole 4 GB linear address space (on 32-bit x86), in either user mode or kernel mode.


Process termination: When a process terminates, the kernel must be notified so that it can free the resources the process owns, including memory, open files, and other resources such as semaphores. The usual way to terminate a process is to call the exit() library function, which frees the resources allocated by the C library, runs each function registered by the programmer, and finally invokes the system call that removes the process from the system.

Besides a process terminating itself, the kernel can force an entire thread group to die. This happens when the process receives a signal that it cannot handle or ignore, or when an unrecoverable CPU exception is raised in kernel mode while the kernel is running on behalf of the process.

There are two system calls that terminate a user-mode application:

1. The exit_group() system call terminates an entire thread group, that is, a whole multithreaded application. do_group_exit() is the main kernel function that implements it.

2. The exit() system call terminates a single thread, regardless of the other threads in the thread group it belongs to. do_exit() is the main kernel function that implements it.


Process switching: Process switching is also known as task switching or context switching. To control the execution of processes, the kernel suspends the process currently running on the CPU and resumes the execution of a previously suspended process.

Much as a function call must save and restore registers, a process switch must load the context of the incoming process onto the CPU before that process can execute. The hardware context of a process is a subset of its execution context: it is the set of data that must be loaded into the registers before the process resumes execution. According to the original description, part of it is kept in the TSS (Task State Segment) and the rest on the kernel-mode stack (in the 2.6 kernel the per-process part is held in the thread field of the process descriptor, because Linux uses one TSS per CPU rather than one per process). Process switching occurs only in kernel mode, and the contents of all registers used by the user-mode process have already been saved on the kernel-mode stack before the switch is performed.

There are two ways to switch processes: hardware switching and software switching. Software switching performs the switch step by step with a sequence of instructions. Its advantage is that the data being loaded during the switch can be checked for validity. Its execution time is roughly the same as that of a hardware switch, but unlike the hardware switch it still leaves room for optimization.


Process switching is carried out by the schedule() function. In essence, every process switch consists of two steps:

1. Switching the Page Global Directory to install a new address space.

2. Switching the kernel-mode stack and the hardware context. The hardware context provides all the information the kernel needs to execute the new process, including the CPU registers. This step is done mainly by the switch_to macro.


III. Process scheduling


Scheduling policy: The scheduling policy is the set of rules that determines when and how a new process is selected to run. Linux scheduling is based on time sharing: several processes run in a time-multiplexed fashion because CPU time is divided into slices, one for each runnable process. The scheduling policy also ranks processes according to their priority. In Linux, process priority is dynamic: the scheduler keeps track of what processes are doing and adjusts their priorities periodically.

Processes can be classified in different ways. For example, a process can be regarded as I/O-bound or CPU-bound. Processes can also be divided into three classes: interactive processes, batch processes, and real-time processes. Linux processes are preemptible, whether they are running in kernel mode or in user mode.

The length of the time slice is critical for system performance: it should be neither too long nor too short. If the average time slice is too short, the overhead caused by process switching becomes very high. If it is too long, processes no longer appear to run concurrently. The choice of time slice length is therefore always a compromise. Linux follows a rule of thumb: choose a time slice that is as long as possible while still keeping good response time.
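The time-slice trade-off can be made concrete with a little arithmetic. The sketch below assumes plain round-robin among equal processes, and the 5 ms switch cost and the slice lengths are made-up numbers chosen only to show the two extremes; the function names are hypothetical.

```python
def switch_overhead(time_slice_ms, switch_cost_ms):
    """Fraction of CPU time spent switching instead of doing useful work."""
    return switch_cost_ms / (time_slice_ms + switch_cost_ms)


def worst_case_wait_ms(time_slice_ms, switch_cost_ms, runnable):
    """Time a process waits for its next turn under plain round-robin,
    when every other runnable process uses its full slice."""
    return (runnable - 1) * (time_slice_ms + switch_cost_ms)


if __name__ == "__main__":
    for slice_ms in (5, 100, 1000):
        overhead = switch_overhead(slice_ms, 5)
        wait = worst_case_wait_ms(slice_ms, 5, runnable=10)
        print(f"slice={slice_ms} ms: overhead={overhead:.1%}, "
              f"worst-case wait={wait} ms")
```

With a 5 ms slice, half of the CPU time goes to switching; with a 1000 ms slice the overhead is negligible, but with ten runnable processes one of them may wait about nine seconds for its next turn, which no longer looks concurrent.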


Scheduling algorithm: In early versions of Linux, the scheduling algorithm selected the "best" process to run based on process priority. Its drawback was that the time needed to make the selection grew with the number of runnable processes. In modern Linux, the scheduling algorithm selects the process to run in constant time, regardless of the number of runnable processes. Processes are divided into real-time processes and conventional processes, and every Linux process is always scheduled according to one of the following scheduling classes: first-in, first-out real-time process (SCHED_FIFO), round-robin real-time process (SCHED_RR), and conventional time-shared process (SCHED_NORMAL). The scheduling algorithm behaves quite differently depending on whether the process is conventional or real-time.

Scheduling of conventional processes: Every conventional process has its own static priority, a value from 100 to 139, which the scheduler uses to rate the process against the other conventional processes in the system. The static priority determines the base time quantum of the process, that is, the time slice the system assigns to the process once it has used up its previous one. Besides the static priority, a conventional process also has a dynamic priority, which is the number the scheduler actually looks at when it selects a new process to run. The dynamic priority is derived from the static priority and from the average sleep time, which is the average number of nanoseconds the process has spent in the sleep state.

Although conventional processes with a higher static priority get larger slices of CPU time, they should not completely lock out processes with a lower static priority. To avoid this problem, the scheduler distinguishes between active processes and expired processes. An active process has not yet exhausted its time slice. An expired process has exhausted its time slice and, even if it has a higher priority, is not allowed to run again until all active processes have expired.
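The text does not give the formulas, so the following sketch fills them in under an assumption: it uses the rules commonly documented for the 2.6-series O(1) scheduler, which appears to be the scheduler described here. The function names are hypothetical, and bonus stands for a value from 0 to 10 that the scheduler derives from the average sleep time.

```python
def base_time_quantum_ms(static_prio):
    """Base time quantum in milliseconds for a conventional process.

    Assumed 2.6 O(1) scheduler rule: (140 - static_prio) * 20 below 120,
    (140 - static_prio) * 5 from 120 upward.
    """
    if not 100 <= static_prio <= 139:
        raise ValueError("static priority must be in 100..139")
    scale = 20 if static_prio < 120 else 5
    return (140 - static_prio) * scale


def dynamic_priority(static_prio, bonus):
    """Dynamic priority: static priority adjusted by the sleep bonus
    and clamped to the conventional range 100..139."""
    if not 0 <= bonus <= 10:
        raise ValueError("bonus must be in 0..10")
    return max(100, min(static_prio - bonus + 5, 139))


if __name__ == "__main__":
    for prio in (100, 110, 120, 130, 139):
        print(prio, base_time_quantum_ms(prio), "ms")
    # A process that sleeps a lot (large bonus) gets a numerically lower,
    # that is better, dynamic priority than a CPU hog.
    print(dynamic_priority(120, 10), dynamic_priority(120, 0))
```

Lower numbers mean higher priority, so a process that spends most of its time sleeping (typically an interactive one) is favored over a process that always uses its whole slice.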

Scheduling of real-time processes: Every real-time process is associated with a real-time priority, a value from 1 to 99. Unlike conventional processes, real-time processes are always treated as active processes.


Main data structures used by the scheduler: the runqueue data structure and the process descriptor.

Runqueue data structure: The most important fields of the runqueue data structure are those related to the lists of runnable processes. The arrays field holds two sets of processes, the active set and the expired set. The active field is a pointer to the set of active processes, and the expired field is a pointer to the set of expired processes.
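How the two sets work together can be modelled in a short sketch. This is a simplified toy, not kernel code: the class names PrioArray and Runqueue, the methods, and the use of a Python integer as the priority bitmap are all assumptions for illustration. Each set keeps one queue per priority plus a bitmap of non-empty queues, so the best process is found without scanning every process, and when the active set runs empty the two pointers are simply swapped.

```python
from collections import deque

MAX_PRIO = 140  # priorities 0..139, lower number = higher priority


class PrioArray:
    """One set of runnable processes: a queue per priority plus a bitmap."""

    def __init__(self):
        self.queues = [deque() for _ in range(MAX_PRIO)]
        self.bitmap = 0      # bit p is set when queues[p] is non-empty
        self.nr_active = 0

    def enqueue(self, prio, task):
        self.queues[prio].append(task)
        self.bitmap |= 1 << prio
        self.nr_active += 1

    def dequeue_best(self):
        # Index of the lowest set bit = best non-empty priority.
        prio = (self.bitmap & -self.bitmap).bit_length() - 1
        task = self.queues[prio].popleft()
        if not self.queues[prio]:
            self.bitmap &= ~(1 << prio)
        self.nr_active -= 1
        return prio, task


class Runqueue:
    def __init__(self):
        self.arrays = [PrioArray(), PrioArray()]
        self.active = self.arrays[0]
        self.expired = self.arrays[1]

    def pick_next(self):
        """Pick the best active process; swap the sets if none is left."""
        if self.active.nr_active == 0:
            self.active, self.expired = self.expired, self.active
        if self.active.nr_active == 0:
            return None
        return self.active.dequeue_best()

    def expire(self, prio, task):
        """Called when a process has used up its time slice."""
        self.expired.enqueue(prio, task)


if __name__ == "__main__":
    rq = Runqueue()
    for prio, name in ((120, "A"), (100, "B"), (120, "C")):
        rq.active.enqueue(prio, name)
    for _ in range(4):
        prio, name = rq.pick_next()
        print(prio, name)
        rq.expire(prio, name)
```

In the example run, the high-priority process B runs first, but after its slice expires it does not run again until A and C have had their turns and the two sets have been swapped.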

Process descriptor: Every process descriptor includes several fields related to scheduling. The time_slice field holds the number of clock ticks remaining in the process's time slice. It is set by the copy_process() function: the ticks left to the parent process are split into two halves, one for the parent and one for the child.
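The split can be written as a two-line calculation. The function name split_time_slice is hypothetical, and giving the odd tick to the child is an assumption about the rounding; the text only says the remaining ticks are divided into two halves.

```python
def split_time_slice(parent_ticks):
    """Split the parent's remaining ticks between parent and child.

    Returns (parent_ticks_after_fork, child_ticks). The odd tick, if any,
    is assumed to go to the child.
    """
    child = (parent_ticks + 1) >> 1
    parent = parent_ticks >> 1
    return parent, child


if __name__ == "__main__":
    for ticks in (10, 7, 1):
        print(ticks, "->", split_time_slice(ticks))
```

Because the two halves always add up to the parent's original ticks, a process cannot gain extra CPU time simply by forking children.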


IV. Functions used by the scheduler


The scheduler relies on several functions to do its work. The most important are the following:

The try_to_wake_up() function wakes up a sleeping or stopped process by setting its state to TASK_RUNNING and inserting it into the runqueue of the local CPU.

The recalc_task_prio() function updates the average sleep time and the dynamic priority of a process.

The schedule() function implements the scheduler. Its task is to find a process in the runqueue list and then assign the CPU to it. schedule() is invoked by several kernel control paths, either directly or in a deferred (lazy) way.


Summary:

The process management module is a very important part of the Linux kernel: it is the bridge that connects the other four major modules, and it is also quite complex. Understanding its basic principles is essential to understanding the kernel as a whole. This article has given only a brief description of it and has not gone into the specific implementation details, which will hopefully be covered on another occasion.
