To manage processes, the kernel must maintain a clear picture of what each process is doing. For example, it must know the priority of the process, whether it is running on a CPU or blocked waiting for some event, what address space has been assigned to it, and which files it is allowed to access.
This is the role of the process descriptor: every process descriptor is a task_struct data structure whose fields contain all the information related to a single process. Because the process descriptor stores so much information, it is rather complicated. But don't be afraid: we will tackle the main points first, and the remaining parts will then fall into place without a fight.
This post focuses on the basic information of a process, on process states, and on the relationships between processes. Other topics will be covered in future posts.
1. Basic Process Information
1.1 Identifying a Process: the PID
Each process must have its own process descriptor; even lightweight processes, which share most of their kernel data structures (as discussed later), have their own task_struct structure. Because there is a strict one-to-one correspondence between processes and process descriptors, the 32-bit address of the task_struct structure is a convenient means of identifying a process. A process descriptor pointer (task_struct *) holds this address, and the kernel refers to most processes through such pointers.
On the other hand, Unix-like operating systems allow users to identify a process by a number called the process identifier, or PID, which is stored in the pid field of task_struct. PIDs are numbered sequentially: the PID of a newly created process is normally the PID of the previously created process plus one. However, there is an upper limit on PID values; once the kernel reaches it, it must recycle the lower, unused PID numbers. By default, the maximum PID number is 32767.
The system administrator can reduce this upper limit by writing a smaller value into the /proc/sys/kernel/pid_max file, so that the PID limit is below 32767. On 64-bit architectures, the administrator can raise pid_max as high as 4194304.
Because PID numbers are recycled, the kernel must maintain a pidmap_array bitmap that records which PIDs are allocated and which are free. Because a page frame contains 32768 bits (4 × 1024 × 8), on 32-bit architectures the pidmap_array bitmap fits in a single page. That page is never released; the kernel keeps it in memory permanently.
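To make the cyclic allocation concrete, here is a minimal user-space sketch of bitmap-based PID allocation. It is illustrative only: the real kernel code (alloc_pidmap() in kernel/pid.c) uses atomic bit operations and reserves the lowest PID numbers.

#define PID_MAX_DEFAULT 32768
#define BITS_PER_LONG (8 * sizeof(unsigned long))

static unsigned long pidmap[PID_MAX_DEFAULT / BITS_PER_LONG];
static int last_pid;

/* Test a bit in the map and set it; return its old value. */
static int test_and_set(int nr)
{
    unsigned long mask = 1UL << (nr % BITS_PER_LONG);
    int old = (pidmap[nr / BITS_PER_LONG] & mask) != 0;
    pidmap[nr / BITS_PER_LONG] |= mask;
    return old;
}

/* Hand out the first free PID after the last one allocated,
 * wrapping around once when the upper limit is reached. */
int alloc_pid(void)
{
    int start = (last_pid + 1 < PID_MAX_DEFAULT) ? last_pid + 1 : 1;
    int pid = start;
    do {
        if (!test_and_set(pid))
            return last_pid = pid;   /* found a free number */
        if (++pid >= PID_MAX_DEFAULT)
            pid = 1;                 /* wrap; PID 0 is the swapper */
    } while (pid != start);
    return -1;                       /* every PID is in use */
}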
It may surprise you that Linux supports only lightweight processes and has no separate abstraction for threads. To make up for this, Linux introduces the notion of thread groups. All threads in a thread group use the same PID as the thread group leader, that is, the pid of the first lightweight process in the group, which is stored in the tgid field of the process descriptor. The getpid() system call returns the tgid value of the current process instead of its pid, so all the threads of a multithreaded application share the same apparent PID. Most processes belong to a thread group consisting of a single member; as thread group leaders, their tgid equals their pid, so for them getpid() behaves exactly as it does for ordinary processes.
We thus reach an important conclusion: Linux does not implement threads as a distinct abstraction, yet it has everything an operating system with native thread support offers. The concept of lightweight processes will be discussed in detail later.
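A quick user-space experiment confirms this behavior. In the program below (which assumes Linux and glibc; gettid had no glibc wrapper at the time, so it goes through syscall()), both threads print the same getpid() value, while the raw per-thread PIDs differ:

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/syscall.h>

static void *worker(void *arg)
{
    /* getpid() returns the shared tgid; SYS_gettid returns this
     * lightweight process's own PID. */
    printf("thread: getpid()=%d gettid()=%ld\n",
           (int) getpid(), (long) syscall(SYS_gettid));
    return NULL;
}

int main(void)
{
    pthread_t t;
    printf("main:   getpid()=%d gettid()=%ld\n",
           (int) getpid(), (long) syscall(SYS_gettid));
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);
    return 0;  /* compile with: gcc demo.c -pthread */
}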
1.2 Locating the Process Descriptor
A process is a dynamic entity whose lifetime ranges from a few milliseconds to several months. The kernel must therefore be able to handle many processes at once, and it stores their process descriptors in dynamic memory rather than in the memory area permanently assigned to the kernel (linear addresses above 3 GB).
So how can the kernel locate a dynamically allocated process descriptor? For each process, it reserves a thread_union memory area of two page frames, or 8192 bytes. Linux packs two different data structures into this single per-process area: the kernel-mode process stack, and a small data structure called thread_info, the thread descriptor, which holds a pointer to the process descriptor.
For reasons of efficiency, the kernel stores this 8 KB area in two consecutive page frames, with the first page frame aligned to a multiple of 2^13. When little dynamic memory is left, however, finding two consecutive free page frames can be difficult, because free physical memory may be heavily fragmented (note: physical memory; see the post on the buddy system algorithm). Therefore, on the 80x86 architecture the kernel can be configured at compile time so that the kernel stack and thread descriptor span a single page frame, which single-page fragmentation cannot defeat. As we saw in the post on segmentation in Linux, a process in kernel mode accesses a stack contained in the kernel data segment; this per-process stack above the 3 GB boundary is distinct from the stack the process uses in user mode. Because kernel control paths make little use of the stack, only a few thousand bytes of kernel-mode stack are ever needed, so 8 KB is ample for the stack and thread_info together. When a single page frame holds both structures, the kernel relies on a few additional stacks to avoid overflows caused by deeply nested interrupts and exceptions (see the post on exception handling).
Within this 2-page (8 KB) memory area, the thread descriptor resides at the beginning and the stack grows downward from the end; the task field of thread_info points back to the task_struct structure:
struct thread_info {
    struct task_struct    *task;          /* main task structure */
    struct exec_domain    *exec_domain;   /* execution domain */
    unsigned long         flags;          /* low level flags */
    unsigned long         status;         /* thread-synchronous flags */
    __u32                 cpu;            /* current CPU */
    __s32                 preempt_count;  /* 0 => preemptable, <0 => BUG */
    mm_segment_t          addr_limit;     /* thread address space:
                                             0-0xBFFFFFFF for user-thread
                                             0-0xFFFFFFFF for kernel-thread */
    struct restart_block  restart_block;
    unsigned long         previous_esp;   /* esp of the previous stack in case
                                             of nested (IRQ) stacks */
    __u8                  supervisor_stack[0];
};
esp is the CPU stack pointer register, which holds the address of the current top of the stack. On 80x86 systems the stack starts at the end of the memory area and grows toward its beginning. Right after switching from user mode to kernel mode, the kernel stack of a process is always empty, so esp points just past the end of the area.
The value of esp decreases as data is written to the stack. Since the kernel writes little data there, the kernel stack is nearly empty most of the time. Because the thread_info structure is 52 bytes long, the kernel stack can expand up to 8140 bytes. The thread descriptor and kernel stack of a process are conveniently represented in C by the following union:
union thread_union {
    struct thread_info thread_info;
    unsigned long stack[2048];    /* 1024 for 4 KB stacks */
};
The kernel uses the alloc_thread_info and free_thread_info macros to allocate and release the memory area storing a thread_info structure and a kernel stack.
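For the 8 KB configuration, a plausible sketch of these macros looks as follows (the real definitions vary across 2.6 kernel versions; this one requests an order-1 block, i.e. two physically contiguous page frames, which the buddy allocator returns naturally aligned to 8 KB):

#define alloc_thread_info(tsk) \
    ((struct thread_info *) __get_free_pages(GFP_KERNEL, 1))  /* 2^1 pages */
#define free_thread_info(ti) \
    free_pages((unsigned long)(ti), 1)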
1.3 Identifying the Current Process
From the standpoint of efficiency, the tight coupling of the thread_info structure with the kernel-mode stack offers a key benefit: the kernel can easily obtain the address of the thread_info structure of the process currently running on a CPU from the value of the esp register. In fact, if thread_union is 8 KB (2^13 bytes) long, the kernel masks out the 13 least significant bits of esp to obtain the base address of the thread_info structure; if thread_union is 4 KB long, it masks out the 12 least significant bits. This is done by the current_thread_info() function, which compiles to the following assembly instructions:
movl $0xffffe000, %ecx    /* or 0xfffff000 for 4 KB stacks */
andl %esp, %ecx
movl %ecx, p
After these three instructions execute, p contains the pointer to the thread_info structure of the process running on the CPU that executed them. Most often, though, the kernel needs the address of the process descriptor rather than that of the thread_info structure. To obtain the descriptor pointer of the process currently running on a CPU, the kernel uses the current macro, which is essentially equivalent to current_thread_info()->task and compiles to the following assembly instructions:
movl $0xffffe000, %ecx    /* or 0xfffff000 for 4 KB stacks */
andl %esp, %ecx
movl (%ecx), p
Because the task field is at offset 0 in the thread_info structure, after these three instructions execute, p contains the process descriptor pointer of the process running on the CPU.
The current macro often appears in kernel code as a prefix to process descriptor fields. For example, current->pid returns the PID of the process currently running on the CPU.
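In C, the two definitions look roughly as follows (close to the actual 2.6 i386 sources; THREAD_SIZE is 8192 or 4096 depending on the configured stack size):

static inline struct thread_info *current_thread_info(void)
{
    struct thread_info *ti;
    __asm__("andl %%esp,%0;" : "=r" (ti) : "0" (~(THREAD_SIZE - 1)));
    return ti;
}

static inline struct task_struct *get_current(void)
{
    return current_thread_info()->task;   /* task is at offset 0 */
}
#define current get_current()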
2. Process State
2.1 The Process List
The Linux kernel links the descriptors of all processes together in the process list. Each task_struct structure contains a tasks field of type list_head, whose prev and next fields point to the preceding and following task_struct elements, respectively.
The head of the process list is the init_task descriptor: the process descriptor of process 0, also known as the swapper process. The tasks.prev field of init_task points to the tasks field of the process descriptor most recently inserted into the list.
The SET_LINKS and REMOVE_LINKS macros insert a process descriptor into the process list and remove one from it; they also take the parenthood relationships between processes into account.
Another useful macro is for_each_process, which scans the whole process list. It is defined as follows:
#define for_each_process(p) \
    for (p = &init_task; \
         (p = list_entry((p)->tasks.next, struct task_struct, tasks)) \
             != &init_task; )
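As an illustrative example, the macro can be used to walk every process in the system (real kernel code would hold the tasklist_lock read lock while doing so):

struct task_struct *p;
for_each_process(p)
    printk("pid=%d comm=%s\n", p->pid, p->comm);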
2.2 The state Field
The state field of the process descriptor describes the current state of the process. It consists of an array of flags, each describing one possible process state. In current Linux versions these states are mutually exclusive; strictly speaking, exactly one flag is set and all the others are cleared. Here are the possible states:
Runnable state (TASK_RUNNING)
The process is either executing on a CPU or ready to execute.
Interruptible wait state (TASK_INTERRUPTIBLE)
The process is suspended (sleeping) until some condition becomes true. A hardware interrupt, the release of a system resource the process is waiting for, or the delivery of a signal are all conditions that can wake up the process (put its state back to TASK_RUNNING).
Uninterruptible wait state (TASK_UNINTERRUPTIBLE)
Similar to the interruptible wait state, with one exception: delivering a signal to the sleeping process leaves its state unchanged. This state is rarely used, but it is valuable under certain conditions in which a process must wait until a given event occurs without being interrupted. For example, when a process opens a device file and the corresponding device driver starts probing the hardware device, the driver must not be interrupted until the probing completes; otherwise, the hardware device could be left in an unpredictable state.
Stopped state (TASK_STOPPED)
Process execution has been stopped. The process enters this state after receiving a SIGSTOP, SIGTSTP, SIGTTIN, or SIGTTOU signal.
Traced state (TASK_TRACED)
Process execution has been stopped by a debugger. When a process is being monitored by another process (for instance, when a debugger issues a ptrace() system call to monitor a test program), each signal may put the process in the TASK_TRACED state.
Two additional process states can be stored either in the state field or in the exit_state field of the process descriptor. As the latter field's name suggests, a process reaches one of these two states only when its execution is terminated:
Zombie state (EXIT_ZOMBIE)
Process execution is terminated, but the parent process has not yet issued a wait4() or waitpid() system call to return information about the dead process. Until such a wait()-like call is issued, the kernel cannot discard the data contained in the dead process descriptor, because the parent process might still need it.
Zombie removed state (EXIT_DEAD)
The final state: the process is being removed by the system because the parent process has just issued a wait4() or waitpid() system call for it. Changing the state from EXIT_ZOMBIE to EXIT_DEAD avoids race conditions in which other threads of execution issue wait()-like calls on the same process.
The value of the state field is usually set with a simple assignment, for example:
p->state = TASK_RUNNING;
The kernel also uses the set_task_state and set_current_state macros, which set the state of a specified process and of the currently running process, respectively. Moreover, these macros ensure that neither the compiler nor the CPU control unit reorders the assignment with other instructions; mixing the instruction order can sometimes lead to disastrous results.
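The canonical sleep pattern shows why this ordering matters (a sketch; condition stands in for whatever the process is waiting on):

/* Mark ourselves asleep *before* testing the condition, so a wakeup
 * arriving between the test and schedule() is not lost. */
set_current_state(TASK_INTERRUPTIBLE);
while (!condition) {
    schedule();                             /* yield the CPU until woken */
    set_current_state(TASK_INTERRUPTIBLE);  /* re-arm before re-checking */
}
set_current_state(TASK_RUNNING);            /* condition holds: keep running */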
2.3 The TASK_RUNNING Process Lists
When the kernel looks for a new process to run on a CPU, it must consider only the runnable processes (that is, the processes in the TASK_RUNNING state).
Earlier Linux versions placed all runnable processes in a single list called the runqueue. Because maintaining the list ordered by process priority would have been too costly, those early schedulers had to scan the whole list to select the "best" process to run.
Linux 2.6 implements the runqueue differently. The goal is to let the scheduler select the best runnable process in constant time, independent of the number of runnable processes. We only give some basic information here; the new runqueue will be described in the post on process scheduling.
The trick that speeds up the scheduler is to split the runqueue into many lists of runnable processes, one list per process priority. Each task_struct descriptor includes a run_list field of type list_head. If the priority of a process is k (a value between 0 and 139), the run_list field links the process into the list of runnable processes with priority k. Furthermore, on a multiprocessor system each CPU has its own runqueue, that is, its own set of process lists. This is a classic example of making a data structure more complex to improve performance: the scheduler does become more efficient, but the runqueue lists are split into 140 different lists!
The kernel must keep a great deal of data for every runqueue in the system; however, the main data structures of a runqueue are the lists of process descriptors belonging to it, and all these lists are implemented by a single prio_array_t data structure.
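Simplified from the 2.6 scheduler sources, prio_array_t looks like this (MAX_PRIO is 140, and BITMAP_SIZE is the number of longs needed to hold one bit per priority level):

typedef struct prio_array prio_array_t;
struct prio_array {
    int nr_active;                      /* number of runnable tasks */
    unsigned long bitmap[BITMAP_SIZE];  /* one bit set per non-empty list */
    struct list_head queue[MAX_PRIO];   /* one list per priority (0-139) */
};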
The enqueue_task(p, array) function inserts a process descriptor (the p parameter) into one of the lists of a runqueue (the array parameter points to the runqueue's prio_array_t structure). Its code is essentially equivalent to the following:
list_add_tail(&p->run_list, &array->queue[p->prio]);
__set_bit(p->prio, array->bitmap);
array->nr_active++;
p->array = array;
The prio field of the process descriptor stores the dynamic priority of the process, while the array field is a pointer to the prio_array_t data structure of its current runqueue. Similarly, the dequeue_task(p, array) function removes a process descriptor from a runqueue list.
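dequeue_task is essentially the reverse operation (again a sketch based on the 2.6 scheduler): the process leaves its priority list, and the corresponding bitmap bit is cleared when that list becomes empty:

array->nr_active--;
list_del(&p->run_list);
if (list_empty(&array->queue[p->prio]))
    __clear_bit(p->prio, array->bitmap);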
3. Relationships Between Processes
3.1 Parenthood Relationships
Processes created by a program have a parent/child relationship; when a process creates several children, those children are siblings. Process 0 and process 1 are created by the kernel, and process 1 (init) is the ancestor of all other processes.
Several fields of the process descriptor represent these relationships. Denoting by P the process that a given task_struct describes, they are:
real_parent -- points to the descriptor of the process that created P, or to the descriptor of process 1 (init) if that parent no longer exists (thus, if a user starts a background process and exits the shell, the background process becomes a child of init).
parent -- points to the current parent of P (the process that must be signaled when P terminates). Its value usually matches real_parent, but it may occasionally differ, for instance when another process issues a ptrace() system call requesting to monitor P.
children -- the head of the list containing all children created by P.
sibling -- pointers to the next and previous elements in the list of sibling processes, those that have the same parent as P.
As an example of these links, suppose process P0 successively creates P1, P2, and P3, and P3 in turn creates P4: P1, P2, and P3 are then siblings on P0's children list, and P4 is the only entry on P3's.
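A small sketch shows how these fields work together: each child is linked into its parent's children list through its own sibling field (real kernel code would hold the tasklist_lock while walking the list):

struct list_head *lh;
struct task_struct *child;
list_for_each(lh, &p->children) {
    child = list_entry(lh, struct task_struct, sibling);
    printk("child of %d: pid=%d\n", p->pid, child->pid);
}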
3.2 Other Relationships
Processes have other relationships besides parenthood: a process can be the leader of a process group, of a login session, or of a thread group, and it can also trace the execution of other processes. The following fields of the process descriptor establish the relationships between a process P and other processes:
group_leader -- process descriptor pointer of the leader of the process group P belongs to
signal->pgrp -- PID of the leader of the process group P belongs to
tgid -- PID of the leader of the thread group P belongs to
signal->session -- PID of the leader of P's login session
ptrace_children -- the head of the list containing all children of P that are being traced by a debugger
ptrace_list -- pointers to the next and previous elements in the real parent's list of traced processes (used when P is being traced)
3.3 Locating task_struct from the PID
The kernel must also be able to derive the process descriptor pointer corresponding to a given PID. This happens, for instance, while servicing the kill() system call: when process P1 wants to send a signal to another process P2, it invokes kill(), and the kernel derives P2's process descriptor from the PID, then pulls from task_struct the pointer to the data structure that records the pending signals.
So how do we get this task_struct? The first idea that comes to mind is for_each_process(p). But no: scanning the process list sequentially while checking the pid field of each descriptor is feasible, yet quite inefficient. To speed up the search, the Linux kernel introduces four hash tables. Four are needed because the process descriptor includes fields representing different types of PID, and each PID type needs its own hash table:
PIDTYPE_PID -- the PID of the process
PIDTYPE_TGID -- the PID (tgid) of the thread group leader
PIDTYPE_PGID -- the PID (pgrp) of the process group leader
PIDTYPE_SID -- the PID of the session leader
The four hash tables are dynamically allocated during kernel initialization, and their addresses are stored in the pid_hash array. The size of each hash table depends on the amount of available RAM; for example, on a system with 512 MB of RAM, each hash table occupies four page frames and holds 2048 entries.
The pid_hashfn macro transforms a PID into a table index:
#define pid_hashfn(x) hash_long((unsigned long) x, pidhash_shift)
The pidhash_shift variable stores the length of the index in bits (11 bits in our example). hash_long() is used by many hash functions; on a 32-bit architecture it is essentially equivalent to:
unsigned long hash_long(unsigned long val, unsigned int bits)
{
    unsigned long hash = val * 0x9e370001UL;
    return hash >> (32 - bits);
}
Because pidhash_shift equals 11 here, pid_hashfn yields values in the range 0 to 2^11 - 1 = 2047.
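The following user-space demo reproduces the index computation; uint32_t stands in for the 32-bit unsigned long assumed above, and 0x9e370001 is a prime close to the golden ratio times 2^32, which spreads consecutive PIDs well across the table:

#include <stdio.h>
#include <stdint.h>

static unsigned int pid_hashfn(uint32_t pid, unsigned int bits)
{
    uint32_t hash = pid * 0x9e370001U;   /* wraps modulo 2^32 */
    return hash >> (32 - bits);
}

int main(void)
{
    unsigned int pids[] = { 1, 2, 3, 100, 32767 };
    for (int i = 0; i < 5; i++)
        printf("pid %5u -> index %u\n", pids[i], pid_hashfn(pids[i], 11));
    return 0;
}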
As any basic computer science course teaches, a hash function does not always map PIDs to table indexes one-to-one. Two different PIDs hashing to the same table index are said to collide. Linux handles colliding PIDs with chaining: each table entry is the head of a doubly linked circular list of the colliding process descriptors. We will not go into the specific implementation here; interested readers can consult the posts in the kernel fundamentals series on the relevant data structures.