Process ID management in the Linux kernel
The Linux kernel uses the task_struct data structure to tie together all process-related data and structures. All kernel algorithms that deal with processes and programs are built around it, which makes it one of the most important data structures in the kernel. It is defined in the kernel file include/linux/sched.h; in the Linux 3.8 kernel the definition runs to a full 380 lines. I cannot describe the meaning of every field here, so this article focuses only on how the structure organizes and manages process IDs.
Process ID types
To understand how the kernel organizes and manages process IDs, you first need to know the types of process ID:
PID: The number that uniquely identifies a process within its namespace, called the process ID number, or PID. A process created with the fork or clone system call is assigned a new, unique PID value by the kernel.
TGID: When a process calls clone with the CLONE_THREAD flag, the new task is a thread of that process. The two are in the same thread group, and the ID of the thread group is called the TGID. All tasks in a thread group have the same TGID. The thread group leader's TGID is equal to its PID, and a process that does not use threads also has a TGID equal to its PID.
PGID: Independent processes can also be combined into a process group (using the setpgrp system call). A process group simplifies sending a signal to every process in the group; for example, the processes connected by a pipeline are in the same process group. The process group ID is called the PGID. All processes in a process group have the same PGID, which equals the PID of the group leader.
SID: Several process groups can be combined into a session (using the setsid system call), which is used in terminal programming. All processes in a session have the same SID.
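The four IDs can be observed from user space. The following is a minimal sketch, assuming a Linux system with glibc; the proc_ids type and the read_ids and print_ids helpers are names made up for this illustration. Note that in user space getpid() returns what the kernel calls the TGID, while the gettid system call returns the kernel-level PID of the calling thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type for this sketch; not a kernel structure. */
struct proc_ids {
    pid_t tid;   /* kernel-level PID of the calling thread */
    pid_t tgid;  /* thread group ID; this is what getpid() returns */
    pid_t pgid;  /* process group ID */
    pid_t sid;   /* session ID */
};

/* Collect the four IDs of the calling thread. */
struct proc_ids read_ids(void)
{
    struct proc_ids ids;

    ids.tid  = (pid_t)syscall(SYS_gettid);
    ids.tgid = getpid();
    ids.pgid = getpgid(0);   /* 0 means "the calling process" */
    ids.sid  = getsid(0);
    assert(ids.tid > 0 && ids.tgid > 0);
    return ids;
}

/* Print the four IDs on one line. */
void print_ids(const struct proc_ids *ids)
{
    printf("tid=%d tgid=%d pgid=%d sid=%d\n",
           (int)ids->tid, (int)ids->tgid, (int)ids->pgid, (int)ids->sid);
}
```

Calling read_ids() from the main thread of a program should give tid equal to tgid; calling it from a second thread of the same program should give a different tid but the same tgid, pgid, and sid.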
PID namespaces
Namespaces support virtualization at the operating-system level. Six namespaces are currently implemented: the mount namespace, the UTS namespace, the IPC namespace, the user namespace, the PID namespace, and the network namespace. Put simply, a namespace provides an abstraction of a global resource: resources are placed in different containers (different namespaces), and the containers are isolated from one another. Some namespaces, such as the PID namespace, are hierarchical; Figure 1 shows such a hierarchy.
Figure 1 Hierarchical relationship of namespaces
The figure shows four namespaces: a parent namespace has two child namespaces, and one of those children has a child namespace of its own. With PID namespaces, each namespace can contain a process whose PID is 1, because the namespaces are isolated from each other. Because the namespaces are hierarchical, however, a parent namespace is aware of its child namespaces, so the processes of a child namespace are also mapped into the parent. In the figure, the six processes in the two level 1 child namespaces are mapped to PIDs 5 to 10 in their parent namespace.
Namespaces make PID management more complex, because a process can now have several PIDs: one in its own namespace and one in each ancestor namespace. In other words, a process is assigned a PID in every namespace from which it is visible. This gives two kinds of ID:
Global ID: The unique ID in the kernel itself and in the initial namespace, the namespace to which the init process started at boot belongs. Every process in the system has a PID in that namespace, called its global ID, which is guaranteed to be unique across the whole system.
Local ID: The ID assigned to a process within a particular namespace. The same number can also appear in other namespaces.
Process ID management data structures
The kernel's data structures for managing IDs have to address the following questions:
How to quickly find a local ID based on the task_struct, ID type, and namespace of the process
How to quickly find the task_struct of a corresponding process based on the local ID, namespace, ID type
How to quickly assign a unique PID to a new process within the visible namespace
Taking all of these factors into account at once is complicated, so the structure is developed below step by step, from simple to complex.
One PID corresponds to a task_struct
If we ignore the relationships between processes and ignore namespaces, so that one PID number simply corresponds to one task_struct, we can design the following data structures:
struct task_struct {
    //...
    struct pid_link pids;
    //...
};

struct pid_link {
    struct hlist_node node;
    struct pid *pid;
};

struct pid {
    struct hlist_head tasks;        // points back to pid_link's node
    int nr;                         // PID
    struct hlist_node pid_chain;    // pid hash list node
};
The task_struct of each process contains a pointer to a pid structure, and the pid structure contains the PID number. The structure is shown in Figure 2.
Figure 2 One task_struct corresponding to one PID
The diagram contains two other structures that were not mentioned above:
pid_hash[]: A hash table. A pid structure is hashed to one of its entries by its nr value; if several pid structures hash to the same entry, the collision is resolved by chaining them in a hash list. This answers the second question, how to quickly find the task_struct from a PID value:
First, hash the PID value to find the entry of pid_hash[] that the pid structure is attached to
Traverse that entry's list to find the pid structure whose nr equals the PID value
Then follow the tasks pointer of the pid structure to find node
Finally, recover the task_struct from node using the kernel's container_of mechanism.
pid_map: A bitmap used to assign unique PID values; gray in the figure marks values that have already been assigned. When a new process is created, the kernel only needs to find an unassigned value, store it in nr of the pid structure, and mark that value as assigned in pid_map. This answers the third question above, how to quickly assign a global PID.
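The bitmap idea can be sketched in user space as follows. This is only an illustration under assumptions: PID_MAX, the last field, and the search order (start just after the most recently assigned value and wrap around) are choices made for this sketch, and the function names are hypothetical rather than kernel APIs.

```c
#include <assert.h>
#include <limits.h>

#define PID_MAX 32768   /* assumed upper bound for this sketch */
#define BITS_PER_WORD ((int)(sizeof(unsigned long) * CHAR_BIT))

/* One bit per PID value: 1 = assigned, 0 = free. */
struct pid_map {
    unsigned long bits[PID_MAX / BITS_PER_WORD];
    int last;           /* most recently assigned value */
};

/* Return 1 if nr is marked as assigned, 0 otherwise. */
int pid_map_test(const struct pid_map *map, int nr)
{
    return (int)((map->bits[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL);
}

/* Find a free value after map->last, mark it assigned, and return it.
 * Returns -1 if every value is in use. Value 0 is never handed out. */
int pid_map_alloc(struct pid_map *map)
{
    int i;

    for (i = 1; i <= PID_MAX; i++) {
        int nr = (map->last + i) % PID_MAX;

        if (nr == 0 || pid_map_test(map, nr))
            continue;
        map->bits[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
        map->last = nr;
        return nr;
    }
    return -1;
}

/* Mark nr as free again, for example when the process exits. */
void pid_map_free(struct pid_map *map, int nr)
{
    assert(nr > 0 && nr < PID_MAX);
    map->bits[nr / BITS_PER_WORD] &= ~(1UL << (nr % BITS_PER_WORD));
}
```

Starting the search after the last assigned value, rather than at 1, means a value that was just freed is not reused immediately; that is a design choice of this sketch.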
The first question above is simpler still: given a task_struct, follow the pid pointer in its pid_link to the pid structure and read its nr, which is the PID number.
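Both lookup directions in this simple design can be sketched in user space. The code below is a model under assumptions, not kernel code: hlist_node and hlist_head are simplified singly linked stand-ins for the kernel types, PID_HASH_SIZE and pid_hashfn are made up for the example, and the comm field exists only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's hlist types (the real
 * hlist_node also carries a pprev pointer for O(1) removal). */
struct hlist_node { struct hlist_node *next; };
struct hlist_head { struct hlist_node *first; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define PID_HASH_SIZE 64            /* assumed table size */

struct pid {
    struct hlist_head tasks;        /* points at the owning pid_link's node */
    int nr;
    struct hlist_node pid_chain;    /* link in a pid_hash[] bucket */
};

struct pid_link {
    struct hlist_node node;
    struct pid *pid;
};

struct task_struct {
    const char *comm;               /* name, for illustration only */
    struct pid_link pids;
};

struct hlist_head pid_hash[PID_HASH_SIZE];

void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    n->next = h->first;
    h->first = n;
}

/* Hypothetical hash function chosen for this sketch. */
unsigned int pid_hashfn(int nr)
{
    return (unsigned int)nr % PID_HASH_SIZE;
}

/* Tie a task to a pid and publish the pid in the hash table. */
void attach_pid(struct task_struct *task, struct pid *pid, int nr)
{
    assert(nr > 0);
    pid->nr = nr;
    pid->tasks.first = NULL;
    task->pids.pid = pid;
    hlist_add_head(&task->pids.node, &pid->tasks);
    hlist_add_head(&pid->pid_chain, &pid_hash[pid_hashfn(nr)]);
}

/* Question 1: task_struct -> PID number. */
int task_pid_nr(const struct task_struct *task)
{
    return task->pids.pid->nr;
}

/* Question 2: PID number -> task_struct, following the four steps. */
struct task_struct *find_task_by_nr(int nr)
{
    struct hlist_node *n;

    for (n = pid_hash[pid_hashfn(nr)].first; n; n = n->next) {
        struct pid *pid = container_of(n, struct pid, pid_chain);
        struct pid_link *link;

        if (pid->nr != nr)
            continue;               /* hash collision, keep walking */
        if (!pid->tasks.first)
            return NULL;
        link = container_of(pid->tasks.first, struct pid_link, node);
        return container_of(link, struct task_struct, pids);
    }
    return NULL;
}
```

With this table size, PIDs 7 and 71 land in the same bucket, so looking up 71 exercises the collision path: the loop skips the pid whose nr is 7 and returns the task attached to 71.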
Process IDs with types
If we take into account the more complex relationships between processes, such as thread groups, process groups, and sessions, each group has its own group ID: TGID, PGID, and SID respectively. The single pid_link in task_struct, which pointed to one pid structure, must therefore become several, so that each can point to the pid structure of the corresponding leader. Likewise, struct pid previously only had to point back to the task_struct that owns the PID; it now needs several list heads, one per ID type, to link all processes in a group whose leader has that PID. The data structures are as follows:
enum pid_type {
    PIDTYPE_PID,
    PIDTYPE_PGID,
    PIDTYPE_SID,
    PIDTYPE_MAX
};

struct task_struct {
    //...
    pid_t pid;                          // PID
    pid_t tgid;                         // thread group ID
    struct task_struct *group_leader;   // thread group leader
    struct pid_link pids[PIDTYPE_MAX];
    //...
};

struct pid_link {
    struct hlist_node node;
    struct pid *pid;
};

struct pid {
    struct hlist_head tasks[PIDTYPE_MAX];
    int nr;                             // PID
    struct hlist_node pid_chain;        // pid hash list node
};
In the ID types above, PIDTYPE_MAX is the number of ID types. The thread group ID is not included because task_struct already has the group_leader pointer to the thread group leader; the thread group ID is simply the PID of group_leader.
Suppose three processes A, B, and C belong to the same process group and the group leader is A. The resulting structure is shown in Figure 3.
Figure 3 The structure after adding ID types
There are a few things to note:
The pid_hash and pid_map structures are omitted from the figure because they are the same as in the first case;
The process group leader of processes B and C is A, so the pid pointer of their pids[PIDTYPE_PGID] points to the pid structure of process A;
Process A is the process group leader of B and C, and tasks[PIDTYPE_PGID] in the pid structure of A is the head of a hash list that links all processes whose group leader has this PID.
Looking back at the three basic questions of this section, all of them are still handled well by this structure.
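The A, B, C example can be modeled in user space as follows. This is a sketch under assumptions: the hlist types are simplified singly linked stand-ins, the structures keep only the fields needed here, and task_id_nr, count_tasks, and node_to_task are hypothetical helper names. The PID values 100, 101, and 102 used in the usage note are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's hlist types. */
struct hlist_node { struct hlist_node *next; };
struct hlist_head { struct hlist_node *first; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

enum pid_type { PIDTYPE_PID, PIDTYPE_PGID, PIDTYPE_SID, PIDTYPE_MAX };

struct pid {
    struct hlist_head tasks[PIDTYPE_MAX];   /* one list of users per ID type */
    int nr;
};

struct pid_link {
    struct hlist_node node;
    struct pid *pid;
};

struct task_struct {
    struct pid_link pids[PIDTYPE_MAX];
};

/* Make pid the ID of the given type for task, and link the task
 * into the list of all tasks that use pid in that role. */
void attach_pid(struct task_struct *task, enum pid_type type, struct pid *pid)
{
    struct pid_link *link = &task->pids[type];

    assert(type < PIDTYPE_MAX);
    link->pid = pid;
    link->node.next = pid->tasks[type].first;
    pid->tasks[type].first = &link->node;
}

/* ID number of the given type for a task. */
int task_id_nr(const struct task_struct *task, enum pid_type type)
{
    return task->pids[type].pid->nr;
}

/* Number of tasks that use pid as their ID of the given type. */
int count_tasks(const struct pid *pid, enum pid_type type)
{
    const struct hlist_node *node;
    int n = 0;

    for (node = pid->tasks[type].first; node; node = node->next)
        n++;
    return n;
}

/* Recover the task from a node found on pid->tasks[type]. */
struct task_struct *node_to_task(struct hlist_node *node, enum pid_type type)
{
    struct pid_link *link = container_of(node, struct pid_link, node);

    /* link is element `type` of task->pids[]; step back to element 0. */
    return container_of(link - type, struct task_struct, pids);
}
```

If A, B, and C are each attached to their own pid (100, 101, 102) with PIDTYPE_PID and all three are attached to A's pid with PIDTYPE_PGID, then task_id_nr for B returns 101 for PIDTYPE_PID and 100 for PIDTYPE_PGID, and count_tasks on A's pid with PIDTYPE_PGID returns 3.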
Adding PID namespaces
If we add PID namespaces to the second case, a process may have several PID values, because it is assigned a PID in every namespace from which it is visible. The pid structure therefore needs to change as follows:
struct pid {
    unsigned int level;
    /* lists of the tasks that use this pid */
    struct hlist_head tasks[PIDTYPE_MAX];
    struct upid numbers[1];
};

struct upid {
    int nr;
    struct pid_namespace *ns;
    struct hlist_node pid_chain;
};
The pid structure gains a level field, which records the depth of the namespace in which the process lives, and an extensible array of upid structures. In struct upid, nr is the ID assigned to the process in a namespace, ns is the namespace that ID belongs to, and pid_chain is the node in that namespace's hash list.
For example, suppose a new process is created in a level 2 namespace and is assigned PID 45 there. It is also mapped into the level 1 namespace, where it is assigned PID 134, and into the level 0 namespace, where it is assigned PID 289, as shown in Figure 4:
Figure 4 Structure diagram after adding a PID namespace
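The example with PIDs 45, 134, and 289 can be modeled in user space as follows. This is a sketch under assumptions: the structures keep only the fields needed for the lookup, a C99 flexible array member is used in place of numbers[1], pid_namespace is reduced to a level and a parent pointer, and make_pid is a hypothetical helper. The pid_nr_ns function is modeled on the kernel helper of the same name.

```c
#include <assert.h>
#include <stdlib.h>

/* Reduced model of a PID namespace for this sketch. */
struct pid_namespace {
    unsigned int level;              /* 0 for the initial namespace */
    struct pid_namespace *parent;    /* NULL for the initial namespace */
};

struct upid {
    int nr;                          /* ID within ns */
    struct pid_namespace *ns;
};

struct pid {
    unsigned int level;              /* level of the deepest namespace */
    struct upid numbers[];           /* level + 1 entries, indexed by level */
};

/* Build a pid for a process created in ns; nrs[i] is the number the
 * process has in its ancestor namespace at level i. Returns NULL on
 * allocation failure. The caller frees the result with free(). */
struct pid *make_pid(struct pid_namespace *ns, const int *nrs)
{
    struct pid *pid = malloc(sizeof(*pid) +
                             (ns->level + 1) * sizeof(struct upid));
    struct pid_namespace *cur = ns;
    unsigned int i;

    if (!pid)
        return NULL;
    pid->level = ns->level;
    /* Walk from the process's own namespace up to the initial one. */
    for (i = ns->level + 1; i-- > 0; cur = cur->parent) {
        assert(cur != NULL);
        pid->numbers[i].nr = nrs[i];
        pid->numbers[i].ns = cur;
    }
    return pid;
}

/* Number of pid as seen from ns, or 0 if pid is not visible there. */
int pid_nr_ns(const struct pid *pid, const struct pid_namespace *ns)
{
    if (ns->level <= pid->level && pid->numbers[ns->level].ns == ns)
        return pid->numbers[ns->level].nr;
    return 0;
}
```

For a pid built in the level 2 namespace with nrs = {289, 134, 45}, pid_nr_ns returns 45, 134, and 289 for the level 2, level 1, and level 0 namespaces respectively, and 0 for a sibling namespace that is not an ancestor. Because the array index is the namespace level, the lookup needs no search.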
The figure does not show how unique PIDs are assigned, but this is relatively simple. As in the previous two cases, assignment is done per namespace: a PID must be unique within its PID namespace, but it does not need to be unique across namespaces.
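Per-namespace assignment can be sketched as follows. This is a simplified model under assumptions: a last_pid counter in each namespace stands in for the bitmap, MAX_LEVELS is an arbitrary nesting limit chosen for the example, the structures keep only the fields needed here, and alloc_pid is written for this illustration rather than copied from the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LEVELS 8    /* assumed nesting limit for this sketch */

struct pid_namespace {
    unsigned int level;              /* 0 for the initial namespace */
    struct pid_namespace *parent;    /* NULL for the initial namespace */
    int last_pid;                    /* counter standing in for a bitmap */
};

struct upid {
    int nr;
    struct pid_namespace *ns;
};

struct pid {
    unsigned int level;
    struct upid numbers[MAX_LEVELS];
};

/* Assign one number in ns and in every ancestor of ns.
 * Returns 0 on success, -1 if ns is nested too deeply. */
int alloc_pid(struct pid_namespace *ns, struct pid *pid)
{
    struct pid_namespace *cur = ns;
    unsigned int i;

    if (ns->level >= MAX_LEVELS)
        return -1;
    pid->level = ns->level;
    /* From the process's own namespace up to the initial namespace. */
    for (i = ns->level + 1; i-- > 0; cur = cur->parent) {
        assert(cur != NULL && cur->level == i);
        pid->numbers[i].nr = ++cur->last_pid;   /* unique within cur only */
        pid->numbers[i].ns = cur;
    }
    return 0;
}
```

Two sibling namespaces can hand out the same local number, because each one counts independently, while the numbers taken from their shared parent differ.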
At this point, the data structures designed here differ little from those actually used in the Linux kernel.