How to Manage Processes in the Linux Kernel

Source: Internet
Author: User

"Process" has many definitions. Many textbooks define it as an instance of a program in execution; others regard it as the set of all data structures that describe a running program. We will not dwell on the definition here. From another perspective, processes are like us humans: they are created, and they have their own lifecycles, which vary widely, from a few milliseconds to several seconds, or even months or years. The one real difference from humans is that processes have no gender.

From the Linux kernel perspective, a process is an entity to which the kernel allocates system resources (such as CPU time slices and memory). Note that early Linux kernels offered poor support for multi-threaded applications: from the kernel's point of view, a multi-threaded application was just an ordinary process. In short, the early situation was unsatisfactory. The modern Linux kernel supports multi-threaded applications through lightweight processes. Two lightweight processes can share resources such as the address space and open files; when one of them modifies a shared resource, the other immediately sees the change, so access to shared resources must be synchronized.

Many processes run on a Linux system. How does the kernel manage them efficiently? That is what we want to understand here. After reading some of the Linux kernel source code related to process management, I collected my thoughts and notes. To understand how the Linux kernel manages processes, you should first understand a few important internal data structures, which are the foundation of everything else.

Process Data Structure: The Process Descriptor

To manage processes, the kernel must have a very clear picture of what each process is doing. For example, it must know the process's priority, whether it is running on a CPU or waiting for some event, which address space has been assigned to it, and which files it is allowed to access. In other words, the kernel must know everything about the process. The process descriptor (a task_struct data structure) contains, in its fields, all the information related to a single process. That is to say, the kernel manages processes mainly by reading the rich information in the process descriptor. As you can imagine, the process descriptor also contains fields of many other data structure types, interwoven to carry information in a complex but orderly way. Having said all that, what does the process descriptor actually look like? Here it is:

struct task_struct {
    volatile long state;    /* -1 unrunnable, 0 runnable, >0 stopped */
    void *stack;
    atomic_t usage;
    unsigned int flags;     /* per process flags, defined below */
    unsigned int ptrace;
    int lock_depth;         /* BKL lock depth */
    int prio, static_prio, normal_prio;
    unsigned int rt_priority;
    const struct sched_class *sched_class;
    struct sched_entity se;
    struct sched_rt_entity rt;
    /* ... */
    struct list_head tasks;
    struct plist_node pushable_tasks;
    struct mm_struct *mm, *active_mm;
    /* Revert to default priority/policy when forking */
    unsigned sched_reset_on_fork:1;
    pid_t pid;
    pid_t tgid;
    /*
     * pointers to (original) parent process, youngest child, younger sibling,
     * older sibling, respectively.  (p->father can be replaced with
     * p->real_parent->pid)
     */
    struct task_struct *real_parent;    /* real parent process */
    struct task_struct *parent;         /* recipient of SIGCHLD, wait4() reports */
    /*
     * children/sibling forms the list of my natural children
     */
    struct list_head children;  /* list of my children */
    struct list_head sibling;   /* linkage in my parent's children list */
    struct task_struct *group_leader;   /* threadgroup leader */
    /* PID/PID hash table linkage. */
    struct pid_link pids[PIDTYPE_MAX];
    struct list_head thread_group;
    /* CPU-specific state of this task */
    struct thread_struct thread;
    /* filesystem information */
    struct fs_struct *fs;
    /* open file information */
    struct files_struct *files;
#ifdef CONFIG_AUDITSYSCALL
    uid_t loginuid;
    unsigned int sessionid;
#endif
    struct prop_local_single dirties;
#ifdef CONFIG_LATENCYTOP
    int latency_record_count;
    struct latency_record latency_record[LT_SAVECOUNT];
#endif
    /*
     * time slack values; these are used to round up poll() and
     * select() etc timeout values. These are in nanoseconds.
     */
    unsigned long timer_slack_ns;
    unsigned long default_timer_slack_ns;
    struct list_head *scm_work_list;
    /* ... */
};

I have listed only a small part of the process descriptor above. The first impression is that it is huge and hard to approach; it is impractical to understand the meaning of every field and how each one works. Instead, we can start with the list_head tasks field. Before the analysis, remember that the task_struct process descriptor describes a process in full detail. In the usual understanding, one task_struct process descriptor represents one process; treating it as the entity of a process is fine.

Note that several of the fields are of type struct list_head. This structure plays a huge role: like a needle and thread, it can string together many related pieces of data. The list_head structure is as follows:

struct list_head {
    struct list_head *next, *prev;
};

This is just a doubly linked list node with next and prev pointers. With this, we can already guess roughly how the Linux kernel manages process descriptors.
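To make the embedding idea concrete, here is a minimal userspace sketch. The type and helper names imitate the kernel's (the real definitions live in <linux/list.h>), but toy_task and count_nodes are made-up names for illustration only:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace imitation of the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty list is a node pointing at itself (the list is circular). */
static void INIT_LIST_HEAD(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert n just before head, i.e. at the tail of the circular list. */
static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Recover the structure containing an embedded list_head field,
 * using the same pointer arithmetic as the kernel's list_entry. */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* A toy descriptor with the node embedded, like tasks in task_struct. */
struct toy_task {
    int pid;
    struct list_head tasks;
};

static int count_nodes(const struct list_head *head)
{
    int n = 0;
    const struct list_head *p;
    for (p = head->next; p != head; p = p->next)
        n++;
    return n;
}
```

Because the node is embedded in the structure rather than pointing to it, walking the list visits list_head fields, and list_entry converts each one back into the toy_task that contains it.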

By embedding list_head as a field inside task_struct, task_struct gains the behavior of a doubly linked list node, which makes it easy to traverse and search all processes. The list_head tasks field serves exactly this purpose. However, linking through the tasks field only strings all processes together logically; the total number of nodes in this doubly linked list equals the current number of processes. It does not capture finer relationships such as sibling and parent-child links. Next let's look at two other fields:

struct list_head children;  /* list of my children */
struct list_head sibling;   /* linkage in my parent's children list */

The children field links all of a process's child processes, so a process can find its children through this field. The sibling field links sibling processes. For example, if process P0 creates three child processes P1, P2, and P3, and P3 then creates a child process P4, the relationships between them are as follows.

P0's children field records only its first child, P1, through that child's sibling field; following sibling.next, we can then find P2 and P3 one by one. In other words, a process's children and their siblings are reached through the sibling field. Each child can find its parent through the parent field, and P0's children.prev leads to its last child, P3; all of these are doubly linked lists. Take P3 for further analysis: P3 created P4, so P3's children.next points to P4, and because P4 is P3's only child, P3's children.prev also points to P4. Since P4 has no siblings, its sibling.prev and sibling.next both point back to P3.

To sum up briefly: the tasks field in the process descriptor strings all processes together into one block, which is like grouping data of the same kind without subdividing it; the children and sibling fields then go further, encoding the internal logical relationships within that group. It is like a crowd of people gathered together: through kinship ties we can quickly find the family each person belongs to.
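The P0-P4 example above can be reproduced in a short userspace sketch. The list plumbing imitates the kernel's; toy_proc, init_proc, and add_child are hypothetical names introduced here for illustration, not kernel APIs:

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Pared-down descriptor: children heads my own child list; sibling is
 * my node inside my parent's children list (same roles as task_struct). */
struct toy_proc {
    int pid;
    struct toy_proc *parent;
    struct list_head children;
    struct list_head sibling;
};

static void init_proc(struct toy_proc *p, int pid)
{
    p->pid = pid;
    p->parent = NULL;
    INIT_LIST_HEAD(&p->children);
    INIT_LIST_HEAD(&p->sibling);
}

/* Link child into parent's children list through its sibling field. */
static void add_child(struct toy_proc *parent, struct toy_proc *child)
{
    child->parent = parent;
    list_add_tail(&child->sibling, &parent->children);
}
```

After building P0-P4 with add_child, P0's children.next reaches P1's sibling field and children.prev reaches P3's, matching the pointer walk described above; P4, being an only child, has sibling.next and sibling.prev both pointing back at P3's children head.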

 
How Are Processes Traversed?

From the analysis above, we can see that traversal depends on the linked list. Here we focus on the interfaces the kernel uses when traversing processes. We first discuss the traversal interfaces of the linked list itself, and then the traversal interfaces for processes; the idea is basically the same. The Linux kernel mainly uses macros (and inline functions) to traverse and operate on linked lists:

list_add(n, p): inserts node n immediately after p. Therefore, to insert n at the head of the list, pass the list's head node as p.

list_add_tail(n, p): basically the same as list_add(n, p), except that n is inserted immediately before p; to insert n at the tail of the list, pass the head node as p.

list_del(p): removes the node pointed to by p from its list.

list_empty(p): checks whether the list headed by p is empty.

list_entry(p, t, m): returns the address of the data structure of type t that contains the list_head field m whose address is p.

list_for_each(p, h): scans the elements of the list headed by h; in each iteration, p holds the address of a list_head field.

list_for_each_entry(p, h, m): similar to list_for_each, but in each iteration p holds the address of the data structure containing the list_head field m.
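A quick userspace demonstration of the semantics listed above, particularly the difference between list_add and list_add_tail. The helpers are re-implemented here to match the kernel's documented behavior; in kernel code you would use the real versions from <linux/list.h>:

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }

/* Splice n in between two known-adjacent nodes. */
static void __list_add(struct list_head *n,
                       struct list_head *prev, struct list_head *next)
{
    next->prev = n;
    n->next = next;
    n->prev = prev;
    prev->next = n;
}

/* Insert right after p: with p == list head, n becomes the first element. */
static void list_add(struct list_head *n, struct list_head *p)
{
    __list_add(n, p, p->next);
}

/* Insert right before p: with p == list head, n becomes the last element. */
static void list_add_tail(struct list_head *n, struct list_head *p)
{
    __list_add(n, p->prev, p);
}

static void list_del(struct list_head *p)
{
    p->prev->next = p->next;
    p->next->prev = p->prev;
}

static int list_empty(const struct list_head *h) { return h->next == h; }

#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct item {
    int value;
    struct list_head node;
};

/* Walk the entries the way list_for_each_entry would, summing values. */
static int sum_items(struct list_head *head)
{
    int sum = 0;
    struct list_head *p;
    for (p = head->next; p != head; p = p->next)
        sum += list_entry(p, struct item, node)->value;
    return sum;
}

static int first_value(struct list_head *head)
{
    return list_entry(head->next, struct item, node)->value;
}
```

Adding elements with list_add_tail preserves insertion order (queue behavior), while list_add on the head node pushes to the front (stack behavior).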

The Linux kernel operates on the process list using the interfaces above. For processes, the head node of the list is the init_task descriptor, which is also of type task_struct. By passing the address of init_task's tasks field as the parameter p to list_add_tail, a new process descriptor (which contains the task_struct) is inserted at the tail of the list. For processes specifically, the SET_LINKS and REMOVE_LINKS macros insert or remove a process descriptor node. Another useful macro is for_each_process, which scans the whole process list. Its definition is:

#define for_each_process(p) \
    for (p = &init_task; \
         (p = list_entry((p)->tasks.next, struct task_struct, tasks)) != &init_task; )
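The macro works because the process list is circular with init_task as its sentinel: starting from &init_task, each step converts the next tasks node back into a task_struct until the walk comes back around to init_task (which is itself skipped). The userspace sketch below imitates that shape with a cut-down descriptor; link_after and count_processes are illustrative names, not kernel functions:

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Cut-down descriptor: just a pid and the embedded tasks node. */
struct task_struct {
    int pid;
    struct list_head tasks;
};

/* Sentinel descriptor, playing the role of the kernel's init_task;
 * its tasks node starts out pointing at itself (empty circular list). */
static struct task_struct init_task = {
    .pid = 0,
    .tasks = { &init_task.tasks, &init_task.tasks },
};

/* Same shape as the kernel macro quoted above. */
#define for_each_process(p) \
    for (p = &init_task; \
         (p = list_entry((p)->tasks.next, struct task_struct, tasks)) \
             != &init_task; )

/* Splice a new descriptor into the circular list right after prev. */
static void link_after(struct task_struct *new, struct task_struct *prev)
{
    new->tasks.prev = &prev->tasks;
    new->tasks.next = prev->tasks.next;
    prev->tasks.next->prev = &new->tasks;
    prev->tasks.next = &new->tasks;
}

static int count_processes(void)
{
    int n = 0;
    struct task_struct *p;
    for_each_process(p)
        n++;
    return n;
}
```

Note that the walk excludes the init_task sentinel itself: with no other descriptors linked in, for_each_process visits nothing.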


How does the kernel find the process to run?

When the kernel needs to find a new process to run on a CPU, it only needs to consider processes in the runnable state (TASK_RUNNING).

In earlier Linux versions, all runnable processes were placed on a single linked list, the run queue. But the cost of maintaining a single list was too high: scheduling performance degraded quickly as the number of processes grew. Since Linux 2.6, the kernel has used multiple run queues. The goal is to let the scheduler find a runnable process in constant time, completely independent of the number of processes. The technique is to keep a separate runnable queue for each priority level k: the process descriptor contains a list_head run_list field that links runnable processes of the same priority, and on a multiprocessor system each CPU also has its own run queue. This is a classic case of improving performance by increasing the complexity of a data structure: the scheduler's operations become more efficient, at the cost of splitting the run queue into 140 separate linked lists.
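A sketch of the priority-array idea behind the 2.6 O(1) scheduler: one list head per priority level plus a record of which levels are non-empty, so picking the next task is a find-first-set operation followed by taking the head of one list. The names (toy_rq, rq_pick_next) and the byte-per-level bitmap stand-in are illustrative simplifications, not the kernel's actual layout:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PRIO 140   /* 0..99 real-time, 100..139 normal, as in 2.6 */

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_task {
    int pid;
    int prio;                   /* lower value = higher priority */
    struct list_head run_list;  /* node in queue[prio] */
};

/* Simplified runqueue: a flag per priority level plus one list per
 * level; the kernel uses a packed bitmask instead of a byte array. */
struct toy_rq {
    unsigned char nonempty[MAX_PRIO];
    struct list_head queue[MAX_PRIO];
};

static void rq_init(struct toy_rq *rq)
{
    memset(rq->nonempty, 0, sizeof(rq->nonempty));
    for (int i = 0; i < MAX_PRIO; i++)
        INIT_LIST_HEAD(&rq->queue[i]);
}

static void rq_enqueue(struct toy_rq *rq, struct toy_task *t)
{
    list_add_tail(&t->run_list, &rq->queue[t->prio]);
    rq->nonempty[t->prio] = 1;
}

/* Pick the next task: first non-empty priority level, head of its list.
 * With a real bitmask this scan is a single find-first-set instruction,
 * which is why the cost does not depend on the number of processes. */
static struct toy_task *rq_pick_next(struct toy_rq *rq)
{
    for (int i = 0; i < MAX_PRIO; i++)
        if (rq->nonempty[i])
            return list_entry(rq->queue[i].next, struct toy_task, run_list);
    return NULL;    /* no runnable task */
}
```

Within one priority level the list behaves as a FIFO, so tasks of equal priority take turns in enqueue order.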
