Deep Understanding of the Linux Kernel, Day 02 -- Processes


Continuing with day two: a look at the operating-system fundamentals of processes, in preparation for writing drivers later on.

The process is a basic concept in any multiprogramming operating system. A process is usually defined as an instance of a program in execution.
This section first describes the static nature of processes and then explains how the kernel performs process switches.

Processes, lightweight processes, and threads: A process is an instance of a program in execution. You can think of it as the collection of data structures that fully describes how far the execution of the program has progressed.
From the kernel's point of view, the purpose of a process is to act as the entity to which system resources (CPU time, memory, and so on) are allocated.
When a process is created, it is almost identical to its parent: it receives a (logical) copy of the parent's address space and begins executing the same code as the parent, starting at the instruction following the process-creation system call. Although parent and child may share the pages containing the program code (text), they have separate copies of the data (heap and stack), so a modification of a memory location by the child is invisible to the parent (and vice versa).
Modern Unix systems do not use this simple model. They support multithreaded applications: user programs having many relatively independent execution flows that share a large part of the application's data structures. In such systems a process is composed of several user threads, each of which represents an execution flow of the process.
Linux uses lightweight processes to offer better support for multithreaded applications. Two lightweight processes may share some resources, such as the address space, the open files, and so on. Whenever one of them modifies a shared resource, the other immediately sees the change. Of course, the two must synchronize themselves when accessing the shared resource.
A straightforward way to implement multithreaded applications is to associate a lightweight process with each thread.

Process descriptors: In order to manage processes, the kernel must have a clear picture of what each process is doing. For example, it must know the process's priority, whether it is running on a CPU or blocked for some reason, what address space has been assigned to it, which files it is allowed to access, and so on.
The process descriptor is a structure of type task_struct whose fields contain all the information related to a single process. Because it stores so much information, it is rather complex: besides a large number of fields containing process attributes, it also contains several pointers to other data structures.
This section focuses on the state of the process and on parent-child relationships among processes.
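For reference, here is a heavily abridged sketch of the 2.6-era task_struct, keeping only the fields touched on in this section; the real structure in <linux/sched.h> has many more fields:

    struct task_struct {
        volatile long state;                 /* TASK_RUNNING, TASK_INTERRUPTIBLE, ... */
        long exit_state;                     /* EXIT_ZOMBIE or EXIT_DEAD */
        struct thread_info *thread_info;     /* low-level information, see below */
        pid_t pid;                           /* process identifier */
        struct list_head tasks;              /* links into the global process list */
        struct task_struct *parent;          /* the process that created this one */
        struct list_head children;           /* head of the list of this process's children */
        struct list_head sibling;            /* links into the parent's children list */
        struct signal_struct *signal;        /* shared signal descriptor (holds rlim[]) */
        /* ... scheduling, memory, filesystem, and many other fields ... */
    };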

Process state: As the name implies, the state field of the process descriptor describes the current state of the process. It consists of a set of flags, each of which describes a possible process state. In current Linux versions these states are mutually exclusive; strictly speaking, exactly one flag is set and the remaining flags are cleared. The following are the possible process states:
Runnable state (TASK_RUNNING)
The process is either executing on a CPU or waiting to be executed.
Interruptible wait state (TASK_INTERRUPTIBLE)
The process is suspended (sleeping) until some condition becomes true. Raising a hardware interrupt, releasing a system resource the process is waiting for, or delivering a signal are examples of conditions that might wake up the process (put its state back to TASK_RUNNING).
Uninterruptible wait state (TASK_UNINTERRUPTIBLE)
Similar to the interruptible wait state, with one exception: delivering a signal to the sleeping process leaves its state unchanged.
Stopped state (TASK_STOPPED)
Process execution has been stopped. The process enters this state after receiving a SIGSTOP, SIGTSTP, SIGTTIN, or SIGTTOU signal.
Traced state (TASK_TRACED)
Process execution has been stopped by a debugger. When a process is being monitored by another process (for example, by a debugger), any signal may put it into the TASK_TRACED state.
Two additional process states can be stored either in the state field or in the exit_state field of the process descriptor. As the name of the latter field suggests, a process reaches one of these two states only when its execution is terminated:
Zombie state (EXIT_ZOMBIE)
Process execution is terminated, but the parent process has not yet issued a wait4() or waitpid() system call to return information about the dead process. Before a wait()-like system call is issued, the kernel cannot discard the data contained in the dead process's descriptor, because the parent might still need it.
Zombie removed state (EXIT_DEAD)
The final state: the process is being removed by the system because the parent process has just issued a wait4() or waitpid() system call for it.
The kernel also provides the set_task_state and set_current_state macros; they set the state of a specified process and of the currently executing process, respectively.
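A minimal sketch of how these macros are typically used by kernel code that is about to sleep (the side that wakes the process up later puts its state back to TASK_RUNNING):

    /* Mark the current process as sleeping, then yield the CPU.
     * The process will not be selected to run again until some
     * other kernel path wakes it up. */
    set_current_state(TASK_INTERRUPTIBLE);
    schedule();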

Identifying a process: As a general rule, each execution context that can be independently scheduled must have its own process descriptor; therefore even lightweight processes, which share a large part of their kernel data structures, have their own task_struct structures.
The strict one-to-one correspondence between a process and its process descriptor makes the 32-bit address of the task_struct structure a convenient way to identify a process. These addresses are referred to as process descriptor pointers, and most kernel references to processes are made through them.

Handling process descriptors: Processes are dynamic entities whose lifetimes range from a few milliseconds to months. The kernel must therefore be able to handle many processes at the same time, and it stores process descriptors in dynamic memory rather than in the memory area permanently assigned to the kernel.
For each process, Linux packs two different data structures into a single, per-process memory area:
one is the kernel-mode process stack;
the other is a small data structure closely linked to the process descriptor, the thread_info structure, called the thread descriptor.
The kernel uses the alloc_thread_info and free_thread_info macros to allocate and release the memory area storing a thread_info structure and a kernel stack.
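In the 2.6 kernel these two structures share a single 8 KB (two page frame) memory area, conventionally pictured as a union; a sketch:

    /* thread_info sits at the lowest addresses of the area, while the
     * kernel-mode stack grows downward from the top of the same area. */
    union thread_union {
        struct thread_info thread_info;
        unsigned long stack[2048];      /* 2048 * 4 bytes = 8 KB on 32-bit x86 */
    };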

Identifying the current process: From the efficiency point of view, the key benefit of the tight coupling between the thread_info structure and the kernel stack just described is that the kernel can easily obtain the address of the thread_info structure of the process currently running on a CPU from the value of the esp register.
Most often, however, the kernel needs the address of the process descriptor rather than that of the thread_info structure. To get the descriptor pointer of the process currently running on a CPU, the kernel uses the current macro, which is essentially equivalent to current_thread_info()->task.
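On 32-bit x86, with the 8 KB thread_union described above, this boils down to masking the 13 low-order bits of the kernel stack pointer; a sketch closely modeled on the i386 code:

    /* The thread_union is 8 KB aligned, so clearing the 13 low bits of
     * esp yields the base of the union, where thread_info is stored. */
    static inline struct thread_info *current_thread_info(void)
    {
        struct thread_info *ti;
        __asm__("andl %%esp,%0" : "=r" (ti) : "0" (~8191UL));
        return ti;
    }

    /* Process descriptor of the process currently running on this CPU. */
    #define current (current_thread_info()->task)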

Doubly linked lists: For each linked list, a set of primitive operations must be implemented: initializing the list, inserting and deleting an element, scanning the list, and so on.
A new list is created with the LIST_HEAD(list_name) macro. It declares a new variable list_name of type list_head, which is a dummy element acting as a placeholder for the head of the new list. The LIST_HEAD(list_name) macro also initializes the prev and next fields of this list_head data structure so that they point to the list_name variable itself.
The following functions and macros implement the list primitives:
list_add(n,p): inserts the element pointed to by n right after the element pointed to by p
list_add_tail(n,p): inserts the element pointed to by n right before the element pointed to by p
list_del(p): deletes the element pointed to by p
list_empty(p): checks whether the list whose head is at address p is empty
list_entry(p,t,m): returns the address of the data structure of type t that contains the list_head field whose name is m and whose address is p
list_for_each(p,h): scans the list whose head is at address h; at each iteration, p points to the list_head structure of a list element
list_for_each_entry(p,h,m): similar to list_for_each, but p points to the data structure embedding the list_head structure rather than to the list_head structure itself
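A minimal usage sketch showing how these pieces fit together; the struct my_item, its fields, and the function names are purely illustrative:

    #include <linux/list.h>
    #include <linux/slab.h>
    #include <linux/kernel.h>

    struct my_item {
        int value;
        struct list_head node;          /* embedded links; my_item is the containing type */
    };

    static LIST_HEAD(my_list);          /* dummy head element: prev and next point to itself */

    static void list_example(void)
    {
        struct my_item *item, *cursor;

        item = kmalloc(sizeof(*item), GFP_KERNEL);
        if (!item)
            return;
        item->value = 42;
        list_add_tail(&item->node, &my_list);        /* insert right before the head, i.e. at the tail */

        list_for_each_entry(cursor, &my_list, node)  /* cursor is the containing my_item, not the list_head */
            printk(KERN_INFO "value = %d\n", cursor->value);
    }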

The process list: The process list links together all process descriptors. Each task_struct structure contains a tasks field of type list_head, whose prev and next fields point, respectively, to the previous and to the next task_struct element.
The head of the process list is the init_task descriptor, the process descriptor of the so-called process 0 or swapper process. The tasks.prev field of init_task points to the tasks field of the process descriptor most recently inserted into the list.
The SET_LINKS and REMOVE_LINKS macros are used, respectively, to insert and to remove a process descriptor from the process list.
The for_each_process macro scans the whole process list.
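A sketch of scanning the whole process list with for_each_process; taking the tasklist lock for reading while scanning follows the usual 2.6 convention:

    #include <linux/sched.h>
    #include <linux/kernel.h>

    static void dump_processes(void)
    {
        struct task_struct *p;

        read_lock(&tasklist_lock);      /* protect the process list while scanning it */
        for_each_process(p)             /* walks the tasks list starting from init_task */
            printk(KERN_INFO "pid=%d comm=%s\n", p->pid, p->comm);
        read_unlock(&tasklist_lock);
    }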

The list of TASK_RUNNING processes: When the kernel looks for a new process to run on a CPU, it must consider only the runnable processes (that is, the processes in the TASK_RUNNING state).
The kernel maintains a lot of data for every runqueue in the system; however, the main data structures of a runqueue are the lists of process descriptors belonging to it. All of these lists are implemented by a single prio_array_t data structure per runqueue, which holds the list heads of 140 priority queues.
The enqueue_task(p,array) function inserts a process descriptor into one of the runqueue lists.
The dequeue_task(p,array) function removes a process descriptor from one of the runqueue lists.
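A rough sketch of this per-runqueue structure, with field names following the 2.6 scheduler sources (MAX_PRIO is 140; BITMAP_SIZE holds one bit per priority level):

    struct prio_array {
        int nr_active;                       /* number of runnable tasks linked below */
        unsigned long bitmap[BITMAP_SIZE];   /* bit set when the matching priority list is non-empty */
        struct list_head queue[MAX_PRIO];    /* one list of task_structs per priority level */
    };
    typedef struct prio_array prio_array_t;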

Relationships among processes: Processes created by a program have a parent/child relationship; when a process creates several children, those children have sibling relationships with one another.

The pidhash table and chained lists: Scanning the process list sequentially and checking the pid fields of the process descriptors is feasible but rather inefficient. To speed up the search, four hash tables have been introduced. Four hash tables are needed because the process descriptor includes fields that represent different types of PID, and each type of PID requires its own hash table.
The four hash tables and the corresponding fields in the process descriptor:
Hash table type    Field name    Description
PIDTYPE_PID        pid           PID of the process
PIDTYPE_TGID       tgid          PID of the thread group leader process
PIDTYPE_PGID       pgrp          PID of the process group leader process
PIDTYPE_SID        session       PID of the session leader process
When the kernel is initialized, it dynamically allocates space for the four hash tables and stores their addresses in the pid_hash array.
The pid_hashfn macro transforms a PID into a table index; pidhash_shift stores the length in bits of a table index.
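In the 2.6 sources the macro is essentially a thin wrapper around hash_long():

    #include <linux/hash.h>

    /* Multiply the PID by a large "golden ratio" constant and keep the
     * topmost pidhash_shift bits of the result as the table index. */
    #define pid_hashfn(x)  hash_long((unsigned long)(x), pidhash_shift)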

How processes are organized: The runqueue lists group together all the processes in the TASK_RUNNING state. Processes in other states call for different kinds of treatment, and Linux opts for one of the following choices:
No specific list is created for processes in the TASK_STOPPED, EXIT_ZOMBIE, or EXIT_DEAD states. There is no need to group processes in any of these three states, because stopped, zombie, and dead processes are accessed only via their PID or via the list of children of a particular parent process.
Processes in the TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE state are not kept on a single dedicated list either; they are subdivided into many classes, each corresponding to a specific event, by means of the wait queues described next.

Wait queues: Wait queues have several uses in the kernel, especially for interrupt handling, process synchronization, and timing. A wait queue implements a conditional wait on an event: a process wishing to wait for a specific event places itself in the proper wait queue and relinquishes control.
Because wait queues are modified both by interrupt handlers and by major kernel functions, their doubly linked lists must be protected from concurrent accesses, which could otherwise have unpredictable results. Synchronization is achieved by the lock spin lock in the wait queue head.
Each element in a wait queue list represents a sleeping process that is waiting for some event to occur; its descriptor address is stored in the task field.
There are two kinds of sleeping processes: exclusive processes (the flags field of the corresponding wait queue element is 1) are selectively woken up by the kernel, while nonexclusive processes (flags value 0) are always woken up by the kernel when the event occurs.

Wait queue operations: A new wait queue head can be defined with the DECLARE_WAIT_QUEUE_HEAD(name) macro, which statically declares a wait queue head variable called name and initializes its lock and task_list fields.
The init_waitqueue_head() function may be used to initialize a wait queue head variable that was allocated dynamically.
Once a wait queue element is defined, it must be inserted into a wait queue.
The add_wait_queue() function inserts a nonexclusive process into the first position of a wait queue list.
The add_wait_queue_exclusive() function inserts an exclusive process into the last position of a wait queue list.
The remove_wait_queue() function removes a process from a wait queue list.
The waitqueue_active() function checks whether a given wait queue list is empty.
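A sketch of the classic open-coded sleep/wake pattern built from these primitives (this is the pattern the wait_event macros wrap; my_queue, condition, and the function names are illustrative):

    #include <linux/wait.h>
    #include <linux/sched.h>

    static DECLARE_WAIT_QUEUE_HEAD(my_queue);   /* head: spin lock + empty task_list */
    static int condition;                       /* the event being waited for */

    /* Sleeping side: wait until another kernel path sets 'condition'. */
    static void wait_for_event(void)
    {
        DECLARE_WAITQUEUE(wait, current);       /* nonexclusive element for this process */

        add_wait_queue(&my_queue, &wait);
        for (;;) {
            set_current_state(TASK_INTERRUPTIBLE);
            if (condition)
                break;
            schedule();                         /* give up the CPU until woken up */
        }
        set_current_state(TASK_RUNNING);
        remove_wait_queue(&my_queue, &wait);
    }

    /* Waking side: make the condition true, then wake the sleepers. */
    static void fire_event(void)
    {
        condition = 1;
        wake_up_interruptible(&my_queue);
    }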

Process resource limits: Each process has an associated set of resource limits, which specify the amount of system resources it can use. These limits keep a user from overwhelming the system (its CPU, disk space, and so on).
The resource limits of the current process are stored in the current->signal->rlim field, that is, in a field of the process's signal descriptor.
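A minimal sketch of reading one of these limits from kernel code, with RLIMIT_STACK picked as an example (the helper function name is illustrative):

    #include <linux/sched.h>
    #include <linux/resource.h>

    /* rlim[] is indexed by the RLIMIT_* constants; each entry holds a
     * current (soft) limit and a maximum (hard) limit. */
    static unsigned long current_stack_limit(void)
    {
        return current->signal->rlim[RLIMIT_STACK].rlim_cur;
    }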

Process switch: To control the execution of processes, the kernel must be able to suspend the execution of the process running on a CPU and resume the execution of some other process previously suspended. This activity is called a process switch, task switch, or context switch.

Hardware context: Although each process can have its own address space, all processes have to share the CPU registers. So before resuming the execution of a process, the kernel must ensure that each such register is loaded with the value it had when the process was suspended.
The set of data that must be loaded into the registers before the process resumes its execution is called the hardware context. The hardware context is a subset of the process execution context, which includes all the information needed for process execution. In Linux, part of the hardware context of a process is stored in the TSS segment, while the remaining part is saved in the kernel-mode stack.

The thread field: At every process switch, the hardware context of the process being replaced must be saved somewhere. It cannot be saved in the TSS, because Linux uses one TSS per CPU rather than one per process; instead, each process descriptor includes a field called thread, in which the kernel saves the hardware context whenever the process is switched out.

Performing the process switch: A process switch may occur only at one well-defined point: the schedule() function. Here we focus only on how the kernel performs a process switch.
Essentially, every process switch consists of two steps:
1. switching the Page Global Directory in order to install a new address space;
2. switching the kernel-mode stack and the hardware context, which provides all of the information needed by the kernel to execute the new process, including the CPU registers.
The second step of the process switch is performed by the switch_to macro. It is one of the most hardware-dependent routines of the kernel.
switch_to(prev, next, last):
prev and next are the addresses in memory of the descriptors of the process being replaced and of the new process, respectively.
last is an output parameter: it indicates the memory location in which the macro writes the descriptor address of process C (the process that gets replaced when prev is eventually resumed).
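For orientation, a minimal sketch of where switch_to() sits in the scheduler path, loosely modeled on the 2.6 context_switch(); the wrapper function do_switch is purely illustrative:

    /* prev and next are the descriptors of the outgoing and incoming
     * processes.  Invoked from the scheduler with interrupts disabled. */
    static inline struct task_struct *
    do_switch(struct task_struct *prev, struct task_struct *next)
    {
        /* After this macro, execution continues on next's kernel stack.
         * When prev runs again and resumes right here, its local variable
         * prev has been overwritten with the descriptor address of the
         * process (C in the text above) that ran just before prev. */
        switch_to(prev, next, prev);
        return prev;
    }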

Saving and loading the FPU, MMX, and XMM registers: Starting with the 80486DX, the arithmetic floating-point unit (FPU) has been integrated into the CPU.

Creating processes: Unix operating systems rely heavily on process creation to satisfy user requests.
Traditional Unix operating systems treat all processes in the same way: resources owned by the parent process are duplicated in the child process. This approach makes process creation very slow and inefficient, because the child has to copy the entire address space of the parent. In fact, the child process rarely needs to read or modify all the resources inherited from the parent; in many cases it issues an execve() almost immediately and wipes out the address space that was so carefully copied.
Modern Unix kernels solve this problem by introducing three different mechanisms:
1. Copy-on-write allows both the parent and the child to read the same physical pages; whenever either one tries to write to a page, the kernel copies its contents into a new page assigned to the writer.
2. Lightweight processes allow both the parent and the child to share many per-process kernel data structures, such as the page tables, the open file tables, and the signal dispositions.
3. The vfork() system call creates a process that shares the memory address space of its parent.

The clone(), fork(), and vfork() system calls: In Linux, lightweight processes are created by means of a function named clone().
The fork() and vfork() system calls are implemented in Linux on top of clone().

The do_fork() function: The do_fork() function handles the clone(), fork(), and vfork() system calls.
    long do_fork(unsigned long clone_flags,
                 unsigned long stack_start,
                 struct pt_regs *regs,          /* pointer to the general-purpose register values saved on the kernel-mode stack when switching from User Mode to Kernel Mode */
                 unsigned long stack_size,      /* not used (always 0) */
                 int __user *parent_tidptr,     /* same meaning as the ptid parameter of clone() */
                 int __user *child_tidptr);     /* same meaning as the ctid parameter of clone() */
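For example, on 32-bit x86 the fork() and vfork() service routines in the 2.6 sources reduce to do_fork() calls that differ only in the clone flags; roughly:

    asmlinkage int sys_fork(struct pt_regs regs)
    {
        /* SIGCHLD in the low byte: signal sent to the parent when the child dies. */
        return do_fork(SIGCHLD, regs.esp, &regs, 0, NULL, NULL);
    }

    asmlinkage int sys_vfork(struct pt_regs regs)
    {
        /* CLONE_VM: share the parent's address space; CLONE_VFORK: block the
         * parent until the child exits or calls execve(). */
        return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, regs.esp, &regs, 0, NULL, NULL);
    }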
do_fork() uses an auxiliary function called copy_process() to create the process descriptor and the other kernel data structures required for the child's execution.

The copy_process() function: copy_process() creates the process descriptor and all the other data structures required for the child's execution. Its parameters are the same as those of do_fork(), plus the PID of the child:
    static task_t *copy_process(unsigned long clone_flags,
                                unsigned long stack_start,
                                struct pt_regs *regs,
                                unsigned long stack_size,
                                int __user *parent_tidptr,
                                int __user *child_tidptr,
                                int pid);

Kernel threads: In traditional Unix systems, some important tasks are delegated to processes that execute periodically, including flushing disk caches, swapping out unused page frames, servicing network connections, and so on.
In Linux, kernel threads differ from regular processes in the following ways:
Kernel threads run only in Kernel Mode, while regular processes run alternatively in Kernel Mode and in User Mode.
Because kernel threads run only in Kernel Mode, they use only linear addresses greater than PAGE_OFFSET. Regular processes, on the other hand, use all four gigabytes of linear addresses, in either User Mode or Kernel Mode.

Creating a kernel thread: The kernel_thread() function creates a new kernel thread. It receives as parameters the address of the kernel function to be executed (fn), the argument to be passed to that function (arg), and a set of clone flags (flags).
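A minimal sketch of creating one; the thread body my_thread_fn and the flag choice are illustrative (kernel_thread() itself adds CLONE_VM, since kernel threads live entirely in the kernel address space):

    #include <linux/sched.h>
    #include <linux/kernel.h>

    /* fn receives arg; its return value becomes the thread's exit code. */
    static int my_thread_fn(void *arg)
    {
        printk(KERN_INFO "hello from a kernel thread\n");
        return 0;
    }

    static void start_kernel_thread(void)
    {
        /* Share filesystem information, open files, and signal handlers
         * with the creator, as most in-kernel callers do. */
        kernel_thread(my_thread_fn, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGHAND);
    }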

Process 0
The ancestor of all processes, called process 0, the idle process, or (for historical reasons) the swapper process, is a kernel thread created from scratch during the Linux initialization phase.

Process 1
The kernel thread created by process 0 executes the init() function, which in turn completes the initialization of the kernel. init() then invokes the execve() system call to load the executable program init.
As a result, the init kernel thread becomes a regular process having its own per-process kernel data structures.
The init process stays alive until the system is shut down; it creates and monitors the activity of all the processes that implement the outer layers of the operating system.

Other kernel threads: Linux uses many other kernel threads. Some of them are created during the initialization phase and run until the system shuts down; others are created "on demand" when the kernel must execute a task that is better performed in its own execution context.
Some examples of kernel threads (besides process 0 and process 1):
keventd (also called events), kapmd (handles events related to Advanced Power Management, APM), kswapd (performs memory reclaiming), pdflush (flushes the contents of "dirty" buffers to disk to reclaim memory),
kblockd (executes the functions in the kblockd_workqueue work queue), ksoftirqd (runs tasklets; there is one such kernel thread per CPU in the system).

Destroying processes: Many processes terminate because they have finished executing the code they were meant to run; in a sense those processes are "dead". When this occurs, the kernel must be notified so that it can release the resources owned by the process, including memory, open files, and so on.
The kernel may also force a whole thread group to die. This occurs in two typical cases:
when a process receives a signal that it cannot handle or ignore, or
when an unrecoverable CPU exception is raised in Kernel Mode while the kernel is running on behalf of the process.

Process termination: In Linux 2.6 there are two system calls that terminate a User Mode application:
exit_group(): terminates a full thread group, that is, a whole multithreaded application. The main kernel function invoked to implement it is do_group_exit().
_exit(): terminates a single process, regardless of any other process in the thread group of the victim. The main kernel function invoked is do_exit().

Process removal: Unix allows a process to query the kernel to obtain its parent's PID or the execution state of any of its children.
For instance, a process may create a child to perform a specific task and then invoke some wait()-like library function to check whether the child has terminated; if it has, its termination code tells the parent process whether the task was carried out successfully.
To comply with these design choices, Unix kernels are not allowed to discard the data included in the process descriptor right after the process terminates. They are allowed to do so only after the parent process has issued a wait()-like system call that refers to the terminated process.
This is why the zombie state was introduced: although the process is technically dead, its descriptor must be kept until the parent process is notified.
The release_task() function detaches the last data structures from the descriptor of a zombie process; it is applied to a zombie in one of two possible ways:
by the do_exit() function, if the parent is not interested in receiving signals from the child; in this case the memory reclaiming is completed by the process scheduler;
by the wait4() or waitpid() system calls, after a signal has been sent to the parent; in this case the function also reclaims the memory used by the process descriptor.
