Chapter III Process Management
3.1 Process
1. Process
1) A process is a program in execution (the object code itself is stored on some storage medium), but a process is more than just a piece of executable program code. It usually also includes other resources, such as open files, pending signals, internal kernel data, processor state, one or more memory-mapped address spaces, one or more threads of execution, and a data segment holding global variables. A process is, in effect, the live result of running program code, and the kernel must manage all of these details efficiently and transparently.
2) A thread of execution, or simply a thread, is the object of activity within a process. Each thread has its own program counter, process stack, and set of processor registers. The kernel schedules individual threads, not processes. In traditional Unix systems a process contained only one thread; on modern systems, multithreaded programs are commonplace. Linux's thread implementation is unusual: it does not differentiate between threads and processes, and to Linux a thread is just a special kind of process.
3) In modern operating systems, processes provide two virtualizations: a virtualized processor and virtual memory. Note that threads in the same process share the virtual memory abstraction, but each thread has its own virtualized processor.
PS. In the modern Linux kernel, fork() is actually implemented via the clone() system call.
A program exits via the exit() system call, and the parent process can query whether a child has terminated through the wait4() system call. After a process exits, it is placed in a zombie state until its parent calls wait() or waitpid().
2. Threads
A thread of execution, or simply a thread, is the object of activity within a process. The kernel schedules threads, not processes. Linux treats a thread as a special kind of process.
3. Virtual processors and virtual memory
In modern operating systems, processes provide two virtualizations: a virtualized processor and virtual memory.
Threads in the same process share virtual memory, but each has its own virtualized processor.
4. Several functions
- fork(): creates a new process.
- exec(): creates a new address space and loads a new program into it.
- clone(): the system call that actually implements fork().
- exit(): terminates the calling process.
- wait4(): lets the parent query whether a child process has terminated.
- wait(), waitpid(): after a process exits it becomes a zombie; the parent calls one of these to reap it.
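As a minimal user-space sketch of how these calls fit together, the function below forks a child, lets it exit, and reaps it in the parent. The name spawn_and_reap is made up for this note, and exec() is left out to keep the example short.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`, then reap it in the parent.
 * Returns the child's exit status, or -1 on error.
 * spawn_and_reap is a hypothetical helper name, not a library call. */
int spawn_and_reap(int code)
{
    assert(code >= 0 && code <= 255);   /* an exit status is 8 bits wide */

    pid_t pid = fork();                 /* create a new process */
    if (pid < 0)
        return -1;                      /* fork failed */
    if (pid == 0)
        _exit(code);                    /* child: terminate immediately */

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)   /* parent: reap the child */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling spawn_and_reap(7) from a main function should hand back the status the child passed to _exit(). Between the child's exit and the parent's waitpid(), the child exists only as a zombie.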
3.2 Process descriptor and task structure
The kernel stores the list of processes in a circular doubly linked list called the task list. Each element in the list is a process descriptor.
The process descriptor contains all the data needed to describe an executing program: its open files, the process's address space, pending signals, the process's state, and so on.
The process descriptor is of type struct task_struct, defined in <linux/sched.h>.
3.2.1 Allocating the process descriptor
Linux allocates the task_struct structure through the slab allocator, which provides object reuse and cache coloring.
Because the slab allocator creates the process descriptor dynamically, only a new structure, struct thread_info, has to be created at the bottom of the stack (for stacks that grow down) or at the top of the stack (for stacks that grow up).
Each task's thread_info structure is allocated at the end of its kernel stack.
The task field of that structure holds a pointer to the task's actual task_struct.
3.2.2 Storing the process descriptor
The kernel identifies each process by a unique process identification value, or PID. The PID has type pid_t, which is actually an int. The default maximum value is 32768, and the limit can be raised by writing to /proc/sys/kernel/pid_max. Each process's PID is stored in its process descriptor.
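A small sketch of reading the current limit from user space, assuming procfs is mounted at /proc. The helper name read_pid_max is invented for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Read a PID limit from the given procfs file.
 * Returns the limit, or -1 if the file cannot be opened or parsed.
 * read_pid_max is a hypothetical helper name. */
long read_pid_max(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    long limit = -1;
    if (fscanf(f, "%ld", &limit) != 1)
        limit = -1;                 /* file did not start with a number */
    fclose(f);
    return limit;
}
```

Typical use would be read_pid_max("/proc/sys/kernel/pid_max"). The value returned depends on how the system is configured, so no particular number should be expected.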
The current macro looks up the process descriptor of the currently executing process. On x86, the thread_info structure is stored at the end of the kernel stack, and the task_struct is found indirectly by computing an offset: current_thread_info() masks out the 13 least-significant bits of the stack pointer (this assumes an 8KB stack) to obtain the address of thread_info, and current then reads the task field of thread_info to return the address of the task_struct.
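The masking step is plain address arithmetic and can be tried out in user space. The sketch below assumes an 8KB, 8KB-aligned stack; stack_base and the sample addresses are made up for illustration and are not kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define THREAD_SIZE 8192u   /* assumed kernel stack size: 8KB = 2^13 bytes */

/* Given any address inside a THREAD_SIZE-aligned stack, return the lowest
 * address of that stack, which is where thread_info lives on x86.
 * stack_base is a hypothetical name for this illustration. */
uintptr_t stack_base(uintptr_t sp)
{
    /* The mask trick only works for power-of-two stack sizes. */
    assert((THREAD_SIZE & (THREAD_SIZE - 1)) == 0);
    return sp & ~((uintptr_t)THREAD_SIZE - 1);   /* clear the low 13 bits */
}
```

For example, a stack pointer of 0xC0123ABC maps to 0xC0122000, because the low 13 bits (0x1ABC) are cleared. Every address inside the same 8KB stack maps to the same base.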
3.2.3 Process State
The state field of the process descriptor describes the current condition of the process. There are five states, flagged as follows:
- TASK_RUNNING (running): the process is runnable; it is either currently executing or waiting on a run queue to execute.
- TASK_INTERRUPTIBLE (interruptible): the process is sleeping (blocked), waiting for some condition; it can also be woken early by a signal.
- TASK_UNINTERRUPTIBLE (uninterruptible): the process is sleeping (blocked) and is not woken by signals.
- TASK_TRACED (traced): the process is being traced by another process.
- TASK_STOPPED (stopped): the process has stopped executing; it is not running and is not eligible to run. A process enters this state when it receives SIGSTOP, SIGTSTP, SIGTTIN, or SIGTTOU, or when it receives any signal while being debugged.
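The stopped state can be observed from user space. In this sketch a child stops itself with SIGSTOP and the parent detects the stop through waitpid() with WUNTRACED. The function name observe_stopped_child is invented for this note.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the parent saw the child stop on SIGSTOP, 0 if not,
 * and -1 on error. observe_stopped_child is a hypothetical name. */
int observe_stopped_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        raise(SIGSTOP);          /* child: enter the stopped state */
        _exit(0);                /* reached only after SIGCONT */
    }
    assert(pid > 0);             /* only the parent gets here */

    int status = 0;
    /* WUNTRACED makes waitpid() also report children that have stopped. */
    if (waitpid(pid, &status, WUNTRACED) != pid)
        return -1;
    int stopped = WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP;

    kill(pid, SIGCONT);          /* let the child continue and exit */
    waitpid(pid, &status, 0);    /* reap it */
    return stopped ? 1 : 0;
}
```

While the child is stopped it is not eligible to run, which is why the parent has to send SIGCONT before it can reap the child.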
3.2.4 Setting the current process state
Use the set_task_state(task, state) function:
set_task_state(task, state);    /* set task 'task' to state 'state' */
set_current_state(state) is equivalent to set_task_state(current, state).
3.2.5 Process Context
When the kernel "executes on behalf of the process" it is in process context, and the current macro is valid in this context. When the kernel exits, the program resumes execution in user space, unless a higher-priority process has become runnable in the interim, in which case the scheduler is invoked to select it.
When a program executes a system call or triggers an exception, it enters kernel space. At that point the kernel is executing on behalf of the process and is in process context.
A process can enter the kernel only through well-defined interfaces: system calls and exception handlers.
3.2.6 Process Family Tree
All processes are descendants of the init process, whose PID is 1.
The kernel starts init in the last stage of the boot process.
Every process in the system has exactly one parent and zero or more children. Processes that share the same parent are called siblings.
This relationship is stored in the process descriptor: each task_struct has a pointer named parent to the parent's task_struct, and a list of children named children.
Get the process descriptor of the parent process:
struct task_struct *my_parent = current->parent;
Iterate over a process's children:
struct task_struct *task;
struct list_head *list;

list_for_each(list, &current->children) {
    task = list_entry(list, struct task_struct, sibling);
    /* task now points to one of current's children */
}
The process descriptor of the init process is statically allocated as init_task.
Get the next task in the list:
list_entry(task->tasks.next, struct task_struct, tasks);
Get the previous task in the list:
list_entry(task->tasks.prev, struct task_struct, tasks);
These two operations are provided by the macros next_task(task) and prev_task(task), respectively.
The for_each_process(task) macro iterates over the entire task list. On each iteration, task points to the next task in the list.
struct task_struct *task;

for_each_process(task) {
    /* print the name and PID of each task */
    printk("%s[%d]\n", task->comm, task->pid);
}
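The list_entry() calls above rely on the list node being embedded inside the structure. The sketch below imitates that pattern in user space; struct task, list_init, list_add_tail, and sum_pids are simplified stand-ins written for this note, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Recover the enclosing structure from a pointer to its embedded node. */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct task {                   /* hypothetical stand-in for task_struct */
    int pid;
    struct list_head tasks;     /* links this task into the circular list */
};

/* An empty circular list is a head that points at itself. */
void list_init(struct list_head *head)
{
    head->next = head->prev = head;
}

/* Insert node just before head, i.e. at the tail of the list. */
void list_add_tail(struct list_head *node, struct list_head *head)
{
    assert(node != NULL && head != NULL);
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Walk the circular list once and return the sum of all PIDs on it. */
int sum_pids(struct list_head *head)
{
    int sum = 0;
    for (struct list_head *p = head->next; p != head; p = p->next)
        sum += list_entry(p, struct task, tasks)->pid;
    return sum;
}
```

The traversal stops when it arrives back at the head, which is what makes the list circular. Unlike this sketch, the kernel's task list has no separate head node: init_task itself serves as the starting point.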
3.3 Process Creation
- How most operating systems create processes: create a process in a new address space, read in an executable, and begin executing it.
- The Unix mechanism: fork() and exec().
fork(): creates a child process by copying the current process. The child differs from the parent only in its PID, its PPID, and certain resources and statistics.
exec(): reads an executable file, loads it into the address space, and begins executing it.
3.3.1 Copy-on-Write
1. Copy-on-write is a technique that delays or altogether avoids copying data. Rather than duplicating the entire process address space, the kernel lets the parent and the child share a single copy.
2. Resources are duplicated only when they are written to. Until then, they are shared read-only.
3. The only real overhead of fork() is duplicating the parent's page tables and creating a unique process descriptor for the child.
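Copy-on-write itself is invisible to user space, but its observable effect can be sketched: a write made by the child after fork() never shows up in the parent. The function name child_write_is_private is made up for this note.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child's write stayed private to the child, 0 if not,
 * and -1 on error. child_write_is_private is a hypothetical name. */
int child_write_is_private(void)
{
    volatile int value = 1;      /* lives on a page shared after fork() */

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        value = 2;               /* first write: child gets its own copy */
        _exit(value);            /* report what the child sees */
    }
    assert(pid > 0);             /* only the parent gets here */

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    int child_saw = WIFEXITED(status) ? WEXITSTATUS(status) : -1;

    /* The child saw 2, while the parent's copy is still 1. */
    return (child_saw == 2 && value == 1) ? 1 : 0;
}
```

The example shows only the semantics that copy-on-write must preserve. Whether a given kernel copies a page lazily or eagerly cannot be told from this code.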
3.3.2 fork()
Linux implements fork() via the clone() system call.
The approximate steps to create a process are as follows:
1. fork(), vfork(), and __clone() each invoke clone() with the parameter flags they require.
2. clone() calls do_fork().
3. do_fork() calls copy_process() and then starts the new process running. Within copy_process(), alloc_pid() is called to assign an available PID to the new process.
4. Back in do_fork(), if copy_process() returned successfully, the newly created child is woken up and run.
The kernel deliberately tries to run the child process first, although in practice this does not always happen. The child usually calls exec() immediately, and running it first avoids the copy-on-write overhead that would occur if the parent ran first and began writing to the address space.
3.3.3 vfork()
The vfork() system call has the same effect as fork(), except that the page table entries of the parent process are not copied. Ideally, do not call vfork().
Note: the child executes as the sole thread in the parent's address space, and the parent is blocked until the child either exits or calls exec(). The child is not allowed to write to the address space.
The vfork() system call is implemented by passing a special flag to clone():
1. In copy_process(), the vfork_done member of task_struct is set to NULL.
2. In do_fork(), if the special flag was given, vfork_done is pointed at a specific address.
3. After the child starts running, the parent does not resume immediately. It waits until the child signals it through the vfork_done pointer.
4. In mm_release(), which is called when a process exits a memory address space, vfork_done is checked. If it is not NULL, the parent is signaled.
5. Back in do_fork(), the parent wakes up and returns.
3.4 Implementation of threads in Linux
Threads are a common programming abstraction in modern programming. They provide multiple threads of execution within the same program in a shared memory address space. Threads can also share open files and other resources. They support concurrent programming and, on multiprocessor systems, true parallelism.
From the Linux kernel's point of view there is no concept of a thread. Linux implements all threads as standard processes, and a thread is merely a process that shares certain resources with other processes.
To Linux, threads are simply a manner of sharing resources between processes.
3.4.1 Creating Threads
Creating a thread is similar to creating a normal process, except that the call to clone() is passed flags indicating which resources are to be shared:
clone(CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND, 0);
The result shares the address space, filesystem resources, file descriptors, and signal handlers. The new process and its parent are what are popularly called threads.
A normal fork():
clone(SIGCHLD, 0);
vfork():
clone(CLONE_VFORK | CLONE_VM | SIGCHLD, 0);
The flags passed to clone() determine how the newly created process behaves and which resources the parent and child share.
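The effect of CLONE_VM can be sketched from user space with the glibc clone() wrapper. Its signature (entry function, child stack, flags, argument) differs from the two-argument kernel form shown above. The names child_fn, value_after_clone, and CHILD_STACK_SIZE are invented for this note, and the stack handling assumes an architecture where stacks grow down.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for clone() in <sched.h> */
#endif
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)    /* arbitrary size for this sketch */

/* Entry point of the cloned task: store 42 through the pointer it gets. */
int child_fn(void *arg)
{
    *(volatile int *)arg = 42;
    return 0;
}

/* Run child_fn in a task created with extra_flags | SIGCHLD and return the
 * value the parent observes afterwards, or -1 on error.
 * value_after_clone is a hypothetical helper name. */
int value_after_clone(int extra_flags)
{
    int value = 0;
    char *stack = malloc(CHILD_STACK_SIZE);
    if (!stack)
        return -1;

    /* Assumption: the stack grows down, so pass the top of the buffer. */
    pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE,
                      extra_flags | SIGCHLD, &value);
    if (pid < 0) {
        free(stack);
        return -1;
    }

    int status = 0;
    pid_t reaped = waitpid(pid, &status, 0);    /* wait for the child */
    free(stack);
    return reaped == pid ? value : -1;
}
```

With CLONE_VM the child writes into the parent's own memory, so the parent should see 42. With no extra flags the call behaves like fork() and the parent's value stays 0. This is a sketch only; real programs normally create threads through a threading library, not by calling clone() directly.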
3.4.2 Kernel Thread
Kernel thread: a standard process that exists solely in kernel space. Kernel threads have no address space of their own, run only in kernel space, and never switch to user space. Like normal processes, they can be scheduled and preempted.
Kernel threads can only be created by other kernel threads:
kthread_create() creates a kernel thread via the clone() system call. The new thread is created in an unrunnable state and does not start running until it is explicitly woken with wake_up_process(). To create a thread and make it runnable in one step, call kthread_run(). Once started, a kernel thread runs until it calls do_exit(), or until another part of the kernel calls kthread_stop(), passing in the address of the task_struct structure that kthread_create() returned.
3.5 Process End
When a process terminates, the kernel must release the resources it holds and notify its parent.
Process termination is usually self-induced: it happens when the process calls the exit() system call, in one of two ways:
- explicitly, by calling exit() itself
- implicitly, on return from the main function of the program
Most of the work is done by do_exit(). A few key points:
...... find a new parent for the task's children (another thread in the thread group, or the init process) ...... call schedule() to switch to a new process ......
At this point the process is not runnable and is in the EXIT_ZOMBIE exit state. The only memory it still occupies is its kernel stack, its thread_info structure, and its task_struct structure. The process now exists solely to provide information to its parent.
3.5.1 Deleting a process descriptor
The task_struct structure is released only after the parent has obtained information on its terminated child, or has told the kernel that it does not care. The system call involved is wait4():
It suspends the calling process until one of its children exits, at which point the function returns the PID of that child.
When the process descriptor is finally released, release_task() is called.
3.5.2 The orphan process
If a parent exits before its children, there must be a mechanism that gives the children a new parent. Otherwise these orphaned processes would remain zombies forever after they exit.
Two approaches are used to solve the problem:
- Find another thread in the current thread group to act as the new parent.
- If that fails, let init become the parent.
The procedure is as follows:
1. do_exit() calls exit_notify().
2. exit_notify() calls forget_original_parent().
3. forget_original_parent() calls find_new_reaper().
4. All of the child processes are iterated over and a new parent is set for each of them.
5. ptrace_exit_finish() is called to do the same reparenting for the children that are being ptraced.
6. The init process calls wait() on its children, cleaning up any zombies assigned to it.
When a process is being traced, it is temporarily reparented to the debugging process. If its original parent exits at that point, the system must find a new parent for it and all of its siblings. Instead of searching every process in the system for these children, the kernel keeps a separate list of the children that are being ptraced, which reduces the search to two relatively small lists.
Once the system has successfully found and set a new parent for the process, there is no risk of stray zombie processes. The init process routinely calls wait() on its children, cleaning up any zombies assigned to it.
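Reparenting can be observed from user space with a short sketch. An intermediate process A forks a grandchild B and exits at once. After A has been reaped, B reports its new parent through a pipe. The function name orphan_reparent and the two-pipe handshake are choices made for this note.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Orphan a grandchild and report its parent before and after.
 * Returns 0 on success, -1 on error. orphan_reparent is a hypothetical name. */
int orphan_reparent(pid_t *old_parent, pid_t *new_parent)
{
    int go[2], res[2];               /* go: caller -> B, res: B -> caller */
    assert(old_parent != NULL && new_parent != NULL);
    if (pipe(go) < 0)
        return -1;
    if (pipe(res) < 0) { close(go[0]); close(go[1]); return -1; }

    pid_t a = fork();
    if (a < 0)
        return -1;
    if (a == 0) {                    /* intermediate parent A */
        pid_t b = fork();
        if (b == 0) {                /* grandchild B */
            char c;
            pid_t ppid;
            close(go[1]);
            close(res[0]);
            if (read(go[0], &c, 1) != 1)   /* wait until A has been reaped */
                _exit(1);
            ppid = getppid();              /* parent after reparenting */
            if (write(res[1], &ppid, sizeof ppid) != (ssize_t)sizeof ppid)
                _exit(1);
            _exit(0);
        }
        _exit(b < 0 ? 1 : 0);        /* A exits at once, orphaning B */
    }

    close(go[0]);
    close(res[1]);
    int ok = 0, status = 0;
    if (waitpid(a, &status, 0) == a &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0) {
        /* A has exited and been reaped, so B already has a new parent. */
        char c = 'x';
        pid_t ppid;
        if (write(go[1], &c, 1) == 1 &&
            read(res[0], &ppid, sizeof ppid) == (ssize_t)sizeof ppid) {
            *old_parent = a;
            *new_parent = ppid;
            ok = 1;
        }
    }
    close(go[1]);
    close(res[0]);
    return ok ? 0 : -1;
}
```

On a typical system the new parent is init (PID 1), but it may be some other reaper process, so the only safe expectation is that the new parent differs from A. When B exits, that new parent is the one responsible for reaping it.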
"Linux kernel Design and implementation" Chapter III reading notes