A detailed description of the Linux thread creation process

Source: Internet
Author: User
Tags: posix, signal handler

This article first creates a thread with the pthread_create interface and uses the strace command to trace the steps and system calls involved. It then discusses the relationship between threads and processes on Linux, and finally outlines the changes made to the Linux kernel in order to implement POSIX threads.

I. Creating a thread using pthread_create

On Linux you can create threads with pthread_create, which is declared as follows:

    #include <pthread.h>

    int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                       void *(*start_routine)(void *), void *arg);

As you can see, when creating a thread we can specify its attributes through pthread_attr_t, such as the detach state and the stack size (the attribute structure must be manipulated through pthread_attr_init and the related interfaces), and we can pass the parameter arg to the thread entry function. Note that when a thread is created with this interface, the new thread may start running before pthread_create returns. Here is a simple example:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <pthread.h>

    /* Thread entry function: print a message, sleep, then exit. */
    void *thread(void *arg)
    {
        printf("This is a pthread.\n");
        sleep(5);
        return ((void *)0);
    }

    int main(int argc, char **argv)
    {
        pthread_t id;
        int ret;

        printf("pthread_start\n");
        ret = pthread_create(&id, NULL, thread, NULL);
        printf("pthread_end\n");
        if (ret != 0) {
            printf("Create pthread error!\n");
            exit(1);
        }
        printf("This is the main process.\n");
        pthread_join(id, NULL);  /* wait for the thread to finish */
        return 0;
    }

Compile it to obtain the executable file: $ gcc -g -lpthread -Wall -o hack_pthread_create hack_pthread_create.c

We can use the strace command to trace the thread creation process: $ strace ./hack_pthread_create

The part of the output related to pthread_create is as follows:

    mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f70b6c2f000
    brk(0)                                  = 0x1fe3000
    brk(0x2004000)                          = 0x2004000
    mprotect(0x7f70b6c2f000, 4096, PROT_NONE) = 0
    clone(child_stack=0x7f70b742eff0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f70b742f9d0, tls=0x7f70b742f700, child_tidptr=0x7f70b742f9d0) = 64861

From the output above, we can see that pthread_create creates a thread in the following steps:

1) Call mmap to allocate the thread's stack as a private anonymous mapping (not on the heap). The size is 8392704 bytes, that is, 8196 KB, or 8 MB + 4 KB: 4 KB larger than the 8 MB stack space. The extra 4 KB is the guard area of the stack.

2) Call mprotect() to set the protection of the guard page (4 KB, starting at address 0x7f70b6c2f000) to PROT_NONE. This page is used to detect stack overflow: any read or write to it triggers a SIGSEGV signal.

3) Call clone() to create the thread. On Linux this interface is used to create processes; in essence, Linux threads are implemented as processes, as discussed later in this article. The first argument of the call, child_stack, is an address near the high end of the mapped stack area, because the stack grows downward from high addresses. The main flags passed in the flags parameter have the following meanings:

CLONE_VM indicates that the parent and child processes share the same memory space: any modification one process makes to memory is visible to the other, and so is any mmap or munmap operation it performs. It is worth mentioning that fork also uses clone to create a child process, but it does not set the CLONE_VM flag.

CLONE_FS indicates that the parent and child processes share file system information, including the root directory, the current working directory, and the umask. Calling chroot, chdir, or umask in either process also affects the other.

CLONE_FILES indicates that the parent and child processes share the same file descriptor table. Opening or closing a file in either process, or modifying file descriptor flags with fcntl, also affects the other.

CLONE_SIGHAND indicates that the parent and child processes share the same table of signal handlers: if either process changes how a signal is handled with sigaction, the other is affected as well. However, the two processes have separate signal masks, so one process can block or unblock a signal with sigprocmask without affecting the other.

CLONE_THREAD indicates that the child process is placed in the same thread group as the parent process. Simply put, the child process created this way is what user space sees as a thread.

CLONE_SYSVSEM indicates that the child process shares a single list of System V semaphore adjustment (semadj) values with the parent process.

The "child process" here is essentially the thread we created, and these flags show which resources are shared among the threads of a process.

II. The relationship between Linux threads and processes

In a Linux system, a process is defined as an execution instance of a program. By itself it does not execute anything; it only maintains the resources required by the application, while the thread is the real execution entity. A process must contain at least one thread in order to do any work. The process maintains the resources that belong to the program (static resources), such as the virtual address space, the set of open file descriptors, the file system state, and the signal handlers. The thread maintains the resources related to execution (dynamic resources), such as the run-time stack, scheduling-related control information, and the set of pending signals.

There is no separate thread concept in the Linux kernel: each execution entity is a task_struct structure, usually called a process. A process in this sense is an execution unit that maintains the dynamic resources associated with execution and also references the static resources required by the program. When you create a child process with the clone system call, you can selectively let the child share the resources referenced by the parent. Such child processes are often called lightweight processes. Threads on Linux are built on lightweight processes and are implemented by the user-space pthread library.

With pthreads, each task_struct corresponds to one thread from the user's point of view, and a set of threads together with the resources they jointly reference forms a process. However, it is not enough for a set of threads to reference the same resources; they must also be treated as a whole, the so-called thread group.

In summary, a process can have multiple threads, which are scheduled automatically by the kernel. Each thread has its own thread context, including the thread ID, stack, stack pointer, program counter, general-purpose registers, and condition codes. All other resources of the process are shared by all its threads, including the virtual address space (code, data, heap, shared libraries), file system information, the file descriptor table, and the signal handlers.

III. The implementation of Linux threads

The system calls made by pthread_create show that on Linux a thread is implemented as a process. The Linux kernel provides the clone() system call for process creation, with flags such as CLONE_VM, CLONE_FILES, CLONE_SIGHAND, and CLONE_THREAD. When creating a thread, these clone() flags make the newly created process (also known as an LWP, lightweight process) share the memory space, file descriptors, signal handlers, and so on with its parent, which achieves the effect of creating a thread.

In Linux 2.4 and earlier, the pthread library was implemented by a library named LinuxThreads. That library did not meet the POSIX requirements: it was implemented in user space and had problems with signal handling, scheduling (each process required an additional manager thread), and synchronization of resources shared among threads.

After Linux 2.4, the pthread library is implemented by NPTL (Native POSIX Thread Library), which meets the POSIX requirements. NPTL uses a 1:1 threading model: each user-level thread corresponds to one entity scheduled by the kernel. The implementation of NPTL relies on changes to the Linux kernel, including the following:

1) Futex (fast userspace mutex) support was added to the kernel to handle sleeping and waking between threads. A futex is an efficient mechanism for mutually exclusive access to shared resources: the kernel only arbitrates when there is contention, and in the common case the operation is completed by the process itself in user space.

2) Since Linux 2.4 the kernel has had the concept of a thread group. All threads in a thread group share one ID, called the thread group identifier (TGID), and a field was added to the task_struct structure to hold it. If the newly created thread is the first thread in the thread group, that is, the main thread, its tgid equals its own PID; otherwise its tgid equals the PID of the process (that is, the PID of the main thread).

If you call getpid in a newly created thread, the value returned is the tgid, that is, the PID of the main thread (what is normally called the process PID). A thread can call gettid to obtain its own ID in the kernel, which is the pid field in task_struct.

In a clone system call, passing the CLONE_THREAD flag (that is, creating a thread) sets the tgid of the new process to the tgid of the parent process; otherwise the tgid of the new process is set to its own PID. task_struct has two similar IDs: task->signal->pgid stores the PID of the process group leader, and task->signal->session stores the PID of the session leader. These two IDs are used to associate processes with process groups and sessions. With tgid, the kernel or a related shell program knows whether a task_struct represents a process or a thread, and knows when to show it and when not to (for example, ps usually does not show threads, but does with the -L option).

If any thread in the process calls an exec-family function such as execve, all threads other than the main thread are terminated, and the new program runs in the main thread.

Note that the IDs discussed above are completely unrelated to the thread's pthread_t. In most system calls that take a PID as a parameter or that act on a process, the PID is treated as a tgid, and the call acts on the entire thread group (process).

3) To handle both "signals sent to a process" and "signals sent to a thread", task_struct maintains two sets of pending signals (signal_pending): one shared by the thread group and one private to each thread. A signal sent with kill is placed in the set shared by the thread group and can be handled by any thread in the group. A signal sent with pthread_kill (an interface of the pthread library; the corresponding system call is tkill) is placed in the thread-private set and can be handled only by that thread. When a thread is stopped or continued, or receives a fatal signal, the kernel applies the action to the entire thread group.
