Implementation of Linux Threads

Source: Internet
Author: User
Tags posix


First, from the standpoint of operating system design, three kinds of threads need to be distinguished: kernel threads, lightweight processes, and user threads.

Kernel threads

A kernel thread is a "clone" of the kernel, and each clone can handle one specific task. This is particularly useful for handling asynchronous events such as asynchronous I/O. Kernel threads are inexpensive: the only resources they use are the kernel stack and the space for saving registers across context switches. A kernel that supports multithreading is called a multi-threaded kernel.

Lightweight process

A lightweight process (LWP) is a user thread supported by the kernel. It is a higher-level abstraction built on kernel threads, so a system can offer LWPs only if it supports kernel threads first. Each process has one or more LWPs, and each LWP is backed by one kernel thread. This is the one-to-one thread model described in the Dinosaur book. In operating systems implemented this way, the LWP is the user thread.

Since each LWP is associated with its own kernel thread, each LWP is an independent scheduling unit. Even if one LWP blocks in a system call, the rest of the process keeps executing.

Lightweight processes have limitations. First, most LWP operations, such as creation, destruction, and synchronization, require system calls, and system calls are relatively expensive because they switch between user mode and kernel mode. Second, each LWP must be backed by a kernel thread, so LWPs consume kernel resources (the kernel thread's stack space). A system therefore cannot support a large number of LWPs.


The term LWP is borrowed from SVR4/MP and Solaris 2.x. Some systems call an LWP a virtual processor. It is probably called a lightweight process because, backed by a kernel thread, an LWP is an independent scheduling unit, just like an ordinary process. The defining feature of an LWP is therefore that each one is backed by a kernel thread.

User thread

Although an LWP is essentially a user thread, the LWP thread library is built on top of the kernel, and many LWP operations involve system calls, so it is not very efficient. The user thread discussed here refers to a thread library built entirely in user space: creating, synchronizing, destroying, and scheduling user threads all happen in user space without help from the kernel. Operations on such threads are therefore extremely fast and cheap.

In the original user thread model, a process contains threads, and the threads are implemented in user space. The kernel does not schedule user threads directly: its scheduling unit is still the process itself, as with traditional processes, and it is unaware that user threads exist. Scheduling among user threads is done by a thread library implemented in user space.

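This model corresponds to the many-to-one thread model described in the Dinosaur book. Its disadvantage is that if one user thread blocks in a system call, the entire process blocks. (Why? Because the kernel schedules only the process as a whole, so once the process blocks, none of its user threads can run.)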
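The many-to-one model can be illustrated with a minimal sketch in Python, in which generators stand in for user threads and a round-robin loop stands in for the user-space thread library. The names `worker` and `run_round_robin` are hypothetical and exist only for this illustration. The kernel sees just the one process running the loop; if any generator made a blocking system call, the loop would stop, and every other user thread with it.

```python
from collections import deque

def worker(name, steps, log):
    """A 'user thread': runs one step, then yields control voluntarily."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # switch back to the user-space scheduler

def run_round_robin(threads):
    """User-space scheduler: the kernel is never involved in a switch."""
    ready = deque(threads)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # resume the user thread
            ready.append(current)  # still runnable: back of the queue
        except StopIteration:
            pass                   # thread finished: drop it

log = []
run_round_robin([worker("A", 2, log), worker("B", 3, log)])
print(log)  # ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

Each switch here is an ordinary function return, which is why operations on pure user threads are so cheap.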

Enhanced user threads: user threads + LWP

This model corresponds to the many-to-many model in the Dinosaur book. The user thread library is still built entirely in user space, so user threads remain very cheap and you can create as many as you need. The operating system provides LWPs as a bridge between user threads and kernel threads. As before, each LWP is backed by a kernel thread and is a kernel scheduling unit, and a user thread's system calls go through an LWP, so one user thread blocking does not stop the whole process. The user thread library associates the user threads it creates with LWPs, and the number of LWPs need not match the number of user threads. When the kernel schedules an LWP, the user thread currently associated with that LWP runs.

Many documents say that a lightweight process is a thread. This is not entirely accurate: as the analysis above shows, a lightweight process is a thread only when user threads are implemented entirely by lightweight processes.


What Linux implements is a "one-to-one" thread model based on kernel lightweight processes: one thread entity corresponds to one kernel lightweight process, and thread management is implemented in a function library outside the kernel (the familiar pthread library). The Linux kernel has never had a concept of a thread. Each execution entity is a task_struct structure, usually called a process.

A process is an execution unit that maintains the dynamic resources associated with execution and, at the same time, references the static resources the program needs. When a child process is created through the clone system call, it can selectively share the resources referenced by its parent. Such child processes are often called lightweight processes, as described above, and are also called kernel threads.

[Aside] The difference between fork and vfork
With fork, the child process is a copy of the parent. The child receives the parent's data segment and stack segment, not as shared memory but as separately allocated memory. Initially, however, that memory is in fact shared: Linux uses copy-on-write, so the child starts out sharing the parent's data segment and a page is copied only when it is written. With fork, the resources that always end up private to the child are the task_struct, the kernel-space stack (copy_thread), the page tables, and so on. With vfork, the child is guaranteed to run first, so the parent's virtual address space, including the user-space stack, is not copied because there is no need; this corresponds to clone(CLONE_VFORK|CLONE_VM|SIGCHLD, 0).

In the 2.4 kernel, do_fork is the code shared by the fork, vfork, and clone system calls. Its core flow is as follows:
1) By default, all resources are temporarily shared and nothing is copied.
2) For each resource whose sharing flag is not set in flags (the corresponding bit is 0), a deep copy is performed. The resources are files, fs, sighand, and mm. Take CLONE_FILES as an example. With fork the bit is 0, so the file descriptor table is copied and parent and child each get their own table: opening or closing a descriptor in one does not affect the other, although the copied descriptors still refer to the same open files. When the bit is 1, parent and child share the descriptor table itself, so any change made by the child is immediately visible in the parent. That sounds like a recipe for chaos, much like sharing the whole address space under vfork's CLONE_VM flag. Do not worry: do_fork guarantees that under vfork the child runs first, and the parent resumes only after the child has called exec or exited. The mm resource is a special case: even when it is not shared (CLONE_VM is 0, as with fork), memory is not copied immediately. Instead, only the page tables are copied and their entries are marked write-protected, so that whichever process writes to a page first triggers a copy of that page. This is how fork ends up with independent resources.
3) Copy the kernel stack (which is separate from the user-space virtual memory).
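The visible effect of fork's separate address spaces can be checked with a short Python sketch. It assumes a Unix-like system where `os.fork` is available, and the helper name `fork_and_modify` is hypothetical. It shows only the outcome (a write in the child does not reach the parent), not the page-level copy-on-write mechanism itself, which is invisible to user code.

```python
import os

def fork_and_modify():
    """Fork a child that appends to a list; return (parent_len, child_len)."""
    data = [1, 2, 3]
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: this write lands in the child's own copy of the memory.
        os.close(read_fd)
        data.append(99)
        os.write(write_fd, str(len(data)).encode())
        os._exit(0)  # exit at once, skipping the parent's cleanup code
    # Parent: read what the child reported, then reap the child.
    os.close(write_fd)
    child_len = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    return len(data), child_len

parent_len, child_len = fork_and_modify()
print(parent_len, child_len)  # 3 4
```

The pipe is needed precisely because the two processes no longer share memory after the fork: it is the only channel through which the child can report its view back to the parent.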


User-space threads are implemented by the pthread library. With pthread, each task_struct corresponds to one thread from the user's point of view, and a group of threads together with the set of resources they jointly reference forms a process. However, it is not enough for a group of threads merely to reference the same resources; they must also be treated as a single whole.

POSIX thread implementations must meet the following requirements:
1. when the process list is viewed, a related group of task_structs should appear as a single entry;
2. a signal sent to the "process" (via the kill system call) is shared by the corresponding group of task_structs and handled by any one of the "threads";
3. a signal sent to a "thread" (via pthread_kill) is received only by the corresponding task_struct and handled by that thread itself;
4. when the "process" is stopped or continued (via the SIGSTOP/SIGCONT signals), the state of the whole group of task_structs changes;
5. when the "process" receives a fatal signal (for example, SIGSEGV caused by a segmentation fault), the whole group of task_structs exits;
6. this list may not be complete.

Before Linux 2.6, the pthread library was implemented by a library called LinuxThreads. LinuxThreads uses the lightweight processes described earlier to implement threads, but of the POSIX requirements above it met only the fifth; the rest it could not satisfy (and in fact had no way to):
1. if a program creates 10 threads, running the ps command in a shell shows 11 processes instead of 1 (note: 11, not 10; the reason is explained below);
2. whether sent with kill or pthread_kill, a signal is received by only one corresponding thread;
3. the SIGSTOP/SIGCONT signals act on only a single thread.

Fortunately LinuxThreads did meet the fifth requirement, which I think is the most important one. If one thread crashed while the rest of the process carried on as if nothing had happened, many inconsistent states could arise; the process would no longer be a whole, and the threads could hardly be called threads. Perhaps that is why LinuxThreads, so far from the POSIX requirements, could exist and be used for several years. However, LinuxThreads paid a high price to meet this fifth requirement, and doing so created its major performance bottleneck.

Next, why does ps show 11 processes when a program creates 10 threads? Because LinuxThreads automatically creates a management thread, and the fifth requirement above is met through it. When the program starts running, no management thread exists (a program linked against the pthread library does not necessarily use multiple threads). The first time the program calls pthread_create, LinuxThreads finds that the management thread does not exist and creates it. The management thread is a child of the first thread (the main thread) in the process.
pthread_create then sends a command through a pipe to the management thread, telling it to create the thread. In other words, every thread except the main thread is created by the management thread, which is their parent. So when any child thread exits, the management thread receives the SIGUSR1 signal (specified when the child thread is created through clone). In the corresponding signal handler, the management thread checks whether the child thread exited normally; if not, it kills all threads and then kills itself.
What about the main thread? The main thread is the parent of the management thread, and its exit does not notify the management thread. The management thread therefore checks its parent's process ID with getppid in its main loop; if the ID is 1, the parent has exited and the management thread has been reparented to the init process (process 1). In that case the management thread also kills all child threads and then kills itself. And what if the main thread exits deliberately by calling pthread_exit? According to the POSIX standard, the other threads should keep running in this case. So in LinuxThreads, a main thread that calls pthread_exit does not really exit: it blocks inside pthread_exit until all child threads have exited, and only then does pthread_exit let the main thread exit. (While waiting, the main thread stays asleep.)

As this shows, thread creation and destruction are both carried out by the management thread, which makes it the performance bottleneck of LinuxThreads. Each creation or destruction requires one round of interprocess communication and one context switch before the management thread can act, and multiple requests are executed serially by the management thread.
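The serialization described above can be modeled with a toy sketch in Python. This is not LinuxThreads code: a `queue.Queue` stands in for the pipe, a Python thread named "manager" stands in for the management thread, and the function name `run_with_manager` is hypothetical. The point is only that every creation request funnels through one consumer and is handled one at a time.

```python
import queue
import threading

def run_with_manager(n_threads):
    """Create n_threads workers via a single manager; return the creation log."""
    requests = queue.Queue()   # stands in for the pipe to the manager
    created = []               # (creator thread name, worker index), in order
    workers = []

    def manager():
        # The manager handles creation requests strictly one at a time.
        while True:
            index = requests.get()
            if index is None:   # sentinel: no more requests
                break
            t = threading.Thread(target=lambda: None, name=f"worker-{index}")
            created.append((threading.current_thread().name, index))
            t.start()
            workers.append(t)

    mgr = threading.Thread(target=manager, name="manager")
    mgr.start()
    for i in range(n_threads):  # each "pthread_create" is just a message
        requests.put(i)
    requests.put(None)
    mgr.join()
    for t in workers:
        t.join()
    return created

print(run_with_manager(3))  # [('manager', 0), ('manager', 1), ('manager', 2)]
```

However many callers issue requests at once, the log always shows a single creator working through them in order, which is the bottleneck in miniature.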

NPTL (Native POSIX Thread Library)

With Linux 2.6, glibc gained a new pthread library, NPTL. NPTL satisfies all five of the POSIX requirements listed earlier. In truth, though, this is less NPTL's achievement than the Linux kernel's.

In Linux 2.6 the kernel has the concept of a thread group, and a tgid (thread group ID) field was added to the task_struct structure. If a task is a "main thread", its tgid equals its own PID; otherwise its tgid equals the PID of the process (that is, the PID of the main thread). In addition, each thread has its own PID. In the clone system call, passing the CLONE_THREAD flag sets the tgid of the new task to the tgid of its parent (otherwise the new task's tgid is set to its own PID). task_struct holds two more IDs of this kind: task->signal->pgid stores the PID of the process group leader, and task->signal->session stores the PID of the session leader. These two IDs link tasks to process groups and sessions.

With tgid, the kernel and the relevant shell programs know whether a task_struct represents a process or a thread, and so know when to show it and when not to (in ps, for example, threads are not shown). The getpid (get process ID) system call returns the tgid in task_struct, while the pid in task_struct is returned by the gettid system call. Hiding child threads from ps does cause some surprises. For example, suppose the program a.out creates one thread when it runs, the main thread's PID is 10001, and the child thread's is 10002 (both have tgid 10001). If you now run kill 10002, both threads, 10001 and 10002, are killed together, even though ps shows no process 10002 at all. If you do not know the story behind Linux threads, it will feel like something supernatural has happened.
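The split between the process ID and the per-thread ID can be observed from user space. The sketch below uses Python's `os.getpid()` and `threading.get_native_id()` (available since Python 3.8), which reports the thread ID assigned by the kernel; the helper name `collect_ids` is hypothetical. On a Linux system with NPTL one would expect every thread to report the same `getpid()` value and a different native ID, with the main thread's native ID equal to the process ID.

```python
import os
import threading

def collect_ids(n_threads=2):
    """Return (pid, tid) as seen from the main thread and from n_threads others."""
    ids = [(os.getpid(), threading.get_native_id())]  # main thread first
    lock = threading.Lock()
    start = threading.Barrier(n_threads)  # keep all threads alive together

    def record():
        start.wait()
        with lock:
            ids.append((os.getpid(), threading.get_native_id()))

    threads = [threading.Thread(target=record) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids

ids = collect_ids()
for pid, tid in ids:
    print(f"getpid()={pid}  native thread id={tid}")
```

The barrier makes sure the threads exist at the same time, so the kernel cannot hand the same thread ID to two of them in turn.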

To handle both "signals sent to the process" and "signals sent to a thread", task_struct maintains two sets of signal_pending: one shared by the thread group and one private to the thread. A signal sent with kill is placed in the signal_pending shared by the thread group and can be handled by any thread in the group. A signal sent with pthread_kill (pthread_kill is the pthread library interface; the corresponding system call is tkill) is placed in the thread's private signal_pending and can be handled only by that thread.

When a thread stops/continues, or if it receives a fatal signal, the kernel applies the processing action to the entire thread group.

NGPT (Next Generation POSIX Threads)

Both thread libraries described above use kernel-level threads (each thread corresponds to one scheduling entity in the kernel). This is called the 1:1 model (1 thread corresponds to 1 kernel-level thread). NGPT instead aims to implement the M:N model (M threads correspond to N kernel-level threads), which means that several threads may run on the same execution entity.
The thread library has to build several abstract execution entities on top of each execution entity provided by the kernel and schedule among them. Such an abstract execution entity is called a user-level thread. Broadly, this can be done by giving each user-level thread its own stack and switching context with longjmp. (Search for "setjmp/longjmp" for details.)
In practice, though, there are many details to handle. NGPT does not appear to have implemented all of its planned features, and there seems to be no plan to implement them for now.

Switching between user-level threads is clearly faster than switching between kernel-level threads: the former may be a simple long jump, while the latter requires saving and loading registers and entering and leaving kernel mode. (A process switch additionally has to switch address spaces, among other things.) User-level threads cannot take advantage of multiple processors, because multiple user-level threads map onto one kernel-level thread, and a kernel-level thread can run on only one processor at a time. However, the M:N thread model does offer a way for threads that do not need to run in parallel to run as several user-level threads on a single kernel-level thread, which saves them the switching overhead.
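As a rough illustration of the M:N idea, the following Python sketch multiplexes M generator-based user-level threads over N ordinary Python threads, which play the role of kernel-level threads. It is a scheduling sketch under stated assumptions, not NGPT's design and not a performance demonstration; the names `user_thread` and `run_m_on_n` are hypothetical.

```python
import queue
import threading

def user_thread(name, steps):
    """A user-level thread: each yield is a voluntary switch point."""
    for i in range(steps):
        yield f"{name}:{i}"

def run_m_on_n(user_threads, n_kernel_threads):
    """Multiplex M user-level threads over N kernel-level threads."""
    ready = queue.Queue()
    for ut in user_threads:
        ready.put(ut)
    results = []
    lock = threading.Lock()

    def kernel_thread():
        while True:
            try:
                ut = ready.get_nowait()   # pick any runnable user thread
            except queue.Empty:
                return                    # nothing runnable right now
            try:
                step = next(ut)           # run it up to its next yield
            except StopIteration:
                continue                  # user thread finished
            with lock:
                results.append(step)
            ready.put(ut)                 # runnable again, on any worker

    workers = [threading.Thread(target=kernel_thread)
               for _ in range(n_kernel_threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results

steps = run_m_on_n([user_thread(n, 2) for n in "ABC"], n_kernel_threads=2)
print(sorted(steps))  # ['A:0', 'A:1', 'B:0', 'B:1', 'C:0', 'C:1']
```

The order in which steps complete depends on how the two workers interleave, so the output is sorted before printing; a user-level thread may resume on a different worker than the one that ran its previous step.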

Some Unix-like systems (such as Solaris) are said to have implemented fairly mature M:N thread models, which have some advantages over Linux threads.

References:

Linux Kernel Design and Implementation, 3.4: Threads implemented in Linux

Linux Kernel Source Code Scenario Analysis, 4.3: System calls fork, vfork, and clone

APUE, 8.4: The vfork function

Linux Threading Implementation Mechanism Analysis

Some notes about processes, threads, and lightweight processes

Implementation of Linux Threads
