I. Basic Knowledge: threads and processes
As textbooks define them, a process is the smallest unit of resource management and a thread is the smallest unit of program execution. In operating system design, the main purposes of the evolution from processes to threads were to better support SMP and to reduce context-switching overhead.
Whichever way the division is made, a process needs at least one thread as its instruction execution body. The process manages resources (such as CPU, memory, and files), while the thread is what gets assigned to a CPU for execution. A process can of course have multiple threads. If such a process runs on an SMP machine, it can use several CPUs to execute its threads at the same time, achieving the greatest possible parallelism and improving efficiency. Even on a single-CPU machine, designing a program with the multi-threaded model, just as the multi-process model once replaced the single-process model, makes the design more concise, the functionality more complete, and execution more efficient. Using multiple threads to respond to multiple inputs is one example. The multi-process model can implement the same functionality, but the context-switching overhead of threads is much lower than that of processes, and semantically, responding to several inputs at once is a case of sharing every resource except the CPU.
Corresponding to these two main purposes of the thread model, two thread models were developed: kernel-level threads and user-level threads. The classification criterion is mainly whether the thread scheduler sits inside or outside the kernel. The former is better suited to using multiple processors concurrently, while the latter pays more attention to context-switching overhead. Current commercial systems usually combine the two: they provide kernel threads to meet the needs of SMP systems, and they also support a second thread mechanism implemented in user space by a thread library, in which one kernel thread acts as the scheduler for several user-space threads. As with many technologies, a "hybrid" usually brings higher efficiency but also greater implementation difficulty. Following its "keep it simple" design philosophy, Linux never planned to implement the hybrid model, yet its implementation does take a "hybrid" approach of another kind.
As for the concrete implementation of the thread mechanism, threads can be implemented inside the operating system kernel or outside it. The latter obviously requires that the kernel implement at least processes, while the former generally requires that the kernel support processes as well. The kernel-level thread model obviously needs the former kind of support, whereas the user-level thread model is not necessarily built on the latter. This discrepancy arises because the two classifications use different criteria.
When the kernel supports both processes and threads, a "many-to-many" thread-to-process model becomes possible: a thread of a process is scheduled by the kernel and, at the same time, serves as the scheduler for a pool of user-level threads, choosing a suitable user-level thread to run in its own space. This is the "hybrid" thread model mentioned above, which meets the needs of multiprocessor systems while keeping scheduling overhead to a minimum. Most commercial operating systems (such as Digital UNIX, Solaris, and IRIX) use this model, which can fully implement the POSIX 1003.1c standard. Threads implemented outside the kernel fall into "one-to-one" and "many-to-one" models. The former maps each thread to one kernel process (possibly a lightweight process), so thread scheduling is equivalent to process scheduling and is left to the kernel. The latter implements multithreading entirely outside the kernel, with scheduling also done in user space; this is the user-level thread model described above. Such a scheduler outside the kernel only has to switch the thread's running stack, so its scheduling overhead is very small. However, because kernel signals (synchronous or asynchronous) are delivered per process, they cannot be directed at a particular thread, and this kind of implementation cannot take advantage of multiprocessor systems, a need that keeps growing. In practice, therefore, pure user-level thread implementations have almost disappeared outside algorithm research.
The Linux kernel supports only lightweight processes, which restricts the implementation of more efficient thread models. Linux does, however, optimize the scheduling overhead of processes, which makes up for this shortcoming to some extent. LinuxThreads, currently the most popular thread mechanism, uses the "one-to-one" thread-to-process model: scheduling is left to the kernel, while a thread management mechanism that includes signal handling is implemented at user level. How the combination of Linux and LinuxThreads operates is the focus of this article.
II. Implementation of lightweight processes in the Linux 2.4 kernel
The original definition of a process has three parts: program, resources, and execution. The program usually means the code; resources, at the operating system level, usually include memory, I/O, and signal handling; and the execution of the program is usually understood as the execution context, including its use of the CPU, which later developed into the thread. Before the thread concept appeared, operating system designers, seeking to reduce process-switching overhead, gradually revised the concept of a process: the resources a process occupies could be separated from the process itself, and certain processes were allowed to share some resources, such as files, signals, data memory, and even code. This gave rise to the concept of the lightweight process. Linux has implemented lightweight processes since kernel 2.0.x. An application uses the single clone() system call interface and, through its parameters, specifies whether to create a lightweight process or an ordinary process. In the kernel, clone() interprets its parameters and passes them on to do_fork(), the kernel function that is also the final implementation of the fork() and vfork() system calls:
<linux-2.4.20/kernel/fork.c>
int do_fork(unsigned long clone_flags, unsigned long stack_start,
            struct pt_regs *regs, unsigned long stack_size)
clone_flags is the bitwise OR of the following macros:
<linux-2.4.20/include/linux/sched.h>
#define CSIGNAL       0x000000ff /* signal mask to be sent at exit */
#define CLONE_VM      0x00000100 /* set if VM shared between processes */
#define CLONE_FS      0x00000200 /* set if fs info shared between processes */
#define CLONE_FILES   0x00000400 /* set if open files shared between processes */
#define CLONE_SIGHAND 0x00000800 /* set if signal handlers and blocked signals shared */
#define CLONE_PID     0x00001000 /* set if pid shared */
#define CLONE_PTRACE  0x00002000 /* set if we want to let tracing continue on the child too */
#define CLONE_VFORK   0x00004000 /* set if the parent wants the child to wake it up on mm_release */
#define CLONE_PARENT  0x00008000 /* set if we want to have the same parent as the cloner */
#define CLONE_THREAD  0x00010000 /* Same thread group? */
#define CLONE_NEWNS   0x00020000 /* New namespace group? */
#define CLONE_SIGNAL  (CLONE_SIGHAND | CLONE_THREAD)
In do_fork(), different clone_flags lead to different behavior. LinuxThreads calls clone() with (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND) to create a "thread", which means shared memory, a shared file system access count, a shared file descriptor table, and shared signal handling. The following subsections describe how the Linux kernel implements the sharing of each of these resources.
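The effect of this flag combination can be observed from user space. The following is a minimal sketch, assuming a modern Linux with glibc's clone() wrapper rather than the 2.4-era environment of the article. The names run_shared_clone() and child_main() and the 64 KB stack size are chosen for illustration only. SIGCHLD is added to the flags solely so that the parent can reap the child with an ordinary waitpid().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024) /* illustrative size, not from the article */

/* Written by the child, read by the parent; shared because of CLONE_VM. */
static volatile int shared_value = 0;

/* Entry point of the lightweight process: store the argument and exit. */
static int child_main(void *arg)
{
    assert(arg != NULL);
    shared_value = *(int *)arg;
    return 0;
}

/*
 * Create a lightweight process with the flag set the article attributes to
 * LinuxThreads, wait for it, and return what the parent then sees in
 * shared_value. Returns -1 on failure.
 */
int run_shared_clone(int value)
{
    shared_value = 0;

    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downward on most architectures, so pass its top. */
    int flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD;
    pid_t pid = clone(child_main, stack + CHILD_STACK_SIZE, flags, &value);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    /* The child's stack may only be freed once the child has exited. */
    int status;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);
    if (waited != pid)
        return -1;
    return shared_value;
}
```

Calling run_shared_clone(42) should return 42, because the child's store lands in the same memory the parent reads.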
1. CLONE_VM
do_fork() calls copy_mm() to set the mm and active_mm fields of task_struct. These two mm_struct pointers correspond to the memory space associated with the process. If CLONE_VM is specified for do_fork(), copy_mm() sets mm and active_mm in the new task_struct to the same values as in current and increments the user count of that mm_struct (mm_struct::mm_users). In other words, the lightweight process shares the memory address space of its parent. The figure shows the state of mm_struct within the process:
2. CLONE_FS
In task_struct, fs (struct fs_struct *) records the root directory and current directory of the file system the process is in. During do_fork(), copy_fs() is called to copy this structure. For a lightweight process, only fs->count is incremented, and the same fs_struct is shared with the parent. In other words, a lightweight process has no independent file system information of its own: when any thread in the process changes the current directory, the root directory, or other such information, the change directly affects the other threads.
3. CLONE_FILES
A process may have files open. In the process structure task_struct, files (struct files_struct *) holds the information about the file structures (struct file) the process has opened, and do_fork() calls copy_files() to handle this attribute. A lightweight process shares the structure with its parent: copy_files() only increments files->count. This sharing lets any thread access the open files maintained by the process, and operations on them are immediately visible to the other threads in the process.
4. CLONE_SIGHAND
Every Linux process can define its own handling for signals. In task_struct, sig (struct signal_struct) holds an array of struct k_sigaction that stores this configuration, and copy_sighand() in do_fork() is responsible for copying it. A lightweight process does not copy it; only the signal_struct::count counter is incremented, and the structure is shared with the parent. In other words, the child and the parent handle signals in exactly the same way, and either can change the handling for both.
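The four cases above follow one pattern: with the sharing flag set, the child points at the parent's structure and a counter is incremented; otherwise the structure is duplicated. The following is a simplified user-space model of that decision, not kernel code. struct fs_model, struct task_model, copy_fs_model(), and MODEL_CLONE_FS are hypothetical names introduced for illustration, and the real kernel structures carry far more state and use atomic counters.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_CLONE_FS 0x00000200UL /* same value as CLONE_FS in the listing */

/* Hypothetical, heavily simplified stand-in for struct fs_struct. */
struct fs_model {
    int count;    /* number of tasks sharing this structure */
    char cwd[64]; /* current directory */
};

/* Hypothetical stand-in for the one task_struct field that matters here. */
struct task_model {
    struct fs_model *fs;
};

/*
 * Model of the share-or-copy decision. With the sharing flag set, the child
 * points at the parent's structure and the count goes up; otherwise the
 * child receives a private copy whose count starts at 1.
 * Returns 0 on success, -1 if the copy cannot be allocated.
 */
int copy_fs_model(unsigned long clone_flags,
                  const struct task_model *parent, struct task_model *child)
{
    assert(parent != NULL && parent->fs != NULL && child != NULL);

    if (clone_flags & MODEL_CLONE_FS) {
        parent->fs->count++;
        child->fs = parent->fs;
        return 0;
    }

    struct fs_model *copy = malloc(sizeof *copy);
    if (copy == NULL)
        return -1;
    *copy = *parent->fs;
    copy->count = 1;
    child->fs = copy;
    return 0;
}
```

In the shared case a later change to cwd through either task is visible to both, which is the behavior the subsections describe for directories, open files, and signal handlers.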
do_fork() does a great deal of other work that is not described in detail here. On SMP systems, every newly forked process stays assigned to the same CPU as its parent until it is scheduled.
Although Linux supports lightweight processes, it cannot be said to support kernel-level threads, because Linux "threads" and "processes" actually sit at the same scheduling level and share one process identifier space. This restriction makes it impossible to implement the POSIX thread mechanism in full on Linux, so the many Linux thread library implementations can only realize the great majority of POSIX semantics and approximate the rest as closely as possible.
III. Thread mechanism of LinuxThreads
LinuxThreads is currently the most widely used thread library on Linux. It was developed by Xavier Leroy (Xavier.Leroy@inria.fr) and is distributed bundled with glibc. It implements the "one-to-one" thread model on top of the kernel's lightweight processes: each thread entity corresponds to one kernel lightweight process, and management among threads is implemented in the library, outside the kernel.
1. Thread description data structure and implementation restrictions
LinuxThreads defines the data structure struct _pthread_descr_struct to describe a thread and uses the global array __pthread_handles to describe and reference the threads of a process. In the first two entries of __pthread_handles, LinuxThreads defines two global system threads, __pthread_initial_thread and __pthread_manager_thread, and uses __pthread_main_thread to denote the parent thread of __pthread_manager_thread (initially __pthread_initial_thread).
struct _pthread_descr_struct is a doubly linked list structure. The list that __pthread_manager_thread is on contains only that one element. In fact, __pthread_manager_thread is a special thread, and LinuxThreads uses only three of its fields: errno, p_pid, and p_priority. The list that __pthread_main_thread is on links together all user threads in the process. The __pthread_handles array that results from a series of pthread_create() calls is shown below:
Figure 2 __pthread_handles array structure
A newly created thread first takes an entry in the __pthread_handles array and is then linked, through the chain pointers in its data structure, into the list headed by __pthread_main_thread. The use of this list comes up again in the discussion of thread creation and release.
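The two-step registration just described (take a slot in the array, then link into the list) can be sketched as follows. This is an illustrative model, not the library's code: struct descr_model, handles_model, descr_init(), descr_register(), and descr_count() are hypothetical names, the descriptor is reduced to its two chain pointers, and the list is assumed to be circular with new threads appended before the head.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_THREADS_MAX 1024 /* the article's value for PTHREAD_THREADS_MAX */

/* Hypothetical, reduced thread descriptor: only the two chain pointers. */
struct descr_model {
    struct descr_model *next_live;
    struct descr_model *prev_live;
};

/* Hypothetical stand-in for the global handle table. */
static struct descr_model *handles_model[MODEL_THREADS_MAX];

/* Make a descriptor the only element of its own circular list. */
void descr_init(struct descr_model *d)
{
    assert(d != NULL);
    d->next_live = d;
    d->prev_live = d;
}

/*
 * Take the first free slot at or after index 2 (slots 0 and 1 belong to the
 * initial and manager threads), then link the descriptor in just before the
 * list head. Returns the slot index, or -1 if the table is full.
 */
int descr_register(struct descr_model *head, struct descr_model *d)
{
    assert(head != NULL && d != NULL);
    for (int slot = 2; slot < MODEL_THREADS_MAX; slot++) {
        if (handles_model[slot] == NULL) {
            handles_model[slot] = d;
            d->prev_live = head->prev_live;
            d->next_live = head;
            head->prev_live->next_live = d;
            head->prev_live = d;
            return slot;
        }
    }
    return -1;
}

/* Count the descriptors on the circular list that starts at head. */
int descr_count(const struct descr_model *head)
{
    assert(head != NULL);
    int n = 1;
    for (const struct descr_model *p = head->next_live; p != head;
         p = p->next_live)
        n++;
    return n;
}
```

Starting from an initialized main-thread descriptor, two calls to descr_register() occupy slots 2 and 3 and leave three descriptors on the list.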
LinuxThreads follows the POSIX 1003.1c standard, which sets limits on the scope of a thread library implementation, such as the maximum number of threads per process and the size of the thread-private data area. The LinuxThreads implementation basically keeps to these limits but also changes some of them, generally relaxing or extending them to make programming more convenient. The macros that define the limits are mostly collected in sysdeps/unix/sysv/linux/bits/local_lim.h (the file location differs between platforms) and include the following:
The number of private data keys per process: POSIX defines _POSIX_THREAD_KEYS_MAX as 128, and LinuxThreads sets PTHREAD_KEYS_MAX to 1024. The number of operations allowed when private data is released: LinuxThreads is consistent with POSIX and defines PTHREAD_DESTRUCTOR_ITER. The number of threads per process: POSIX defines it as 64, and LinuxThreads raises it to 1024 (PTHREAD_THREADS_MAX). The minimum size of a thread's running stack: POSIX does not specify it, and LinuxThreads uses PTHREAD_STACK_MIN, 16384 (bytes).
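On a running system, the corresponding limits can be queried at run time rather than read from the header. The following is a small sketch using the POSIX sysconf() interface; struct thread_limits and query_thread_limits() are names introduced here for illustration, and the values returned depend on the installed C library, so they need not match the LinuxThreads figures quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* The four limits discussed above, as reported by the running system. */
struct thread_limits {
    long keys_max;              /* thread-specific data keys per process */
    long destructor_iterations; /* destructor passes at thread exit */
    long threads_max;           /* threads per process */
    long stack_min;             /* minimum thread stack size in bytes */
};

/*
 * Fill in the limits with sysconf(). A field is -1 when the system reports
 * the limit as indeterminate or does not support the query.
 */
void query_thread_limits(struct thread_limits *out)
{
    assert(out != NULL);
    out->keys_max = sysconf(_SC_THREAD_KEYS_MAX);
    out->destructor_iterations = sysconf(_SC_THREAD_DESTRUCTOR_ITERATIONS);
    out->threads_max = sysconf(_SC_THREAD_THREADS_MAX);
    out->stack_min = sysconf(_SC_THREAD_STACK_MIN);
}
```

A conforming system should report at least the POSIX minimum of 128 for keys_max; threads_max is often -1 on systems that impose no fixed per-process limit.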
2. Management thread
One advantage of the "one-to-one" model is that thread scheduling is done by the kernel, while other work, such as thread cancellation and inter-thread synchronization, is done in the thread library outside the kernel. LinuxThreads constructs a dedicated management thread for each process to handle thread-related management work. When the process calls pthread_create() for the first time, the management thread is created (with __clone()) and started.
Within a process, the management thread communicates with the other threads through a "management pipe" (manager_pipe[2]). The pipe is created before the management thread. Once the management thread has started successfully, the read and write ends of the pipe are assigned to the two global variables __pthread_manager_reader and __pthread_manager_request, respectively. From then on, every user thread sends its requests to the management thread through __pthread_manager_request. The management thread itself does not use __pthread_manager_reader directly: the read end of the pipe (manager_pipe[0]) is passed to it as one of the arguments of __clone(). The management thread's main job is to listen on the read end of the pipe and respond to the requests it takes from it.
The process for creating a management thread is as follows:
(The initial value of the global variable __pthread_manager_request is -1.)
Figure 3 Process for creating a management thread
After initialization, __pthread_manager_thread records the lightweight process number and the thread ID that is allocated and managed outside the kernel. The value 2 * PTHREAD_THREADS_MAX + 1 does not conflict with the ID of any ordinary user thread. The management thread runs as a child thread of the thread that called pthread_create(), while the user thread that pthread_create() creates is created by the management thread calling clone(), so it is actually a child thread of the management thread. (Here "child thread" should be understood in the sense of a child process.)
__pthread_manager() is the main loop of the management thread. After a series of initialization steps, it enters a while (1) loop. In each iteration, the thread polls (__poll()) the read end of the management pipe with a 2-second timeout. Before handling a request, it checks whether its parent thread (that is, the main thread that created the manager) has exited, and if so, it terminates the whole process. If there are exited child threads that need cleaning up, pthread_reap_children() is called.
It then reads a request from the pipe and performs the corresponding operation (switch-case) according to the request type. The handling of each request is clear from the source code and is not described further here.
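The request path described in this section (a user thread writes a request to the pipe, and the manager polls the read end with a timeout and dispatches on the request type) can be modeled in a few lines. The sketch below is an illustration under assumptions, not the LinuxThreads source: struct request_model, the two request kinds, send_request(), and manager_step() are hypothetical names, and the real manager also performs the parent-exit check and child reaping on each pass.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical request kinds; the real library defines its own set. */
enum req_kind_model { REQ_MODEL_CREATE = 1, REQ_MODEL_FREE = 2 };

/* Hypothetical fixed-size request record sent through the pipe. */
struct request_model {
    enum req_kind_model kind;
    int payload;
};

/* Result of one pass of the manager loop body. */
enum step_result {
    STEP_ERROR = -1,
    STEP_TIMEOUT = 0,
    STEP_CREATED = 1,
    STEP_FREED = 2
};

/* User-thread side: write one whole request to the write end of the pipe. */
int send_request(int write_fd, const struct request_model *req)
{
    assert(req != NULL);
    ssize_t n = write(write_fd, req, sizeof *req);
    return n == (ssize_t)sizeof *req ? 0 : -1;
}

/*
 * Manager side: one pass of the loop body. Wait up to timeout_ms for a
 * request on the read end, then dispatch on its kind.
 */
enum step_result manager_step(int read_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = read_fd, .events = POLLIN, .revents = 0 };

    int ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0)
        return STEP_ERROR;
    if (ready == 0)
        return STEP_TIMEOUT;
    if (!(pfd.revents & POLLIN))
        return STEP_ERROR;

    struct request_model req;
    if (read(read_fd, &req, sizeof req) != (ssize_t)sizeof req)
        return STEP_ERROR;

    switch (req.kind) {
    case REQ_MODEL_CREATE:
        return STEP_CREATED;
    case REQ_MODEL_FREE:
        return STEP_FREED;
    default:
        return STEP_ERROR;
    }
}
```

A manager built on this would call manager_step(read_fd, 2000) inside its while (1) loop. Because the record is far smaller than PIPE_BUF, each write() reaches the pipe as one unit even when several threads send requests.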
3. Thread Stack
In LinuxThreads, the stack of the management thread is separate from the stacks of user threads. The management thread allocates a region of THREAD_MANAGER_STACK_SIZE bytes from the process heap with malloc() and uses it as its own running stack.
Stack allocation for user threads varies by architecture and is governed mainly by two macros. NEED_SEPARATE_REGISTER_STACK is used only on the IA64 platform. FLOATING_STACK is used on a few platforms such as i386; in that case the system decides where a user thread's stack goes and protects it. In addition, the user can supply a custom stack through the thread attribute structure. Because of space limitations, only the two stack organizations used on the i386 platform are analyzed here: the FLOATING_STACK mode and the user-defined mode.
In the FLOATING_STACK mode, LinuxThreads uses mmap() to obtain 8 MB from the kernel (the default maximum stack size on i386, subject to any resource limit (rlimit) in effect) and uses mprotect() to make the first page inaccessible. The 8 MB region is laid out as follows:
Figure 4 stack structure diagram
The protected page at the low-address end is used to detect stack overflow.
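The allocation pattern described for this mode, an anonymous mapping whose lowest page is made inaccessible, can be reproduced with the standard mmap() and mprotect() calls. The following is a minimal sketch under assumptions: alloc_guarded_stack() and free_guarded_stack() are hypothetical helper names, and the code does not place a thread descriptor at the top of the region as the library does.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map `size` bytes of anonymous memory and make the lowest page
 * inaccessible, so that a stack growing down into it faults instead of
 * silently overwriting neighbouring memory. Returns NULL on failure.
 */
void *alloc_guarded_stack(size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && size > (size_t)page);

    void *base = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* The guard page sits at the low-address end of the region. */
    if (mprotect(base, (size_t)page, PROT_NONE) != 0) {
        munmap(base, size);
        return NULL;
    }
    return base;
}

/* Release a region obtained from alloc_guarded_stack(). */
int free_guarded_stack(void *base, size_t size)
{
    return munmap(base, size);
}
```

With size set to 8 MB, the usable stack runs from the top of the region down to the end of the first page; any access to that first page raises SIGSEGV.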
For a stack specified by the user, LinuxThreads aligns the pointer, sets the top of the thread stack, and computes the bottom of the stack, without adding any protection. Correctness is the user's responsibility.
Whichever organization is used, the thread descriptor structure is always located at the top of the stack, immediately adjacent to the stack itself.
4. Thread ID and process ID
Each LinuxThreads thread has both a thread ID and a process ID. The process ID is the process number maintained by the kernel, while the thread ID is allocated and maintained by LinuxThreads.
The thread ID of __pthread_initial_thread is PTHREAD_THREADS_MAX, that of __pthread_manager_thread is 2 * PTHREAD_THREADS_MAX + 1, and that of the first user thread is PTHREAD_THREADS_MAX + 2. The thread ID of the n-th user thread follows the formula:
tid = n * PTHREAD_THREADS_MAX + n + 1
This allocation scheme ensures that no two threads in the process (including threads that have already exited) have the same thread ID. Because the thread ID type pthread_t is defined as unsigned long int, it also ensures that thread IDs will not repeat within any reasonable running time.
Looking up a thread's data structure from its thread ID is done in the pthread_handle() function. In fact, it simply takes the thread ID modulo PTHREAD_THREADS_MAX to obtain the index of the thread in __pthread_handles.
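The formula and the modulo lookup can be checked with a few lines of arithmetic. The sketch below follows the article's formula as stated, with PTHREAD_THREADS_MAX taken as 1024; nth_user_tid() and handle_index() are hypothetical helper names, not functions of the library.

```c
#include <assert.h>

#define MODEL_THREADS_MAX 1024UL /* the article's value for PTHREAD_THREADS_MAX */

/* Thread ID of the n-th user thread (n >= 1), per the article's formula. */
unsigned long nth_user_tid(unsigned long n)
{
    assert(n >= 1);
    return n * MODEL_THREADS_MAX + n + 1;
}

/* Index in the handle table for a given thread ID. */
unsigned long handle_index(unsigned long tid)
{
    return tid % MODEL_THREADS_MAX;
}
```

For the first user thread this gives the ID 1026 and the index 2, directly after the initial thread (ID 1024, index 0) and the manager thread (ID 2049, index 1).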
5. Thread Creation
After pthread_create() sends a REQ_CREATE request to the management thread, the management thread calls pthread_handle_create() to create the new thread. Once the stack has been allocated and the thread attributes set, it calls __clone() with pthread_start_thread() as the entry function to create and start the new thread. pthread_start_thread() reads its own process ID and stores it in the thread descriptor, then configures scheduling according to the scheduling policy recorded there. When everything is ready, it calls the thread's actual function, and after that function returns, it calls pthread_exit() to clean up.
6. Shortcomings of LinuxThreads
Because of Linux kernel restrictions and implementation difficulties, LinuxThreads is not fully POSIX compatible. The README shipped with it describes the issues.
1) process ID
This is the most critical shortcoming, and its cause lies in the "one-to-one" model of LinuxThreads.
The Linux kernel does not support threads in the true sense. LinuxThreads implements thread support with lightweight processes that the kernel schedules exactly as it does ordinary processes. These lightweight processes have independent process IDs and the same capabilities as ordinary processes with respect to scheduling, signal handling, and I/O. From the viewpoint of someone reading the source, this means that clone() in the Linux kernel does not implement support for the CLONE_PID flag.
do_fork() in the kernel handles CLONE_PID as follows:
if (clone_flags & CLONE_PID) {
    if (current->pid)
        goto fork_out;
}
This code shows that the current Linux kernel accepts the CLONE_PID flag only when the PID is 0. In practice, CLONE_PID is used only during SMP initialization, when processes are created by hand.
According to POSIX, all threads of the same process should share one process ID and one parent process ID, which cannot be achieved under the current "one-to-one" model.
2) signal processing problems
Asynchronous signals are dispatched by the kernel per process, and every LinuxThreads thread is a process as far as the kernel is concerned, with no "thread group" implemented. Some semantics therefore do not conform to the POSIX standard. For example, as the README notes, sending a signal to all threads of a process is not implemented.
If the kernel does not provide real-time signals, LinuxThreads uses SIGUSR1 and SIGUSR2 as its internal restart and cancel signals, so applications cannot use these two signals that were originally reserved for users. Linux kernels after 2.1.60 support extended real-time signals (from __SIGRTMIN to __SIGRTMAX), so the problem does not arise there.
The default actions of some signals are hard to implement in the current architecture. With SIGSTOP and SIGCONT, for example, LinuxThreads can suspend only a single thread, not the whole process.
3) Total number of threads
LinuxThreads sets the maximum number of threads per process to 1024, but in practice this number is also limited by the total number of processes in the system, because each thread is actually a kernel process.
Kernel 2.4.x uses a new method to compute the total number of processes, which makes that total depend essentially on the amount of physical memory. The formula is in the fork_init() function of kernel/fork.c:
max_threads = mempages / (THREAD_SIZE/PAGE_SIZE) / 8
On i386, THREAD_SIZE = 2 * PAGE_SIZE and PAGE_SIZE = 2^12 (4 KB), and mempages = physical memory size / PAGE_SIZE. For a machine with 256 MB of memory, mempages = 256 * 2^20 / 2^12 = 256 * 2^8, so the maximum number of threads is 4096.
However, to ensure that the processes of any one user (other than root) cannot take up more than half of physical memory, fork_init() goes on to set:
init_task.rlim[RLIMIT_NPROC].rlim_cur = max_threads/2;
init_task.rlim[RLIMIT_NPROC].rlim_max = max_threads/2;
These process counts are checked in do_fork(). For LinuxThreads, therefore, the total number of threads is subject to all three of these limits at once.
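The arithmetic above can be reproduced directly. The following sketch models the calculation with the i386 constants given in the text; the function names and the MODEL_ macros are introduced here for illustration, and model_thread_ceiling() is a simplification that ignores every other process the user or the system is already running.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE   4096UL                /* 2^12 bytes on i386 */
#define MODEL_THREAD_SIZE (2 * MODEL_PAGE_SIZE) /* per the text above */

/* System-wide limit, following the fork_init() formula quoted above. */
unsigned long model_max_threads(unsigned long mem_bytes)
{
    unsigned long mempages = mem_bytes / MODEL_PAGE_SIZE;
    return mempages / (MODEL_THREAD_SIZE / MODEL_PAGE_SIZE) / 8;
}

/* Per-user limit (RLIMIT_NPROC) for users other than root. */
unsigned long model_nproc_limit(unsigned long mem_bytes)
{
    return model_max_threads(mem_bytes) / 2;
}

/*
 * Smallest of the three limits for a non-root user: the library's own
 * per-process maximum, the per-user limit, and the system-wide limit.
 */
unsigned long model_thread_ceiling(unsigned long mem_bytes,
                                   unsigned long lib_max)
{
    assert(lib_max > 0);
    unsigned long limit = model_nproc_limit(mem_bytes);
    unsigned long system_max = model_max_threads(mem_bytes);
    if (system_max < limit)
        limit = system_max;
    return lib_max < limit ? lib_max : limit;
}
```

For 256 MB this gives a system-wide limit of 4096 and a per-user limit of 2048, so the library's 1024 is the binding limit. On a 64 MB machine the per-user limit drops to 512 and becomes the binding one.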
4) Management thread Problems
The management thread easily becomes a bottleneck, a problem common to this kind of structure. The management thread is also responsible for cleaning up after user threads. Although it blocks most signals, if the management thread does die, user threads have to be cleaned up by hand; moreover, user threads are unaware of the management thread's state, so subsequent requests such as thread creation will go unhandled.
5) synchronization problems
Thread synchronization in LinuxThreads is largely built on signals. Going through the kernel's complex signal handling mechanism for synchronization has always been an efficiency problem.
6) Other POSIX compatibility issues
Many Linux system calls are, by their semantics, process-wide, such as nice, setuid, and setrlimit. In the current LinuxThreads, these calls affect only the calling thread.
7) Real-time problems
Threads were introduced partly with real-time use in mind, but LinuxThreads does not support this for now; scheduling options, for example, have not yet been implemented. This is not limited to LinuxThreads: standard Linux itself gives little consideration to real-time behavior.