Careful Use of fork in Multithreaded Programs

Objective
In the single-core era, programs were typically single-process and single-threaded. As hardware moved into the multi-core era, multi-process programming gradually became an accepted way to reduce response time and make full use of multi-core CPU resources. However, because creating a process is relatively expensive, multithreaded programming gradually became the more widely recognized and popular technique.
I remember that when I first learned about processes and threads, I wondered why people so rarely combine multi-process and multithreaded programming; wouldn't combining them be even better? Looking back, that was too young, too simple. This article discusses why.
Process and Thread Model
The classic definition of a process is an instance of a program in execution. Every program in the system runs in the context of some process. The context consists of the state the program needs to run correctly, which includes the program's code and data in memory, its stack, the contents of the general-purpose registers, the program counter (PC), environment variables, and the set of open file descriptors.
A process provides two key abstractions to the application:
- An independent logical control flow, which gives the illusion that our program has exclusive use of the processor.
- A private virtual address space, which gives the illusion that our program has exclusive use of the memory system.
A thread is a logical flow that runs in the context of a process. Threads are scheduled automatically by the kernel. Each thread has its own thread context, including a unique integer thread ID, a stack, a stack pointer, a program counter (PC), general-purpose registers, and condition codes. Each thread shares the rest of the process context with the other threads running in the same process, including the entire user virtual address space, which consists of read-only text (code), read/write data, the heap, and all shared library code and data areas. The threads also share the set of open files.
In other words, the process is the smallest unit of resource management, and the thread is the smallest unit of program execution.
On Linux, a POSIX thread can be regarded as a lightweight process: pthread_create, which creates a thread, and fork, which creates a process, both boil down to invoking clone in the kernel; they differ only in the options passed when creating the new task, for example whether it shares the virtual address space, file descriptors, and so on.
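As a rough illustration of that relationship, the sketch below uses the glibc clone(2) wrapper directly: passing the sharing flags makes the new task thread-like, while dropping them makes it fork-like. (The exact flag set that pthread_create passes differs across glibc versions; this is only an illustrative approximation under that assumption, not glibc's real code.)

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

/* The new task runs this function; its return value becomes the exit status. */
static int child_fn(void *arg)
{
    (void)arg;
    printf("clone child: running with a shared address space\n");
    return 0;
}

int main(void)
{
    const size_t stack_size = 1024 * 1024;
    char *stack = malloc(stack_size);
    if (stack == NULL) {
        perror("malloc");
        return 1;
    }

    /* Thread-like task: share memory, filesystem info, file table and
     * signal handlers with the caller. Without these flags the new task
     * would get private copies, which is essentially what fork does. */
    int flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD;

    pid_t pid = clone(child_fn, stack + stack_size, flags, NULL);
    if (pid < 0) {
        perror("clone");
        return 1;
    }
    waitpid(pid, NULL, 0);
    free(stack);
    return 0;
}
```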
fork and Multithreading
We know that a child process created by fork is almost, but not exactly, identical to its parent. The child gets an identical (but separate) copy of the parent's user-level virtual address space, including the text, data, and bss segments, the heap, and the user stack. The child also gets copies of all of the parent's open file descriptors, which means it can read and write any file that was open in the parent. The biggest difference between parent and child is that they have different PIDs.
One thing to note, however, is that on Linux fork duplicates only the calling thread into the child process. The fork(2) Linux man page has the relevant description:
The child process is created with a single thread--the one that called fork(). The entire virtual address space of the parent is replicated in the child, including the states of mutexes, condition variables, and other pthreads objects; the use of pthread_atfork(3) may be helpful for dealing with problems that this can cause.
That is, in the child process every thread except the one that called fork simply "evaporates".
This is the root of all the trouble that fork causes in multithreaded programs.
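This is easy to observe with a small experiment (the sleep intervals and message text are arbitrary choices for the demo): a helper thread keeps printing in the parent, but no output from it ever appears on behalf of the child, because only the thread that called fork survives there.

```c
/* build: cc demo.c -pthread */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

/* Helper thread: prints its process ID twice a second. */
static void *ticker(void *arg)
{
    (void)arg;
    for (;;) {
        printf("ticker running in pid %d\n", (int)getpid());
        usleep(500 * 1000);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, ticker, NULL);
    sleep(1);                       /* let the ticker start */

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: if the ticker thread had survived the fork, lines with
         * the child's PID would appear during this sleep. They never do,
         * because only the forking thread was duplicated. */
        sleep(2);
        _exit(0);
    }
    waitpid(pid, NULL, 0);          /* parent keeps printing via its ticker */
    return 0;
}
```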
Mutex Locks
Mutexes are at the heart of most of the problems with fork in multithreaded programs.
On most operating systems, locks are, for performance reasons, implemented largely in user space rather than in the kernel (user space is the cheapest place to do it, basically relying on atomic operations or the memory barriers mentioned in the previous article). So when fork is called, all of the parent process's locks are copied into the child.
Here is the problem. From the operating system's point of view, every lock has a holder, namely the thread that performed the lock operation on it. Suppose one thread locks a mutex before the fork, so the mutex is held, and another thread then calls fork to create a child process. The thread holding the lock does not exist in the child, so from the child's point of view the mutex is locked "forever", because its holder has "evaporated".
A deadlock then occurs as soon as any thread in the child tries to lock that already-held mutex.
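The scenario can be reproduced with a deliberately contrived sketch (the sleeps exist only to force the bad interleaving): one thread holds a mutex across the moment another thread forks, and the child then blocks forever when it tries to lock its inherited copy.

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Holds the mutex for two seconds, straddling the fork below. */
static void *holder(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    sleep(2);
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, holder, NULL);
    sleep(1);                        /* make sure the holder owns the lock */

    if (fork() == 0) {
        /* Child: the holder thread does not exist here, so nobody will
         * ever unlock the inherited, already-locked mutex -> deadlock. */
        printf("child: trying to lock...\n");
        pthread_mutex_lock(&lock);   /* blocks forever */
        printf("child: never printed\n");
        _exit(0);
    }
    pthread_join(tid, NULL);
    return 0;
}
```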
Of course, someone will suggest: before calling fork, have the thread that is about to fork acquire all of the locks, and then release each of them in the forked child. Leaving aside whether the actual business logic even permits this, the approach introduces another problem: it implies a lock ordering, and the child must release the locks in the matching order, or a deadlock can still occur.
Even if you are confident that you can release the locks in the child in the right order without mistakes, there is a hidden problem you cannot control: library functions.
You cannot be sure that every library function you use avoids shared data, that is, that it is completely thread-safe. A significant portion of thread-safe library functions are implemented by holding a mutex internally, for example malloc and printf, which almost every program uses.
For example, a multithreaded program will almost inevitably allocate dynamic memory with malloc before the fork, and the child will almost inevitably need to allocate dynamic memory with malloc after the fork. But that is not safe, because the lock inside malloc may have been held, at the moment of the fork, by a thread that no longer exists in the child.
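Here is a sketch of the classic reproduction. Whether it actually hangs depends on timing and on the allocator: modern glibc in particular protects its malloc across fork with internal handlers, so under that assumption you may need a different allocator or an older libc to observe the hang; the point is only to show the shape of the race.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

/* Worker threads hammer the allocator so that, at the moment of fork,
 * one of them is likely to be inside malloc holding its internal lock. */
static void *alloc_loop(void *arg)
{
    (void)arg;
    for (;;) {
        void *p = malloc(1024);
        free(p);
    }
    return NULL;
}

int main(void)
{
    for (int i = 0; i < 4; i++) {
        pthread_t tid;
        pthread_create(&tid, NULL, alloc_loop, NULL);
    }

    for (int i = 0; i < 1000; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            /* If the allocator lock was held by a now-vanished thread,
             * this malloc blocks forever and the child never exits. */
            void *p = malloc(1024);
            free(p);
            _exit(0);
        }
        waitpid(pid, NULL, 0);   /* hangs here when the child deadlocks */
    }
    printf("no deadlock observed this run\n");
    return 0;
}
```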
exec and file descriptors
Based on the analysis above, it seems the only wise thing for a multithreaded program to do after fork is to call one of the exec functions in the child, even if that feels somewhat limiting. Because the child inherits all of the parent's open file descriptors, it can still read and write the parent's open files until exec runs. But what if you do not want the child to be able to read or write a file that is open in the parent?
Setting the descriptor's flags with fcntl is one approach:
```c
int fd = open("file", O_RDWR | O_CREAT, 0644);
if (fd < 0) {
    perror("open");
}
fcntl(fd, F_SETFD, FD_CLOEXEC);   /* mark the descriptor close-on-exec */
```
However, there is a race here: if another thread forks a child after the open but before fcntl has set the close-on-exec flag, the child can still read and write the file. And if we protect the open/fcntl pair with a lock, we are back to the situation discussed above.
Starting with Linux kernel 2.6.23, we can pass the O_CLOEXEC flag to open, which makes "open the file and set close-on-exec" a single atomic operation. Then, once the forked child executes exec, it can no longer read or write files that were already open in the parent.
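A minimal sketch of the atomic variant (the file name and the exec'd command are placeholders): the descriptor is marked close-on-exec at open time, so even if another thread forks right after the open, the descriptor disappears as soon as the child calls exec.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    /* Open and set the close-on-exec flag in one atomic step
     * (O_CLOEXEC is available since Linux 2.6.23). */
    int fd = open("file", O_RDWR | O_CREAT | O_CLOEXEC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    pid_t pid = fork();
    if (pid == 0) {
        /* The fd is still open right after fork, but it is closed
         * automatically when exec replaces the process image. */
        execlp("ls", "ls", "-l", (char *)NULL);
        _exit(127);                 /* only reached if exec fails */
    }
    waitpid(pid, NULL, 0);
    close(fd);
    return 0;
}
```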
pthread_atfork
If you are unlucky enough to have to tackle the multithreaded-fork problem head on, you can try pthread_atfork:
```c
int pthread_atfork(void (*prepare)(void), void (*parent)(void), void (*child)(void));
```
- The prepare handler is called in the parent process before fork creates the child; its job is to acquire all of the locks defined by the parent.
- The parent handler is called in the parent after fork has created the child but before fork returns; its job is to release all the locks acquired in prepare.
- The child handler is called in the child before fork returns there; like the parent handler, it must release all the locks acquired in prepare.
Because the child inherits copies of the parent's locks, nothing above is unlocked twice: the parent and the child each release their own copy. You can call pthread_atfork multiple times to install multiple sets of fork handlers, but when several sets are installed, they are not all called in the same order: the parent and child handlers run in registration order, whereas the prepare handlers run in reverse registration order. This lets multiple modules register their own handlers while preserving a lock hierarchy (much like the construction and destruction order of nested RAII objects).
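A minimal usage sketch (the single global mutex stands in for whatever locks a real module would protect): the prepare handler takes the lock just before fork, and the parent and child handlers each release their own copy afterwards.

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

static pthread_mutex_t module_lock = PTHREAD_MUTEX_INITIALIZER;

/* Called in the parent just before fork: acquire every lock this module
 * owns so it is in a known, consistent state when copied. */
static void prepare(void) { pthread_mutex_lock(&module_lock); }

/* Called in the parent after fork returns: release the parent's copy. */
static void parent(void)  { pthread_mutex_unlock(&module_lock); }

/* Called in the child before fork returns there: release the child's copy. */
static void child(void)   { pthread_mutex_unlock(&module_lock); }

int main(void)
{
    pthread_atfork(prepare, parent, child);

    pid_t pid = fork();      /* the handlers run automatically around this call */
    if (pid == 0) {
        /* Safe to take the lock in the child now. */
        pthread_mutex_lock(&module_lock);
        printf("child acquired the lock\n");
        pthread_mutex_unlock(&module_lock);
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return 0;
}
```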
Note that pthread_atfork can only clean up locks; it cannot clean up condition variables. On some systems the condition variable implementation needs no cleanup, but on others it contains a lock, and that does need cleaning up; unfortunately there is currently no interface or method for cleaning up condition variables.
Conclusion
- In a multithreaded program, it is best to use fork only to exec another program, and to do nothing else in the forked child.
- If you know you will exec in a child forked from a multithreaded program, add the O_CLOEXEC flag when opening, before the fork, any file descriptors the child should not touch.
References
- Randal E. Bryant, David O'Hallaron. Computer Systems: A Programmer's Perspective (2nd Edition). China Machine Press, 2010
- W. Richard Stevens. Advanced Programming in the UNIX Environment (3rd Edition). Posts & Telecom Press, 2014
- Linux man pages: fork(2)
- Damian Pietras. Threads and fork(): think twice before mixing them, 2009
- Cloud Wind (云风). A very discordant fork in multithreaded programs, 2011