1. Differences between processes and threads
A process is an instance of a running program, together with the collection of data structures that records how far the program's execution has progressed. From the kernel's point of view, a process is the basic unit to which system resources (CPU time, memory, etc.) are allocated.
A thread is an execution flow within a process and the basic unit of CPU scheduling and dispatch; it is a unit smaller than a process that can run independently. A process consists of one or more threads (a user program may have many relatively independent execution flows that share most of the application's data structures), and each thread shares all the resources owned by the process with the other threads of that same process.
"A process is the smallest unit of resource allocation; a thread is the smallest unit of program execution."
A process has its own address space, so in protected mode a crashed process does not affect other processes. A thread is just a separate execution path within a process. Each thread has its own stack and local variables but no address space of its own, so when one thread crashes the whole process dies with it. Multi-process programs are therefore more robust than multi-threaded ones, but switching between processes costs more resources and is less efficient. For concurrent operations that must run simultaneously and share variables, threads are the natural choice, because separate processes do not share variables directly.
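The address-space difference can be observed directly. The following is a minimal sketch, assuming a Unix-like system where os.fork is available; the function name compare_thread_and_fork is hypothetical and chosen only for illustration. A thread modifies the same counter the main thread sees, while a forked child modifies only its own copy.

```python
import os
import threading


def compare_thread_and_fork():
    """Return (value after thread, value seen in child, parent value after fork)."""
    counter = {"value": 0}

    def bump():
        counter["value"] += 1

    # A thread shares the address space, so its update is visible here.
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    after_thread = counter["value"]

    # A forked child gets its own copy of the address space.
    pid = os.fork()
    if pid == 0:
        bump()                           # changes only the child's copy
        os._exit(counter["value"])       # report the child's value as exit status
    _, status = os.waitpid(pid, 0)
    child_value = os.WEXITSTATUS(status)

    after_fork = counter["value"]        # the parent's copy is unchanged
    return after_thread, child_value, after_fork
```

Calling compare_thread_and_fork() returns (1, 2, 1): the thread's increment is visible to the caller, while the child's increment stays inside the child process.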
In short, a process has its own address space, whereas a thread does not: threads within the same process share that process's address space. (The following is an excerpt from "Multithreaded Programming under Linux".)
2. Reasons to use multithreading
One reason to use multithreading is that, compared with processes, it is a very "frugal" way of multitasking. On Linux, starting a new process means giving it its own address space and building numerous data tables to maintain its code, stack, and data segments, which makes it an "expensive" way of multitasking. Threads running in the same process, by contrast, use the same address space and share most of their data, so starting a thread takes far less space than starting a process, and switching between threads takes far less time than switching between processes. According to statistics, the overall cost of a process is roughly 30 times that of a thread, although on a particular system the figure may differ considerably.
The second reason to use multithreading is the convenient communication between threads. Different processes have separate data spaces, so data can only be passed between them through inter-process communication, which is both time-consuming and inconvenient. Threads do not have this problem: because the threads of a process share its data space, data produced by one thread can be used directly by the others, which is both fast and convenient. Sharing data does bring problems of its own, however. Some variables must not be modified by two threads at the same time, and data declared static inside a subroutine is especially likely to deal a catastrophic blow to a multithreaded program. These are the points that need the most attention when writing multithreaded programs.
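The point about variables that must not be modified by two threads at once can be sketched as follows. This is an illustrative example only, assuming a simple shared counter; the name count_with_lock and its parameters are hypothetical. A lock ensures that only one thread updates the shared value at a time.

```python
import threading


def count_with_lock(num_threads=4, increments=10000):
    """Increment one shared counter from several threads, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(increments):
            with lock:          # only one thread may update total at a time
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

With the lock in place the result is always num_threads * increments. Without it, the read-modify-write on the shared variable could interleave between threads and lose updates.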
Apart from the advantages above, which come from comparison with processes, multithreading as a way of working on several tasks concurrently also has the following advantages:
- Improve application responsiveness. This matters especially for graphical programs: when an operation takes a long time, the whole program waits for it and stops responding to keyboard, mouse, and menu input. Moving the time-consuming operation into a new thread avoids this awkward situation.
- Make multi-CPU systems more efficient. When the number of threads is no greater than the number of CPUs, the operating system can run different threads on different CPUs.
- Improve program structure. A long and complex process can be split into multiple threads that run as independent or semi-independent parts, which makes the program easier to understand and modify.
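The responsiveness point above can be sketched roughly as follows. This is not a real GUI program: the "events" are plain list entries, the long operation is simulated by waiting on a threading.Event, and the name run_with_background_task is hypothetical. It only shows that the main thread keeps handling input while the slow work sits in another thread.

```python
import threading


def run_with_background_task(num_events=3):
    """Handle events on the main thread while a slow task runs in another thread."""
    log = []
    release = threading.Event()

    def slow_task():
        release.wait()                      # stands in for a long-running operation
        log.append("slow task done")

    worker = threading.Thread(target=slow_task)
    worker.start()

    # The main thread is not blocked by the slow task and keeps handling events.
    for i in range(num_events):
        log.append(f"handled event {i}")

    release.set()                           # let the simulated operation finish
    worker.join()
    return log
```

In a real interface the loop would be the toolkit's event loop, and the worker would report its result back to it instead of appending to a list.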
3. Which resources of a process are shared between threads
The environment shared by threads includes the process's code segment, the process's public data (sharing this data makes it easy for threads to communicate with each other), the file descriptors the process has opened, the signal handlers, the process's current directory, and the process's user ID and process group ID.
While threads have these many things in common, each thread also has its own individual characteristics, and it is these that allow threads to run concurrently. They include:
1. Thread ID
Each thread has its own thread ID, which is unique within its process. The process uses it to identify the thread.
2. Register values
Because threads run concurrently, each thread follows its own path of execution. When the CPU switches from one thread to another, the register state of the original thread must be saved so that it can be restored when that thread is switched back in.
3. Thread stack
A stack is necessary for a thread to run independently. A thread function can call other functions, and those calls can be nested many levels deep, so each thread must have its own call stack so that function calls execute normally without being affected by other threads.
4. Error return code
Many threads in the same process run concurrently. One thread may make a system call that sets errno, and before that thread has handled the error the scheduler may run another thread, which can then modify the error value. Each thread should therefore have its own error return code variable.
5. Signal mask
Because each thread is interested in different signals, each thread should manage its own signal mask. All threads, however, share the same signal handlers.
6. Thread priority
Because threads need to be scheduled just as processes are, there must be a parameter available for scheduling, and that parameter is the thread's priority.
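Two of the per-thread items above, the thread ID and a private error code, can be illustrated with a small sketch. It uses Python's threading.local as a stand-in for a per-thread errno; the name per_thread_state and the error_code attribute are hypothetical. A barrier keeps all threads alive at the same moment so that their IDs are guaranteed to be distinct.

```python
import threading


def per_thread_state(num_threads=3):
    """Return {index: (thread id, thread-local error code)} for each worker."""
    local = threading.local()               # per-thread storage, similar in spirit to errno
    barrier = threading.Barrier(num_threads)
    results = {}

    def worker(index):
        local.error_code = index            # each thread writes its own private copy
        barrier.wait()                      # every thread has written before any reads
        results[index] = (threading.get_ident(), local.error_code)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each worker reads back the value it wrote itself, even though every thread assigned to the same attribute name, which is the behavior a per-thread error variable needs.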
4. Process and threading models used by Nginx and PHP-FPM
Nginx uses a non-blocking I/O and I/O multiplexing model: through facilities such as epoll provided by the operating system, it can handle requests from multiple clients in a single thread.
Each Nginx process is effectively a single thread: there is only one thread per process, but that one thread can serve multiple clients.
PHP-FPM uses a blocking, single-threaded model. pm.max_children specifies the maximum number of processes, and pm.max_requests specifies how many requests each process handles before it is restarted (PHP occasionally leaks memory, so the restart is needed).
PHP-FPM has only one thread per process, but a process can serve only one client at a time.
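The idea of one thread serving several clients through I/O multiplexing can be sketched with Python's selectors module, which uses epoll on Linux when available. This is only an illustration of the general technique, not Nginx's actual implementation: the clients are simulated with socket pairs, the "service" is upper-casing the request, and the name serve_clients_in_one_thread is hypothetical.

```python
import selectors
import socket


def serve_clients_in_one_thread(messages):
    """Serve one simulated client per message from a single thread; return the replies."""
    sel = selectors.DefaultSelector()       # picks epoll on Linux where available
    clients = []
    for msg in messages:
        client, server_side = socket.socketpair()
        server_side.setblocking(False)      # non-blocking I/O on the server side
        sel.register(server_side, selectors.EVENT_READ)
        client.sendall(msg)                 # the client sends its request
        clients.append(client)

    pending = len(messages)
    while pending:
        # One thread waits on all connections and handles whichever are ready.
        for key, _ in sel.select():
            conn = key.fileobj
            data = conn.recv(1024)
            conn.sendall(data.upper())      # the "response" is the upper-cased request
            sel.unregister(conn)
            conn.close()
            pending -= 1

    replies = [c.recv(1024) for c in clients]
    for c in clients:
        c.close()
    sel.close()
    return replies
```

A blocking model in the style described for PHP-FPM would instead call recv on one connection and wait there, so serving several clients at once requires several processes.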
Most Linux programs tend to use processes rather than threads, because creating a process on Linux is relatively cheap and Linux's threading support is not especially powerful.
There are exceptions, however. If a browser relied entirely on multi-process techniques, for example, the number of processes would become very large, so threads and processes have to work together, as in the Chromium browser: http://blog.csdn.net/talking12391239/article/details/19755997
When choosing between multiple processes and multiple threads, the following rules of thumb apply:
1) Prefer threads when tasks need to be created and destroyed frequently
See the comparison above for the reasons.
The most common application of this principle is a web server: a thread is created when a connection is established and destroyed when the connection is closed. With processes, the cost of creating and destroying them would be hard to bear.
2) Prefer threads when a lot of computation is required
A large amount of computation means heavy CPU consumption and frequent switching, and threads are best suited to this situation.
The most common examples of this principle are image processing and algorithm processing.
3) Use threads for strongly related processing and processes for weakly related processing
What do strongly related and weakly related mean? They are hard to define in theory, so a simple example will help.
A typical server needs to do the following tasks: send and receive messages, and process messages. "Messaging" and "message processing" are weakly related tasks, while "message processing" may in turn be divided into "message decoding" and "business processing", two tasks that are much more closely related. So "messaging" and "message processing" can be designed as separate processes, while "message decoding" and "business processing" can be designed as separate threads.
Of course, this division is not fixed and can be adjusted to suit the actual situation.
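The strongly related half of this example, "message decoding" and "business processing" as two threads in one process, can be sketched as below. The decoding step (UTF-8) and the business step (upper-casing) are placeholders, and the name process_messages is hypothetical; the sketch only shows two threads handing data to each other through a shared in-process queue.

```python
import queue
import threading


def process_messages(raw_messages):
    """Decode messages in one thread and process them in another."""
    decoded = queue.Queue()                 # shared directly, no IPC needed
    results = []

    def decode():
        for raw in raw_messages:
            decoded.put(raw.decode("utf-8"))    # placeholder "message decoding"
        decoded.put(None)                       # sentinel: no more messages

    def handle():
        while True:
            text = decoded.get()
            if text is None:
                break
            results.append(text.upper())        # placeholder "business processing"

    threads = [threading.Thread(target=decode), threading.Thread(target=handle)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The weakly related "messaging" part would sit in a separate process and pass raw messages in through a pipe or socket rather than a shared queue.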
4) Use processes when the program may be extended to run distributed across multiple machines, and threads when it is distributed across multiple cores
See the comparison above for the reasons.
5) When both meet the requirements, use the approach you are most familiar with and best at
As for how to choose among the "complex" and "simple" ratings along dimensions such as "data sharing and synchronization", "programming and debugging", and "reliability", there is no clear-cut method. One principle does apply: if both multiple processes and multiple threads can meet the requirements, choose the one you are most familiar with and best at.
A reminder: although so many selection principles are given here, real applications mostly use a combination of processes and threads, so do not fall into the trap of treating it as an either-or choice.