This article is reposted from a programming blog: http://program-think.blogspot.com/2009/02/multi-process-vs-multi-thread.html
Just as Shakespeare's Hamlet was plagued by "To be, or not to be, that is the question", people designing software architecture are often plagued by "process or thread?". So today I want to share my experience on this issue. If you are still unclear about the difference between threads and processes, please read a book on operating system principles first and then come back to this post.
Since this topic can easily start a war of words, let me state up front that neither multi-process nor multi-thread is better across the board. This post therefore only describes the trade-offs between processes and threads in one specific scenario (that of the products I am responsible for), and is for reference only.
Because that specific scenario is the premise of this post, let me first describe the features of the products I am currently responsible for: complex business logic; large data volumes; demanding real-time data processing performance; high robustness and security requirements; cross-platform requirements (covering both operating systems and databases); and, in some cases, distributed deployment.
Quite a few application systems share these features, for example some online game servers, some business systems in the financial industry, and some e-commerce transaction systems. If you design similar systems, I hope the experience below will help you.
★Process Granularity
Everyone should understand that processes and threads are both means of handling concurrency. For a complex system like the one above, handling concurrency with processes only (see Note 1) or with threads only (see Note 2) will probably turn out ugly. The key question is therefore how to balance processes and threads, that is, how to determine the granularity of processes.
My personal suggestion: divide processes by unit of business logic. The benefits are as follows:
1. Avoid wrangling
Generally, a given piece of business logic is maintained by a relatively fixed set of developers. If the process corresponding to that business logic crashes, the tester can quickly identify who is responsible and file the bug directly with him or her.
By contrast, if a process is too large and n people work on it, then once it crashes the programmers tend to blame one another, which is not conducive to stability and unity. In addition, testers often do not know who owns the bug and frequently assign it to the wrong person, which easily creates internal conflict.
As shown above, relatively fine process granularity can avoid some management troubles. As XXX often teaches us, "stability prevails over everything", so this advantage is listed first.
2. Robustness and fault tolerance
In general, developers' skill levels are uneven, and the excellent ones are a minority (for details, see the posts in the "principles" series). It is therefore inevitable that some novice programmer will make a low-level mistake, and some low-level mistakes are fatal and crash the process.
If you divide processes by business logic, the crash of one business logic process has little impact on the other business logic (unless that other logic depends on the crashed one), so the problem mentioned in Note 2 does not arise.
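The isolation argument above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the module names `billing` and `auth`, the helper names `run_module` and `run_all`, and the inline one-line "modules" are all hypothetical stand-ins for real business logic processes.

```python
import subprocess
import sys

def run_module(source):
    """Run one business logic module in its own process and return its exit code."""
    completed = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,  # keep the child's traceback off our own console
    )
    return completed.returncode

def run_all(modules):
    """Run every module in a separate process; one failure does not stop the rest."""
    return {name: run_module(source) for name, source in modules.items()}

# Hypothetical modules: "billing" contains a fatal low-level bug, "auth" is healthy.
MODULES = {
    "billing": "raise RuntimeError('low-level bug')",
    "auth": "print('auth ok')",
}

if __name__ == "__main__":
    for name, code in run_all(MODULES).items():
        status = "ok" if code == 0 else "crashed (exit code %d)" % code
        print(name, status)
```

The supervising process only sees a non-zero exit code for the faulty module, and the healthy module still runs to completion. In a single-process design the same bug would have brought everything down.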
3. Distributed deployment
The distributed deployment requirements I run into are usually split along business logic lines. For example, the system has an authentication module that holds sensitive user authentication information, and the customer requires that this module be deployed separately on a security-hardened host (to keep class enemies from sabotaging it).
If processes are divided by business logic, such deployment requirements are relatively easy to meet (provided a suitable inter-process communication mechanism is used; more on that below).
In addition, support for distributed deployment can also solve performance problems. For example, if one business logic module consumes a lot of hardware resources (such as memory, CPU, disk, or bandwidth), it can be split out and run on a machine of its own.
4. Mixing programming languages
Many people may overlook this benefit. In general, every programming language has its own strengths and weaknesses. If you divide processes by business logic, you can choose a suitable programming language for each piece of business logic according to its characteristics.
For example, I use C++ for performance-sensitive modules and Java or Python for modules that are heavy on business logic.
★Inter-process communication (IPC) Problems
Since it is impossible to put the entire system into a single process, you will inevitably run into the IPC problem. The following describes how to choose an IPC mechanism.
Operating systems offer all sorts of IPC mechanisms. Cross-platform considerations rule out a batch of them right away (I will discuss cross-platform IPC in the "cross-platform development" series). Of the remaining mechanisms, few are capable of data transfer; they are mainly the following four:
1. Socket;
2. Shared memory;
3. Pipe;
4. File.
Of these, the socket is the IPC method I strongly recommend, for the following reasons: sockets naturally support distributed deployment; sockets make it easy to combine multiple programming languages (for example, C++, Java, Python, and Flex all support sockets); and sockets also save a lot of locking code.
Some people may worry about socket performance. When two processes on the same machine communicate over a socket, they can use the localhost loopback address, so the data never passes through the physical NIC, and the operating system kernel can apply further optimizations. In that case, socket performance is not significantly different from that of the other IPC mechanisms.
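To make the loopback case concrete, here is a minimal sketch of two processes on one machine talking over a socket bound to 127.0.0.1. The function name `receive_from_child`, the inline child script, and the message `order:42` are made up for illustration; a real system would define its own protocol.

```python
import socket
import subprocess
import sys

# Hypothetical sender process: connects to the loopback port given on its
# command line and sends a single message.
CHILD_CODE = """
import socket, sys
with socket.create_connection(("127.0.0.1", int(sys.argv[1]))) as conn:
    conn.sendall(b"order:42\\n")
"""

def receive_from_child():
    """Start the sender as a separate process and return the bytes it sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.settimeout(10)          # do not wait forever if the child fails
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        child = subprocess.Popen([sys.executable, "-c", CHILD_CODE, str(port)])
        conn, _addr = server.accept()
        with conn:
            conn.settimeout(10)
            chunks = []
            while True:
                data = conn.recv(4096)
                if not data:           # empty read: the sender closed the connection
                    break
                chunks.append(data)
        child.wait()
    return b"".join(chunks)

if __name__ == "__main__":
    print(receive_from_child())
```

Neither side needs a lock, because the two processes share no memory. Moving the sender to another host would only mean changing the address it connects to (and the address the receiver binds to).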
Finally, I should add that sockets also help prevent wrangling. Suppose John wrote process A and Michael wrote process B, and process A sends data to process B through a socket. One day, communication between the two processes suddenly fails. John says B received the data incorrectly, and Michael says A sent the data incorrectly. What do you do? It's easy: grab a sniffer, capture the packets on the spot, dump them, and the problem is there for all to see.
★Why use threads?
After so much about the advantages of processes, some people will surely ask: "Then what are threads good for?" In general, threads are used for two reasons: performance and convenient coding.
1. Performance factors
Because processes are heavyweight on some operating systems (such as Windows), frequently creating processes or creating a large number of them puts a high load on the operating system. For example:
Suppose you are developing an application similar to a web server and create a process for every client request to handle the data exchange (does that remind you of ancient CGI? :-). Once the system goes to scale, your application will quickly be overwhelmed as the number of concurrent connections grows.
The example shows that in a cross-platform software system the number of processes must stay relatively stable. If the number of processes grows linearly with some environmental factor, that is bad news (by the way, it is just as bad if the number of threads grows linearly with environmental factors). Dividing processes by unit of business logic achieves this "relatively stable number of processes".
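One common way to keep the number of workers stable inside a process is a fixed-size thread pool. The sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor`; the function names `handle_request` and `serve`, the pool size of 4, and the doubling "work" are assumptions chosen only for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def handle_request(request_id):
    """Hypothetical request handler: returns a result and the worker thread's name."""
    return request_id * 2, threading.current_thread().name

def serve(requests, max_workers=4):
    """Handle any number of requests with at most max_workers threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves the order of the input requests in its results.
        return list(pool.map(handle_request, requests))

if __name__ == "__main__":
    results = serve(range(100))
    workers = {name for _result, name in results}
    print("requests handled:", len(results))
    print("distinct worker threads:", len(workers))
```

However many requests arrive, the pool never runs more than `max_workers` threads, so the load on the operating system does not grow with the number of connections.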
2. Convenient coding
The data inside one piece of business logic is tightly coupled. If the concurrency inside the business logic were also implemented with processes, it could require a large amount of IPC code (any two processes that exchange data need a piece of IPC code). That is likely to make the programmers involved complain bitterly.
Of course, the coding problem is not absolute. If your system has a mature, easy-to-use IPC library that encapsulates IPC operations transparently, this problem goes away.
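As a rough idea of what such encapsulation might look like, here is a small sketch of two helpers that hide message framing behind `send_msg` and `recv_msg`. The helper names and the wire format (a 4-byte big-endian length followed by UTF-8 JSON) are my own assumptions for illustration, not any particular library's API.

```python
import json
import socket
import struct

def send_msg(sock, obj):
    """Send one JSON-serializable object, prefixed with its 4-byte length."""
    payload = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def _recv_exact(sock, size):
    """Read exactly size bytes, or raise if the peer closes the connection early."""
    buf = bytearray()
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf.extend(chunk)
    return bytes(buf)

def recv_msg(sock):
    """Receive one object previously sent with send_msg."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))

if __name__ == "__main__":
    # socketpair() gives two connected sockets, handy for trying the helpers out.
    left, right = socket.socketpair()
    send_msg(left, {"op": "login", "user": "alice"})
    print(recv_msg(right))
    left.close()
    right.close()
```

With helpers like these, business code only deals with whole messages, and because JSON is language-neutral, a peer written in another language can implement the same framing.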
------------------------------------------------------------------------------
Note 1
The so-called "all processes" design means that all concurrency is implemented with processes (so each process has only one thread). This design can cause serious performance problems on some platforms (such as Windows).
Note 2
The so-called "all threads" design means that all concurrency is implemented with threads (so the entire system has only one process). This design has very poor robustness (one fatal error can crash the entire system), to say nothing of distributed deployment.