What is multithreading: multithreading lets multiple threads run in parallel (or concurrently) to carry out several tasks and improve system efficiency. Threads are used when multiple tasks need to make progress at the same time.
The advantages of using threads are as follows:
· Long-running tasks can be moved into a background thread for processing
· The user interface can stay responsive and more attractive. For example, when a user clicks a button to trigger some processing, a progress bar can pop up and display the progress.
· The program may run faster
· For tasks that spend most of their time waiting, such as user input, file read/write, and network send/receive, threads are especially useful: while one thread waits, precious resources such as CPU time can be released for other work.
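As a minimal sketch of the "background task" idea above, the hypothetical Python snippet below (the names background_task and the 0.1 s delay are invented for illustration) moves a slow job onto a worker thread while the main thread stays free:

```python
import threading
import time

results = []

def background_task(name, duration):
    """Simulate a long-running job (e.g. a download or file I/O)."""
    time.sleep(duration)
    results.append(name)

# Start the slow work in the background; the main thread stays responsive.
worker = threading.Thread(target=background_task, args=("download", 0.1))
worker.start()

# The main thread can keep doing other work here,
# e.g. updating a progress bar in the UI.
results.append("ui-still-responsive")

worker.join()  # wait for the background task before exiting
```

In a real GUI, the main thread would keep pumping the event loop instead of appending to a list, but the structure is the same.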
====
■ What is multi-process:
A process is an execution activity of a program on a computer. When you run a program, you start a process. Clearly, the program is static while the process is dynamic. Processes can be divided into system processes and user processes: every process that carries out an operating system function is a system process, and together they are the operating system itself in its running state; user processes need no explanation, since every process you start yourself is a user process. A process is the unit by which the operating system allocates resources.
In Windows, a process is further divided into threads; that is, a process contains multiple smaller units that can run independently.
If two or more processes are allowed to run in the same computer system at the same time, that is multitasking. Modern operating systems are almost all multitasking systems that can manage many running processes at once. The benefits are obvious: you can listen to MP3s while surfing the internet, and even print downloaded documents, without any of these activities interfering with each other. This raises the issue of parallelism. As the saying goes, one mind cannot attend to two things at once, and the same is true of computers: in principle, a CPU can be allocated to only one process at a time. A typical computer has only one CPU, that is, only one "heart". To make this single CPU appear to run multiple processes at the same time, concurrency techniques must be used. Concurrency techniques can be quite complex; the easiest to understand is the round-robin time-slice scheduling algorithm. Briefly, its idea is this: under the management of the operating system, all running processes take turns using the CPU, and each turn lasts only a very short time (for example, 10 ms). The user therefore cannot perceive that the CPU is serving multiple processes in rotation; it looks as if every process runs continuously. In fact, only one process occupies the CPU at any given moment.
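The round-robin idea can be sketched as a toy simulation (the job names and the 10 ms quantum are invented for illustration; a real scheduler is far more involved):

```python
from collections import deque

def round_robin(jobs, quantum):
    """Simulate round-robin time-slice scheduling.

    jobs: dict mapping job name -> total CPU time needed.
    Returns the order in which CPU slices were granted.
    """
    ready = deque(jobs.items())
    schedule = []
    while ready:
        name, remaining = ready.popleft()
        schedule.append(name)          # this job gets the CPU for one quantum
        remaining -= quantum
        if remaining > 0:              # not finished: go to the back of the queue
            ready.append((name, remaining))
    return schedule

# Three "processes" needing 20, 30 and 10 ms of CPU, with 10 ms slices.
order = round_robin({"mp3-player": 20, "browser": 30, "printer": 10}, quantum=10)
```

Each job runs at most one slice before yielding, so no job can monopolize the CPU, which is exactly why the rotation is imperceptible to the user.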
If a computer has multiple CPUs, the situation is different. When the number of processes is smaller than the number of CPUs, different processes can be assigned to different CPUs, so multiple processes really do run simultaneously; this is parallelism. If the number of processes exceeds the number of CPUs, however, concurrency techniques are still needed.
In Windows, CPU allocation is based on threads. A process may consist of multiple threads. The situation is more complicated, but simply put, the relationship is as follows:
Total number of threads <= number of CPUs: parallel operation
Total number of threads > number of CPUs: concurrent operation
The efficiency of parallel operation is obviously higher than that of concurrent operation. Therefore, in multi-CPU computers, the efficiency of multi-task operations is relatively high. However, if you run only one process (thread) on a multi-CPU computer, you cannot take advantage of multiple CPUs.
This is where multitasking operating systems come in. The basic principle of a multitasking operating system (such as Windows) is that the OS allocates CPU time slices to multiple threads, and each thread runs within the time segment the OS assigns it (note that these threads may belong to different processes). The operating system constantly switches from executing one thread to executing another. At the macro level, multiple threads appear to execute together; and since those threads belong to different processes, it looks to us as if multiple processes execute at the same time, thereby implementing multitasking.
Architecture Design:
Multi-process or multithreading: like Shakespeare's "To be, or not to be, that is the question", the question "process or thread?" constantly plagues software architecture design. So today I want to share my experience on this issue. If you are still hazy on the difference between threads and processes, please consult an operating-systems textbook before reading this post.
Since this topic easily leads to flame wars, let me state up front: between multi-process and multithreading, it is impossible to say in general which is better. This post therefore only discusses the trade-offs between processes and threads in a specific scenario (the products I am responsible for), and is for reference only.
Since that specific scenario is the premise of this post, let me first describe the characteristics of the products I am currently responsible for: complex business logic, large data volumes, high real-time performance requirements for data processing, high robustness and security requirements, cross-platform requirements (covering both operating systems and databases), and, in some cases, a need for distributed deployment.
As mentioned above, many application systems are in line with the above features, such as some online game servers, some financial industry business systems, and some e-commerce transaction systems. If you are engaged in the design of similar application systems, I hope the experience I will introduce below will help you.
Process Granularity
Everyone should understand that both processes and threads are means of handling concurrency. For a complex system like the one described above, trying to handle all concurrency with processes alone (see note 1) or threads alone (see note 2) would likely end badly. The key question, then, is how to strike a balance between processes and threads (that is, how to determine process granularity).
I personally suggest that you divide processes based on the unit of business logic as much as possible. The benefits of doing so are as follows:
1. Avoid wrangling
Generally, the developers responsible for a given piece of business logic are relatively fixed. If the process corresponding to that business logic crashes, testers can quickly identify the culprit and file the bug directly with him or her.
By contrast, if a process is too large and involves n people, then once it crashes it is easy for the programmers to blame one another, which is not conducive to stability and unity. Moreover, since testers often cannot tell whose bug it is, they frequently assign it wrongly, which breeds internal friction.
As the above shows, relatively fine process granularity avoids certain management headaches. Since, as XXX often teaches us, "stability prevails over everything", I list this advantage first.
2. Robustness and fault tolerance
In general, developer skill levels are uneven, and the excellent ones are a minority (for details, see the posts in the "principles" series). It is therefore inevitable that some novice programmer will make a low-level mistake, and some low-level mistakes are fatal, crashing the process.
If processes are divided along business-logic lines, the crash of one business-logic process has little impact on the others (unless they depend on it), so the problems mentioned in note 2 will not arise.
3. Distributed
The distributed deployment requirements I encounter are generally divided along business-logic lines as well. For example, suppose the system has an authentication module holding sensitive user credentials. The customer will then require that this module be deployed separately on a security-hardened host (to keep hostile parties from compromising it).
If processes are divided by business logic, such deployment requirements are relatively easy to satisfy (provided a suitable inter-process communication mechanism is used; more on that below).
Distributed deployment can also relieve performance problems. For example, if one business-logic module is especially hungry for hardware resources (memory, CPU, disk, bandwidth), it can be pulled out to run on a machine of its own.
4. Cross-programming languages
This benefit may be easily overlooked by many people. In general, each programming language has its own advantages and disadvantages. If you divide processes by business logic, you can select the appropriate programming language based on the characteristics of different business logic.
For example, I use C++ for performance-sensitive modules, and Java or Python for business-logic-heavy modules.
Inter-process communication (IPC) Problems
Since it is impossible to put the entire system into a process, it will inevitably encounter the IPC problem. The following describes how to select IPC.
Operating systems offer many and varied IPC mechanisms. To account for cross-platform needs, we can rule out a batch of them right away (I will cover cross-platform IPC in the "cross-platform development" series). Among the rest, not many can carry data; the main ones are sockets, shared memory, pipes, and files.
Of these, the socket is the IPC mechanism I recommend most highly, for the following reasons: sockets naturally support distributed deployment; sockets make it easy to combine multiple programming languages (C++, Java, Python, and Flex all support sockets); and using sockets spares you a large amount of locking code.
Some people may worry about socket performance. When two processes communicate over a socket on the same machine, the localhost loopback address can be used, so the data never touches the physical NIC, and the operating system kernel can optimize this path further. In that case, socket performance is not significantly different from that of other IPC mechanisms.
Finally, sockets are also effective at preventing wrangling. Suppose John wrote process A and Michael wrote process B, and A sends data to B over a socket. One day the communication between the two processes fails: John insists the data was sent correctly, while Michael insists nothing valid was received. What then? Easy: grab any sniffer, capture the packets on the spot, and dump them to see where the problem lies.
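As a rough sketch of loopback socket IPC between two processes (POSIX-only, since it uses the "fork" start method; the serve function and the "ack:" framing are invented for illustration):

```python
import socket
from multiprocessing import get_context

# The POSIX "fork" start method lets the child inherit state directly.
ctx = get_context("fork")

def serve(port_queue):
    """Child process: accept one connection and echo the payload back."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))            # port 0: the OS picks a free port
    srv.listen(1)
    port_queue.put(srv.getsockname()[1])  # publish the port once listening
    conn, _ = srv.accept()
    conn.sendall(b"ack:" + conn.recv(1024))
    conn.close()
    srv.close()

port_queue = ctx.Queue()
child = ctx.Process(target=serve, args=(port_queue,))
child.start()
port = port_queue.get()                   # blocks until the server is ready

# Parent process: talk to the child over the loopback address.
cli = socket.create_connection(("127.0.0.1", port))
cli.sendall(b"hello")
reply = cli.recv(1024)
cli.close()
child.join()
```

Because the traffic is ordinary TCP, the two endpoints could just as well live on different hosts, and a sniffer on either side settles any dispute about who garbled the data.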
Why thread?
Having sung the praises of multi-process above, some will ask: "Then what are threads for?" Threads are generally used for two reasons: performance and coding convenience.
1. Performance factors
On some operating systems (such as Windows), processes are heavyweight. If processes are created frequently, or a large number of them are created, the load on the operating system becomes too high. An example:
Suppose you are developing an application similar to a web server, and you create a separate process for each client request (does this remind you of old-school CGI? :-). As the system scales up, your application will quickly be overwhelmed as the number of concurrent connections grows.
The example above shows that in such a software system, the number of processes must stay relatively stable. If the process count grows linearly with some environmental factor, things get ugly (incidentally, the same is true if the thread count grows linearly). Dividing processes by business-logic unit happens to achieve this "relatively stable process count" as a side effect.
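One common way to keep the worker count stable regardless of load is a fixed-size pool. The sketch below uses Python's standard ThreadPoolExecutor; handle_request is a hypothetical stand-in for real per-connection work:

```python
from concurrent.futures import ThreadPoolExecutor

def handle_request(request_id):
    """Stand-in for per-connection work (parsing, response building)."""
    return f"handled-{request_id}"

# A fixed-size pool keeps the number of threads stable no matter how many
# requests arrive, unlike the one-process-per-request CGI model.
with ThreadPoolExecutor(max_workers=4) as pool:
    responses = list(pool.map(handle_request, range(100)))
```

A hundred requests are served here by only four threads; the pool size, not the connection count, bounds the resource usage.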
2. Coding convenience
The data within a unit of business logic is tightly coupled. If the concurrency inside that business logic were also implemented with processes, a large amount of IPC code could result (any two processes that exchange data need their own piece of IPC code), which will make the programmers involved complain.
Of course, the coding problem is not absolute. If your system has a mature, easy-to-use IPC library that transparently encapsulates IPC operations, this problem disappears.
Fish or bear's paw: choosing between multi-process and multithreading
Regarding multi-process versus multithreading, the classic textbook line is "the process is the smallest unit of resource allocation, and the thread is the smallest unit of CPU scheduling". That sentence is enough for an exam, but when you face the actual choice at work it is not so easy, and a poor choice will make you suffer for it.
I often see people online asking "Is multi-process better, or multithreading?" or "On Linux, should I use multi-process or multithreading?" I can only say: there is no best, only better-suited. Which is more appropriate depends on the specific situation.
Let's compare multithreading and multi-process (note: since this is a qualitative comparison, it is relative; it does not mean one is wonderful and the other unbearable).
At first glance it looks simple: the comparison comes out roughly "threads 3.5 vs. processes 2.5". So should we just pick threads?
If it were that simple, I would not have bothered writing all this. There is no absolute good or bad, only a question of which is more suitable. Let's look at how to make a sound decision in practical applications.
1) Prefer threads when frequent creation and destruction is needed
For the reason, see the comparison above.
The most common application of this principle is the web server: a thread is created when a connection is established and destroyed when the connection closes. If processes were used instead, the cost of creation and destruction would be hard to bear.
2) Prefer threads when a large amount of computation is needed
"A large amount of computation" here means work that consumes a lot of CPU and switches frequently; in this case threads are the most suitable.
The most common examples of this principle are image processing and algorithmic computation.
3) Use threads for strongly related tasks and processes for weakly related tasks
What counts as strong or weak correlation? It is hard to define rigorously; a simple example will make it clear.
A typical server has to handle both message transport (sending and receiving) and message processing. "Message transport" and "message processing" are weakly related tasks, while "message processing" may in turn break down into "message decoding" and "business processing", two tasks that are much more closely related. So "message transport" and "message processing" can be designed as separate processes, while "message decoding" and "business processing" can be designed as threads.
Of course, this division method is not static and can be adjusted according to the actual situation.
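As a minimal sketch of the strongly related half of such a split, the hypothetical "decoder" and "service" stages below run as two threads in one process, handing decoded messages through a shared queue (the stage names and the toy "msg:N" format are invented for illustration):

```python
import queue
import threading

raw_messages = ["msg:1", "msg:2", "msg:3"]
decoded_q = queue.Queue()
processed = []

def decoder():
    """Stage 1: decode raw messages and hand them onward."""
    for raw in raw_messages:
        decoded_q.put(raw.split(":")[1])   # "decode" by stripping the prefix
    decoded_q.put(None)                    # sentinel: no more messages

def service():
    """Stage 2: business processing on decoded messages."""
    while True:
        item = decoded_q.get()
        if item is None:
            break
        processed.append(int(item) * 10)   # stand-in business computation

t1 = threading.Thread(target=decoder)
t2 = threading.Thread(target=service)
t1.start()
t2.start()
t1.join()
t2.join()
```

Because the two stages share one address space, passing data is just a queue operation; if they were separate processes, each hand-off would need explicit IPC code.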
4) Use processes for work that may expand to multi-host distribution, and threads for multi-core distribution
For the reason, see the comparison above.
5) When either meets your needs, use the method you are most familiar with
Across dimensions such as "data sharing and synchronization", "programming and debugging", and "reliability", I cannot give a clear-cut method for weighing complex against simple. But I can offer one selection principle: if both multi-process and multithreading can meet your requirements, choose whichever you are most familiar with and most comfortable using.
Note that although I have given all these selection principles, real applications almost always end up as a combination of processes plus threads; do not box yourself into an either/or mindset.