Choosing Between the Multi-threaded and Multi-process Concurrency Models for Servers

Source: Internet
Author: User

When implementing a server program on UNIX, should I choose the multi-threaded concurrency model or the multi-process concurrency model?

This question arose while reading the source code of scgi, nginx, and memcached. What I found is as follows:


Scgi uses multiple processes: the main process listens for socket connection requests and distributes them to sub-processes for handling.
Nginx also uses multiple processes, but after the sub-processes are forked, each sub-process listens for and handles socket connection requests directly.
Memcached uses multiple threads: the main thread listens for requests and distributes them to worker threads for handling.

Seen this way, the design space is essentially a combination of two choices:
1. Multi-threading or multi-processing?
2. Does the master listen for requests centrally and distribute them to workers, or does each worker listen for and handle requests by itself?

The difference in the second point comes down to this:
do you create the listening socket before forking the sub-processes, or fork the sub-processes first and create the socket afterwards?

If the socket is created first, the sub-processes produced by fork naturally inherit its file descriptor, so they can use it directly to listen for requests. Fork first and they cannot: the children share no listening socket.

============================================================
Now the difference in the second point in more detail.
If each worker listens for requests on its own, they are all listening on the same socket, so they compete: who gets to accept() first? An accept lock is therefore implemented for the processes to compete over. Without it, when a request arrives, every sub-process blocked on the socket is woken up at once (the classic thundering-herd problem).

What if the master listens and distributes centrally? Then the master has to know whether each worker is free. How? Through IPC between the processes, which looks like considerable overhead.


============================================================

So how do we choose between the two? What are the advantages and disadvantages of each, and which scenarios does each suit?


Problem 1: choosing between multi-threading and multi-processing

As I understand it, the essence of the problem is this: multiple processes each have their own memory space, while multiple threads within a single process share one memory space.

First, the essence of a server program is to let clients read and modify data; both nginx and memcached exist to do exactly that.

Take memcached (single process, multi-threaded): after the process starts, everything users write to the cache lives in that process's heap. All the data is concentrated in a single address space, so every thread shares it completely and can query and modify any of it. Now suppose we implemented it with multiple processes instead. A user's cached data would be handled by whichever process received it, and the result would live in that process's private memory. What happens when another process wants to read it? IPC on every lookup? Far too expensive!

Because the memory spaces of multiple processes are independent, the incoming data would also be scattered across them rather than stored together. Cross-process queries would require frequent IPC, and you would not even know which sub-process to query! This approach is clearly unsuitable. Alternatively, instead of keeping the data in its own address space, each sub-process could write it into a shared-memory segment, making the data global and avoiding IPC. That works: every process just operates on the shared segment. But it is clearly clumsier than a single multi-threaded process, where all data lives directly in the process's own heap rather than in shared memory, and is much easier to operate on!


From this perspective, the reason to choose multi-threading is global data: every process/thread that handles requests must read the same global store, not its own private copy!

Because multiple worker threads read and write the same large data set concurrently, synchronization is unavoidable: simultaneous writes must be prevented, so locks are added.

What about MySQL's implementation? It also uses multiple threads, which the reasoning above explains well. When each request arrives, the worker serving it needs to operate on global data, so all workers must be able to access the same data. Multi-threading fits: everyone lives in the same process's memory space! Of course, synchronization again brings in read/write locks.

============================================================

What about nginx? Why does it use multiple processes?

One sentence explains it: nginx only reads the server-side scripts and never modifies them.

After a request reaches nginx (leaving the CGI protocol aside), its worker processes invoke and run the back-end server script. Nginx itself is only responsible for reading that script, e.g. a PHP file.

This is a pure read; no writing or updating is involved.

So how does nginx write data? The writes actually go to the database. The PHP/Python script is just a processing layer inserted in the middle, a medium; nginx itself writes no data.

So when a write request reaches nginx, its job is to read the PHP script, and the PHP script then calls MySQL to write the global database. Nginx itself only reads scripts and performs no writes at all; the PHP script is merely nginx's medium for operating on the data. The real reads and writes happen in the back-end MySQL database, which uses multiple threads to operate on global data (memcached's global data lives in memory, while MySQL's lives on disk).

Nginx only reads PHP scripts, and the scripts are independent of one another, so reading them needs no synchronization. That means either multi-threading or multi-processing would work here. Why multi-processing, then? With multiple processes, each one is independent: a crash in one does not affect the others. With multi-threading, if any thread crashes, the whole process dies and every other thread with it! Multi-threading would mean that the handling of different users' requests affects one another: when one user's request blows up, every other user's request dies with it, while multiple processes do not affect each other.
So multiple processes are the better fit here!

In other words, I think there is little difference between multi-processing and multi-threading for nginx in other respects; the biggest consideration is mutual interference. To guarantee that handling one request cannot affect another, nginx ultimately chose multiple processes!

============================================================

To sum up: when workers must write/update global data, use multi-threading; otherwise, prefer multi-processing for its better robustness.

============================================================

What about comparing the two dispatch patterns below?

① The master listens for and distributes requests; each worker receives its requests from the master.

② Each worker listens for and handles requests by itself.

That comparison will be analyzed in the next article.
