Applicable scenarios for multi-threaded servers

Original: http://blog.csdn.net/Solstice/article/details/5334243

Author: Chen Shuo (giantchen_at_gmail)

blog.csdn.net/solstice

Feb 28

This article was originally planned as a section of my earlier post "Common Programming Models for Multi-threaded Servers" (hereinafter "Common Models"); today it is finally finished.

"Server development" all-encompassing, the meaning of the "server development" in this article refer to the article "Common model", one sentence is: running on a multi-core machine Linux user-configured long-running network applications without user interface. "Long-term operation" does not mean that the program 7x24 does not restart, but the program will not because there is nothing to do and quit, it will wait for the next request to come. For example wget is not long-running, httpd is long-running.

Terminology

As in the previous article, "process" in this article refers to the product of the fork() system call, and "thread" refers to the product of pthread_create(). The pthreads I have in mind is NPTL, where each thread is created by clone() and corresponds to one task_struct in the kernel. The development language is C++ and the runtime environment is Linux.

First, a distributed system consisting of multiple machines is necessarily multi-process (in the literal sense), because a process cannot cross OS boundaries. Under this premise, we focus on a single machine, an ordinary server with at least 4 cores. To provide a service or perform a task on such a multi-core machine, the available modes are:

    1. Run a single-threaded process
    2. Run a multi-threaded process
    3. Run multiple single-threaded processes
    4. Run multiple multi-threaded processes

The comparison of these modes is well-worn ground; briefly summarized:

    • Mode 1 is not scalable and cannot exploit the computing power of a multi-core machine;
    • Mode 3 is currently the recognized mainstream model. It has two sub-modes:
      • 3a: simply run multiple copies of the Mode-1 process, which works if the service can be provided on multiple TCP ports;
      • 3b: one main process plus worker processes, necessary if the service must be bound to a single TCP port, e.g. httpd + FastCGI.
    • Mode 2 is despised by many, who hold that multi-threaded programs are hard to write and have no advantage over Mode 3;
    • Mode 4 is simply a bad move: far from combining the advantages of Modes 2 and 3, it gathers the shortcomings of both.

This article mainly discusses the pros and cons of Mode 2 and Mode 3b, namely: when should a server program be multi-threaded?

Functionally speaking, there is nothing a multi-threaded program can do that a single-threaded one cannot, and vice versa: both are state machines in the end (I would be glad to see a counter-example). In terms of performance, multithreading has no advantage for either IO-bound or CPU-bound services. So why on earth use multithreading?

Before answering that question, let me talk about the occasions where a single thread is required.

Occasions where a single thread must be used

As far as I know, there are two situations in which a single thread must be used:

    1. The program may call fork()
    2. The program's CPU usage must be limited

First, fork(). As I mentioned in "Revelations from Linux's new system calls":

fork() is generally not called in multi-threaded programs, because Linux's fork() clones only the calling thread's thread of control and does not clone the other threads. In other words, you cannot fork() a child process that has the same threads as its parent, and Linux has no forkall()-like system call. forkall() would also be very hard to implement (semantically), because the other threads might be waiting on a condition variable, blocked in a system call, waiting on a mutex to enter a critical section, or in the middle of dense computation; none of these states transplant well into the child process.

Even worse, if some other thread A has acquired a mutex at the instant of the fork(), that mutex can never be released: the new process created by fork() has no "thread A" in it, so the new process can never acquire that mutex without deadlocking. (This is only speculation; I have not done the experiment, and I cannot rule out the possibility that fork() releases all mutexes.)
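To make the hazard concrete, here is a minimal sketch (my own illustration, not code from the original post). On Linux/NPTL, the child created by fork() contains only the forking thread, so the mutex held by "thread A" stays locked forever in the child:

#include <pthread.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <cstdio>

pthread_mutex_t g_mutex = PTHREAD_MUTEX_INITIALIZER;

void* holder(void*) {
  pthread_mutex_lock(&g_mutex);    // "thread A" acquires the mutex ...
  sleep(10);                       // ... and holds it across the fork()
  pthread_mutex_unlock(&g_mutex);
  return NULL;
}

int main() {
  pthread_t tid;
  pthread_create(&tid, NULL, holder, NULL);
  sleep(1);                        // make sure "thread A" holds the lock
  pid_t pid = fork();              // child inherits a locked mutex, but not thread A
  if (pid == 0) {
    pthread_mutex_lock(&g_mutex);  // deadlock: no thread in the child will ever unlock it
    printf("child: never reached\n");
    _exit(0);
  }
  waitpid(pid, NULL, 0);           // in practice this waits forever
  pthread_join(tid, NULL);
  return 0;
}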

In conclusion, a program designed to call fork() must be single-threaded, such as the "watchdog process" mentioned in the "Revelations" article. A multi-threaded program is not incapable of calling fork(), but doing so runs into a lot of trouble, and I cannot think of a reason to do it.

A program that has fork()ed generally behaves in one of two ways:

    1. It immediately executes exec() and becomes another program. Examples: the shell and inetd; lighttpd fork()ing a child to run a FastCGI program; or, on the compute nodes of a cluster, the daemon responsible for starting jobs (what I call the "watchdog process").
    2. It keeps running the current program without calling exec(). It either communicates with the parent through shared file descriptors and works with it, or takes over the parent's file descriptors and works independently, like the NCSA httpd web server of the early 1990s.

Of these behaviors, I think only the "watchdog process" must stick to a single thread; the others can, functionally speaking, be replaced with multi-threaded programs.
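As a concrete illustration of behavior 1, here is a minimal single-threaded "watchdog" sketch (my own, not from the original post; the job path /usr/bin/some_job is hypothetical): it fork()s a job, exec()s it, and restarts it whenever it exits.

#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <cstdio>

int main() {
  for (;;) {
    pid_t pid = fork();            // safe: the watchdog itself has only one thread
    if (pid < 0) {
      perror("fork");
      return 1;
    }
    if (pid == 0) {                // child: become the job program
      execl("/usr/bin/some_job", "some_job", (char*)NULL);
      _exit(127);                  // reached only if exec() failed
    }
    int status = 0;
    waitpid(pid, &status, 0);      // block until the job exits or crashes
    fprintf(stderr, "job exited with status %d, restarting\n", status);
    sleep(1);                      // crude back-off before restarting
  }
}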

Second, a single-threaded program can limit its own CPU usage.

This is easy to understand. On an 8-core host, for example, a single-threaded program, even if it busy-waits (whether because of a bug or because of overload), occupies only 12.5% of the CPU, i.e. one core. In this worst case the system still has 87.5% of its computing resources available for other service processes.

Therefore, for an auxiliary program that must run on the same machine as the main service process (for example, one that monitors the state of other service processes), making it single-threaded prevents it from looting too much of the machine's computing resources.

Process-based distributed system design

The "Common Models" article mentioned that the software design and functional division of a distributed system should generally take the "process" as the unit. When I advocate multithreading, I do not mean cramming the whole system into a single process; rather, after the functional division, use multithreading where necessary to improve performance within each type of service process. For the distributed system as a whole, the goal is to scale out, i.e. to enjoy the benefit of adding machines.

For upper-layer applications, the code size of each process should be kept under 100,000 lines of C++, excluding ready-made libraries. That way every process can be fully understood by one brain, without confusion. (Actually I would prefer to say 50,000 lines.)

On this topic Google has a good article, "Introduction to Distributed System Design". Its finishing touch: distributed system design is design for failure.

This article goes on to discuss when a service process should use multithreading. First, the advantages of the single-threaded approach.

Advantages of the single-threaded approach

From a programming point of view, the advantage of a single-threaded program goes without saying: it is simple. The structure of the program is generally an event loop based on IO multiplexing, as described in "Common Models". Or, as Yun Feng suggests, just use blocking IO directly.

The typical code framework for event loops is:

while (!done) {
  int retval = ::poll(fds, nfds, timeout_ms);
  if (retval < 0) {
    // handle errors
  } else {
    // handle expired timers
    if (retval > 0) {
      // handle IO events
    }
  }
}

The event loop has an obvious drawback: it is non-preemptive. Suppose event A has higher priority than event B, handling A takes 1 ms, and handling B takes 10 ms. If B occurs just before A, then by the time A arrives the program has already left poll() and begun processing B. Event A must wait up to 10 ms for its chance to be handled, so its total response time is 11 ms. This amounts to a priority inversion.

This drawback can be overcome by multithreading, which is also the main advantage of multithreading.

Does multithreading have a performance advantage?

I said earlier that multithreading has no performance advantage in the absolute sense, whether the service is IO-bound or CPU-bound. Here I explain in detail what that sentence means.

It means that if a tiny CPU load is enough to saturate the IO, or a tiny amount of IO traffic is enough to saturate the CPU, then multithreading is of no use. For example:

    1. For a static web server or an FTP server, the CPU load is light; the main bottlenecks are disk IO and network IO. A single-threaded program (Mode 1) can usually saturate the IO. Multithreading does not increase throughput, because the IO hardware capacity is already saturated; for the same reason, adding CPUs does not increase throughput either.
    2. Saturating the CPU with little IO is rarer, so I have to invent an example. Suppose a service takes n integers as input and asks whether we can select m of them (n >= m > 0) whose sum is 0. This is the famous subset-sum problem, which is NP-complete. For such a "service", even a small n can peg the CPU: with n = 30, each input is only 120 bytes (32-bit integers), yet the computation can take minutes, as the brute-force sketch below suggests. For this application Mode 3a is the most suitable: it exploits multiple cores and the program stays simple.
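A brute-force sketch of this invented service (my illustration; the function name and types are my own) shows why the CPU, not IO, is the bottleneck:

#include <cstdint>
#include <vector>

// Enumerate all 2^n - 1 non-empty subsets and check whether any sums to zero.
// With n = 30 that is about 10^9 subsets, i.e. tens of billions of basic
// operations: minutes of work for one core, on an input of only 120 bytes.
bool subset_sum_zero(const std::vector<int32_t>& a) {
  const uint32_t n = static_cast<uint32_t>(a.size());
  for (uint64_t mask = 1; mask < (1ULL << n); ++mask) {  // each mask picks a subset
    int64_t sum = 0;
    for (uint32_t i = 0; i < n; ++i)
      if (mask & (1ULL << i)) sum += a[i];
    if (sum == 0) return true;  // found m integers whose sum is 0
  }
  return false;
}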

In other words, if either resource hits its bottleneck early, multithreading brings no advantage.

Having read this far, some readers may be getting impatient: you have said so much about the benefits of single-threading, so what on earth is multithreading good for?

Scenarios where multithreading applies

I think the use case for multithreading is improving response speed: letting IO and "computation" overlap each other, thereby reducing latency.

Although multi-threading does not improve absolute performance, it can improve average response performance.

A program worth making multi-threaded should roughly satisfy these conditions:

    • There are multiple CPUs available. On a single-core machine the advantages of multithreading are not obvious;
    • There is shared data between threads. If no data is shared, Mode 3b will do. Although we should minimize sharing between threads, that does not mean there is none;
    • The shared data is modifiable, not a static table of constants. If the data cannot be modified, processes can share it via shared memory, and Mode 3 is adequate;
    • The service is non-homogeneous, i.e. event responses differ in priority, and a dedicated thread can handle high-priority events to prevent priority inversion;
    • Latency and throughput are equally important; the program is not a logically simple IO-bound or CPU-bound one;
    • The program takes advantage of asynchronous operations, such as logging. Whether writing a log file to disk or sending messages to a log server, the critical path must not be blocked;
    • The program can scale up. A good multi-threaded program should enjoy the benefit of more CPUs; the current mainstream is 8 cores, and 16-core machines will soon be in use;
    • The program has predictable performance: as load increases, performance degrades gradually, dropping sharply only beyond some critical point, and the number of threads generally does not vary with the load;
    • Multithreading can effectively divide responsibility and functionality, so that each thread's logic is simple, its task single, and it is easy to write. Not like a Win32 SDK program, where all the logic is stuffed into one event loop.

These conditions are rather abstract; here is a concrete (albeit fictional) example.

Suppose we need to manage a small fleet of Linux servers: 8 compute nodes and 1 control node. The machines are identically configured, with dual quad-core CPUs, interconnected by gigabit Ethernet. We now need to write a simple fleet management software (cf. LLNL's SLURM), consisting of three programs:

    • master, which runs on the control node and monitors and controls the state of the whole cluster;
    • slave, which runs on every compute node and is responsible for starting and terminating jobs and monitoring the node's resources;
    • client, the command-line tool end users use to submit jobs.

According to the earlier analysis, slave is a "watchdog process": it starts other job processes and therefore must be single-threaded. Besides, it should not consume too much CPU, which also suits the single-threaded model.

master should be a multi-threaded program in Mode 2:

  • It runs on an 8-core machine; using Mode 1 would waste 87.5% of the CPU resources.
  • The state of the whole cluster should fit entirely in memory, and that state is shared and mutable. With Mode 3, synchronizing state between processes becomes a big problem; and if large amounts of shared memory are used, the result is just a multi-threaded program in a multi-process disguise.
  • master's main performance metric is not throughput but latency: respond to all kinds of events as quickly as possible. Saturating the IO or the CPU rarely happens.
  • The events master monitors have different priorities: a program finishing normally and a program crashing call for different handling priorities, as do a compute node's disk filling up and its chassis overheating. A single thread could suffer priority inversion here.
  • Assuming one TCP connection between master and each slave, master can use 2 or 4 IO threads to handle the 8 TCP connections, effectively reducing latency.
  • master writes its log asynchronously to the local disk, which requires a logging library with its own IO thread.
  • master may read from and write to a database; the database connection, a third-party library, may have its own threads and call back into master's code.
  • master serves multiple clients, and multithreading reduces client response time too; it can use another 2 IO threads to handle communication with clients.
  • master can also provide a monitor interface that broadcasts (pushes) the cluster state, so that users need not poll actively. This feature is easier to implement in a dedicated thread and does not clutter the other main functions.
  • In total, master opens 10 threads:
    • 4 IO threads for communicating with the slaves
    • 1 logging thread
    • 1 database IO thread
    • 2 IO threads for communicating with clients
    • 1 main thread for background work such as job scheduling
    • 1 pushing thread for actively broadcasting the fleet state
  • Although the thread count slightly exceeds the core count, these threads are mostly idle, so we can rely on the OS scheduler to keep latency manageable.

In summary, master is naturally and efficiently written as a multi-threaded program.
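A minimal sketch of how master might create its fixed thread set at startup (my own illustration, following the fictional design above; the thread functions are placeholder stubs, not a real implementation):

#include <pthread.h>

// Stub thread bodies; in a real master each would run its own loop.
void* io_thread(void*)      { /* IO multiplexing on assigned TCP connections */ return NULL; }
void* logging_thread(void*) { /* write queued log messages to local disk */ return NULL; }
void* db_thread(void*)      { /* database reads/writes, callbacks into master */ return NULL; }
void* pushing_thread(void*) { /* broadcast cluster state to monitors */ return NULL; }

int main() {
  pthread_t tids[9];
  int n = 0;
  for (int i = 0; i < 4; ++i)  // 4 IO threads for the 8 slave connections
    pthread_create(&tids[n++], NULL, io_thread, NULL);
  for (int i = 0; i < 2; ++i)  // 2 IO threads for client communication
    pthread_create(&tids[n++], NULL, io_thread, NULL);
  pthread_create(&tids[n++], NULL, logging_thread, NULL);
  pthread_create(&tids[n++], NULL, db_thread, NULL);
  pthread_create(&tids[n++], NULL, pushing_thread, NULL);
  // The main thread (the 10th) does background work such as job scheduling.
  for (int i = 0; i < n; ++i)
    pthread_join(tids[i], NULL);
  return 0;
}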

Classification of threads

In my experience, the threads of a multi-threaded service program can be roughly divided into 3 categories:

    1. IO threads. The main loop of such a thread does IO multiplexing, blocking on a select/poll/epoll system call. It also handles timed events. Of course its functionality is more than IO; some computation can go into it as well.
    2. Compute threads. The main loop of such a thread blocks on the condition variable of a blocking queue, as sketched after this list. Such threads usually live in a thread pool.
    3. Threads used by third-party libraries, for example logging or database connections.
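For category 2, here is a minimal blocking-queue sketch and a compute thread's main loop (my illustration under the description above, not code from the original post):

#include <pthread.h>
#include <deque>
#include <cstdio>

class BlockingQueue {
 public:
  BlockingQueue() {
    pthread_mutex_init(&mutex_, NULL);
    pthread_cond_init(&cond_, NULL);
  }
  void put(int task) {
    pthread_mutex_lock(&mutex_);
    queue_.push_back(task);
    pthread_cond_signal(&cond_);   // wake one waiting worker
    pthread_mutex_unlock(&mutex_);
  }
  int take() {
    pthread_mutex_lock(&mutex_);
    while (queue_.empty())         // loop guards against spurious wakeups
      pthread_cond_wait(&cond_, &mutex_);
    int task = queue_.front();
    queue_.pop_front();
    pthread_mutex_unlock(&mutex_);
    return task;
  }
 private:
  pthread_mutex_t mutex_;
  pthread_cond_t cond_;
  std::deque<int> queue_;
};

void* compute_thread(void* arg) {  // a thread-pool worker's main loop
  BlockingQueue* queue = static_cast<BlockingQueue*>(arg);
  for (;;) {
    int task = queue->take();      // blocks until work arrives
    printf("processing task %d\n", task);
  }
  return NULL;
}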

Server programs generally do not start or terminate threads frequently. Even in the programs I have written, threads are created only at startup, never while the service is running.

In the multi-core era, multi-threaded programming is inevitable; the "ostrich algorithm" is no way out.
