The evolution of high concurrency Web services: Saving system memory and CPU

Source: Internet
Author: User
Tags epoll sendfile

First, an increasing number of concurrent connections

The number of concurrent connections that Web systems face has grown exponentially in recent years. High concurrency has become the norm, and it poses a great challenge to Web systems. The simplest and most brute-force response is to add machines and upgrade the hardware of the Web system. Hardware keeps getting cheaper, but blindly adding machines to absorb growing concurrency is still expensive. A more effective approach combines hardware with technical optimization.

Why has the number of concurrent connections grown exponentially? The user base has not grown exponentially over the last few years, so that is not the main reason. The main reason is that the Web has become more complex and more interactive.

1. More page elements, complex interaction

Web pages contain more and more elements, and more resource elements mean more download requests. Interaction in Web systems is also becoming more complex, so the number of interaction scenarios and interactions rises sharply. Take the home page of "" as an example: a single refresh issues about 244 requests. After the page has loaded, scheduled polling and reporting requests also keep running.

To avoid repeatedly creating and destroying connections, HTTP requests today usually use a persistent connection (Connection: keep-alive). Once established, the connection is kept open for a while and reused by subsequent requests. This introduces a new problem: maintaining the connection consumes resources on the Web server, and a connection that is not fully used wastes them. Typically, after a persistent connection is created, the first batch of resources is transferred and then there is almost no data exchange until the timeout expires, at which point the system resources held by the connection are released.

In addition, some Web features need to stay connected for a long time, such as WebSocket.

2. Mainstream browsers open more and more connections

As Web resources have grown richer, mainstream browsers have also raised their number of concurrent connections. For a single domain, early browsers generally opened only 1-2 download connections, while current mainstream browsers usually open 2-6. More concurrent connections speed up page loading when many resources must be downloaded. They also add resilience: if some connections run into "network congestion", the other download connections keep working.

This quietly increases the pressure on the back end of the Web system, because more download connections occupy more Web server resources. At peak traffic, a "high concurrency" scenario forms naturally. These connections and requests consume a large amount of CPU and memory on the server. Pages with more than 100 resources in particular need the extra download connections.

Second, Web front-end optimization to reduce server pressure

Relieving "high concurrency" pressure works best when the front end and back end are optimized together. The Web front end, which sits closest to the user, can reduce the number of HTTP requests or lighten their cost.

1. Reduce Web Requests

The common approach is to control caching with the Expires or Cache-Control: max-age header in the HTTP response. Static content is kept in the browser's local cache, and for a period of time the browser uses the local copy directly without asking the Web server again. HTML5 also provides local storage (localStorage), which is likewise used as a powerful local data cache.

Once a resource is cached this way, no request is sent to the Web server at all, which significantly reduces server pressure and improves the user experience. However, this scheme does nothing for first-time visitors, and it hurts the freshness of Web resources that need to be real-time.
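As a rough illustration of the caching headers discussed here, the following Python sketch builds Expires and Cache-Control: max-age values for a static response. The helper name cache_headers and the one-hour lifetime are assumptions made for the example, not something the article prescribes.

```python
import time
from email.utils import formatdate

def cache_headers(max_age_seconds, now):
    """Return headers that let a browser reuse a static resource locally
    for max_age_seconds (hypothetical helper, for illustration only)."""
    return {
        # Relative lifetime in seconds.
        "Cache-Control": f"max-age={max_age_seconds}",
        # Absolute expiry time, formatted as an HTTP date in GMT.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }

# Example: allow the browser to cache a resource for one hour from now.
headers = cache_headers(3600, now=time.time())
```

Until the lifetime runs out, a browser that honors these headers can serve the resource from its local cache without contacting the server.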

2. Mitigating Web Requests

The browser's local cache has an expiration time, and once it expires, it must be re-requested from the server. At this time, there are two situations:

(1) The resource on the server has not changed. The browser requests the Web resource, and the server replies "you can keep using the local cache." (Communication occurs, but the Web server only has to send a simple reply.)

(2) The file or content on the server has been updated. The browser requests the Web resource, and the Web server transmits the new content over the network. (Communication occurs, and the Web server has to do the full transfer work.)

This negotiation is controlled by Last-Modified or ETag in the HTTP protocol. When the browser asks the server and the content has not changed, the server returns 304 Not Modified. The Web server then does not have to transfer the complete file on every request; a short HTTP response achieves the same effect.

These requests "lighten" the load on the Web server, but a connection is still established and a request still takes place.
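The negotiation can be sketched as follows, assuming an ETag derived from a hash of the content. The functions make_etag and respond are hypothetical names for this example; a real server would typically also handle Last-Modified and If-Modified-Since.

```python
import hashlib

def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the content (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(request_headers: dict, body: bytes):
    """Return (status, headers, payload) for a conditional GET."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        # Content unchanged: a short reply with no body is enough.
        return 304, {"ETag": etag}, b""
    # First request or changed content: send the full resource.
    return 200, {"ETag": etag, "Content-Length": str(len(body))}, body
```

The first call returns 200 with the full body. A later call that carries the same ETag in If-None-Match returns 304 with an empty payload, which is the "simple reply" case.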

3. Merging page requests

Veteran Web developers will remember the days before Ajax became popular. Most pages were output directly: the Web back end assembled the complete page content and returned it to the front end, without many Ajax requests. Static pages were a very widely used optimization at that time. Later, the more interactive Ajax took over, and the number of requests per page kept growing.

Mobile networks (2G/3G) are much slower than PC broadband, and some phones have relatively low-end hardware, so a page with more than 100 requests loads much more slowly. As a result, optimization has turned back toward merging page elements and reducing the number of requests:

(1) Merge HTML display content. Embed CSS and JS directly in the HTML page instead of pulling them in through links.

(2) Merge Ajax requests for dynamic content. For example, combine 10 Ajax requests into 1 batch query.

(3) Merge small images. With CSS sprites (offset positioning), many small images are combined into one. This optimization is also very common on the PC side.

Merging requests reduces the number of data transfers; in effect it turns many individual requests into one "batch" request. These optimizations relieve the Web server by reducing the number of connections that have to be established.
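On the server side, merging Ajax requests could look roughly like the sketch below: one endpoint receives several named queries and answers them in a single response. The handler names (unread_count, nickname) and the response shape are invented for illustration.

```python
def handle_batch(queries, handlers):
    """Answer several small queries in one request/response round trip."""
    results = {}
    for name, params in queries.items():
        handler = handlers.get(name)
        if handler is None:
            results[name] = {"error": "unknown query"}
        else:
            results[name] = {"data": handler(**params)}
    return results

# Hypothetical handlers standing in for real back-end lookups.
HANDLERS = {
    "unread_count": lambda user_id: 3,
    "nickname": lambda user_id: f"user-{user_id}",
}

# One batch request instead of two separate Ajax calls.
batch = {"unread_count": {"user_id": 7}, "nickname": {"user_id": 7}}
response = handle_batch(batch, HANDLERS)
```

The trade-off is that the combined response is only as fast as its slowest query, so batching suits small, similarly cheap lookups best.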

Third, saving memory on the Web server

With the front end optimized, we turn to the Web server itself. Memory is a very important resource for a Web server: more memory usually means more tasks can be handled at the same time. The memory a Web service occupies falls roughly into three parts:

(1) The basic memory used to maintain connections, including the base modules loaded into memory when the process is initialized.

(2) The buffers that hold the data being transmitted.

(3) Memory allocated and used while the program executes.

If maintaining one connection takes as little memory as possible, the Web server can maintain more concurrent connections.

Apache (httpd) is a mature, long-established Web server, and its development has pursued exactly this: it keeps trying to reduce the memory the service occupies in order to support higher concurrency. Let us look at how Apache's working models evolved to address memory.

1. Prefork MPM, multi-process operation mode

Prefork is Apache's most mature and stable mode of operation, and it is still widely used today. After the main process starts, it completes the low-level initialization and then uses fork to pre-create a batch of child processes (each child copies the parent's memory space and does not need to repeat the initialization). The children then wait to serve requests; creating them in advance avoids the overhead of frequently creating and destroying processes. The benefit of multiple processes is that their memory does not interfere with one another, and a process that terminates abnormally does not affect the others. In terms of memory, however, each httpd child process is expensive, because its memory is copied from the parent. We can roughly think of this as a lot of "duplicate data" in memory. As a result, the maximum number of child processes we can create is quite limited. Under high concurrency, the many keep-alive persistent connections hold on to these child processes, which can easily exhaust the available children. Prefork is therefore not well suited to high-concurrency scenarios.

    • Advantages: mature and stable, and compatible with all modules old and new. There is also no need to worry about thread safety. (For example, the common mod_php, which compiles PHP into an Apache submodule, does not need to be thread-safe.)
    • Disadvantage: each service process consumes a lot of memory.
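The pre-forking idea can be sketched in a few lines of Python on a Unix-like system. This is not how httpd itself is implemented: the CONFIG dictionary and the prefork function are hypothetical, and a real server would have each child accept connections on a shared listening socket rather than write a line to a pipe.

```python
import os

# State initialized once in the parent before forking. Every child starts
# with its own copy of it (hypothetical settings, for illustration only).
CONFIG = {"docroot": "/var/www"}

def prefork(num_children):
    """Fork num_children workers up front and collect what each reports."""
    read_fd, write_fd = os.pipe()
    pids = []
    for _ in range(num_children):
        pid = os.fork()
        if pid == 0:
            # Child: CONFIG is already present, no re-initialization needed.
            try:
                os.close(read_fd)
                line = f"{os.getpid()} {CONFIG['docroot']}\n"
                os.write(write_fd, line.encode())
            finally:
                os._exit(0)  # exit here so the child never runs parent code
        pids.append(pid)
    os.close(write_fd)
    for pid in pids:
        os.waitpid(pid, 0)  # reap every child
    with os.fdopen(read_fd) as reader:
        reports = [line.split() for line in reader]
    return pids, reports
```

Each child reports its own process ID together with the configuration value it inherited, which shows that the parent's initialized state is available in every child without being rebuilt.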

2. Worker MPM, mixed multi-process and multi-threaded mode

Compared with prefork, worker mode is a hybrid of multiple processes and multiple threads. It also pre-forks child processes, but only a few, and each child process then creates a number of threads (including a listener thread). Each incoming request is assigned to 1 thread. Threads are lighter than processes because the threads of a process share its memory space, so memory consumption drops. Under high concurrency, more threads are available than prefork would have processes, because threads are more memory-efficient.

However, this does not solve the problem of keep-alive persistent connections "hogging" a worker; the thing being hogged is merely a lighter-weight thread.

You may wonder: why not use threads alone, instead of bringing in multiple processes as well? Stability is the reason. If one thread crashes, it takes down the other healthy threads in the same process. If everything ran as threads of a single process, one crashed thread would wipe out the entire Apache service. In worker mode, a crash affects only part of the Apache service, not all of it.

Sharing the process's memory space among threads reduces the memory footprint, but it also causes a new problem: thread safety. When multiple threads modify a shared resource, they race with each other, so every module we use must be thread-safe. This somewhat increases the instability of the Web service. For example, the PHP extensions used with mod_php must also be thread-safe, otherwise they cannot be used in this mode.

    • Advantages: uses less memory and performs better under high concurrency.
    • Disadvantage: thread safety must be considered, and the locks this introduces add CPU overhead.
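A minimal Python sketch of the trade-off: a pool of threads serves requests and shares one counter, so the counter has to be protected by a lock. The names hits, handle_request, and serve are hypothetical, and the pool only stands in for the threads of one worker-mode child process.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Shared by every thread in the process: cheap in memory, but it needs
# protection against concurrent modification.
hits = {"count": 0}
hits_lock = threading.Lock()

def handle_request(request_id):
    """Serve one request and update the shared counter safely."""
    with hits_lock:  # taking the lock is the extra CPU cost
        hits["count"] += 1
    return f"response {request_id}"

def serve(num_requests, num_threads=4):
    """Hand each request to one thread from a fixed-size pool."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(handle_request, range(num_requests)))
```

Without the lock, the read-modify-write on the counter could interleave between threads, which is exactly the kind of race a thread-safe module has to rule out.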

3. Event MPM, mixed multi-process and multi-threaded mode, introducing epoll

This is the newest of Apache's models, and it is stable and usable in the current version (Apache 2.4.10). It is much like worker mode; the biggest difference is that it solves the problem of threads being held, and wasted, for long periods by keep-alive connections. In the event MPM, a dedicated thread manages the keep-alive connections. When a real request arrives, it passes the request to a service thread, and the service thread is released once execution completes. This reduces the resources wasted on connections that are held but not used, and improves request handling under high concurrency. Because fewer threads sit idle, fewer threads are needed, and the memory footprint in the same scenario goes down.

When the event MPM meets an incompatible module, it fails over to worker-mode behavior, where one worker thread handles one request. The modules bundled with recent official Apache releases all support the event MPM. Note that the event MPM needs epoll support from Linux (Linux 2.6+) to be enabled. Of Apache's three modes, the event MPM saves the most memory in real-world scenarios.

4. Use the more lightweight nginx as the Web server

Apache's continuous optimization has reduced its memory footprint and improved its ability to handle high concurrency. However, as mentioned earlier, Apache is an old and mature Web server that integrates many stable modules, which makes it relatively heavy. Nginx is a relatively lightweight Web server and naturally uses less memory than Apache. Furthermore, nginx serves N connections with a single process, rather than adding processes or threads to support more connections as Apache does. Nginx therefore creates far fewer processes and threads, which saves a great deal of memory.

In QPS tests on static files, nginx performs about 3 times better than Apache. For dynamic files such as PHP, nginx usually talks to php-fpm over FastCGI, so PHP runs as a separate service outside nginx, whereas Apache usually compiles PHP into one of its own submodules (newer versions of Apache also support FastCGI). For PHP dynamic files, nginx performs slightly worse than Apache.

5. Saving memory with sendfile

Apache, nginx and many other Web servers support sendfile. Sendfile reduces the memory footprint by avoiding a copy of the data into user-space memory (the user buffer). Many readers will immediately ask why. To explain the principle as clearly as possible, we first review how the Linux kernel and user-space memory interact.

In general, user space (the memory space of our program) cannot read, write, or operate devices (disk, network, terminal, and so on) directly. The kernel usually acts as a "middleman" that performs device operations and reads and writes on the program's behalf.

Take the simplest disk example: read file A from disk and write it to file B. The data of file A is read from disk into the "kernel buffer" and then copied to the "user buffer" before our program can process it. Writing works the same way in reverse: the data is copied from the "user buffer" to the "kernel buffer" and finally written to file B on disk.

Writing files this way is laborious, so people asked whether the "user buffer" copy could be skipped. This is what mmap (memory mapping) does: it establishes a direct mapping between a file on disk and memory, so the data is no longer copied to the "user buffer"; instead, the program gets back a pointer to the mapped memory. Our earlier example then becomes: the data of file A is loaded from disk into the "kernel buffer", copied from there into file B's "kernel buffer", and then written from file B's "kernel buffer" back to disk. This process removes one memory copy and also consumes less memory.
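The file A to file B example can be sketched with Python's mmap module. The function name copy_with_mmap is made up for this illustration; it maps the source file and writes to the destination straight from the mapping, without first reading the data into a separate buffer of its own.

```python
import mmap

def copy_with_mmap(src_path, dst_path):
    """Copy a non-empty file by mapping it into memory instead of read()."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Length 0 maps the whole file; an empty file cannot be mapped.
        with mmap.mmap(src.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Write directly from the mapped pages to the destination file.
            dst.write(mapped)
            return len(mapped)
```

The mapping behaves like a bytes-like view of the file's contents, so the program works with a reference to the mapped memory rather than a private copy of the whole file.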

Back to sendfile. Put simply, sendfile works much like mmap: it avoids the memory copy from the "kernel buffer" to the "user buffer".

By default (without sendfile), data read from a disk file and sent to a socket travels from disk to the "kernel buffer", is copied to the "user buffer", is copied again into the kernel's socket buffer, and is then sent to the network.

With sendfile, the data goes from disk to the "kernel buffer" and from there directly to the socket buffer, without passing through the "user buffer".

This saves not only memory but also CPU overhead.
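In Python, the same idea is available through socket.sendfile(), which uses os.sendfile() where the platform supports it and otherwise falls back to ordinary send() calls. The wrapper name send_file_over_socket below is hypothetical.

```python
import socket

def send_file_over_socket(path, sock):
    """Send a whole file over a connected, blocking stream socket.

    Returns the number of bytes sent.
    """
    with open(path, "rb") as f:
        # Where os.sendfile() is available, the file data moves inside the
        # kernel and is not read into a buffer owned by this program.
        return sock.sendfile(f)
```

The program never holds the file contents itself; it only hands the kernel a file object and a socket.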

Fourth, saving CPU on the Web server

For a Web server, the CPU is another core system resource. We generally assume that executing business logic is what consumes most of the CPU. In a Web service, however, context switches between threads and processes are also CPU-intensive. A process or thread usually does not hold the CPU for long: when it blocks, or when its time slice runs out, it has to give up the CPU, and a context switch hands the CPU from the old process or thread to a new one. In addition, when the number of concurrent connections is high, polling and checking the status of the connections (socket file descriptors) that users have established is also CPU-intensive.

The development of Apache and nginx has also worked to reduce CPU overhead.

1. select/poll (I/O multiplexing in earlier versions of Apache)

A Web service typically maintains many socket file descriptors for communicating with users, and I/O multiplexing makes it easier to manage and check them. Earlier versions of Apache used select: we hand the socket file descriptors we are interested in to the kernel, and the kernel tells us which of them are ready for an operation. Poll works on basically the same principle as select, so the two are discussed together here and their differences are not covered.

Select/poll returns the same set of file descriptors that we submitted (the kernel sets the flag bits of the descriptors that are readable, writable, or in an error state), and we have to scan the set to find the descriptors we can operate on. This is repeated over and over. In real applications, most of the socket file descriptors we monitor are "idle", meaning nothing can be done with them yet, so we scan the whole set to find a handful of usable sockets. As we monitor more socket file descriptors (more concurrent user connections), this scanning becomes heavier and heavier, and CPU overhead rises.

If almost all of the socket file descriptors we monitor are "active", on the other hand, this mode is the more suitable one.
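A small Python sketch of the select pattern follows, using the standard select module. The helper name readable_with_select is an assumption for the example. Note that the complete list of sockets is passed in on every call, however few of them are active.

```python
import select
import socket

def readable_with_select(sockets, timeout=0.1):
    """Return the sockets that have data waiting, using select()."""
    # The whole list is handed over and checked on each call, even if
    # only one socket is active, so the cost grows with len(sockets).
    ready, _, _ = select.select(sockets, [], [], timeout)
    return ready
```

With five monitored sockets of which one has received data, the call still submits all five and gets back a list containing just the one that is readable.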

2. epoll (supported by the newer Apache event MPM and by nginx)

Epoll is an I/O multiplexing mechanism officially supported since Linux 2.6, and it can be understood as an improvement on select/poll. We again tell the kernel which socket file descriptors we care about, but this time a "callback function" is registered for each of them, and when a socket becomes ready we are notified through that callback. We therefore do not need to scan the whole set of socket file descriptors; we directly obtain the ones that are ready. The many "idle" descriptors are never traversed. Even as we monitor more and more socket file descriptors, we only visit the "active" ones.

There is an extreme case in which almost all of our file descriptors are "active". This triggers a large number of callback executions and also increases CPU overhead. In real-world Web services, however, most of the time the set of connections contains many "idle" ones.
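The register-once, receive-only-ready-descriptors pattern can be sketched with Python's selectors module, whose DefaultSelector is backed by epoll on Linux and by another mechanism on other platforms. The helper names register_all and readable_with_selector are assumptions for this example. In this API the program receives a list of ready descriptors rather than having its own callback invoked.

```python
import selectors
import socket

def register_all(sockets):
    """Register each socket once; the interest set is remembered."""
    selector = selectors.DefaultSelector()  # epoll-backed on Linux
    for sock in sockets:
        selector.register(sock, selectors.EVENT_READ)
    return selector

def readable_with_selector(selector, timeout=0.1):
    """Return only the registered sockets that are ready to read."""
    # Idle sockets do not appear in the result, so this loop is
    # proportional to the number of active sockets.
    return [key.fileobj for key, _events in selector.select(timeout)]
```

Unlike the select pattern, the set of sockets is submitted once at registration time, and each later call returns only the active ones.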

3. Thread/process creation, destruction, and context switching

In Apache, a process or thread typically serves one connection at a time. Apache therefore needs many processes or threads to serve many connections, and at peak times all of those processes and threads generate a lot of context-switching overhead. Nginx typically has only 1 master process and a few worker processes, and each worker process serves many connections, which saves the CPU cost of context switching.

The two models are different, but neither can simply be declared better or worse. Each has its own strengths, and I will not speculate further here.

4. The CPU overhead of locks under multi-threading

Both the worker and event modes in Apache are multi-threaded. Because threads share the memory space of their process, they compete when accessing shared data; this is the thread-safety problem. Locks are usually introduced to deal with it (on Linux, the common thread-related locks are the mutex and the read-write lock, rwlock). A thread that acquires the lock continues to execute, while a thread that fails to acquire it usually blocks and waits. Introducing locks tends to make a program considerably more complex, and it also brings the risk of thread "deadlock" or "starvation" (multiple processes face the same problem when they access resources shared between processes).

Deadlock: two threads each hold a resource the other wants to lock, so they block each other and the condition for proceeding is never met.

Starvation: a thread never manages to lock the resource it wants and so can never perform its next step.

To avoid these lock-related problems, we have to add complexity to the program. The usual solutions are:

(1) Lock resources in an agreed order. For example, everyone locks shared resource X first, and only after that succeeds may they lock shared resource Y.

(2) If a thread holds resource X but fails to lock resource Y, it gives up and also releases the resource X it was holding.
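Both rules can be sketched with Python's threading.Lock. The names lock_x, lock_y, with_both_ordered, and with_both_or_back_off are hypothetical, and the timeout value is arbitrary.

```python
import threading

lock_x = threading.Lock()  # guards shared resource X
lock_y = threading.Lock()  # guards shared resource Y

def with_both_ordered(action):
    """Rule (1): every thread takes X first and only then Y."""
    with lock_x:
        with lock_y:
            return action()

def with_both_or_back_off(action, timeout=0.05):
    """Rule (2): if Y cannot be locked, give up and release X as well.

    Returns (True, result) on success and (False, None) after backing off.
    """
    with lock_x:
        if not lock_y.acquire(timeout=timeout):
            return False, None  # leaving the with-block releases X
        try:
            return True, action()
        finally:
            lock_y.release()
```

A caller that gets (False, None) can retry later. Because it no longer holds X, it cannot be one half of a deadlock while it waits.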

With PHP, Apache's worker and event modes likewise require thread safety. In general, the official PHP library in recent versions has no thread-safety problems; it is third-party extensions that need attention. PHP does not implement thread safety with locks. Instead, each thread gets its own copy of the global variables, which amounts to a private memory space per thread. This consumes somewhat more memory, but it avoids both a complex locking implementation and the CPU overhead of locking.

Incidentally, PHP-FPM (FastCGI), which is often paired with nginx, uses multiple processes and therefore has no thread-safety problem.
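The per-thread-copy idea can be illustrated with Python's threading.local. This is only an analogy for the approach described here, not PHP's actual implementation, and the names request_state and run_threads are made up for the example.

```python
import threading

# Each thread that touches this object sees its own private attributes.
request_state = threading.local()

def run_threads(num_threads):
    """Let every thread store a value and read it back without locks."""
    barrier = threading.Barrier(num_threads)
    seen = {}

    def worker(thread_no):
        request_state.value = thread_no        # write this thread's copy
        barrier.wait()                         # wait until all have written
        seen[thread_no] = request_state.value  # still this thread's value

    threads = [threading.Thread(target=worker, args=(n,))
               for n in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

Every thread reads back exactly what it wrote even though all of them used the same name, at the cost of one copy of the state per thread.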


After reading this, some readers may conclude that nginx + PHP-FPM is the most resource-efficient way to run a Web system. To some extent that is true, but when building a Web system for a real business application, each problem needs its own analysis in order to find the most suitable technical solution.

Web services keep evolving toward supporting more user requests with as few system resources as possible, and that has been an impressive journey. These technical solutions bring together many ideas that are worth learning from.
