Why Nginx is more efficient than Apache httpd: the principles

Source: Internet
Author: User
Tags: epoll, http, authentication, imap, sendfile

1. Processes and Threads

A process is a running instance of a program: an entity with a certain independent function that is executing on the computer. In early systems (such as Linux 2.4), the process was the basic unit of scheduling; in systems that support threads (such as Windows and Linux 2.6), the thread is the basic unit of scheduling, and the process is simply a container for threads. The program itself is just a description of instructions, data, and their organization, whereas the process is the actual running instance of that program (those instructions and data). Several processes may be associated with the same program, and each process can run independently, either sequentially or asynchronously (in parallel). A modern computer system can load several programs into memory over a period of time and, through time-sharing (time-division multiplexing), give the impression that they run simultaneously on a single processor. Similarly, with multithreading (where each thread represents a separate execution context within a process), the parallel threads of one program can truly run concurrently, on the different CPUs of a multi-CPU host.

2. Common Web Service Modes

2.1 Comparison of the three working models

A Web server must work on a socket in some way to provide service to users. When handling user requests, a Web server generally has three modes to choose from: the multi-process mode, the multi-threaded mode, and the asynchronous mode.

    • Multi-process mode: start one process for each request. Because creating a process, destroying a process, and switching between processes all consume CPU and memory, performance drops significantly under high load. (A minimal sketch of this model appears after this list.)

Pros: stability. Because independent processes handle independent requests, and processes are isolated from each other, a problem in one process does not affect the others, so stability is the best of the three.

Cons: resource consumption. When there are too many requests, a large number of processes are needed; process creation and switching are expensive, and because inter-process resources are independent, memory cannot be shared and is used inefficiently.

    • Multi-threaded mode: one process uses multiple threads to handle user requests. Because the overhead of a thread is much smaller than that of a process, and threads can share some resources, this mode is more efficient.

Pros: lower overhead. Threads share part of their data, and the cost of creating threads and switching between them is much smaller than switching between processes.

Cons: stability. Switching threads too quickly can cause thread thrashing, and too many threads can make the server unstable.

    • Asynchronous mode: requests are handled in a non-blocking, event-driven manner, which is the cheapest of the three. However, although the asynchronous mode is efficient, it is also demanding: if there is a problem in the scheduling between tasks, a global failure may occur. Asynchronous designs are therefore usually used for services whose functionality is relatively simple, fits the server's task-scheduling model, and contains no code that would break the scheduler.

Pros: best performance. One process or thread handles multiple requests with no extra process or thread overhead, so this mode performs best and consumes the fewest resources.

Cons: stability. An error in the one process or thread can leave a large number of requests unhandled, or even bring the whole service down.
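
To make the multi-process model concrete, here is a minimal fork-per-connection sketch in C. This is only an illustration of the pattern, not any particular server's code; port 8080 is an arbitrary choice and error handling is trimmed:

    /* A fork-per-connection server: the parent only accepts; each child
     * handles one request and exits. Error handling trimmed for brevity. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <signal.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void) {
        int lfd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8080);            /* arbitrary example port */
        bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
        listen(lfd, 128);
        signal(SIGCHLD, SIG_IGN);               /* let the kernel reap children */

        for (;;) {
            int cfd = accept(lfd, NULL, NULL);  /* blocks until a client connects */
            if (cfd < 0)
                continue;
            if (fork() == 0) {                  /* child: one process per request */
                close(lfd);
                /* ... read the request and send the response here ... */
                close(cfd);
                _exit(0);
            }
            close(cfd);                         /* parent goes back to accept() */
        }
    }

The per-request fork() here is exactly the creation/destruction/switching cost described above, and it is also why a crash in one child cannot take the others down.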

2.2 The flow of a Web request:

    1. The client sends a request to the server's network interface card (NIC);
    2. The server's NIC receives the request and hands it to the kernel for processing;
    3. Based on the socket the request corresponds to, the kernel passes the request to the Web server process working in user space;
    4. Based on the user request, the Web server process makes a system call to the kernel, asking for the corresponding resource (such as index.html);
    5. The kernel finds that the resource the Web server process is asking for resides on disk, so it accesses the disk through the driver;
    6. The kernel schedules the disk and obtains the required resource;
    7. The kernel stores the resource in its own buffer and notifies the Web server process;
    8. The Web server process fetches the resource through a system call, copying it into its own buffer;
    9. The Web server process builds a response and sends it back to the kernel through a system call, answering the user's request;
    10. The kernel sends the response to the NIC;
    11. The NIC sends the response to the user.

Only after this complex flow is a single request complete.
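
Expressed as code, steps 3 through 10 are roughly the following sequence of system calls in a blocking server. This is a sketch only; "index.html" is the example resource named in step 4, and error handling is trimmed:

    /* Steps 3-10 above, as the system calls a blocking server makes. */
    #include <fcntl.h>
    #include <sys/socket.h>
    #include <unistd.h>

    void serve_one_request(int listen_fd) {
        int cfd = accept(listen_fd, NULL, NULL);      /* step 3: kernel hands us the connection */
        char req[4096], buf[4096];
        read(cfd, req, sizeof(req));                  /* network I/O: read the HTTP request */

        int fd = open("index.html", O_RDONLY);        /* step 4: ask the kernel for the resource */
        ssize_t n;
        while ((n = read(fd, buf, sizeof(buf))) > 0)  /* steps 5-8: disk -> kernel buffer -> our buffer */
            write(cfd, buf, (size_t)n);               /* steps 9-10: our buffer -> kernel -> NIC */
        close(fd);
        close(cfd);
    }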

Simply put: user request -> user space -> system call -> kernel space -> kernel reads the page resource from disk -> back to user space -> response to the user. The above briefly explains the flow of a client request through a Web server. In this flow there are two I/O operations: the network I/O of the client's request, and the disk I/O of the Web server fetching the page. Next, let's talk about the I/O models on Linux.

3. I/O Models in Detail

From the analysis of connection handling above, we know that the Web server process working in user space cannot perform I/O directly; it must go through system calls to the kernel.

That is, the process makes a system call to the kernel to request I/O; the kernel dispatches the resource from the device into a kernel buffer (the wait phase), and then the kernel must also copy the data from the kernel buffer into the user space of the Web server process (the copy phase) to complete the I/O. Both stages take time. Depending on how the wait and copy phases are handled, I/O can be divided into the following five models:

    • Blocking I/O
    • Non-blocking I/O
    • I/O multiplexing (select and poll)
    • Signal (event)-driven I/O (SIGIO)
    • Asynchronous I/O (AIO)

3.1 Introduction to the I/O models

It is necessary to first explain the concepts of blocking, non-blocking, synchronous, asynchronous, and I/O.

3.1.1 Blocking and non-blocking:

Blocking and non-blocking refer to whether an operation waits until the operation completes before returning, or returns immediately.

For example, consider a restaurant waiter taking a customer's order. After the customer orders, the waiter passes the menu to the chef in the kitchen. At this point there are two ways to proceed:

    • The first: wait at the serving window until the chef finishes cooking and passes the dish to the window, then carry the dish to the customer;
    • The second: come back every little while and ask the chef whether the dish is ready; if not, go handle other things first and ask again later.

The first is blocking, and the second is non-blocking.

3.1.2 Synchronous and asynchronous:

Synchronization and asynchrony are a different concept: they are a property of the event itself. Take the order again as an example. If the waiter deals with the chef directly, waits until the dish comes out, and the process completes only when the chef hands the dish to the waiter, that is a synchronous event. For the same order, some restaurants have dedicated food runners: when the chef finishes a dish, the runner carries it to the serving window and notifies the waiter. That makes the event asynchronous. In fact, asynchrony can be divided into two kinds: with notification and without notification. The case just described is with notification. Some food runners may not take the initiative to notify you, so you have to check the status from time to time: that is asynchrony without notification.

Synchronous events can only be handled in a blocking manner. Asynchronous events can be handled either blocking or non-blocking. There are two non-blocking approaches: actively querying and passively receiving a message. Passive does not mean worse; here it is precisely the more efficient choice, because with active querying most of the queries do no useful work. For asynchronous events with notification, both approaches work; for those without notification, only active querying is possible.

3.1.3 Fully asynchronous I/O

Back to I/O: whether input or output, access to a peripheral (such as a disk) can be divided into two phases, request and execution. The request phase checks the peripheral's status (for example, whether it is ready); the execution phase is the real I/O operation. Before Linux 2.6, only the "request" phase could be asynchronous; 2.6 introduced AIO (asynchronous I/O), which made the "execution" phase asynchronous as well. Although Linux/Unix dominates as a server platform, here it lagged well behind Windows, which has had IOCP (AIO on Windows, very efficient) since Windows 2000. So when learning Linux, don't always assume Windows is worse in every respect (Windows' multithreading mechanism is also ahead of Linux's).

3.1.4 The five I/O models

Based on the above analysis, I/O can be divided into five models:

    • Blocking I/O: the process blocks completely through all phases
    • Non-blocking I/O: if there is no data in the buffer, the call returns EWOULDBLOCK immediately
    • I/O multiplexing (select and poll): blocks separately during the wait and copy phases
    • Signal-driven I/O (SIGIO): does not block during the wait phase, but the copy phase blocks (signal-driven I/O, with notification)
    • Asynchronous I/O (AIO): fully non-blocking; the kernel delivers a signal when the I/O has completed

The "execution" phase of the top four I/O models on Linux is synchronous, and only the last one achieves true full asynchrony. The first type of blocking is the most primitive method and the most exhausting way. Of course tired and not tired to see who. The application is dealing with the kernel. This is the most tiring way for an application, but it is the easiest way for the kernel to do this. Also take a la carte this case, you are the application, the chef is the kernel, if you have been waiting, the chef will be more convenient (do not have to handle other waiter's dishes). Of course now the computer design, including the operating system, more and more for the end user to consider, in order to satisfy the user, the kernel slowly assume more and more work, the evolution of the IO model is the same.

Non-blocking I/O, I/O multiplexing, and signal-driven I/O are all non-blocking, of course only with respect to the "request" phase. Non-blocking I/O actively queries the peripheral's status. The select and poll calls of I/O multiplexing are also active queries; the difference is that select and poll can query the status of multiple FDs (file descriptors) at once, and select has a limit on the number of FDs. epoll is based on callback functions, and signal-driven I/O is based on signal messages; those two belong to the "passively receive a message" category. Finally came the great AIO: the kernel does everything, and the upper application layer achieves full asynchrony with the best performance, though of course also the highest complexity.

3.2 Each I/O model in detail (this section is excerpted from the "Lone Orphan" blog)

3.2.1 Blocking I/O

Description: the application invokes an I/O function that blocks the application while it waits for the data to become ready. If the data is not ready, the call waits until it is; the data is then copied from the kernel into user space, and the I/O function returns a success indication. This needs little explanation: it is the blocking socket. (Note that general network I/O is blocking I/O: the client makes the request, the Web server process responds, and the request waits until the process returns the page.)
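
In code, this whole model is a single call; both the wait phase and the copy phase happen inside it (a minimal sketch):

    /* Blocking I/O: read() puts the process to sleep until the kernel has
     * both received the data (wait phase) and copied it into buf (copy phase). */
    #include <unistd.h>

    ssize_t read_blocking(int sockfd, char *buf, size_t len) {
        return read(sockfd, buf, len);   /* returns only once data has arrived */
    }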

3.2.2 Non-blocking I/O

Setting a socket to non-blocking tells the kernel: when a requested I/O operation cannot complete, do not put the process to sleep; return an error instead. Our I/O function then keeps testing whether the data is ready, and if not, tests again, until the data is ready. This continuous testing consumes a great deal of CPU time, so this I/O model is not used by general-purpose Web servers.
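
A minimal sketch of the test-and-retry loop; the busy retry here is exactly the CPU waste described above:

    /* Non-blocking I/O: set O_NONBLOCK, then retry; read() returns -1 with
     * errno == EWOULDBLOCK/EAGAIN while the data is not yet ready. */
    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>

    ssize_t read_nonblocking(int fd, char *buf, size_t len) {
        int flags = fcntl(fd, F_GETFL, 0);
        fcntl(fd, F_SETFL, flags | O_NONBLOCK);

        for (;;) {
            ssize_t n = read(fd, buf, len);
            if (n >= 0)
                return n;                          /* data was ready and copied */
            if (errno != EAGAIN && errno != EWOULDBLOCK)
                return -1;                         /* a real error */
            /* not ready yet: a real program would do other work here
             * instead of spinning, but the model itself is this retry */
        }
    }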

3.2.3 I/O multiplexing (select and poll)

The I/O multiplexing model uses the select or poll function, or the epoll function (supported since the Linux 2.6 kernel). These also block the process, but unlike blocking I/O, one function can block on multiple I/O operations at the same time: it can watch multiple read operations and multiple write operations at once, and the actual I/O functions are not invoked until there is data readable or writable.
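
A minimal select() sketch of this model (error handling trimmed):

    /* I/O multiplexing with select(): block once on many descriptors,
     * then perform actual I/O only on the ones that are ready. */
    #include <sys/select.h>
    #include <unistd.h>

    void wait_and_read(int fds[], int nfds) {
        fd_set rset;
        FD_ZERO(&rset);
        int maxfd = -1;
        for (int i = 0; i < nfds; i++) {         /* register every socket */
            FD_SET(fds[i], &rset);
            if (fds[i] > maxfd) maxfd = fds[i];
        }
        select(maxfd + 1, &rset, NULL, NULL, NULL);  /* blocks until one is ready */
        char buf[4096];
        for (int i = 0; i < nfds; i++)           /* the full traversal select requires */
            if (FD_ISSET(fds[i], &rset))
                read(fds[i], buf, sizeof(buf));  /* only now do the actual I/O */
    }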

3.2.4 Signal-driven I/O (SIGIO)

First, we enable signal-driven I/O on the socket and install a signal handler; the process then continues to run without blocking. When the data is ready, the process receives a SIGIO signal, and in the signal handler it can call the I/O functions to process the data.
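
A minimal sketch of enabling SIGIO on Linux; note that the read() inside the handler is the copy phase, which still blocks:

    /* Signal-driven I/O: enable SIGIO delivery on the socket; the process
     * runs on, and the handler fires when data arrives. */
    #include <fcntl.h>
    #include <signal.h>
    #include <unistd.h>

    static int sockfd;

    static void on_sigio(int signo) {
        char buf[4096];
        (void)signo;
        read(sockfd, buf, sizeof(buf));   /* data is ready; copy it out */
    }

    void enable_sigio(int fd) {
        sockfd = fd;
        signal(SIGIO, on_sigio);                            /* install the handler */
        fcntl(fd, F_SETOWN, getpid());                      /* deliver SIGIO to us */
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_ASYNC);   /* turn on signal-driven I/O */
    }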

3.2.5 Asynchronous I/O (AIO)

When an asynchronous procedure call is made, the caller does not get the result immediately. After the input/output operation completes, the part that actually handled the call notifies the caller through state, a notification, or a callback.
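
A minimal sketch using POSIX AIO (<aio.h>) with signal notification; on older glibc, link with -lrt. By the time the handler runs, the kernel has already done both the wait and the copy:

    /* Asynchronous I/O: aio_read() returns at once; the kernel performs
     * the whole operation and then raises the completion signal. */
    #include <aio.h>
    #include <fcntl.h>
    #include <signal.h>
    #include <string.h>
    #include <unistd.h>

    static char buf[4096];
    static struct aiocb cb;

    static void on_done(int signo, siginfo_t *info, void *ctx) {
        (void)signo; (void)info; (void)ctx;
        ssize_t n = aio_return(&cb);   /* bytes read; data is already in buf */
        (void)n;
    }

    void start_async_read(int fd) {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_flags = SA_SIGINFO;
        sa.sa_sigaction = on_done;
        sigaction(SIGRTMIN, &sa, NULL);

        memset(&cb, 0, sizeof(cb));
        cb.aio_fildes = fd;
        cb.aio_buf = buf;
        cb.aio_nbytes = sizeof(buf);
        cb.aio_sigevent.sigev_notify = SIGEV_SIGNAL;   /* notify via signal */
        cb.aio_sigevent.sigev_signo = SIGRTMIN;

        aio_read(&cb);   /* returns immediately; the kernel does the rest */
    }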

3.2.6 I/O model summary

As we can see, the later the model, the less it blocks, and the better its theoretical efficiency. Of the five I/O models, the first three are synchronous I/O and the last two are asynchronous I/O.

Synchronous I/O:

Blocking I/O
Non-blocking I/O
I/O multiplexing (select and poll)

Asynchronous I/O:

Signal-driven I/O (SIGIO) (semi-asynchronous)
Asynchronous I/O (AIO) (truly asynchronous)

Differences between asynchronous I/O and signal-driven I/O:

In signal-driven I/O mode, the kernel tells our application that it may begin copying data by delivering the SIGIO signal; the application still performs the copy itself.
In asynchronous I/O mode, the kernel notifies our application only after the kernel has completed all of the operations, including the copy.

3.3 Linux I/O model implementations [from the "Isolated City" blog]

3.3.1 The main implementations are as follows:
    • select
    • poll
    • epoll
    • kqueue
    • /dev/poll
    • IOCP

Note that IOCP is implemented by Windows; select, poll, and epoll are implemented by Linux; kqueue is implemented by FreeBSD; and /dev/poll is implemented by Sun's Solaris. select and poll correspond to the 3rd model (I/O multiplexing), and IOCP corresponds to the 5th (asynchronous I/O). What about epoll, kqueue, and /dev/poll? They are essentially the same model as select, just more advanced; they can be seen as having some features of the 4th model (signal-driven I/O), such as the callback mechanism.

3.3.2 Why are epoll, kqueue, and /dev/poll more advanced than select?

The answer is that they do not poll; they replace polling with callbacks. Think about it: when there are many sockets, every select() call completes its dispatch by traversing up to FD_SETSIZE sockets. No matter which sockets are active, everything gets traversed, which wastes a great deal of CPU time. If instead a callback function could be registered on each socket, so that the related operation happens automatically when a socket becomes active, polling is avoided; that is exactly what epoll, kqueue, and /dev/poll do. If this is hard to picture, here is a real-world example. Suppose you are in college, living in a dormitory building with many rooms, and a friend comes to visit. The select-style dorm matron takes your friend from room to room until she finds you. The epoll-style dorm matron first writes down each student's room number; when your friend arrives, she only needs to say which room you live in, rather than personally walking the whole building. If 10,000 people each have friends coming to find them, it is self-evident which matron, select or epoll, is more efficient. Likewise, in a high-concurrency server, polling I/O is among the most time-consuming operations, and the relative performance of select, epoll, and /dev/poll is just as clear.
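
The register-once, dispatch-on-activity pattern looks like this with the Linux epoll API (a sketch; error handling trimmed):

    /* epoll: register each socket once; epoll_wait() then returns only
     * the active sockets, with no FD_SETSIZE-style scan of idle ones. */
    #include <sys/epoll.h>
    #include <unistd.h>

    void epoll_loop(int fds[], int nfds) {
        int epfd = epoll_create1(0);
        for (int i = 0; i < nfds; i++) {   /* "write down the room number" once */
            struct epoll_event ev = { .events = EPOLLIN, .data.fd = fds[i] };
            epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev);
        }
        struct epoll_event ready[64];
        char buf[4096];
        for (;;) {
            int n = epoll_wait(epfd, ready, 64, -1);  /* only active sockets return */
            for (int i = 0; i < n; i++)
                read(ready[i].data.fd, buf, sizeof(buf));
        }
    }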

3.3.3 Windows or *nix (IOCP or kqueue, epoll, /dev/poll)?

Admittedly, Windows IOCP is very good, and few systems support asynchronous I/O. But because of the limitations of the system itself, large servers still run on UNIX. Moreover, as mentioned above, kqueue, epoll, and /dev/poll have one more blocking step than IOCP: copying data from the kernel to the application layer, so they cannot be counted as asynchronous I/O. However, this small blocking step is insignificant, and kqueue, epoll, and /dev/poll all perform very well.

3.3.4 Summary of the key points

Only IOCP (the Windows implementation) is asynchronous I/O; every other mechanism blocks to some degree.

select (the Linux implementation) is inefficient because it must poll each time. But inefficiency is relative: it depends on the situation, and it can be mitigated with good design.

epoll (Linux implementation), kqueue (FreeBSD implementation), and /dev/poll (Solaris implementation) follow the Reactor pattern; IOCP follows the Proactor pattern.

Before 2.2.9, Apache supported only the select model; since 2.2.9 it also supports the epoll model.

Nginx supports the epoll model.

The Java NIO package uses the select model.

4. Apache httpd Work Modes

4.1 Apache's three working modes

We all know that Apache has three kinds of working modules: prefork, worker, and event.

    • prefork: multi-process; each request is served by one process. This mode uses the select mechanism for notification.
    • worker: multi-threaded; one process can spawn multiple threads, and each thread serves one request. The notification mechanism is still select, but it can accept more requests.
    • event: based on the asynchronous I/O model; one process or thread serves multiple user requests, driven by events (that is, the epoll mechanism).
4.2 How prefork works

If you do not explicitly specify an MPM with "--with-mpm", prefork is the default MPM on Unix platforms. It uses pre-spawned child processes, the same pattern used in Apache 1.3. prefork itself does not use threads; version 2.0 keeps it partly to maintain compatibility with version 1.3, and partly because handling different requests with separate, mutually independent child processes makes it one of the most stable MPMs.

4.3 How worker works

Compared with prefork, worker is the new MPM in version 2.0 that mixes multiple processes with multiple threads. Because it uses threads for processing, it can handle a relatively large number of requests while consuming fewer system resources than a process-based server. Yet worker also uses multiple processes, each spawning multiple threads, to obtain the stability of a process-based server; this MPM represents the development direction of Apache 2.0.

4.4 How event works: based on the event mechanism

One process responds to multiple user requests, using a callback mechanism to make the socket reusable: upon receiving a request, the process does not handle it itself but hands it directly to other mechanisms, and is notified through epoll when the request has completed. Throughout, the process itself stays idle and can keep receiving user requests, so a single process can serve multiple user requests. This supports a large number of concurrent connections while consuming fewer resources.

5. How to Improve a Web Server's Concurrent Connection Handling

There are several basic approaches:

    • Thread-based: one process spawns multiple threads, and each thread responds to one user request.
    • Event-based: one process handles multiple requests and is notified through the epoll mechanism when a request has completed.
    • Disk-based AIO (asynchronous I/O).
    • Support for mmap memory mapping: in a traditional Web server, when a page is read in, the disk page first enters the kernel cache and is then copied from the kernel cache into the Web server's own buffer. The mmap mechanism instead maps the file into the process, so the Web server accesses the page content directly without first copying the on-disk page into a separate buffer (see the sketch after this list).
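
The mmap path from the last bullet looks like this in C (a sketch; error handling trimmed):

    /* mmap: map the file into the process address space so the kernel's
     * page cache is used directly, instead of read()'s extra copy from
     * the kernel cache into a user-space buffer. */
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    void send_file_mmap(int cfd, const char *path) {
        int fd = open(path, O_RDONLY);
        struct stat st;
        fstat(fd, &st);
        void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        write(cfd, p, (size_t)st.st_size);   /* serve straight from the mapped pages */
        munmap(p, (size_t)st.st_size);
        close(fd);
    }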

Nginx happens to support all of the features above. So when the official Nginx site says Nginx supports 50,000 concurrent connections, the claim has a solid basis.

6. Nginx's Strengths

6.1 Introduction

Traditionally, a Web service built on the process or thread model handles each concurrent connection request with a separate process or thread, and is bound to block during network and I/O operations; the other corollary is low utilization of memory and CPU. Generating a new process or thread requires preparing its runtime environment in advance, which includes allocating heap and stack memory for it and creating a new execution context. All of this costs CPU, and too many processes or threads cause thread thrashing or frequent context switches, further degrading system performance. Nginx (Engine X) is a different kind of high-performance Web server and reverse proxy: its main focus is high performance and dense utilization of physical computing resources, so it adopts a different architectural model. Inspired by the advanced event-based processing mechanisms found in various operating system designs, Nginx uses a modular, event-driven, asynchronous, single-threaded, non-blocking architecture, making heavy use of multiplexing and event notification. In Nginx, connection requests are handled by a handful of worker processes, each containing only one thread, in an efficient run-loop mechanism, and each worker can handle thousands of concurrent connections and requests in parallel.

6.2 How Nginx works

Nginx runs multiple processes on demand: one master process and several worker processes; when caching is configured, there are also a cache loader process and a cache manager process, and so on. Every process contains only one thread, and inter-process communication is achieved mainly through shared memory. The master process runs as root, while the workers, cache loader, and cache manager should run as unprivileged users.

The master process performs the following tasks:
    • Reading and verifying configuration information;
    • Creating, binding, and closing sockets;
    • Starting, terminating, and maintaining the configured number of worker processes;
    • Reconfiguring operating features without interrupting service;
    • Controlling non-disruptive program upgrades, enabling new binaries and rolling back to older versions when needed;
    • Re-opening log files;
    • Compiling embedded Perl scripts.

The main tasks of the worker processes include:
    • Receiving, passing in, and processing connections from clients;
    • Providing reverse proxying and filtering;
    • Any other task Nginx can perform.

Note: if the load is CPU-intensive, for example SSL or compression, the number of workers should equal the number of CPUs; if the load is mainly I/O-intensive, for example sending large amounts of content to clients, the number of workers should be 1.5 to 2 times the number of CPUs.
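
As a concrete illustration, this tuning is done in nginx.conf; the directive names below are real Nginx directives, but the values are only examples for the CPU-bound case:

    # Illustrative nginx.conf tuning for the advice above (example values):
    worker_processes  4;              # match the CPU count for CPU-bound loads
    events {
        use epoll;                    # the event mechanism discussed in this article
        worker_connections  1024;     # per-worker connection limit
    }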

6.3 Nginx Architecture

The Nginx code consists of a core and a series of modules. The core mainly provides the basic functions of the Web server, along with Web and mail reverse proxying; it also enables the network protocols, creates the necessary runtime environment, and ensures smooth interaction between the different modules. However, most protocol-related functions, and functions specific to a given application, are implemented by Nginx's modules. These modules fall roughly into several categories: event modules, phase handlers, output filters, variable handlers, protocols, upstreams, and load balancers, which together make up Nginx's HTTP functionality. The event modules provide OS-independent event notification mechanisms (different operating systems have different event mechanisms) such as kqueue or epoll. The protocol modules are responsible for establishing sessions with the corresponding clients over HTTP, TLS/SSL, SMTP, POP3, and IMAP. Within Nginx, a request is processed through a pipeline, or chain, of modules; in other words, every function or operation is implemented by a module: for example, compression, communicating with an upstream server through the FastCGI or uwsgi protocol, or establishing a session with memcached.

6.4 Nginx basic functions
    • Serving static files, index files, and automatic indexing;
    • Reverse proxy acceleration (without caching), simple load balancing and fault tolerance;
    • FastCGI, with simple load balancing and fault tolerance;
    • Modular structure. Filters include gzipping, byte ranges, chunked responses, and the SSI filter; in the SSI filter, multiple sub-requests to the same proxy or FastCGI backend are processed concurrently;
    • SSL and TLS SNI support.

6.5 Nginx IMAP/POP3 proxy functions
    • Redirecting users to the IMAP/POP3 backend using an external HTTP authentication server;
    • Authenticating users via an external HTTP authentication server, then redirecting the connection to an internal SMTP backend;
    • Authentication methods:
        • POP3: USER/PASS, APOP, AUTH LOGIN/PLAIN/CRAM-MD5;
        • IMAP: LOGIN;
        • SMTP: AUTH LOGIN/PLAIN/CRAM-MD5;
    • SSL support;
    • STARTTLS and STLS support in IMAP and POP3 modes.

6.6 Operating systems supported by Nginx
    • FreeBSD 3.x, 4.x, 5.x, 6.x i386; FreeBSD 5.x, 6.x amd64;
    • Linux 2.2, 2.4, 2.6 i386; Linux 2.6 amd64;
    • Solaris 8 i386; Solaris 9 i386 and sun4u; Solaris 10 i386;
    • Mac OS X (10.4) PPC;
    • Windows builds support the Windows series of operating systems.

6.7 Nginx structure and extensions
    • One master process and multiple worker processes; the worker processes run as an unprivileged user;
    • Support for kqueue (FreeBSD 4.1+), epoll (Linux 2.6+), RT signals (Linux 2.2.19+), /dev/poll (Solaris 7 11/99+), select, and poll;
    • Support for kqueue's distinct features, including EV_CLEAR, EV_DISABLE (to temporarily disable an event), NOTE_LOWAT, EV_EOF, available-data counts, and error codes;
    • Support for sendfile (FreeBSD 3.1+), sendfile (Linux 2.2+), sendfile64 (Linux 2.4.21+), and sendfilev (Solaris 8 7/01+) (see the sketch after this list);
    • Input filtering (FreeBSD 4.1+) and TCP_DEFER_ACCEPT (Linux 2.4+) support;
    • 10,000 inactive HTTP keep-alive connections take only about 2.5 MB of memory;
    • Minimal data-copy operations.
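
The sendfile support listed above is also where the "minimal data copy" item comes from: the kernel moves file pages straight to the socket, with no round trip through user space. A sketch using the Linux signature (error handling trimmed):

    /* sendfile(): kernel-to-kernel transfer from a file to a socket,
     * skipping the copy into a user-space buffer entirely. */
    #include <fcntl.h>
    #include <sys/sendfile.h>
    #include <sys/stat.h>
    #include <unistd.h>

    void send_file_zero_copy(int cfd, const char *path) {
        int fd = open(path, O_RDONLY);
        struct stat st;
        fstat(fd, &st);
        off_t off = 0;
        while (off < st.st_size)   /* the kernel advances off as it sends */
            if (sendfile(cfd, fd, &off, (size_t)(st.st_size - off)) <= 0)
                break;
        close(fd);
    }
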
6.8 Other Nginx HTTP features
    • IP-based and name-based virtual hosting;
    • GET interface to memcached;
    • Support for keep-alive and pipelined connections;
    • Flexible and simple configuration;
    • Reconfiguration and online upgrades without interrupting client processing;
    • Customizable access logs, buffered log writes, and fast log rotation;
    • 4xx-5xx error code redirection;
    • A rewrite module based on PCRE;
    • Access control based on client IP address and HTTP Basic authentication;
    • PUT, DELETE, and MKCOL methods;
    • Support for FLV (Flash video);
    • Bandwidth limiting.
6.9 Why choose Nginx
      • Under high connection concurrency, Nginx is a good alternative to the Apache server: Nginx is one of the software platforms often chosen by virtual-hosting providers in the United States. It can support up to 50,000 concurrent connections, thanks to choosing epoll and kqueue as its event models.
      • Nginx as a load-balancing server: Nginx can both directly serve internal Rails and PHP applications and serve externally as an HTTP proxy server. Nginx is written in C, and both its system resource overhead and its CPU efficiency are much better than Perlbal's.
      • As a mail proxy: Nginx is also a very good mail proxy server (one of the earliest purposes for which this product was developed was as a mail proxy server); Last.fm has described its successful and impressive experience with it.
      • Nginx is a server that is very simple to install, with a very concise configuration file (which can even support Perl syntax) and very few bugs: Nginx is particularly easy to start, and can run almost without interruption; it does not need to be restarted even after running for several months. You can also upgrade the software version without interrupting service.
      • Nginx was born mainly to solve the C10K problem.

