Inside NGINX: How We Designed for Performance and Scale
NGINX leads in web performance, and that lead comes from the way the software is designed. Whereas many web servers and application servers use a simple threaded or process-based architecture, NGINX is built on a sophisticated event-driven architecture that enables it to scale to thousands of concurrent connections on modern hardware.
The Inside NGINX infographic drills down from the high-level process architecture to illustrate how NGINX handles multiple connections within a single process. This blog post explains in further detail how it all works.
Setting the Scene: The NGINX Process Model
To better understand this design, you need to understand how NGINX runs. NGINX has a master process (which performs privileged operations, such as reading the configuration and binding to ports) and a number of worker and helper processes.
# service nginx restart
 * Restarting nginx
# ps -ef --forest | grep nginx
root     32475     1  0 13:36 ?        00:00:00 nginx: master process /usr/sbin/nginx \
                                                -c /etc/nginx/nginx.conf
nginx    32476 32475  0 13:36 ?        00:00:00  \_ nginx: worker process
nginx    32477 32475  0 13:36 ?        00:00:00  \_ nginx: worker process
nginx    32479 32475  0 13:36 ?        00:00:00  \_ nginx: worker process
nginx    32480 32475  0 13:36 ?        00:00:00  \_ nginx: worker process
nginx    32481 32475  0 13:36 ?        00:00:00  \_ nginx: cache manager process
nginx    32482 32475  0 13:36 ?        00:00:00  \_ nginx: cache loader process
On this quad-core server, the NGINX master process creates four worker processes and two cache helper processes that manage the on-disk content cache.
Why is architecture important?
The fundamental building block of any Unix application is the thread or process. (From the perspective of the Linux operating system, threads and processes are mostly identical; the main difference is the degree to which they share memory.) A thread or process is a self-contained set of instructions that the operating system can schedule to run on a CPU core. Most complex applications run multiple threads or processes in parallel for two reasons:
- They can use more compute cores at the same time.
- Threads and processes make it easy to do operations in parallel (for example, to handle multiple connections at the same time).
Processes and threads consume resources. Each one uses memory and other operating system resources, and each has to be swapped on and off the CPU cores (an operation called a context switch). Most modern servers can handle hundreds of small, active threads or processes simultaneously, but performance degrades seriously once memory is exhausted or when high I/O load causes a large volume of context switches.
A common way to design network applications is to assign a thread or process to each connection. This architecture is simple and easy to implement, but it does not scale when the application needs to handle thousands of simultaneous connections.
How does NGINX work?
NGINX uses a predictable process model that is tuned to the available hardware resources:
- The master process performs the privileged operations, such as reading the configuration and binding to ports, and then creates a small number of child processes (the three types below).
- The cache loader process runs at startup to load the disk-based cache into memory, and then exits. It is scheduled conservatively, so its resource demands are low.
- The cache manager process runs periodically and prunes entries from the disk cache to keep it within the configured size.
- The worker processes do all of the work! They handle network connections, read and write content to disk, and communicate with upstream servers.
In most cases, we recommend running one worker process per CPU core to make the most efficient use of hardware resources. You configure this by including the worker_processes auto directive in the configuration:
worker_processes auto;
When an NGINX server is active, only the worker processes are busy. Each worker process handles multiple connections in a non-blocking fashion, which reduces the number of context switches.
Each worker process is single-threaded and runs independently, grabbing new connections and processing them. The processes can communicate through shared memory to share cache data, session persistence data, and other shared resources. (In NGINX 1.7.11 and later, there is also an optional thread pool to which worker processes can offload blocking operations. For more details, see "Thread Pools in NGINX Boost Performance 9x!". For NGINX Plus users, this feature is planned for addition to R7 later this year.)
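The master-and-workers arrangement can be sketched in a few lines of Python. This is an illustration only, not NGINX's implementation: it assumes a Unix system where os.fork is available, and spawn_workers and serve_one are hypothetical names. The parent plays the role of the master (it owns the listening socket), and each forked child inherits that socket and accepts connections on its own.

```python
import os
import socket


def serve_one(listener: socket.socket) -> None:
    """Toy worker body (hypothetical): accept one connection, report our PID."""
    conn, _addr = listener.accept()
    with conn:
        conn.sendall(str(os.getpid()).encode("ascii"))


def spawn_workers(listener: socket.socket, count: int, worker_main=serve_one) -> list:
    """Fork `count` children that all share the already-bound listening socket."""
    pids = []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            # Child process: act as a worker, then exit without returning
            # into the parent's code path.
            try:
                worker_main(listener)
            finally:
                os._exit(0)
        pids.append(pid)  # Parent ("master") only records the worker PIDs.
    return pids
```

In the spirit of worker_processes auto, count would usually be the number of CPU cores, for example os.cpu_count() (which can return None on some platforms, so a fallback is needed).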
Inside the NGINX Worker Process
Each NGINX worker process is initialized with the NGINX configuration and is provided with a set of listen sockets by the master process.
The NGINX worker processes begin by waiting for events on the listen sockets (see also accept_mutex and kernel socket sharding). Events are initiated by new incoming connections. These connections are assigned to a state machine: the HTTP state machine is the most commonly used, but NGINX also implements state machines for stream (raw TCP) traffic and for several mail protocols (SMTP, IMAP, and POP3).
A state machine is essentially the set of instructions that tell NGINX how to process a request. Most web servers that perform the same functions as NGINX use a similar state machine; the difference lies in the implementation.
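To make the idea of a per-connection state machine concrete, here is a deliberately tiny sketch in Python. It is not NGINX's HTTP state machine; the class HttpConnection, its three states, and the feed method are hypothetical names chosen for illustration. The point it shows is that the machine advances only as far as the bytes received so far allow, then returns control instead of waiting.

```python
from enum import Enum, auto


class State(Enum):
    READ_REQUEST_LINE = auto()
    READ_HEADERS = auto()
    RESPOND = auto()


class HttpConnection:
    """Hypothetical, heavily simplified request-parsing state machine."""

    def __init__(self) -> None:
        self.state = State.READ_REQUEST_LINE
        self.buffer = b""
        self.request_line = None
        self.headers = {}

    def feed(self, data: bytes) -> None:
        """Consume newly arrived bytes and advance as far as possible."""
        self.buffer += data
        while b"\r\n" in self.buffer and self.state is not State.RESPOND:
            line, self.buffer = self.buffer.split(b"\r\n", 1)
            if self.state is State.READ_REQUEST_LINE:
                self.request_line = line.decode("ascii")
                self.state = State.READ_HEADERS
            elif line == b"":
                # A blank line ends the header block.
                self.state = State.RESPOND
            else:
                name, _, value = line.partition(b":")
                key = name.strip().lower().decode("ascii")
                self.headers[key] = value.strip().decode("ascii")
```

Because feed never waits for more input, one thread can hold many such objects and advance whichever one has new data.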
Scheduling the State Machine
Think of the state machine as the rules of chess. Each HTTP transaction is a chess game. On one side of the board is the web server, a grandmaster who can make decisions very quickly. On the other side is the remote client, a web browser that is accessing the site or application over a relatively slow network.
However, the rules of the game can be very complicated. For example, the web server might need to communicate with other parties (proxying to an upstream application) or talk to an authentication server. Third-party modules in the web server can even extend the rules of the game.
A Blocking State Machine
Recall our description of a process or thread as a self-contained set of instructions that the operating system can schedule to run on a CPU core. Most web servers and web applications use a process-per-connection or thread-per-connection model to play this "chess game". Each process or thread contains the instructions to play one game through to the end. While the server runs the process, it spends most of its time "blocked", waiting for the client to complete its next move.
- The web server process listens for new connections (new "games" initiated by clients) on the listen sockets.
- When it gets a new game, it plays that game, blocking after each move to wait for the client's response.
- Once the game is complete, the web server process might wait to see whether the client wants to start a new game (this corresponds to a keepalive connection). If the connection is closed (the client goes away or a timeout occurs), the web server process returns to listening for new games.
The important point to remember is that every active HTTP connection (every chess game) requires a dedicated process or thread (a chess player). This architecture is simple and easy to extend with third-party modules ("new rules"). However, there is a huge imbalance: the rather lightweight HTTP connection, represented by a file descriptor and a small amount of memory, maps to a separate thread or process, which is a very heavyweight operating system object. It is a programming convenience, but it is very wasteful.
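The blocking model described above can be sketched as a thread-per-connection server in Python. This is an illustration under simplifying assumptions, not code from any real web server: the "game" is reduced to echoing bytes back, and handle_connection and serve_thread_per_connection are hypothetical names.

```python
import socket
import threading


def handle_connection(conn: socket.socket) -> None:
    """Play one "game": this thread blocks in recv() between client moves."""
    with conn:
        while True:
            data = conn.recv(1024)   # Blocks until the client sends something.
            if not data:             # Empty read: the client closed the connection.
                break
            conn.sendall(data)       # Our "move": echo the bytes back.


def serve_thread_per_connection(listener: socket.socket) -> None:
    """Accept loop that dedicates a whole thread to every connection."""
    while True:
        try:
            conn, _addr = listener.accept()
        except OSError:              # The listener was closed; stop accepting.
            break
        threading.Thread(
            target=handle_connection, args=(conn,), daemon=True
        ).start()
```

With N idle clients connected, this design holds N threads that do nothing but wait, which is the imbalance the paragraph above describes.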
NGINX Is a True Grandmaster
Perhaps you have heard of simultaneous exhibition games, in which one chess grandmaster plays many opponents at the same time.
Kiril Georgiev played 360 people simultaneously in Sofia, Bulgaria. His final score was 284 wins, 70 draws, and 6 losses.
That is how an NGINX worker process plays "chess". Each worker (remember, there is usually one worker per CPU core) is a grandmaster that can play hundreds (in fact, hundreds of thousands) of games simultaneously.
- The worker waits for events on the listen and connection sockets.
- Events occur on the sockets and the worker handles them:
  - An event on a listen socket means that a client has started a new chess game. The worker creates a new connection socket.
  - An event on a connection socket means that the client has made a new move. The worker responds promptly.
A worker never blocks on network traffic while waiting for its "opponent" (the client) to respond. When it has made its move, the worker immediately proceeds to other games where moves are waiting to be processed, or welcomes new players at the door.
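The worker loop just described can be sketched with Python's standard selectors module. This is a minimal illustration rather than NGINX's own C event loop: run_worker and the stop callable are hypothetical names, the "move" is a simple echo, and a real server would also buffer partial writes instead of calling send once.

```python
import selectors
import socket


def run_worker(listener: socket.socket, stop) -> None:
    """Single-threaded event loop: one thread, many concurrent 'games'."""
    sel = selectors.DefaultSelector()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, data="listen")
    while not stop():
        # Wait (briefly) for events on the listen and connection sockets.
        for key, _mask in sel.select(timeout=0.05):
            if key.data == "listen":
                # Event on the listen socket: a client started a new game.
                conn, _addr = listener.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, data="connection")
            else:
                # Event on a connection socket: the client made a move.
                conn = key.fileobj
                data = conn.recv(1024)
                if data:
                    conn.send(data)        # Respond, then move on immediately.
                else:
                    sel.unregister(conn)   # Client left; the game is over.
                    conn.close()
    sel.close()
```

The loop never waits on any single client: after each response it returns to select and picks up whichever socket is ready next.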
Why Is This Faster than a Blocking, Multi-Process Architecture?
NGINX scales very well to support hundreds of thousands of connections per worker process. Each new connection creates another file descriptor and consumes a small amount of additional memory in the worker process. There is very little additional overhead per connection. NGINX processes can remain pinned to CPUs. Context switches are relatively infrequent and generally occur only when there is no work to be done.
In the blocking, connection-per-process approach, each connection requires a large amount of additional resources and overhead, and context switches (swapping from one process to another) are very frequent.
For a more detailed explanation, see this article on NGINX architecture, written by Andrew Alexeev, vice president of development and co-founder at NGINX.
With appropriate system tuning, NGINX can scale to handle thousands of concurrent HTTP connections per worker process, and can absorb traffic spikes (a large influx of new "games") without missing a beat.
Updating Configuration and Upgrading NGINX
NGINX's process architecture, with its small number of worker processes, makes it efficient to update the configuration and even the NGINX binary itself.
Updating the NGINX configuration is a simple, lightweight, and reliable operation. It typically just means running the nginx -s reload command, which checks the configuration on disk and sends the master process a SIGHUP signal.
When the master process receives a SIGHUP signal, it does two things:
- It reloads the configuration and forks a new set of worker processes. These new worker processes immediately begin accepting connections and processing traffic (using the new configuration settings).
- It signals the old worker processes to exit gracefully. The old workers stop accepting new connections. As soon as each current HTTP request completes, the worker process cleanly shuts down the connection (that is, there are no lingering keepalive connections). Once all of its connections are closed, the worker process exits.
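The two steps above can be modelled with a small in-memory simulation in Python. It involves no real processes or signals and is not how NGINX is implemented; Master, Worker, on_sighup, and reap are hypothetical names used only to show how a new generation of workers overlaps with an old generation that is still draining.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    config: str
    accepting: bool = True      # Still taking new connections?
    active_requests: int = 0    # Requests currently in flight.

    @property
    def can_exit(self) -> bool:
        """A draining worker may exit once its last request has finished."""
        return not self.accepting and self.active_requests == 0


@dataclass
class Master:
    config: str
    worker_count: int
    workers: list = field(default_factory=list)

    def __post_init__(self) -> None:
        self.workers = [Worker(self.config) for _ in range(self.worker_count)]

    def on_sighup(self, new_config: str) -> None:
        old_workers = [w for w in self.workers if w.accepting]
        self.config = new_config
        # Step 1: start a fresh generation that uses the new configuration.
        self.workers += [Worker(new_config) for _ in range(self.worker_count)]
        # Step 2: ask the previous generation to stop accepting and drain.
        for worker in old_workers:
            worker.accepting = False

    def reap(self) -> None:
        """Drop workers that have finished draining."""
        self.workers = [w for w in self.workers if not w.can_exit]
```

For example, with two workers of which one still has a request in flight, a reload followed by reap leaves three workers alive: the two new ones and the one old worker that is still finishing its request.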
This reload process can cause a small spike in CPU and memory usage, but it is generally imperceptible compared to the resource load from active connections. You can reload the configuration multiple times per second (and many NGINX users do exactly that). In rare cases, issues arise when many generations of worker processes are waiting for connections to close, but even those are quickly resolved.
NGINX's binary upgrade process achieves the holy grail of high availability: you can upgrade the software on the fly, without any dropped connections, downtime, or interruption in service.
The binary upgrade process is similar in approach to the graceful reload of configuration. A new NGINX master process starts running in parallel with the original master process, and the two share the listening sockets. Both processes are active, and their respective worker processes handle traffic. You can then signal the old master process and its workers to exit gracefully.
The entire process is described in more detail in the NGINX documentation on controlling NGINX.
Conclusion
The Inside NGINX infographic provides a high-level overview of how NGINX functions, but behind this simple explanation is more than ten years of innovation and optimization. That work enables NGINX to deliver the best possible performance on a wide range of hardware while maintaining the security and reliability that modern web applications require.
If you want to read