Nginx and Node.js are natural companions for building high-throughput web applications. Both are designed around an event-driven model and can easily break through the C10K bottleneck that limits traditional web servers such as Apache. Their default configuration already delivers high concurrency, but there is still work to do if you want to squeeze the most requests per second out of cheap hardware.

This article assumes that you use Nginx's HttpProxyModule as a reverse proxy in front of upstream Node.js servers. It covers sysctl tuning on Ubuntu 10.04 and later, along with tuning of the Node.js application and of Nginx itself. Debian users can reach the same goals, but the steps differ.

Network tuning

Without first understanding the underlying transport mechanisms of Nginx and Node.js and optimizing them accordingly, any fine-tuning of the two applications may prove futile. Normally, Nginx connects the client and the upstream application over TCP sockets. The system imposes many thresholds and limits on TCP, set through kernel parameters. Their default values are chosen for general-purpose use and do not fit the high-traffic, short-lived connection pattern of a web server.

Listed below are some of the parameters available for tuning TCP. To make them take effect, put them in /etc/sysctl.conf, or in a new configuration file such as /etc/sysctl.d/99-tuning.conf, and run sysctl -p to have the kernel load them. We use sysctl-cookbook to handle this chore. The values listed here are safe to use, but we still recommend studying what each parameter means, so that you can choose a value better suited to your load, hardware, and usage.

```
net.ipv4.ip_local_port_range='1024 65000'
net.ipv4.tcp_tw_reuse='1'
net.ipv4.tcp_fin_timeout='15'
net.core.netdev_max_backlog='4096'
net.core.rmem_max='16777216'
net.core.somaxconn='4096'
net.core.wmem_max='16777216'
net.ipv4.tcp_max_syn_backlog='20480'
net.ipv4.tcp_max_tw_buckets='400000'
net.ipv4.tcp_no_metrics_save='1'
net.ipv4.tcp_rmem='4096 87380 16777216'
net.ipv4.tcp_syn_retries='2'
net.ipv4.tcp_synack_retries='2'
net.ipv4.tcp_wmem='4096 65536 16777216'
vm.min_free_kbytes='65536'
```

A few of the important ones:

net.ipv4.ip_local_port_range — To serve a downstream client on behalf of an upstream application, Nginx must open two TCP connections: one to the client and one to the application. When the server receives many connections, the system's available ports are quickly exhausted. Widening net.ipv4.ip_local_port_range increases the range of usable ports. If you find the error "possible SYN flooding on port 80. Sending cookies." in /var/log/syslog, the system could not find an available port; increasing this range helps reduce the error.

net.ipv4.tcp_tw_reuse — When the server has to cycle through a large number of TCP connections, many of them pile up in the TIME_WAIT state, which means the connection is closed but its resources have not yet been released. Setting net.ipv4.tcp_tw_reuse to 1 lets the kernel reuse such connections when it is safe to do so, which is far cheaper than establishing brand-new ones.

net.ipv4.tcp_fin_timeout — The minimum time a connection must spend in the TIME_WAIT state before it can be recycled. Lowering it speeds up reclamation.

To check the current connection states, use netstat:

```
netstat -tan | awk '{print $6}' | sort | uniq -c
```

or ss:

```
ss -s
```

Nginx

As the load on the web server gradually increased, we began to hit some strange limits in Nginx. Connections were being dropped, and the kernel kept reporting SYN floods.
Yet the load average and CPU usage were tiny, and the server could clearly handle more connections, which was frustrating. On investigation, we found a huge number of connections in the TIME_WAIT state. This was the output from one of the servers:

```
ss -s

Total: 388 (kernel 541)
TCP:   47461 (estab 311, closed 47135, orphaned 4, synrecv 0, timewait 47135/0), ports 33938

Transport Total     IP        IPv6
*         541       -         -
RAW       0         0         0
UDP       13        10        3
TCP       326       325       1
INET      339       335       4
FRAG      0         0         0
```

47,135 connections in TIME_WAIT! And, as ss shows, they are all closed connections. That means the server had burned through most of its available ports, and it implies that the server was allocating a brand-new port for every connection. Tuning the network helped, but ports were still running out.

After some more digging, I found the documentation on the keepalive directive for upstream connections, which reads: sets the maximum number of idle keepalive connections to upstream servers that are preserved in the cache of each worker process.

Interesting. In theory, this minimizes connection waste by passing requests over cached connections. The documentation also notes that proxy_http_version should be set to "1.1" and the "Connection" header cleared. Further research confirmed that this is a good idea, because HTTP/1.1 uses TCP connections far more efficiently than HTTP/1.0, and Nginx speaks HTTP/1.0 to upstreams by default.

After making the changes the documentation recommends, our upstream configuration looks like this:

```
upstream backend_nodejs {
  server nodejs-3:5016 max_fails=0 fail_timeout=10s;
  server nodejs-4:5016 max_fails=0 fail_timeout=10s;
  server nodejs-5:5016 max_fails=0 fail_timeout=10s;
  server nodejs-6:5016 max_fails=0 fail_timeout=10s;
  keepalive 512;
}
```

I also modified the proxy settings in the server section as it recommends.
In addition, proxy_next_upstream was added to skip servers that are down, the client-facing keepalive_timeout was adjusted, and the access log was turned off. The configuration became:

```
server {
  listen 80;

  client_max_body_size 16M;
  keepalive_timeout 10;

  location / {
    proxy_next_upstream error timeout http_500 http_502 http_503 http_504;
    proxy_set_header Connection "";
    proxy_http_version 1.1;
    proxy_pass http://backend_nodejs;
  }

  access_log off;
  error_log /dev/null crit;
}
```

With the new configuration, the number of sockets occupied by the servers dropped by 90%. Requests are now carried over far fewer connections. The new output:

```
ss -s

Total: 558 (kernel 604)
TCP:   4675 (estab 485, closed 4183, orphaned 0, synrecv 0, timewait 4183/0), ports 2768

Transport Total     IP        IPv6
*         604       -         -
RAW       0         0         0
UDP       13        10        3
TCP       492       491       1
INET      505       501       4
```

Node.js

Thanks to its event-driven design with asynchronous I/O, Node.js handles a large number of connections and requests out of the box. There are other tuning options, but this article will focus on the Node.js processes themselves.

Node is single-threaded and does not use multiple cores automatically, so by itself an application cannot exploit the full capacity of the server.

Clustering Node processes. We can modify the application so that it forks multiple processes that all accept connections on the same port, spreading the load across multiple cores. Node's cluster module provides all the tools needed to achieve this, but wiring them into an application still takes real effort. If you are using express, eBay's cluster2 module is worth a look.

Preventing context switching. When running multiple processes, make sure each CPU core is occupied by only one process at a time. As a general rule, on a CPU with N cores we should spawn N-1 application processes.
This way each process is guaranteed a reasonable time slice, and the remaining core is left for the kernel scheduler to run other tasks. We should also make sure the server runs essentially nothing other than Node.js, so that processes do not fight over the CPUs.

We once made the mistake of deploying two Node.js applications on the same server, and each of them spawned N-1 processes. They ended up fighting over the CPUs, and the system load shot up. Even though our servers are all 8-core machines, the performance overhead of the context switching was clearly noticeable. Context switching is when the CPU suspends the current task in order to run another: on each switch, the kernel must save all the state of the current process and then load and execute a different one. To fix this, we reduced the number of processes each application spawned so that they shared the CPUs fairly, and the system load dropped:

[Graph: note how the system load (blue line) falls below the number of CPU cores (red line).]

We saw the same improvement on the other servers. Since the total workload remained the same, the performance gain can only be attributed to fewer context switches.