Building an Nginx environment for one million QPS

Tags: sendfile, tomcat, server, git clone, nginx, load balancing

I. Background

Recently my company has been working on some IoT products. Device communication uses the MQTT protocol, while internal authentication, device relationships, and other business logic will be implemented over HTTP. My leader asked for a local test simulating one million users online at the same time. The product may never reach that number, but since it was requested, it has to be simulated. For MQTT we use Erlang's Emqtt, and a colleague has already tested one million online users on a single machine. The HTTP side, however, proved much harder.

So this post describes how to build a local HTTP setup that supports one million QPS. Simply put, one million MQTT users online means supporting one million TCP connections; once connected, only heartbeat packets need to flow, which is relatively easy and usually just requires tuning a few Linux system parameters. One million "online" HTTP users are a different matter: an HTTP client opens a connection, sends a request, and then closes the connection, so "millions of concurrent connections" is the wrong concept here. The common reference metric is the number of requests handled per second, i.e. QPS. (Many people confuse these two, including my leader; I will have to explain it to him slowly.)
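The distinction between online connections and QPS can be made concrete with a back-of-the-envelope calculation. The user count and request interval below are illustrative assumptions, not measurements from this test:

```shell
# Hypothetical sizing: 1,000,000 online users who each issue one HTTP
# request every 10 seconds produce a sustained request rate of:
users=1000000
interval_s=10
echo "$((users / interval_s)) QPS"    # prints "100000 QPS"
```

So one million MQTT connections and one million HTTP QPS are very different loads: the former is about holding sockets open, the latter about request throughput.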

II. Preparation

I have never done much high-concurrency network programming; after reading some blog posts recently, I decided to use the simplest possible model to attempt this million-QPS simulation. Our backend is written in Java on a Tomcat server. With real business logic on each request, a single machine certainly cannot reach one million QPS (1M QPS); the 1M-QPS environments described online all serve static pages. Since a single machine with business logic cannot meet the requirement, a cluster is needed. From what I have learned, the basic cluster structure is the one shown in the diagram below.

A relatively powerful machine runs Nginx as the load balancer, with the ordinary machines behind it. Only one database is used for now; further optimization and scaling are outside the scope of this discussion.

With this architecture, once the database is taken out of the picture, the performance bottleneck of the whole HTTP stack is the Nginx load balancer at the front. The HTTP/JSP/Tomcat machines behind it can be scaled horizontally by adding machines if their performance is insufficient, but the Nginx in front cannot. So the goal of this test is to see whether Nginx can sustain 1M QPS on a single machine, with Nginx doing nothing but forwarding.

Hardware: a server with 88 CPU cores and 128 GB of RAM

Software: Debian 8, Nginx 1.11, and the wrk load-testing tool

III. Setup

1. First, set some Linux system parameters by adding the following to /etc/sysctl.conf:

vm.swappiness = 0
net.ipv4.neigh.default.gc_stale_time = 120
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.default.arp_announce = 2
net.ipv4.conf.all.arp_announce = 2
net.ipv4.tcp_max_tw_buckets = 100
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_max_syn_backlog = 3240000
net.ipv4.tcp_window_scaling = 1
#net.ipv4.tcp_keepalive_time =
net.ipv4.tcp_synack_retries = 2
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
net.ipv4.conf.lo.arp_announce = 2
fs.file-max = 40000500
fs.nr_open = 40000500
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_keepalive_time = 1
net.ipv4.tcp_keepalive_intvl =
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_fin_timeout = 5
net.ipv4.tcp_mem = 768432 2097152 15242880
net.ipv4.tcp_rmem = 4096 4096 33554432
net.ipv4.tcp_wmem = 4096 4096 33554432
net.core.somaxconn = 6553600
net.ipv4.ip_local_port_range = 2048 64500
net.core.wmem_default = 183888608
net.core.rmem_default = 183888608
net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.core.netdev_max_backlog = 2621244
kernel.sem = 250 65536 2048
kernel.shmmni = 655360
kernel.shmmax = 34359738368
kernel.shmall = 4194304
kernel.msgmni = 65535
kernel.msgmax = 65536
kernel.msgmnb = 65536
net.netfilter.nf_conntrack_max = 1000000
net.nf_conntrack_max = 1000000
net.ipv4.netfilter.ip_conntrack_max = 1000000
kernel.perf_cpu_time_max_percent = 60
kernel.perf_event_max_sample_rate = 6250
net.ipv4.tcp_max_orphans = 1048576
kernel.sched_migration_cost_ns = 5000000
net.core.optmem_max = 25165824
kernel.sem = 10000 2560000 10000 256


Then run ulimit -n 20000500 on the command line.
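To apply the settings without a reboot, something like the following should work. This is a sketch that assumes the values above were saved to /etc/sysctl.conf and that the commands are run as root:

```shell
# Reload /etc/sysctl.conf so the kernel picks up the new values
sysctl -p

# Raise the per-process open-file limit for the current shell session
ulimit -n 20000500

# To persist the file-descriptor limit across logins, equivalent entries
# in /etc/security/limits.conf would be:
#   * soft nofile 20000500
#   * hard nofile 20000500
```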

I will not explain each of these parameters here; if you want the details, look them up online.

2. Nginx installation and configuration

Nginx can be installed with a simple apt-get or built from source; there are no special requirements. The configuration below, however, does matter. Edit $NGINX/conf/nginx.conf (or /etc/nginx/conf/nginx.conf):

worker_processes  ;  # set this according to how many CPU cores the hardware has
pid        logs/nginx.pid;

events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;
    sendfile      on;
    tcp_nopush    on;

    keepalive_timeout  ;

    gzip        off;
    access_log  off;   # logging must be disabled

    server {
        listen       888 backlog=168888;
        server_name  localhost;
        root         /dev/shm/;
    }
}
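After editing, the configuration can be validated and loaded. A minimal sketch, assuming the nginx binary is on PATH and the commands are run as root:

```shell
nginx -t          # parse the configuration and report any syntax errors
nginx             # start the server
nginx -s reload   # re-read the configuration after further edits
```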


This is the simplest possible configuration. Nginx has many more tunables; for load-balancing configuration, see my other post, "nginx + Tomcat static/dynamic separation for load balancing".

worker_processes 4;

error_log /var/log/nginx/error.log info;

pid /var/run/nginx.pid;

events {
    use epoll;
    worker_connections 409600;
    multi_accept on;
    accept_mutex off;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    open_file_cache max=200000 inactive=200s;
    open_file_cache_valid 300s;
    open_file_cache_min_uses 2;
    keepalive_timeout 5;
    keepalive_requests 20000000;
    client_header_timeout 5;
    client_body_timeout 5;
    reset_timedout_connection on;
    send_timeout 5;

    # logging
    access_log off;
    #access_log /var/log/nginx/access.log;
    #error_log /var/log/nginx/error.log;

    # gzip compression
    gzip off;
    #gzip_min_length 1k;   # minimum 1K
    #gzip_buffers 64k;
    #gzip_http_version 1.1;
    #gzip_comp_level 6;
    #gzip_types text/plain application/x-javascript text/css application/xml application/javascript;
    #gzip_vary on;

    # load-balancing groups
    # static server group
    #upstream static.zh-jieli.com {
    #    server 127.0.0.1:808 weight=1;
    #}

    # dynamic server group
    upstream zh-jieli.com {
        server 127.0.0.1:8080;
    }

    # proxy parameters
    #proxy_redirect off;
    #proxy_set_header Host $host;
    #proxy_set_header X-Real-IP $remote_addr;
    #proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    #client_max_body_size 10m;
    #client_body_buffer_size 128k;
    #proxy_connect_timeout 65;
    #proxy_send_timeout 65;
    #proxy_read_timeout 65;
    #proxy_buffer_size 4k;
    #proxy_buffers 4 32k;
    #proxy_busy_buffers_size 64k;

    # cache configuration
    #proxy_cache_key '$host:$server_port$request_uri';
    #proxy_temp_file_write_size 64k;
    ##proxy_temp_path /dev/shm/jielierp/proxy_temp_path;
    ##proxy_cache_path /dev/shm/jielierp/proxy_cache_path levels=1:2 keys_zone=cache_one:200m inactive=5d max_size=1g;
    #proxy_ignore_headers X-Accel-Expires Expires Cache-Control Set-Cookie;

    server {
        listen backlog=163840;
        server_name test2;
        root /dev/shm/;
    }

    server {
        listen 443 ssl;
        server_name test;
        location / {
            index index;
        }
        location ~ .*$ {
            index index;
            proxy_pass http://zh-jieli.com;
        }
        ssl on;
        ssl_certificate keys/client.pem;
        ssl_certificate_key keys/client.key.unsecure;
    }
}

3. Create a simple HTML file

Recall the root /dev/shm/ directive above; that is the HTTP server's document root. Create a test page with: echo "a" > /dev/shm/a.html

4. Install wrk

git clone https://github.com/wg/wrk.git, then run make in the cloned directory.

IV. Running the test

Start Nginx

Run wrk to generate concurrent requests:

./wrk -t88 -c10000 -d20s "http://127.0.0.1:888/a.html"

(-t88: 88 threads, one per core; -c10000: 10,000 open connections; -d20s: run for 20 seconds.)

(Screenshot: top output during the test)

(Screenshot: wrk results)

That is roughly 1.2 million QPS. With some additional Nginx tuning, and if wrk is not run on the same machine, 1.5 million QPS should be no problem.

V. Things to watch when chasing 1M QPS

The Linux system parameters must be adjusted (pitfall: a small one, since virtually every article on high-concurrency Linux mentions modifying these parameters).

access_log off; in Nginx: logging must be disabled (pitfall: a big one. Nginx writes its log to disk, and my server has an ordinary disk, so logging becomes a disk I/O bottleneck. Start from the simplest configuration and then adapt it to your hardware; some settings that run fast on other machines were very slow on mine).

For load generation, it is best to use wrk as above (pitfall: I first tested with Apache ab, which could not drive high enough load; in the end I switched to wrk).

It is best to run wrk on the same machine (pitfall: the wrk figures above show about 262 MB of data transferred per second, far beyond what an ordinary 100 Mbit NIC can carry. Even though my a.html is a single byte, the HTTP headers add a lot of overhead. A related small pitfall: my server has a gigabit NIC and so does the other test machine, but the switch, router, or cabling in between was not up to the task, so the network itself became the bottleneck).
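The NIC bottleneck is easy to verify with quick arithmetic: 262 MB/s of traffic is well beyond what even a gigabit link can carry.

```shell
# Convert the observed wrk throughput to megabits per second
mb_per_s=262
mbit_per_s=$((mb_per_s * 8))
echo "${mbit_per_s} Mbit/s"    # prints "2096 Mbit/s", i.e. ~2.1 Gbit/s
# A gigabit NIC tops out at 1000 Mbit/s, so the network, not Nginx,
# becomes the limit when the load generator runs on another machine.
```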

The above are the pits I ran into, especially the last two; at one point I even suspected Nginx itself had a performance problem. Many benchmarks online claim Nginx can proxy a million QPS, yet my tests kept landing around 200k~300k QPS. It finally turned out to be the test tool and the NIC.

VI. A few final words

Setting up the whole test environment is actually quite simple; the most important part of this post is the list of pitfalls above. Since I had never done this before and had no experience, I am recording them here as a reminder to myself. With the load balancer at the front no longer a problem, the next issue is the HTTP service cluster behind it. Currently an ordinary Java server (Tomcat) of mine handles only about 10,000 simple requests per second (because of the JVM, the web framework, and so on); an API that does a simple database query manages only 2~3k QPS; and a server performing transactional operations supports only 120~150 QPS. The first can be scaled horizontally by adding machines, but the latter two involve the database and are much more troublesome; all sorts of new terms (NoSQL, caching, read/write splitting, sharding, indexing, SQL optimization, etc.) come into play. My ability is limited; there is a lot left to learn.

---- Update, 2016-11-22 19:27:18 ----

A lot of configuration was unnecessary in the single-machine test, because all requests and responses were handled in memory; no real socket traffic crossed a NIC, router, or switch. Over the last two days of testing I found that the concurrency through the Nginx proxy would not rise, and after a day of investigation the problem turned out to be that Nginx's keepalive parameter for the upstream was not enabled. Without it, every proxied request creates a new HTTP socket to the backend and closes it after processing, which wastes resources and caps the connection count. After enabling keepalive, many concurrent external requests can be multiplexed over a single upstream socket. (This description is a bit rough; if you understand how connections work in HTTP/1.0, HTTP/1.1, and HTTP/2, it is easy to follow.)

The cluster's upstream is configured as:

# keepalive here sets the number of idle connections kept open
# between this machine and the backend machines
# other settings are possible too, such as weights and balancing methods
upstream wunaozai.cnblogs.com {
    server 192.168.25.106:888;
    server 192.168.25.100:888;
    server 192.168.9.201:888;
    keepalive 1000;
}

The proxying server block is configured as:

server {
    listen backlog=168888;
    server_name localhost2;
    location ~ .*$ {
        index index;
        proxy_pass http://wunaozai.cnblogs.com;
        proxy_set_header Connection "keep-alive";
        proxy_http_version 1.1;
        proxy_ignore_client_abort on;
        # set these timeouts according to your own business requirements
        proxy_connect_timeout;
        proxy_read_timeout;
        proxy_send_timeout;
    }
}


Resources:

http://datacratic.com/site/blog/1m-qps-nginx-and-ubuntu-1204-ec2

http://serverfault.com/questions/408546/how-to-achieve-500k-requests-per-second-on-my-webserver

https://lowlatencyweb.wordpress.com/2012/03/20/500000-requestssec-modern-http-servers-are-fast/

http://blog.jobbole.com/87531/

https://github.com/wg/wrk

Author: Xiao Ming  
Source: http://www.cnblogs.com/wunaozai/p/6073731.html
Copyright is held jointly by the author and the Blog Garden (cnblogs). Reprinting is welcome without the author's prior consent, provided this statement is retained and a clearly visible link to the original article is given on the page; otherwise the author reserves the right to pursue legal liability.
If there are any errors in this article, please point them out, so that fewer readers are misled.
