Original article: http://rdc.taobao.com/blog/cs?p=1062
The article introduces many TCP tuning parameters and is well worth reading.
For a typical server we usually care about the QPS it can sustain, but for this kind of application what matters is the number of connections it can hold rather than QPS (although QPS is still one of the performance metrics to consider). This type of application is common in message push systems, also known as comet applications, such as chat rooms or instant-message push services. For more background on comet, see my earlier introduction. In such systems, messages are pushed to the client only when they are generated, so while no message is being generated the client connection has to be held open. When there are a large number of clients, this means holding a large number of connections, which we call persistent connections.
First, let's analyze the system resources such a service needs: CPU, network, and memory. To get the best performance out of the system we must first find its bottleneck. A persistent connection of this kind usually carries no data, so it can be regarded as an inactive connection. From the system's point of view, an inactive connection consumes no CPU or network resources; it only occupies memory. We therefore assume that as long as there is enough memory, the system can hold as many connections as we want. Is that actually true? If so, it is still a test of how well the kernel can manage such a large set of data structures.
To run the test we need a server and a very large number of clients, so both a server program and a client program are required. The idea is simple: the client creates a connection and sends a request to the server, and the server holds the connection open without returning any data.
1. Prepare the server
For the server side, given the assumptions above, we need a machine with a lot of memory on which to deploy the nginx comet application. Here is my server:
Summary: Dell R710, 2 x Xeon E5520 2.27 GHz, 23.5 GB / 24 GB 1333 MHz
System: Dell PowerEdge R710 (Dell 0VWN1R)
Processors: 2 x Xeon E5520 2.27 GHz, 5860 MHz FSB (16 cores)
Memory: 23.5 GB / 24 GB 1333 MHz (6 x 4 GB, 12 slots empty)
Disk controller: megaraid_sas0: Dell/LSILogic PERC 6/i, package 6.2.0-0013, FW 1.22.02-0612
Network: eth0 (bnx2): Broadcom NetXtreme II BCM5709 Gigabit Ethernet, 1000 Mb/s
OS: RHEL Server 5.4 (Tikanga), Linux 2.6.18-164.el5 x86_64, 64-bit
The server program is very simple: it is a comet module written on top of nginx. The module accepts the user's request and holds the connection without sending a response. The nginx status module can be used directly to monitor the current number of connections.
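To make "accept and hold" concrete, below is a minimal standalone sketch using epoll. It is not the actual nginx comet module, only an illustration of the same behavior; the listen port 8080 and the backlog of 2048 are assumptions for the example.

    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/epoll.h>
    #include <sys/socket.h>

    int main(void) {
        int listener = socket(AF_INET, SOCK_STREAM, 0);
        int one = 1;
        setsockopt(listener, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8080);       /* hypothetical listen port */
        bind(listener, (struct sockaddr *)&addr, sizeof(addr));
        listen(listener, 2048);            /* matches net.core.somaxconn below */

        int epfd = epoll_create(1024);
        struct epoll_event ev, events[64];
        ev.events = EPOLLIN;
        ev.data.fd = listener;
        epoll_ctl(epfd, EPOLL_CTL_ADD, listener, &ev);

        long held = 0;
        for (;;) {
            int n = epoll_wait(epfd, events, 64, -1);
            for (int i = 0; i < n; i++) {
                if (events[i].data.fd == listener) {
                    /* Accept the connection, then simply keep it open:
                       nothing is ever written back, so the client waits. */
                    int c = accept(listener, NULL, NULL);
                    if (c >= 0)
                        printf("held connections: %ld\n", ++held);
                }
            }
        }
        return 0;
    }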
The server also needs the following system parameters adjusted in /etc/sysctl.conf:
net.core.somaxconn = 2048
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 4096 16777216
net.ipv4.tcp_wmem = 4096 4096 16777216
net.ipv4.tcp_mem = 786432 2097152 3145728
net.ipv4.tcp_max_syn_backlog = 16384
net.core.netdev_max_backlog = 20000
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_orphans = 131072

Then apply the settings:
/sbin/sysctl -p
The items we mainly care about here are the following:
net.ipv4.tcp_rmem configures the read buffer size and takes three values: the first is the minimum, the third is the maximum, and the middle one is the default. The read buffer size can also be changed from inside the program, but it cannot fall outside the minimum and maximum. To keep the memory used by each socket as small as possible, I set the default to 4096.
net.ipv4.tcp_wmem configures the write buffer size in the same way.
The sizes of the read and write buffers directly determine how much kernel memory each socket occupies.
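As an illustration of adjusting the buffers from inside a program, the sketch below sets the per-socket buffer sizes with setsockopt; the 8192-byte figures are only example values, not the ones used in this test.

    #include <stdio.h>
    #include <sys/socket.h>

    int main(void) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        int rcv = 8192, snd = 8192;   /* example sizes, not the test's values */

        /* The kernel clamps these requests to the configured limits and
           internally doubles the value to cover bookkeeping overhead. */
        if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcv, sizeof(rcv)) < 0)
            perror("SO_RCVBUF");
        if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &snd, sizeof(snd)) < 0)
            perror("SO_SNDBUF");

        int got; socklen_t len = sizeof(got);
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len);
        printf("effective receive buffer: %d bytes\n", got);
        return 0;
    }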
net.ipv4.tcp_mem configures the overall memory available to TCP. Its unit is pages, not bytes (for example, with 4 KB pages the third value above, 3145728 pages, corresponds to 12 GB). When usage exceeds the second value, TCP enters memory-pressure mode and tries to stabilize its memory usage; it leaves pressure mode once usage drops below the first value. When usage exceeds the third value, TCP refuses to allocate new sockets, and dmesg fills with messages like "TCP: too many of orphaned sockets".
In addition, net.ipv4.tcp_max_orphans also needs to be set. This value limits how many sockets the system may hold that do not belong to any process. It matters when we need to establish a large number of connections quickly: once the number of orphaned sockets exceeds this value, dmesg again shows "too many of orphaned sockets".
The server also needs to be able to open a very large number of file descriptors, for example 2 million. However, raising the maximum file descriptor limit that high runs into a problem, which is explained in detail later.
2. Prepare the client
Because we need to build a very large number of clients, we have to remember that the local ports a system can use to connect to a given service are limited. A port is a 16-bit integer, so it can only range from 0 to 65535, and 0 to 1023 are reserved, leaving 1024 to 65534, i.e. 64511 usable ports. In other words, one machine can create only a little over 60 thousand persistent connections to one server, so to reach our 2 million connections we need roughly 34 clients.
Of course, we could also use virtual IP addresses to get that many clients: each virtual IP can bind its own 60-thousand-odd ports, so 34 virtual IP addresses would be enough. In my case I applied for company resources and used physical machines instead.
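To show what the virtual-IP approach looks like in practice, here is a sketch of binding the client socket to a specific local address before connecting, so that each address gets its own port space. This is not the actual test client; all addresses and the port are made-up examples.

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>

    /* Bind to one of the (hypothetical) virtual client IPs, then connect. */
    int connect_from(const char *local_ip, const char *server_ip, int server_port) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        struct sockaddr_in local;
        memset(&local, 0, sizeof(local));
        local.sin_family = AF_INET;
        local.sin_port = 0;                         /* let the kernel pick a port */
        inet_pton(AF_INET, local_ip, &local.sin_addr);
        bind(fd, (struct sockaddr *)&local, sizeof(local));

        struct sockaddr_in peer;
        memset(&peer, 0, sizeof(peer));
        peer.sin_family = AF_INET;
        peer.sin_port = htons(server_port);
        inet_pton(AF_INET, server_ip, &peer.sin_addr);

        if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0)
            perror("connect");
        return fd;
    }

    int main(void) {
        /* 192.168.1.101 stands in for one virtual client IP (hypothetical). */
        connect_from("192.168.1.101", "192.168.1.1", 8080);
        return 0;
    }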
With the default system parameters, the range of automatically allocated ports is limited to 32768-61000, so we also need to change /etc/sysctl.conf on the client:
net.ipv4.ip_local_port_range = 1024 65535

Then apply it:
/sbin/sysctl -p
The client program is a test tool written on top of libevent that keeps creating new connection requests.
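The actual test program is not shown in the original post; below is a minimal sketch of such a libevent client, assuming the server listens on 192.168.1.1:8080 and the comet request path is /comet (both are placeholders). Each connection sends one request and is then simply left open, since the server never replies.

    #include <event2/bufferevent.h>
    #include <event2/event.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>

    #define NCONN 60000   /* roughly one machine's worth of local ports */

    static void event_cb(struct bufferevent *bev, short events, void *ctx) {
        /* The server holds the connection without answering, so a healthy
           connection just stays open; free it only on error or EOF. */
        if (events & (BEV_EVENT_ERROR | BEV_EVENT_EOF))
            bufferevent_free(bev);
    }

    int main(void) {
        struct event_base *base = event_base_new();

        struct sockaddr_in sin;
        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_port = htons(8080);                        /* hypothetical port */
        inet_pton(AF_INET, "192.168.1.1", &sin.sin_addr);  /* hypothetical server */

        const char *req = "GET /comet HTTP/1.1\r\nHost: 192.168.1.1\r\n\r\n";

        for (int i = 0; i < NCONN; i++) {
            struct bufferevent *bev =
                bufferevent_socket_new(base, -1, BEV_OPT_CLOSE_ON_FREE);
            bufferevent_setcb(bev, NULL, NULL, event_cb, NULL);
            bufferevent_socket_connect(bev, (struct sockaddr *)&sin, sizeof(sin));
            bufferevent_write(bev, req, strlen(req));
        }

        event_base_dispatch(base);   /* hold all connections open */
        return 0;
    }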
3. Because both the client and the server need to create a large number of sockets, we have to raise the maximum number of file descriptors.
The client needs to create more than 60 thousand sockets, so setting the maximum to 100,000 is enough. Add the following to /etc/security/limits.conf:
admin soft nofile 100000
admin hard nofile 100000
The server needs to hold 2 million connections, so I wanted to set nofile to 2,000,000 — and that is where the problem appears.
With nofile set to 2 million, the system could not be logged into at all. After several attempts, the maximum that worked was 1 million. Checking the kernel source revealed why: before 2.6.25 there is a macro that hard-codes the upper limit of this value to 1024*1024, which is exactly 1 million; from kernel 2.6.25 on, the limit can be set through /proc/sys/fs/nr_open. So I upgraded the kernel to 2.6.32. For a detailed discussion of ulimit, see the blog post "Old talk: ulimit Problem and Its Impact": http://blog.yufeng.info/archives/1380
After upgrading the kernel, we continue to raise the limit:
sudo bash -c 'echo 2000000 > /proc/sys/fs/nr_open'
Now nofile can be set in /etc/security/limits.conf:
admin soft nofile 2000000
admin hard nofile 2000000
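As a small sanity check (not one of the original test tools), a process can confirm that the new limits are actually in effect by reading and raising RLIMIT_NOFILE:

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void) {
        struct rlimit rl;
        getrlimit(RLIMIT_NOFILE, &rl);
        printf("soft: %lu, hard: %lu\n",
               (unsigned long)rl.rlim_cur, (unsigned long)rl.rlim_max);

        rl.rlim_cur = rl.rlim_max;      /* raise the soft limit to the hard limit */
        if (setrlimit(RLIMIT_NOFILE, &rl) < 0)
            perror("setrlimit");
        else
            printf("soft limit raised to %lu\n", (unsigned long)rl.rlim_cur);
        return 0;
    }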
4. Finally, during the test, the sysctl settings on the server were adjusted again and again according to the messages dmesg printed, and in the end the test reached 2 million persistent connections.
To minimize memory usage, I changed nginx's request_pool_size from the default 4k to 1k, and set the default values of net.ipv4.tcp_wmem and net.ipv4.tcp_rmem to 4k as well.
Data from nginx monitoring at 2 million connections:
System memory usage at 2 million connections:
5. Finally, when configuring nginx for production, request_pool_size needs to be adjusted according to the actual workload, and the default values of net.ipv4.tcp_rmem and net.ipv4.tcp_wmem also need to be tuned accordingly.