Improve socket performance on Linux: accelerate network applications

Last Update: 2017-06-19 Source: Internet

Author: User




When you develop a sockets application, the first task is usually to make it reliable and meet its specific requirements. With the four tips in this article, you can design and develop your sockets program for best performance right from the start. The article covers use of the Sockets API, two socket options that improve performance, and GNU/Linux tuning.

Follow these tips to develop applications with superior performance:

Minimize packet transmit latency.
Minimize system call overhead.
Adjust the TCP window for the Bandwidth Delay Product.
Dynamically tune the GNU/Linux TCP/IP stack.

　　Tip 1. Minimize packet transmit latency

When you communicate through a TCP socket, the data is split into blocks so that each block fits within the TCP payload (the payload of a TCP packet) for the given connection. The size of the TCP payload depends on several factors (such as the maximum packet size along the path), but these factors are known when the connection is initiated. To achieve the best performance, the goal is to fill each packet with as much of the available data as possible. When there is not enough data to fill the payload (also known as the maximum segment size, or MSS), TCP uses the Nagle algorithm to automatically coalesce small buffers into a single segment. Doing so improves the efficiency of the application by minimizing the number of packets sent, and it reduces overall network congestion.

Although John Nagle's algorithm minimizes the number of packets sent by combining data into larger packets, sometimes you want to send only a small packet. A simple example is the telnet program, which lets a user interact with a remote system, typically through a shell. If the user had to fill a whole segment with typed characters before the packet was sent, the session would be unusable.

Another example is the HTTP protocol. Typically, a client browser makes a small request (an HTTP request message), and the Web server returns a much larger response (the Web page).

　　Solution

The first thing to consider is that the Nagle algorithm fulfills a need. Because the algorithm coalesces data to try to fill a complete TCP segment, it introduces some latency. In exchange, it minimizes the number of packets sent on the wire and so reduces network congestion.

In cases where you need to minimize transmit latency, the Sockets API provides a solution. To disable the Nagle algorithm, set the TCP_NODELAY socket option, as shown in Listing 1.

　　Listing 1. Disabling the Nagle algorithm for a TCP socket

/* Needs <stdio.h>, <stdlib.h>, <sys/socket.h>, <netinet/in.h>, <netinet/tcp.h> */
int sock, flag, ret;

/* Create new stream socket */
sock = socket(AF_INET, SOCK_STREAM, 0);

/* Disable the Nagle (TCP No Delay) algorithm */
flag = 1;
ret = setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, (char *)&flag, sizeof(flag));

if (ret == -1) {
    printf("Couldn't setsockopt(TCP_NODELAY)\n");
    exit(-1);
}

Tip: Experiments with Samba show that disabling the Nagle algorithm can almost double read performance when reading from a Samba drive on a Microsoft Windows server.
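As a complement to Listing 1, the following is a minimal sketch of the same option wrapped in two small helpers, one to set TCP_NODELAY and one to read it back with getsockopt. The function names disable_nagle and nagle_disabled are illustrative and do not come from the Sockets API.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable the Nagle algorithm on a TCP socket.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int disable_nagle(int sock)
{
    int flag = 1;
    return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
}

/* Report the current TCP_NODELAY state.
 * Returns 1 if Nagle is disabled, 0 if it is active, -1 on error. */
int nagle_disabled(int sock)
{
    int flag = 0;
    socklen_t len = sizeof(flag);

    if (getsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &flag, &len) == -1)
        return -1;
    return flag != 0;
}
```

A caller would typically create the socket, call disable_nagle(sock) before connecting, and check the return value. Reading the option back is optional, but it is a cheap way to confirm the setting took effect.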

　　Tip 2. Minimize system call overhead

When you read or write data through a socket, you are using a system call. This call (such as read or write) crosses the boundary between the user-space application and the kernel. In addition, before entering the kernel, your call goes through the C library into a common function in the kernel (system_call()). From system_call(), the call enters the filesystem layer, where the kernel determines which type of device it is dealing with. Finally, the call reaches the sockets layer, where data is read or queued for transmission on the socket (which involves a data copy).

This process shows that a system call does not simply move between the application and the kernel; it passes through many layers within each. The process is expensive, so the more calls you make, the more time you spend working through this call chain and the lower your application's performance.

Since we cannot avoid these system calls, the only choice is to minimize the number of times these calls are used. Fortunately, we can control this process.

　　Solution

When writing data to a socket, try to write all of the data in one call instead of performing multiple writes. For reads, pass in the largest buffer you can support, because the kernel will try to fill the entire buffer if enough data is available (which also helps keep TCP's advertised window open). In this way, you minimize the number of calls and achieve better overall performance.
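One way to apply this advice when a message is built from separate pieces is the POSIX writev() call, which hands several buffers to the kernel in a single system call. The sketch below assumes a message made of a header and a body held in C strings; the helper names send_message and read_available are hypothetical.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send a header and a body with one writev() call instead of two
 * write() calls. Returns the number of bytes written, or -1 on error.
 * A short count is possible; a production caller would retry the rest. */
ssize_t send_message(int fd, const char *header, const char *body)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)header;
    iov[0].iov_len  = strlen(header);
    iov[1].iov_base = (void *)body;
    iov[1].iov_len  = strlen(body);

    return writev(fd, iov, 2);
}

/* Read with the largest buffer the caller can offer, so that one
 * read() call drains as much pending data as possible.
 * Returns the number of bytes read, 0 at end of stream, -1 on error. */
ssize_t read_available(int fd, char *buf, size_t capacity)
{
    return read(fd, buf, capacity);
}
```

The same functions work on any file descriptor, so they can be exercised with a pipe or a socketpair before being used on a TCP connection.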

　　Tip 3. Adjust the TCP window for the Bandwidth Delay Product

TCP performance depends on several factors. The two most important are the link bandwidth (the rate at which packets can be transmitted on the network) and the round-trip time, or RTT (the delay between sending a segment and receiving its acknowledgment from the peer). Together, these two values determine what is called the Bandwidth Delay Product (BDP).

Given the link bandwidth and the RTT, you can calculate the BDP. But what does it tell you? The BDP gives a simple way to calculate the theoretically optimal TCP socket buffer sizes (the buffers that hold data queued for transmission and received data waiting for the application). If the buffers are too small, the TCP window cannot open fully, which limits performance. If they are too large, valuable memory is wasted. If you set the buffers just right, you can fully utilize the available bandwidth. The formula is:

BDP = link_bandwidth * RTT

If your application communicates over a 100 Mbps LAN with an RTT of 50 ms, the BDP is:

100 Mbps * 0.050 sec / 8 = 0.625 MB = 625 KB

Note: Dividing by 8 converts the result from bits to bytes.

So, set your TCP window to the BDP, or 625 KB. However, the default TCP window on Linux 2.6 is 110 KB, which limits the bandwidth of the connection to 2.2 MBps, calculated as follows:

throughput = window_size / RTT
110 KB / 0.050 = 2.2 MBps

If you instead use the window size calculated above, you get 12.5 MBps, calculated as follows:

625 KB / 0.050 = 12.5 MBps
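The arithmetic above can be written as two small C helpers. This is only a sketch: the function names are illustrative, the RTT is taken in whole milliseconds and must be greater than zero, and KB and MB are treated as decimal units to match the figures in the text.

```c
#include <stdint.h>

/* Bandwidth delay product in bytes.
 * bandwidth_bps is in bits per second, rtt_ms in milliseconds.
 * Dividing by 1000 converts ms to seconds; dividing by 8 converts
 * bits to bytes. */
uint64_t bdp_bytes(uint64_t bandwidth_bps, uint64_t rtt_ms)
{
    return bandwidth_bps * rtt_ms / 1000 / 8;
}

/* Upper bound on throughput, in bytes per second, that a window of
 * window_bytes allows over a path with the given RTT (rtt_ms > 0). */
uint64_t max_throughput_bytes_per_sec(uint64_t window_bytes, uint64_t rtt_ms)
{
    return window_bytes * 1000 / rtt_ms;
}
```

With the example values, bdp_bytes(100000000, 50) gives 625000 bytes, max_throughput_bytes_per_sec(110000, 50) gives 2200000 bytes per second, and max_throughput_bytes_per_sec(625000, 50) gives 12500000 bytes per second.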

The difference is indeed great, and it can provide a larger throughput for the socket. So now you know how to calculate the optimal buffer size for your socket. But how can we change it?

　　Solution

The Sockets API provides several socket options, two of which change the sizes of the socket send and receive buffers. Listing 2 shows how to adjust the send and receive buffer sizes with the SO_SNDBUF and SO_RCVBUF options.

Note: Although the size of the socket buffer determines the size of the advertised TCP window, TCP also maintains a congestion window within the advertised window. Because of this congestion window, a given socket may never make use of the largest advertised window.

　　Listing 2. Manually setting the send and receive socket buffer sizes

/* Needs <sys/socket.h>; BDP stands for the value calculated above, in bytes */
int ret, sock, sock_buf_size;

sock = socket(AF_INET, SOCK_STREAM, 0);

sock_buf_size = BDP;

ret = setsockopt(sock, SOL_SOCKET, SO_SNDBUF,
                 (char *)&sock_buf_size, sizeof(sock_buf_size));

ret = setsockopt(sock, SOL_SOCKET, SO_RCVBUF,
                 (char *)&sock_buf_size, sizeof(sock_buf_size));

In the Linux 2.6 kernel, the size of the send buffer is taken as specified by the caller, but the receive buffer is doubled automatically. You can use the getsockopt call to verify the size of each buffer.
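A minimal sketch of that verification step follows. It requests a buffer size and immediately reads back what the kernel reports. The helper name set_socket_buffer is hypothetical, and the value returned depends on the kernel version and on limits such as rmem_max and wmem_max, so the sketch makes no assumption about the exact number.

```c
#include <sys/socket.h>

/* Request a buffer size for SO_SNDBUF or SO_RCVBUF (passed as optname)
 * and return the size the kernel reports afterwards, or -1 on error.
 * The reported size may differ from the request because the kernel
 * can adjust or clamp it. */
int set_socket_buffer(int sock, int optname, int requested)
{
    int actual = 0;
    socklen_t len = sizeof(actual);

    if (setsockopt(sock, SOL_SOCKET, optname, &requested, sizeof(requested)) == -1)
        return -1;
    if (getsockopt(sock, SOL_SOCKET, optname, &actual, &len) == -1)
        return -1;
    return actual;
}
```

Comparing the returned value with the requested one is a quick way to see whether the system limits need to be raised before a BDP-sized buffer can be used.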

A word on window scaling: TCP originally supported a maximum window of 64 KB (the window size is defined by a 16-bit value). With the window scaling extension (RFC 1323), you can use a 32-bit value to represent the window size. The TCP/IP stack provided in GNU/Linux supports this option (and many others).
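To make the relationship between the 16-bit window field and larger windows concrete, the sketch below computes the smallest scale shift that lets the 16-bit field cover a desired window. It relies on the RFC 1323 rule that the advertised value is shifted left by a count of at most 14; the function name window_scale_shift is illustrative.

```c
#include <stdint.h>

/* Smallest window scale shift (0 to 14, the RFC 1323 limit) for which
 * the largest 16-bit window value, shifted left by that amount, is at
 * least window_bytes. Returns -1 if a shift of 14 is still too small. */
int window_scale_shift(uint64_t window_bytes)
{
    for (int shift = 0; shift <= 14; shift++) {
        if (((uint64_t)65535 << shift) >= window_bytes)
            return shift;
    }
    return -1;
}
```

For the 625 KB window from the BDP example, a shift of 3 reaches only 524,280 bytes, so the function returns 4, which allows windows up to 1,048,560 bytes.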

Tip: The Linux kernel can also tune these socket buffers automatically (see tcp_rmem and tcp_wmem in Table 1 below), but those settings affect the entire stack. If you need to adjust the window for only one connection or one type of connection, this mechanism may not meet your needs.

　　Tip 4. Dynamically tune the GNU/Linux TCP/IP stack

A standard GNU/Linux distribution tries to optimize for a wide range of deployments. This means that a standard distribution may not be specially tuned for your environment.

　　Solution

GNU/Linux provides a wide range of tunable kernel parameters that you can use to dynamically tailor the operating system to your specific use. The following looks at some of the more important options that affect socket performance.

The tunable kernel parameters live in the /proc virtual filesystem. Each file in this filesystem represents one or more parameters, which you can read with the cat utility or modify with the echo command. Listing 3 shows how to query and enable a tunable parameter (in this case, enabling IP forwarding within the TCP/IP stack).

　　Listing 3. Tuning: enabling IP forwarding within the TCP/IP stack

[root@camus]# cat /proc/sys/net/ipv4/ip_forward
0
[root@camus]# echo "1" > /proc/sys/net/ipv4/ip_forward
[root@camus]# cat /proc/sys/net/ipv4/ip_forward
1
[root@camus]#
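The same query and update can be done from a program, because the tunables are ordinary files. The following sketch uses only standard C file I/O; the helper names read_tunable and write_tunable are hypothetical, and writing to a file under /proc/sys normally requires root privileges.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of a tunable (or any text file) into buf and
 * strip the trailing newline. Returns 0 on success, -1 on error. */
int read_tunable(const char *path, char *buf, size_t size)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    if (fgets(buf, (int)size, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Write a value to a tunable. Returns 0 on success, -1 on error. */
int write_tunable(const char *path, const char *value)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    int ok = (fputs(value, fp) != EOF);
    /* fclose() flushes the stream, so a rejected write surfaces here. */
    if (fclose(fp) == EOF)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, read_tunable("/proc/sys/net/ipv4/ip_forward", buf, sizeof(buf)) would leave "0" or "1" in buf on a system where that file exists.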

Table 1 provides several adjustable parameters that can help you improve the performance of the Linux TCP/IP stack.

Table 1. Tunable kernel parameters for TCP/IP stack performance
Tunable parameter	Default value	Option description
`/proc/sys/net/core/rmem_default`	"110592"	Defines the default receive window size; for a larger BDP, this size should also be larger.
`/proc/sys/net/core/rmem_max`	"110592"	Defines the maximum receive window size; for a larger BDP, this size should also be larger.
`/proc/sys/net/core/wmem_default`	"110592"	Defines the default send window size; for a larger BDP, this size should also be larger.
`/proc/sys/net/core/wmem_max`	"110592"	Defines the maximum send window size; for a larger BDP, this size should also be larger.
`/proc/sys/net/ipv4/tcp_window_scaling`	"1"	Enables window scaling as defined in RFC 1323; must be enabled to support windows larger than 64 KB.
`/proc/sys/net/ipv4/tcp_sack`	"1"	Enables Selective Acknowledgment, which improves performance by selectively acknowledging packets received out of order (so that the sender retransmits only the missing segments); should be enabled for WAN communication, but it can increase CPU usage.
`/proc/sys/net/ipv4/tcp_fack`	"1"	Enables Forward Acknowledgment, which works together with Selective Acknowledgment (SACK) to reduce congestion; this option should also be enabled.
`/proc/sys/net/ipv4/tcp_timestamps`	"1"	Enables calculation of the RTT in a more accurate way (see RFC 1323); this option should be enabled for better performance.
`/proc/sys/net/ipv4/tcp_mem`	"24576 32768 49152"	Determines how the TCP stack behaves with respect to memory usage; each value is a count of memory pages (typically 4 KB). The first value is the low threshold for memory usage. The second value is the threshold at which memory pressure mode starts to apply pressure to buffer usage. The third value is the maximum threshold; at this level, packets can be dropped to reduce memory usage. For a larger BDP, increase these values (but remember that the unit is memory pages, not bytes).
`/proc/sys/net/ipv4/tcp_wmem`	"4096 16384 131072"	Defines per-socket memory usage for auto-tuning. The first value is the minimum number of bytes allocated for the socket's send buffer. The second value is the default (overridden by `wmem_default`), to which the buffer can grow when the system is not heavily loaded. The third value is the maximum number of bytes of send buffer space (overridden by `wmem_max`).
`/proc/sys/net/ipv4/tcp_rmem`	"4096 87380 174760"	Similar to `tcp_wmem`, but gives the receive buffer values used for auto-tuning.
`/proc/sys/net/ipv4/tcp_low_latency`	"0"	Lets the TCP/IP stack favor low latency over high throughput; this option should be disabled.
`/proc/sys/net/ipv4/tcp_westwood`	"0"	Enables a sender-side congestion control algorithm that maintains throughput estimates and tries to optimize overall bandwidth utilization; this option should be enabled for WAN communication.
`/proc/sys/net/ipv4/tcp_bic`	"1"	Enables Binary Increase Congestion control for fast long-distance networks; this allows better utilization of links operating at gigabit speeds; this option should be enabled for WAN communication.

As with any tuning effort, the best approach is persistent experimentation. The behavior of your application, the speed of your processor, and the amount of available memory all influence how these parameters affect performance. In some cases, a change you expect to help can hurt (and vice versa). So test each option individually and check the result of each one. In other words, trust your experience, but verify every change.

Tip: A note on persistent configuration: if you reboot a GNU/Linux system, any tunable kernel parameters you changed revert to their defaults. To make your settings persistent, use /etc/sysctl.conf to set the parameters to your values at boot time.

　　GNU/Linux tools

GNU/Linux is very attractive to me because of the number of tools available for it. Most of them are command-line tools, but they are both very useful and intuitive. GNU/Linux offers several tools (some ship with the distribution, and some are available as open source software) for debugging network applications, measuring bandwidth and throughput, and checking link usage.

Table 2 lists the most useful GNU/Linux tools and their purpose. Table 3 lists several useful tools that are not typically provided in a GNU/Linux distribution. For more information about the tools in Table 3, see references.

**Table 2. Tools available in any GNU/Linux distribution**
GNU/Linux tool	Purpose
`ping`	The most common tool for checking whether a host is available; it can also be used to determine the RTT for the bandwidth delay product calculation.
`traceroute`	Prints the path (route) to a network host through a series of routers and gateways, showing the latency at each hop.
`netstat`	Reports statistics about the networking subsystem, protocols, and connections.
`tcpdump`	Shows protocol-level packet traces for one or more connections. It also includes timing information, which you can use to study the packet timing of different protocol services.

**Table 3. Useful performance tools not typically provided in a GNU/Linux distribution**
GNU/Linux tool	Purpose
`netlog`	Provides network performance information to applications.
`nettimer`	Generates a metric for the bandwidth of the bottleneck link; it can be used for automatic protocol tuning.
`Ethereal`	Provides the features of `tcpdump` (packet tracing) in an easy-to-use graphical interface.
`iperf`	Measures network performance for both TCP and UDP; measures maximum bandwidth and reports delay and datagram loss.

　　 Conclusion

Experiment with the tips and techniques described in this article to improve the performance of your socket applications: disable the Nagle algorithm to reduce transmit latency, set the buffer sizes to improve the socket's bandwidth utilization, reduce system call overhead by minimizing the number of system calls, and tune the Linux TCP/IP stack with the tunable kernel parameters.

Always consider the nature of your application when tuning. For example, does your application communicate over a LAN or over the Internet? If it operates only within a LAN, increasing the socket buffer sizes may not bring much improvement, but enabling jumbo frames certainly will.

Finally, use tcpdump or Ethereal to check the results of your tuning. The changes you see at the packet level help confirm that tuning with these techniques was successful.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
