Improve socket performance on Linux
http://www.ibm.com/developerworks/cn/linux/l-hisock.html
When developing a socket application, the primary task is usually to ensure reliability and meet specific requirements. With the four tips given in this article, however, you can design and develop a socket program for optimal performance. This article covers the use of the Sockets API, two socket options that improve performance, and GNU/Linux tuning.
To develop applications with excellent performance, follow these tips:
- Minimize the delay of packet transmission.
- Minimize the load of system calls.
- Adjust the TCP window for the Bandwidth Delay Product.
- Dynamically tune the GNU/Linux TCP/IP stack.
Tip 1. Minimize the delay of packet transmission
When communicating over a TCP socket, data is split into chunks so that it fits within the given connection's TCP payload. The size of the TCP payload depends on several factors (such as the maximum packet size along the path), but these factors are known when the connection is initiated. For best performance, the goal is to fill each packet with as much available data as possible. When there is not enough data to fill the payload (also known as the maximum segment size, or MSS), TCP uses the Nagle algorithm to automatically coalesce small buffers into a single segment. Doing so can improve the efficiency of the application by minimizing the number of packets sent, and it relieves overall network congestion.
Although John Nagle's algorithm minimizes the number of packets sent by coalescing data into larger packets, sometimes you want to send only small packets. A simple example is the Telnet program, which lets a user interact with a remote system, usually through a shell. If the user were required to fill a segment with typed characters before the packet was sent, this behavior would definitely not meet our needs.
Another example is the HTTP protocol. Typically, a client browser produces a small request (an HTTP request message), and the Web server returns a larger response (the Web page).
Solution
The first thing to consider is whether the Nagle algorithm fills a need. Because the algorithm coalesces data in an attempt to form a full TCP segment, it introduces some delay. But it minimizes the number of packets sent on the wire and therefore minimizes network congestion.
In cases where transmit latency must be minimized, however, the Sockets API provides a solution. To disable the Nagle algorithm, set the TCP_NODELAY socket option, as shown in Listing 1.
Listing 1. Disabling the Nagle algorithm for a TCP socket
int sock, flag, ret;

/* Create a new stream (TCP) socket */
sock = socket( AF_INET, SOCK_STREAM, 0 );

/* Disable the Nagle (TCP No Delay) algorithm */
flag = 1;
ret = setsockopt( sock, IPPROTO_TCP, TCP_NODELAY,
                  (char *)&flag, sizeof(flag) );

if (ret == -1) {
  printf("Couldn't setsockopt(TCP_NODELAY)\n");
  exit(-1);
}
Tip: Experiments with Samba show that disabling the Nagle algorithm can almost double read performance when reading from a Samba drive on a Microsoft® Windows® server.
Tip 2. Minimize the load of system calls
Whenever you read from or write to a socket, you are using a system call. This call (such as read or write) crosses the boundary between the user-space application and the kernel. In addition, before reaching the kernel, your call passes through the C library to a common function in the kernel (system_call()). From system_call(), the call goes to the filesystem layer, where the kernel determines which type of device it is dealing with. Eventually, the call reaches the socket layer, where data is read or queued for transmission on the socket (involving a copy of the data).
This process shows that a system call operates not just in the application and the kernel but also through many layers within each. The process is resource-intensive, so the more calls you make, the more time is spent working through the call chain and the lower your application's performance.
Since we cannot avoid these system calls, the only option is to minimize the number of times we use them. Fortunately, that process is under our control.
Solution
When writing data to a socket, write all the data at once instead of performing multiple write operations. For reads, pass in the largest buffer you can support, because the kernel will try to fill the entire buffer if enough data is present (in addition to keeping TCP's advertised window open). In this way, you minimize the number of calls and achieve better overall performance, as the sketch below illustrates.
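As an illustration of the write side (this sketch is not from the original article; the helper name write_all is an assumption), aggregate outgoing data into one buffer and hand the kernel the whole thing, looping only to cover the partial-write case that write() on a socket is permitted to return:

#include <errno.h>
#include <unistd.h>

/* Illustrative helper: issue as few write() system calls as possible
 * by submitting the entire buffer at once; the loop exists only to
 * handle partial writes and interrupted calls. */
ssize_t write_all(int sock, const char *buf, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = write(sock, buf + sent, len - sent);
        if (n == -1) {
            if (errno == EINTR) continue;  /* interrupted; retry */
            return -1;                     /* real error */
        }
        sent += (size_t)n;
    }
    return (ssize_t)sent;
}

The same logic applies in reverse on the read side: one read() into a 64KB buffer is cheaper than sixteen read() calls into 4KB buffers.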
Tip 3. Adjust the TCP window for the Bandwidth Delay Product
TCP performance depends on several factors. The two most important are the link bandwidth (the rate at which packets can be transmitted on the network) and the round-trip time, or RTT (the delay between a segment being sent and an acknowledgment arriving from the peer). These two values determine what is called the Bandwidth Delay Product (BDP).
Given the link bandwidth and the RTT, you can calculate the BDP, but what does it mean? The BDP gives a simple way to calculate the theoretically optimal TCP socket buffer size (which holds both the data queued for transmission and the data waiting to be received by the application). If the buffer is too small, the TCP window cannot fully open, which limits performance. If the buffer is too large, valuable memory resources are wasted. If you set the buffer just right, you can fully exploit the available bandwidth. Let's look at an example:
BDP = link_bandwidth * RTT
If the application communicates over a 100Mbps LAN and its RTT is 50 ms, the BDP is:
100Mbps * 0.050 sec / 8 = 0.625MB = 625KB
Note: Dividing by 8 converts bits into the bytes used for communication.
Therefore, we can set the TCP window to the BDP, or 625KB. But the default TCP window size on Linux 2.6 is 110KB, which limits the connection's bandwidth to 2.2MBps, calculated as follows:
throughput = window_size / RTT
110KB / 0.050 = 2.2MBps
If instead we use the window size calculated above, we get a bandwidth of 12.5MBps, calculated as follows:
625KB / 0.050 = 12.5MBps
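To make this arithmetic concrete, a tiny helper along these lines (an illustrative sketch, not part of the original article) converts a link bandwidth in Mbps and an RTT in seconds into a buffer size in bytes:

/* Sketch: BDP in bytes = bandwidth (bits/sec) * RTT (sec) / 8 */
static long bdp_bytes(double bandwidth_mbps, double rtt_sec)
{
    return (long)(bandwidth_mbps * 1000000.0 * rtt_sec / 8.0);
}

/* bdp_bytes(100.0, 0.050) yields 625000, i.e. the 625KB figure above. */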
The difference between 2.2MBps and 12.5MBps is substantial, and the larger window provides far more throughput for the socket. You now know how to calculate the optimal socket buffer size, but how do you change it?
Solution
The Sockets API provides several socket options, two of which modify the sizes of the socket's send and receive buffers. Listing 2 shows how to use the SO_SNDBUF and SO_RCVBUF options to adjust those buffer sizes.
Note: Although the socket buffer size determines the size of the advertised TCP window, TCP also maintains a congestion window within that advertised window. Because of the congestion window, a given socket may never exploit the full advertised window.
Listing 2. Manually setting the send and receive socket buffer sizes
int ret, sock, sock_buf_size;

sock = socket( AF_INET, SOCK_STREAM, 0 );

/* Size both buffers to the Bandwidth Delay Product */
sock_buf_size = BDP;

ret = setsockopt( sock, SOL_SOCKET, SO_SNDBUF,
                  (char *)&sock_buf_size, sizeof(sock_buf_size) );
ret = setsockopt( sock, SOL_SOCKET, SO_RCVBUF,
                  (char *)&sock_buf_size, sizeof(sock_buf_size) );
In the Linux 2.6 kernel, the send buffer is sized as the caller requests, but the receive buffer is automatically doubled. You can make getsockopt calls to verify the size of each buffer.
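A short sketch of that verification (illustrative; the variable names are assumptions) might look like this:

int buf_size;
socklen_t opt_len = sizeof(buf_size);

/* Read back the receive-buffer size the kernel actually granted */
if (getsockopt(sock, SOL_SOCKET, SO_RCVBUF,
               (char *)&buf_size, &opt_len) == 0)
    printf("SO_RCVBUF is %d bytes\n", buf_size);

On Linux 2.6, the printed value reflects the kernel's doubling described above.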
Jumbo frames
You can also consider raising the packet size from 1,500 to 9,000 bytes (known as jumbo frames). On a local network, jumbo frames can be configured by setting the Maximum Transmission Unit (MTU), which can greatly improve performance.
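For example (the interface name eth0 is an assumption, and the NIC, driver, and switch must all support jumbo frames), the MTU can be raised with the standard ip tool:

# ip link set dev eth0 mtu 9000
# ip link show dev eth0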
As for window scaling, TCP originally supported windows of at most 64KB (a 16-bit value defined the window size). With the window scaling extension (RFC 1323), a 32-bit value can represent the window size. The TCP/IP stack provided in GNU/Linux supports this option (along with several others).
Tip: The Linux kernel also includes the ability to auto-tune these socket buffers (see tcp_rmem and tcp_wmem in Table 1 below), but those options affect the entire stack. If you need to adjust the window size only for a single connection or class of connections, this mechanism may not suffice.
Tip 4. Dynamically tune the GNU/Linux TCP/IP stack
A standard GNU/Linux distribution tries to optimize for a wide variety of deployment scenarios. This means that a standard distribution has probably not been specifically optimized for your environment.
Solution
GNU/Linux provides many tunable kernel parameters that you can use to configure the operating system dynamically for your own purposes. Let's look at some of the more important options that affect socket performance.
Tunable kernel parameters live in the /proc virtual filesystem. Each file in this filesystem represents one or more parameters, which can be read with the cat tool or modified with the echo command. Listing 3 shows how to query and enable a tunable parameter (in this case, enabling IP forwarding in the TCP/IP stack).
Listing 3. Tuning: Enabling IP Forwarding in the TCP/IP stack
# cat /proc/sys/net/ipv4/ip_forward
0
# echo "1" > /proc/sys/net/ipv4/ip_forward
# cat /proc/sys/net/ipv4/ip_forward
1
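The same parameters can also be reached through the standard sysctl tool (shown here as an equivalent alternative; the dotted names mirror the /proc paths):

# sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 0
# sysctl -w net.ipv4.ip_forward=1
net.ipv4.ip_forward = 1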
Table 1 shows a few tunable parameters that can help you improve the performance of the Linux TCP/IP stack.
Table 1. Tunable kernel parameters for TCP/IP stack performance

| Tunable parameter | Default value | Option description |
| --- | --- | --- |
| /proc/sys/net/core/rmem_default | "110592" | Defines the default receive window size; for a larger BDP, the size should be larger. |
| /proc/sys/net/core/rmem_max | "110592" | Defines the maximum receive window size; for a larger BDP, the size should be larger. |
| /proc/sys/net/core/wmem_default | "110592" | Defines the default send window size; for a larger BDP, the size should be larger. |
| /proc/sys/net/core/wmem_max | "110592" | Defines the maximum send window size; for a larger BDP, the size should be larger. |
| /proc/sys/net/ipv4/tcp_window_scaling | "1" | Enables window scaling as defined by RFC 1323; must be enabled to support windows larger than 64KB. |
| /proc/sys/net/ipv4/tcp_sack | "1" | Enables selective acknowledgment, which improves performance by selectively acknowledging packets received out of order (allowing the sender to retransmit only the missing segments); should be enabled for WAN traffic, but note that it increases CPU utilization. |
| /proc/sys/net/ipv4/tcp_fack | "1" | Enables Forward Acknowledgment, which operates with selective acknowledgment (SACK) to reduce congestion; should also be enabled. |
| /proc/sys/net/ipv4/tcp_timestamps | "1" | Enables a more accurate method of RTT calculation than retransmission timeouts (see RFC 1323); should be enabled for better performance. |
| /proc/sys/net/ipv4/tcp_mem | "24576 32768 49152" | Determines how the TCP stack behaves regarding memory usage; each value is in memory pages (typically 4KB). The first value is the low threshold for memory usage. The second value is the threshold at which memory-pressure mode begins to apply pressure to buffer usage. The third value is the maximum threshold, at which packets can be dropped to reduce memory usage. Increase these values for a larger BDP (but remember that the units are memory pages, not bytes). |
| /proc/sys/net/ipv4/tcp_wmem | "4096 16384 131072" | Defines the per-socket memory used for auto-tuning. The first value is the minimum number of bytes allocated for the socket's send buffer. The second value is the default (overridden by wmem_default), to which the buffer can grow under light system load. The third value is the maximum send-buffer space (overridden by wmem_max). |
| /proc/sys/net/ipv4/tcp_rmem | "4096 87380 174760" | Like tcp_wmem, except for the receive buffers used in auto-tuning. |
| /proc/sys/net/ipv4/tcp_low_latency | "0" | Allows the TCP/IP stack to prefer low latency over high throughput; should be disabled. |
| /proc/sys/net/ipv4/tcp_westwood | "0" | Enables a sender-side congestion-control algorithm that maintains a throughput estimate and tries to optimize the overall utilization of bandwidth; should be enabled for WAN traffic. |
| /proc/sys/net/ipv4/tcp_bic | "1" | Enables Binary Increase Congestion control for fast, long-distance networks, making better use of links operating at gigabit speeds; should be enabled for WAN traffic. |
As with any tuning effort, the best approach is to experiment. Your application's behavior, processor speed, and amount of available memory all affect how these parameters change performance. In some cases, what you think will be beneficial may turn out to be harmful (and vice versa). So test each option individually and then examine the result. In other words, trust your own experience, but verify every modification.
Tip: A note on permanent configuration. If you reboot a GNU/Linux system, any tunable kernel parameters you changed revert to their defaults. To make your values the defaults, use /etc/sysctl.conf to set the parameters to your values at boot time.
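For example, an /etc/sysctl.conf fragment for a large-BDP environment might look like the following (the specific values are illustrative assumptions, not recommendations from the article):

# Allow larger socket buffers (values in bytes)
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
# Keep RFC 1323 window scaling enabled
net.ipv4.tcp_window_scaling = 1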
GNU/Linux Tools
GNU/Linux is very attractive to me because of the wealth of tools available. Although most are command-line tools, they are remarkably useful and intuitive. GNU/Linux provides several tools (some native to GNU/Linux, some open source software) for debugging network applications, measuring bandwidth and throughput, and checking link usage.
Table 2 lists some of the most useful GNU/Linux tools and their purposes. Table 3 lists useful tools that are not included in GNU/Linux distributions. For more information about the tools in Table 3, see Resources.
Table 2. Tools found in any GNU/Linux distribution

| GNU/Linux tool | Purpose |
| --- | --- |
| ping | The most common tool for checking host availability; can also identify the RTT used for Bandwidth Delay Product calculations. |
| traceroute | Prints the path (route) of routers and gateways traversed to reach an end host, showing the latency of each hop. |
| netstat | Reports various statistics about the networking subsystem, protocols, and connections. |
| tcpdump | Shows protocol-level packet traces for one or more connections; includes timing information you can use to study the packet timing of different protocol services. |
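As one example tying these tools back to Tip 3, the RTT input to the BDP formula can be read straight from ping output (the host name is a placeholder, and the output is abbreviated; the avg figure is the RTT to plug into the formula):

$ ping -c 3 server.example.com
...
rtt min/avg/max/mdev = 49.8/50.1/50.6/0.3 ms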
Table 3. Useful performance tools not included in GNU/Linux distributions

| GNU/Linux tool | Purpose |
| --- | --- |
| netlog | Provides network performance information to applications. |
| nettimer | Generates a metric for bottleneck link bandwidth; can be used for automatic protocol tuning. |
| Ethereal | Provides the features of tcpdump (packet tracing) in an easy-to-use graphical interface. |
| iperf | Measures TCP and UDP network performance; measures maximum bandwidth and reports delay and datagram loss. |
Conclusion
Try the tips and techniques described in this article to improve the performance of your socket applications: reduce transmit latency by disabling the Nagle algorithm, improve socket bandwidth utilization by setting buffer sizes, reduce system call load by minimizing the number of system calls, and optimize the Linux TCP/IP stack with tunable kernel parameters.
Also consider the characteristics of your application when tuning. For example, is your application LAN-based, or does it communicate over the Internet? If it operates only within a LAN, increasing the socket buffer sizes may not yield much improvement, but enabling jumbo frames certainly will!
Finally, check your results with tcpdump or Ethereal after tuning. The changes visible at the packet level can help demonstrate the success of these optimization techniques.