How to improve socket performance on Linux



4 ways to accelerate network applications

Using the Sockets API, you can develop client and server applications that communicate over a local network or across the Internet. As with other APIs, the Sockets API can be used in ways that either help or hinder the performance of the resulting socket. This article explores four ways to use the Sockets API to squeeze the greatest performance out of an application, and how to tune the GNU/Linux environment to achieve the best results.


When you develop a socket application, the first priority is usually ensuring reliability and meeting specific requirements. With the four tips in this article, you can design and develop your socket program for optimal performance from the start. This article covers the use of the Sockets API, two socket options that improve performance, and GNU/Linux tuning.

To be able to develop applications with excellent performance, follow these tips:

Minimize the latency of packet transmission.

Minimize the load of system calls.

Adjust the TCP window for the Bandwidth Delay Product.

Dynamically tune the GNU/Linux TCP/IP stack.

Tip 1. Minimize the latency of packet transmission


When you communicate over a TCP socket, the data is broken into chunks so that it fits within the TCP payload of the packets for the given connection. The size of the TCP payload depends on several factors, such as the maximum packet size along the path, but these factors are known when the connection is established. To achieve the best performance, the goal is to fill each packet with as much available data as possible. When there is not enough data to fill a payload (also known as the maximum segment size, or MSS), TCP employs the Nagle algorithm to automatically coalesce small buffers into a single segment. Doing so can improve the efficiency of an application by minimizing the number of packets sent, and it also reduces overall network congestion.

Although John Nagle's algorithm minimizes the number of packets sent by coalescing data into larger packets, sometimes you want the option of sending smaller packets. A simple example is the telnet program, which lets a user interact with a remote system, typically through a shell. If the user were required to fill a segment with typed characters before the packet was sent, the program would be far from usable.

Another example is the HTTP protocol. Typically, the client browser sends a small request (the HTTP request message), and the Web server returns a much larger response (the Web page).

Solution


The first thing to consider is whether the Nagle algorithm actually fills a need. Because the algorithm coalesces data in an attempt to build full TCP segments, it introduces some delay. But in doing so, it minimizes the number of packets on the wire and therefore reduces network congestion.

But in cases where the transmission latency must be minimized, the Sockets API provides a solution. To disable the Nagle algorithm, set the TCP_NODELAY socket option, as shown in Listing 1.

Listing 1. Disabling the Nagle algorithm for a TCP socket


#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

int sock, flag, ret;

/* Create a new stream socket */
sock = socket(AF_INET, SOCK_STREAM, 0);

/* Disable the Nagle (TCP No Delay) algorithm */
flag = 1;
ret = setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, (char *)&flag, sizeof(flag));

if (ret == -1) {
    printf("Couldn't setsockopt(TCP_NODELAY)\n");
    exit(-1);
}

Tip: Experiments with Samba show that disabling the Nagle algorithm can almost double read performance when reading data from a Samba drive on a Microsoft Windows server.


Tip 2. Minimize the load of system calls


Whenever you read or write data through a socket, you are using a system call. This call (such as read or write) crosses the boundary between the user-space application and the kernel. Additionally, before reaching the kernel, your call goes through the C library to a common entry point in the kernel, system_call(). From system_call(), the call moves to the filesystem layer, where the kernel determines what type of device it is dealing with. Eventually, the call reaches the socket layer, where the data is read or queued for transmission on the socket (which involves a copy of the data).

This process shows that a system call operates not just in the application and the kernel, but through many layers within each. The process is resource intensive, so the more calls you make, the more time is spent working through this call chain and the lower your application's performance.

Since we cannot avoid these system calls, the only option is to minimize the number of times we use them. Fortunately, that process is under our control.

Solution


When writing data to a socket, write all of the data at once instead of performing multiple write operations. For reads, pass in the largest buffer you can support, because the kernel will try to fill the entire buffer if enough data is available (in addition to keeping TCP's advertised window open). In this way, you minimize the number of calls and achieve better overall performance.
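One concrete way to apply this tip on the write side (a sketch, not one of this article's original listings) is the POSIX writev() scatter/gather call, which hands several application buffers to the kernel in a single system call. The function below assumes a connected socket descriptor and three hypothetical buffers (header, body, trailer):

#include <stdio.h>
#include <sys/uio.h>

/* Gather three application buffers into one system call with writev(),
 * rather than paying the system-call cost three times with write(). */
void send_message(int sock,
                  const char *header, size_t header_len,
                  const char *body, size_t body_len,
                  const char *trailer, size_t trailer_len)
{
    struct iovec iov[3];

    iov[0].iov_base = (void *)header;  iov[0].iov_len = header_len;
    iov[1].iov_base = (void *)body;    iov[1].iov_len = body_len;
    iov[2].iov_base = (void *)trailer; iov[2].iov_len = trailer_len;

    if (writev(sock, iov, 3) == -1)
        perror("writev");
}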


Tip 3. Adjust the TCP window for the Bandwidth Delay Product


TCP performance depends on several factors. The two most important are the link bandwidth (the rate at which packets can be transmitted on the network) and the round-trip time, or RTT (the delay between a segment being sent and the acknowledgment arriving from the peer). These two values determine what is called the Bandwidth Delay Product (BDP).

Given the link bandwidth and the RTT, you can calculate the BDP, but what does it mean? The BDP gives a simple way to calculate the theoretical optimal TCP socket buffer size (which holds both the data queued for transmission and the data waiting to be received by the application). If the buffer is too small, the TCP window can never fully open, which limits performance. If the buffer is too large, precious memory resources are wasted. If you set the buffer just right, you can fully utilize the available bandwidth. Let's look at an example:

BDP = link_bandwidth * RTT

If the application communicates over a 100Mbps local area network with a 50 ms RTT, the BDP is:

100Mbps * 0.050 sec / 8 = 0.625MB = 625KB

Note: The division by 8 converts the bits into the bytes used for communication.

So we could set the TCP window to the BDP, or 625KB. However, the default TCP window size on Linux 2.6 is 110KB, which limits the connection's bandwidth to 2.2MBps, calculated as follows:

throughput = window_size / RTT


110KB / 0.050 = 2.2MBps

If instead we use the window size calculated above, we get a bandwidth of 12.5MBps, calculated as follows:

625KB / 0.050 = 12.5MBps
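If it helps to script this arithmetic, the following minimal C sketch reproduces the numbers above (the 100Mbps bandwidth and 50 ms RTT are the example values from this section, not measured figures):

#include <stdio.h>

int main(void)
{
    double link_bandwidth = 100e6;  /* link bandwidth in bits per second */
    double rtt = 0.050;             /* round-trip time in seconds */

    /* BDP in bytes: bits in flight during one RTT, divided by 8 */
    double bdp = link_bandwidth * rtt / 8.0;

    printf("BDP = %.0f KB\n", bdp / 1e3);                      /* 625 KB */
    printf("default window: %.1f MBps\n", 110e3 / rtt / 1e6);  /* 2.2 MBps */
    printf("BDP-sized window: %.1f MBps\n", bdp / rtt / 1e6);  /* 12.5 MBps */
    return 0;
}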

The difference is substantial and provides much greater throughput for the socket. So now you know how to calculate the optimal socket buffer size. But how do you change it?

Solution


The Sockets API provides several socket options, two of which change the size of the socket's send and receive buffers. Listing 2 shows how to use the SO_SNDBUF and SO_RCVBUF options to adjust those buffer sizes.

Note: Although the socket buffer size determines the size of the advertised TCP window, TCP also maintains a congestion window within the advertised window. Because of this congestion window, a given socket may never utilize the full advertised window.

Listing 2. Manually set the send and receive socket buffer size


int ret, sock, sock_buf_size;

sock = socket(AF_INET, SOCK_STREAM, 0);

/* BDP is the bandwidth delay product calculated for the link */
sock_buf_size = BDP;

ret = setsockopt(sock, SOL_SOCKET, SO_SNDBUF,
                 (char *)&sock_buf_size, sizeof(sock_buf_size));

ret = setsockopt(sock, SOL_SOCKET, SO_RCVBUF,
                 (char *)&sock_buf_size, sizeof(sock_buf_size));

In the Linux 2.6 kernel, the send buffer size is used as defined by the caller, but the value supplied for the receive buffer is automatically doubled. You can verify the size of each buffer with a getsockopt call.
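A minimal sketch of such a check (assuming sock is the socket configured in Listing 2):

#include <stdio.h>
#include <sys/socket.h>

/* Print the receive buffer size the kernel actually granted for sock;
 * on Linux, this may be double the value passed to setsockopt(). */
void print_rcvbuf(int sock)
{
    int buf_size;
    socklen_t opt_len = sizeof(buf_size);

    if (getsockopt(sock, SOL_SOCKET, SO_RCVBUF,
                   (char *)&buf_size, &opt_len) == 0)
        printf("SO_RCVBUF is %d bytes\n", buf_size);
}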

Jumbo frames


You can also consider raising the packet size from 1,500 bytes to 9,000 bytes (known as jumbo frames). On a local network, jumbo frames can be enabled by setting the Maximum Transmission Unit (MTU), which can greatly boost performance.
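For example, on many systems the MTU can be raised with the ifconfig tool. This one-line sketch assumes an interface named eth0, and every device on the path (switches, routers, and the peer) must also support jumbo frames:

# ifconfig eth0 mtu 9000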

As for window scaling, TCP originally supported windows of at most 64KB (a 16-bit value defines the window size). With the window scaling extension (RFC 1323), a 32-bit value can be used to represent the window size. The TCP/IP stack in GNU/Linux supports this option (along with several others).

Tip: The Linux kernel also includes the ability to auto-tune these socket buffers (see tcp_rmem and tcp_wmem in Table 1 below), but those options affect the entire stack. If you need to adjust the window only for a single connection or class of connections, this mechanism is probably not sufficient for your needs.

Tip 4. Dynamically tune the GNU/Linux TCP/IP stack


A standard GNU/Linux distribution tries to be optimized for a wide variety of deployments. This means that the standard distribution may not be specifically optimized for your environment.

Solution


GNU/Linux provides a number of tunable kernel parameters that you can use to dynamically configure the operating system for your purposes. Let's look at some of the more important options that affect socket performance.

The tunable kernel parameters live in the /proc virtual filesystem. Each file in this filesystem represents one or more parameters that can be read with the cat tool or modified with the echo command. Listing 3 shows how to query and enable a tunable parameter (in this case, enabling IP forwarding in the TCP/IP stack).

Listing 3. Tuning: Enabling IP Forwarding in the TCP/IP stack


# cat /proc/sys/net/ipv4/ip_forward
0
# echo '1' > /proc/sys/net/ipv4/ip_forward
# cat /proc/sys/net/ipv4/ip_forward
1
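The same parameters can also be queried and set with the sysctl tool, which uses dotted names in place of /proc paths:

# sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 0
# sysctl -w net.ipv4.ip_forward=1
net.ipv4.ip_forward = 1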

Table 1 shows a few adjustable parameters that can help you improve the performance of your Linux TCP/IP stack.

Table 1. Tunable kernel parameters for TCP/IP stack performance

Tunable parameter (default value): Description

/proc/sys/net/core/rmem_default ("110592"): Defines the default receive window size; for a larger BDP, this should also be larger.

/proc/sys/net/core/rmem_max ("110592"): Defines the maximum receive window size; for a larger BDP, this should also be larger.

/proc/sys/net/core/wmem_default ("110592"): Defines the default send window size; for a larger BDP, this should also be larger.

/proc/sys/net/core/wmem_max ("110592"): Defines the maximum send window size; for a larger BDP, this should also be larger.

/proc/sys/net/ipv4/tcp_window_scaling ("1"): Enables window scaling as defined by RFC 1323; must be enabled to support windows larger than 64KB.

/proc/sys/net/ipv4/tcp_sack ("1"): Enables selective acknowledgment (SACK), which improves performance when packets arrive out of order by letting the sender retransmit only the missing segments; should be enabled for WAN traffic, but it increases CPU utilization.

/proc/sys/net/ipv4/tcp_fack ("1"): Enables forward acknowledgment (FACK), which operates on top of selective acknowledgment to further reduce congestion; should also be enabled.

/proc/sys/net/ipv4/tcp_timestamps ("1"): Enables more accurate RTT calculation than retransmission timeouts provide (see RFC 1323); should be enabled for better performance.

/proc/sys/net/ipv4/tcp_mem ("24576 32768 49152"): Determines how the TCP stack behaves with regard to memory usage; each value is in memory pages (usually 4KB). The first value is the low threshold of memory usage. The second value is the point at which memory pressure mode begins to apply pressure to buffer usage. The third value is the memory ceiling, above which packets can be dropped to reduce memory usage. Increase these values for a larger BDP (but remember that the units are memory pages, not bytes).

/proc/sys/net/ipv4/tcp_wmem ("4096 16384 131072"): Defines the per-socket memory used for auto-tuning. The first value is the minimum number of bytes allocated for the socket's send buffer. The second value is the default (overridden by wmem_default), to which the buffer can grow under light system load. The third value is the maximum send buffer space (overridden by wmem_max).

/proc/sys/net/ipv4/tcp_rmem ("4096 87380 174760"): Similar to tcp_wmem, but for the receive buffers used in auto-tuning.

/proc/sys/net/ipv4/tcp_low_latency ("0"): Tells the TCP/IP stack to prefer low latency over high throughput; should be left disabled.

/proc/sys/net/ipv4/tcp_westwood ("0"): Enables TCP Westwood, a sender-side congestion control algorithm that maintains a throughput estimate and tries to optimize overall bandwidth utilization; should be enabled for WAN communication.

/proc/sys/net/ipv4/tcp_bic ("1"): Enables Binary Increase Congestion (BIC) control for fast long-distance networks, making better use of links that operate at gigabit speeds; should be enabled for WAN communication.

As with any tuning effort, the best approach is to experiment. Your application's behavior, processor speed, and available memory all affect how these parameters change performance. In some cases, what you think should help can actually hurt (and vice versa). So test each option in isolation and then check the result. In other words, trust your experience, but validate every modification.

Tip: A note about persistent configuration. If you reboot a GNU/Linux system, any tunable kernel parameters you changed revert to their defaults. To make your values the defaults, use /etc/sysctl.conf to configure the parameters at boot time.
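For example, entries like the following could be placed in /etc/sysctl.conf (a sketch; the buffer limits are placeholder values that you would size from your own BDP):

net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_sack = 1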

GNU/Linux tools


GNU/Linux is attractive to me because of the many tools available. Although most are command-line tools, they are very useful and intuitive. GNU/Linux provides several tools, some native to the distributions and some open source, for debugging network applications, measuring bandwidth and throughput, and checking link utilization.

Table 2 lists some of the most useful GNU/Linux tools and their purposes. Table 3 lists useful tools that GNU/Linux distributions do not typically provide. For more information about the tools in Table 3, see Resources.

Table 2. Tools that can be found in any Gnu/linux release

Tool: Purpose

ping: The most common tool for checking host availability; it can also identify the RTT used in Bandwidth Delay Product calculations.

traceroute: Prints the path (route) to a network host, including the series of routers and gateways traversed, and the latency between each hop.

netstat: Reports various statistics about the networking subsystem, protocols, and connections.

tcpdump: Shows protocol-level packet traces for one or more connections, including timing information you can use to study the packet timing of different protocol services.

Table 3. Useful performance tools not included in GNU/Linux distributions

Tool: Purpose

netlog: Provides applications with instrumentation about network performance.

nettimer: Generates a metric for bottleneck link bandwidth; can be used for automatic protocol optimization.

Ethereal: Provides the features of tcpdump (packet tracing) in an easy-to-use graphical interface.

iperf: Measures network performance for TCP and UDP; measures maximum bandwidth and reports latency and datagram loss.


Conclusion


Try the tips and techniques described in this article to boost the performance of your socket applications: reduce transmission latency by disabling the Nagle algorithm, improve socket bandwidth utilization by setting buffer sizes, reduce the load of system calls by minimizing their number, and tune the Linux TCP/IP stack with tunable kernel parameters.

Also consider your application's characteristics when tuning. For example, is your application LAN-based, or does it communicate over the Internet? If it operates only within a LAN, increasing the socket buffer sizes may not yield much improvement, but enabling jumbo frames certainly can!

Finally, check the results of your tuning with tcpdump or Ethereal. The changes visible at the packet level can help demonstrate the effectiveness of these techniques.

