Discussion on the optimization method of TCP/IP

Last Update:2017-01-13 Source: Internet

Author: User


Many people find TCP optimization mysterious, but once you understand how TCP operates, the mystery disappears. Ilya Grigorik describes it in meticulous detail in "High Performance Browser Networking", which I found enlightening. Here I summarize the main points to make them easier to digest.

Flow control

When transmitting data, if the sender transmits more data than the receiver can handle, the receiver will drop packets. To avoid this, flow control requires both sides to advertise the size of their receive window, "rwnd", in every exchange, indicating the maximum amount of data they can buffer. This mainly concerns the receiving side. Put simply, it lets the sender know how many bowls of rice the receiver can eat. If the window shrinks to zero, the receiver is full and needs time to digest. If the sender forces more data on it anyway, the excess is lost, and that is packet loss.

[Figure: Flow control]

The terms receiver and sender are relative. From the user's point of view: when browsing the web, data flows mainly downstream, so the client is the receiver and the server is the sender; when uploading files, data flows mainly upstream, so the client is the sender and the server is the receiver.

Slow start

Although flow control prevents the sender from overloading the receiver, it cannot prevent the sender from overloading the network. This is because the receive window "rwnd" reflects only the state of an individual host, not the overall state of the network.

To avoid overloading the network, slow start introduces the congestion window, "cwnd", which is the maximum amount of data the sender may transmit before receiving an acknowledgment. Unlike "rwnd", "cwnd" is purely an internal parameter of the sender and does not need to be advertised to the receiver. Its initial value is usually fairly small, and the window then doubles as the receiver acknowledges the packets it has received. It is a bit like boxing: at the start of a match you throw a few probing jabs, and only once you have the measure of your opponent do you start to hit harder.

[Figure: Slow start]

During slow start, as "cwnd" grows, the network may become overloaded, which shows up externally as packet loss. Once that happens, "cwnd" is cut back sharply so that the network can recover.

[Figure: Congestion avoidance]

Note: The amount of unacknowledged data actually in flight in the network is limited by the smaller of "rwnd" and "cwnd".
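To make the note concrete, here is a minimal Python sketch of an idealized slow start in which no packet is lost and "cwnd" doubles once per round trip. The function names, the 4 MSS starting window and the 64 KB "rwnd" are assumptions chosen for illustration, not values taken from any particular TCP stack.

```python
def in_flight_limit(rwnd, cwnd):
    """Upper bound on unacknowledged bytes: the smaller of the two windows."""
    return min(rwnd, cwnd)


def slow_start_windows(initial_cwnd, rwnd, round_trips):
    """Effective window for each round trip, assuming no loss (idealized)."""
    windows = []
    cwnd = initial_cwnd
    for _ in range(round_trips):
        windows.append(in_flight_limit(rwnd, cwnd))
        cwnd *= 2  # idealized: the whole window is acknowledged every round trip
    return windows


MSS = 1460  # bytes, the Ethernet value used later in this article

# Hypothetical connection: cwnd starts at 4 MSS, the receiver advertises 64 KB.
print(slow_start_windows(4 * MSS, 64 * 1024, 6))
```

Once "cwnd" has grown past "rwnd", the effective window stops growing, which is why tuning only one of the two windows may not help.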

Congestion avoidance

From the description of slow start, we can see that the sender avoids overloading the network by controlling the size of "cwnd". In this process, packet loss is not so much a network problem as a feedback mechanism: it lets us sense network congestion and adjust the transmission strategy accordingly.

There is also the concept of a slow start threshold, "ssthresh". If "cwnd" is less than "ssthresh", the connection is in the slow start phase. If "cwnd" is greater than "ssthresh", it is in the congestion avoidance phase, where "cwnd" no longer doubles as it does in slow start but grows roughly linearly, in order to avoid congesting the network. This phase has several algorithm implementations, and keeping the default is usually fine. They are not explained one by one here; interested readers can look them up.
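The two growth phases can be sketched as follows. This is a simplified per-round-trip model in MSS units, not any real kernel's algorithm: it assumes no loss, treats "cwnd" equal to "ssthresh" as congestion avoidance, and caps the doubling at the threshold. The function names and the sample values are hypothetical.

```python
def next_cwnd(cwnd, ssthresh):
    """cwnd (in MSS units) after one loss-free round trip."""
    if cwnd < ssthresh:
        # Slow start: double, but do not jump past the threshold (simplification).
        return min(cwnd * 2, ssthresh)
    # Congestion avoidance: roughly one extra MSS per round trip.
    return cwnd + 1


def trace(cwnd, ssthresh, round_trips):
    """cwnd before the first round trip and after each subsequent one."""
    history = [cwnd]
    for _ in range(round_trips):
        cwnd = next_cwnd(cwnd, ssthresh)
        history.append(cwnd)
    return history


# Hypothetical values: start at 4 MSS with a threshold of 16 MSS.
print(trace(4, 16, 4))
```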

...

How to adjust "rwnd" to a reasonable value

Many people have run into slow network transfers. For example, on a 100 Mbps network, the theoretical maximum transfer rate should be on the order of 10 MB/s, yet the actual rate is far lower, perhaps only about 1 MB/s. Once an unscrupulous provider is ruled out, the cause is usually an unreasonable setting of the receive window "rwnd".

A reasonable value for the receive window "rwnd" depends on the BDP (bandwidth-delay product), which is the product of bandwidth and latency. Assuming the bandwidth is 100 Mbps and the latency is 100 ms, the calculation is as follows:

BDP = 100 Mbps * 100 ms = (100 / 8) MB/s * (100 / 1000) s = 1.25 MB
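The same arithmetic as a small Python helper, as a sketch. The function name is made up for illustration, and MB here means 10^6 bytes, matching the calculation above.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_ms):
    """Bandwidth-delay product in bytes, using integer arithmetic."""
    # bits/s * ms -> divide by 8 (bits per byte) and by 1000 (ms per second)
    return bandwidth_bits_per_s * rtt_ms // (8 * 1000)


bdp = bdp_bytes(100_000_000, 100)  # 100 Mbps link, 100 ms latency
print(bdp, "bytes =", bdp / 1_000_000, "MB")
```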

To maximize throughput in this example, the receive window "rwnd" should be no smaller than 1.25 MB. A side note: TCP uses 16 bits to record the window size, so the maximum is 64 KB. To exceed that, you need the window scaling mechanism (tcp_window_scaling). Reference: TCP Windows and Window Scaling.
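As a rough illustration of window scaling: the advertised 16-bit value is shifted left by a scale factor negotiated at connection setup, and RFC 7323 limits that shift to 14. The sketch below finds the smallest shift that can express a target window. The function name is hypothetical.

```python
MAX_UNSCALED_WINDOW = 0xFFFF  # largest value of the 16-bit window field
MAX_SHIFT = 14                # upper limit on the scale factor (RFC 7323)


def min_window_shift(target_bytes):
    """Smallest scale shift whose maximum window covers target_bytes."""
    for shift in range(MAX_SHIFT + 1):
        if (MAX_UNSCALED_WINDOW << shift) >= target_bytes:
            return shift
    raise ValueError("target exceeds the largest scalable window")


# The 1.25 MB BDP from the example above.
print(min_window_shift(1_250_000))
```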

In Linux, you can control the size of the receive window by configuring the buffer size in the kernel parameters:

shell> sysctl -a | grep mem
net.ipv4.tcp_rmem = <MIN> <DEFAULT> <MAX>

If we set a large buffer for the sake of transfer performance, will memory be exhausted when a large number of requests arrive at the same time? Usually not, because Linux tunes the buffer size automatically: the actual window size floats between the minimum and the maximum in order to balance performance against resources.

You can check the state of the buffer size auto-tuning mechanism (0: off, 1: on) with:

shell> sysctl -a | grep tcp_moderate_rcvbuf

If auto-tuning is off, set the buffer's default value to the BDP. If auto-tuning is on, set the buffer's maximum value to the BDP.
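That rule can be sketched in Python as follows. The helper name and the sample tcp_rmem triple are hypothetical, so check your own system's values before applying anything. The sketch only builds the setting line and does not change the kernel.

```python
def apply_bdp(rmem, bdp, autotuning_on):
    """Return a new (min, default, max) triple following the rule above."""
    minimum, default, maximum = rmem
    if autotuning_on:
        return (minimum, default, bdp)   # auto-tuning on: raise the ceiling to the BDP
    return (minimum, bdp, maximum)       # auto-tuning off: make the BDP the default


# Hypothetical current setting, and the 1.25 MB BDP from the example above.
current = (4096, 87380, 6291456)
tuned = apply_bdp(current, 1_250_000, autotuning_on=True)
print("net.ipv4.tcp_rmem = " + " ".join(str(v) for v in tuned))
```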

In fact, there is a further detail here. Besides holding the transmitted data itself, the buffer reserves some space for information about the TCP connection. In other words, not all of the space is used for data. The extra overhead is calculated as follows:

Buffer/2^tcp_adv_win_scale

Depending on the Linux kernel version, the value of net.ipv4.tcp_adv_win_scale may be 1 or 2. If it is 1, one half of the buffer is used for the extra overhead; if it is 2, one quarter of the buffer is. Following this logic, the final reasonable buffer value is calculated as follows:

BDP / (1 - 1/2^tcp_adv_win_scale)
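Plugging the 1.25 MB BDP from the earlier example into this formula gives the following. This is a minimal sketch that only covers the positive tcp_adv_win_scale values discussed here, and the function name is made up for illustration.

```python
import math


def buffer_for_bdp(bdp_bytes, tcp_adv_win_scale):
    """Buffer size whose usable part (after overhead) equals the BDP."""
    if tcp_adv_win_scale < 1:
        raise ValueError("this sketch only handles positive scale values")
    overhead_fraction = 1 / 2 ** tcp_adv_win_scale
    return math.ceil(bdp_bytes / (1 - overhead_fraction))


BDP = 1_250_000  # bytes, from the 100 Mbps / 100 ms example
print(buffer_for_bdp(BDP, 1))  # half of the buffer is overhead
print(buffer_for_bdp(BDP, 2))  # a quarter of the buffer is overhead
```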

One more reminder about how to measure latency: the delay in the BDP refers to the RTT, which is usually easy to get with the ping command. If ICMP is blocked, however, ping is useless, and you can try synack instead.

How to adjust "cwnd" to a reasonable value

Generally speaking, the initial value of "cwnd" depends on the size of the MSS, as follows:

min(4 * MSS, max(2 * MSS, 4380))

The standard Ethernet MSS is usually 1460 bytes, so the initial value of "cwnd" is 3 MSS.
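The formula is easy to check in Python. The function name below is made up for illustration, and the MSS values other than 1460 are just sample inputs.

```python
def initial_cwnd_bytes(mss):
    """Initial congestion window in bytes, per the formula above."""
    return min(4 * mss, max(2 * mss, 4380))


for mss in (536, 1460, 9000):
    window = initial_cwnd_bytes(mss)
    print(mss, window, window // mss)  # MSS, window in bytes, window in segments
```

A small MSS yields 4 segments, the Ethernet MSS yields 3, and a very large MSS yields 2.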

When we watch video or download software, the effect of the initial "cwnd" is not noticeable. The amount of data is large and the transfer takes a long time, so even if the initial "cwnd" in the slow start phase is small, it ramps up to the full window in a comparatively short time, and the cost is basically negligible.

When we browse the web, however, the situation is different. The amount of data is small and the transfer is short, so if the initial "cwnd" in the slow start phase is small, the communication is likely to be over before the window has ramped up to full size. It is like Bolt in the 100-meter race: if he starts slowly, he may not post a good time no matter how well he accelerates, because the finish line arrives before he has reached full speed.

For example: suppose the page is 20 KB and the MSS is 1460 B, so the entire page is about 15 MSS.

Let's first look at what happens when the initial "cwnd" is small (4 MSS):

[Figure: Small window]

Now look at what happens when the initial "cwnd" is large (at least 15 MSS):

[Figure: Big window]

Clearly, leaving aside the TCP handshake and server processing time, a transfer that originally needed three RTTs completes in a single RTT once we increase the initial "cwnd", which is far more efficient.
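The RTT counts can be reproduced with a small idealized model: no loss, "cwnd" doubling every round trip, and "rwnd" never the limiting factor. The function name is hypothetical, and the handshake and server processing time are ignored, as in the text.

```python
import math


def round_trips_needed(payload_bytes, mss, initial_cwnd_segments):
    """Round trips to deliver the payload under idealized slow start."""
    remaining = math.ceil(payload_bytes / mss)  # segments still to send
    cwnd = initial_cwnd_segments
    trips = 0
    while remaining > 0:
        remaining -= cwnd  # send one full window this round trip
        cwnd *= 2          # slow start doubles the window
        trips += 1
    return trips


PAGE = 20 * 1024  # the 20 KB page from the example
for initcwnd in (4, 10, 15):
    print(initcwnd, round_trips_needed(PAGE, 1460, initcwnd))
```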

Recommendation: mnot wrote a tool called htracr that can be used to test the impact.

Since increasing the initial "cwnd" is so beneficial, how large should it be? Google has done a lot of research in this area, weighing efficiency against stability, and its final recommendation is 10 MSS. If your Linux version is not too old, you can adjust the initial "cwnd" as follows:

shell> ip route | while read p; do
           ip route change $p initcwnd 10;
       done

Note that raising the sender's "cwnd" on its own is not necessarily effective. As mentioned earlier, the amount of data actually in flight in the network depends on the smaller of "rwnd" and "cwnd", so if the receiver's "rwnd" is small, it will limit what "cwnd" can achieve.

Recommendation: for a detailed description, see: Tuning initcwnd for optimum performance.

Sometimes we may want to check the initial "cwnd" setting of a target server. In that case, you can work it out by counting packets:

[Figure: Test initcwnd]

The handshake phase shows that the RTT is 168. The first data packet arrives at time 409, and adding the RTT gives 577. Two packets arrive between 409 and 577, so the initial "cwnd" is 2 MSS.
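The counting step can be sketched like this. Only the RTT (168) and the first data time (409) come from the capture described above. The individual packet timestamps in the list are invented for illustration, and the function name is hypothetical.

```python
def estimate_initcwnd(first_data_time, rtt, packet_times):
    """Count data packets that arrive within one RTT of the first one."""
    window_end = first_data_time + rtt
    return sum(1 for t in packet_times if first_data_time <= t < window_end)


# Hypothetical arrival times: two packets in the first RTT, the rest after it.
arrivals = [409, 411, 578, 579, 580, 581]
print(estimate_initcwnd(409, 168, arrivals))
```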

It should be added that simply counting packets may not be accurate, because the NIC may process the packets before they are captured. For details, see: Segmentation and Checksum Offloading: Turning Off with Ethtool.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
