TCP core concepts: slow start, ssthresh, congestion avoidance, and the true meaning of fairness

Last Update:2016-05-18 Source: Internet

Author: User


This article explains where ssthresh comes from in TCP congestion control, and why ssthresh is set to half of the current window when packet loss is detected during congestion avoidance.
Allow me one complaint before getting to the content. At present, none of the material you can find online, not even the RFCs, explains what ssthresh really is or what its value signifies. Almost every source says only that if the window is larger than ssthresh, TCP runs the linear-growth congestion avoidance phase, and otherwise it runs slow start. Almost everyone memorizes this conclusion, and after years of repetition many people know the rule without knowing the reason, yet act as if there were nothing to explain; that used to include me. So once I understood what ssthresh is all about, and how the 1/2 factor applied to ssthresh after a loss relates to fairness, I could not wait to share it!

Filling the network between the end systems with TCP data segments
Assume that end systems A and B communicate over TCP. As long as there is physical distance between A and B, propagation delay means that the path between them has a certain capacity and can hold a number of data segments in transit; you can think of it as a kind of cache. To be more general, we also count the queues of all routers, switches, and other devices between A and B as part of the network cache between A and B, as shown in the following figure:

To achieve the highest network utilization, we want the cache between A and B (the node queues plus the network itself) to be completely filled with TCP data segments, and to stay filled continuously.

In the figure, TCP data segments flow continuously without gaps. Once the network between A and B is filled, there are N data segments in total between A and B. Can the sender continue to send data? It can: once the network is filled, every time the sender sends a data segment, the receiver consumes one and sends back an ACK, and this continues until the ACK for the segment that filled the network between A and B reaches the sender. Assume that ACKs travel at the same rate as data segments. Then the sender can send 2*N data segments in total. The starting point of these 2*N segments is the moment the 1st data segment is sent, and the end point is the moment the ACK for the 1st data segment arrives back at the sender, which is exactly one RTT. Let the sending rate be R; then the following equation is obvious:
2*N = R*RTT
The process is as follows:

We can also see that the first N segments serve to fill the network between A and B, while the last N segments are paced by the TCP ACK clock while the path between A and B is already fully loaded. After these 2*N data segments a new cycle begins, again with 2*N data segments per RTT. This is the ideal situation: data segments fill the network and leave the sender in an uninterrupted stream, and ACKs return from the receiver in an uninterrupted stream.
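As a quick numeric check of the relation 2*N = R*RTT, here is a minimal Python sketch. The function name `pipe_capacity` and the sample rate and RTT are illustrative assumptions; the code only evaluates the formula in units of segments and makes no claim about any real TCP stack.

```python
from typing import Tuple


def pipe_capacity(rate_segments_per_sec: float, rtt_sec: float) -> Tuple[float, float]:
    """Return (2*N, N) under the article's model 2*N = R*RTT.

    2*N is the number of segments sent during one RTT,
    N is the number of segments that fill the path from A to B.
    """
    two_n = rate_segments_per_sec * rtt_sec
    return two_n, two_n / 2


# Hypothetical numbers: 16 segments per second over a 0.5 s round trip.
two_n, n = pipe_capacity(rate_segments_per_sec=16.0, rtt_sec=0.5)
print(two_n, n)
```

With these assumed inputs the sender emits 8 segments per RTT, of which 4 fill the forward path, matching the N = 4 used in the article's example.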

Two zones (safe & dangerous)
I would rather not give the secret away this early, but I do not want to keep you in suspense either; after all, it is not that early any more.
Look at time T28 in the figure. Before T28, the network between A and B is not always full; after T28, it is always full. This means that before T28 data segments can still be buffered: there is free space in the network to absorb data, so data is not discarded. After T28 the network is full and no free space is left to buffer packets, which means that once congestion occurs, packets must be lost!
Clearly, the period before T28 is safe and the period after T28 is dangerous, and this is where the safe area and the dangerous area come from! What is the dividing line? The capacity of the network! In the illustration above, it is 4. When fewer than 4 data segments are in flight, the sender can transmit aggressively; once the data in flight exceeds 4, it must be conservative!
What is aggressive, and what is conservative? Aggressive means growing the window as fast as possible up to the boundary between safe and dangerous: TCP is ACK-driven, so each received ACK shows that the path is clear, and the window can be increased by one MSS. Conservative means waiting until all the data of the current window has been ACKed, which shows that the data just sent can reach the peer, before adding one MSS to the window. The window-growth methods of the two areas can be summarized as follows:
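Safe area: for every ACK received, increase the window by 1 MSS.

Dangerous area: only after all the data of the current window has been ACKed, increase the window by 1 MSS.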
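The two window-growth rules described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the window is counted in whole MSS units, every ACK acknowledges exactly one segment, and the function name `grow_window` is made up for this example.

```python
def grow_window(cwnd: int, ssthresh: int, acks: int) -> int:
    """Apply `acks` ACKs (one segment each) and return the new window in MSS."""
    acked_in_window = 0
    for _ in range(acks):
        if cwnd < ssthresh:
            # Safe area: the network is not full yet, so grow on every ACK.
            cwnd += 1
        else:
            # Dangerous area: grow only once a whole window has been ACKed.
            acked_in_window += 1
            if acked_in_window >= cwnd:
                cwnd += 1
                acked_in_window = 0
    return cwnd


# Below the boundary (4), three ACKs take the window from 1 to 4.
print(grow_window(cwnd=1, ssthresh=4, acks=3))
# At the boundary, a full window of four ACKs adds just 1 MSS.
print(grow_window(cwnd=4, ssthresh=4, acks=4))
```

The contrast is the point: the same number of ACKs buys one MSS each in the safe area, but only one MSS per full window in the dangerous area.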

Let us compare how the window grows in the two areas. The goal in the safe area is to fill the network. In the safe area we know the network is not yet full, so the window can grow on every ACK until the network is filled. Now look at the dangerous area. Here we know that the network is already full, and we want to keep it full in order to keep utilization high. The TCP send window (that is, how much data may be sent) is ACK-driven, and the ACKs act like a clock. To keep sending data continuously and keep the network full, ACKs must keep arriving without interruption. So the goal of sending more data at this stage is simply to eliminate the blank periods of the ACK clock (call them idle periods), because only then can a steady stream of ACKs drive a steady stream of data.
Look back at time T20 in the window-growth diagram. The network is filled by the four packets 4, 5, 6, and 7. At the next moment, T21, packet 4 has been received, and because no ACK has arrived yet, the network has 1 MSS of free room. In theory we know the final window is 2*N; in this example N is 4, so the final window grows to 8. In reality, however, congestion can occur at any time: while the window grows from 4 to 8, packets may be dropped at any moment, which means the window may never reach 8! So how do we know that the current window is reasonable before trying to enlarge it further? The answer: all the data of the previous window has been acknowledged! This is the origin of congestion avoidance. The process is slow not because of a design flaw, but because it has to be this slow. Congestion avoidance is a well-chosen name: it really is avoiding!
With the description above understood, we can introduce the concept of the TCP pipeline, and then everything becomes clear.
The concept of the TCP pipeline
The TCP pipeline actually consists of two parts. According to the formula 2*N = R*RTT, 2*N is the capacity of the pipeline. The first part is the capacity used before the network is filled: as long as we know the network is not yet full, we just fill it, and slow start is good enough for that. The second part is the capacity used after the network is filled: it is driven entirely by ACKs, and congestion avoidance is used to probe the window. Ideally, in theory, the two parts have equal capacity. With that, we can get to the truth of the matter.
Question 1: What is N?
Answer: N is ssthresh!
Question 2: Why does ssthresh drop to half of the current window after congestion is detected?
Answer: Detecting congestion indicates that the capacity of the pipeline is the current window C, and C = 2*N, so N = (1/2)*C!
Question 3: Why is the increase in the congestion avoidance phase additive (the AI in AIMD)?
Answer: Only when a whole window of data has been acknowledged can we be sure that the previous window was valid. After all, the network is full, ACKs may return asymmetrically, and congestion can occur at any time.
Question 4: Why is the decrease multiplicative (the MD in AIMD)?
Answer: See question 2.
Question 5: Why can the window grow exponentially in the slow start phase?
Answer: Because the window is known to be smaller than N, that is, the network is not full; even if congestion occurs, there is spare cache available.
Question 6: Is there a problem?
Answer: If everything were as ideal as described, of course there would be no problem. The problem is that ssthresh has never been estimated accurately.

Back to the reality of TCP
TCP's design vision is a fine one, but the real world is not a friendly place. TCP, however, is self-adaptive. It does not prescribe a value for ssthresh, nor even recommend one; it relies entirely on TCP itself detecting packet loss or congestion, and then takes half of the current window as the current value of ssthresh. As the connection continues, ssthresh is adjusted dynamically, because no matter how harsh reality is, an ideal feedback system always converges toward a goal: the actual bandwidth tends toward twice ssthresh. Surprisingly, ssthresh is calculated by relying on the congestion avoidance algorithm. Of course, as TCP has developed, the classic formula C = 2*N = R*RTT has undergone many variations; CUBIC, for example, no longer takes half of the pipe capacity as ssthresh.
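To make the convergence claim concrete, the following Python sketch runs an idealized, per-RTT model of a single Reno-style connection. Everything here is an assumption for illustration: the pipe has a fixed capacity C, a loss is detected exactly when the window exceeds C, the window resumes from ssthresh after a loss, and the floor of 2 segments follows the lower bound given in RFC 5681. The name `simulate` is hypothetical.

```python
from typing import List


def simulate(capacity: int, rounds: int, ssthresh: int = 10**9) -> List[int]:
    """Return the ssthresh value recorded at every loss event."""
    cwnd = 1
    history = []
    for _ in range(rounds):
        if cwnd > capacity:
            # Pipe overflowed: take half of the current window as ssthresh.
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh
            history.append(ssthresh)
        elif cwnd < ssthresh:
            # Slow start: the window doubles every RTT, capped at ssthresh.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: one MSS per RTT.
            cwnd += 1
    return history


# Pipe capacity C = 2*N = 8 segments, as in the article's example.
print(simulate(capacity=8, rounds=40))
```

In this model the first loss happens after slow start has overshot the pipe, so the first ssthresh is too large; from the second loss onward ssthresh settles at C/2 = N, which is the behaviour the paragraph above describes.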

HyStart: an optimization for slow start
We know that ssthresh is set using packet loss as the feedback signal. The question now is: when a connection has just been established and there has been no packet loss to serve as feedback, how should ssthresh be set?
In general, the default implementation sets it to a huge value, grows the window as fast as possible until a packet is dropped, sets ssthresh to half of the window at the time of the drop, and then slowly approaches twice ssthresh. This is problematic, because with no ssthresh acting as a threshold, the cost of that first packet loss is too high. Therefore, if the value of ssthresh can be detected during slow start, TCP can exit slow start at the right moment. So how do we detect it?
It is still the formula C = 2*N = R*RTT; the key is how we use it. Since we only want to detect the point at which the network has just been filled, that is, the value of N, we have:
N = R*(RTT/2)
Now let's see what R is. A rate is simply an amount of work divided by the time taken to do that work. If we send out N' packets over a time period T, then:
R = N'/T
After substituting, we get:
N = (N'/T)*(RTT/2)
Ideally, once the network capacity is exactly filled, N = N', and it follows simply that T = RTT/2. In other words, when T reaches RTT/2, ssthresh has been reached, and it is time to exit slow start!
So how do you implement it? Since we cannot observe the N data segments individually arriving at the receiver, nor the time they take, we can use the equivalent ACKs to compute it indirectly. Take the time at which the first data segment of a window is sent as Tstart, and update the following values each time an ACK is received:
rttmin: the smallest RTT within the sampling period, which best approximates the ideal round-trip delay between A and B.
Tcurr: the current time.
You can exit slow start if the following condition is true:
Tcurr - Tstart >= rttmin/2
Very easy to understand. However, reality is not ideal, and in most cases the algorithm above does not give better results. Why? Because the bandwidth does not belong to a single TCP connection; it is shared by all TCP connections around the world, and by UDP traffic as well. The formula above therefore can hardly represent any real situation, so in practice implementations prefer to use the RTT to estimate whether the network has been filled. Using the RTT to estimate the network capacity ssthresh is more practical because it takes the queueing delay under congestion into account. Under this method, the condition for exiting slow start is:
Tcurr_rtt > rttmin + fixed_value
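The two exit conditions above can be combined in a small Python sketch. It is not the code of any real HyStart implementation: the class name `SlowStartProbe`, the choice to exit when either condition holds, and the sample timings (in milliseconds) are all assumptions made for illustration; only Tstart, Tcurr, rttmin, and fixed_value come from the text.

```python
class SlowStartProbe:
    """Tracks rttmin and evaluates the two exit conditions described above."""

    def __init__(self, t_start: float, fixed_value: float) -> None:
        self.t_start = t_start          # Tstart: send time of the window's first segment
        self.fixed_value = fixed_value  # tolerated RTT increase (assumed tunable)
        self.rtt_min = float("inf")     # rttmin: smallest RTT sample seen so far

    def on_ack(self, t_curr: float, rtt_sample: float) -> bool:
        """Return True if either condition says slow start should end."""
        self.rtt_min = min(self.rtt_min, rtt_sample)
        # Condition 1: Tcurr - Tstart >= rttmin/2
        ack_train_full = t_curr - self.t_start >= self.rtt_min / 2
        # Condition 2: current RTT > rttmin + fixed_value
        delay_increased = rtt_sample > self.rtt_min + self.fixed_value
        return ack_train_full or delay_increased


probe = SlowStartProbe(t_start=0, fixed_value=8)
print(probe.on_ack(t_curr=10, rtt_sample=100))  # neither condition holds yet
print(probe.on_ack(t_curr=40, rtt_sample=120))  # RTT rose by more than fixed_value
```

Keeping rttmin as a running minimum means a single inflated sample cannot raise the baseline, so queueing delay shows up as a gap between the current RTT and rttmin.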

The method above is designed to predict ssthresh during the first slow start, when no ssthresh value exists yet. In fact, at any later time, whenever TCP enters slow start, the algorithm above can be used to predict the current ssthresh, instead of relying on the congestion algorithm to supply an ssthresh or simply halving the window (though you have seen how reasonable this 1/2 is!).

The fast crossing problem of ssthresh
We know that during slow start the window grows very quickly (driven by ACK feedback, the number of data segments sent essentially doubles each round). So when the window has grown close to ssthresh but is still below it, the situation shown in the figure can occur:

However, this does not happen easily in an actual implementation. What limits it? The following points:
1. TCP's delayed ACK mechanism can delay at most 2 MSS. In slow start the window grows by 1 MSS for each ACK received; even when ABC is used, the window can exceed ssthresh by at most 2 MSS, which is guaranteed by the following code:

if (sysctl_tcp_abc > 1 && tp->bytes_acked >= 2*tp->mss_cache)
    cnt <<= 1;

2. Even if a large number of ACKs are lost, the default TCP implementation counts the number of ACKs, not the number of bytes ACKed.
3. When a large number of ACKs are lost and ABC is enabled, see point 1.
4. Two-part processing. The 4.x Linux kernel by default uses the number of ACKed bytes to compute the window (the ABC scheme). When the window crosses ssthresh, the TCP congestion control logic splits the ACKed amount into two parts: the part below ssthresh is counted toward slow start, and the part above ssthresh is counted toward congestion avoidance.
In summary, the ssthresh crossing cases look like this:
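The two-part handling described in point 4 can be sketched as follows. This is an illustrative Python model, not kernel code: the function name `on_ack`, the use of whole segments instead of bytes, and the counter `cwnd_cnt` (segments accumulated toward the next congestion-avoidance increase) are assumptions made to keep the example short.

```python
from typing import Tuple


def on_ack(cwnd: int, cwnd_cnt: int, ssthresh: int, acked: int) -> Tuple[int, int]:
    """Return (cwnd, cwnd_cnt) after `acked` segments are acknowledged."""
    if cwnd < ssthresh:
        # Part 1: slow start, but never past ssthresh.
        new_cwnd = min(cwnd + acked, ssthresh)
        acked -= new_cwnd - cwnd   # what is left over spills into part 2
        cwnd = new_cwnd
    if acked > 0:
        # Part 2: congestion avoidance, one MSS per full window ACKed.
        cwnd_cnt += acked
        if cwnd_cnt >= cwnd:
            delta = cwnd_cnt // cwnd
            cwnd_cnt -= delta * cwnd
            cwnd += delta
    return cwnd, cwnd_cnt


# Window 8, ssthresh 10, one ACK covering 5 segments:
# 2 segments finish slow start, the other 3 count toward congestion avoidance.
print(on_ack(cwnd=8, cwnd_cnt=0, ssthresh=10, acked=5))
```

Splitting the ACKed amount this way means a single large ACK cannot carry the window far beyond ssthresh at slow-start speed.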

Fairness convergence of AIMD
For simplicity, let's assume that two connections, TCP1 and TCP2, share a link, and see how they "converge to fairness". The following illustration shows everything clearly:

If you don't understand this diagram, please search for it online. What we can be sure of is that below the fairness line, the slope of the red window-reduction line is always smaller than the slope of the fairness line (the 45-degree diagonal). Each time the two connections reduce their windows, the slope of the reduction line gets closer to the slope of the fairness line, that is, the system converges toward fairness. Eventually the two connections move back and forth along the green line and stay fair (though utilization is not that high!).

We can also see that TCP1 and TCP2 reduce their windows by the same proportion; in this example the ratio is 0.5. Does it have to be 0.5? Not at all! As long as the proportions are equal, note 1 in the figure always holds and the final convergence always holds; the only difference is the length and range of the green line that the system finally converges to. Reducing the window by a ratio of 0.5 is well founded for the original Reno TCP (see the inference above), but given the complexity of reality, the ratio is no longer 0.5.
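The equal-proportion argument above can be checked with a toy Python model. The assumptions are strong and deliberate: both connections have the same RTT, both see every loss at the same time, each adds 1 MSS per round, and each multiplies its window by the same factor `beta` on loss. The function name `aimd` and the starting windows are made up for the example.

```python
from typing import Tuple


def aimd(w1: float, w2: float, capacity: float, beta: float, rounds: int) -> Tuple[float, float]:
    """Run synchronized AIMD for two flows and return their final windows."""
    for _ in range(rounds):
        if w1 + w2 > capacity:
            # Multiplicative decrease: the gap w2 - w1 shrinks by beta.
            w1, w2 = w1 * beta, w2 * beta
        else:
            # Additive increase: the gap w2 - w1 stays the same.
            w1, w2 = w1 + 1, w2 + 1
    return w1, w2


# Very unequal start; the same outcome appears for beta = 0.5 and beta = 0.7.
for beta in (0.5, 0.7):
    w1, w2 = aimd(w1=1, w2=41, capacity=100, beta=beta, rounds=2000)
    print(beta, abs(w1 - w2) < 1e-6)
```

Additive increase leaves the difference between the two windows untouched, while every multiplicative decrease scales it by `beta`, so with any equal `beta` below 1 the difference shrinks toward zero; only the range the windows oscillate over depends on `beta`.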
Now let's see what happens if TCP1 and TCP2 reduce their windows by different ratios. Assume that TCP1 still uses the 0.5 ratio while TCP2 reduces its window by less than 0.5; it will look like this:

We can see that the competitor that reduces its window by the smallest proportion will eventually seize almost all of the bandwidth, gradually pushing the bandwidth of all other connections toward the upper left corner until it approaches zero. So, if you want to make your TCP aggressive, you just lower the window-reduction ratio, right? It's not that easy! Both of my figures above share a common premise: the competitors' RTTs are equal! But is that the case in reality? Hardly! If the RTTs are equal, for example because the sources and destinations are all in the same place, then the connections are in all likelihood cooperating, not competing!

So RTT plays a very important role! Indeed, when real TCP is running, the RTT fluctuates a great deal, which almost completely overturns my discussion above, and that is obviously disheartening! However, the analysis above is still meaningful as a theoretical model; at the very least it helps you understand the nature of TCP's behavior. As for the actual situation, RTT fluctuation is a meaningful signal that lets the end system see the queueing behavior of the intermediate routers and switches. This is where the so-called "noise" in the RTT comes from. Many people want to get rid of it by smoothing it away, but that also means blocking an important signal.
The fluctuation of the RTT is dynamic and alluring. It characterizes the whole of queueing theory numerically; you can derive a Markov arrival process from it, or you can simply regard it as miserable noise. As a result, real TCP has almost completely moved beyond the guidance of Reno: apart from Reno, almost no congestion algorithm sets ssthresh to half of the current window when a packet drop is detected. This is the evolution of TCP, but that evolution always revolves around a core, which is what I have described here: simple, understandable, and yet surprising.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
