TCP congestion control algorithms: BBR vs. Reno/CUBIC

Last Update:2017-03-17 Source: Internet

Author: User

Whether at work or in casual conversations about BBR with technical friends, one question inevitably comes up: how does BBR compare with CUBIC? After all, CUBIC has been the default congestion control algorithm on Linux for many years.
Now an algorithm touted as marvellous has suddenly appeared. If BBR really is that good, it must be better than CUBIC, and if so, where exactly is it better? If it is not, what is the problem?
I do not think the first question needs answering: if BBR simply beats CUBIC on both efficiency and fairness, it will replace CUBIC, and nobody except theorists will ask why. This article therefore discusses only the second question.
Dynamic analysis of BBR and CUBIC
Here is how one CUBIC flow and one BBR flow interact over time:
1. CUBIC and BBR coexist on a shared link.
2. CUBIC gradually increases the number of packets it sends and fills the queue. (In fact, if BBR is sending exactly at the critical point, that is, at the maximum bandwidth, even a single extra CUBIC packet will be queued.) This forces BBR to queue passively, and BBR sees the following:
1> because of the passive queueing, the RTT increases;
2> because CUBIC takes a share of the bandwidth, the measured bandwidth drops, which is in fact a consequence of 1>.
3. However, this does not affect BBR's sending for a short while, because the bandwidth BBR uses is the maximum measured within a window (10 RTTs by default).
4. That is, as long as the bandwidth window and the minimum-RTT window (10 seconds by default) hold out until the queue is full, without the old samples sliding out, CUBIC will lose a packet and cut its window, and BBR takes the opportunity to return to a clean state. CUBIC then repeats the process. I do not regard this as BBR suppressing CUBIC; it is normal behaviour. If CUBIC loses a packet before BBR's bandwidth window and RTT window (10 RTTs and 10 s by default) have slid past, all we can say is that BBR has not been suppressed by CUBIC.
5. Conversely, CUBIC sometimes does suppress BBR, but whatever CUBIC gains, it also pays a price for.
6. Throughout the process BBR does not back off (the windowed values guarantee this), but at least it does not queue actively. If CUBIC were competing with another CUBIC flow instead of with BBR, would things not be worse?
7. In the other case, if the queue is deep enough and the RTT long enough that CUBIC needs more than 10 s to fill it, BBR drops to a window of 4 segments (ProbeRTT) before CUBIC loses a packet. In that case CUBIC has suppressed BBR.
8. The conclusion: with a shallow queue, CUBIC has relatively little effect on BBR; with a deep queue, BBR is affected by CUBIC, and BBR then behaves a bit like LEDBAT.
9. But wait: all of this also depends on the AQM policy!
10. There is no definitive conclusion, but in general BBR comes out ahead.
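Steps 3 and 4 hinge on BBR remembering the best sample inside a sliding window. The following is a minimal sketch of such a filter, not BBR's actual implementation: the class name `WindowedFilter`, the sample values and the one-sample-per-round timing are all assumptions made for illustration.

```python
from collections import deque


class WindowedFilter:
    """Track the best sample seen during the last `window` time units."""

    def __init__(self, window, better):
        self.window = window      # window length (RTT rounds or seconds)
        self.better = better      # better(a, b): True if a beats b
        self.samples = deque()    # (time, value); the best value sits at the front

    def update(self, now, value):
        # Forget samples that have slid out of the window.
        while self.samples and now - self.samples[0][0] > self.window:
            self.samples.popleft()
        # Older samples that do not beat the new one can never be the best again.
        while self.samples and not self.better(self.samples[-1][1], value):
            self.samples.pop()
        self.samples.append((now, value))
        return self.samples[0][1]


# Bandwidth: maximum over 10 RTT rounds. BBR measures 100 (arbitrary units)
# in round 0, then CUBIC fills the queue and the samples fall to 60.
bw_filter = WindowedFilter(window=10, better=lambda a, b: a > b)
bw_estimates = [bw_filter.update(rnd, bw) for rnd, bw in enumerate([100] + [60] * 12)]

# RTT: minimum over 10 seconds, one sample per second. 40 ms once, then 90 ms
# while the queue is full.
rtt_filter = WindowedFilter(window=10.0, better=lambda a, b: a < b)
rtt_estimates = [rtt_filter.update(float(t), rtt) for t, rtt in enumerate([40] + [90] * 12)]

print(bw_estimates)
print(rtt_estimates)
```

In this sketch the estimates stay at 100 and 40 ms until the good sample is more than one window old. If CUBIC loses a packet and backs off before that happens, BBR never sees the degraded values, which is the situation described in step 4.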
...
Maybe my analysis is a bit roundabout, or just rubbish; in that case please read the BBR authors' answer:
https://groups.google.com/forum/#!topic/bbr-dev/mvH4hGmc8dU
It is basically the same. BBR is relatively simple and easy to understand, so there is little room for differing opinions on this question, and the answers come out nearly identical. I only saw this forum thread after I had given a friend the dynamic analysis above; the Wenzhou boss sent me the link on the bus home from work. The people who think BBR is amazing are almost all Google fans, and a large part of them are blind fanboys for whom everything Google does is good, talking Google up and themselves down. In fact these fanboys only tear things down and never build anything, like Fang Zhouzi: every day they shout that this is wrong and that is wrong, yet they cannot say what is right or even offer a suggestion. I naturally dislike this kind of negative energy.
True, Google is a great company with very good technology, but that does not mean everything from Google sets a high bar and is excellent. I know of experts who had three or four algorithms very similar to BBR before BBR appeared, the earliest in 2005. I do not know what they made of BBR; if it were me, I dare say that seeing BBR would not have excited me as much as getting a network card's LED to light up.
Back to reality; let's get on with the technology.
Can I ignore the ProbeRTT phase?
The Wenzhou boss again. He does not like the window suddenly dropping to 4 MSS in the ProbeRTT state, so, just as I used to do, he simply ignored the ProbeRTT state ...
It looks like a wonderful "acceleration optimization". But what is the truth? I think the Wenzhou boss was following a preconception: shrinking the window is always bad and growing it is always good. This prejudice affects almost all Chinese people. I say this not to disparage the Chinese, but because I know few foreigners; my sample is too small to draw conclusions about them.
To the point: the goal of ProbeRTT is to re-measure the minimum RTT. When do we need to do this? Simply put, when the current RTT is unreliable. The RTT has been rising for 10 seconds; how many others are in the queue ...? BBR can guarantee that it does not queue itself! If you do not queue, you do not penalise yourself, and what others do is no concern of yours. This is fundamental. On this basis, BBR elegantly settles fairness through ProbeRTT. So how do you make sure you do not queue? BBR's answer is to hold a window of 4 MSS. Why?
Why 4 MSS?
To measure the minimum RTT you must not queue, or at least must not queue actively. Sending a single MSS-sized segment is then a reasonable action: if the network cannot carry even one segment, that plainly violates the principle of statistical multiplexing. So is it really enough to send just one segment? A TCP segment has to travel a full round trip to be acknowledged. Given TCP's ACK mechanism, having one data segment in transit while the acknowledgment for an already received segment is on the return path still satisfies the RTT-measurement assumption of "only one segment on the path". So filling the whole round-trip pipe takes 2 segments.
If TCP acknowledged every segment immediately, things would be simple, but TCP has delayed ACKs, and now LRO as well ... All of this stems from the tension between end-to-end control and network control, which is really the tension between flow control and congestion control. End-to-end matters are private: the two parties can negotiate between themselves. The network is a shared platform, a multiplayer game. For example, you always want an exclusive path between you and your peer, but another pair of endpoints will not let you have it ... This is the basic problem of networking, and it is also where programmers and network designers start from different viewpoints. Let programmers design a network protocol and the result is bound to be selfish rubbish.
Why is 4 MSS enough, and only just enough? Consider the worst case: the receiver has delayed ACK enabled, so it sends one ACK for every 2 MSS received, and each ACK confirms two or more segments (I say "or more" because ACKs can also be lost). To keep the ACK clock running smoothly, unacknowledged segments must fill the whole bandwidth-delay-product pipe. That means keeping enough segments on both the forward path and the reverse (ACK) path, which in practice is 2 MSS of data on each.
Either way, this data is inflight data. What is inflight?
It is data that has been sent but has not yet arrived, plus data that the receiver has acknowledged but whose ACK has not yet reached the sender. How do you keep the ACK clock running without queueing? Obviously any inflight of 4 MSS or more keeps the clock running, but add the requirement that you do not queue of your own accord, and you may send only 4 MSS. As for the other flows, they are not the current connection's concern.
Suppose everyone enables delayed ACK. By my explanation, 2 segments are on the way out and 2 acknowledged segments are on the way back, 4 segments in total: this is the minimum inflight. The Wenzhou boss's explanation is much the same, with a small difference: one segment is about to be sent at the sender, two segments have just been acknowledged at the receiver, and the ACK for another segment is still on the road. Wenzhou leather shoes: when it rains the water gets in, but they do not get fat.
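The counting argument can be written down as a tiny calculation. This is only a sketch of the reasoning in this section, not kernel code; the function name and the `segments_per_ack` parameter are hypothetical.

```python
def min_inflight_segments(segments_per_ack=2):
    """Segments needed in flight to keep the ACK clock running without queueing.

    One ACK's worth of segments travels towards the receiver while another
    ACK's worth has been received and is being acknowledged on the return path.
    """
    forward = segments_per_ack   # data still on its way to the receiver
    reverse = segments_per_ack   # data covered by the ACK travelling back
    return forward + reverse


print(min_inflight_segments(2))  # delayed ACK: one ACK per 2 segments
print(min_inflight_segments(1))  # immediate ACK: one ACK per segment
```

With one ACK per segment the same count gives the 2 segments derived earlier; with delayed ACKs it gives the 4 MSS that ProbeRTT holds.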
...
Pacing rate and congestion window
There is a debate over whether congestion control should be window-based or rate-based.
Window-based congestion control rests on the idea of limiting the total amount of data the network carries. That idea is somewhat flawed: the network pipe is two-dimensional, the product of bandwidth and delay, and it extends in time. The network's load should take time into account, with data spread evenly along the pipe rather than described by a single number. Hence congestion control should be based on rate, not on a window. This reasoning is sound, but window control and rate control are not an either-or choice; they are in fact tied together. Window-based control effectively hands rate control over to the network.
Even if data is sent in bursts, a rate-limiting device in the network means the data does not arrive at the receiver in bursts but smoothly, and the ACK stream then returns smoothly at a certain rate. TCP transmission is driven by the ACK clock, and evenly spaced ACK arrivals drive smooth, evenly spaced transmissions. So even when window control sends bursts, the eventual sending behaviour may be modulated by the ACK stream into a steady rate ...
However, all this presupposes an accurate window. In practice the window is almost never accurate. No TCP congestion control algorithm can obtain an accurate window value, so each has to keep probing with the AIMD model. I will not list all the problems this model brings; bufferbloat is one that AIMD itself causes. The solution is to replace AIMD probing with a measure-and-probe model, and with it comes rate control in place of window control.
The measurement model's approach to bufferbloat is simple: if I know I can only send at 10 Mb/s, I will not push up to 100 Mb/s to try to send more, because sending more is useless.
In recent years more and more congestion control algorithms have adopted rate control, smoothing bursts at the source by sending data out a little at a time, and thereby addressing bufferbloat. BBR is a relatively late arrival among algorithms that use the measurement model. In fact, as early as the Linux 3.10 era TCP already exported a pacing rate variable. Honestly, though, before BBR the pacing rate did little in most cases beyond easing bufferbloat, because it was computed simply by dividing the window by the RTT. That is almost an identity, since the network pipe is defined that way: pipe capacity equals the product of bandwidth and RTT. If the window comes from the AIMD model, the pacing rate derived from it is not correct either.
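To see why a pacing rate computed as window divided by RTT adds little, consider a small worked example. It is a sketch under assumed numbers: the 10 Mbit/s bottleneck, the 40 ms base RTT and the window of twice the BDP are hypothetical, and real stacks may apply extra scaling factors that are ignored here.

```python
def pacing_rate_from_window(cwnd_bytes, rtt_s):
    """Rate implied by a window: cwnd / RTT, in bytes per second."""
    return cwnd_bytes / rtt_s


# Hypothetical path: 10 Mbit/s bottleneck (1.25e6 bytes/s), 40 ms base RTT.
bottleneck_Bps = 1_250_000
base_rtt_s = 0.040
bdp_bytes = bottleneck_Bps * base_rtt_s          # about 50,000 bytes fill the pipe

# Suppose AIMD has grown the window to twice the BDP; the excess sits in the queue.
cwnd_bytes = 2 * bdp_bytes
queue_bytes = cwnd_bytes - bdp_bytes
measured_rtt_s = base_rtt_s + queue_bytes / bottleneck_Bps   # about 80 ms

rate_measured = pacing_rate_from_window(cwnd_bytes, measured_rtt_s)
rate_base = pacing_rate_from_window(cwnd_bytes, base_rtt_s)
print(rate_measured, rate_base)
```

Dividing by the measured RTT returns roughly the bottleneck rate, but the standing queue of one BDP is still there. Dividing by the base RTT gives twice what the path can carry. Either way the pacing rate merely restates the window, so it is only as good as the window it came from.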
BBR, the new generation, obtains its pacing rate from relatively accurate measurement. To smooth out glitches, it keeps the rate samples within a tolerance window (10 RTTs by default) and takes the maximum. At this point it seems the window could be dropped entirely, yet the window comes back into the picture.
Now that you have a precise and reasonable pacing rate, why do you still need a window?
If BBR always sent at a constant rate and the network always allowed it to, the window would indeed be unnecessary. But BBR cannot send at a constant rate. The network pipe is statistically multiplexed, so the bandwidth available to a single TCP connection is always changing. BBR therefore has to probe for more bandwidth, and when it detects congestion it actively slows down and converges. In the Probemore phase (BBR's PROBE_BW state), BBR tries raising the sending rate by 25%. If the delivery rate fed back one RTT later has indeed risen by 25%, the probe has hit: new bandwidth is free. If the measured bandwidth has not changed, no new bandwidth has been freed up. This is the principle of Probemore.
The question now is how much data in total should be in flight during the Probemore phase.
Obviously, if there is no free bandwidth, any extra data will certainly be lost. If there is free bandwidth, BBR must still wait for feedback confirming it before increasing the amount of data sent. So the inflight during Probemore must not be allowed to grow, and this requires a congestion window to enforce it. Assuming a full understanding of BBR, let the maximum bandwidth in the current window be W0 and the minimum RTT be RTT0. Then:
inflight0 = W0 * RTT0
For Probemore, BBR tries to raise the sending rate by 25%. If the probe succeeds and 25% more bandwidth is free, the inflight needed to fill the pipe is:
inflight1 = 125% * W0 * RTT0
However, until this is confirmed, the inflight must not increase, to avoid packet loss. It therefore stays at inflight0, and that value is the congestion window during Probemore. If a new rate W1 is measured one RTT later, then:
W1 = 125% * W0
At this point W1 replaces W0 as the new maximum bandwidth, while the minimum RTT is unchanged and is still RTT0, so the inflight the network can accommodate is:
W1 * RTT0 = 125% * W0 * RTT0 = inflight1
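Plugging numbers into these equations makes the bookkeeping concrete. The values of W0 and RTT0 below are hypothetical; the arithmetic just follows the equations above.

```python
W0 = 1_250_000        # bytes/s, hypothetical maximum bandwidth (10 Mbit/s)
RTT0 = 0.040          # seconds, hypothetical minimum RTT

inflight0 = W0 * RTT0             # window held during the probe
inflight1 = 1.25 * W0 * RTT0      # what the pipe holds if the probe succeeds

W1 = 1.25 * W0                    # rate measured one RTT later, if the probe hit
new_window = W1 * RTT0            # the same quantity as inflight1

print(inflight0, inflight1, new_window)
```

With these numbers the window stays at about 50,000 bytes while the probe is in progress, and grows to about 62,500 bytes only once the higher rate W1 has actually been measured.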
...

From the analysis above, BBR uses the product of the current maximum bandwidth and the minimum RTT as its congestion window (the effect of delayed ACKs is ignored here). BBR does not increase the total inflight until it has confirmed that the speed-up actually worked.

In summary, BBR's Probemore cycles continuously through two phases:

Phase 1 only raises the sending rate and holds the inflight constant.
Phase 2 confirms the speed-up of phase 1 and increases the inflight.
This principle also shows why the instantaneous rate must be measured: BBR has to compute a rate sample on every ACK so that it can confirm or refute whether the Probemore speed-up of one RTT earlier worked, and then decide whether to enlarge the congestion window.
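The two phases can be sketched as a single function. This follows the simplified model of this article rather than BBR's real state machine: the function name is hypothetical, and the assumption that the path delivers `min(pacing_rate, available_bw)` is a deliberate simplification that ignores how delivery-rate samples are actually taken.

```python
def probe_once(max_bw, min_rtt, available_bw, gain=1.25):
    """One Probemore probe; returns (cwnd during probe, new max bw, cwnd after)."""
    # Phase 1: raise the pacing rate only; inflight stays capped by the old BDP.
    pacing_rate = gain * max_bw
    cwnd_during_probe = max_bw * min_rtt

    # Assumption: the path delivers no faster than its available bandwidth.
    measured_bw = min(pacing_rate, available_bw)

    # Phase 2: one RTT later the feedback confirms or refutes the speed-up,
    # and only a confirmed speed-up enlarges the window.
    new_max_bw = max(max_bw, measured_bw)
    cwnd_after_probe = new_max_bw * min_rtt
    return cwnd_during_probe, new_max_bw, cwnd_after_probe


hit = probe_once(max_bw=100.0, min_rtt=0.05, available_bw=200.0)
miss = probe_once(max_bw=100.0, min_rtt=0.05, available_bw=100.0)
print(hit)
print(miss)
```

In the `hit` case the window grows only after the higher rate has been observed; in the `miss` case both the bandwidth estimate and the window stay where they were, so no extra data was committed to the network.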
This is the relationship between pacing rate and congestion window in BBR, and it is completely different from CUBIC.
CUBIC can also compute a pacing rate from its window value. (By the reasoning above, if the congestion window were computed the CUBIC way, Probemore would enlarge the window by 25% at the outset, and if the speed-up failed, 25% of the data would be lost.) But the pacing rate that BBR obtains by measurement means something completely different. For CUBIC, pacing rate and congestion window are related by an identity, the definition of BDP; for BBR, the relationship carries the meaning of a "two-stage probe".
...
This is also easy to understand: rate is an instantaneous quantity, and inflight is the integral of rate. As long as the pipe is not full the rate can change at any time, whereas inflight directly determines whether the pipe is full and overflowing.
To close this section, here is a passage from the BBR forum (from: https://groups.google.com/forum/#!topic/bbr-dev/KHvgqYIl1cE) that puts it better:

In a standard TCP, where cwnd is the primary control and pacing has been added on top, which would be true.
BBR is different, because it uses pacing as the primary control. A cwnd control is left in as a safety net, for cases where the pacing rate temporarily exceeds the delivery rate. This was normal in BBR during bandwidth probes, and also occurs when the delivery rate reduces.

Broadly, having a cwnd independent of rate*RTT allows BBR to tolerate much higher packet loss rates than conventional TCPs, without running out of windows in which to perform loss recovery. However, it does also result in filling the buffer when path contention increases, until BBR detects the contention and adjusts to it.

-Jonathan Morton
Back to Reno/CUBIC
Although this article, like my other articles since October 2016, praises BBR, looking back I find that Reno is the truly harmonious algorithm. Do not compare Reno and BBR on speed; compare them on adaptability and you will see.
Keep in mind that TCP is not designed for any particular link. It is a universal protocol, and its convergence model should be exactly the same at 56 Kb/s or at 1000 Mb/s, on a path with deep queues or with shallow ones. If you plot the window against time, you will find that Reno's curve looks similar in every scenario, differing only in scale. Behind it is super-simple AIMD!
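The self-similarity of the Reno sawtooth is easy to reproduce with an idealised AIMD loop. This is a toy model and not a TCP implementation: `capacity` is a hypothetical stand-in for the number of segments the path (pipe plus queue) can hold, the window grows by one segment per round, and a loss is assumed exactly when the window reaches capacity.

```python
def reno_sawtooth(capacity, rounds):
    """Window per round for idealised AIMD: +1 per round, halve at capacity."""
    cwnd = capacity // 2
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd >= capacity:
            cwnd //= 2        # multiplicative decrease after a loss
        else:
            cwnd += 1         # additive increase
    return trace


def rounds_between_losses(trace, capacity):
    """Distance in rounds between the first two peaks of the sawtooth."""
    peaks = [i for i, w in enumerate(trace) if w == capacity]
    return peaks[1] - peaks[0]


small = reno_sawtooth(capacity=10, rounds=2000)
large = reno_sawtooth(capacity=1000, rounds=2000)

# Normalised by capacity, both traces swing between 0.5 and 1.0.
print(min(small) / 10, max(small) / 10, rounds_between_losses(small, 10))
print(min(large) / 1000, max(large) / 1000, rounds_between_losses(large, 1000))
```

In this model the shape is identical at both scales, but the number of rounds between losses grows with the capacity, so on a large pipe each tooth takes far longer to climb back.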
However, a harmonious model is not necessarily a highly usable one. The sawtooth at 1000 Mb/s or even 10000 Mb/s has the same shape as the sawtooth at 56 Kb/s, but people do not need this consistency. On the contrary, they want the shape distorted, warping the window/time curve along the time axis. Hence the many variants, such as BIC, CUBIC, Scalable and HighSpeed; in every case Reno stands behind them.
Reno has not stagnated; it is still evolving ...
In September 2016 BBR burst onto the scene, seemingly with an overwhelming advantage over the whole Reno family ...
If you believe in China's rise, I think there is a parallel. From the pre-Qin era to the early 19th century, China was roughly level with the West (I try to be objective; I do not like claims that ancient China crushed the West), or at least not far behind (it lagged technologically in the Greco-Roman period, the pan-Arab period and after the 15th century). In the 19th century, however, the West prevailed with an overwhelming advantage. The view formed under the influence of progressive thought was that the Chinese model had stagnated: it was a wrong model, and China had to westernise and adopt everything from the West, culture included. Later many scholars in our country believed this. From the end of the Qing dynasty to the Cultural Revolution, almost everyone was ashamed of traditional culture, and that wind has kept blowing to this day, affecting countless people, myself included.
Yet in recent years the Western approach seems to have run into problems, ones that Chinese traditional culture happens to be able to accommodate and resolve, which seems to herald a kind of return ... I will not go into that here.
BBR crushes CUBIC, much like the Opium War, but Reno still seems to be evolving.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
