How Google's BBR congestion control algorithm fights packet loss

Source: Internet
Author: User
Tags: ack, switches

I hardly know how to put this. In short, I have nothing to offer here except cursing!
Before BBR, congestion control algorithms came in two kinds: loss-based and delay-based. Both work by detection. A loss-based algorithm uses packet loss as its means of finding congestion, and a delay-based algorithm uses increased delay as its means of finding congestion. Both are wrong, because their premise is wrong:
Loss-based algorithm: to find congestion, it first has to create congestion. The irony is damn thick: to go through rehab, you first have to get hooked on drugs! No addiction, no rehab. I blame this on TCP staying so faithful to the original congestion control paper from more than 30 years ago. The networks of that era were unlike ours. Memory was expensive, so routers and switches had very shallow queues, if they had queues at all. In that setting a dropped packet really did signal congestion. But as device queues grew deeper and deeper (as Moore's Law dictates), a TCP connection now has to keep filling a very deep queue before any packet is dropped. And what do you have once a deep queue has been filled? Congestion!
I have to repeat the difference between the time-extensible cache and the time-wall cache. With the former, time can absorb any packet; with the latter, the existence of the time wall makes congestion inevitable. If you do not understand this basic difference, you will design the wrong congestion control scheme. Unfortunately, TCP has not distinguished between these two types of cache for 30 years! At the same size, the two behave completely differently!
Back to the point. Call the time-extensible cache the first type of cache, and the time-wall cache the second type of cache. A loss-based algorithm finds congestion only when the second type of cache has been filled, and by then the congestion is already beginning to ease! It lags!
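The lag is easy to see in a toy model. Below is a minimal sketch, not a real TCP: a sender that adds one packet per tick to its rate, a link that drains a fixed number of packets per tick, and a FIFO in front of the link. All names and numbers are made up for illustration. It reports the tick at which the queue first starts to build and the tick at which the first packet is dropped.

```python
def simulate(link_rate=10, queue_limit=50, start_rate=5, steps=200):
    """Toy model: additive-increase sender feeding a fixed-rate link.

    Returns (tick when the queue first becomes non-empty,
             tick when the first packet is dropped).
    Either value is None if the event never happens within `steps`.
    """
    queue = 0
    send_rate = start_rate
    first_queue_tick = None
    first_drop_tick = None
    for tick in range(steps):
        # The link drains up to link_rate packets per tick.
        queue = max(0, queue + send_rate - link_rate)
        if queue > 0 and first_queue_tick is None:
            first_queue_tick = tick      # congestion starts to occur here
        if queue > queue_limit:
            first_drop_tick = tick       # a loss-based sender notices only here
            break
        send_rate += 1                   # additive increase, one packet per tick
    return first_queue_tick, first_drop_tick


if __name__ == "__main__":
    for limit in (50, 200):
        print(limit, simulate(queue_limit=limit))
```

In this model the queue starts building at the same tick no matter how deep the queue is, but the first drop arrives later as the queue gets deeper. That gap is the lag the loss-based approach lives with.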
Delay-based algorithm:

OK! I admit that the delay-based algorithm proactively avoids piling data into the second type of cache (if you don't see that, stop reading). However, because loss-based algorithms exist alongside it, the delay-based algorithm is always the one being suppressed. If you want to know the problems of delay-based algorithms, search Baidu yourself; you won't need Google.

Whichever of the two methods you pick, both are stupid, chicken-droppings practice! Congestion detection should focus on "congestion (queue buildup) starts to occur", not on "buffer full"!


So, these algorithms are all wrong! So BBR must be right?
In recent days, starting from September 16, BBR has been hyped as if it were a god, but the facts will show that this is all nonsense. TCP congestion control is by nature a field with no solution. Right now, using BBR to beat CUBIC looks fairly easy, but in the end some "RBB" will turn up that can go toe to toe with BBR. The bandwidth is still only so much and the hardware resources are what they are, so sharing them fairly is the right way. Letting one big player seize other people's bandwidth without telling them is idiotic. That is why I say TCP acceleration is a scandal.

BBR is not God, not even man, but it ...
BBR distinguishes between the two types of cache and insists that the second type of cache no longer counts toward the BDP. BBR's principle is to hold on to max-BW and min-RTT and let other algorithms grab the second type of cache. Damn! Under TCP's gentlemen's agreement, those other algorithms see the second type of cache fill up and slow down, to the point where they no longer fill even the first type of cache, and BBR says, "You gave this to me."
To summarize, BBR does not compete for the second type of cache. That cache is worse than meaningless: filling it does not increase the rate, and once it is full you pay the price and slow down! BBR takes only what belongs to it. BBR makes full use of the first type of cache!
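The arithmetic behind this is just the bandwidth-delay product. Here is a minimal sketch, with hypothetical helper names and integer units chosen by me (bytes per second and milliseconds) to keep the numbers exact. Data in flight up to the BDP keeps the pipe full; anything beyond it only sits in the second type of cache.

```python
def bdp_bytes(max_bw_bytes_per_s, min_rtt_ms):
    """Bandwidth-delay product: bytes needed in flight to fill the pipe."""
    return max_bw_bytes_per_s * min_rtt_ms // 1000


def standing_queue_bytes(inflight_bytes, max_bw_bytes_per_s, min_rtt_ms):
    """Bytes in flight beyond the BDP. These add delay, not throughput."""
    return max(0, inflight_bytes - bdp_bytes(max_bw_bytes_per_s, min_rtt_ms))


if __name__ == "__main__":
    bw = 12_500_000      # 100 Mbit/s expressed in bytes per second
    rtt = 40             # milliseconds
    print(bdp_bytes(bw, rtt))                       # 500000
    print(standing_queue_bytes(650_000, bw, rtt))   # 150000
    print(standing_queue_bytes(400_000, bw, rtt))   # 0
```

A sender that keeps its in-flight data near the BDP computed from max-BW and min-RTT gets the full rate without building the standing queue.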

So I think BBR is neither a loss-based algorithm nor a delay-based algorithm, but a feedback-based algorithm! There is no guessing and no speculation: BBR is based only on the present and disregards history (win_minmax shows that it cares only about the history inside the time window)!
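To make "history inside the time window" concrete, here is a minimal sketch of a windowed maximum filter. It is a simple deque-based version written for illustration, not the kernel's win_minmax code (which, as far as I know, tracks only a few best samples); the class and method names are my own.

```python
from collections import deque


class WindowedMax:
    """Maximum of the samples seen during the last `window` time units."""

    def __init__(self, window):
        self.window = window
        self.samples = deque()   # (time, value) pairs, values strictly decreasing

    def update(self, now, value):
        # Older samples that are not larger than the new one can never be
        # the maximum again, so forget them.
        while self.samples and self.samples[-1][1] <= value:
            self.samples.pop()
        self.samples.append((now, value))
        # Forget everything that has fallen out of the time window.
        while self.samples[0][0] <= now - self.window:
            self.samples.popleft()
        return self.samples[0][1]


if __name__ == "__main__":
    f = WindowedMax(window=10)
    for t, bw in [(0, 100), (1, 80), (5, 90), (10, 70), (15, 60)]:
        print(t, bw, f.update(t, bw))
```

Once the sample of 100 leaves the window, the estimate falls to the best value still inside it. Nothing older than the window has any say.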

The BBR algorithm avoids filling the second type of cache, so it is designed to avoid congestion: the real cong_avoid!

Avoiding congestion has, by tradition in Linux, meant avoiding packet loss! BBR's convergence point lies to the left of the second type of cache, so it does not drop packets because of congestion. But packet losses fall into three categories:
1. Noise packet loss
2. Passive packet loss caused by congestion
3. Active packet loss caused by traffic policing

BBR can already deal with 1 and 2 (1 is filtered out by the time window; 2 is handled by the algorithm itself, through its feedback-convergence property). So what is BBR's weapon against 3? It is BBR's long-term rate feature.
Let me show you the comment about BBR long-term:
Token-bucket traffic policers are common (see "An Internet-Wide Analysis of Traffic Policing", SIGCOMM 2016). BBR detects token-bucket policers and explicitly
models their policed rate, to reduce unnecessary losses. We estimate that we're policed if we see 2 consecutive sampling intervals with consistent throughput
and high packet loss. If we think we're being policed, set lt_bw to the "long-term" average delivery rate from those 2 intervals.


As I said before, BBR calculates the pacing rate and cwnd for the current send from the most recent instantaneous bandwidth measurement, as if BBR had never experienced a packet loss! But that is not the whole truth! No algorithm can ignore the third type of packet loss. There is no congestion and no noise, yet the packet is lost. Why? Because a router has the power to drop any packet it likes. Almost every router and switch has a token bucket! A TCP flow is dirt in front of a router! Dirt or not, BBR can still detect this packet-dropping behavior of the router.
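What a token bucket does to a flow can be shown with a minimal sketch. This is a toy policer with parameters I made up (tokens and packets counted per tick), not any vendor's implementation. A sender that offers more than the policed rate ends up with constant delivered throughput and a steady loss rate, with no queue involved at all.

```python
def police(offered_per_tick, ticks, rate, burst):
    """Toy token-bucket policer.

    Each tick the bucket gains `rate` tokens, capped at `burst`.
    A packet passes only if a token is available; otherwise it is dropped.
    Returns (delivered per tick, dropped per tick).
    """
    tokens = burst
    delivered, dropped = [], []
    for _ in range(ticks):
        tokens = min(burst, tokens + rate)
        passed = min(offered_per_tick, tokens)
        tokens -= passed
        delivered.append(passed)
        dropped.append(offered_per_tick - passed)
    return delivered, dropped


if __name__ == "__main__":
    delivered, dropped = police(offered_per_tick=15, ticks=10, rate=10, burst=20)
    print(delivered)
    print(dropped)
```

After the initial burst allowance is used up, the flow gets exactly the policed rate and loses the rest. Sending faster changes nothing except the number of dropped packets.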

While collecting real-time bandwidth samples, BBR also quietly watches the packet loss rate.
Suppose BBR finds that, over two consecutive delivery intervals (similar to an RTT, but measured under congestion or policing), a TCP connection satisfies two conditions: its throughput stays constant, and its packet loss rate stays high. What does that mean? It means the connection is being rate-limited or traffic-shaped by an intermediate device ... You may be able to think of other signals as well, and if you can, you can build an algorithm yourself.

I have given the hint. Wenzhou leather shoe bosses, get to it! I am capable of doing it, but I am not going to, because I despise the blind philosophy!

Where is the code for this part of TCP? Apply the latest BBR patch and you will be able to see what is going on.
It is 12 noon. My wife has gone out to work, my mother is cooking in the kitchen, and here I am writing this mess! Damn! I'm playing with fire! I'd like to say a few more words, but I have to get to the code:
static void bbr_lt_bw_sampling(struct sock *sk, const struct rate_sample *rs)
{
...
}

...
I still do not want to analyze the source code. I'm just talking about logic here.

How does a TCP sender detect that its connection's traffic is being policed by a rate-limiting device? I'll again take the classic VJ congestion model diagram as an example:

[Figure: the classic VJ congestion model diagram. Image not available.]

As shown, however much the sender increases the amount of data it sends, or other connections increase theirs, the rate at location A is fixed. Location B is either empty or full, and TCP cannot detect which! The only thing TCP can do is choose a sending rate according to the ACK feedback!

So BBR can avoid type 2 losses and can identify type 1 losses. For now, handling type 3 losses is the last thing left for BBR to do.
BBR's handling of the third type of packet loss is very simple: it just keeps a record of the losses themselves. When both of the following hold, BBR concludes that there is traffic policing or rate limiting on the network path:
1. The packet loss rate stays high over a period of time;
2. The bandwidth measured over that period stays almost the same.

When BBR detects this situation, it no longer uses the currently measured bandwidth as the basis for calculating the pacing rate and cwnd, but uses the average bandwidth measured over that period instead!
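The logic above can be sketched in a few lines. This is a minimal sketch of the idea only, not the kernel's bbr_lt_bw_sampling. The thresholds (losses at least 20% of delivered packets counts as "high", bandwidths within 1/8 of each other counts as "almost identical") and all names are assumptions of mine for illustration.

```python
from dataclasses import dataclass

LOSS_THRESH = 0.2    # assumed: lost/delivered at or above this is "high" loss
BW_RATIO = 1 / 8     # assumed: bandwidths within 12.5% are "almost identical"


@dataclass
class Interval:
    delivered: int    # packets delivered during the sampling interval
    lost: int         # packets lost during the sampling interval
    duration: float   # interval length in seconds

    @property
    def bw(self):
        return self.delivered / self.duration

    @property
    def loss_ratio(self):
        return self.lost / self.delivered if self.delivered else 0.0


def detect_policer(prev, cur):
    """Return the long-term bandwidth if both intervals look policed, else None."""
    for interval in (prev, cur):
        if interval.loss_ratio < LOSS_THRESH:
            return None                      # loss is not consistently high
    if abs(cur.bw - prev.bw) > BW_RATIO * prev.bw:
        return None                          # throughput is not consistent
    return (prev.bw + cur.bw) / 2            # average of the two intervals


def bandwidth_for_pacing(measured_max_bw, lt_bw):
    """Use the long-term rate while a policer is believed to be present."""
    return lt_bw if lt_bw is not None else measured_max_bw


if __name__ == "__main__":
    a = Interval(delivered=1000, lost=300, duration=1.0)
    b = Interval(delivered=1040, lost=320, duration=1.0)
    lt_bw = detect_policer(a, b)
    print(lt_bw, bandwidth_for_pacing(1500.0, lt_bw))
```

Two intervals with steady throughput and heavy loss yield an averaged rate, and that rate replaces the measured bandwidth. One clean interval, or a big jump in throughput, yields nothing and BBR carries on as usual.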
This check is done at the very beginning!
Spend a few minutes reading the code of the bbr_lt_bw_sampling function in detail and you will know everything. What BBR does here amounts to simply detecting the presence of a traffic policing device and adapting to its rate! Note that this is a simple algorithm, not something worth a crowd of people arguing themselves hoarse over!
With this long-term algorithm, BBR can cope with packet loss caused by the policy of a policing device. In this sense, BBR can detect almost every kind of packet loss:
1). On duplicate ACKs (dupack count, SACKed count, highest SACKed value ...), the TCP core concludes that a packet has been lost, but PRR is not entered. BBR shrugs it off and carries on with its own strategy; see the BBR engine description;
2). If there is real congestion, BBR will find out after the min-RTT period. That lags, but BBR is still far better than CUBIC at noticing that the second type of cache is being filled. BBR does not slow down blindly; it still works from the detected max-BW, unless max-BW is already very small!
3). On policing loss, BBR detects it over a longer period: if it finds that, during this period, the packet loss rate is relatively high (the lost counter is large) and the rate is consistent, BBR limits the amount it sends to the average of the actual bandwidth.
...

I have always turned my nose up at TCP, just as I turn my nose up at the SSL/TLS protocol. At work I have dealt with SSL and TCP for seven years, and sometimes I have done a good job of it. In private I moan and curse, hoping this scandal ends as soon as possible. Every time I think about it I even feel the urge to smash the computer. Many people will wonder: if I find these things so disgusting, why analyze them? Why do I do this?
My help has begun!

Is there anything more interesting? Last night I watched a talk on offloading TLS to the network card and was instantly interested, but I cannot go and do hardware. In the late afternoon I was lucky enough to find the kTLS project. I especially wanted to get through it today, but forget it, because I have no time! (Do I rant? Yes. Others rant too, but they changed the world and got kTLS implemented, while I am a fool! Jerk or not, I did port OpenVPN to the kernel. What about you?!)

TCP itself is garbage! This garbage has defiled the great IP network and destroyed the simplicity of the IP network, like chicken droppings in a bowl of ramen! I hope, I hope, moaning, moaning, and wanting to beat up every boss of every company associated with TCP. Or, well, just cursing!
