An Analysis of the Model Behind Google's BBR Congestion Control Algorithm


Before getting into the body of this article, let me give some background.
1. First, my prediction of Typhoon Haima's impact on Shenzhen turned out to be very accurate; friends who saw my WeChat Moments know this, and those who didn't don't need to. I got a free day of "working from home" out of it, but working from home is neither efficient nor effective.
2. Why could I post three blog entries on a Friday morning? Partly to make up for the time wasted by the typhoon-induced "home office," and partly because I feel that sharing the newest things as quickly as possible is in the spirit of a devout Christian's faith, rather than a mere moral obligation.
3. In the first half of the year, the Wenzhou leather shoe factory boss recommended a paper to me, which roughly argues that "the Linux process scheduler has been implemented wrongly all along...". It made me feel I could write a popular-science piece about it, but I never did; BBR has now reminded me of it.
Since this article is about BBR, for the scheduler material please read "The Linux Scheduler: a Decade of Wasted Cores." I may never get the chance to write that popular-science piece, but you can search for the paper yourself based on this recommendation. I remember recommending it to a colleague one day on the bus; I don't know whether he went on to share it through other channels.
In my previous articles on the TCP BBR congestion control algorithm, I put my energy and focus on BBR's periphery and on the implementation itself. However, neither of these is the core! Why do I say that? Typhoon Haima has just left, so let me use a typhoon as the example. A typhoon's periphery and its eye are both zones of little impact. Likewise with a technology: if you only know its periphery, such as its background and usage, or only know its implementation, say by reading and debugging its source code, or even both, you are still in the low-impact zone; the core of the technology is not there! So, having already introduced BBR's background and algorithmic details, I would now like to briefly describe the model behind the idea. That is the key.
Since this concerns the thinking behind bbr, I will write it in capitals from here on: no longer bbr, but BBR!
The content of this article comes from my personal understanding of the BBR algorithm. The figures are adapted from slides by Yuchung Cheng and Neal Cardwell, with my own understanding added.
0. Model

The model is the most fundamental!

I hate lumping everything together; I like divide and conquer, so I like orthogonal bases! I want to decompose a thing into N uncoupled dimensions and then study each property separately. I have proposed this idea countless times, but almost nobody listens, because once you decompose, you can no longer see the target or the result, and what has been taken apart cannot always be put back together... Thankfully, the idea behind TCP's BBR algorithm is the same; the difference is that the top experts in the TCP field did not need an N-dimensional decomposition, they only needed two dimensions.

Bandwidth & RTT
I was amazed at how Yuchung Cheng and Neal Cardwell found this orthogonal basis, and at how nobody had discovered it in 30 years; most amazing of all, they were right! Their model is built around the following diagram:

[Figure: delivery rate and RTT as functions of the amount of data in flight, adapted from the BBR slides]
This diagram describes the behavior of the network almost completely! It is the essential model of network transmission! The reason everything from Reno to CUBIC is wrong is that none of them is built on this model. Let me first explain the model, and then show how absurd Reno/CUBIC look once you place them on it.
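To make the geometry concrete, here is a toy sketch of the two curves in that diagram: how RTT and delivery rate respond to the amount of data in flight. This is my own illustration under assumed numbers, not anything taken from the BBR paper or the kernel:

# Toy version of the diagram: RTT and delivery rate as functions of the
# amount of data in flight. All names and numbers here are my own
# illustrative assumptions.

BTL_BW = 1.25e6        # bottleneck bandwidth: 10 Mbps in bytes/sec
RTPROP = 0.040         # round-trip propagation delay: 40 ms
BUFFER = 60_000        # bottleneck queue capacity in bytes
BDP = BTL_BW * RTPROP  # bandwidth-delay product, no queue: 50,000 bytes

def rtt(inflight):
    """Below the BDP, RTT stays at RTprop; beyond it, the excess sits
    in the bottleneck queue and inflates RTT linearly."""
    queued = min(max(inflight - BDP, 0.0), BUFFER)
    return RTPROP + queued / BTL_BW

def delivery_rate(inflight):
    """Delivery rate grows until the pipe is full, then is pinned at
    BtlBw no matter how much more is stuffed into the queue."""
    return min(inflight / RTPROP, BTL_BW)

for x in (10_000, 50_000, 80_000, 120_000):
    print(f"inflight={x:>7}  rtt={rtt(x) * 1000:6.1f} ms  "
          f"rate={delivery_rate(x) * 8 / 1e6:5.2f} Mbps")

Below the BDP, delivery rate grows while RTT stays flat; above it, only the queue, and hence the RTT, grows. That single observation is the whole model.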
1. An interlude first: we have to know where Reno/CUBIC went wrong before we can see how BBR improves on them.
Deep in their bones, people always want time on the horizontal axis; apart from life and sex, almost all human behavior is an attempt to shorten time.
What is efficiency? People only know that the denominator of efficiency is time! But once time is fixed as the horizontal axis, a lot of truth gets hidden. This is a philosophical question, and I will come back to it at the end.
When people build performance models for TCP, they always observe "sequence number vs. time" curves, "RTT vs. sequence number" curves and the like (as Wireshark and other analysis tools do). The goal is simple: to see "how fast one connection can push how much data."
But these are all wrong! Why?
Because the time axis deceives with hindsight: every time-driven efficiency-improvement scheme is a "pollute first, clean up later" scheme!
Look again at the model diagram above. For 30 years, every congestion control algorithm has aimed at converging to an erroneous convergence point (on the right of the graph): keep increasing the amount of data sent, trying to fill the whole network and every cache along it, in the belief that this achieves higher bandwidth utilization, until packet loss is detected; then sharply cut the sending rate, and then march toward the same wrong convergence point again, over and over. This is the root of the sawtooth! The phenomenon shows up plainly on the model diagram and needs no explanation at all. Had this model been used from the start, someone would surely have asked: why do we keep loitering in the alert region? Shouldn't the correct convergence point be the right edge of the unblocked region? I use the classic Van Jacobson congestion diagram to illustrate:

[Figure: the classic Van Jacobson congestion/throughput diagram]
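To see the sawtooth in its crudest form, here is a toy AIMD loop. This is my own sketch with assumed numbers, not any kernel's code:

# Toy loss-based sawtooth: grow cwnd by one segment per RTT until the
# bottleneck queue overflows, then halve. Numbers are assumptions.

MSS = 1_500
BDP = 50_000          # pipe capacity in bytes (no queue)
BUFFER = 60_000       # bottleneck queue capacity in bytes

cwnd = 10 * MSS
trace = []
for _ in range(200):
    if cwnd > BDP + BUFFER:   # queue overflow, i.e. packet loss
        cwnd //= 2            # multiplicative decrease
    else:
        cwnd += MSS           # additive increase
    trace.append(cwnd)

# The flow oscillates between (BDP + BUFFER) / 2 and BDP + BUFFER:
# it orbits the loss point on the right of the diagram instead of
# settling at the optimal point, inflight == BDP.
print(min(trace[100:]), max(trace[100:]))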
We should be bees, not mice. People in the northern provinces like to stockpile grain and meat at home, and a lot of it ends up spoiled and wasted: seemingly high throughput, actually very low efficiency. Southerners are different: they buy just enough fresh food for the day and never hoard. The root of the difference is that northern winters are bitterly cold and food is hard to find, while the south has fresh food all year round. Dig into the problem and the right approach has to be the southern one; a northerner who hoards food does not do it because he likes hoarding, he knows fresh food is better, he is forced into it by scarcity. Once conditions mature, say greenhouses appear, northerners also start buying the day's fresh food.
TCP is different: Reno/CUBIC do not stuff packets into the caches because conditions leave them no choice. For people capable of devising an algorithm as complex as CUBIC, building BBR would be a piece of cake. I admit I do not understand what lies behind CUBIC's formulas, but my understanding of BBR is perfectly clear; even at my dabbler level I came close to building something BBR-like just a few months ago, whereas I could never have conceived anything as complicated as CUBIC. Reno/CUBIC survived for more than 30 years because people always assumed they were the right approach; nobody seemed to think there was a problem here.
Once a new model based on the orthogonal basis {RTT, delivery rate} appeared, people seemed to suddenly see the truth:

[Figure: the {RTT, delivery rate} model with the operating regions marked]
The root of the problem is that BBR's congestion control model and every congestion control model before it define the BDP differently. In BBR's model the BDP does not include the network caches, whereas the earlier model does include them, which means the old congestion control is strongly coupled to the network caches: the caches are baked right into the model! To build a good congestion control algorithm on that model, you would have to thoroughly understand the behavior of the network caches; but an end-to-end protocol like TCP cannot possibly understand them. That is why no better algorithm appeared for 30 years: from Reno to CUBIC, only the algorithm itself improved; the model never changed!
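In code terms, the difference is just which RTT you multiply the bandwidth by. A minimal sketch, with assumed numbers:

# Two notions of "BDP". Numbers are illustrative assumptions.
btl_bw  = 1.25e6    # measured bottleneck bandwidth, bytes/sec
rtprop  = 0.040     # minimum (propagation-only) RTT, seconds
rtt_now = 0.088     # RTT measured while a standing queue exists, seconds

bdp_bbr = btl_bw * rtprop    # BBR's BDP: the pipe alone, 50,000 bytes
bdp_old = btl_bw * rtt_now   # what loss-based probing ends up filling:
                             # pipe plus queue occupancy, 110,000 bytes
print(bdp_bbr, bdp_old)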
Network caches are complicated: there are deep queues and there are shallow queues, and either way you run into bufferbloat, a problem TCP cannot solve, even though TCP keeps trying to fill up a "BDP" that includes these forever-elusive caches. The filling is gradual, starting from slow start, then onward. This ramping up to the BDP is really two processes: first, with the RTT still constant, TCP fills the pipe space that excludes the network caches (in my terminology, the time-extensible cache space); then it gradually fills the network caches themselves (in my terminology, the time-wall space, which has no time extensibility). The problem lies in TCP's total ignorance and unpredictability with respect to that second process!!
Now that we know the root of the Reno/CUBIC problem, how does BBR solve it? In other words, BBR wants to converge at the position of the red circle; how does it manage that?
Before enumerating BBR's practices, let me first enumerate the wonderful properties of the {RTT, delivery rate} orthogonal basis:

[Figure: properties of the {RTT, delivery rate} basis, adapted from the BBR slides]
Admittedly, what the BBR model found is that the maximum bandwidth and the minimum RTT over a sufficiently long time window form its convergence point. This gives BBR a clear goal: find the maximum bandwidth, i.e. the delivery rate, and find the minimum RTT. Set aside for a moment how it reaches that goal, and answer one question first: why must the sampling window be long enough? So that it spans enough time to filter out spurious congestion; after all, time breaks through all lies and dilutes all sorrow. And why must the window not be too long? So that the algorithm adapts promptly when the network environment changes. On this basis, BBR can withstand almost any spurious-congestion scenario.
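A minimal sketch of that windowing idea (my simplification; the actual kernel code uses a compact windowed min/max filter that keeps only a few best samples, rather than storing every sample):

from collections import deque

class WindowedMax:
    """Maximum of a sample stream over the last `horizon` seconds.
    A simplified stand-in for BBR's windowed max-bandwidth filter."""
    def __init__(self, horizon):
        self.horizon = horizon
        self.samples = deque()              # (timestamp, value) pairs

    def update(self, t, value):
        self.samples.append((t, value))
        while self.samples[0][0] < t - self.horizon:
            self.samples.popleft()          # expire stale samples
        return max(v for _, v in self.samples)

# BBR applies the trick twice: a max filter over roughly ten RTTs for
# the delivery rate, and a min filter over roughly ten seconds for RTT.
bw = WindowedMax(horizon=1.0)
for t, sample in [(0.0, 100.0), (0.3, 90.0), (1.5, 80.0)]:
    print(bw.update(t, sample))   # 100.0, 100.0, 80.0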
Finally, BBR's BDP, as shown, no longer includes the network caches of the alert region:

[Figure: BBR's BDP excludes the bottleneck queue]
I use one unified graph to represent the relationship between RTT and bandwidth:

[Figure: combined view of RTT and bandwidth versus data in flight]
Good. Now we can look at how BBR detects the maximum bandwidth under this new model.
2. BBR's detection of the maximum bandwidth and the minimum RTT
From the model diagram it is clear how the maximum bandwidth can be detected:

[Figure: probing for maximum bandwidth on the model diagram]
However, the detection of the minimum RTT is not very intuitive.
Let us first talk about the "delay-based" congestion control algorithms that predate BBR. Reno/CUBIC are loss-based congestion control algorithms, while algorithms like Vegas are delay-based. The difference is that Vegas is sensitive to changes in RTT: its congestion signal is RTT variation rather than packet loss. Either way, the algorithm needs some external event to declare that the TCP connection is congested. It turned out that algorithms like Vegas do not work well, because they cannot resist spurious congestion: an occasional RTT increase with no congestion behind it will also make TCP actively shrink its window, so such algorithms cannot compete against loss-based TCP connections sharing the same deep queue.
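For contrast, here is the textbook Vegas idea in sketch form (the standard formulation, with my own variable names and thresholds):

def vegas_adjust(cwnd, base_rtt, rtt, alpha=2, beta=4):
    """Textbook Vegas: estimate the number of packets sitting in
    queues and steer cwnd to keep that backlog between alpha and
    beta. Any RTT rise, congested or not, shrinks the window."""
    expected = cwnd / base_rtt          # rate if nothing is queued
    actual = cwnd / rtt                 # rate actually observed
    backlog = (expected - actual) * base_rtt
    if backlog < alpha:
        return cwnd + 1                 # pipe not full: grow
    if backlog > beta:
        return cwnd - 1                 # queue building: back off
    return cwnd

print(vegas_adjust(20, base_rtt=0.040, rtt=0.060))   # 19: RTT rose, back off

The backlog estimate rises whenever the RTT rises, congestion or not, which is exactly the weakness described above.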
BBR needs to measure the minimum RTT; does that make it a delay-based congestion control algorithm?
No! When I first tested BBR, it was the day after I got drunk over the National Day holiday. I used almost every device in my home (one iMac, one MacBook Pro, one ThinkPad, one rooted Android tablet, one reflashed Honor Cube, one Raspberry Pi board, one rooted Android phone, one HiWiFi router, two iPhones, two iPads...), set up a test network, ran BBR together with CUBIC and Reno, and injected RTT mutations. I found that BBR did not back off; it did not even react to the abrupt RTT changes, and reacting to abrupt RTT changes is the defining feature of delay-based congestion algorithms, yet BBR is not one! In every combination I tried, BBR blew the other algorithms away. Why is this?
BBR uses the minimum RTT over a time window of roughly 10 seconds, one that does not slide continuously with time, to compute the pacing rate and the congestion window. BBR does not react when the RTT grows. But if the path genuinely becomes congested and the RTT really does grow, how does BBR notice? The answer lies in the delayed sliding of this window: if no smaller RTT is captured within a window, the current RTT is adopted as the minimum RTT. This is how BBR resists spurious congestion: against a window of ten seconds, nothing can stay concealed. This is the principle behind the RTT measurement and its use:

[Figure: BBR's min-RTT sampling window]
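A sketch of that 10-second window (my simplification of the published idea; the real BBR also enters a PROBE_RTT phase to re-measure, which this skips):

class MinRttFilter:
    """Minimum RTT over (roughly) the last `win` seconds. If the
    window expires without a new minimum, the current sample is
    adopted: a persistent RTT rise, i.e. real congestion, is
    eventually believed, while brief spikes are ignored."""
    def __init__(self, win=10.0):
        self.win = win
        self.min_rtt = float("inf")
        self.stamp = 0.0                 # when min_rtt was last set

    def update(self, now, rtt_sample):
        if rtt_sample <= self.min_rtt or now - self.stamp > self.win:
            self.min_rtt = rtt_sample
            self.stamp = now
        return self.min_rtt

f = MinRttFilter()
print(f.update(0.0, 0.040))    # 0.040
print(f.update(5.0, 0.300))    # still 0.040: a spike is ignored
print(f.update(11.0, 0.300))   # 0.300: window expired, the rise is believed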
It should now be clear that BBR's competitiveness does not degrade through any reaction to delay, because BBR does not react to delay directly. BBR is not predatory, but neither is it weak; it does exactly what is required, and only that.
A while ago I saw an intern at a Silicon Valley company question online whether BBR's reaction to delay would harm fairness. I wanted to reply, but then decided that climbing over the wall was too much trouble.
3. A summary
What we have always needed is a "sustainable development" scheme for the efficiency problem. Unfortunately, over TCP's past 30 years, every congestion control algorithm we have seen, from Reno through CUBIC, is the same "pollute first, clean up later" scheme: first cram the caches until they burst, then actively mitigate. All the loss-based algorithms race to fill every cache, while the delay-based ones back off so gentlemanly that they cannot compete: a classic case of bad money driving out good.
I always thought TCP's additive increase, multiplicative decrease was an "all for one, one for all" strategy. Understandable, yes, but it is really a "toast you one cup, punish myself a whole pot" strategy, and the outcome is that everyone falls together...
Everyone can casually recite what ssthresh does, e.g. what happens when the window is above it and what happens below it, but does anyone know what it means? Perhaps you will note that "ss" abbreviates slow start and "thresh" abbreviates threshold, but that is not the right answer. The right answer is here:
The capacity of a path can be informally defined as the sum of the unused available bandwidth in the forward path and the size of the buffers at bottleneck routers.
Typically, this capacity estimation is given by ssthresh.

Only now is the truth revealed; no wonder people only know the usage of ssthresh without knowing its meaning! For a bit more detail, see my article "TCP core concepts: the true meaning of slow start, ssthresh, congestion avoidance, and fairness."
In fact, ssthresh defines the total cache along the path (both the time-extensible cache and the time-wall cache): when a packet loss is detected, cwnd is cut to 1/2 and that value is assigned to ssthresh, because the loss means the BDP-plus-buffer was completely full, so half of that capacity is taken to be the path's cache capacity. Perfect! Except that it is wrong. TCP mistakenly assumes all the caches are there to be filled, when in fact only the time-extensible cache, i.e. the network itself, is meant to be filled; the time-wall cache is an emergency cache. If only every TCP flow refrained from using the time-wall caches (the queue buffers in routers and switches), those caches could truly serve their emergency role, and bandwidth utilization would be maximized!
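A worked example of that halving logic, under the assumption that the bottleneck buffer happens to be exactly one BDP deep:

# Why halving cwnd "finds" the path capacity in the old model.
# Assumed: the bottleneck buffer is exactly one BDP deep.
pipe  = 50_000   # time-extensible cache: the wire itself (the BDP)
queue = 50_000   # time-wall cache: the bottleneck buffer

cwnd_at_loss = pipe + queue     # loss fires once both are full
ssthresh = cwnd_at_loss // 2    # 50,000, exactly the pipe: "perfect"
# With any other buffer depth, the halving lands in the wrong place,
# and the buffer depth is precisely what an end host cannot know.
print(cwnd_at_loss, ssthresh)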
Emergency lanes are for emergencies, not for everyday driving!
BBR's new model lays all of this error bare, so BBR's marching orders are: keep the bandwidth at its maximum while minimizing use of the network caches. In fact, built on the new model, BBR is far simpler than the previous algorithms! Do not assume a new algorithm must be harder; on the contrary, BBR is super simple.
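To show just how simple: the core control law reduces to two multiplications. This is a sketch of the published design (pacing rate from the bandwidth estimate, inflight cap from the BDP), not the kernel source:

def bbr_update(btl_bw, rtt_min, pacing_gain=1.0, cwnd_gain=2.0):
    """BBR's two outputs, computed from its two filtered inputs.
    btl_bw: windowed-max delivery rate (bytes/sec)
    rtt_min: windowed-min RTT (seconds)"""
    bdp = btl_bw * rtt_min
    pacing_rate = pacing_gain * btl_bw   # how fast to put data on the wire
    cwnd = cwnd_gain * bdp               # cap on the data in flight
    return pacing_rate, cwnd

print(bbr_update(1.25e6, 0.040))   # (1250000.0, 100000.0)

Everything else in BBR (the gain cycling, STARTUP, DRAIN, PROBE_RTT) exists only to feed honest btl_bw and rtt_min samples into those two lines.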

Perhaps you have had BBR-like ideas yourself, but landing them in Linux, i.e. operating on Linux's TCP implementation, takes more than just writing a congestion module. As my articles of the past few days said N times: before BBR, the congestion control algorithm would be taken over by the state machine whenever the connection left the Open state, so even the best algorithm was useless. Because CUBIC tries to fill all the cache space, queues included, in today's networks, with deep queues in the core and shallow queues at the edge, TCP spends less than 40% of its time in the Open congestion state; most of the time, the traditional algorithm simply is not running! Fortunately, the BBR authors noticed this and performed the necessary surgery on TCP's congestion control. Well done!

Will BBR become the default TCP congestion control algorithm in the 4.9 or 5.0 kernels? I think more testing is needed: CUBIC's performance may not be great, but at least it has not caused more serious problems, and it runs very stably. Personally, though, I hope BBR quickly becomes the standard in every Linux version, putting an end to the scandal of so-called one-sided TCP acceleration!
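Incidentally, you do not have to flip the system-wide default to experiment: on Linux, a single socket can opt in through the TCP_CONGESTION socket option. A minimal sketch, assuming a kernel with the tcp_bbr module available (4.9 or later; Python 3.6+ exposes the constant):

import socket

# Ask the kernel to run BBR on just this connection; raises an error
# if the bbr module is not available on this kernel.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"bbr")
print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16))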

Appendix: time-extensible caches and time-wall caches
I have always stressed that BBR is simple because it really is simple. In the first half of the year I almost arrived at a similar algorithm myself, back when I was preparing to operate on PRR.
I felt that the congestion control algorithm was not given enough control over the congestion window; I wanted the congestion algorithm's own logic to decide exactly how the window is adjusted, rather than blindly deferring to PRR! I did not believe that three duplicate ACKs necessarily mean a packet was dropped, and even outside the Open state, say in the Recovery state, I wanted my algorithm to be allowed to consider growing the window... I analyzed the characteristics of the network carefully and decomposed them into two types of cache:
1. The time-extensible cache, which is the network itself (strictly speaking, the data running on the wire: going from A to B takes time, so the network itself has a storage function);
2. The time-wall cache, which is the queue buffers of routers and switches. The essence of this kind of cache is memory (see the sketch below).
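A sketch of how the data in flight splits across these two caches, with assumed numbers:

# Split the bytes in flight between the two caches. Assumed numbers.
btl_bw = 1.25e6   # bottleneck bandwidth, bytes/sec
rtprop = 0.040    # propagation-only RTT, seconds

inflight = 90_000                            # total unacked bytes
in_pipe = min(inflight, btl_bw * rtprop)     # time-extensible cache (wire)
in_queue = inflight - in_pipe                # time-wall cache (memory)
# Only in_pipe contributes bandwidth; in_queue contributes only delay.
print(in_pipe, in_queue)                     # 50000.0 40000.0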

For more on this, see my June article "A semi-scenario analysis of the context of TCP self-clocking, congestion control, and bandwidth utilization." I believed that the CUBIC algorithm ultimately induces TCP to fill the time-wall cache, yet from inside CUBIC itself, telling these two kinds of cache apart is very hard... I did not have a good model at the time, but I did distinguish the two types of cache...
Unfortunately, I did not push on; many things are beyond my personal ability to work around, and beyond my power to arrange. But the moment I saw BBR, I had the feeling of having found the answer. Many of those things beyond my ability and my arranging begin in fear and, most of the time, end silently without my noticing. Do you know that feeling too?
Finally, that philosophical question. Listening to Dou Wei's old songs, I think I can talk about it now. The question is that efficiency has time in its denominator.
No matter how hard people try, the denominator cannot reach 0! But there is one exception: a snake that starts eating itself from the tail! Can you picture that scene? It is impossible, unless you can traverse the timeline at will. Or is it? I do not think it is that complicated; just think one dimension up. Anything in a lower dimension is a single point to a higher dimension, a small universe of its own. A snake is a three-dimensional animal; it can never perceive that inside a four-dimensional point it has already eaten itself. Boom!
