Window reduction algorithms when TCP enters fast recovery

Source: Internet
Author: User
Tags ack


When TCP detects packet loss, it takes certain measures. How the loss is detected is outside the scope of this article; the focus here is on what TCP does once the loss has been detected.

Take Linux as an example. The window reduction happens on entering fast recovery (leaving RTO and local congestion aside for now). Before that, the connection is in the Disorder state: the system has noticed an anomaly, such as a duplicate ACK or an ACK carrying SACK information, but has not yet reached the point of retransmitting because the reordering threshold has not been reached. Below that threshold the system assumes this is mere reordering that will sort itself out shortly. In the Disorder state the TCP congestion window is frozen; it neither grows nor shrinks. There are two possible outcomes. If things return to normal before the reordering threshold is exceeded, TCP goes back to the congestion avoidance state and the window follows AI (additive increase). If the threshold is exceeded, TCP enters the fast recovery state and the window follows MD (multiplicative decrease). In that state the congestion window keeps shrinking, and the rest of this article describes how.
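
The state logic described above can be sketched as follows. This is only an illustration, not kernel code: the names CaState and next_state are hypothetical, and the default threshold of 3 is an assumption chosen to match the usual Linux reordering default.

```python
from enum import Enum


class CaState(Enum):
    OPEN = "congestion avoidance: window grows (AI)"
    DISORDER = "disorder: window frozen"
    RECOVERY = "fast recovery: window shrinks (MD)"


def next_state(dup_acks: int, reordering: int = 3) -> CaState:
    """Map the number of duplicate or SACK-bearing ACKs seen so far to a state."""
    if dup_acks == 0:
        return CaState.OPEN
    if dup_acks < reordering:
        return CaState.DISORDER   # probably just reordering, wait and see
    return CaState.RECOVERY       # threshold reached, start retransmitting


for n in range(5):
    print(n, next_state(n).value)
```

With the assumed threshold of 3, one or two duplicate ACKs only freeze the window; the third one triggers fast recovery.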

TCP enters a state called "fast recovery". Recovery to what? To the normal state! In fast recovery the main task is to retransmit packets that are "considered" lost. "Considered" is in quotation marks because it is all speculation. In this process TCP assumes that the network is congested, and since congestion has occurred, fewer packets should be sent. TCP is designed as a gentleman's agreement: whoever detects packet loss voluntarily shrinks its window, and everyone does the same, in the hope that the network quickly returns to normal. That is also why algorithms in the style of "Chinese innovation" will never enter the standard and will never be written up as papers.


The question now is how to reduce the window.


At present there are four main ways to reduce the window:

1. The original BSD way: this is the way described in "TCP/IP Illustrated" (not the second edition). Plotted, it looks like an electrocardiogram, with a spike before the window drops. It has long been deprecated; if you want the details, see Stevens's book.
2. The RFC 3517 way: this is the typical NewReno way. RFC 3517 describes it in detail; it is fairly short and not complicated, and can be read in 10 minutes. Treat it like a history book: grasp the basic idea and the background, and it will help you follow what came later.
3. The Linux rate halving way: nonstandard but innovative. It "promises" not to drop the window steeply any more; instead it slowly lowers the window to half of its original size, then returns to the normal state and slowly climbs again.
4. The Linux PRR way: builds on rate halving, but offers not just a "promise" but a guarantee. It is also more flexible: the window reduction is driven by ACKs, and where the window ends up depends on ssthresh, which in turn is determined by the congestion control algorithm in use.

These are the four typical ways. The textbook school basically knows 1 and 2, which is why any student you ask will answer 1 or 2, and the debate over 1 versus 2 never ends. Ways 3 and 4 belong to the kernel community; even people who know the Linux kernel often do not know the details, and when asked, most of them answer 1 or 2... This does not prove anything; it only shows that some nonstandard implementations are eclectic. Inspired by this, I have a naive idea of my own: let the window follow an arc during fast recovery, starting at the original window and ending at the window that the current bandwidth allows. Call it the legendary way 5.

This article does not elaborate on 1 and 2. After all, I am a programmer, not a teacher, and I do not have the time to explain what everybody already knows; what I am good at is explaining what most people do not understand. Before leaving 1 and 2 behind, here are their drawbacks:
1). Silence for half a window
2). Bursts

Do not argue over whether bursts are an advantage or a disadvantage. If you treat them as an advantage, then sending a large burst of packets will cause or aggravate network congestion; without such bursts, transmission becomes overly conservative. This is not something you can tune away; it is an inherent flaw of the algorithm itself!
With 1 and 2 out of the way, let's move on to the next one.
Linux rate halving algorithm

The Linux TCP implementation uses a nonstandard window reduction algorithm, rate halving. As the name implies, it aims to reduce the window to half of its original size. The goal is exactly the same as in RFC 3517; the difference lies in how it gets there.
RFC 3517 drops the window to half right away and then spends the rest of fast recovery slowly and linearly increasing it. Linux rate halving works the other way round: it does not drop the window steeply at the start, but lowers it gradually, aiming to reach half of the original window by the end of fast recovery. After returning to the normal state, congestion avoidance starts from that half window and increases it slowly and linearly.
Reality, however, often falls short of the vision; we will untangle why step by step. First, the Linux rate halving algorithm is as follows:

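Initial state:
cwnd = congestion window size before entering fast recovery
dec = 0
Before returning to normal, for each ACK, execute cwnd_down:
cwnd_down()
{
    decr = dec + 1;                     /* count this ACK */
    dec = decr & 1;                     /* carry an unpaired ACK over to the next call */
    decr = decr >> 1;                   /* 1 on every second ACK, otherwise 0 */
    cwnd = cwnd - decr;                 /* the window shrinks by one per two ACKs */
    cwnd = min(cwnd, in_flight + 1);    /* allow at most one new segment */
}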

To help understand it, an idealized example first demonstrates the execution of the algorithm, and a slightly distorted example then exposes its problem:
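
The idealized run can be approximated with a small simulation. This is a sketch under stated assumptions, not the kernel's behavior: every ACK confirms exactly one segment, nothing is marked lost, and the sender immediately refills the window. The helper names cwnd_down and simulate_ideal are hypothetical.

```python
def cwnd_down(cwnd, dec, in_flight):
    """One rate halving step; returns the new (cwnd, dec)."""
    decr = dec + 1
    dec = decr & 1            # remember an unpaired ACK
    cwnd -= decr >> 1         # drop one segment on every second ACK
    return min(cwnd, in_flight + 1), dec


def simulate_ideal(start_cwnd=20):
    """Feed one window worth of ACKs, each confirming exactly one segment."""
    cwnd, in_flight, dec = start_cwnd, start_cwnd, 0
    trace = []
    for _ in range(start_cwnd):
        in_flight -= 1                          # this ACK confirms one segment
        cwnd, dec = cwnd_down(cwnd, dec, in_flight)
        in_flight += max(0, cwnd - in_flight)   # send what the window allows
        trace.append(cwnd)
    return trace


trace = simulate_ideal()
print(trace)
```

Under these assumptions the window steps down by one on every second ACK and reaches 10 on the 20th ACK.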



It's perfect! 20 ACKs in total, and the window shrinks by half (from 20 to 10). 20 ACKs means that the window of data outstanding on entering fast recovery is acknowledged within one RTT. But note that this example shows the window reduction in the ideal case! Look closely and you will find that it depends heavily on the in_flight variable, the number of packets that are "sent, not yet acknowledged, and not marked as lost". Ideally the value of in_flight is one window of data, that is, the window size. But is that really the case? Sending packets, acknowledging them, and marking them as lost are three decoupled activities:
1. There is no guarantee that the send path or the retransmit path will actually send a full window of packets.
2. Even if a full window has been sent, there is no guarantee that updating the scoreboard (see the update routine in RFC 3517) will not mark a large number of packets as lost.

Therefore, under rate halving the value of in_flight may drop steeply, which makes cwnd drop steeply too, with nothing to compensate for it, as in the following slightly pathological example:
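
A rough way to see the steep drop is to inject a scoreboard update that marks a burst of segments as lost. The numbers here (12 segments marked lost on the 3rd ACK, 6 ACKs in total) are made up for illustration, and the helper names are hypothetical; the step function follows the simplified pseudocode above, not the full kernel logic.

```python
def cwnd_down(cwnd, dec, in_flight):
    """One rate halving step; returns the new (cwnd, dec)."""
    decr = dec + 1
    dec = decr & 1
    cwnd -= decr >> 1
    return min(cwnd, in_flight + 1), dec


def simulate_with_loss_marking(acks=6, mark_at=3, marked_lost=12):
    """Like the ideal run, but one ACK also marks a burst of segments as lost."""
    cwnd, in_flight, dec = 20, 20, 0
    trace = []
    for n in range(1, acks + 1):
        in_flight -= 1                          # segment confirmed by this ACK
        if n == mark_at:
            in_flight -= marked_lost            # scoreboard marks a burst as lost
        cwnd, dec = cwnd_down(cwnd, dec, in_flight)
        in_flight += max(0, cwnd - in_flight)   # at most one segment fits
        trace.append(cwnd)
    return trace


trace = simulate_with_loss_marking()
print(trace)  # [20, 19, 7, 6, 6, 5]
```

The window falls straight through the halving target of 10, and because only in_flight + 1 is ever allowed, nothing pulls it back up.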




A large number of packets can easily be marked as lost; in FACK mode, for instance, a single ACK carrying SACK blocks may mark many packets as lost at once! In addition, rate halving uses the number of ACKs rather than the number of bytes acked as its yardstick. That looks like a good thing, but since the number of acked bytes is reflected directly in in_flight, an ACK that acknowledges a large amount of data makes in_flight smaller, and the final min() still causes a steep window drop! That said, this does match the advice in RFC 2582:

Deflate the congestion window by the amount of new data acknowledged, then add back one MSS and send a new
segment if permitted by the new value of cwnd. This "partial window deflation" attempts to ensure that, when
Fast Recovery eventually ends, approximately ssthresh amount of data will be outstanding in the network.

But what's the difference? In NewReno the window is inflated step by step during fast recovery, starting from ssthresh, whereas under Linux rate halving the window only ever goes down during fast recovery. This is the fundamental difference. NewReno's reason for deflating the window on receiving a partial ACK therefore does not carry over to Linux rate halving:
Note that in Step 5, the congestion window is deflated when a partial acknowledgement is received.
The congestion window was likely to have been inflated considerably when the partial acknowledgement
was received. In addition, depending on the original pattern of packet losses, the partial acknowledgement
might acknowledge nearly a window of data. In this case, if the congestion window was not deflated, the
data sender might be able to send nearly a window of data back-to-back.

Now that we know how Linux rate halving behaves in practice, let's look at the impact of the steep window drop and at its root cause. The impact is that the window falls below ssthresh, so the connection enters slow start when fast recovery ends. In fact, if you read the algorithm carefully, you will find that it never checks ssthresh at all; it is hard-wired to the act of "halving". So even when the window does not drop steeply, the algorithm has no awareness of how the window currently relates to ssthresh.
The root problems of Linux rate halving are:
1. As the "half" in its name says, its goal is to halve the window. But not every congestion control algorithm wants to halve the window; halving is only what early Reno/NewReno specified.
2. Even where halving is the goal, the algorithm does not keep its promise during execution; it depends too heavily on the in_flight variable, which it cannot control and which ends up controlling it.
3. Taking a step back, even once the window has dropped below half (where it is no longer bounded by ssthresh), Linux rate halving has no compensating mechanism to pull the window back up.
4. Where does the window drop to? During fast recovery, Linux rate halving allows at most 1 data segment to be sent at a time, that is, the window is in_flight + 1.


Given the shortcomings of Linux rate halving, Google introduced the PRR algorithm into Linux, which addresses the issues listed above.

Google's Linux PRR window algorithm

Let's first see what it solves. Against the drawbacks of Linux rate halving, the PRR algorithm:
1. No longer just halves; it relies entirely on the ssthresh computed by the congestion control algorithm and steers the window toward it.
2. Is no longer governed by the current in_flight during execution; it is based on the total number of segments sent and ACKed since fast retransmit, and moves the window toward ssthresh proportionally.
3. If the window drops below ssthresh (for example because no packets could be sent, or a large number of packets were marked lost), performs a slow start to pull it back up to ssthresh.
4. During fast recovery, the amount of data that can "still" be sent (or retransmitted) is no longer limited to 1; it depends on the total amount ACKed/SACKed so far, the amount of data sent, and the relationship between the window and ssthresh.

The most important of these is point 2, which shows that PRR is driven by ACKs and thus forms a feedback system. Unfortunately PRR uses only part of that feedback system; the other part is discussed later, in the section describing an optimization of mine. Following from the 4 points above, the net effect of PRR is:
1). During fast recovery, the congestion window converges very smoothly to ssthresh;
2). At the end of fast recovery, the congestion window is close to ssthresh.

To achieve these goals, the PRR window algorithm must track the following variables in real time:
in_flight: the measure of the window; at no time may in_flight exceed the congestion window.
(s)acked: the number of data segments ACKed or SACKed by the ACK being processed when the window function is entered. It measures which segments have left the network and therefore affects in_flight.
out: how many packets have been sent since entering fast recovery. Incremented in both the transmit routine and the retransmit routine.
to_be_out: how many more packets may be sent at this moment.

According to the principle of packet conservation, the number of packets that may be sent equals the number of packets confirmed by the ACKs received. In a congested state, however, the goal is not conservation but window reduction, so a discount has to be applied between the number of packets ACKed and the number of packets that may be sent. The goal PRR wants to achieve is:
ssthresh / old_cwnd == (rate at which data is sent) / (rate at which data is ACKed)
Further:
ssthresh / old_cwnd == (send rate * t) / (ACK rate * t)
That is:
ssthresh / old_cwnd == pkts_out / acks_rcv
This is what makes the window converge to ssthresh. On entering fast recovery the window has not yet fallen, and until convergence is complete the following inequality holds:
ssthresh / old_cwnd >= pkts_out / acks_rcv
So:
acks_rcv * (ssthresh / old_cwnd) >= pkts_out
Taking packet conservation into account, set
extra = acks_rcv * (ssthresh / old_cwnd) - pkts_out
This means that we can still send extra packets before convergence is complete. The conclusion above is derived as follows:
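
To make the formula for extra concrete, here is a small worked computation. It is a sketch only: the function name and the sample values (old_cwnd = 20 with ssthresh = 10 or 14) are illustrative assumptions, and exact fractions are used so that no rounding policy has to be chosen.

```python
from fractions import Fraction


def extra(acks_rcv, pkts_out, ssthresh, old_cwnd):
    """Segments that may still be sent: acks_rcv * (ssthresh / old_cwnd) - pkts_out."""
    return Fraction(acks_rcv * ssthresh, old_cwnd) - pkts_out


# Halving (ssthresh = old_cwnd / 2): two ACKs buy one transmission.
print(extra(acks_rcv=2, pkts_out=0, ssthresh=10, old_cwnd=20))   # 1
# After 4 ACKs and 1 segment already sent, one more may go out.
print(extra(acks_rcv=4, pkts_out=1, ssthresh=10, old_cwnd=20))   # 1
# A gentler reduction (ssthresh = 0.7 * old_cwnd) leaves a larger budget.
print(extra(acks_rcv=10, pkts_out=5, ssthresh=14, old_cwnd=20))  # 2
```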
The packet conservation principle contains a recurrence worth noting. Conservation requires that however many packets are ACKed or SACKed, that many have left the network, and so that many may be sent. The sequence of equations is as follows:
Initial state when entering fast recovery:
(s)acked[0] = 0
out[0] = 0
to_be_out[0] = 0

Thereafter, for each ACK received (it may carry SACK blocks):
On receipt of the 1st ACK:
to_be_out[1] = this_acked = (s)acked[1]; out[1] = to_be_out[1]
On receipt of the 2nd ACK:
to_be_out[2] = this_acked = (s)acked[2] - (s)acked[1] = (s)acked[2] - out[1]; out[2] = out[1] + to_be_out[2]
On receipt of the 3rd ACK:
to_be_out[3] = this_acked = (s)acked[3] - (s)acked[2] = (s)acked[3] - (to_be_out[2] + out[1]) = (s)acked[3] - out[2]; out[3] = out[2] + to_be_out[3]
...
On receipt of the 5th ACK:
to_be_out[5] = (s)acked[5] - out[4]; out[5] = out[4] + to_be_out[5]
On receipt of the 6th ACK:
to_be_out[6] = (s)acked[6] - out[5]; out[6] = out[5] + to_be_out[6]
...
On receipt of the Nth ACK:
to_be_out[n] = (s)acked[n] - out[n-1]; out[n] = out[n-1] + to_be_out[n]
From the equations above we get a relationship: the PRR feedback system uses to_be_out to refill what (s)acked has drained from the network! It is like a trade between a producer and a consumer. PRR, however, requires that the trade is not at par but at a discount applied to the received (s)acked: if the sender has had n data segments ACKed, only beta*n count as ACKed, where 0 < beta < 1. So the equation above becomes:
to_be_out[n] = beta * (s)acked[n] - out[n-1]
Comparing this with the earlier conclusion:
extra = acks_rcv * (ssthresh / old_cwnd) - pkts_out
we see that the discount is ssthresh / old_cwnd.
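
The discounted recurrence can be checked numerically. The sketch below assumes one segment is ACKed per ACK and allows fractional segments so the arithmetic stays exact; a real implementation has to round. The function name discounted_sends is hypothetical.

```python
from fractions import Fraction


def discounted_sends(acked_per_ack, beta):
    """Iterate to_be_out[n] = beta * acked[n] - out[n-1]; out[n] = out[n-1] + to_be_out[n]."""
    acked = Fraction(0)
    out = Fraction(0)
    history = []
    for newly_acked in acked_per_ack:
        acked += newly_acked               # cumulative (s)acked[n]
        to_be_out = beta * acked - out     # discounted sending budget
        out += to_be_out                   # assume the whole budget is used
        history.append((acked, to_be_out, out))
    return history


beta = Fraction(10, 20)                    # ssthresh / old_cwnd
history = discounted_sends([1] * 20, beta)
print(history[-1])
```

After 20 ACKs the total sent is beta * 20 = 10, so the amount of data in the network has shrunk from old_cwnd to ssthresh.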

Now it is time to present the Linux PRR algorithm:

Initial state:
cwnd = congestion window size before entering fast recovery
old_cwnd = cwnd
ssthresh = result computed by the congestion algorithm; can be 1/2, 4/5, ... of old_cwnd
acked = 0
out = 0
Before returning to normal, for each ACK, execute cwnd_down_prr:
cwnd_down_prr(tcp_sock)
{
    cnt = 0;
    tcp_sock.acked = tcp_sock.acked + number of data segments ACKed or SACKed by this ACK;
    if (tcp_sock.in_flight > tcp_sock.ssthresh) {
        /* proportional reduction; multiply before dividing so that integer division does not truncate the ratio to 0 */
        cnt = tcp_sock.acked * tcp_sock.ssthresh / tcp_sock.old_cwnd - tcp_sock.out;
    } else {
        /* the window fell below ssthresh: pull it back up */
        cnt = tcp_sock.ssthresh - tcp_sock.in_flight;
    }
    tcp_sock.cwnd = tcp_sock.in_flight + cnt;
}


As with Linux rate halving, examples demonstrate the execution of the algorithm. Again, the first example is the standard ideal environment and the second one is pathological:
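
The ideal case can be approximated with a short simulation of the PRR pseudocode. This is a sketch under assumptions: each ACK confirms exactly one segment, nothing is marked lost, the sender uses its whole budget, and the proportional term is rounded up (as RFC 6937 does with CEIL). The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sock:
    cwnd: int
    old_cwnd: int
    ssthresh: int
    in_flight: int
    acked: int = 0   # segments ACKed/SACKed since entering fast recovery
    out: int = 0     # segments sent since entering fast recovery


def cwnd_down_prr(sk, newly_acked):
    sk.acked += newly_acked
    if sk.in_flight > sk.ssthresh:
        # proportional part, rounded up (assumption, see lead-in)
        cnt = (sk.acked * sk.ssthresh + sk.old_cwnd - 1) // sk.old_cwnd - sk.out
    else:
        cnt = sk.ssthresh - sk.in_flight
    sk.cwnd = sk.in_flight + cnt


def simulate_ideal(old_cwnd=20, ssthresh=10):
    sk = Sock(cwnd=old_cwnd, old_cwnd=old_cwnd, ssthresh=ssthresh, in_flight=old_cwnd)
    trace = []
    for _ in range(old_cwnd):
        sk.in_flight -= 1                       # one segment confirmed
        cwnd_down_prr(sk, 1)
        sent = max(0, sk.cwnd - sk.in_flight)   # use the whole budget
        sk.in_flight += sent
        sk.out += sent
        trace.append(sk.cwnd)
    return trace


trace = simulate_ideal()
print(trace)
```

With ssthresh = 10 the window steps down smoothly from 20 and ends at 10; with ssthresh = 14 the same code ends at 14, which is the flexibility that plain halving lacks.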




Here is a PRR example that starts from the same initial state as the second Linux rate halving example; let's see how PRR compensates for the steep drop in the window:
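
A comparable pathological run can be sketched for PRR. The scenario is invented for illustration (12 segments marked lost on the 3rd ACK, 6 ACKs in total, ssthresh = 10), the names are hypothetical, and the code follows the simplified pseudocode above rather than the full kernel implementation.

```python
from dataclasses import dataclass


@dataclass
class Sock:
    cwnd: int
    old_cwnd: int
    ssthresh: int
    in_flight: int
    acked: int = 0
    out: int = 0


def cwnd_down_prr(sk, newly_acked):
    sk.acked += newly_acked
    if sk.in_flight > sk.ssthresh:
        # proportional part, rounded up (assumption)
        cnt = (sk.acked * sk.ssthresh + sk.old_cwnd - 1) // sk.old_cwnd - sk.out
    else:
        cnt = sk.ssthresh - sk.in_flight        # compensation below ssthresh
    sk.cwnd = sk.in_flight + cnt


def simulate_with_loss_marking(acks=6, mark_at=3, marked_lost=12):
    sk = Sock(cwnd=20, old_cwnd=20, ssthresh=10, in_flight=20)
    trace = []
    for n in range(1, acks + 1):
        sk.in_flight -= 1
        if n == mark_at:
            sk.in_flight -= marked_lost         # scoreboard marks a burst as lost
        cwnd_down_prr(sk, 1)
        sent = max(0, sk.cwnd - sk.in_flight)
        sk.in_flight += sent
        sk.out += sent
        trace.append(sk.cwnd)
    return trace


trace = simulate_with_loss_marking()
print(trace)  # [20, 19, 10, 10, 10, 10]
```

Although in_flight collapses to 6 on the 3rd ACK, the else branch lifts the window back to ssthresh instead of letting it follow in_flight down.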




Finally, let's take a comprehensive example with sack:




As long as the algorithm runs as intended there is no problem: if the window ends up below ssthresh, it goes through a process similar to slow start. "Slow start" here means growing the window by the amount of data ACKed; for example, if N segments are ACKed, the window grows by N MSS.


From the examples we see that at each step the target congestion window is computed smoothly and in real time from in_flight, (s)acked, out and so on, so the resulting window reduction fully reflects the current state of the network. This solves the problem of Linux rate halving, which in isolation and by wishful thinking tries to bring the window down to half of its original size within one RTT, yet in most cases cannot keep even that basic promise.

A simple window reduction optimization

The problem of reducing the congestion window is basically settled. I now have a simple optimization idea based on the PRR algorithm.

My intention is simple: the window need not descend all the way to ssthresh. Instead, set a stop valve; once the window has dropped to that value it stops falling, holds there, and congestion avoidance starts from that window. In fact, whatever the congestion control algorithm, the so-called ssthresh it computes before entering fast recovery is pulled out of thin air! Take CUBIC, the current Linux default, as an example: before entering fast recovery, ssthresh is reset to 717/1024 of the current window. That is a constant ratio, regardless of how severe the congestion is, or, at the other extreme, whether the loss was merely caused by occasional network noise. CUBIC does not address any of this at the algorithm level.
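
The fixed ratio is easy to see in numbers. The sketch below applies the 717/1024 factor with integer arithmetic; the floor of 2 segments is an assumption modelled on the kernel's CUBIC code, and the function name is hypothetical.

```python
BETA = 717          # numerator of the CUBIC reduction factor
BETA_SCALE = 1024   # denominator, so BETA / BETA_SCALE is roughly 0.7


def cubic_ssthresh(cwnd):
    """ssthresh chosen on loss: about 70% of cwnd, never below 2 segments (assumed floor)."""
    return max(cwnd * BETA // BETA_SCALE, 2)


for cwnd in (2, 20, 100):
    print(cwnd, cubic_ssthresh(cwnd))
```

Whether the loss came from heavy congestion or from a single noisy link, a window of 100 is always sent toward 70.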
The advantage of PRR is that it converges the window smoothly to ssthresh based on the real-time values of the current TCP counters (out, acked and so on). That is better than rate halving, which by comparison is a completely closed algorithm with no awareness of changes in those counters. But PRR is not fully dynamic either; after all, ssthresh can be regarded as a fixed value, and sometimes the window does not need to drop that far! So what is the right level for the window? I think it is the moment when "the rate at which data is ACKed equals the rate at which data is sent", which means the pipe is flowing smoothly: for every packet sent, one (s)ack comes back!
The improved algorithm is as follows:
cwnd_down_prr_pro(tcp_sock)
{
    cnt = 0;
    tcp_sock.acked = tcp_sock.acked + number of data segments ACKed or SACKed by this ACK;
    tmp1 = tcp_sock.acked - tcp_sock.prior_acked;    /* segments confirmed since the previous ACK */
    tcp_sock.prior_acked = tcp_sock.acked;
    tmp2 = tcp_sock.out - tcp_sock.prior_out;        /* segments sent since the previous ACK */
    tcp_sock.prior_out = tcp_sock.out;

    /* The confirmed and sent counts differ by at most 1: the rates are equal, so hold the window and stop dropping. */
    if (abs(tmp1 - tmp2) <= 1) {
        tcp_sock.eq++;
        tcp_sock.gt = 0;
        if (tcp_sock.eq >= 3) {
            tcp_sock.ssthresh = tcp_sock.cwnd - 1;
            return;
        }
    }
    /* The confirmed count exceeded the sent count 3 or more times in a row: grow the window! */
    else if (tmp1 > tmp2) {
        tcp_sock.gt++;
        tcp_sock.eq = 0;
        if (tcp_sock.gt >= 3) {
            increase the window;
            tcp_sock.ssthresh = tcp_sock.cwnd - 1;
            return;
        }
    }
    /* Otherwise keep dropping the window. */
    else {
        tcp_sock.eq = 0;
        tcp_sock.gt = 0;
        tcp_sock.ssthresh = tcp_sock.ssthresh_by_cong_alg;
    }

    if (tcp_sock.in_flight > tcp_sock.ssthresh) {
        /* multiply before dividing so that integer division does not truncate the ratio to 0 */
        cnt = tcp_sock.acked * tcp_sock.ssthresh / tcp_sock.old_cwnd - tcp_sock.out;
    } else {
        cnt = tcp_sock.ssthresh - tcp_sock.in_flight;
    }
    tcp_sock.cwnd = tcp_sock.in_flight + cnt;
}
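
The rate comparison at the top of this routine can be isolated and tried out on its own. The sketch below covers only the eq/gt counting and returns a label instead of touching the window; the class name RateTracker and the labels "hold", "grow" and "shrink" are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RateTracker:
    eq: int = 0   # consecutive ACKs where acked and sent deltas roughly match
    gt: int = 0   # consecutive ACKs where more was acked than sent

    def update(self, acked_delta, sent_delta):
        """Classify this ACK: 'hold' the window, 'grow' it, or keep the PRR 'shrink'."""
        if abs(acked_delta - sent_delta) <= 1:
            self.eq += 1
            self.gt = 0
            if self.eq >= 3:
                return "hold"
        elif acked_delta > sent_delta:
            self.gt += 1
            self.eq = 0
            if self.gt >= 3:
                return "grow"
        else:
            self.eq = 0
            self.gt = 0
        return "shrink"


tracker = RateTracker()
balanced = [tracker.update(1, 1) for _ in range(3)]
draining = [tracker.update(5, 1) for _ in range(3)]
print(balanced, draining)
```

Three balanced ACKs in a row are needed before the window is held, so a single lucky sample does not stop the reduction.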

You will notice that it sacrifices fairness: where possible, it no longer performs MD. On the other hand, the algorithm has another use: it can crudely distinguish occasional packet loss caused by network noise from persistent packet loss caused by congestion. As for fairness, let me add that when others are grabbing your resources, whoever stays friendly to others is a fool!
