The Linux TCP implementation is complex and unwieldy. I recommend not diving straight into the code but taking the time to read the TCP specifications first. This article mainly describes the undo operation in the TCP implementation, with some ranting along the way (do not think ranting is a bad thing; plenty of good things started out as complaints, from paper and the steam engine to the French First Republic, and then Linux ...).
1. TCP network congestion is prediction-based
TCP is part of networking, which is understandable, but when someone says he is proficient in networking, he more likely means that he is proficient in the behavior of network nodes rather than end-to-end behavior. Take a Cisco engineer who has mastered the various routing protocols: he can design complex OSPF and BGP networks, optimize the PPS of every node, enable spanning tree where it is needed, and apply all kinds of algorithms flexibly. Whether in standards development or in operations, that is enough to dazzle you and earn your utmost admiration, as if everything in the network were in his hands to play with at will. I have dealt with such people and count many of them as friends. I have argued against them from the developer's point of view, and I have also stood on their side and looked down on long-suffering coders ... These people guarantee that any packet passing through the equipment they manage will, in terms of efficiency, pass as fast as possible, and that, in terms of security, no trace of malice will escape their notice ...
However, once TCP is involved, they are helpless. I have always felt that the behavior of a TCP connection is "tied to the whole world," and no administrator can control the behavior of every computer in the world, not even an immortal one. TCP defines only end-to-end behavior, as if the two communicating processes were adjacent. That was TCP's first assumption, and it is the original intent of the "end-to-end sliding window": send only as much data as the receiver can accept. Congestion control algorithms were added once it became clear that the two ends are not directly adjacent but are separated by an ever-changing network with butterfly effects.
When TCP was first designed, it had no explicit congestion notification mechanism and no negative acknowledgment mechanism, so the sender's only two tools for predicting network conditions are the ACK and the RTT. The RTT is itself measured from ACKs, which means it is affected when the receiver delays its ACKs. Although the TCP specification asks implementations to avoid stretch ACKs, if an ACK is lost there is nothing more to be said ... So for TCP, congestion is predicted. Here are examples of similar predictions in real life:
Your roommate goes out to buy groceries and has not come back 10 hours later. You start to wonder, and you might phone to check, but if you have no way to reach them, you can only assume the roommate has had an accident and call the police;
Your roommate and four strangers go to the same market to buy groceries. The four strangers have been back for a long time, but the roommate, who left first, has still not returned. What would you think now?
In either case, it may simply be that the roommate is playing a prank ...
2. A wrong prediction can be undone
Suppose we predict network congestion, but later signs show that reality is not what we imagined. At that point we can undo the actions we took in response to the supposed congestion. The question is what can be undone and what cannot.
3. What can be undone
Once we have predicted network congestion, there are two things we need to do:
a). Reduce the congestion window
b). Retransmit the packets that may have been lost (if the person who went to buy groceries never comes back, and groceries are all you care about, you can send someone else to buy them).
Reducing the congestion window is meant to keep a congested network from becoming more congested. This is undoubtedly gentlemanly behavior, unsuited to Chinese-style drivers and, of course, unsuited to China's TCP developers. They reduce the window too, but their motive is not to avoid adding to the jam; it is that sending data without reducing the window is useless and only adds traffic ... When the prediction turns out to be wrong and the network was not congested at all, the best thing TCP can do is compensate itself. In an "all for one, one for all" world, nobody will compensate you for your gentlemanly behavior; only you can. So restoring the congestion window to the value it had before the reduction is a form of self-compensation.
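The save-and-restore idea can be sketched in a few lines of C. This is a simplified model for illustration, not the kernel's code: the struct `cc_state`, its fields, and the two functions are hypothetical names, and halving the window is only an assumed reduction policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified sender state; not the kernel's struct tcp_sock. */
struct cc_state {
    uint32_t cwnd;           /* congestion window, in packets */
    uint32_t ssthresh;       /* slow-start threshold, in packets */
    uint32_t prior_cwnd;     /* cwnd saved when congestion was predicted */
    uint32_t prior_ssthresh; /* ssthresh saved at the same moment */
    bool     can_undo;       /* true while saved values are valid */
};

/* Congestion predicted: remember the old values, then shrink the window. */
static void cc_enter_recovery(struct cc_state *s)
{
    assert(s->cwnd > 0);
    s->prior_cwnd = s->cwnd;
    s->prior_ssthresh = s->ssthresh;
    s->can_undo = true;
    /* Assumed policy: halve the window, but keep at least 2 packets. */
    s->ssthresh = s->cwnd / 2 > 2 ? s->cwnd / 2 : 2;
    s->cwnd = s->ssthresh;
}

/* Prediction proved wrong: compensate ourselves by restoring the window. */
static void cc_undo_reduction(struct cc_state *s)
{
    if (!s->can_undo)
        return; /* nothing was saved, so there is nothing to undo */
    /* Never shrink on undo: the window may have grown again meanwhile. */
    if (s->prior_cwnd > s->cwnd)
        s->cwnd = s->prior_cwnd;
    if (s->prior_ssthresh > s->ssthresh)
        s->ssthresh = s->prior_ssthresh;
    s->can_undo = false;
}
```

Calling `cc_enter_recovery` on a state with `cwnd = 20` and then `cc_undo_reduction` brings the window back to 20; a second undo call changes nothing because the saved values are consumed only once.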
4. What cannot be undone
Once congestion has been predicted, the packets that may have been lost are retransmitted. This part cannot be undone; there is no mechanism for pulling sent data back! The only thing we can do is predict more accurately next time.
5. When undo is possible
We know that TCP predicts that a packet has been lost from duplicate ACKs and timeouts. But a duplicate ACK or a timeout does not mean the packet was really lost; the packet may simply have been reordered and reached the receiver late. Therefore, a duplicate ACK or a timeout is only a necessary condition for packet loss, not a sufficient one.
In one case we can conclude that our prediction was wrong: the receiver got the packet we retransmitted twice, once as the original transmission and once as the retransmission. This is readily detected with DSACK or the timestamp mechanism. Once DSACKs have been received for every retransmitted packet, we can undo everything that can be undone.
6. Where undo leads
If you entered the Recovery state because of a wrong prediction, which state does the undo take you to?
If the send window is clean enough, that is, DSACKs have been received for all retransmitted packets, TCP can go straight to the Open state. But if there is still data after snd_una marked as SACKed or lost, reordering may still be going on, so TCP enters the Disorder state instead. We know that in the Recovery state the window is reduced. In the Disorder state, however, the window is not necessarily frozen (the freeze exists because, until enough duplicate ACKs have been seen, TCP cannot predict whether it is facing reordering or loss, so it has to freeze the window). If, at the time of the undo, the reordering value has grown, meaning that the network is more prone to reordering, the congestion window can continue to increase.
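The decision described above can be written down as two small functions. This is a sketch under stated assumptions rather than the kernel's logic: the enum, the struct `scoreboard`, and the function names are hypothetical, and the threshold of 3 reflects the usual default of the `tcp_reordering` sysctl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the congestion states discussed in the text. */
enum ca_state { CA_OPEN, CA_DISORDER, CA_RECOVERY };

/* Hypothetical summary of the sender's scoreboard above snd_una. */
struct scoreboard {
    uint32_t sacked_out; /* segments marked SACKed */
    uint32_t lost_out;   /* segments marked lost */
    uint32_t reordering; /* current reordering estimate, in segments */
};

/* Assumed default reordering threshold. */
#define DEFAULT_REORDERING 3u

/* After a successful undo: a clean window goes to Open; leftover
 * SACKed or lost marks suggest reordering, so go to Disorder. */
static enum ca_state state_after_undo(const struct scoreboard *sb)
{
    assert(sb != NULL);
    return (sb->sacked_out || sb->lost_out) ? CA_DISORDER : CA_OPEN;
}

/* In Disorder the window is frozen unless the measured reordering
 * already exceeds the default; in Recovery it never grows. */
static bool may_grow_cwnd(enum ca_state st, const struct scoreboard *sb)
{
    assert(sb != NULL);
    switch (st) {
    case CA_OPEN:
        return true;
    case CA_DISORDER:
        return sb->reordering > DEFAULT_REORDERING;
    default:
        return false;
    }
}
```

Keeping the state choice and the window-growth rule in separate functions mirrors the text: where the undo lands depends only on the scoreboard, while whether the window may grow also depends on how much reordering has been observed.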
[Figure: a general picture of the main idea]
7. Continuous optimization of the Linux kernel protocol stack
If you compare the 2.6.8 kernel with the 4.4 kernel, you will find big differences in TCP congestion handling. Taking 2.6.32 as the baseline, you will find that by 3.10 the PRR window-reduction algorithm has been added, and that 3.11 then introduced the rule that "when reordering is greater than the default, the window need not be frozen in the Disorder state and may grow." Look at 4.3/4.4 and you will find that, in the congestion state machine, the mid-course switch has in fact completely broken the NewReno assumption: it is an early exit from NewReno. It seems that through continuous optimization Linux has stopped being a gentleman: bursts are no longer limited, and it sometimes cuts in line ...
8. A detail of walking the retransmission queue
This section has nothing to do with the main idea of this article, but I did not want to write a separate article for it.
According to the RFCs, in the congested state the sender should first send the packets marked lost, then new packets, and finally do "forward retransmission," which covers the packets before the high sequence number that are marked neither lost nor SACKed. The RFC simply prescribes this; the mathematical reasoning behind it is not given, only the conclusion! Please do not question the rationality of this scheme, please do not!
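The priority order described above can be sketched as a selection function. This is an illustrative model, not code from any RFC or from the kernel: the flags, the structs `seg` and `pick`, and `next_to_send` are hypothetical names, and taking the highest SACKed segment as the upper bound for forward retransmission is my reading of the rule.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-segment marks on the sender's scoreboard. */
#define SEG_LOST    0x1u /* marked lost */
#define SEG_SACKED  0x2u /* selectively acknowledged */
#define SEG_RETRANS 0x4u /* already retransmitted */

struct seg { unsigned flags; };

enum pick_kind { PICK_NONE, PICK_LOST, PICK_NEW, PICK_FORWARD };

struct pick {
    enum pick_kind kind;
    size_t index; /* queue position; n_sent means "first new segment" */
};

/* q[0..n_sent) holds sent-but-unacked segments in sequence order;
 * has_new tells whether unsent data is waiting behind them. */
static struct pick next_to_send(const struct seg *q, size_t n_sent, int has_new)
{
    struct pick p = { PICK_NONE, 0 };
    size_t i, highest_sacked = 0;

    assert(q != NULL || n_sent == 0);

    /* 1. Lost segments first, lowest sequence number first. */
    for (i = 0; i < n_sent; i++) {
        if ((q[i].flags & SEG_LOST) &&
            !(q[i].flags & (SEG_SACKED | SEG_RETRANS))) {
            p.kind = PICK_LOST;
            p.index = i;
            return p;
        }
    }

    /* 2. Then new data, if any is ready. */
    if (has_new) {
        p.kind = PICK_NEW;
        p.index = n_sent;
        return p;
    }

    /* 3. Finally, forward retransmission: segments below the highest
     * SACKed one that are neither lost, SACKed, nor retransmitted. */
    for (i = 0; i < n_sent; i++)
        if (q[i].flags & SEG_SACKED)
            highest_sacked = i;
    for (i = 0; i < highest_sacked; i++) {
        if (!(q[i].flags & (SEG_LOST | SEG_SACKED | SEG_RETRANS))) {
            p.kind = PICK_FORWARD;
            p.index = i;
            return p;
        }
    }
    return p;
}
```

The three stages run strictly in order, so a forward retransmission is chosen only when no lost segment and no new data is available.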
But if you look at tcp_xmit_retransmit_queue, it is hard to find this logic:
tcp_for_write_queue_from(skb, sk) {
	__u8 sacked = TCP_SKB_CB(skb)->sacked;

	if (skb == tcp_send_head(sk))
		break;
	if (hole == NULL)
		tp->retransmit_skb_hint = skb;

	// Congestion window limit check
	if (tcp_packets_in_flight(tp) >= tp->snd_cwnd)
		return;

	if (fwd_rexmitting) {
begin_fwd:
		// Forward retransmission goes no further than the highest SACKed sequence
		if (!before(TCP_SKB_CB(skb)->seq, tcp_highest_sack_seq(tp)))
			break;
	} else if (!before(TCP_SKB_CB(skb)->seq, tp->retransmit_high)) {
		tp->retransmit_high = last_lost;
		// The key is here! tcp_can_forward_retransmit breaks out if new data
		// is ready: inside it, tcp_may_send_now checks whether new data is
		// ready to be sent. Linux does not send in the RFC's priority order
		// within a single path; when new data is ready, it simply exits and
		// lets the normal send path handle it!
		if (!tcp_can_forward_retransmit(sk))
			break;
		if (hole != NULL) {
			skb = hole;
			hole = NULL;
		}
		// Forward retransmission happens only if tcp_can_forward_retransmit did not break
		fwd_rexmitting = 1;
		goto begin_fwd;
	} else if (!(sacked & TCPCB_LOST)) {
		if (hole == NULL && !(sacked & (TCPCB_SACKED_RETRANS | TCPCB_SACKED_ACKED)))
			hole = skb;
		continue;
	} else {
		// Send lost data first
		last_lost = TCP_SKB_CB(skb)->end_seq;
	}

	// Never resend packets that have been SACKed or already retransmitted!
	if (sacked & (TCPCB_SACKED_ACKED | TCPCB_SACKED_RETRANS))
		continue;
	tcp_retransmit_skb(sk, skb);
}
9. One more rant about OpenSSL/OpenVPN/Linux TCP
Although I no longer work on OpenSSL/OpenVPN, I still follow them, and I still cannot tolerate code that people cannot understand and that turns out to be bad even once you do understand it. If someone can teach me to understand OpenSSL's ASN.1 processing code, I will gladly send a red envelope of 100 dollars! OpenVPN is the same! As for the TCP code, see the following fragment:
/* People celebrate: "We love our President!" */
static int tcp_try_undo_recovery(struct sock *sk)
{
	struct tcp_sock *tp = tcp_sk(sk);
	..
Which guru can explain what "People celebrate: 'We love our President!'" means? I dare say that if I wrote such a comment in my own code, people would assume I wrote it after a couple of swigs of Erguotou, and then I would be criticized and considered abnormal. Once, while designing a conntrack cache, I wrote "the rotating lift seat will explode" in the code, and then deleted it because I felt it did not fit our national culture. In our culture there is no "Talk is cheap. Show me the code.", so there will be no Hargreaves, no Watt, no Jobs, no Linus ... Who will there be instead? In any case, it is decided by a group of people, step by step, instilled into you, and then you grow up and instill it in others. Boom!
Undo operation in Linux TCP congestion control