Understanding the TCP fast retransmit mechanism through packet sequences constructed with packetdrill



TCP's logic is extremely complex. The learning curve is gentle, yet every step along it is hard. Fortunately it is all grunt work: nothing is too hard for anyone willing to put in the time. To thoroughly understand a TCP mechanism, there is a four-step routine:
1. Read the RFCs related to it;
2. Read the TCP implementation in the Linux protocol stack;
3. Confirm the facts with packet captures and other tools;
4. Solve a real network problem related to it.

After these four steps, I believe anyone can hold their own in the relevant area ...
This article is about the TCP fast retransmit mechanism. Unlike other articles, it does not dissect the source code, translate the RFC, or explain the underlying principles. Instead it uses a TCP packet sequence to show how TCP actually behaves; where possible, I will cross-reference the Linux protocol stack source code.
TCP, as a system coupled to the real world, is chaotic, but in the end it is implemented in individual details. Those details are rule-based and ultimately follow the RFCs. Built from these microscopic rules, TCP still shows up at the macro level as a chaotic system that confuses people.
The best way to understand a mechanism is through a concrete instance, so capturing a packet sequence from a real network seems ideal. But that is only one side of it. The other side is: you must be able to reproduce it! Precisely because TCP is chaotic at the macro level, it is practically impossible to reproduce a given TCP packet sequence in a real network. So we have to construct the packet sequence by hand.
Many tools can construct packet sequences. After comparing them I chose packetdrill, a tool from Google. It is relatively easy to use. Its drawback is that different kernel versions may produce different odd problems; as its community admits, they have no time to maintain a universally stable version. That does not matter: small bugs you can fix yourself, and a programmer who cannot fix bugs is a manager. I will not go into the details of packetdrill here; see its GitHub page. Let's get straight to the topic.
Before going any further, one small detail.


packetdrill opens a tun device (a virtual NIC driven by tun.ko) at run time and then configures it. To discuss only TCP itself, I need to eliminate the effects of any offload mechanism. I did not know how to turn offload off with a parameter (there does not seem to be one!), so I modified set_device_offload_flags in packetdrill's netdev.c:





/* Originally: set the offload flags to be like a typical ethernet device.
 * The TUNSETOFFLOAD call is commented out so that the tun device enables
 * no offload features and only TCP itself is observed. */
static void set_device_offload_flags(struct local_netdev *netdev)
{
#ifdef linux
//      const u32 offload =
//          TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 | TUN_F_TSO_ECN | TUN_F_UFO;
//      if (ioctl(netdev->tun_fd, TUNSETOFFLOAD, offload) != 0)
//              die_perror("TUNSETOFFLOAD");
#endif
}





A rather crude modification; I mention it here for the record.


First, here is my first packetdrill script. Pay attention to the comments.

// Establish the connection
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0

+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

// Complete the three-way handshake
+0 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0 > S. 0:0(0) ack 1 <...>
+.1 < . 1:1(0) ack 1 win 32792
+0 accept(3, ..., ...) = 4

// Send one segment. Note that this does not increase the congestion window: following
// Google's recommendation the initial congestion window is now 10 segments, and a send of
// fewer than 10 segments does not fill the window, so window growth is held back.
// Use tcpprobe to confirm. The code that applies this window limit is shown after this script.
// Note that in the era when the congestion window started from 1, 2, or 3, this limitation did not apply.
+0 write(4, ..., 1000) = 1000
// The packet sent is: +0 > P. 1:1001(1000) ack 1

+.1 < . 1:1(0) ack 1001 win 32792

// Write 4 segments; the purpose is to trigger fast retransmit
+0 write(4, ..., 4000) = 4000

// SACKs for 3 segments arrive. Note the difference between FACK and standard SACK:
// Standard SACK: retransmission is triggered only after both 4001-5001 and 2001-4001 have been SACKed.
// FACK: the first SACK received, 4001-5001, is already more than 3 segments away from UNA,
// so retransmission is triggered without waiting for the 2001-4001 SACK.
+.1 < . 1:1(0) ack 1001 win 257 <sack 4001:5001,nop,nop>
+0 < . 1:1(0) ack 1001 win 257 <sack 2001:4001,nop,nop>
// A small question: if the second SACK were instead
// +0 < . 1:1(0) ack 1001 win 257 <sack 2001:3001,nop,nop>
// would retransmission still be triggered?
// If you capture packets at this point, you will find that retransmission has occurred.

// Acknowledge everything
+.1 < . 1:1(0) ack 5001 win 257
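
The two trigger rules described in the script comments can be sketched in a few lines of C. This is only a simplified model, not kernel code: the struct and function names are made up for illustration, and the "SACKed segments plus one" estimate for plain SACK is an assumption modelled on how older Linux kernels count duplicate ACKs.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the sender's scoreboard. */
struct scoreboard {
    unsigned sacked_out;   /* number of segments currently SACKed */
    unsigned fackets_out;  /* segments from snd_una up to and including
                              the highest SACKed segment */
    unsigned reordering;   /* reordering threshold, 3 by default */
    bool fack;             /* true when FACK is in use */
};

/* FACK measures the distance to the highest SACKed segment;
 * plain SACK counts the SACKed segments (plus one, by assumption). */
unsigned dupack_estimate(const struct scoreboard *sb)
{
    return sb->fack ? sb->fackets_out : sb->sacked_out + 1;
}

/* Fast retransmit starts once the estimate exceeds the threshold. */
bool should_fast_retransmit(const struct scoreboard *sb)
{
    return dupack_estimate(sb) > sb->reordering;
}
```

Applied to the script: after the first SACK (4001:5001) one segment is SACKed and it is the fourth segment counted from UNA, so FACK sees 4 > 3 and retransmits at once, while plain SACK sees 2 and waits. After the second SACK (2001:4001) three segments are SACKed, and plain SACK reaches 4 > 3 as well.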


Because this script is so simple and the comments explain everything, I will not confirm it with a packet capture here. If you are interested, confirm the differences among Reno, SACK and FACK yourself. The method is simple: switch SACK and FACK on or off in turn:
net.ipv4.tcp_sack = 1|0
net.ipv4.tcp_fack = 1|0


Then capture packets and observe when retransmission is triggered. If the difference is still unclear to you, be sure to do these captures; it will become obvious at once. The window-limit code mentioned in the script comments is as follows:





int tcp_is_cwnd_limited(const struct sock *sk, u32 in_flight)
{
    const struct tcp_sock *tp = tcp_sk(sk);
    u32 left;
    if (in_flight >= tp->snd_cwnd)
        return 1;

    left = tp->snd_cwnd - in_flight;
    if (sk_can_gso(sk) &&
        left * sysctl_tcp_tso_win_divisor < tp->snd_cwnd &&
        left * tp->mss_cache < sk->sk_gso_max_size)
        return 1;
    return left <= tcp_max_burst(tp);
}


This function is called before the congestion window is increased.
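
To see what this check means for the scripts in this article, here is a minimal userspace sketch of the same decision. It rests on two assumptions that are not stated in the code above: offload is off, so the GSO branch never applies, and tcp_max_burst(tp) simply returns tp->reordering (3 by default), as I believe it does in kernels of this era. The function name cwnd_limited is made up.

```c
#include <stdbool.h>

/* Simplified model of tcp_is_cwnd_limited() with the GSO branch left out.
 * max_burst stands in for tcp_max_burst(tp), assumed to be tp->reordering. */
bool cwnd_limited(unsigned snd_cwnd, unsigned in_flight, unsigned max_burst)
{
    /* The window is completely full. */
    if (in_flight >= snd_cwnd)
        return true;

    /* Otherwise the sender counts as limited only when the unused part
     * of the window is no larger than one burst. */
    return snd_cwnd - in_flight <= max_burst;
}
```

With a congestion window of 10 and reordering of 3, one segment in flight leaves 9 unused slots, so the sender is not considered window-limited and the window does not grow. With 10 segments in flight the window is full and it may grow.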



Next comes a more complex example, which forms the second part of this article. It covers not only fast retransmit but also some details of the congestion window, so please analyze it carefully. The more complex example is:





// Establish the connection
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0

+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

// Complete the handshake
+0 < S 0:0(0) win 65535 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0 > S. 0:0(0) ack 1 <...>
+.1 < . 1:1(0) ack 1 win 65535
+0 accept(3, ..., ...) = 4

// Send 1 segment; the congestion window does not increase
+0 write(4, ..., 1000) = 1000
+.1 < . 1:1(0) ack 1001 win 65535

// Send another segment; the congestion window is still the initial value of 10!
+0 write(4, ..., 1000) = 1000
+.1 < . 1:1(0) ack 2001 win 65535

// .....
+0 write(4, ..., 1000) = 1000
+.1 < . 1:1(0) ack 3001 win 65535

// No matter how you send, as long as each burst stays below init_cwnd - reordering,
// the congestion window does not grow. See the tcp_is_cwnd_limited function above for details.
+0 write(4, ..., 1000) = 1000
+.1 < . 1:1(0) ack 4001 win 65535

// Send more. What happens? Check it yourself with tcpprobe
+0 write(4, ..., 6000) = 6000
+.1 < . 1:1(0) ack 10001 win 65535

// Now we send 10 segments, which can be confirmed with tcpprobe. After the ACK is received,
// the congestion window grows by 1. That is slow start at work!
+0 write(4, ..., 10000) = 10000
+.1 < . 1:1(0) ack 20001 win 65535

// Time to get to the point. To trigger fast retransmit we send enough data: 8 segments at once.
// Note that the congestion window is now 11!
+0 write(4, ..., 8000) = 8000

// Below is the SACK sequence received. I assume you understood the difference between SACK and
// FACK from the simple packetdrill script above, so FACK is enabled here!
// Effect of sack 1 (27001-28001): it is 8 segments away from the ACK field 20001. That exceeds
// reordering (3), so retransmission is triggered immediately.
+.1 < . 1:1(0) ack 20001 win 257 <sack 27001:28001,nop,nop> // ---- (sack 1)
+0 < . 1:1(0) ack 20001 win 257 <sack 22001:23001 27001:28001,nop,nop> // ---- (sack 2)
+0 < . 1:1(0) ack 20001 win 257 <sack 23001:24001 22001:23001 27001:28001,nop,nop> // ---- (sack 3)
+0 < . 1:1(0) ack 20001 win 257 <sack 24001:25001 23001:24001 22001:23001 27001:28001,nop,nop> // ---- (sack 4)

// An ACK of 28001 arrives. Note that reordering has already been updated to 6. This ACK also
// tries to update reordering, but does not succeed. Why? See the analysis below for details.
+.1 < . 1:1(0) ack 28001 win 65535

// Because of the fast retransmit / fast recovery above, the congestion window has dropped to 5.
// To confirm that reordering has been updated, we need to grow the congestion window back to 10 or 11
+0 write(4, ..., 5000) = 5000
+.1 < . 1:1(0) ack 33001 win 65535

// Since the congestion window is 5 at this point, we repeatedly write data equal to the size of
// the congestion window to push it up to 10.
+0 write(4, ..., 5000) = 5000
+.1 < . 1:1(0) ack 38001 win 65535

+0 write(4, ..., 5000) = 5000
+.1 < . 1:1(0) ack 43001 win 65535

+0 write(4, ..., 5000) = 5000
+.1 < . 1:1(0) ack 48001 win 65535

+0 write(4, ..., 5000) = 5000
+.1 < . 1:1(0) ack 53001 win 65535

+0 write(4, ..., 5000) = 5000
+.1 < . 1:1(0) ack 58001 win 65535

+0 write(4, ..., 5000) = 5000
+.1 < . 1:1(0) ack 63001 win 65535

// OK! Now repeat the earlier SACK scenario: write 8 segments and see whether the same kind of
// SACK sequence induces fast retransmit!
+0 write(4, ..., 8000) = 8000

// We construct the same kind of SACK sequence as sack 1/2/3/4 above, but what awaits us is not
// a retransmission, but ...
// What? No retransmission? Impossible! Segment 70001-71001 is 8 segments away from 63001,
// reordering has been updated to 6, and 8 > 6 still meets the trigger condition. Why is it not triggered?
// The answer: 8 > 6 triggers fast retransmit only if FACK is on. When reordering was updated,
// FACK was disabled, so what is counted now is the number of SACKed segments rather than the
// distance to the highest SACKed segment. The 4 SACKs below confirm exactly 4 segments,
// and 4 < 6 does not trigger fast retransmit.
+.1 < . 1:1(0) ack 63001 win 257 <sack 70001:71001,nop,nop>
+0 < . 1:1(0) ack 63001 win 257 <sack 65001:66001 70001:71001,nop,nop>
+0 < . 1:1(0) ack 63001 win 257 <sack 67001:68001 65001:66001 70001:71001,nop,nop>
+0 < . 1:1(0) ack 63001 win 257 <sack 68001:69001 67001:68001 65001:66001 70001:71001,nop,nop>

// Will a timeout retransmission fire here? That depends on when packetdrill injects the ACK below.
// If no timeout retransmission occurs, the ACK below updates reordering again, from 6 to 8
+.1 < . 1:1(0) ack 71001 win 65535

// From here on, it is God's world ...
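
The different outcomes of the two SACK rounds in this script come down to one state change, which can be sketched as below. This is a toy model under assumptions: the struct and function names are invented, the rule that raising reordering switches FACK off is taken from the script comments, and the "SACKed segments plus one" estimate used without FACK is modelled on older Linux kernels.

```c
#include <stdbool.h>

/* Hypothetical per-connection state, reduced to what the script exercises. */
struct conn {
    unsigned reordering;  /* current reordering threshold */
    bool fack;            /* whether FACK is still in use */
};

/* Raising the reordering estimate also turns FACK off. */
void raise_reordering(struct conn *c, unsigned metric)
{
    if (metric > c->reordering) {
        c->reordering = metric;
        c->fack = false;
    }
}

/* sacked_out: number of SACKed segments;
 * fackets_out: segments from snd_una through the highest SACKed one. */
bool triggers_fast_retransmit(const struct conn *c,
                              unsigned sacked_out, unsigned fackets_out)
{
    unsigned estimate = c->fack ? fackets_out : sacked_out + 1;
    return estimate > c->reordering;
}
```

Round one starts with reordering 3 and FACK on, so the very first SACK (8 segments from the ACK point) triggers retransmission. Once reordering has been raised to 6, FACK is off; in round two the four SACKs give an estimate of at most 5, which never exceeds 6, so nothing is retransmitted.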


Before we look at the capture, allow me to repeat: unless you are analyzing protocol details in depth, just use tcpdump's terminal output. There is no need to reach for Wireshark/tshark to look sophisticated. We confirm the details directly from the tcpdump output:






If you have mastered the details of TCP, you will say: boom, exactly as expected! But for everyone else: what was expected, how does it come about, and what are the details? If you only want to understand the principle of TCP fast retransmit, you can stop reading here. Besides fast retransmit I have also covered the reordering update and the congestion window changes, and that is enough. But if you want to know the implementation details of the Linux stack, or your work involves protocol stack implementation, keep reading. I am writing these details down partly because I am afraid I will forget them myself after a while. After all, human memory is nothing more than a cache, not permanent storage, and it is constantly evicting entries. Like computers and cloud systems, people need permanent storage of their own. It used to be paper; now it is a hard disk ...
I prefer to show the details with diagrams. My drawing is not great, but it keeps improving. A diagram is a two-dimensional description and more efficient than text. A three-dimensional model would be more efficient still, but that is beyond my ability for now. Let's first analyze the situation where multiple SACKs are received for the first time:





That should be clear, and it becomes clearer still if you compare it carefully against the Linux kernel protocol stack implementation. Later, the same TCP connection receives the same kind of SACK sequence, but the outcome is different: fast retransmit is not triggered. The analysis:







Is that the end? I notice that the discussion above has no general conclusion; it reads like a worked example, which disappoints me. So I still want to summarize two details about the reordering update:
1. Reordering update caused by SACK
if (skb has not been SACKed before && skb has not been retransmitted)
{
    if (skb sequence number < highest SACKed sequence number)
        update reordering to (highest SACKed skb sequence number - current skb sequence number)
}

2. Reordering update caused by ACK
if (skb has not been SACKed before && skb has not been retransmitted)
{
    if (skb sequence number < highest SACKed sequence number)
        update reordering to (highest SACKed skb sequence number - current skb sequence number)
}

Yes, that was copy and paste: the two cases are in fact the same! Whether it is a selective acknowledgment or a cumulative ACK, whenever a segment is acknowledged after a later segment has already been acknowledged, it may indicate that the network delivered packets out of order. How severe the disorder is comes in degrees, and that degree is measured by reordering. Boom!
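
The shared rule in the two pseudocode cases above can be written as one small function. This is a sketch under assumptions rather than the kernel's code: the function name is hypothetical, positions are counted in segments from snd_una instead of sequence numbers, and the distance is taken as fackets_out minus the segment's position, which reproduces the values 6 and 8 seen in the script.

```c
#include <stdbool.h>

/* Hypothetical helper. seg_index is the position of the newly acknowledged
 * segment counted from snd_una (0 = first unacknowledged segment);
 * fackets_out counts segments from snd_una through the highest SACKed one.
 * Returns the new reordering value. */
unsigned update_reordering(unsigned reordering, unsigned fackets_out,
                           unsigned seg_index, bool retransmitted)
{
    unsigned metric;

    /* A retransmitted segment proves nothing about reordering. */
    if (retransmitted)
        return reordering;

    /* Only a segment below the highest SACKed one was overtaken. */
    if (seg_index + 1 >= fackets_out)
        return reordering;

    /* Distance between the overtaken segment and the forward-most SACK. */
    metric = fackets_out - seg_index;
    return metric > reordering ? metric : reordering;
}
```

In the first round, segment 22001 (position 2) is SACKed while fackets_out is 8, giving 8 - 2 = 6. In the second round, the cumulative ACK covers segment 63001 (position 0), which was never SACKed or retransmitted, giving 8 - 0 = 8.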
To summarize, this packetdrill script illustrates:
1. How the congestion window rises and falls: slow-start behavior and window-reduction behavior once the initial window is set to 10.
2. When fast retransmit is triggered under standard SACK and under FACK.
3. How reordering is updated within a TCP connection, and its relationship to FACK.
4. The relationships among the various counters TCP maintains internally when triggering fast retransmit.


This is the end of the article. The time is 2016/07/16 10:32, more than 6 hours since I got up. This article was slow to write because in between I made breakfast for my little one and then repaired the air conditioner ...
Finally, if you do not know how packetdrill works, do not panic; and if you already do, do not get cocky. Neither matters here.

