Ideally, a TCP stream flows continuously, whether as a thin trickle or as a torrent like the Yangtze River. But that is only the ideal! After passing through many intermediate network devices, the stream that finally reaches the receiver has probably lost its smooth shape and arrives as burst after burst. The ACKs triggered by these bursts are bursty in turn; they feed back to the sender and affect its transmit timing, which is to say they shape the data stream at the sending side. This is a typical turbine-like feedback system. It is neither uncontrolled, as people often assume, nor at the other extreme of being controlled purely end to end. It is not that hard to master: rather than something that can only be grasped by feel, it is as orderly as 1+1=2. If you find TCP behavior hard to understand, it may simply be because you are trying to break the rules of the network, and those rules come down to two points: efficiency and fairness. Really, killing someone is easy but costly; getting along with people is hard, but it lets you live long!
This article uses Wireshark to show how the network's influence on the TCP sender's timing is reflected in the packet sequence.
1. How to read the TCP congestion window size in Wireshark
Wireshark is a powerful tool for network protocol analysis, and it can also be used to analyze information that is not carried in the packets themselves, such as the TCP congestion window size.
The TCP congestion window does not appear in any packet. It is computed by the sender's protocol stack from the feedback the network gives it end to end, namely the ACK sequence and timeout events. So how can we work out the congestion window from the packet sequence of a TCP stream as shown by Wireshark?
It is actually very simple, as long as the amount of data in flight can be calculated. The TCP send window is the minimum of the congestion window (cwnd) and the window advertised by the peer (awnd):
wnd = min(awnd, cwnd)
If cwnd is larger than awnd, cwnd can be ignored because it has no effect on data transmission. (TCP also has a mechanism that keeps the congestion window from growing too large, in order to prevent bursts.) So we only need to look at the case where cwnd is smaller than awnd, in which:
wnd = cwnd
We know that once TCP has established a continuous ACK clock, sending is smooth. Smooth here means that sending follows the principle of packet conservation: one segment is sent for each segment acknowledged. Therefore:
cwnd = in_flight + tosend
Here tosend is the number of segments about to be sent. By packet conservation, tosend can be taken as the amount of data acknowledged by the current ACK, counted in segments. If the receiver does not use delayed ACKs, each ACK acknowledges one segment, that is:
cwnd = in_flight + 1
When reading the formula above, ignore for the moment any sudden change in the congestion window; that comes later. Next, if the receiver uses delayed ACKs, one ACK normally acknowledges up to 2 segments, so:
cwnd = in_flight + max(thisacked, 2)
For now, set aside lost ACKs, and also the long-running debate over whether the TCP sender should count the bytes acknowledged or the number of ACKs received. Personally I prefer to leave that choice as an option of TCP itself, which allows for the asymmetry between forward and reverse bandwidth on expensive wireless links. Then, taking the congestion avoidance phase into account, we have:
cwnd = cwnd + (1/cwnd | 1)
In the slow start phase, it is instead:
cwnd = cwnd + 1
So in the end we get the size of the congestion window:
cwnd = in_flight + max(thisacked, 2) + (1/cwnd | 1)
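The estimate above can be sketched in Python. This is only an illustration under stated assumptions: the function name and its parameters are hypothetical, all quantities are counted in segments, and reading "(1/cwnd | 1)" as "1/cwnd in congestion avoidance, 1 in slow start" is my interpretation of the formula.

```python
def estimate_cwnd(in_flight, this_acked, cwnd_prev, slow_start, delayed_ack=True):
    """Estimate cwnd (in segments) at the moment an ACK arrives.

    in_flight   -- segments still unacknowledged
    this_acked  -- segments acknowledged by the current ACK
    cwnd_prev   -- previous estimate, used only for the 1/cwnd growth term
    slow_start  -- True in slow start, False in congestion avoidance
    delayed_ack -- True if the receiver is assumed to use delayed ACKs
    """
    # Packet conservation: what was just acknowledged may be sent again.
    to_send = max(this_acked, 2) if delayed_ack else 1
    # Window growth triggered by this ACK (interpretation of "(1/cwnd | 1)").
    growth = 1.0 if slow_start else 1.0 / cwnd_prev
    return in_flight + to_send + growth


# Example: 10 segments in flight, a delayed ACK covering 2 segments, slow start.
print(estimate_cwnd(in_flight=10, this_acked=2, cwnd_prev=10, slow_start=True))
```

The result is only as good as the in-flight value fed into it, which is why the rest of this section concentrates on how to obtain that value.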
The question now is how to find the amount in flight. That is easy; the formula is:
in_flight = highest sequence number sent so far - last sequence number acknowledged
Let's verify this with a real capture. Wireshark can work out the in-flight value for most packets, as long as it can reliably determine two values: the highest sequence number sent so far and the last sequence number acknowledged. So you do not have to calculate it yourself; you can read it directly in Wireshark. Here is an ACK packet:
Then look at the sequence number of the last packet sent so far:
Finally, we calculate the in-flight value and compare it with the value Wireshark displays:
They are exactly the same!
Sometimes you may find that clicking on a packet in Wireshark shows no in-flight value. This happens when some packets were not captured and there is no ACK between the captured packets and the current one, so the elements needed for the calculation above are missing and Wireshark does not compute the value. The data in flight exists; the information needed to calculate it has simply been lost.
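As a rough sketch of the calculation (not Wireshark's actual implementation), the following Python walks through a captured packet list and applies the in-flight formula. The record layout (`dir`, `seq`, `len`, `ack`) is a hypothetical one chosen for illustration, and sequence-number wraparound, retransmissions, and SACK are deliberately ignored.

```python
def bytes_in_flight(packets):
    """Return a list of (index, in_flight) for each data packet sent.

    packets: list of dicts with keys
      'dir'        -- 'out' for data from the sender, 'in' for returning ACKs
      'seq', 'len' -- present on 'out' packets
      'ack'        -- present on 'in' packets
    in_flight is None when no ACK has been seen yet, i.e. when the
    information needed for the calculation is missing from the capture.
    """
    highest_sent = None   # sequence number just past the last byte sent
    last_acked = None     # highest acknowledgment number seen so far
    result = []
    for i, p in enumerate(packets):
        if p['dir'] == 'out':
            end = p['seq'] + p['len']
            highest_sent = end if highest_sent is None else max(highest_sent, end)
            if last_acked is None:
                result.append((i, None))
            else:
                result.append((i, highest_sent - last_acked))
        else:
            last_acked = p['ack'] if last_acked is None else max(last_acked, p['ack'])
    return result


capture = [
    {'dir': 'out', 'seq': 1, 'len': 1460},
    {'dir': 'in', 'ack': 1},
    {'dir': 'out', 'seq': 1461, 'len': 1460},
    {'dir': 'out', 'seq': 2921, 'len': 1460},
    {'dir': 'in', 'ack': 1461},
    {'dir': 'out', 'seq': 4381, 'len': 1460},
]
for index, value in bytes_in_flight(capture):
    print(index, value)
```

The first data packet gets no value because no ACK precedes it in the capture, which mirrors the missing-value case described above.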
But some people put blind faith in what Wireshark shows (myself included, in fact), and that can lead to regrettable, even tragic, results. What on earth is going on? See the next section.
2. Where are the extra in-flight packets?
I ran a test, adding the following command on the TCP sender to simulate a 300 ms packet delay:
tc qdisc add dev eth0 root netem delay 300ms
Then I captured packets with tcpdump, also on the TCP sender, and analyzed the capture with Wireshark.
Let's look at the moment when ACK packet No. 3064 arrives at the sender:
Before this ACK is received, the last packet sent by the sender is No. 3063, and the in-flight value can be read from packet No. 3063 (see the calculation above). The capture shows fewer than 5 packets in flight, which is almost impossible: I had just simulated a 300 ms delay on a gigabit network. At the same time I used tcpprobe to capture the kernel stack's view, and the comparison is as follows:
To show the protocol stack's own in-flight value, I modified tcpprobe to also log tcp_packets_in_flight(tp). You can see that 732*1460 bytes is very far from the 5840 bytes Wireshark shows! Where is the missing in-flight data? Using the snd_nxt shown on the same tcpprobe line, we find the corresponding packet in Wireshark:
There are 570 packets between packet No. 3633 and packet No. 3063; adding the 4 packets already counted at No. 3063 gives 574 to 575 packets, yet tcpprobe shows 732 packets in flight. Where are the rest? They are missing because, while tcpdump was capturing, the packet socket could not keep up, its buffer filled, and the kernel discarded packets. This is what the "142 packets dropped by kernel" message at the end of the capture means.
To understand the cause clearly, you need to know where each capture point sits. tcpprobe shows the view from the TCP stack, while tcpdump captures at the boundary of the actual NIC, and the two are separated by the qdisc, that is, queue management. TCP really did send out 732 segments, so it considers them in flight, but the data had not reached the network; it had only reached the qdisc queue. Because the test runs on a gigabit network within one network segment, the transmission delay can basically be ignored, so the in-flight value seen in the tcpdump capture is only what is in flight between the qdisc and the receiver. From an end-to-end point of view the qdisc is indeed part of the sky, but for tcpdump, which is attached to the NIC, the qdisc is just an island. I think this is the difference:
That should explain the question above. Next I will bring in a little theory.
3. A little theory: how the congestion window adjusts its rate automatically
The TCP sender sends data based on the feedback signals carried by ACKs:
1). End-to-end packet conservation feedback. Within the limit of the receiver's advertised window, the sender follows a packet conservation policy: it sends as much new data as the receiver has acknowledged. Call this amount E.
2). Network congestion feedback. The sender adjusts the congestion window according to the rate at which ACKs arrive, the RTT computed from their fields, and the amount of data acknowledged, and applies additive-increase, multiplicative-decrease logic to ensure convergence. Within the congestion window it sends as much of the data allowed by the advertised window as it can. Call this amount W.
The sender takes the minimum of E and W, the amounts permitted by these two feedback signals, as the amount to send, so that both limits are respected.
Remember that the amount of data sent is not driven by the congestion window but by the peer's advertised window, with the decision fed back to the sender through the ACK clock: on receiving an ACK the sender applies packet conservation and sends as much data as the ACK acknowledged. In theory the sender could send a full advertised window of data, but it has to ask whether the network allows that much, and this is the role of the congestion window, whose growth and shrinkage is a separate process! Its only point of interaction with the other logic is that its value minus the in-flight amount is the amount of data that may still be sent. When that amount exceeds the maximum tolerable burst, the congestion window stops growing. To repeat: the congestion window cannot grow in bursts. This is a feedback system, and only the feedback signal can cause the window to grow. Every congestion control algorithm guarantees that the window increases additively, that is, slowly. One reason is the convergence requirement of a linear control system; the other is that it lets the congestion control system catch a burst and stop the congestion window from growing further, which would aggravate the burst and congest the network. The feedback signals here are the two kinds described at the beginning of this section.
If you have read the Linux TCP implementation, you will know that after an ACK is received, congestion avoidance logic is not run if the amount of data that can be sent exceeds a tolerable burst. Because the window grows slowly, as soon as it grows past what a burst allows it is pulled straight back to where the in-flight amount is. If you haven't read the code, just look at the RFCs, and then ... forget it, just remember this conclusion. I wanted to draw this feedback system as a diagram in the style of control theory, but that felt superfluous, so let's go straight to the packets. See the next section.
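The interaction between the two windows and the burst limit can be sketched as a single ACK handler. This is a simplified model written for illustration, not the Linux code: the name `on_ack`, the constant `MAX_BURST` and its value of 3 segments are assumptions, and loss handling (the multiplicative decrease) is left out.

```python
MAX_BURST = 3  # assumed tolerable burst in segments (illustrative value)


def on_ack(cwnd, ssthresh, awnd, in_flight, acked):
    """Return (new_cwnd, can_send) after an ACK covering `acked` segments.

    All quantities are in segments; `in_flight` is the value before this ACK.
    """
    in_flight -= acked            # acknowledged data has left the network
    room = cwnd - in_flight       # what the congestion window still allows
    if room > MAX_BURST:
        # Sending everything allowed would be a burst: do not grow the
        # window, pull it back to just above the in-flight amount.
        cwnd = in_flight + MAX_BURST
    elif cwnd < ssthresh:
        cwnd += 1                 # slow start
    else:
        cwnd += 1.0 / cwnd        # congestion avoidance, additive increase
    # The amount to send is bounded by both windows.
    can_send = max(0, int(min(cwnd, awnd) - in_flight))
    return cwnd, can_send


# A full window in slow start: one segment acknowledged, two may be sent.
print(on_ack(cwnd=10, ssthresh=100, awnd=100, in_flight=10, acked=1))
# An under-used window: cwnd is pulled back instead of growing.
print(on_ack(cwnd=20, ssthresh=5, awnd=100, in_flight=10, acked=2))
```

Keeping the growth rule and the burst check in separate branches reflects the point made above: window growth is its own process, and it only happens when the sender is actually using the window it has.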
4. Sending and burst sending
Before describing the problem, let me introduce Wireshark's "Statistics > TCP Stream Graphs > Time Sequence (tcptrace)" graph. I don't recommend the Stevens graph because it carries less information. In the tcptrace graph we can see three lines:
1). The line of the highest sequence number the receiver's advertised window can accommodate, shaped like a staircase.
2). The sequence numbers and lengths of the segments sent by the sender, typically drawn as "I"-shaped marks.
3). The line of sequence numbers acknowledged by the receiver, shaped like a staircase.
Here is a simple illustration of these three kinds of lines:
With the axes and annotations in the figure, I think everything is explained. In every case, a line drawn perpendicular to the horizontal axis is a useful line: it represents the window. Next, prepare an environment and some data. Capture a TCP stream, open it in Wireshark, and click "Statistics > TCP Stream Graphs > Time Sequence (tcptrace)". You will see a graph which, from a distance, looks like this:
But after zooming in, it might look like this:
Or it's like this:
What is going on here? Normally, once a TCP stream has reached equilibrium by conventional means, the TCP data and the ACKs generated by that data form a closed loop: ACKs arriving at even intervals should trigger smooth data sending. So why is that not the case? What causes the difference between the two graphs above? Let's keep zooming in. For the graph with bursts of sent segments, we first magnify a part where data is being sent:
Then a part where no data is being sent:
Do you see it? In the part where data is sent, ACKs arrive continuously; in the part where no data is sent, no ACKs arrive! Because TCP sending is ACK-driven, we can conclude that it is the ACKs that cause the data to be sent in bursts. So why do the ACKs arrive in bursts? Given that the sender keeps to smooth pacing, the cause must be an intermediate device that holds back the packet sequence and then releases it in batches. Packets reach the receiver in batches, the receiver generates ACKs in batches, the ACKs trigger new data in batches, and the cycle repeats!
For specifics you can look into the details of traffic shaping, which this article does not cover. The simplest way to reproduce the effect is to set a delay. Because TCP begins with slow start, the first flight of data is only the initial window, a few segments; the sender then waits for the ACKs of that data before sending the next batch. A uniform delay preserves this pattern, and it persists even when the data already fills the network. In fact, a uniform delay acts as a time wall, so simply setting a uniform delay cannot truly simulate a "long fat pipe". Packets in a long fat pipe also experience a uniform delay, but there the delay stretches out over time, whereas the uniform delay set by tc blocks time. So setting a uniform delay does not behave like a long fat pipe, except that you will see the window grow large.
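The batching effect of an intermediate device can be illustrated with a toy model. Everything here is assumed for illustration: the device releases held packets only at multiples of a fixed period, the return path adds a fixed delay, and times are in milliseconds.

```python
import math


def batch_release(arrivals, period):
    """Hold each packet until the next multiple of `period`, then release it."""
    return [math.ceil(t / period) * period for t in arrivals]


def gaps(times):
    """Intervals between consecutive events."""
    return [b - a for a, b in zip(times, times[1:])]


# The sender paces one segment per millisecond.
sent = list(range(1, 9))                      # 1, 2, ..., 8 ms
# A hypothetical device releases packets every 4 ms.
released = batch_release(sent, 4)
# Each released packet produces an ACK that reaches the sender 10 ms later.
acks_at_sender = [t + 10 for t in released]

print(gaps(sent))            # evenly spaced sends
print(gaps(acks_at_sender))  # ACKs arrive in clumps separated by silence
```

Because every ACK clocks out new data, the clumped ACK arrivals in the second list become the sender's next send times, which is how a smoothly paced stream turns into bursts after one round trip.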
5. How to read network congestion from the RTT
Now let's look at the effect of RTT on data transmission. This theory is already commonplace, so I will not repeat it. The conclusions are as follows:
1). The RTT does not change gradually (linearly) but steeply (exponentially), which is determined by the nature of network queuing;
2). It is the rate of change of the RTT, not the RTT itself, that describes the state of the network. The RTT itself may be intrinsic to the path; what needs attention is its rate of change;
3). Observing the rate of change of the RTT should start from the following two aspects:
A. Take the derivative of the RTT, or even the second derivative, find the inflection points, and predict when the queue will drain and when queuing will resume. Once the event has happened, by the time TCP receives the feedback it is probably too late to react;
B. When the RTT is flat, loss caused by noise can be filtered out. If the TCP sender can capture this signal, it will not need to reduce its window when it retransmits.
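A minimal sketch of point A, assuming RTT samples taken at regular intervals: finite differences stand in for the derivatives, and an inflection point is reported where the second difference changes sign. The function names are hypothetical, and a real trace would need smoothing first, since raw RTT samples are noisy.

```python
def differences(xs):
    """First finite difference of a sequence."""
    return [b - a for a, b in zip(xs, xs[1:])]


def inflection_points(rtts):
    """Return indices into `rtts` where the second difference changes sign."""
    d2 = differences(differences(rtts))
    points = []
    for i in range(1, len(d2)):
        if d2[i - 1] * d2[i] < 0:
            points.append(i + 1)   # d2[i] is centred on rtts[i + 1]
    return points


# RTT samples in ms: growth accelerates, then slows down and flattens.
samples = [10, 11, 13, 17, 20, 21, 21]
print(differences(samples))        # rate of change of the RTT
print(inflection_points(samples))  # where acceleration turns into deceleration
```

In this sample the RTT keeps rising after the reported index, but more slowly, which is the kind of early signal the list above argues a sender should look for instead of waiting for the RTT itself to peak.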
Below is a comparison of the time/sequence-number graph and the RTT/sequence-number graph from Wireshark:
6. When ACKs are lost
Finally, let's look at what happens when ACKs are lost. People often find it hard to imagine that an ACK can be lost, or what the consequences would be, because attention is always on the active part, such as the sending of TCP data. ACKs are quite likely to be lost. An ACK is not itself acknowledged, so it is never retransmitted; but the TCP receiver remembers how far it has received, so it can give the correct ACK at any time. The question is what this means for the sender.
The ACK is the sender's clock!
Consider a run of consecutive ACKs being lost, followed by one ACK that acknowledges a large chunk of data. Because the earlier ACKs were lost, the sender receives no clock feedback for a long time and cannot send. But, as with a bursty token bucket, the data that should have been sent but was not may accumulate. Once an ACK arrives that acknowledges a lot of data, TCP would like to treat it as a pile of accumulated tokens, to make up for the idle window period caused by the ACK loss. Doing so is problematic, however, because it can cause a burst. TCP therefore leaves the choice of policy to the user. There are two choices: feedback based on the amount of data the ACKs acknowledge, or feedback based simply on the number of ACK packets. The former is finer-grained, but the latter reflects the real state of the network more faithfully.
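The difference between the two policies can be shown with a small sketch. It assumes slow start, counts in segments, and uses a hypothetical function name; it is meant only to show how differently the two policies treat one ACK that covers many segments.

```python
def grow_slow_start(cwnd, acked_per_ack, by_data):
    """Grow cwnd (in segments) in slow start over a series of ACKs.

    acked_per_ack -- number of segments each arriving ACK acknowledges
    by_data       -- True: credit the amount of data acknowledged;
                     False: credit one segment per ACK packet received
    """
    for acked in acked_per_ack:
        cwnd += acked if by_data else 1
    return cwnd


no_loss = [1] * 8      # eight ACKs, each acknowledging one segment
acks_lost = [8]        # seven ACKs lost, the eighth acknowledges all eight

print(grow_slow_start(10, no_loss, by_data=True))     # both policies agree
print(grow_slow_start(10, no_loss, by_data=False))
print(grow_slow_start(10, acks_lost, by_data=True))   # window jumps at once
print(grow_slow_start(10, acks_lost, by_data=False))  # window grows by one
```

With no ACK loss the two policies give the same window. After the loss, counting data restores the full window in one step and invites a burst, while counting ACK packets stays conservative because it only reacts to feedback that actually arrived.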
It is worth noting that a considerable share of all packets on the Internet are bare ACKs carrying no data. As control signalling, they travel over the Internet's links in the same way as the data itself.
Impact of traffic shaping, latency, and ACK loss on timing of TCP sends