To improve network transmission efficiency and reduce page load time, we optimized our network links a year ago, mainly by tuning Linux kernel parameters such as the congestion window and slow start to suit our network conditions and applications. Someone asked me about this two days ago, so I will explain it again here.
TCP uses a series of congestion control mechanisms to prevent network congestion. The TCP congestion control originally proposed by V. Jacobson in his 1988 paper consisted of "slow start" and "congestion avoidance". Later, TCP Reno added the "fast retransmit" and "fast recovery" algorithms, and TCP NewReno then improved "fast recovery". More recently, the selective acknowledgment (SACK) algorithm has appeared, along with improvements in many other areas, and congestion control has become a hot topic in network research.
TCP congestion control relies mainly on a congestion window (cwnd). As discussed earlier, TCP also has a receive window advertised by the peer (rwnd), which is used for flow control. The window size is the maximum amount of data that can be sent without yet having been acknowledged. Obviously, the larger the window, the faster data is transmitted, but the more likely it is to cause network congestion. If the window is 1, TCP degenerates into a stop-and-wait protocol: after sending one packet, the sender can send the next only after the peer acknowledges the first, so transmission efficiency is clearly low. The congestion control algorithm balances the two, choosing the best cwnd so that network throughput is maximized without causing congestion.
Because both congestion control and flow control must be taken into account, the real TCP send window is min(rwnd, cwnd). However, rwnd is determined by the peer and is not affected by the network environment, so when we consider congestion we generally ignore rwnd and, for now, discuss only how the value of cwnd is determined. In TCP, cwnd is measured in bytes. If we assume that TCP always sends data in MSS-sized segments, we can think of cwnd as a number of packets, so when we say that cwnd increases by 1, we mean that it increases by one MSS worth of bytes.
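As a small illustration of the two points above, the sketch below computes the effective send window and converts a byte-valued cwnd into segments. The function names and the MSS value of 1460 bytes are assumptions chosen for the example, not something TCP mandates.

```python
def effective_window(rwnd: int, cwnd: int) -> int:
    """Bytes the sender may have outstanding (sent but not yet ACKed)."""
    return min(rwnd, cwnd)


def cwnd_in_segments(cwnd_bytes: int, mss: int) -> int:
    """Express a byte-valued cwnd as a whole number of MSS-sized segments."""
    return cwnd_bytes // mss


MSS = 1460  # assumed MSS in bytes, typical for Ethernet paths

# The peer advertises 65535 bytes, but congestion control allows 10 segments,
# so the sender is limited by cwnd here.
print(effective_window(rwnd=65535, cwnd=10 * MSS))
print(cwnd_in_segments(10 * MSS, MSS))
```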
Slow start: early TCP would inject a large number of packets into the network as soon as a connection was established. This could easily exhaust router buffers in the network and cause congestion. A newly established connection therefore must not send a large amount of data at the start; it can only gradually increase the amount sent each time, according to network conditions. Specifically, when a new connection is established, cwnd is initialized to one maximum segment size (MSS). The sender transmits according to the congestion window, and each time a segment is acknowledged, cwnd grows by one MSS. As a result, cwnd doubles every round-trip time (RTT), that is, it grows exponentially. Slow start is therefore not actually slow; it just starts from a low point. A simple calculation:
Start-> cwnd = 1
After 1 RTT-> cwnd = 2*1 = 2
After 2 RTTs-> cwnd = 2*2 = 4
After 3 RTTs-> cwnd = 4*2 = 8
If the bandwidth allows a window of W segments, the bandwidth can be fully occupied after about RTT * log2(W).
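The doubling above can be checked with a short sketch. It is an idealized model that assumes no loss and that every segment in a window is acknowledged within one RTT; the function name is made up for this example.

```python
import math


def slow_start_rounds(target_segments: int, initial_cwnd: int = 1) -> int:
    """Count the RTTs needed for cwnd to reach target_segments during slow start."""
    cwnd = initial_cwnd
    rounds = 0
    while cwnd < target_segments:
        # Each ACKed segment adds one MSS, so a fully ACKed window doubles cwnd.
        cwnd *= 2
        rounds += 1
    return rounds


print(slow_start_rounds(8))           # 3, matching the table above
print(slow_start_rounds(1000))        # 10
print(math.ceil(math.log2(1000)))     # 10, the log2(W) estimate
```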
Congestion avoidance: as slow start shows, cwnd can grow quickly to make the most of the available bandwidth. However, cwnd cannot grow without bound and must be limited. TCP uses a variable called the slow start threshold (ssthresh). When cwnd exceeds this value, slow start ends and the connection enters the congestion avoidance phase. In most TCP implementations, ssthresh starts at 65536 (also in bytes). The main idea of congestion avoidance is additive increase: cwnd no longer rises exponentially but grows additively. Once all the packets in a window have been acknowledged, cwnd increases by 1, so cwnd grows linearly with the RTT. This avoids the congestion that overly fast growth would cause and lets cwnd climb gradually toward the best value for the network.
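The switch from exponential to linear growth can be sketched as follows. This is a simplified per-RTT model counted in segments; capping cwnd at ssthresh during the last slow start round is a simplification of this sketch, and the function name is hypothetical.

```python
def cwnd_trace(rtts: int, ssthresh: int, initial_cwnd: int = 1) -> list:
    """Return cwnd (in segments) at the start and after each of `rtts` round trips."""
    cwnd = initial_cwnd
    trace = [cwnd]
    for _ in range(rtts):
        if cwnd < ssthresh:
            # Slow start: double per RTT (capped at ssthresh in this sketch).
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: one extra segment per RTT.
            cwnd += 1
        trace.append(cwnd)
    return trace


print(cwnd_trace(rtts=6, ssthresh=8))  # [1, 2, 4, 8, 9, 10, 11]
```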
The two mechanisms above describe how cwnd behaves while no congestion has been detected. How should cwnd be adjusted once congestion is detected?
First, let's look at how TCP decides that the network is congested. TCP takes the need to retransmit a segment as the main sign of congestion. As mentioned above, TCP keeps a retransmission timer for the segments it has sent; its timeout value is called the RTO. When the RTO expires and the data has still not been acknowledged, TCP retransmits the segment. A timeout means that congestion is likely: a segment was probably lost somewhere in the network, and nothing has been heard about the segments after it either. In this case, TCP reacts "strongly":
1. Reduce ssthresh to half of the current cwnd.
2. Reset cwnd to 1.
3. Restart the slow start process.
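The three steps above can be sketched as a single function, counted in segments. The lower bound of 2 segments on ssthresh follows the standard TCP congestion control specification and is not mentioned in the text above; the function name is made up for this example.

```python
def on_retransmission_timeout(cwnd: int) -> tuple:
    """Return (new_cwnd, new_ssthresh) in segments after an RTO expires."""
    # Step 1: halve the threshold (never below 2 segments, an added assumption).
    new_ssthresh = max(cwnd // 2, 2)
    # Step 2: collapse the window to one segment.
    new_cwnd = 1
    # Step 3: since new_cwnd < new_ssthresh, the sender is back in slow start.
    return new_cwnd, new_ssthresh


print(on_retransmission_timeout(16))  # (1, 8)
```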
In general, the TCP congestion window changes according to the AIMD principle, that is, additive increase and multiplicative decrease. This principle does a good job of ensuring fairness between flows: as soon as packet loss occurs, a flow immediately halves its window to back off, leaving enough room for other, newer flows, which ensures overall fairness.
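The fairness claim can be illustrated with a toy model of two flows sharing one link. It assumes both flows see every loss at the same time and react in the same RTT, which real networks do not guarantee; the function name and the numbers are invented for the example.

```python
def aimd_two_flows(x: float, y: float, capacity: float, rounds: int) -> tuple:
    """Evolve the windows of two AIMD flows sharing a link of the given capacity."""
    for _ in range(rounds):
        # Additive increase: each flow grows by one segment per RTT.
        x += 1.0
        y += 1.0
        if x + y > capacity:
            # Multiplicative decrease: the link overflowed, both flows halve.
            x /= 2.0
            y /= 2.0
    return x, y


# Flow x starts late with a tiny window, flow y already holds most of the link.
x, y = aimd_two_flows(1.0, 40.0, capacity=50.0, rounds=200)
print(abs(x - y))  # the gap shrinks toward 0
```

Additive increase leaves the gap between the two windows unchanged, while each halving cuts it in half, so the two windows converge toward an equal share.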
In fact, there is another case in which TCP retransmits: when it receives three duplicate ACKs. A receiver sends an ACK immediately when it receives an out-of-order segment, so the sender uses three duplicate ACKs to infer that a segment was lost and performs fast retransmit right away, without waiting for the timer. Fast retransmit does the following:
1. Set ssthresh to half of cwnd.
2. Set cwnd to the value of ssthresh (some implementations are ssthresh + 3)
3. Enter the congestion avoidance phase again.
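A minimal sketch of how a sender might detect the three duplicate ACKs and apply the steps above. ACKs are modeled as plain integers and windows are counted in segments; both function names are hypothetical.

```python
def find_fast_retransmit(acks, threshold: int = 3):
    """Return the ACK number that triggers fast retransmit, or None."""
    last_ack = None
    duplicates = 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == threshold:
                return ack  # the segment starting at `ack` is presumed lost
        else:
            last_ack = ack
            duplicates = 0
    return None


def after_fast_retransmit(cwnd: int) -> tuple:
    """Return (new_cwnd, new_ssthresh) following steps 1 and 2 above."""
    ssthresh = cwnd // 2
    return ssthresh, ssthresh  # cwnd == ssthresh, so congestion avoidance resumes


# The original ACK 4 followed by three duplicates triggers the retransmission.
print(find_fast_retransmit([2, 3, 4, 4, 4, 4]))  # 4
print(after_fast_retransmit(20))                 # (10, 10)
```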
Later, the "fast recovery" algorithm was added after "fast retransmit". When three duplicate ACKs are received, TCP no longer enters the congestion avoidance phase but the fast recovery phase instead. Fast retransmit and fast recovery are generally used together. The idea behind fast recovery is the "conservation of packets" principle: the number of packets in the network at any one time stays constant, and a "new" packet is sent into the network only when an "old" packet has left it. If the sender receives a duplicate ACK, the TCP ACK mechanism implies that a packet has left the network, so cwnd is increased by 1. If this principle were followed strictly, there would be very little congestion in the network. In fact, the purpose of congestion control is to correct violations of this principle.
Specifically, the main steps of fast recovery are:
1. When three duplicate ACKs are received, set ssthresh to half of cwnd, set cwnd to ssthresh plus 3, and retransmit the lost segment. The reason for adding 3 is that three duplicate ACKs indicate that three "old" packets have left the network.
2. Each time another duplicate ACK is received, increase the congestion window by 1.
3. When an ACK for new data is received, set cwnd to the ssthresh value from step 1. The ACK confirms new data, which means the data behind the duplicate ACKs has been received and the recovery process is over, so the sender can return to its previous state, that is, congestion avoidance.
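The three steps above can be sketched as a small state machine, counted in segments. The class and method names are made up for this example, and window growth outside recovery as well as the actual retransmission are deliberately left out.

```python
class RenoSender:
    """Toy model of the Reno fast retransmit / fast recovery window rules."""

    def __init__(self, cwnd: int, ssthresh: int):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.dup_acks = 0
        self.in_recovery = False

    def on_duplicate_ack(self) -> None:
        if self.in_recovery:
            # Step 2: each further duplicate means one more packet left the network.
            self.cwnd += 1
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            # Step 1: halve ssthresh, inflate cwnd by the three departed packets.
            self.ssthresh = self.cwnd // 2
            self.cwnd = self.ssthresh + 3
            self.in_recovery = True  # the lost segment would be retransmitted here

    def on_new_ack(self) -> None:
        self.dup_acks = 0
        if self.in_recovery:
            # Step 3: deflate the window and return to congestion avoidance.
            self.cwnd = self.ssthresh
            self.in_recovery = False


sender = RenoSender(cwnd=20, ssthresh=64)
for _ in range(3):
    sender.on_duplicate_ack()
print(sender.cwnd, sender.ssthresh)  # 13 10
sender.on_duplicate_ack()
print(sender.cwnd)                   # 14
sender.on_new_ack()
print(sender.cwnd)                   # 10
```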
The fast retransmit algorithm first appeared in the Tahoe release of 4.3BSD, and fast recovery first appeared in the Reno release of 4.3BSD. The combination is also called the Reno TCP congestion control algorithm.
As can be seen, Reno's fast retransmit algorithm is designed for the retransmission of a single segment. In practice, however, one loss event may require many segments to be retransmitted, so a problem arises when multiple segments are lost from one window of data and fast retransmit and fast recovery are triggered. NewReno was introduced for this reason. It modifies Reno's fast recovery slightly so that it can recover from multiple losses within one window. Specifically, Reno exits fast recovery as soon as an ACK for new data arrives, whereas NewReno stays in fast recovery until all the segments that were outstanding in that window have been acknowledged, which further improves throughput.
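The difference in the exit condition can be sketched as follows. Segments are numbered as integers, `ack` is the next segment the receiver expects, and `recover` is the highest segment that had been sent when fast recovery began; the function name and the returned labels are assumptions of this sketch.

```python
def newreno_on_new_ack(ack: int, recover: int) -> str:
    """Classify an ACK for new data that arrives during NewReno fast recovery."""
    if ack > recover:
        # Everything outstanding when recovery started is now acknowledged.
        return "full ACK: exit fast recovery"
    # Only part of the window is acknowledged: another segment was lost too.
    return "partial ACK: retransmit segment %d, stay in fast recovery" % ack


# Segments up to 10 were in flight when the loss was detected.
print(newreno_on_new_ack(ack=6, recover=10))
print(newreno_on_new_ack(ack=11, recover=10))
```

Under plain Reno, the first of these two ACKs would already end fast recovery, and the second loss would have to be detected all over again.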
SACK changes TCP's acknowledgment mechanism. Original TCP acknowledges only data received contiguously, whereas SACK also tells the peer about everything else that has arrived, such as out-of-order data, which makes the sender's retransmissions less blind. For example, if segments 1, 2, 3, 5, and 7 have been received, a normal ACK can only acknowledge up to segment 3 (its ACK number is 4, the next segment expected), while SACK also reports in the SACK option that 5 and 7 have arrived, which improves performance. When SACK is used, the NewReno algorithm may be unnecessary, because the information carried by SACK is enough for the sender to know which segments need to be retransmitted and which do not.
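The example with segments 1, 2, 3, 5, and 7 can be worked through in code. This sketch numbers whole segments starting from 1 and uses inclusive blocks, whereas real SACK options carry byte sequence ranges; both function names are hypothetical.

```python
def ack_with_sack(received) -> tuple:
    """Return (next_expected, sack_blocks) for a set of received segment numbers."""
    segments = sorted(set(received))
    next_expected = 1
    for seg in segments:
        if seg == next_expected:
            next_expected += 1  # extend the contiguous prefix
    blocks = []
    for seg in segments:
        if seg < next_expected:
            continue  # already covered by the cumulative ACK
        if blocks and seg == blocks[-1][1] + 1:
            blocks[-1] = (blocks[-1][0], seg)  # extend the current block
        else:
            blocks.append((seg, seg))  # start a new block
    return next_expected, blocks


def missing_segments(next_expected: int, blocks) -> list:
    """List the holes the sender can infer from a cumulative ACK plus SACK blocks."""
    missing = []
    cursor = next_expected
    for start, end in blocks:
        missing.extend(range(cursor, start))
        cursor = end + 1
    return missing


ack, blocks = ack_with_sack([1, 2, 3, 5, 7])
print(ack, blocks)                    # 4 [(5, 5), (7, 7)]
print(missing_segments(ack, blocks))  # [4, 6]
```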