TCP network congestion control


1. Internet Overview

TCP, the Transmission Control Protocol, is currently the most widely used transport protocol on the network. The architecture of the entire Internet is based on the connectionless, best-effort packet delivery service provided by the IP protocol. In this architecture, the endpoints must ensure data reliability on their own, and TCP provides that end-to-end reliable transmission. Of course, there is no 100% reliability guarantee on the Internet. Thanks to this contribution, TCP has been the standard transport protocol of the network since it was proposed.

First, let's look at how TCP ensures reliable data transmission. TCP numbers the transmitted data, the numbering increasing with the number of bytes sent, and the TCP receiver sends an ACK to the peer after receiving data. The ACK carries a sequence number N, meaning that all data before N has been received and that the receiver now expects the data with sequence number N. One fact we must keep in mind is that any packet sent from a host into the network may be discarded, due to factors such as limited router processing capacity and link errors. If data or its ACK is discarded, retransmission is required. TCP keeps a timer for every outgoing packet; if the timer expires before an acknowledgement arrives, TCP retransmits the packet. Together, the acknowledgement and timeout-retransmission mechanisms ensure reliable data transmission.

In terms of flow control, because the sender and the receiver do not necessarily have the same data processing capability, TCP uses a flow-control mechanism to avoid sending data faster than the receiver can absorb it: the receiver advertises its receive window in the TCP header, that is, the maximum amount of data it can currently accept, so that TCP never sends beyond the receiver's capacity.

It seems that TCP is already perfect: it provides end-to-end reliability and also considers the receiving capability of the peer. In fact, the initial design of TCP was exactly such a mechanism; for details, see RFC 793. Note that this document is dated 1981, when TCP began to carry data on the Internet. In October 1986, one event opened a new field for TCP: the data throughput from LBL to UC Berkeley dropped from 32 Kbps to 40 bps. For details, see Van Jacobson's paper "Congestion Avoidance and Control"; remember this article, as we will mention it multiple times later. Why did throughput drop so severely? The TCP control mechanism of the time considered only the receiver's capacity while ignoring a very important aspect: the transmission capacity of the network itself. This caused the entire network to collapse. Since then, TCP research has had another direction, congestion control, because congestion control algorithms play a very important role in keeping the Internet stable. Jacobson's paper founded the field of Internet congestion control.

 

2. Overview of Congestion

What is congestion?

When there are too many packets in the network, network performance degrades accordingly; this phenomenon is called congestion. The classic papers illustrate it with two figures, plotting throughput and response time against load:

(Figures omitted in this copy; the knee and cliff points they show are described below.)

When the load is small, throughput grows essentially linearly with load, and the delay (the vertical axis of the second figure, response time) grows slowly. Once the load exceeds the knee point, throughput grows very slowly but latency increases rapidly. When the load exceeds the cliff point, throughput drops sharply and latency rises sharply. The cliff point is the maximum load the network can carry; once it is exceeded, overall performance collapses. When the load is near the knee, network efficiency is highest: throughput is high and response time is short. The idea of congestion control is that the nodes in the network should take measures to keep the load near the knee as much as possible; they must either avoid congestion or respond to it so that the network returns to the knee, maximizing overall performance.

Compared with the TCP flow control described above, we can see that flow control mainly considers the receiving end, ensuring the sender does not exceed the receiver's capacity, while congestion control takes the entire network environment into account, so that the load does not exceed the network's maximum capacity. Obviously, the cause of congestion is that "demand" exceeds "supply": the limited resources in the network are shared by many users, and the network itself cannot restrict particular users based on resource utilization. As the Internet develops, the number of users and applications keeps growing, so if no measures are taken to coordinate resource usage, congestion is inevitable.

In general, a congestion control algorithm includes two aspects: congestion avoidance and congestion recovery. Congestion avoidance is a preventive mechanism that keeps the network out of the congested state, maintaining high throughput and low latency. Congestion control proper is the corresponding recovery mechanism: once the network is congested, it must recover from that state and re-enter the high-throughput, low-latency regime. It sounds easy, but things are not as simple as you might think.

Let's see why congestion control is a difficult task, especially when trying to maximize network utilization at the same time.

The first reason is the Internet model. The Internet is a packet-switched network, and packet switching greatly improves resource utilization (this, incidentally, is why IP telephony is cheap). However, packet switching makes the whole network distributed, with no notion of a connection inside the network, so each node has only incomplete information, and it is very difficult to perform congestion control with incomplete information.

The second is that the network environment is very complex and performance varies greatly across the Internet. For example, the packet loss rate between China Netcom and China Telecom can be very high, and there are bottlenecks throughout the network, so algorithms must adapt well and handle both packet loss and reordering.

The third is the set of performance requirements on the algorithm itself: fairness, efficiency, stability, and convergence. Fairness mainly concerns bandwidth usage: a single connection must not occupy most of the bandwidth and starve other connections. Efficiency means making full use of the bandwidth when it is available, avoiding waste. Stability means the algorithm can run for a long time without the properties above degrading. Convergence means responding quickly to dynamic changes in the network so that the whole network settles into a new equilibrium.

The fourth concern is algorithm overhead. A congestion control algorithm must add as little extra traffic as possible, especially while recovering from congestion. This requires as little communication between nodes as possible, which makes algorithm design very difficult. At the same time, the algorithm must keep its computational cost on network nodes low, otherwise it reduces the nodes' capacity to process other packets.

 

3. TCP Congestion Control Algorithms

TCP provides a series of congestion control mechanisms to prevent network congestion. The original TCP congestion control, proposed by Van Jacobson in his 1988 paper, consisted of "slow start" and "congestion avoidance". Later, the TCP Reno version added the "fast retransmit" and "fast recovery" algorithms, and TCP NewReno subsequently improved "fast recovery". In recent years the selective acknowledgement (SACK) mechanism has appeared, along with improvements in many other aspects, and the area has become a hot topic in network research.

TCP congestion control relies mainly on a congestion window (cwnd). As discussed earlier, TCP also has a receiver-advertised window (rwnd) for flow control. The window value indicates the maximum amount of data that may be sent but not yet acknowledged. Obviously, the larger the window, the faster data is transmitted, but also the more likely the network is to congest. If the window value is 1, the protocol degenerates into stop-and-wait: each packet can be sent only after the previous one is acknowledged, and transmission efficiency is low. The TCP congestion control algorithm balances these two concerns and chooses the best cwnd value, so that network throughput is maximized without causing congestion.

Because both congestion control and flow control must be considered, the real send window of TCP is min(rwnd, cwnd). However, rwnd is determined by the peer and says nothing about the network environment, so when discussing congestion we generally ignore rwnd and consider only how to determine the value of cwnd. In TCP, cwnd is measured in bytes, but if we assume TCP always sends data in MSS-sized segments, we can treat cwnd as counting packets; thus "increasing cwnd by 1" means increasing it by one MSS worth of bytes.

Slow start: if a newly established TCP connection immediately injected a large burst of packets into the network, it could easily exhaust router buffer space and cause congestion. Therefore, a new connection cannot start by sending a large number of packets; it can only increase the amount sent gradually according to network conditions. Specifically, when a new connection is established, cwnd is initialized to one maximum segment size (MSS), and the sender transmits according to the congestion window. Each time a segment is acknowledged, cwnd increases by one MSS. In this way the value of cwnd grows exponentially with the round-trip time (RTT). In fact, slow start is not slow at all; only its starting point is a little low. We can do a simple calculation:

Start ---> cwnd = 1

After 1 RTT ---> cwnd = 2*1 = 2

After 2 RTTs ---> cwnd = 2*2 = 4

After 3 RTTs ---> cwnd = 4*2 = 8

If the bandwidth (in segments per RTT) is W, the pipe can be fully occupied after roughly RTT × log2(W) time.
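The doubling above can be sketched in Python. This is a toy model in MSS units (the function names are ours, not part of any real TCP stack), showing both the per-RTT growth and the number of RTTs needed to fill a pipe of W segments:

```python
import math

def slow_start_growth(rtts, initial_cwnd=1):
    """Simulate slow-start growth of cwnd (in MSS units) over a number of
    round-trip times. Each ACKed segment adds one MSS, so cwnd doubles
    every RTT."""
    cwnd = initial_cwnd
    history = [cwnd]
    for _ in range(rtts):
        cwnd *= 2  # one window of ACKs arrives per RTT -> cwnd doubles
        history.append(cwnd)
    return history

def rtts_to_fill(bandwidth_in_segments):
    """RTTs needed before cwnd reaches the pipe size W (the RTT*log2(W)
    estimate from the text)."""
    return math.ceil(math.log2(bandwidth_in_segments))

print(slow_start_growth(3))   # [1, 2, 4, 8]
print(rtts_to_fill(8))        # 3
```

The list printed matches the calculation in the text: 1, 2, 4, 8 segments after 0, 1, 2, 3 RTTs.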

Congestion avoidance: as slow start shows, cwnd grows rapidly to make maximum use of network bandwidth, but it cannot be allowed to grow without limit. TCP uses a variable called the slow start threshold (ssthresh); when cwnd exceeds this value, the slow start phase ends and the congestion avoidance phase begins. In most TCP implementations the initial ssthresh value is 65536 (also counted in bytes). The main idea of congestion avoidance is additive increase: cwnd no longer grows exponentially but additively. Each time all the packets in a window are acknowledged, cwnd increases by 1, so cwnd grows linearly with RTT, avoiding congestion caused by growing too fast and gradually converging toward the network's optimal value.

The two mechanisms discussed above cannot by themselves detect congestion. How should cwnd be adjusted once congestion is detected?

First, how does TCP decide that the network is congested? TCP treats the need to retransmit a segment as the main signal of congestion. As mentioned above, TCP keeps a timer for each segment, the RTO. When the RTO expires and the data has still not been acknowledged, TCP retransmits the segment. A timeout suggests serious congestion: a segment was probably lost somewhere in the network, and no ACKs for subsequent segments have arrived either. In this case the TCP response is "strong":

1. Reduce ssthresh to half of the current cwnd value.

2. Reset cwnd to 1.

3. Restart the slow start process.
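The timeout reaction and the two growth modes can be sketched together. This is a toy model in MSS units (our own function names, not a real stack); the floor of 2 MSS on ssthresh is a common implementation detail and is an assumption here:

```python
def on_rto_timeout(cwnd, ssthresh):
    """React to a retransmission timeout per the steps above:
    halve ssthresh, collapse cwnd to 1 MSS, restart slow start."""
    ssthresh = max(cwnd // 2, 2)  # multiplicative decrease (floor of 2 MSS assumed)
    cwnd = 1                      # back to a single segment
    return cwnd, ssthresh

def on_ack(cwnd, ssthresh):
    """Grow cwnd on each new ACK: exponentially below ssthresh
    (slow start), additively above it (congestion avoidance)."""
    if cwnd < ssthresh:
        return cwnd + 1           # slow start: +1 MSS per ACK
    return cwnd + 1 / cwnd        # congestion avoidance: ~+1 MSS per RTT

print(on_rto_timeout(16, 32))     # (1, 8)
```

After the timeout, cwnd climbs back exponentially until it reaches the new, halved ssthresh, then switches to linear growth.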

In general, the TCP congestion window follows the AIMD principle: additive increase, multiplicative decrease. This principle helps ensure fairness between flows: once packet loss occurs, a flow immediately halves its window, leaving enough room for other, newer flows, which preserves overall fairness.

In fact, there is another case in which TCP retransmits: receiving three duplicate ACKs. TCP sends an ACK immediately whenever it receives an out-of-order packet, and the sender uses three identical ACKs as the signal of packet loss, at which point it performs a fast retransmit. Fast retransmit does the following:

1. Set ssthresh to half of cwnd.

2. Set cwnd to the value of ssthresh (some implementations use ssthresh + 3).

3. Re-enter the congestion avoidance phase.

Later, the "fast recovery" algorithm was added after "fast retransmit": upon receiving three duplicate ACKs, TCP enters not the congestion avoidance phase but the fast recovery phase. Fast retransmit and fast recovery are generally used together. The idea behind fast recovery is the "packet conservation" principle: the number of packets in the network at any moment is constant, and a "new" packet may be sent only when an "old" packet has left the network. If the sender receives a duplicate ACK, the TCP ACK mechanism indicates that a packet has left the network, so cwnd is increased by 1. If this principle could be followed strictly, congestion would rarely occur; in fact, congestion control exists to correct violations of this principle.

Specifically, the main steps of fast recovery are:

1. Upon receiving three duplicate ACKs, set ssthresh to half of cwnd, set cwnd to ssthresh plus 3, and retransmit the lost segment. The reason for adding 3 is that receiving three duplicate ACKs indicates that three "old" packets have left the network.

2. Each time another duplicate ACK is received, increase the congestion window by 1.

3. When an ACK for new data is received, set cwnd to the ssthresh value from step 1. This ACK confirms new data, indicating that the data covered by the duplicate ACKs has been received and the recovery process is over; TCP returns to its previous state, re-entering congestion avoidance.
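The three steps above can be written as a small state sketch (a toy model in MSS units with hypothetical function names; the floor of 2 MSS on ssthresh is an assumption):

```python
def on_three_dup_acks(cwnd, ssthresh):
    """Step 1: halve ssthresh, inflate cwnd to ssthresh + 3, and
    (in a real stack) retransmit the lost segment."""
    ssthresh = max(cwnd // 2, 2)
    cwnd = ssthresh + 3  # +3: the three dup ACKs mean three "old" packets left the network
    return cwnd, ssthresh

def on_extra_dup_ack(cwnd):
    """Step 2: each further duplicate ACK means one more packet left
    the network, so inflate cwnd by 1."""
    return cwnd + 1

def on_new_data_ack(ssthresh):
    """Step 3: an ACK for new data ends recovery; deflate cwnd back to
    ssthresh and return to congestion avoidance."""
    return ssthresh

cwnd, ssthresh = on_three_dup_acks(16, 64)
print(cwnd, ssthresh)  # 11 8
```

With cwnd = 16, recovery starts at cwnd = 8 + 3 = 11, grows by 1 per extra duplicate ACK, and deflates back to 8 when new data is acknowledged.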

The fast retransmit algorithm first appeared in the 4.3BSD Tahoe release, and fast recovery first appeared in the 4.3BSD Reno release; hence this is also called the Reno TCP congestion control algorithm.

The Reno fast retransmit algorithm works well when a single packet must be retransmitted. In reality, however, one retransmission timeout may involve many lost packets, so when multiple packets are lost from one window of data and fast retransmit/fast recovery is triggered, problems arise. Hence NewReno: it slightly modifies Reno's fast recovery so that it can recover from multiple losses within one window. Specifically, Reno exits fast recovery when an ACK for new data is received, whereas NewReno exits fast recovery only once ACKs covering all the packets in the window have been received, further improving throughput.

SACK changes the TCP acknowledgement mechanism. Original TCP only acknowledges data received contiguously, whereas SACK also tells the peer about out-of-order data that has arrived, reducing the blindness of the sender's retransmissions. For example, if segments 1, 2, 3, 5, and 7 are received, a normal ACK can only acknowledge up to sequence number 4 (the next expected segment), while SACK additionally reports the other received ranges in the SACK option, improving performance. When SACK is used, the NewReno algorithm may be unnecessary, because the SACK information alone gives the sender enough knowledge of which packets need to be retransmitted and which do not.
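The 1, 2, 3, 5, 7 example can be made concrete with a small sketch that computes the cumulative ACK and the SACK blocks. This is a toy model on segment numbers rather than the byte ranges a real stack uses, and the function name is ours:

```python
def ack_and_sack(received):
    """Given a set of received segment numbers (starting at 1), return
    the cumulative ACK (next expected segment) and the SACK blocks
    covering the out-of-order islands beyond it."""
    n = 1
    while n in received:
        n += 1  # cumulative ACK: everything below n has arrived in order
    blocks, start = [], None
    for seq in sorted(s for s in received if s > n):
        if start is None:
            start, end = seq, seq          # open the first island
        elif seq == end + 1:
            end = seq                      # extend the current island
        else:
            blocks.append((start, end))    # close it and open a new one
            start, end = seq, seq
    if start is not None:
        blocks.append((start, end))
    return n, blocks

print(ack_and_sack({1, 2, 3, 5, 7}))  # (4, [(5, 5), (7, 7)])
```

The output matches the text: the cumulative ACK asks for segment 4, while the SACK blocks tell the sender that 5 and 7 are already held, so only 4 and 6 need retransmission.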

For more information, see Van Jacobson's paper and RFC 2001, RFC 2018, RFC 2581, RFC 2582, and RFC 2883.

 

4. Other TCP Congestion Control Algorithms

In 1994, Brakmo proposed a new congestion control mechanism, TCP Vegas, which approaches congestion control from another angle. As we have seen, standard TCP congestion control is loss-based: the congestion window is adjusted only when packet loss occurs. However, packet loss is not necessarily caused by congestion. Since the RTT value is closely related to network conditions, TCP Vegas uses RTT to judge whether the network is congested and to adjust the congestion window. If RTT is found to be growing, Vegas considers the network congested and begins to reduce the window; if RTT shrinks, Vegas considers the congestion to be easing and increases the window again. Because Vegas uses RTT changes rather than packet loss to estimate available bandwidth, it can detect the available bandwidth more accurately and improve efficiency. However, Vegas has a defect that can fairly be called fatal, and it ultimately prevented Vegas from being widely deployed on the Internet: a flow using TCP Vegas competes for bandwidth worse than flows that do not. The reason is that as soon as data is buffered by routers, the RTT grows. If the buffers do not overflow there is no loss, but the queued data adds delay and increases RTT, especially on low-bandwidth paths, where RTT rises sharply as soon as transmission starts; this is particularly visible in wireless networks. In this situation TCP Vegas lowers its congestion window, but as long as there is no loss, standard TCP never lowers its own, so the two are unfair and the Vegas flow's throughput ends up very low.
In fact, if all TCP flows used Vegas congestion control, fairness between flows would be better; the poor competitiveness is not a flaw of the Vegas algorithm itself.
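The Vegas idea of inferring queueing from RTT can be sketched as follows. This is only an illustration of the core update rule, not the full algorithm; the alpha/beta thresholds are the commonly cited defaults and the function name is ours:

```python
def vegas_adjust(cwnd, base_rtt, rtt, alpha=2, beta=4):
    """Vegas-style window adjustment: estimate the number of segments
    queued in the path from the gap between expected and actual
    throughput, then nudge cwnd toward keeping alpha..beta segments
    buffered."""
    expected = cwnd / base_rtt            # throughput if nothing were queued
    actual = cwnd / rtt                   # measured throughput this RTT
    diff = (expected - actual) * base_rtt # estimated segments sitting in queues
    if diff < alpha:
        return cwnd + 1                   # path underused: grow additively
    if diff > beta:
        return cwnd - 1                   # queues building: shrink before loss
    return cwnd                           # in the sweet spot: hold steady

print(vegas_adjust(10, 0.1, 0.1))  # 11 (RTT at baseline -> grow)
print(vegas_adjust(10, 0.1, 0.2))  # 9  (RTT doubled -> shrink)
```

This also makes the fairness problem visible: the moment queues form, RTT exceeds base_rtt and Vegas backs off, while a loss-based competitor keeps growing until the buffer actually overflows.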

Next we introduce limited transmit, an algorithm that helps when the congestion window is small and packets are lost from a transmission window. As mentioned earlier, TCP's fast recovery is triggered only by three duplicate ACKs, and the receiver can generate a duplicate ACK only when an out-of-order packet arrives. What happens when the congestion window is small? Sender and receiver can enter a phase of mutual waiting: the receiver waits for more packets, whose arrival would produce duplicate ACKs, while the sender waits for the third duplicate ACK. For example, if the window is 3 and the first packet is lost, the receiver sends one duplicate ACK for each of the second and third packets, two duplicate ACKs in total. The sender cannot send more data because of the window limit, so both sides wait until the sender's retransmission timer expires, which finally breaks the deadlock. Obviously, efficiency suffers badly in this case, because the retransmission timeout is set to RTT + 4 × RTTVAR, generally a large value.

Limited transmit solves this problem. The method is simple: upon receiving each of the first two duplicate ACKs, the sender checks two conditions:

1) Does the receiver's advertised window, rwnd, permit the transmission of new data?

2) Is the number of packets outstanding in the network less than or equal to cwnd + 2?

If both conditions hold, TCP may send a new data packet. The second condition means that at most two packets beyond the congestion window may be outstanding. Assuming the new packets and their ACKs are not lost, these two new packets generate the third duplicate ACK, so both sides escape the deadlock immediately and the sender enters standard fast recovery. Note that although two new packets may be sent, the value of cwnd must remain unchanged, not increased by 2. Clearly, the limited transmit algorithm is more robust in the face of reordering than waiting for a timeout retransmission.
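The two-condition check can be sketched directly. A toy model in MSS units with a hypothetical function name; the exact form of the receiver-window test is our simplification of "the advertised window permits new data":

```python
def may_limited_transmit(dup_acks, in_flight, cwnd, rwnd):
    """Limited-transmit check: on the first or second duplicate ACK,
    the sender may emit one new segment, provided the receiver's window
    has room and no more than cwnd + 2 segments would be outstanding.
    cwnd itself is NOT increased."""
    return (dup_acks in (1, 2)          # only the first two dup ACKs qualify
            and rwnd > in_flight        # receiver can absorb new data
            and in_flight < cwnd + 2)   # at most 2 segments beyond cwnd

# The deadlock example from the text: cwnd = 3, first packet lost,
# 3 segments in flight, first duplicate ACK arrives.
print(may_limited_transmit(1, 3, 3, 10))  # True: send one new segment
print(may_limited_transmit(3, 3, 3, 10))  # False: third dup ACK triggers fast retransmit instead
```

In the window-of-3 scenario, the two permitted transmissions elicit the missing third duplicate ACK, so fast retransmit fires without waiting for the RTO.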

In addition, the original TCP design generally assumed that reordering in the network was rare, but reordering on the Internet has increased (two papers discuss this in detail: "Packet Reordering Is Not Pathological Network Behavior" and "Measurement and Classification of Out-of-Sequence Packets in a Tier-1 IP Backbone"). TCP mistakes reordering for packet loss, which reduces the sending rate and hurts performance. For an improved algorithm addressing this problem, see "On Making TCP More Robust to Packet Reordering".

There is also the Eifel algorithm; for details, see RFC 3522 and RFC 4015. The Eifel algorithm mainly helps the TCP sender distinguish spurious retransmissions, and it relies on the TCP timestamp option.

Because of the importance of network congestion control, there is a great deal of research on and improvement of TCP congestion control. As for standard TCP congestion control, this is a good place to stop.
