Principles of TCP/IP reliability: sliding window, congestion window, and TCP window

Source: Internet
Author: User

TCP and UDP sit at the same layer, the transport layer, but the biggest difference between them is that TCP provides a reliable data transmission service. TCP is connection-oriented: two hosts that communicate over TCP must first go through a "call setup" process, transfer data only after that preparation is complete, and end the call when they are done. This is why TCP is much more reliable than UDP. UDP simply sends the data, regardless of whether the recipient is receiving it. UDP itself never confirms or retransmits anything; at best the sender learns of a failure from an ICMP error (for example, port unreachable), which is not guaranteed to arrive.

The mechanisms TCP uses to ensure reliability can be summarized as follows:

  • Application data is divided into blocks of the size that TCP considers best for sending. This is completely different from UDP, where each datagram generated by the application keeps its length. The unit of information that TCP passes to IP is called a segment (see Figure 1-7). In Section 18.4, we will see how TCP determines the segment length.
  • When TCP sends a segment, it starts a timer and waits for the destination to acknowledge receipt. If no acknowledgment arrives in time, the segment is retransmitted. In Chapter 21, we will look at TCP's adaptive timeout and retransmission strategy.
  • When TCP receives data from the other end of the connection, it sends an acknowledgment. This acknowledgment is not sent immediately; it is usually delayed by a fraction of a second, as discussed in Section 19.3.
  • TCP maintains a checksum over its header and data. This is an end-to-end check that detects any change to the data in transit. If a segment arrives with a bad checksum, TCP discards it and does not acknowledge it (so the sender times out and retransmits).
  • Since TCP segments are carried in IP datagrams, and IP datagrams may arrive out of order, TCP segments may arrive out of order too. If necessary, TCP reorders the received data and delivers it to the application layer in the correct order.
  • TCP also provides flow control. Each end of a TCP connection has a finite amount of buffer space. The receiving TCP only allows the other end to send as much data as the receive buffer can hold. This prevents a fast host from overflowing the buffers of a slow host.
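The checksum bullet above can be illustrated with a minimal sketch of the 16-bit ones' complement checksum used by the Internet protocols. The function names are illustrative, and the sketch covers only the bytes it is given; real TCP also includes a pseudo-header taken from the IP header, which is left out here.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones' complement of the ones' complement sum."""
    if len(data) % 2:
        data += b"\x00"                  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                   # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def segment_is_intact(data: bytes, checksum: int) -> bool:
    """Receiver-side check: recompute and compare (simplified)."""
    return internet_checksum(data) == checksum


payload = b"hello tcp"
checksum = internet_checksum(payload)
print(segment_is_intact(payload, checksum))        # unchanged data passes
print(segment_is_intact(b"hellp tcp", checksum))   # a changed byte is detected
```

A receiver that sees a mismatch simply drops the segment and sends no acknowledgment, which leaves recovery to the sender's retransmission timer.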

As this summary shows, TCP maintains reliability through timeout and retransmission. This makes sense: although TCP can also react to various ICMP messages, those messages are not themselves reliable. The most reliable approach is to keep resending a segment until the other party acknowledges it.

Like the UDP header, the TCP header carries a source port number and a destination port number. The TCP header clearly holds more information than the UDP header, however: it provides everything needed for sending and acknowledging data. This is covered in detail on pp. 171-173. The process of sending TCP data can be pictured as follows.

  • The two parties establish a connection.
  • The sender sends a TCP segment to the receiver and waits for the peer to acknowledge it. If no acknowledgment arrives, the sender resends the segment; if it does, the sender sends the next segment.
  • The receiver waits for the sender's segments. If a received segment passes verification, the receiver sends an ACK (acknowledgment) and waits for the next segment. This continues until a FIN (sending finished) is received.
  • The connection is closed.
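The send, wait, and resend loop in the list above can be sketched as a tiny deterministic simulation. This is only an illustration under simplifying assumptions: one segment is outstanding at a time, the function name `stop_and_wait` and the `lost_attempts` parameter are made up for the example, and a "loss" means the data segment never reaches the receiver.

```python
def stop_and_wait(segments, lost_attempts):
    """Send segments one by one, retransmitting until each is acknowledged.

    lost_attempts is a set of (segment_index, attempt_number) pairs that the
    simulated network drops. Returns (delivered_segments, transmissions).
    """
    delivered = []
    transmissions = 0
    for index, segment in enumerate(segments):
        attempt = 0
        while True:
            attempt += 1
            transmissions += 1
            if (index, attempt) in lost_attempts:
                continue              # timer expires without an ACK: resend
            delivered.append(segment)
            break                     # ACK received: move on to the next one
    return delivered, transmissions


# The second segment is lost twice before it gets through.
data, count = stop_and_wait(["a", "b", "c"], {(1, 1), (1, 2)})
print(data, count)
```

Real TCP keeps many segments in flight at once; that is what the sliding window discussed later is for.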

To serve a TCP connection, the system may create a new process (or, at the very least, a thread) to carry out the data transfer.


TCP is a connection-oriented protocol, so a connection must be established before either party sends data. This is completely different from the protocols discussed earlier, which only send data; most of them, UDP in particular, do not care whether the data actually arrives. From a programming perspective, UDP is also much simpler to use, because the application does not need to think about how the data is split into segments.

TCP data streams can be divided into two types: interactive data streams and block data streams. Interactive data streams carry control commands, as in rlogin, telnet, and FTP commands. Block data streams carry bulk data; most TCP packets on the network are of this kind.

TCP clearly has different efficiency when transmitting these two types of packets. To improve transmission efficiency, different algorithms should therefore be applied to each.

In short, the guiding principle of TCP transmission is to minimize the number of small packets sent.

Interactive Data Stream over TCP

• Delayed acknowledgment

TCP interactive data streams generally use delayed acknowledgment. When the server receives data from the client, it usually does not send an ACK immediately. Instead, it waits a short time to see whether the local side has data to send back to the client. If so, that data is carried in the same packet as the ACK. The delay is generally 200 ms. Note that this 200 ms timer runs relative to the kernel clock ticks (jiffies), not to the arrival of a packet. Suppose a data packet arrives when the timer has already run for 100 ms: the ACK is then sent after the remaining 100 ms. If data to be sent back becomes available within those 100 ms, the ACK is sent together with that data when the 100 ms are up.

• The Nagle algorithm

The Nagle algorithm is mainly used to prevent the generation of small packets. On a wide area network, a large number of small TCP packets is very likely to cause congestion.

The Nagle algorithm applies to each TCP connection. It requires that a connection have at most one unacknowledged small packet outstanding. No other small packet may be sent until the acknowledgment of that packet arrives. In the meantime TCP collects the small pieces of data and, once the outstanding packet is acknowledged, sends everything it has collected as a single packet.
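The rule of at most one unacknowledged small packet can be sketched as follows. This is a simplified model, not the code of any real stack: the function name `nagle` and the event encoding (an integer is an application write of that many bytes, the string "ack" is an arriving acknowledgment) are assumptions made for the example.

```python
def nagle(events, mss):
    """Return the sizes of the segments put on the wire for a list of events."""
    sent = []              # sizes of transmitted segments
    buffered = 0           # bytes waiting in the send buffer
    small_in_flight = False

    def flush():
        nonlocal buffered, small_in_flight
        while buffered >= mss:               # full-sized segments always go out
            sent.append(mss)
            buffered -= mss
        if buffered and not small_in_flight:  # only one small segment in flight
            sent.append(buffered)
            buffered = 0
            small_in_flight = True

    for event in events:
        if event == "ack":
            small_in_flight = False          # the outstanding segment is confirmed
        else:
            buffered += event                # application wrote `event` bytes
        flush()
    return sent


# Seven one-byte writes would be seven tiny packets without the algorithm.
print(nagle([1, 1, 1, "ack", 1, 1, "ack"], mss=1460))
```

In this run the first byte goes out at once, and the later bytes are coalesced while the acknowledgment is pending.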

Sometimes the Nagle algorithm has to be disabled, especially in interactive environments with strict latency requirements, where every small packet must be sent as soon as possible.

A program can turn the Nagle algorithm off by setting the TCP_NODELAY socket option.
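As a minimal sketch, this is how the option is set through Python's standard `socket` module. The helper name `make_low_latency_socket` is made up for the example; the socket is created but not connected.

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with the Nagle algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero value turns TCP_NODELAY on, so small writes are sent at once.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_low_latency_socket()
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
sock.close()
```

The option is per socket, so a program can disable the algorithm on an interactive control connection while leaving it enabled on bulk transfer connections.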


TCP block data stream

Several topics relate to TCP block data streams, such as flow control, urgent data transmission, and window size adjustment.

• Normal data flow

TCP usually does not acknowledge every arriving data segment; one ACK can acknowledge several segments, and typically one ACK is sent for every two segments. The usual reason is as follows. After the first segment arrives, the connection is marked as having an outstanding delayed acknowledgment. When the second segment arrives, the connection has two unacknowledged segments, so TCP sends an ACK immediately. When the third segment arrives, the connection again has a delayed acknowledgment pending; the fourth segment arrives before the delay timer expires, so an ACK is sent, and the cycle repeats, with one ACK acknowledging two segments. Of course, when ACKs are generated depends closely on when segments arrive, that is, on how fast the client sends data, on network congestion, and on the processing capacity of the client and the server. It is always determined by several factors together.


• TCP sliding window protocol

TCP uses the sliding window protocol for flow control. Note that a sliding window is an abstract concept. It belongs to a TCP connection and has a direction: a connection has two sliding windows, one for each direction of data transfer, rather than one for each end of the connection.

When the left edge of the window moves to the right, the window is said to close; this means that sent data has been acknowledged. When the right edge moves to the right, the window is said to open; this means that the user-space process has read the data and the buffer has been freed. If an ACK would move the left edge to the left, it is a duplicate ACK and should be discarded. When the right edge moves to the left, the window is said to shrink; in general, nobody does this.

When the left and right edges coincide, the window size is 0. The sender must then stop sending, because the receiver's buffer is full and the user process has not yet read the data. After the user process reads the data, the receiver should send an ACK announcing that the receive window has reopened; its acknowledgment number is the same as that of the earlier ACK that advertised a window of 0.

Similarly, in an implementation the sender does not have to send a full window of data, although it may. An ACK slides the window to the right, and the window size may decrease at the same time. The receiver does not have to wait until the window is full (that is, until it reaches 0) before sending an ACK; many implementations send an ACK as soon as two segments have been received.
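The window movements described above can be sketched with a small sender-side bookkeeping class. The names (`SendWindow`, `snd_una`, `snd_nxt`) are illustrative, sequence numbers are plain integers without wraparound, and the class tracks offsets only, not the data itself.

```python
class SendWindow:
    """Sender-side view of one direction of a TCP connection (simplified)."""

    def __init__(self, window: int):
        self.snd_una = 0        # left edge: oldest unacknowledged byte
        self.snd_nxt = 0        # next byte to send
        self.window = window    # window size advertised by the receiver

    def usable(self) -> int:
        """Bytes that may still be sent before the right edge is reached."""
        return max(0, self.snd_una + self.window - self.snd_nxt)

    def send(self, nbytes: int) -> int:
        """Send as much as the window allows; return the bytes actually sent."""
        allowed = min(nbytes, self.usable())
        self.snd_nxt += allowed
        return allowed

    def on_ack(self, ack: int, window: int) -> bool:
        """Apply an ACK; return False if it is stale or acknowledges unsent data."""
        if ack < self.snd_una or ack > self.snd_nxt:
            return False        # would move the left edge left: ignore
        self.snd_una = ack      # left edge moves right: window closes
        self.window = window    # right edge = ack + window: may open
        return True


w = SendWindow(window=4096)
print(w.send(6000))             # only one window's worth goes out
w.on_ack(4096, 0)               # receiver buffer full: zero window
print(w.usable())
w.on_ack(4096, 4096)            # window update with the same ACK number
print(w.usable())
```

The last two calls mirror the zero-window case in the text: the second ACK carries the same acknowledgment number as the first and differs only in the advertised window.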


• Adjusting the TCP window size

The size of the TCP window is usually determined by the receiving end; it is advertised in the window (Win) field, for example in the SYN+ACK segment, the second segment of TCP connection establishment.

A program can change the size of the window (buffer) at any time. A traditional default window size is 4096 bytes, which is not ideal for file transfer. If the main purpose of the program is to transfer files, it is best to make this buffer as large as possible. The sender may then receive an ACK only after sending several segments in a row. There is nothing wrong with that: as long as no timeout occurs, it is not an error.
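A minimal sketch of adjusting the buffers through Python's standard `socket` module follows. The helper name `socket_with_buffers` and the 64 KB figure are illustrative choices; the operating system may round, double, or cap the requested value, so the example reads the result back instead of assuming it.

```python
import socket


def socket_with_buffers(size: int) -> socket.socket:
    """Create a TCP socket and request send/receive buffers of `size` bytes."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set the buffers before connecting, so the size can influence the
    # window advertised during connection establishment.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    return sock


sock = socket_with_buffers(64 * 1024)
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))  # value chosen by the OS
sock.close()
```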

• TCP PUSH flag

PUSH is a flag in the TCP header that the sender can set when sending data. It tells the receiver to pass all the data it has received to the receiving process. That data includes both the data carried in the segment with the PUSH flag and any data sent earlier for that process.

When the server receives such data, it passes the data to the application-layer process immediately, instead of waiting for more data to arrive.

So does the application need to set the PUSH flag itself? In practice, current TCP stacks handle this on their own rather than leaving it to the application layer. If sending a segment empties the send buffer, the stack automatically sets the PUSH flag on that segment. BSD-derived stacks generally do this. A BSD-derived stack also never delays delivering received data to the application, so it ignores a received PUSH bit, which serves no purpose there.


• TCP slow start (congestion window)

TCP is very efficient on a LAN, but the situation is different on a WAN. There may be several routers and slow links between the sender and the receiver, and intermediate routers must queue packets and may also fragment them. TCP efficiency may therefore suffer on a WAN.

To deal with this, current TCP stacks support the "slow start" algorithm, that is, congestion window control. The algorithm works from the observation that the rate at which new packets are injected into the network should match the rate at which the other end returns ACKs. In effect, the congestion window is flow control imposed by the sender.

Slow start adds a congestion window to the TCP sender. When the connection is established, the congestion window is initialized to the size of one segment; each time an ACK is received, the congestion window grows by one segment. The sender uses the minimum of the congestion window and the advertised window as its upper limit for sending.
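The growth rule in the paragraph above can be sketched per round trip. This is an idealized model: windows are counted in segments rather than bytes, every segment is acknowledged individually, nothing is lost, and the function name `slow_start` is made up for the example.

```python
def slow_start(rwnd_segments: int, rounds: int) -> list[int]:
    """Return the number of segments the sender may send in each round trip."""
    cwnd = 1                                  # one segment when the connection starts
    sent_per_round = []
    for _ in range(rounds):
        window = min(cwnd, rwnd_segments)     # upper limit for sending
        sent_per_round.append(window)
        cwnd += window                        # one more segment per ACK received
    return sent_per_round


print(slow_start(rwnd_segments=16, rounds=6))
```

Once the congestion window exceeds the advertised window, the advertised window becomes the limit and the amount sent per round trip stops growing.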


• TCP block data throughput

The TCP window size, window-based flow control, and slow start act together on TCP block data transfer, and their combined effect on throughput can be unexpected.

RTT (round-trip time) is the time from sending a segment to receiving the ACK for that segment. The RTT of a segment generally depends on two factors: propagation delay and transmission delay.

During a transfer, the "pipe" between the two ends of the TCP connection can fill up, that is, data is in flight along the entire pipe. At that point, however large the congestion window and the advertised window are, the pipe cannot hold any more data. Whenever the receiver removes a segment from the network, the sender puts one in, and the number of ACKs in the pipe stays constant. This is the ideal steady state of the connection.

In general, bandwidth * delay is the capacity of a line: capacity (bits) = bandwidth (bits/s) * RTT (s). Increasing either the bandwidth or the RTT therefore increases the capacity, that is, the amount of data that must be in flight to keep the pipe full. Increasing the bandwidth also shortens the time needed to transmit each segment.
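The capacity formula can be turned into a short calculation. The link figures below (a 1.544 Mbit/s line with a 60 ms round trip) are example values chosen for illustration, not measurements, and the function name is hypothetical.

```python
def pipe_capacity_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to keep the pipe full."""
    return bandwidth_bps * rtt_seconds / 8    # divide by 8 to convert bits to bytes


capacity = pipe_capacity_bytes(1_544_000, 0.060)
print(round(capacity))                        # roughly 11.6 KB

# A window smaller than the capacity leaves the pipe partly empty.
window = 4096
print(window < capacity)
```

If the window is smaller than this capacity, the sender has to stop and wait for ACKs before the pipe is full, so throughput is limited by the window rather than by the link.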

When data flows from a large pipe into a small one, congestion may occur. For example, when several input streams arrive at a router whose output bandwidth is smaller than the total bandwidth of those streams, congestion results. This is typically seen at the boundary between a LAN and a WAN. If the sender is on a LAN, does not use slow start, and sends packets as fast as the LAN bandwidth allows, the spacing of the returning ACKs still matches the spacing of packets on the slowest (WAN) link. In addition, because the router forwards packets slowly, it may drop packets.


• TCP urgent mode

TCP provides an "urgent" data transmission mode: one end can tell the other that some urgent data has been placed in the normal data stream, and the receiver decides how to handle it. Urgent mode is signaled by setting the URG flag in the TCP header and setting the urgent pointer to an offset. The urgent pointer points to the last byte of the urgent data (or, in some implementations, to the byte after it).

Many implementations call urgent mode "out-of-band data". Strictly speaking, this is incorrect, because the urgent data travels in the normal data stream.

Urgent mode is used, for example, to abort an FTP data transfer. In general, though, it is not used much.

As for data transfer, using urgent data to carry a large amount of data is obviously not practical. Isn't it simpler and more effective to establish another TCP connection?


========================================================== ======================================


TCP uses a series of congestion control mechanisms to prevent network congestion. The TCP congestion control first proposed by V. Jacobson in his 1988 paper consisted of "slow start" and "congestion avoidance". The TCP Reno version later added the "fast retransmit" and "fast recovery" algorithms, and TCP NewReno then improved "fast recovery". More recently, the selective acknowledgment (SACK) algorithm appeared, along with improvements in many other areas, and the topic has become a focus of network research.

TCP congestion control relies mainly on a congestion window (cwnd). As discussed earlier, TCP also has a receive window advertised by the peer (rwnd), which is used for flow control. The window value is the maximum amount of data that may be sent without yet having been acknowledged. Clearly, the larger the window, the faster data can be sent, but also the more likely the network is to become congested. If the window value is 1, TCP degenerates into a stop-and-wait protocol: after each segment is sent, the next one can be sent only after the other party acknowledges it, and transmission efficiency is obviously low. The TCP congestion control algorithm balances the two, choosing the best cwnd value so that network throughput is maximized without causing congestion.

Because both congestion control and flow control must be taken into account, the real sending window of TCP is min(rwnd, cwnd). However, rwnd is determined by the peer and is not affected by network conditions. When considering congestion we therefore generally leave rwnd aside and, for the moment, discuss only how the value of cwnd is determined. In TCP, cwnd is measured in bytes.
We assume that TCP always sends data in units of the MSS, so cwnd can be thought of as a number of segments; saying that cwnd increases by 1 is then equivalent to saying that it grows by one MSS in bytes.

Slow start: the earliest TCP implementations injected a large number of segments into the network as soon as a connection was established. This could easily exhaust the buffer space of routers in the network and cause congestion. A newly established connection therefore must not send a large number of segments at the start; it may only increase the amount of data sent gradually, according to network conditions. Specifically, when a new connection is established, cwnd is initialized to one maximum segment size (MSS). The sender starts sending according to the congestion window size, and each time a segment is acknowledged, cwnd grows by one MSS. In this way, the value of cwnd grows exponentially with the round-trip time (RTT). Slow start is in fact not slow at all; its starting point is merely low. A simple calculation:

  start       ---> cwnd = 1
  after 1 RTT ---> cwnd = 2*1 = 2
  after 2 RTT ---> cwnd = 2*2 = 4
  after 3 RTT ---> cwnd = 4*2 = 8

If the bandwidth corresponds to a window of W segments, the bandwidth can be fully used after about RTT * log2(W).

Congestion avoidance: as slow start shows, cwnd can grow rapidly in order to make the most of the available bandwidth. However, cwnd cannot grow without bound and must be limited. TCP uses a variable called the slow start threshold (ssthresh). When cwnd exceeds this value, slow start ends and the congestion avoidance phase begins. For most TCP implementations, the ssthresh value is 65536 (also in bytes).
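The doubling arithmetic in the slow start discussion above can be checked with a few lines. The sketch assumes the idealized case used in the text: cwnd is counted in segments, it doubles every round trip, and nothing is lost. The function name is made up for the example.

```python
def rtts_to_reach(window_segments: int) -> int:
    """Round trips of slow start needed before cwnd reaches the given window."""
    rounds = 0
    cwnd = 1                       # slow start begins with one segment
    while cwnd < window_segments:
        cwnd *= 2                  # cwnd doubles once per round trip
        rounds += 1
    return rounds


for target in (8, 44, 1024):
    print(target, rtts_to_reach(target))
```

The result is log2(W) rounded up to a whole number of round trips, which is why slow start reaches even a large window within a handful of RTTs.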
The main idea of congestion avoidance is additive increase: cwnd no longer grows exponentially but grows additively. When all the segments in a window have been acknowledged, cwnd is increased by 1, so that cwnd grows linearly with the RTT. This avoids the congestion that overly fast growth would cause and lets cwnd approach the optimal value for the network gradually.

The two mechanisms discussed so far do not cover what happens once congestion is detected. How should cwnd be adjusted when congestion is found? First, consider how TCP decides that the network is congested. TCP takes the need to retransmit a segment as the main sign of congestion. As mentioned above, TCP keeps a retransmission timer for each segment, whose timeout value is called the RTO. When the RTO expires and the data has still not been acknowledged, TCP retransmits the segment. A timeout means that congestion is likely: a segment has probably been lost somewhere in the network, and nothing has been heard about the segments that followed it either. In this case, TCP reacts "strongly":

  1. Reduce ssthresh to half the value of cwnd.
  2. Reset cwnd to 1.
  3. Restart the slow start process.

In general, the TCP congestion window changes according to the AIMD principle: additive increase, multiplicative decrease. This principle helps to ensure fairness between flows, because as soon as packet loss occurs, a flow immediately halves its window and backs off, leaving enough room for other, new flows.

There is another case in which TCP retransmits: when it receives three identical ACKs. TCP sends an ACK immediately whenever it receives an out-of-order segment, and the sender uses three identical ACKs to conclude that a segment has been lost. It then performs fast retransmit, which does the following:

  1. Set ssthresh to half of cwnd.
  2. Set cwnd to the value of ssthresh (in actual implementations, ssthresh + 3).
  3. Enter the congestion avoidance phase again.

Later, the "fast recovery" algorithm was added after "fast retransmit". When three duplicate ACKs are received, TCP does not enter the congestion avoidance phase but enters the fast recovery phase instead. Fast retransmit and fast recovery are generally used together. The idea behind fast recovery is the "conservation of packets" principle: the number of packets in the network at any one time is constant, and a "new" packet may be sent into the network only when an "old" packet has left it. When the sender receives a duplicate ACK, the TCP ACK mechanism implies that one packet has left the network, so cwnd is increased by 1. If this principle were followed strictly, congestion in the network would be rare; in fact, the purpose of congestion control is to correct violations of this principle.

Specifically, the main steps of fast recovery are:

  1. When three duplicate ACKs are received, set ssthresh to half of cwnd, set cwnd to ssthresh plus 3, and retransmit the lost segment. The reason for adding 3 is that three duplicate ACKs indicate that three "old" packets have left the network.
  2. For each further duplicate ACK received, increase the congestion window by 1.
  3. When the ACK for new data arrives, set cwnd to the value of ssthresh from step 1. This ACK acknowledges new data, which means that the data that caused the duplicate ACKs has been received and the recovery process is over. The connection can return to its previous state, that is, to congestion avoidance.

The fast retransmit algorithm first appeared in the Tahoe version of 4.3BSD, and fast recovery first appeared in the Reno version of 4.3BSD; the combination is also called the Reno TCP congestion control algorithm.
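The timeout, fast retransmit, and fast recovery steps above can be put together in one small state-tracking sketch. It is a simplified model rather than the code of any real stack: cwnd and ssthresh are counted in segments, the class and method names are made up, and the lower bound of 2 segments on ssthresh is an added assumption (a commonly used floor) that the text does not mention.

```python
class RenoSender:
    """Congestion window bookkeeping in units of MSS (simplified Reno)."""

    def __init__(self, ssthresh: float = 64.0):
        self.cwnd = 1.0
        self.ssthresh = ssthresh
        self.dup_acks = 0
        self.in_fast_recovery = False

    def on_new_ack(self) -> None:
        self.dup_acks = 0
        if self.in_fast_recovery:
            self.cwnd = self.ssthresh            # recovery over: deflate the window
            self.in_fast_recovery = False
        elif self.cwnd < self.ssthresh:
            self.cwnd += 1.0                     # slow start: +1 per ACK
        else:
            self.cwnd += 1.0 / self.cwnd         # congestion avoidance: about +1 per RTT

    def on_duplicate_ack(self) -> bool:
        """Return True when the lost segment should be retransmitted."""
        self.dup_acks += 1
        if self.in_fast_recovery:
            self.cwnd += 1.0                     # another old packet left the network
            return False
        if self.dup_acks == 3:
            self.ssthresh = max(self.cwnd / 2, 2.0)   # floor of 2 is an assumption
            self.cwnd = self.ssthresh + 3.0
            self.in_fast_recovery = True
            return True
        return False

    def on_timeout(self) -> None:
        self.ssthresh = max(self.cwnd / 2, 2.0)
        self.cwnd = 1.0                          # start again from slow start
        self.dup_acks = 0
        self.in_fast_recovery = False


s = RenoSender(ssthresh=8.0)
for _ in range(7):
    s.on_new_ack()                               # slow start: cwnd grows from 1 to 8
print(s.cwnd)
print([s.on_duplicate_ack() for _ in range(3)])  # third duplicate triggers retransmit
print(s.ssthresh, s.cwnd)
```

Calling `on_timeout()` at any point applies the stronger reaction described earlier: ssthresh is halved and cwnd drops back to 1.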
As this shows, the Reno fast retransmit algorithm is designed for the retransmission of a single segment. In practice, however, one retransmission timeout may lead to the retransmission of many segments, so a problem arises when several segments are lost from one window of data and fast retransmit and fast recovery are triggered. NewReno was introduced for this reason. It makes a small change to Reno's fast recovery so that the loss of several segments within one window can be repaired. Specifically, Reno leaves fast recovery as soon as an ACK for new data is received, whereas NewReno leaves fast recovery only after all the segments in that window have been acknowledged, which further improves throughput.

SACK changes the TCP acknowledgment mechanism. The original TCP acknowledges only data that has been received contiguously, whereas SACK reports everything that has been received, including out-of-order data, to the other party.

Which layer does the sliding window protocol belong to?

The sliding window is part of TCP, a transport layer protocol. It is a flow control method that also speeds up data transmission.


Significance of the sliding window algorithm in TCP/IP

1. Reliable transmission of frames over unreliable links (the core function).
2. Preserving the transmission order of frames.
3. In some cases, support for flow control, a feedback mechanism by which the receiver throttles the sender.
