In-depth understanding of TCP protocol

Source: Internet
Author: User


TCP is a connection-oriented transport-layer protocol that provides a reliable data transfer service to the application layer. "Connection-oriented" does not mean there is a real physical connection. It means the two sides must perform a handshake before sending data, so that the receiver knows you are about to send it data. UDP, by contrast, is a connectionless transport-layer protocol and does not provide reliable data transfer. A fitting analogy: UDP transmission is like writing a letter, where the recipient does not know in advance that you are writing to him, while TCP transmission is like making a phone call, where you must wait for the other party to pick up before you can talk.

So how does TCP provide a connection-oriented, reliable service? Before discussing TCP's reliable data transfer, let's look at the simplest transport-layer service, UDP.

1. UDP

  

Source port number / destination port number: same as the port numbers in the TCP header.

Length: the number of bytes in the UDP segment (header plus data).

Checksum: used for error detection, that is, to determine whether any bits in the UDP segment were altered as it traveled from the source to the destination.

How is the checksum calculated?

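The checksum covers three parts: the UDP pseudo-header, the UDP header, and the UDP data. For IPv4, the pseudo-header consists of the 32-bit source IP address, the 32-bit destination IP address, an 8-bit field of zeros, the 8-bit protocol field, and the 16-bit UDP length.

The protocol field is 6 for TCP and 17 for UDP; the UDP length is the total length of the UDP segment (UDP header plus data).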

  • First, prepend the pseudo-header to the UDP segment, set the checksum field in the UDP header to 0, and divide everything into 16-bit words (padding with a zero byte if the total length is odd).
  • Add all the 16-bit words together. Whenever a sum overflows 16 bits, the carry out of the highest bit is wrapped around and added to the lowest bit (end-around carry), for example:
    • 1011 1011 0101 1110 + 1111 1100 1110 1100 = 1 1011 1000 0100 1010
    • The carry 1 is then added back into the lowest bit: 1011 1000 0100 1010 + 1 = 1011 1000 0100 1011.
  • The sum of all the words is a 16-bit number. Its one's complement (every bit inverted) is placed in the checksum field; for the example above, that is 0100 0111 1011 0100.
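The two steps above (end-around carry, then inversion) can be checked with a few lines of Python. This is a minimal sketch that works directly on 16-bit integers; the function names are illustrative, not from any library.

```python
def ones_complement_sum16(words):
    """Add 16-bit words, wrapping any carry out of bit 15 back into bit 0."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total

def checksum16(words):
    """The checksum field holds the one's complement (bitwise inverse) of the sum."""
    return ~ones_complement_sum16(words) & 0xFFFF

# The two words from the worked example in the text.
words = [0b1011_1011_0101_1110, 0b1111_1100_1110_1100]
print(f"{ones_complement_sum16(words):016b}")  # 1011100001001011
print(f"{checksum16(words):016b}")             # 0100011110110100
```

Folding after every addition keeps the running total within 16 bits plus at most one carry, which matches the hand calculation step by step.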

As the UDP header shows, UDP is a very simple transport-layer protocol. At the sending end, it takes data from the application layer, encapsulates it in a UDP segment, and passes the segment to the lower layer for delivery to the receiver; at the receiving end, it takes data from the lower layer and delivers it to the application layer. Along the way, UDP provides only basic error detection: if no error is detected, the data is passed straight to the application layer; otherwise, the segment is simply discarded.
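The sender/receiver pair can be sketched end to end, assuming IPv4 and the pseudo-header layout described earlier. The helper names (`pseudo_header`, `build_udp_segment`, `receiver_accepts`) are hypothetical. The receiver check relies on a property of the one's complement sum: adding the transmitted checksum to everything else yields 0xFFFF when nothing was altered. For brevity the sketch ignores that a received checksum of 0 means "no checksum" in IPv4.

```python
import socket
import struct

def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum with end-around carry (odd length padded with a zero byte)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)
    return total

def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """IPv4 pseudo-header: source address, destination address, zero byte, protocol 17, UDP length."""
    return socket.inet_aton(src_ip) + socket.inet_aton(dst_ip) + struct.pack("!BBH", 0, 17, udp_length)

def build_udp_segment(src_ip: str, dst_ip: str, src_port: int, dst_port: int, payload: bytes) -> bytes:
    length = 8 + len(payload)  # 8-byte UDP header plus data
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)  # checksum field is 0 while summing
    checksum = ~ones_complement_sum(pseudo_header(src_ip, dst_ip, length) + header + payload) & 0xFFFF
    if checksum == 0:
        checksum = 0xFFFF  # a computed zero is transmitted as all ones
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload

def receiver_accepts(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """Sum pseudo-header plus the whole segment, received checksum included; intact data gives 0xFFFF."""
    return ones_complement_sum(pseudo_header(src_ip, dst_ip, len(segment)) + segment) == 0xFFFF

seg = build_udp_segment("192.0.2.1", "192.0.2.2", 5000, 53, b"hello")
print(receiver_accepts("192.0.2.1", "192.0.2.2", seg))  # True
```

Flipping any single bit in `seg` makes `receiver_accepts` return False, which is the point at which UDP would discard the segment.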

Let's take a look at the reliable transmission service provided by TCP:

2. TCP

Source port number / destination port number: used to multiplex and demultiplex data coming from or going to upper-layer applications. What does this mean? The application layer may have many processes, and each of them may send data to, or receive data from, the Internet through the transport layer. When data arrives from the Internet, which application process should the transport layer hand it to? And when data comes down from the application layer, how does the transport layer know which service it belongs to? Both are resolved with port numbers: each network service at the application layer corresponds to a port number that identifies it. The port number is thus the glue that binds the transport layer to the application layer.
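Demultiplexing by port can be pictured as a lookup table from port number to the inbox of whichever process is bound to it. The sketch below is a hypothetical in-memory model, not a real socket API. Note one simplification: it keys on the destination port alone, which is close to how UDP demultiplexes, whereas TCP identifies a connection by the full (source IP, source port, destination IP, destination port) four-tuple.

```python
class PortDemultiplexer:
    """Hypothetical model: deliver each payload to the inbox bound to its destination port."""

    def __init__(self):
        self.inboxes = {}  # port number -> list of payloads received by that process

    def bind(self, port: int) -> list:
        if port in self.inboxes:
            raise OSError(f"port {port} is already in use")
        self.inboxes[port] = []
        return self.inboxes[port]

    def deliver(self, dst_port: int, payload: bytes) -> bool:
        inbox = self.inboxes.get(dst_port)
        if inbox is None:
            return False  # no process bound to this port; a real stack discards the data
        inbox.append(payload)
        return True

demux = PortDemultiplexer()
web_inbox = demux.bind(80)   # e.g. a web server process
dns_inbox = demux.bind(53)   # e.g. a DNS server process
demux.deliver(80, b"GET / HTTP/1.1")
demux.deliver(53, b"dns query")
```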

  

Sequence number and acknowledgment number: used for reliable data transfer.

Receive window field: indicates how much space remains in the receiver's buffer; used for flow control.

Header length field: because the TCP header may contain an options field, its length is variable, so this field specifies the length of the header (in 32-bit words).

Options field: used when the sender and receiver negotiate the maximum segment size (MSS), or as a window scaling factor in high-speed networks. A timestamp option is also defined.

RST, SYN, and FIN bits: used to set up and tear down connections.

PSH bit: When the PSH bit is set, it indicates that the receiver should immediately hand over the data to the upper layer.

URG bit and urgent data pointer: the URG bit indicates that the segment contains data that the sending-side upper-layer entity has marked as "urgent"; the location of the last byte of this urgent data is given by the 16-bit urgent data pointer field. When urgent data is present, TCP must notify the receiving-side upper-layer entity and pass it a pointer to the end of the urgent data.

Checksum field: used for error detection, computed in the same way as the UDP checksum.
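To make the field list concrete, here is a minimal parser for the fixed 20-byte part of a TCP header plus any options, assuming the standard layout (ports, sequence number, acknowledgment number, a 4-bit header length in 32-bit words, the flag bits, window, checksum, urgent pointer). `TcpHeader` and `parse_tcp_header` are illustrative names, and only the six classic flags discussed above are decoded.

```python
import struct
from typing import NamedTuple

# Bit masks of the flag byte (byte 13 of the header).
FLAG_NAMES = {0x01: "FIN", 0x02: "SYN", 0x04: "RST", 0x08: "PSH", 0x10: "ACK", 0x20: "URG"}

class TcpHeader(NamedTuple):
    src_port: int
    dst_port: int
    seq: int
    ack: int
    header_len: int   # in bytes
    flags: frozenset
    window: int       # receive window advertised for flow control
    checksum: int
    urgent_ptr: int
    options: bytes

def parse_tcp_header(segment: bytes) -> TcpHeader:
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    src, dst, seq, ack, offset_byte, flag_byte, window, csum, urg = struct.unpack(
        "!HHIIBBHHH", segment[:20])
    header_len = (offset_byte >> 4) * 4  # header length field counts 32-bit words
    if header_len < 20 or header_len > len(segment):
        raise ValueError(f"invalid header length {header_len}")
    flags = frozenset(name for bit, name in FLAG_NAMES.items() if flag_byte & bit)
    return TcpHeader(src, dst, seq, ack, header_len, flags, window, csum, urg,
                     segment[20:header_len])

# A SYN carrying an MSS option (kind 2, length 4, value 1460), so the header is 24 bytes.
syn = struct.pack("!HHIIBBHHH", 50000, 80, 1000, 0, 6 << 4, 0x02, 65535, 0, 0) + b"\x02\x04\x05\xb4"
hdr = parse_tcp_header(syn)
```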

How does TCP ensure reliable data transfer?

(1) It performs a three-way handshake before sending data, to make sure it can communicate reliably with the receiver. The three-way handshake proceeds as follows:

Initially, both the client and the server are in the CLOSED state. The server calls listen to open a listening socket and enters the LISTEN state. The client then sends a SYN segment with sequence number j and enters the SYN_SENT state. When the server receives the SYN, it enters the SYN_RECV state and replies with a SYN+ACK segment whose acknowledgment number is j + 1 and whose sequence number is k. When the client receives the SYN+ACK, it enters the ESTABLISHED state: from its point of view, it has confirmed that it can communicate with the server, so it may now send data. It sends the server an ACK (which may carry data) with acknowledgment number k + 1. Until the server receives this ACK, the handshake is not complete: the client can already send data, but only inside that ACK, while the server cannot yet send data to the client. Only when the server receives the ACK does it enter the ESTABLISHED state. At that point the three-way handshake is complete, and the client and server can send data to each other over the connection.
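The state transitions and the j + 1 / k + 1 arithmetic can be traced with a toy model. This is a sketch under simplifying assumptions: no segment loss or retransmission, no RST, no simultaneous open, and the final ACK carries no data. `Segment` and `Endpoint` are hypothetical names, and the initial sequence numbers 100 and 300 stand in for j and k.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Segment:
    syn: bool
    ack: bool
    seq: int
    ack_no: Optional[int] = None

class Endpoint:
    def __init__(self, name: str, isn: int):
        self.name = name
        self.isn = isn          # initial sequence number (j for the client, k for the server)
        self.state = "CLOSED"

    def listen(self) -> None:   # passive open (server)
        self.state = "LISTEN"

    def connect(self) -> Segment:  # active open (client): send SYN with seq j
        self.state = "SYN_SENT"
        return Segment(syn=True, ack=False, seq=self.isn)

    def on_segment(self, seg: Segment) -> Optional[Segment]:
        if self.state == "LISTEN" and seg.syn and not seg.ack:
            self.state = "SYN_RECV"
            # SYN+ACK: own sequence number k, acknowledges the client's SYN with j + 1
            return Segment(syn=True, ack=True, seq=self.isn, ack_no=seg.seq + 1)
        if self.state == "SYN_SENT" and seg.syn and seg.ack and seg.ack_no == self.isn + 1:
            self.state = "ESTABLISHED"
            # Final ACK: the SYN consumed one sequence number, so seq is j + 1; ack_no is k + 1
            return Segment(syn=False, ack=True, seq=self.isn + 1, ack_no=seg.seq + 1)
        if self.state == "SYN_RECV" and seg.ack and not seg.syn and seg.ack_no == self.isn + 1:
            self.state = "ESTABLISHED"
            return None
        raise ValueError(f"{self.name}: unexpected {seg} in state {self.state}")

client = Endpoint("client", isn=100)   # j = 100
server = Endpoint("server", isn=300)   # k = 300
server.listen()
syn = client.connect()                 # client -> SYN_SENT
syn_ack = server.on_segment(syn)       # server -> SYN_RECV, ack_no = j + 1
final_ack = client.on_segment(syn_ack) # client -> ESTABLISHED, ack_no = k + 1
server.on_segment(final_ack)           # server -> ESTABLISHED
```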

Why is a three-way handshake necessary?

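The root of the question is that the Internet channel is unreliable, and to transmit data reliably over an unreliable channel, three exchanges are the theoretical minimum. Each side must announce its initial sequence number and have it acknowledged by the other; the server's SYN and its acknowledgment of the client's SYN can travel in a single segment, which leaves three segments in total.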

Suppose only two handshakes were used. After the client sends a SYN segment, two cases can arise:

Case 1: The server receives the SYN and returns an ACK. Whether or not the client receives this ACK, the server considers the connection established and starts sending data to the client. If the client did not receive the ACK, however, it believes no connection exists and discards the data the server sends. Each time the server's transmission times out, it resends the data, so the two sides end up in a deadlock.

Case 2: The client's first connection request segment is not lost but is held up at some network node for a long time, so that it reaches the server only after the connection has already been released. This segment is long obsolete, but the server mistakes it for a new connection request from the client, sends an ACK back, and begins sending data. The client has not issued any new request, so it ignores this ACK and discards the server's data; the server keeps retransmitting whenever its timer expires, again resulting in a deadlock.

 

(2) It ensures data integrity and in-order delivery through acknowledgment and retransmission mechanisms.

TCP views data as an unstructured but ordered stream of bytes. The sequence number of a segment is therefore the byte-stream number of its first byte, and the acknowledgment number in a segment is the sequence number of the next byte the host expects to receive from the other side. For example:

Suppose TCP receives 3000 bytes of data from the application layer and the maximum segment size (MSS) is 1460 bytes. The data must then be split into segments: the first covers bytes 0 to 1459, the second bytes 1460 to 2919, and the third bytes 2920 to 2999. Their sequence numbers are 0, 1460, and 2920 respectively (taking the first byte of the stream as number 0 for simplicity).
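The segmentation arithmetic can be reproduced with a small helper. This sketch assumes, as the example does, that numbering starts at 0 (real connections start from a randomly chosen initial sequence number and wrap modulo 2^32, both ignored here); `segment_stream` is a hypothetical name.

```python
def segment_stream(total_bytes: int, mss: int, initial_seq: int = 0) -> list:
    """Split a byte stream into MSS-sized pieces and return (sequence number, length) pairs."""
    segments = []
    offset = 0
    while offset < total_bytes:
        length = min(mss, total_bytes - offset)  # the last piece may be shorter than the MSS
        segments.append((initial_seq + offset, length))
        offset += length
    return segments

print(segment_stream(3000, 1460))  # [(0, 1460), (1460, 1460), (2920, 80)]
```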

Suppose the server receives the client's first segment, bytes 0 to 1459. The next byte it expects is 1460, so the ACK it returns carries acknowledgment number 1460. If the server then receives the segment with bytes 2920 to 2999 but not the one with bytes 1460 to 2919, it still expects byte 1460 next, so the acknowledgment number in its ACK is again 1460. Because TCP acknowledges bytes only up to the first missing byte, its acknowledgments are cumulative. The receiver keeps the out-of-order bytes and waits for the missing bytes to fill the gap.
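The cumulative acknowledgment rule, including buffering of out-of-order data, fits in a few lines. This is a minimal sketch that tracks only (sequence number, length) pairs rather than actual bytes; the class name is hypothetical.

```python
class CumulativeAckReceiver:
    """Tracks the next expected byte; that value is the acknowledgment number sent back."""

    def __init__(self, initial_seq: int = 0):
        self.next_expected = initial_seq
        self.out_of_order = {}  # seq -> length, held until the gap before it is filled

    def receive(self, seq: int, length: int) -> int:
        if seq + length <= self.next_expected:
            return self.next_expected          # duplicate: already have it, just re-ACK
        if seq > self.next_expected:
            self.out_of_order[seq] = length    # gap before this segment: buffer it
            return self.next_expected          # ACK still points at the first missing byte
        self.next_expected = seq + length      # in-order data advances the edge
        # Absorb buffered segments that now start at or before the new edge.
        for s in sorted(self.out_of_order):
            if s > self.next_expected:
                break
            end = s + self.out_of_order.pop(s)
            self.next_expected = max(self.next_expected, end)
        return self.next_expected

r = CumulativeAckReceiver()
print(r.receive(0, 1460))     # 1460
print(r.receive(2920, 80))    # 1460  (bytes 1460 to 2919 are missing)
print(r.receive(1460, 1460))  # 3000  (gap filled, buffered segment absorbed)
```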

Of course, in a network this complex, even after the three-way handshake has established a connection, there is no guarantee that every segment reaches its destination. Each time the client sends a segment into the network, it keeps a copy in its buffer and discards that copy only after receiving an ACK from the server confirming that the segment arrived. However, if a segment is lost in the network, is corrupted by bit errors, or the server's ACK is lost, the client never receives the ACK. What should it do then? It cannot wait forever.

The client uses a timeout timer so that it never waits indefinitely: it starts a timer when it sends a segment, and if the timer expires before the ACK arrives, it retransmits the segment. But how long should the timeout be? Sending a segment and receiving its ACK takes one round trip, measured as the round-trip time (RTT), so the timeout must be longer than the RTT. And if it was the ACK that was lost, will the server end up with duplicate data when it receives the retransmitted segment? No: the server uses sequence numbers to avoid redundancy. When it receives a duplicate segment, it recognizes from the sequence number that the client timed out without receiving the ACK, discards the duplicate, and returns its latest ACK to the client.
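The text only requires the timeout to exceed the RTT. One widely used way to choose it, standardized in RFC 6298, keeps a smoothed RTT (SRTT) and an RTT variation estimate (RTTVAR) and sets the timeout to SRTT + 4 * RTTVAR, with a floor of 1 second. The sketch below assumes that scheme; the class name is hypothetical, and clock granularity and Karn's rule (never sampling the RTT of a retransmitted segment) are left out.

```python
class RtoEstimator:
    """RFC 6298-style retransmission timeout estimate, in seconds."""
    ALPHA = 1 / 8   # gain for the smoothed RTT
    BETA = 1 / 4    # gain for the RTT variation
    K = 4

    def __init__(self, min_rto: float = 1.0):
        self.srtt = None
        self.rttvar = None
        self.min_rto = min_rto

    def on_rtt_sample(self, r: float) -> float:
        if self.srtt is None:
            # First measurement initializes both estimators.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        return self.rto()

    def rto(self) -> float:
        if self.srtt is None:
            return 1.0  # initial timeout before any RTT has been measured
        return max(self.min_rto, self.srtt + self.K * self.rttvar)

est = RtoEstimator(min_rto=0.0)  # floor disabled to show the raw estimate
for sample in (0.1, 0.2):
    print(round(est.on_rtt_sample(sample), 4))  # 0.3, then 0.3625
```

Because RTTVAR grows when samples fluctuate, the timeout stays comfortably above the RTT on a jittery path and tightens on a stable one.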

(3) TCP provides flow control and congestion control

Flow control is essentially a speed-matching service: the rate at which the sender sends data must match the rate at which the receiving application reads it, to eliminate the possibility of overflowing the receiver's buffer. The receive window field in the TCP header is used to tell the sender how much buffer space (rwnd) remains at the receiver.
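The bookkeeping behind rwnd can be sketched as follows, assuming two receiver-side counters (named here `last_byte_rcvd` and `last_byte_read`, both illustrative): the receiver advertises whatever part of its buffer is not occupied by data that has arrived but that the application has not yet read.

```python
def receive_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Free space in the receive buffer, advertised to the sender as rwnd."""
    buffered = last_byte_rcvd - last_byte_read  # arrived but not yet read by the application
    if not 0 <= buffered <= rcv_buffer:
        raise ValueError("receive buffer bookkeeping is inconsistent")
    return rcv_buffer - buffered

# A 64 KiB buffer holding 30000 unread bytes leaves 35536 bytes of window.
print(receive_window(65536, last_byte_rcvd=50000, last_byte_read=20000))  # 35536
```

If the application stops reading, `last_byte_read` stops advancing and rwnd shrinks toward 0, which is exactly what throttles the sender.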

The congestion control TCP provides is not network-assisted but end-to-end, because the IP layer gives end systems no explicit feedback about network congestion. So how does a TCP sender limit its sending rate, and how does it know whether the path is congested?

As described above, when segments are lost in the network, timeouts can occur and the server can receive duplicate segments. The client is no exception: it can receive duplicate ACKs. We therefore define a loss event as either a timeout or the receipt of three duplicate ACKs from the receiver. When a loss event occurs, the client takes it as an indication of congestion on the path.
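Detecting the "three duplicate ACKs" half of a loss event amounts to counting repeats of the same acknowledgment number; timeouts come from the retransmission timer instead. A minimal sketch, with a hypothetical class name:

```python
class DupAckDetector:
    """Flags a loss event when the same acknowledgment number arrives for the third duplicate time."""
    THRESHOLD = 3

    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_no: int) -> bool:
        if ack_no == self.last_ack:
            self.dup_count += 1
            # Fire exactly once, on the third duplicate (the fourth identical ACK overall).
            return self.dup_count == self.THRESHOLD
        self.last_ack = ack_no   # new data acknowledged: reset the count
        self.dup_count = 0
        return False

d = DupAckDetector()
print([d.on_ack(a) for a in (1460, 1460, 1460, 1460, 2920)])
# [False, False, False, True, False]
```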

The sender maintains a congestion window (cwnd). The amount of unacknowledged data at the sender may not exceed the minimum of cwnd and rwnd (the receive window from flow control, that is, the remaining buffer space at the server). This constraint limits the amount of unacknowledged data at the sender and thereby indirectly limits its sending rate.
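The constraint can be written as in_flight <= min(cwnd, rwnd), where in_flight is the amount of sent but unacknowledged data. A one-function sketch, assuming sender-side counters named `last_byte_sent` and `last_byte_acked` (illustrative names):

```python
def usable_window(last_byte_sent: int, last_byte_acked: int, cwnd: int, rwnd: int) -> int:
    """Bytes the sender may still transmit without exceeding min(cwnd, rwnd) in flight."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cwnd, rwnd) - in_flight)

print(usable_window(10000, 4000, cwnd=8760, rwnd=20000))  # 2760: congestion window is the limit
print(usable_window(10000, 4000, cwnd=20000, rwnd=5000))  # 0: receiver buffer is the limit
```

Whichever window is smaller governs: a roomy receiver does not help on a congested path, and an open path does not help if the receiver's buffer is full.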

In fact, TCP sets the sending rate according to the following principles:

  • A lost segment implies congestion, so the TCP sender should reduce its rate when a segment is lost.
  • An acknowledged segment indicates that the network is delivering the sender's segments to the receiver, so the sender can increase its rate when an ACK arrives for a previously unacknowledged segment.
  • Because the IP layer gives the upper layer no explicit congestion feedback, TCP uses ACKs and loss events as implicit signals to probe the available bandwidth.
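These principles can be illustrated with a deliberately simplified additive-increase/multiplicative-decrease (AIMD) loop: grow cwnd by one MSS for each round trip whose data is acknowledged, and halve it on a loss event. This is only a sketch of the idea in the congestion-avoidance style, not TCP's complete algorithm (slow start and fast recovery are omitted); the function name and the per-RTT event encoding are assumptions.

```python
def aimd(events, mss: int = 1460, initial_cwnd: int = 1460) -> list:
    """Return cwnd (bytes) after each per-RTT event: 'ack' = window acknowledged, 'loss' = loss event."""
    cwnd = initial_cwnd
    history = []
    for event in events:
        if event == "ack":
            cwnd += mss                    # additive increase: probe for more bandwidth
        elif event == "loss":
            cwnd = max(mss, cwnd // 2)     # multiplicative decrease, never below one MSS
        else:
            raise ValueError(f"unknown event {event!r}")
        history.append(cwnd)
    return history

print(aimd(["ack", "ack", "ack", "loss", "ack"]))  # [2920, 4380, 5840, 2920, 4380]
```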

The question, then, is how to set the value of cwnd.

This is done by TCP's congestion-control algorithm, which consists of slow start, congestion avoidance, and fast recovery.

Detailed implementation process of the congestion control algorithm

