Comprehensive Analysis of TCP protocol intractable diseases (1)

1. Network protocol design

ISO proposed the OSI layered network model, which remained largely theoretical. TCP/IP is the layered protocol model that was actually implemented: each layer contains a set of protocols that together perform a specific set of functions, and the protocols beneath a given layer are reused by everything above it. That reuse is the essence of the layered model. At the very bottom, all of this logic is ultimately encoded onto cables or electromagnetic waves.

The layered model is easy to understand, but designing the protocol for each layer is not so easy. Part of the elegance of TCP/IP is that the higher the layer, the more complex the protocol. We define a network as a set of devices connected to each other; the essence of a network is "end-to-end" communication. Devices that want to communicate do not have to be directly connected, so some intermediate devices must be responsible for forwarding data. The protocol that runs over the cable connecting these intermediate devices is defined as the link-layer protocol. A link simply starts at one device and ends at another device over a single wire; we call one link a "hop". An end-to-end path therefore consists of many "hops".

2. TCP and IP protocols

With the IP protocol alone we can already achieve end-to-end communication, so why do we still need TCP? This is a real question, and once we understand it, we can understand both why the TCP protocol has become so "complex" and why the IP protocol is kept so simple.

As its name indicates, TCP controls transmission, that is, end-to-end transmission. Why is this control not implemented in the IP protocol? The answer is simple: it would increase the complexity of the IP protocol, and what the IP protocol needs above all is to stay simple. Why is that?

First, let's look at why the IP protocol sits at the waist of the hourglass. Beneath it lies a wide range of link-layer protocols, each with completely different semantics. To interconnect these heterogeneous networks, we need a network-layer protocol that provides at least some adaptation between them. At the same time, it must not promise too many "guaranteed services", because any guarantee at an upper layer depends on a stricter guarantee at the lower layer: you can never build an IP layer that guarantees more throughput than the link underneath it can deliver...

The IP protocol is designed as a packet-forwarding protocol: every hop passes through an intermediate node. Routing is another major innovation of the TCP/IP network. It removes any notion of direction or connection from the IP protocol itself, so routing information and the IP protocol are no longer tightly coupled; they are associated only through IP addresses. This makes the IP protocol simpler. A router, being an intermediate node, cannot be too complex, since that is a matter of cost. Routers are therefore responsible only for routing and packet forwarding.

Transmission control therefore has to be implemented at the endpoints. Before going into the details of TCP, we should first look at what it cannot do. Because the IP protocol provides no guarantees, TCP cannot provide guarantees that depend on the links underneath IP, such as bandwidth or latency; those are determined by the link layer. What TCP can do, since the IP layer itself will not be changed, is compensate for the "unguaranteed properties" that originate at the IP layer: the IP layer is unreliable, delivers packets out of order, and has no notion of direction or connection.

To summarize: going up the TCP/IP model from the bottom, each layer adds more features while being implemented on fewer devices. The complexity is pushed toward the endpoints, which keeps overall cost to a minimum; performance and similar concerns are left to software, and TCP is exactly that piece of software. In fact, TCP did not consider performance, efficiency, or fairness at the very beginning; it was precisely taking these into account later that made the TCP protocol complicated.

3. TCP protocol

TCP is a pure software protocol. Why is it implemented only at the two endpoints? See the previous section. This section walks through the TCP protocol itself, with brief discussions interspersed along the way.

3.1. TCP

Strictly speaking, TCP has two identities. As a network protocol, it makes up for the IP protocol's best-effort service by providing connections, reliable transmission, and in-order arrival of data. As a piece of host software, it, together with UDP and the other transport-layer protocols, isolates host applications from the network: these protocols can be viewed as multiplexers/demultiplexers that multiplex and demultiplex the data of host processes onto the IP layer. Seen from either angle, TCP is an interface. As a network protocol, it interfaces with the peer's TCP to implement TCP's control logic; as a multiplexer/demultiplexer, it interfaces with the IP protocol below to implement the protocol-stack behavior. This is exactly what the layered protocol model defines: two kinds of interfaces, one toward the layer below and one toward the peer at the same layer.

We usually treat TCP as the top of the protocol stack and do not consider application-layer protocols part of the stack. This is partly because the application layer, being multiplexed over TCP/UDP, is far more varied: application-layer protocols are interpreted in many different ways and are typically encapsulated in the manner of standards like ASN.1. This reflects the importance of TCP's role as a multiplexer/demultiplexer. And because TCP interfaces directly with applications, it can easily be controlled by them to implement different transmission control policies; this is one of the reasons TCP was designed to sit so close to the application.

In short, TCP has four key goals: connections, reliable transmission, in-order arrival of data, and end-to-end flow control. Note that TCP was originally designed to guarantee only these four points. At that stage it had a few problems but was still very simple; bigger problems soon surfaced, however, and it had to start caring about properties of the IP network as a whole, such as fairness and efficiency, which added congestion control. That is how TCP became what it is today.

3.2. Connections, reliable transmission, and in-order data arrival

The IP protocol has no notion of direction; data reaches the peer entirely by routing, hop by hop. If any single hop fails to route toward the peer, the transmission fails. Routing is in fact one of the cores of the Internet; the fundamental services provided by the IP layer boil down to two things: address management and routing. TCP simply relies on IP's routing, so TCP itself never has to consider routing at all. This is another reason it is designed as an end-to-end protocol.

Since IP already does its best to get individual packets to the peer, TCP can implement stricter controls on top of this best-effort network. TCP adds connections to communication over the connectionless IP network, acknowledges the data that has been sent, and ensures that data arrives in order.

3.2.1. Connections

The connection is the foundation of TCP, because the reliability and ordering of subsequent transmission both depend on one; relying on a connection is simply the easiest way to implement them. This is also why TCP is designed as a stream-based protocol: since a connection must be established in advance, it does not matter how much data is transmitted afterwards, as long as the data can be identified as belonging to the same connection.

Intractable disease 1: the three-way handshake and the four-way teardown

TCP uses a three-way handshake to establish a connection. The handshake initializes the information needed for reliable, in-order transmission: the initial sequence numbers in both directions, from which the acknowledgment numbers are derived. Three exchanges are used because after three exchanges both sides have all the information required for reliable, ordered transmission. The third segment does not even need to be sent on its own; it can be carried together with data.
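To make the roles of the sequence and acknowledgment numbers concrete, here is a minimal Python sketch (a toy illustration, not a real TCP implementation; the dictionary fields are assumptions) of what the three exchanges carry:

```python
# A toy model of the three-way handshake (not a real TCP stack): each side
# picks a random 32-bit initial sequence number (ISN), and the acknowledgment
# number in each reply is simply the peer's ISN + 1.
import random

def three_way_handshake():
    client_isn = random.getrandbits(32)   # 1) SYN: client -> server, seq = client_isn
    server_isn = random.getrandbits(32)   # 2) SYN+ACK: server -> client
    syn_ack = {"seq": server_isn, "ack": (client_isn + 1) % 2**32}
    # 3) ACK: client -> server; this segment may already carry application data
    final_ack = {"seq": (client_isn + 1) % 2**32, "ack": (server_isn + 1) % 2**32}
    return syn_ack, final_ack

print(three_way_handshake())
```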

Why does TCP need four exchanges to tear a connection down? Because TCP is a full-duplex protocol, each direction must be torn down separately. Note that the four-way teardown and the three-way handshake serve different purposes; many people ask why establishment takes three exchanges while teardown takes four. The purpose of the three-way handshake is simply to allocate resources and initialize the sequence numbers, and no data transfer is involved yet, so three exchanges are enough. The purpose of the four-way teardown is to terminate data transfer and reclaim resources; by this point the sequence numbers of the two directions are no longer related to each other, and the virtual link can only be removed after both directions have finished sending data. This is not as simple as initialization, where seeing a SYN flag is enough to initialize a sequence number and acknowledge the peer's SYN. Each direction's data transfer must therefore be terminated separately.

Intractable disease 2: the TIME_WAIT state

This state exists because the sequence number is randomly generated each time a connection is established, and because the 32-bit sequence number wraps around. Now let me explain what that has to do with TIME_WAIT.

Every TCP segment is carried over the best-effort IP network. Routers along the way may buffer any IP datagram arbitrarily, and they do not care what data the datagram carries. However, based on experience and the size of the Internet, an IP datagram can survive for at most one MSL (maximum segment lifetime, a value estimated from the earth's surface area, the propagation speed of electromagnetic waves in various media, the IP TTL, and so on; on Mars this MSL would be much larger...).

Now suppose we are tearing down the connection: the passive closer sends a FIN and the active closer replies with an ACK. That ACK may be lost, causing the passive closer to retransmit the FIN, and this FIN may survive on the Internet for up to one MSL.

If there were no TIME_WAIT, imagine that connection 1 has been torn down while the passive closer's final FIN (or any TCP segment sent before that FIN) is still traveling through the network, and connection 2 is then established reusing all five elements of connection 1's five-tuple (source IP, destination IP, protocol, source port, destination port). When the stale FIN from connection 1 finally arrives, it can terminate connection 2, with a low but very real probability.

Why is the probability low? Because a match is required: the sequence number of the stale FIN must fall within the window of sequence numbers that connection 2 is expecting. This coincidence is rare but it does happen, since initial sequence numbers are randomly generated. Therefore the side that actively closes the connection must wait 2*MSL after receiving the passive closer's FIN and replying with the ACK before it moves to the CLOSED state. The factor of 2 is a conservative choice: in the worst case the ACK sent to the passive closer is lost just as it arrives, after having taken the longest possible path (one full MSL) through the Internet, and the retransmitted FIN then needs up to another MSL to arrive.
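To see why the collision probability is low, here is a rough back-of-the-envelope simulation in Python (the 65535-byte window is an assumed value, and modeling both sequence numbers as uniformly random is a simplification):

```python
# A rough estimate of the stale-FIN collision probability: the old segment is
# only accepted if its sequence number lands inside the new connection's
# receive window. Both values are modeled as uniform random 32-bit numbers and
# the 65535-byte window is an assumed size.
import random

SEQ_SPACE = 2 ** 32
WINDOW = 65_535

def in_window(seq, rcv_nxt, window=WINDOW):
    return (seq - rcv_nxt) % SEQ_SPACE < window

trials = 1_000_000
hits = sum(in_window(random.getrandbits(32), random.getrandbits(32))
           for _ in range(trials))
print(hits / trials)   # around 65535 / 2**32, i.e. roughly 1.5e-5
```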

To cope with this problem, RFC 793 makes a suggestion for generating initial sequence numbers: use a baseline driven by a clock and derive the ISN from that baseline, since time increases monotonically. However, there is still a problem: wraparound. When the counter wraps, the new sequence number falls back to a very low value. The best approach is therefore to avoid "overlap", which means the random component must be confined to a bounded range around the clock-driven baseline.
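Below is a minimal sketch of that idea, assuming a clock that ticks roughly every 4 microseconds as RFC 793 suggests and an arbitrarily chosen bound on the random component; it illustrates the principle and is not a production ISN generator:

```python
# A minimal sketch of the RFC 793 idea: drive the ISN from a monotonically
# increasing clock, then add a bounded random offset so the sequence ranges of
# back-to-back connections on the same five-tuple are unlikely to overlap.
import random
import time

ISN_CLOCK_HZ = 250_000      # RFC 793 suggests incrementing roughly every 4 microseconds
RANDOM_RANGE = 1 << 16      # bound on the random component (an assumed value)

def generate_isn():
    clock_base = int(time.monotonic() * ISN_CLOCK_HZ) & 0xFFFFFFFF
    return (clock_base + random.randrange(RANDOM_RANGE)) & 0xFFFFFFFF

print(hex(generate_isn()))
```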

Many people dislike seeing large numbers of TIME_WAIT connections on a server, so they tune the TIME_WAIT duration down to a very low value. This works in most cases, but it is a gamble. The safest approach is simply not to reuse a connection (that is, its five-tuple) too soon.

Intractable disease 3: reusing a connection versus reusing a socket

The two are fundamentally different. Reusing just a socket is generally not a problem, because TCP is connection-oriented. For example, if a connection on the server is in TIME_WAIT, that state identifies a specific five-tuple; as long as the client does not reuse the same source port, new connections to the server work normally, because the stale FIN can never match the new connection. Remember: a five-tuple identifies a connection, not a socket (although, for BSD sockets, the socket returned by the server's accept() does correspond to one connection).

3.2.2. Transmission reliability

Transmission reliability is achieved, at bottom, through acknowledgment numbers. In principle, for every piece of data sent, the receiver must return an acknowledgment, and only after receiving that acknowledgment may the sender send the next piece. This is the simplest possible principle; the textbook "stop-and-wait" protocol is the per-byte version of it. TCP's sliding window mechanism means it does not have to proceed one byte or one segment at a time, but this section covers only acknowledgments and the associated timeout mechanism.

How do we know that data has reached the peer? In other words, how long should the sender wait for an acknowledgment? If it waits forever, it can never detect data loss and the protocol becomes useless; if it waits too briefly, it may give up on data that is in fact still on its way. So the waiting time itself is a problem, and how to manage that timeout is another.

Intractable disease 4: calculating the timeout

We need an accurate way to compute the timeout. The natural reference is the time it takes a TCP segment to travel to the peer and back, so the standard defines the term RTT, the round-trip time of a TCP segment. But the IP network is best-effort, routes change dynamically, and routers may buffer or discard any datagram without warning, so RTT must be measured dynamically, at least once every so often. If every measurement were identical, everything would be fine; the real world is not like that, so what we need is a kind of "average" rather than a single exact value.

A plain arithmetic mean over many measurements is not a suitable "average", because it ignores the instantaneous jitter of the path latency. If two measurements are 2 and 98, the mean gives a timeout of 50. For the measurement of 2 this is far too large: lost data sits for a long time before being retransmitted, so data is delayed excessively. For the measurement of 98 it is far too small: that path is simply long and should be slow, so a large number of segments that were correctly acknowledged, only late, get needlessly retransmitted.

Therefore, besides the average of the measurements, the variation between successive measurements should also be taken into account. When the variation is large, the estimate becomes a function of that variation (a sharp increase contributes a relatively large positive term, a sharp decrease a negative one, and this is combined with the smoothed average by weighting); when the variation is small, the estimate converges toward the smoothed average of the measurements. Empirically, this algorithm still works well today.
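For concreteness, here is a sketch of this kind of estimator in Python, following the smoothed-RTT plus RTT-variance scheme that TCP standardized in RFC 6298; initialization is simplified and the minimum-RTO clamp is omitted:

```python
# A sketch of the smoothed RTT / RTT-variance estimator standardized for TCP
# in RFC 6298: the variance term is what lets the timeout track jitter instead
# of just the mean.
ALPHA = 1 / 8   # gain for the smoothed RTT
BETA = 1 / 4    # gain for the RTT variance
K = 4           # variance multiplier in the RTO

class RtoEstimator:
    def __init__(self, first_rtt):
        self.srtt = first_rtt
        self.rttvar = first_rtt / 2

    def update(self, rtt_sample):
        self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - rtt_sample)
        self.srtt = (1 - ALPHA) * self.srtt + ALPHA * rtt_sample
        return self.rto()

    def rto(self):
        return self.srtt + K * self.rttvar

est = RtoEstimator(first_rtt=0.100)            # seconds
for sample in (0.102, 0.098, 0.250, 0.105):    # the spike in the third sample widens the RTO
    print(round(est.update(sample), 3))
```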

Intractable disease 5: managing the timeout with a single timer per connection

The most direct approach is obviously to start a timer for every TCP segment: each timer expires one timeout period after the segment is sent, and if no acknowledgment has arrived by then, the segment is retransmitted. That is only reasonable in theory; on most operating systems it would incur enormous memory and scheduling overhead. A single timer per TCP connection is therefore the default design. But how can one timer cover all the segments in flight? How should such a timer work?

There are two principles for designing a single timer: 1. every segment must eventually time out if it goes unacknowledged for too long; 2. that "too long" must not drift too far from the measured RTT. RFC 2988 therefore defines a simple set of rules:

A. When a TCP segment is sent and the retransmission timer is not running, start it.

B. When a TCP segment is sent and the retransmission timer is already running, do not restart it.

C. When an ACK acknowledging new data (a non-duplicate ACK) arrives and data is still outstanding, restart the retransmission timer.

D. When an ACK acknowledging new data arrives and no data remains outstanding, stop the retransmission timer.

Let's see how these four rules achieve the two principles above. By rules A and C (note that the ACK in C must be non-duplicate), as long as any TCP segment remains unacknowledged, the retransmission timer will eventually fire. But why rule C? Rule A alone is enough to satisfy principle 1. That is true, but rule C exists to avoid premature retransmission. Without it, suppose more data is sent before the timer (armed for an earlier segment) expires: when the timer fires, the later data, whose ACKs have not yet had time to return, would be retransmitted along with everything else. With rule C, the timer is re-armed whenever a new ACK arrives, which is reasonable: in most normal cases the time from sending data to receiving its ACK, the measured RTT, and the retransmission timeout do not differ by much, so re-arming the timer on each arriving ACK prevents data from being retransmitted too early.
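The four rules can be captured in a few lines of code. The sketch below uses assumed names and a simplified notion of "outstanding data"; it is meant only to show how rules A through D interact:

```python
# A minimal sketch (assumed names, no real socket state) of how the four
# RFC 2988 rules translate into code: one timer per connection, armed when
# unacknowledged data appears and re-armed on every ACK of new data.
class RetransmitTimer:
    def __init__(self, rto):
        self.rto = rto
        self.expires_at = None            # None means the timer is not running

    def on_segment_sent(self, now):
        if self.expires_at is None:       # rules A and B
            self.expires_at = now + self.rto

    def on_ack(self, now, bytes_outstanding, is_duplicate):
        if is_duplicate:                  # duplicate ACKs never touch the timer
            return
        if bytes_outstanding > 0:         # rule C: new ACK, data still in flight
            self.expires_at = now + self.rto
        else:                             # rule D: everything acknowledged
            self.expires_at = None

    def expired(self, now):
        return self.expires_at is not None and now >= self.expires_at

t = RetransmitTimer(rto=0.3)
t.on_segment_sent(now=0.0)                                       # armed, fires at 0.3
t.on_ack(now=0.1, bytes_outstanding=1000, is_duplicate=False)    # re-armed, fires at 0.4
print(t.expired(now=0.35))                                       # False: no premature retransmit
```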

A few details deserve explanation. The arrival of one ACK suggests that the following ACKs are likely to arrive in turn, that is, the probability of loss is not high. Moreover, even if a segment is lost, it will still be retransmitted within at most roughly twice the timer period (consider a segment sent immediately after the first one, just as the timer is started, and then lost: once the ACK for the first segment arrives, the timer is restarted, and the lost segment is retransmitted only after one more full timeout period). Congestion control has not been introduced yet, but note that network congestion causes packet loss, packet loss causes retransmission, and excessive retransmission in turn worsens congestion. Rule C mitigates excessive retransmission, at the cost that the effective timeout for data sent after the timer was started is stretched by up to about one timer period. Keeping this deviation to at most about one period is what realizes principle 2: the effective timeout must not drift too far from the measured RTT.

One more point: if the last segment of a sending burst is lost, no duplicate ACKs will ever follow it, so the loss can only be detected by timeout, and the effective wait is almost certainly longer than one timer period, because ACKs for the earlier segments keep re-arming the timer after the lost segment was sent. If that segment was sent late in the burst, well after the earlier sends, most of those ACKs have already arrived by then and the extra wait is small; if it was sent close on the heels of the earlier segments, their ACKs keep pushing the timer forward and the wait becomes comparatively large.

Intractable disease 6: When to measure RTT

Nowadays most TCP implementations support the timestamp option, which makes this much more convenient. The sender no longer has to remember when each segment was sent; it simply puts the send time into the timestamp field of the TCP header, the receiver echoes it back in the ACK, and when the ACK arrives the sender subtracts the echoed timestamp from the current time to obtain one RTT measurement.
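A toy illustration of the idea (absolute times stand in for the timestamp clock a real stack would use, and no header parsing is involved):

```python
# A toy illustration of RTT measurement with the timestamp option: the sender
# stamps TSval, the receiver echoes it back as TSecr, and the sender's RTT
# sample is "now minus TSecr".
import time

def make_segment(payload):
    return {"payload": payload, "TSval": time.monotonic(), "TSecr": None}

def make_ack(segment):
    return {"ack": True, "TSecr": segment["TSval"]}   # receiver echoes TSval

seg = make_segment(b"hello")
time.sleep(0.05)                                      # pretend the round trip took ~50 ms
ack = make_ack(seg)
rtt_sample = time.monotonic() - ack["TSecr"]
print(f"measured RTT ~ {rtt_sample * 1000:.1f} ms")
```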

3.2.3. Data ordering

Basically, in-order delivery is achieved through sequence numbers.

Intractable disease 7: acknowledgment numbers and timeout retransmission

The acknowledgment number behaves a little oddly: for a whole sequence of data it has sent, the sender only needs to receive one acknowledgment number, and everything before that number is considered received, even if earlier acknowledgments were lost. In other words, the sender only cares about the most recent acknowledgment number. This is reasonable, because acknowledgments are generated by the receiver, and the receiver only ever acknowledges the last segment that has arrived in order; acknowledgments are cumulative.

Also, when the sender retransmits a segment and then receives an acknowledgment covering it, that does not necessarily mean the retransmitted copy was received; the data may have been received long before, and the timeout may have been caused merely by the ACK being lost or delayed. Note, too, that the receiver discards any duplicate data it gets, but even while discarding duplicates it still acknowledges them normally.
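A minimal sketch of the receive-side behavior described above, with assumed variable names: the cumulative acknowledgment only advances for in-order data, yet duplicates are still acknowledged:

```python
# Cumulative acknowledgment on the receive side (names are assumptions):
# rcv_nxt advances only for in-order data, and a duplicate or out-of-order
# segment is dropped here yet still acknowledged normally.
def on_segment(rcv_nxt, seq, length):
    if seq == rcv_nxt:          # exactly the next expected byte: accept it
        rcv_nxt += length
    # otherwise: duplicate (seq < rcv_nxt) or out-of-order (seq > rcv_nxt);
    # this sketch drops it, but the ACK still reports the in-order high-water mark
    return rcv_nxt, {"ack": rcv_nxt}

rcv_nxt = 1000
rcv_nxt, ack = on_segment(rcv_nxt, seq=1000, length=500)   # in order -> ack 1500
print(ack)
rcv_nxt, ack = on_segment(rcv_nxt, seq=1000, length=500)   # duplicate -> still ack 1500
print(ack)
```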

In standard early TCP implementations, as soon as one segment was lost, the sender would retransmit everything from the lost segment onward, even if all the later segments had been received intact. This creates a retransmission storm: one lost segment triggers a flood of retransmissions. The storm is unnecessary, because most TCP implementations already cache out-of-order segments at the receiver, so the retransmitted copies of those segments are very likely to simply be discarded on arrival. We will come back to this after introducing congestion control (to state the problem briefly up front: a timeout caused by packet loss suggests the network is probably congested, and a retransmission storm only makes the congestion worse).

Intractable disease 8: caching out-of-order data and selective acknowledgment

TCP guarantees in-order delivery, but that does not mean it always discards out-of-order segments; whether they are discarded depends on the implementation. The RFCs recommend that, memory permitting, out-of-order segments be cached and later spliced back into the correct order. This resembles fragment reassembly in the IP protocol, with one difference: IP fragments are never retransmitted, so an IP implementation must cache every fragment it receives and cannot discard any of them, because a discarded fragment will never come again.

Modern TCP implementations support a mechanism called selective acknowledgment (SACK): the receiver explicitly tells the sender which segments need to be retransmitted and which do not. This clearly helps avoid retransmission storms.
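Roughly speaking, the out-of-order cache is exactly what makes SACK possible: the ranges the receiver is holding are the ranges the sender does not need to resend. A small sketch with assumed data structures:

```python
# The ranges held in the out-of-order cache are exactly the ranges the
# receiver can report as SACK blocks, so the sender retransmits only the holes.
def sack_blocks(rcv_nxt, cached):
    """cached: sorted (start_seq, end_seq) ranges received ahead of rcv_nxt."""
    return [(s, e) for s, e in cached if s > rcv_nxt]

rcv_nxt = 1000                              # next in-order byte expected
cached = [(2000, 2999), (4000, 4499)]       # data that arrived ahead of a hole
print(sack_blocks(rcv_nxt, cached))
# The sender now knows to resend only 1000-1999 and 3000-3999,
# not everything from 1000 onward.
```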

Intractable disease 9: TCP sequence number wraparound

Sequence number wraparound can cause real problems. Suppose a segment with sequence number s is sent, and some seconds later another segment is sent whose sequence number j is smaller than s: j has simply wrapped around past s, which is the wraparound problem. If the s segment is delayed and arrives at the receiver late, the result is complete misordering: j was logically sent after s, yet numerically it appears to come before it, and the TCP protocol itself cannot detect this. Let's look at the numbers. Segments are not sent one byte at a time: on a 1 Gbps network the sender pushes out roughly 125 MB per second, while the 32-bit sequence space covers only 2^32 bytes, so the space wraps in about 34 seconds at full rate. That is far less than the MSL, so this situation can genuinely occur.

One detail can be misleading: the TCP window size space is at most half of the sequence number space, so even under full load, with both the send and receive windows filled, the sequence space might seem sufficient. In reality, however, TCP's initial sequence number does not start at 0 but is randomly generated (with the help of more sophisticated algorithms), so if the initial sequence number happens to land near the top of the 2^32 space, wraparound occurs almost immediately.

Of course, we can now use the timestamp option to help disambiguate sequence numbers: when the receiver sees what looks like a wrap, it compares timestamps. Timestamps increase monotonically, and although they too can wrap, that takes a very long time. This is just one strategy and will not be discussed in detail here. There is also a very practical consideration: in theory the sequence number will wrap, but in practice, how many TCP hosts are plugged directly into the two ends of a 1 Gbps cable with both the send and receive windows kept full? Moreover, even when a wrap does occur, it is nothing special; counters wrap all the time in computing, and you only need to recognize it. For TCP sequence numbers at the two ends of a high-speed link (a point-to-point link or an Ethernet segment), data is rarely reordered, so when a sequence number suddenly drops toward 0, or a segment's ending sequence number is smaller than its starting one, the wrap is easy to recognize by comparing against the segments already acknowledged. If the endpoints sit on opposite sides of routers, IP datagrams may be reordered, but for TCP the wrap then comes much more slowly, and since the congestion window (not yet introduced) is usually not very large, it is hard even to fill a 65536-byte window.
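As an aside, recognizing a wrap when comparing two sequence numbers is straightforward; the sketch below shows the usual modular-comparison trick (the same idea as the before()/after() helpers found in many TCP stacks):

```python
# Compare 32-bit sequence numbers in the presence of wraparound: treat the
# difference as a signed 32-bit quantity, so a wrapped value still compares
# as "later".
def seq_lt(a, b):
    """True if sequence number a is logically before b, modulo 2**32."""
    return ((a - b) & 0xFFFFFFFF) > 0x7FFFFFFF

print(seq_lt(0xFFFFFFF0, 0x00000010))   # True: 0x...F0 comes just before the wrap to 0x10
print(seq_lt(0x00000010, 0xFFFFFFF0))   # False
```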

