An Incomplete List of TCP Protocol Defects

Source: Internet
Author: User
Tags quic

0. Preface

Since its invention in 1974, and after more than 30 years of development, TCP has become the most important foundational protocol of the Internet. It performs very well on wired networks, but on the mobile Internet and in IoT environments it falls somewhat short.

The defining characteristic of the mobile Internet is instability: the signal is unstable, and so is the network connection. Although mobile bandwidth has grown to 4G, the signal is still unreliable because the device moves around: on a long-distance coach, on a city bus, or in a densely crowded area, real-world conditions are very complicated.

The discussion below assumes Linux servers serving mobile Internet clients. It records the TCP limitations I currently know of. There may be mistakes, so corrections are welcome.

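1. Three-way handshake

A three-way handshake must complete before any data can be transmitted, which costs a full round trip on every new connection. The industry has proposed the TCP Fast Open (TFO) extension to remove this cost: once a client holds a TFO cookie from an earlier connection, it can carry normal business data in the SYN of later connections instead of waiting for the handshake to finish. This requires kernel support on both the client and the server: Linux kernel 3.6 for the client side and 3.7 for the server side.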
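As a rough illustration of what enabling TFO looks like from an application, here is a minimal Python sketch for Linux. The function names make_tfo_listener and tfo_request are made up for this example, the queue length of 16 is arbitrary, and the fallback value for MSG_FASTOPEN is the Linux constant, used only if the Python build does not export it. Whether TFO is actually used also depends on the net.ipv4.tcp_fastopen sysctl, so both helpers fall back to an ordinary handshake.

```python
import socket

# Linux value of MSG_FASTOPEN, used only if this Python build lacks the constant.
MSG_FASTOPEN = getattr(socket, "MSG_FASTOPEN", 0x20000000)


def make_tfo_listener(host="127.0.0.1", port=0, queue_len=16):
    """Return a listening socket, requesting TFO where the platform allows it."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    tfo_opt = getattr(socket, "TCP_FASTOPEN", None)
    if tfo_opt is not None:
        try:
            # The option value is the maximum number of pending TFO requests.
            srv.setsockopt(socket.IPPROTO_TCP, tfo_opt, queue_len)
        except OSError:
            pass  # TFO not permitted here; the plain handshake still works
    srv.listen(queue_len)
    return srv


def tfo_request(addr, payload):
    """Connect and send in one call; data rides in the SYN once a cookie is cached."""
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        cli.sendto(payload, MSG_FASTOPEN, addr)  # no separate connect() call
    except OSError:
        cli.connect(addr)      # fall back to the ordinary three-way handshake
        cli.sendall(payload)
    return cli
```

The first connection to a server only obtains the cookie; the saved round trip shows up on later connections from the same client.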

Read more: TCP Fast Open: expediting web services

2. Slow Start

For an HTTP request that returns a large HTML page, the congestion window needs several round-trip times to grow to its maximum, and this ramp-up phase is largely wasted time. The initial window size directly affects system throughput: a larger value gives higher throughput and lower latency. The right setting still depends on your business.

Before kernel 3.0, the initial congestion window (initcwnd) is 3, so three MSS can be sent when a newly established connection first transmits data. With an MSS of 1400 bytes, that is about 4 KB in the first flight. With initcwnd set to 10, it is about 13 KB.

Based on research by Google, an initcwnd of 10 is recommended for the mobile Internet web environment, and it is the default from Linux kernel 3.0 onward. On older kernels you need to set it manually.

In a LAN environment that transfers large data sets or files, you can consider raising it further.

If a long-lived connection, once established, only carries small messages of less than 4 KB each, changing the slow-start setting makes little difference.
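The arithmetic above can be checked with a small Python sketch. It uses an idealized model in which the congestion window doubles every round trip and nothing is lost; the function name round_trips and the 13 KB page size are chosen for illustration only.

```python
import math


def round_trips(total_bytes, initcwnd, mss=1400):
    """Count round trips needed to send total_bytes under idealized slow start."""
    segments_left = math.ceil(total_bytes / mss)
    cwnd = initcwnd
    rtts = 0
    while segments_left > 0:
        segments_left -= cwnd  # one full window is sent per round trip
        cwnd *= 2              # the window doubles after each round trip
        rtts += 1
    return rtts


first_flight_3 = 3 * 1400     # 4200 bytes, about 4 KB
first_flight_10 = 10 * 1400   # 14000 bytes, about 13 KB

page = 13 * 1024              # a hypothetical 13 KB HTML page
print(round_trips(page, 3))   # 3 round trips with initcwnd = 3
print(round_trips(page, 10))  # 1 round trip with initcwnd = 10
```

Under this model the same page costs three round trips with the old default and one with initcwnd set to 10, which is where the latency gain on high-RTT mobile links comes from.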

Advanced reading:

  • Tuning initcwnd for optimum performance
  • Optimizing Your Linux Stack for Maximum Mobile Web Performance
  • An Argument for Increasing TCP's Initial Congestion Window
3. Head-of-line blocking (HOL)

TCP delivers data strictly in order, much like a first-in-first-out (FIFO) queue. If a segment is lost, everything after it has to wait: only after the lost segment has been retransmitted and acknowledged are the subsequent segments delivered to the receiving application. This is head-of-line (HOL) blocking. It wastes server bandwidth and lowers system performance.
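The effect can be seen in a toy model of the receiver's reordering buffer. This is a simplified sketch, not the kernel's implementation: segments are plain integers, and the function name deliver_in_order is invented for the example.

```python
def deliver_in_order(arrivals):
    """Return, for each arriving segment, which segments reach the application."""
    next_expected = 0
    buffered = set()
    timeline = []
    for seq in arrivals:
        buffered.add(seq)
        delivered = []
        # Only a contiguous run starting at next_expected may be handed up.
        while next_expected in buffered:
            buffered.remove(next_expected)
            delivered.append(next_expected)
            next_expected += 1
        timeline.append((seq, delivered))
    return timeline


# Segment 1 is lost and only arrives after its retransmission.
for arrived, delivered in deliver_in_order([0, 2, 3, 4, 1]):
    print(arrived, delivered)
```

Segments 2, 3 and 4 have already arrived but stay in the buffer; the application sees nothing until segment 1 is retransmitted, at which point all four are released at once.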

3.1 Multiplexing falls short

Although the application-level multiplexing introduced by HTTP/2 solves the single-channel limitation of HTTP/1.* to a certain extent, it is still subject to TCP head-of-line blocking, because the multiplexed streams are built on top of a single TCP connection. Once the head of the line is blocked, every multiplexed stream on that connection can stall at the same time.

3.2 TCP keepalive becomes ineffective

In theory, TCP keepalive keeps a long-lived connection alive and detects a dead peer. When the head of the line is blocked, however, the connection is occupied by the unacknowledged data, and keepalive cannot do its job while that transmission is failing.

Systems such as the NFS file system therefore generally enable TCP keepalive in both directions, so that if keepalive at one end is defeated by head-of-line blocking, the other end can still detect the dead peer in a timely manner.
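For reference, this is roughly how an application turns keepalive on for its own end of a connection in Python. The helper name enable_keepalive and the timing values are illustrative assumptions. TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT and TCP_USER_TIMEOUT are Linux socket options and are only applied when the Python build exposes them. TCP_USER_TIMEOUT is included because it bounds how long sent data may stay unacknowledged, which is the case keepalive alone does not cover.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5, user_timeout_ms=None):
    """Enable keepalive on sock; tune timers where the platform supports it."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tuning = (
        ("TCP_KEEPIDLE", idle),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # failed probes before the connection is dropped
    )
    for name, value in tuning:
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    user_timeout = getattr(socket, "TCP_USER_TIMEOUT", None)
    if user_timeout_ms is not None and user_timeout is not None:
        # Upper bound, in milliseconds, on how long data may remain unacknowledged.
        sock.setsockopt(socket.IPPROTO_TCP, user_timeout, user_timeout_ms)
    return sock
```

Calling the same helper on both the client and the server socket gives the two-way arrangement described above.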

3.3 Head-of-line blocking timeout notification

When a packet is sent, an acknowledgment timer is started. If the timer expires, the packet is retransmitted. While the retransmissions remain unacknowledged, subsequent data piles up in the waiting queue. Eventually a blocking timeout is reached, which is computed by a rather complicated algorithm, and the upper-layer application receives a "No route to host" error from the kernel protocol stack. By default this takes no more than 16 minutes. The scenario is this: the terminal is abruptly disconnected, and then the server (which has no business-level heartbeat) sends data. In a test captured with tcpdump, the kernel reported the "EHOSTUNREACH" error after about 15 minutes, and the application saw the "No route to host" notification.
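The figure of roughly 15 minutes can be reproduced with a back-of-the-envelope calculation. The sketch below assumes the usual Linux defaults: net.ipv4.tcp_retries2 = 15, a minimum retransmission timeout of 200 ms, a maximum of 120 s, and a timeout that doubles after every retry. On a real connection the starting timeout is derived from the measured RTT, so the actual value will differ.

```python
def total_retransmission_timeout(retries=15, rto_min=0.2, rto_max=120.0):
    """Sum the waits for the first send plus `retries` retransmissions, in seconds."""
    total = 0.0
    rto = rto_min
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, rto_max)  # exponential backoff, capped at rto_max
    return total


seconds = total_retransmission_timeout()
print(round(seconds, 1), "seconds, about", round(seconds / 60, 1), "minutes")
```

With these assumptions the total comes to about 924.6 seconds, a little over 15 minutes, which matches what the packet capture showed.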

4. Four-way close

Once a connection has been established between the two ends, closing it takes four exchanges, which is redundant in the mobile Internet environment. What is wanted is a fast close and a fast response; the extra exchanges only occupy network bandwidth.

5. Could acknowledgments be reported to upper-layer applications?

This is more of a wish. The upper-layer application would hand a large block of data to the kernel through its interface; the kernel would finish sending it, receive the peer's acknowledgment of the whole block, and then notify the application that the send succeeded. In some environments this would save many interaction steps at the business level.
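Since the kernel offers no such notification today, applications that need it usually build their own acknowledgment at the business level. The following Python sketch shows one simple way to do that; the framing (a 4-byte length prefix) and the single ACK byte are assumptions made for this example, not part of any standard.

```python
import socket
import struct

ACK = b"\x06"  # hypothetical one-byte application-level acknowledgment


def _recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        buf.extend(chunk)
    return bytes(buf)


def send_message(sock, payload):
    """Send one length-prefixed message."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_and_ack(sock):
    """Receive one message, then confirm it to the sender."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    payload = _recv_exact(sock, length)
    sock.sendall(ACK)
    return payload


def wait_for_ack(sock):
    """Block until the peer's acknowledgment arrives."""
    return _recv_exact(sock, 1) == ACK
```

This extra round trip at the business level is exactly the kind of step that kernel-level delivery notification would make unnecessary.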

6. NAT gateway timeout

IPv4 addresses are limited, so LANs use NAT routing devices to let more terminal devices get online. When a long-lived TCP connection is established, the NAT device has to maintain the mapping between the internal IP:PORT that an internal terminal uses to reach the external server and the outgoing IP:PORT. Maintaining this mapping consumes memory, so a timer cleans up entries that have timed out; otherwise memory would be exhausted.

Different NAT devices use different timeout values, so a heartbeat is needed to keep the connection through the NAT device alive and avoid being evicted after a long idle period. For example, China Mobile's network generally keeps an idle connection for no more than 5 minutes. Networks differ slightly, so it is appropriate to introduce a smart heartbeat mechanism.
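One possible shape for such a smart heartbeat is sketched below in Python. The class name SmartHeartbeat, the 30-second step and the 270-second ceiling (kept under the 5-minute figure mentioned above) are illustrative choices, not a description of any particular product. The idea is to lengthen the interval while heartbeats keep succeeding, and to step back and stop probing once one fails.

```python
class SmartHeartbeat:
    """Search for the longest heartbeat interval that keeps the NAT mapping alive."""

    def __init__(self, start=60, step=30, floor=30, ceiling=270):
        self.interval = start
        self.step = step
        self.floor = floor
        self.ceiling = ceiling
        self.locked = False  # becomes True once a failure has revealed the limit

    def report(self, alive):
        """Record the outcome of the last heartbeat and return the next interval."""
        if not alive:
            self.interval = max(self.interval - self.step, self.floor)
            self.locked = True
        elif not self.locked:
            self.interval = min(self.interval + self.step, self.ceiling)
        return self.interval


# Simulate a NAT device that drops mappings idle for 150 seconds or more.
hb = SmartHeartbeat()
for _ in range(10):
    hb.report(alive=hb.interval < 150)
print(hb.interval)  # settles at 120
```

A production version would also need to re-probe occasionally, since the terminal may move to a network with a different timeout.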

7. Terminal IP roaming

Mobile phone terminals often switch between 2G/3G/4G and Wi-Fi, so their IP addresses change frequently. As a result, in-flight request-response exchanges are abandoned and terminated, and manual intervention or a new request is required, which wastes resources.

Terminal devices that support Multipath TCP can establish a multipath connection over 2G/3G/4G and Wi-Fi at the same time, download over whichever paths perform best, and have the paths back each other up. When several networks coexist, the interruption of one network does not interrupt request processing as a whole, so the stability and reliability of the device's connection are improved.

Of course, Multipath TCP can also be used across multiple networks to increase throughput between servers.

The status quo is:

  1. Currently, only iOS 7 and later support it.
  2. An implementation can be found on an experimental branch based on Linux kernel 3.10, but it is unknown when it will be merged into the mainline kernel.

Read more: A closer look at the scientific literature on Multipath TCP

8. TCP buffer bloat

When the packets arriving at a router exceed its queue length, the router generally drops packets at random to relieve the queue. For upper-layer applications this shows up as increased latency, or is mistaken for data loss or a lost connection.

In this case, it is generally recommended to send packets promptly to avoid data loss, and to upgrade the kernel to a recent version, no earlier than 3.6.

Advanced Reading: Bufferbloat

9. TCP is not absolutely reliable
  1. Both IP and TCP carry a checksum in their headers. It is 16 bits wide and is computed by adding the data up in ones'-complement arithmetic and inverting the result. For details, refer to the Principle and Implementation of TCP Checksum. It catches most errors, but errors that leave the sum unchanged, for example two 16-bit words exchanging places, go undetected.
  2. The CRC32 check on Ethernet frames is usually reliable, but problems can occur when the two ends are separated by several routers. For example, consider a figure provided by Chen Shuo:

    The Client sends a TCP segment to the Server. The segment is first encapsulated into an IP packet, then into an Ethernet frame, and sent to the router (message a in the figure). The Router receives the Ethernet frame (b) and forwards it to another network segment (c). The Server finally receives d and notifies the application. The Ethernet CRC can ensure that a and b are the same and that c and d are the same, but the TCP header checksum is not strong enough to guarantee that the payload received equals the payload sent. In addition, if the Router is replaced with a NAT device, the NAT constructs c itself (replacing the source address). In that case the TCP header checksum cannot be used to verify that the payloads of a and d match.

  3. A router may suffer hardware or memory faults that flip one or more bits, or swap two bytes, in the IP packets it receives and forwards. If this happens in the payload area, the checksums of the link layer, network layer, and transport layer may all fail to detect it, and only an application-layer check can catch it. We therefore recommend adding data verification at the application layer.

  4. For large file downloads, add verification to ensure data integrity. MD5 is generally used, which also helps detect tampering.
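The weakness described in item 1 is easy to demonstrate. The Python sketch below computes a 16-bit ones'-complement checksum over raw bytes, in the style of RFC 1071, and compares it with an application-level hash. It leaves out the pseudo-header that the real TCP checksum also covers, and SHA-256 is used here simply as an example of an application-layer check; the sample payloads are made up.

```python
import hashlib
import struct


def internet_checksum(data):
    """Ones' complement of the ones'-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = b"\x12\x34\xab\xcd\x00\x01"
swapped = b"\xab\xcd\x12\x34\x00\x01"  # the first two 16-bit words exchanged

same_checksum = internet_checksum(original) == internet_checksum(swapped)
same_digest = hashlib.sha256(original).digest() == hashlib.sha256(swapped).digest()
print(same_checksum, same_digest)  # True False
```

Because addition does not care about order, the corrupted payload passes the 16-bit checksum, while the application-level digest exposes the difference.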


  • Paper: When the CRC and TCP Checksum Disagree
  • The Limitations of the Ethernet CRC and TCP/IP checksums for error detection
  • A single-bit flip incident in production at Amazon S3
10. Summary

In a world built on TCP, major surgery on the protocol is unlikely, because it is baked into existing system kernels and firmware. Upgrading the system or firmware of terminals (such as Android/iOS), the Linux server kernel, and intermediate devices (such as routers) would be a huge project, which is not realistic at present.

TCP lives in the kernel, and kernel space is the most troublesome place to upgrade and repair. Even server upgrades are somewhat difficult. Upgrading and reworking user-space applications is far more controllable. For this reason, Google's engineers built the QUIC protocol in user space directly on top of UDP, combining the light weight of UDP with the reliability of TCP. This is a novel direction.

My expectations for the underlying transport protocols of the future:

  • Custom protocols appear in user space, similar to QUIC
  • Traditional TCP/UDP can run in user space and bypass the kernel entirely
  • The complete protocol stack is provided to upper-layer applications in the form of a link library
  • Upper-layer applications can bundle the library files (such as .so files) of the protocol stack they depend on during compilation and packaging
  • DPDK, netmap, and other packet I/O frameworks are combined with user-space protocol stacks, so data is delivered directly from the network adapter to upper-layer applications
  • The Linux kernel becomes less important and mainly serves routine system maintenance such as SSH

Although TCP has these problems, it remains network infrastructure that cannot be bypassed, so understanding its shortcomings is helpful.
