Networking for everyone: why you should understand a little about TCP

Source: Internet
Author: User

Even if your job never requires you to work with TCP directly, or to know the details of any particular TCP/IP implementation, you should still understand some TCP basics. This article explains why.

While I was at the Recurse Center, I wrote a TCP stack in Python (and a blog post about what I learned from implementing a TCP stack in Python). It was a fun exercise, and I learned a lot about TCP.

A year later, I ran into a puzzle at work. A colleague asked on Slack: "Hey, there is always 40 ms of latency when I publish messages to NSQ. I don't know why." After a week, I still had no idea what was causing it.

NSQ is a message queue. You publish to it by sending an HTTP request to localhost. That should not take 40 ms, so something had to be wrong. But NSQ was not using much CPU or memory, and the delay did not look like a garbage collection pause.

Then I remembered an article I had read a week earlier: "In search of performance - how we shaved 200 ms off every POST request". It explains why each of the authors' POST requests was taking an extra 200 ms, which seemed strange. The key passage is as follows.

Delayed ACKs and TCP_NODELAY

Ruby's Net::HTTP splits a POST request into two TCP packets: one for the headers and one for the body. curl, by contrast, combines the two into one packet where possible. To make matters worse, Net::HTTP does not set TCP_NODELAY on the TCP socket it opens, so after sending the first packet it waits for an acknowledgment before sending the second. This behavior is a consequence of Nagle's algorithm. At the other end of the connection, HAProxy has to decide how to acknowledge the two packets. In version 1.4.18 (the one we were running), it uses TCP delayed ACK. Delayed ACK interacts badly with Nagle's algorithm and stalls the request until the delayed-ACK timeout expires.

Let me summarize this section:

TCP is a protocol that sends your data in packets.

Their HTTP library was sending each POST request as two small packets.

The whole exchange looks like this:

application: Hi! Here is the first packet.
HAProxy: (silence) ... I do need to ACK this, but there is no rush. I will do it later.
application: (waiting for the first packet to be ACKed before sending the second; maybe the network is congested, so wait a bit longer)
HAProxy: OK, I'm bored. Here is the ACK for the first packet.
application: Got the ACK, here is the second packet!!!
HAProxy: Done!

During this exchange, HAProxy and the application both simply waited, until more than 200 ms had gone by. The application was waiting because of Nagle's algorithm, and HAProxy was waiting because of delayed ACK.
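To make the timing concrete, here is a toy model of that exchange in Python. It is only a sketch under simplifying assumptions: the request is exactly two small writes, the receiver has nothing to send back until it has the whole request, and the function name and parameters are made up for illustration. The 40 and 200 values are just the delays quoted in this article.

```python
def post_completion_ms(nagle_enabled, delayed_ack_ms, one_way_ms=0):
    """Toy timeline: when does the body of a two-write POST reach the receiver?

    The header is written at t=0 and the body immediately after it.
    This is a simplified model, not a description of any real TCP stack.
    """
    header_arrives = one_way_ms
    if nagle_enabled:
        # Nagle holds the small body back until the header has been ACKed.
        # The receiver has no reply to piggyback the ACK on, so the ACK
        # only goes out when the delayed-ACK timer fires.
        ack_sent = header_arrives + delayed_ack_ms
        ack_arrives = ack_sent + one_way_ms
        body_sent = ack_arrives
    else:
        # With TCP_NODELAY set, the body is sent right away.
        body_sent = 0
    return body_sent + one_way_ms


if __name__ == "__main__":
    for delay in (40, 200):
        stalled = post_completion_ms(True, delay)
        direct = post_completion_ms(False, delay)
        print(f"delayed ACK {delay} ms: Nagle on -> {stalled} ms, off -> {direct} ms")
```

On localhost the one-way time is close to zero, so in this model the whole stall is the delayed-ACK timer, which matches the shape of the 40 ms and 200 ms mysteries.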

As far as I know, delayed ACK is enabled by default on all Linux systems, so this is not an edge case. Any time you send your data as more than one TCP packet, you can run into the same problem.
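One way to sidestep the two-packet pattern is what the quoted article says curl does: hand the header and body to the socket together. The sketch below assumes a minimal HTTP/1.1 POST. The helper name and the "/publish" path are hypothetical, and whether the kernel really emits a single packet still depends on the request fitting into one segment.

```python
def build_post_request(host: str, path: str, body: bytes) -> bytes:
    """Return the header and body of a minimal HTTP/1.1 POST as one buffer."""
    header = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    # Concatenate before writing, so the caller makes one send call
    # instead of one for the header and another for the body.
    return header + body
```

A caller would then write the whole request in one go, for example sock.sendall(build_post_request("localhost", "/publish", b"hello")), rather than sending the header and the body separately.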

The problem, finally solved

When I first read that article, I did not think much of it. But after struggling with our mysterious 40 ms for so long, I remembered it.

I thought: could this be my problem? Could it? Could it?! I sent an email to the team saying, "I may be crazy, but this might be a TCP problem."

So I turned on TCP_NODELAY, and BOOM!

All 40 ms of latency disappeared, and all was right with the world. I am a genius!
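For reference, this is roughly what turning on TCP_NODELAY looks like with Python's standard socket module. It is a minimal sketch, not the actual change we made: the helper names are made up, and most HTTP client libraries expose their own way of setting socket options.

```python
import socket


def enable_nodelay(sock: socket.socket) -> None:
    """Disable Nagle's algorithm on a TCP socket by setting TCP_NODELAY."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


def nodelay_enabled(sock: socket.socket) -> bool:
    """Report whether TCP_NODELAY is currently set on the socket."""
    return bool(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("before:", nodelay_enabled(s))
    enable_nodelay(s)
    print("after:", nodelay_enabled(s))
    s.close()
```

With the option set, small writes are sent immediately instead of waiting for the previous packet to be acknowledged.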

Should delayed ACKs be turned off entirely?

As an aside, I saw this comment on HN:

The real problem is delayed ACKs. The fixed ACK delay timer was a bad idea. The people at Berkeley who put it into BSD did not really understand the problem. A delayed ACK is a bet that the application layer will send a reply within the delay period. TCP keeps using delayed ACKs even when it loses that bet almost every time.

In his comment, he goes on to say that ACKs are very cheap, and that delayed ACKs cause more serious problems than they solve.

If you do not understand TCP, you cannot solve this problem.

I used to think that TCP was a fairly low-level thing that I would never need to know about. That is mostly true, but in real life you may still run into bugs caused by how TCP's algorithms behave. When that happens, knowing a little about TCP is crucial. (The same argument extends to system calls and operating systems; the principle applies to many things.)

The delayed ACK/TCP_NODELAY interaction is nasty: it can affect anyone who writes code that makes HTTP requests. But you do not have to be a systems programming genius. Knowing a little about TCP helped me solve this problem, and made me realize that the problem described in that article might also be mine. I also used strace, though. Long live strace!
