In-depth introduction to send and recv of TCP

Source: Internet
Author: User

First, let us establish a concept: every TCP socket has a send buffer and a receive buffer in the kernel. TCP's full-duplex operation and its sliding window both depend on these two independent buffers and on how full they are. The receive buffer holds incoming data inside the kernel. If the application process does not call read, the data simply stays in the receive buffer of the corresponding socket. To repeat: whether or not the process reads the socket, data sent by the peer is received by the kernel and stored in the socket's kernel receive buffer. All that read does is copy data from the kernel buffer into the application's user-space buffer.

When a process calls send, in the simplest (and most common) case the data is copied into the socket's kernel send buffer, and send then returns to the caller. In other words, when send returns, the data has not necessarily reached the peer (much like writing to a file). send only copies the data from the application-layer buffer into the socket's kernel send buffer. In a future article I will describe the kernel actions associated with read and send.

Each UDP socket has a receive buffer but no send buffer. Conceptually, a datagram is sent as soon as there is data to send, regardless of whether the other party can receive it correctly, so no buffering is needed.
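As a small illustration of the two per-socket kernel buffers, the sketch below asks the kernel for their sizes through the standard SO_SNDBUF and SO_RCVBUF socket options. The helper name buffer_sizes is made up for this example, and the reported numbers depend on the operating system and its configuration.

```python
import socket


def buffer_sizes(sock: socket.socket) -> tuple[int, int]:
    """Return (send_buffer_bytes, receive_buffer_bytes) as reported by the kernel.

    buffer_sizes is a hypothetical helper written for this illustration.
    """
    snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return snd, rcv


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as tcp_sock:
    snd, rcv = buffer_sizes(tcp_sock)
    print(f"TCP send buffer: {snd} bytes, receive buffer: {rcv} bytes")
```

The values are platform defaults, so no particular output should be expected.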

The receive buffer is used by both TCP and UDP to hold data arriving from the network until the application process reads it. For TCP, if the application process never reads and the buffer fills up, the receiving TCP notifies the peer to close the window. This is how the sliding window is implemented. It guarantees that the TCP socket's receive buffer cannot overflow, which in turn supports TCP's reliable transmission, because the sender is not allowed to send more data than the window the receiver has advertised. This is TCP flow control. If the sender ignores the window size and sends more data than the window allows, the receiving TCP discards it. For UDP, when the socket's receive buffer is full, a new datagram cannot enter the buffer and is discarded. UDP has no flow control: a fast sender can easily overwhelm a slow receiver, causing the receiver's UDP to drop datagrams.
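The UDP behavior described above can be observed with a rough loopback experiment. This is only a sketch: the function name udp_overflow_demo, the 8192-byte buffer request, and the datagram count are arbitrary choices for illustration, and how many datagrams survive depends on the platform.

```python
import socket


def udp_overflow_demo(datagrams: int = 2000, size: int = 1024) -> tuple[int, int]:
    """Send datagrams to a receiver that is not reading; return (sent, received)."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Ask for a small receive buffer; the kernel may round this value.
        receiver.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 8192)
        receiver.bind(("127.0.0.1", 0))
        addr = receiver.getsockname()

        sent = 0
        for _ in range(datagrams):
            try:
                sender.sendto(b"x" * size, addr)
                sent += 1
            except OSError:
                break  # some platforms report local buffer exhaustion instead

        # Only now does the "slow" receiver read whatever the kernel kept.
        receiver.setblocking(False)
        received = 0
        while True:
            try:
                receiver.recv(size)
                received += 1
            except BlockingIOError:
                break  # receive buffer is empty
        return sent, received
    finally:
        sender.close()
        receiver.close()


sent, received = udp_overflow_demo()
print(f"sent {sent} datagrams, receiver got {received}")
```

When the received count is lower than the sent count, the difference is what the kernel discarded because the receive buffer was full.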

The above explains how TCP achieves reliability and why UDP is unreliable.

TCP_CORK TCP_NODELAY

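The two options work in opposite directions. TCP_NODELAY disables TCP's Nagle algorithm so that small writes go out immediately, while TCP_CORK (Linux-specific) holds data back so that it is sent in full packets. Linux has allowed both to be set on one socket since 2.5.71, but their goals still conflict.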

Consider a typical web server responding to a client. The application-layer code looks roughly like this:

if (condition 1) {
    fill buffer_last_modified with the protocol content "Last-Modified: Sat, 04 May 2012 05:28:58 GMT";
    send(buffer_last_modified);
}

if (condition 2) {
    fill buffer_expires with the protocol content "Expires: Mon, 14 Aug 2023 05:17:29 GMT";
    send(buffer_expires);
}

...

if (condition N) {
    fill buffer_N with the protocol content "...";
    send(buffer_N);
}

With this implementation, suppose that M (M <= N) of the conditions are met while this code runs for the current HTTP response, so that there are M consecutive send calls. Does the lower layer then send M TCP packets to the client, one after another? The answer is no. The application layer cannot control the number of packets, and it does not need to.
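The following sketch shows the same idea over a loopback TCP connection: two separate send calls are made, and the receiver simply reads a byte stream until the connection is closed. The helper loopback_pair is a made-up name for this example, and the number of recv calls it takes is not something the sender decides.

```python
import socket


def loopback_pair() -> tuple[socket.socket, socket.socket]:
    """Create a connected TCP (sender, receiver) pair over 127.0.0.1."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    sender = socket.create_connection(listener.getsockname())
    receiver, _ = listener.accept()
    listener.close()
    return sender, receiver


headers = [
    b"Last-Modified: Sat, 04 May 2012 05:28:58 GMT\r\n",
    b"Expires: Mon, 14 Aug 2023 05:17:29 GMT\r\n",
]

sender, receiver = loopback_pair()
for header in headers:
    sender.sendall(header)  # one send per header, as in the pseudocode
sender.close()              # closing sends FIN, which ends the stream

chunks = []
while True:
    data = receiver.recv(4096)
    if not data:            # empty result: the peer closed the connection
        break
    chunks.append(data)
receiver.close()

received = b"".join(chunks)
print(f"{len(headers)} sends, {len(chunks)} recv calls, {len(received)} bytes")
```

The receiver gets exactly the concatenated bytes, but how they are split across recv calls (and across packets) is decided by the lower layers.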

I will explain this answer in the following four hypothetical scenarios:

Because TCP is stream-oriented, a TCP connection is delimited only by SYN at the start and FIN at the end; the data sent in between has no boundaries. The only thing each of several consecutive send calls does is this:

If the socket's file descriptor is in blocking mode and the send buffer has enough free space to hold all the data in the application-layer buffer passed to send, the data is copied from the application-layer buffer into the kernel send buffer and send returns.

If the socket's file descriptor is in blocking mode but the send buffer does not have enough free space for all the data in the application-layer buffer passed to send, send copies as much as it can, and then the process is suspended. It waits until the peer TCP has free space in its receive buffer and tells the local TCP so through the sliding window protocol (opening the window is another function of the ACK packet): "Dear, I am ready. You can now send me another X bytes." The local kernel then wakes the process, which continues copying the remaining data into the send buffer while the kernel sends TCP data to the peer. If the remaining data still cannot be copied completely this time, the process repeats. send returns once all the data has been copied.

Note that I have described the behavior of send as "copying". send has no direct relationship to whether, or when, the lower layer transmits packets.
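The blocking case can be sketched in Python as a loop around send. One assumption to note: Python's socket.send may report fewer bytes than requested, so the hypothetical helper send_all below keeps calling it until everything has been handed to the kernel (the standard library's sendall does the same job). A background thread plays the role of the peer that frees buffer space by reading.

```python
import socket
import threading


def send_all(sock: socket.socket, payload: bytes) -> int:
    """Call send until every byte is copied to the kernel; return the call count."""
    view = memoryview(payload)
    calls = 0
    while len(view) > 0:
        n = sock.send(view)  # in blocking mode this waits while the buffer is full
        view = view[n:]
        calls += 1
    return calls


def drain(sock: socket.socket, sink: bytearray) -> None:
    """Read until the peer closes, collecting everything into sink."""
    while True:
        data = sock.recv(65536)
        if not data:
            break
        sink.extend(data)


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
sender = socket.create_connection(listener.getsockname())
receiver, _ = listener.accept()
listener.close()

payload = b"x" * (4 * 1024 * 1024)  # likely larger than the send buffer
sink = bytearray()
reader = threading.Thread(target=drain, args=(receiver, sink))
reader.start()

calls = send_all(sender, payload)
sender.close()
reader.join()
receiver.close()
print(f"{len(payload)} bytes handed to the kernel in {calls} send call(s)")
```

The number of send calls says nothing about how many TCP packets were put on the wire.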

If the socket's file descriptor is in non-blocking mode and the send buffer has enough free space to hold all the data in the application-layer buffer passed to send, the data is copied from the application-layer buffer into the kernel send buffer and send returns.

If the socket's file descriptor is in non-blocking mode but the send buffer does not have enough free space for all the data in the application-layer buffer passed to send, send copies as much as it can and returns the number of bytes copied (if nothing can be copied, it fails with EAGAIN or EWOULDBLOCK). This raises one more point. There are two ways to proceed after send returns:

1. Loop endlessly, calling send over and over until everything has been sent (this is rarely done).

2. Combine non-blocking mode with epoll or select: use one of them to test whether the socket has become writable, and only then call send (this is the approach high-performance servers require).
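The second approach can be sketched with Python's selectors module, which picks epoll on Linux and falls back to other mechanisms elsewhere. The names send_nonblocking and slow_drain are invented for this example, and the 0.2 second delay only exists to make a full send buffer likely.

```python
import selectors
import socket
import threading
import time


def send_nonblocking(sock: socket.socket, payload: bytes) -> int:
    """Send payload on a non-blocking socket; return how often the buffer was full."""
    sock.setblocking(False)
    view = memoryview(payload)
    full_events = 0
    with selectors.DefaultSelector() as sel:
        sel.register(sock, selectors.EVENT_WRITE)
        while len(view) > 0:
            try:
                n = sock.send(view)  # copies what fits and returns at once
                view = view[n:]
            except BlockingIOError:  # EAGAIN/EWOULDBLOCK: nothing could be copied
                full_events += 1
                sel.select()         # sleep until the socket is writable again
    return full_events


def slow_drain(sock: socket.socket, sink: bytearray) -> None:
    """A deliberately slow reader: wait a little, then read until the peer closes."""
    time.sleep(0.2)
    while True:
        data = sock.recv(65536)
        if not data:
            break
        sink.extend(data)


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
sender = socket.create_connection(listener.getsockname())
receiver, _ = listener.accept()
listener.close()

payload = b"y" * (8 * 1024 * 1024)
sink = bytearray()
reader = threading.Thread(target=slow_drain, args=(receiver, sink))
reader.start()

full_events = send_nonblocking(sender, payload)
sender.close()
reader.join()
receiver.close()
print(f"send buffer was full {full_events} time(s); {len(sink)} bytes delivered")
```

A real server would register many sockets with one selector and do other work instead of waiting on a single connection.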

To sum up, recall the receive and send buffers described above (their sizes correspond to the SO_RCVBUF and SO_SNDBUF socket options). In real scenarios, how many TCP packets are sent and how much data each one carries depend not only on your server configuration and the available bandwidth but also on how fast the peer is receiving.

The reason for saying "the application layer does not need to control the sending behavior" is:

Software systems are layered and divided into modules, each handling its own kind of behavior, so that every part does its own job. The application layer cares only about implementing and controlling the business logic, while data transmission is handled by a dedicated layer. This greatly reduces the size and complexity of application-layer development and lowers development and maintenance costs.

Back to the topic of sending: as stated above, the application layer cannot control the sending behavior precisely or completely. Does that mean it should not be controlled at all? No! Although it cannot be controlled completely, it should be controlled as far as possible.

How? This brings us to the subject of this section, TCP_CORK and TCP_NODELAY.

Cork: a stopper (plug)

Nodelay: do not delay

TCP_CORK: roughly speaking, this option is about payload efficiency. If every packet carried only one byte of data, each byte would be wrapped in a thick layer of TCP packet overhead, nearly everything travelling on the network would be packet headers, and useful data would make up only a small fraction. Many high-traffic servers could easily exhaust their bandwidth this way. So, to increase the proportion of useful payload, this option tells the TCP layer to gather as much data as possible before sending, fill a TCP packet with it, and only then send it out. This naturally conflicts with sending promptly: space and time are always at odds!

TCP_NODELAY: try not to wait. As long as there is data in the send buffer and the send window is open, send the data onto the network as soon as possible.

Clearly, the two options serve opposing goals. How do you choose between them in real scenarios? Again, some examples:

Web servers and download servers (such as an FTP server sending files), which need high bandwidth, should use TCP_CORK.

Interactive servers, such as an FTP server receiving commands, must use TCP_NODELAY. By default, TCP behaves more like TCP_CORK, because Nagle's algorithm is enabled and small writes are coalesced. Imagine that the user types a few bytes of command at a time while the lower layer keeps collecting data, waiting until there is enough to send: the user would be driven crazy by the wait. This bad scenario has a special name: "sticky packets" (in pinyin, "nian bao").
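Both options are set with setsockopt at the IPPROTO_TCP level. The sketch below is a minimal illustration: set_nodelay and set_cork are hypothetical helper names, and because TCP_CORK is Linux-specific the code checks whether the constant exists before using it.

```python
import socket


def set_nodelay(sock: socket.socket, enabled: bool = True) -> None:
    """Turn TCP_NODELAY on or off (on means Nagle's algorithm is disabled)."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1 if enabled else 0)


def set_cork(sock: socket.socket, enabled: bool) -> bool:
    """Turn TCP_CORK on or off; return False where the option does not exist."""
    cork = getattr(socket, "TCP_CORK", None)  # only defined on Linux
    if cork is None:
        return False
    sock.setsockopt(socket.IPPROTO_TCP, cork, 1 if enabled else 0)
    return True


# Interactive service: send small writes without delay.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as interactive:
    set_nodelay(interactive, True)
    nodelay_on = interactive.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
    print("TCP_NODELAY enabled:", nodelay_on)

# Bulk service: cork while queuing data, then uncork to release it.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as bulk:
    cork_supported = set_cork(bulk, True)
    if cork_supported:
        set_cork(bulk, False)  # removing the cork lets pending data go out
    print("TCP_CORK supported:", cork_supported)
```

On a connected socket, a bulk sender would typically cork, issue its several send calls, and then uncork so the collected data goes out together.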

Blog: http://blog.chinaunix.net/uid-29075379-id-3895700.html
