TCP_NODELAY and TCP_CORK: the Nagle algorithm and the cork algorithm

Source: Internet
Author: User
Tags: ack

TCP_NODELAY

By default, TCP applies the Nagle algorithm to outgoing data. This improves network throughput at the cost of latency, which is unacceptable in some highly interactive applications. Setting the TCP_NODELAY option disables the Nagle algorithm, so each packet that the application hands to the kernel is sent immediately. Note that even with the Nagle algorithm disabled, transmission is still affected by TCP's delayed acknowledgement mechanism.

TCP_CORK

A cork is a stopper: picture plugging the connection with a cork so that data is held back, then pulling the cork out to let it flow. When this option is set, the kernel tries to combine small packets into one large packet (up to one MTU) before sending. If a full packet still has not been assembled after a certain time (200 ms on Linux, according to the tcp(7) man page), the kernel sends whatever data it has, since it cannot keep the data waiting indefinitely.
However, TCP_CORK may not work as perfectly as you might expect, because the cork does not plug the connection completely. The kernel cannot know when the application will send a second batch of data to fill the first batch up to the MTU, so it sets a time limit; if the data has not been combined into a large packet (one approaching the MTU) when the limit expires, the kernel sends it unconditionally. In other words, if the interval between the application's small writes is not short enough, TCP_CORK brings no benefit at all and only costs latency, because each packet is delayed for a while before being sent.
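
The behaviour described above can be sketched as a toy model. This is not kernel code: the class CorkModel, its methods, and the way time is passed in as a plain millisecond counter are all assumptions made for illustration. The model holds small writes back, releases full-sized segments at once, and flushes a partial segment when the 200 ms ceiling expires or the cork is removed.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a corked connection. Hypothetical names; not kernel code.
class CorkModel {
    static final int CEILING_MS = 200;  // cork ceiling quoted in the text
    private final int mss;              // assumed full-segment payload size
    private int buffered = 0;           // bytes currently held back by the cork
    private long firstQueuedAt = -1;    // time the oldest held byte was queued
    final List<Integer> sent = new ArrayList<>(); // sizes of segments sent so far

    CorkModel(int mss) {
        this.mss = mss;
    }

    // The application writes len bytes at time nowMs.
    void write(int len, long nowMs) {
        tick(nowMs);                        // an expired deadline fires first
        if (buffered == 0) {
            firstQueuedAt = nowMs;
        }
        buffered += len;
        boolean sentFull = false;
        while (buffered >= mss) {           // full segments leave immediately
            sent.add(mss);
            buffered -= mss;
            sentFull = true;
        }
        if (sentFull) {
            // Any remainder came from this write, so its clock starts now.
            firstQueuedAt = buffered > 0 ? nowMs : -1;
        }
    }

    // Time passes: flush the partial segment once the ceiling has expired.
    void tick(long nowMs) {
        if (buffered > 0 && nowMs - firstQueuedAt >= CEILING_MS) {
            flush();
        }
    }

    // Removing the cork sends whatever is pending.
    void uncork() {
        if (buffered > 0) {
            flush();
        }
    }

    private void flush() {
        sent.add(buffered);
        buffered = 0;
        firstQueuedAt = -1;
    }
}
```

In this model, writes of 100 and 200 bytes issued 10 ms apart produce no segment until uncork() is called, at which point a single 300-byte segment is recorded. If the second write arrives 250 ms after the first, the first 100 bytes have already gone out alone, which is the latency-without-benefit case the text warns about.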


Nagle algorithm

In TCP/IP, every segment carries protocol headers no matter how little data it contains, and the receiver must also send an ACK to confirm receipt. To make the best use of network bandwidth, TCP prefers to send data in chunks that are as large as possible. (Each connection has an MSS, and TCP tries to send data in MSS-sized chunks.) The Nagle algorithm exists to send data in large chunks wherever possible, so that the network is not flooded with small ones.
The basic rule of the Nagle algorithm is that at any time there can be at most one unacknowledged small segment. A "small segment" is a data block smaller than the MSS; "unacknowledged" means the block has been sent but no ACK confirming its receipt has arrived from the peer.
The rules of the Nagle algorithm (see the comment on the tcp_nagle_check function in tcp_output.c) allow a packet to be sent when any of the following holds:
(1) The packet length reaches the MSS.
(2) The packet contains FIN.
(3) The TCP_NODELAY option is set.
(4) The TCP_CORK option is not set and all previously sent small packets (shorter than the MSS) have been acknowledged.
(5) None of the above holds, but a timeout (typically 200 ms) has expired, in which case the packet is sent immediately.
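
The rules above can be written as a small decision function. The sketch below simply checks rules (1) to (4) in the order listed; it is a simplified model rather than the kernel source, and the class name NagleModel, the method maySendNow, and its parameters are hypothetical. Rule (5), the timeout, is left to the caller.

```java
// Simplified model of the send decision listed above. Hypothetical names;
// the real kernel logic has more detail than this.
class NagleModel {
    /**
     * @param len          payload length of the segment waiting to be sent
     * @param mss          maximum segment size of the connection
     * @param fin          true if the segment carries FIN
     * @param noDelay      true if TCP_NODELAY is set
     * @param corked       true if TCP_CORK is set
     * @param unackedSmall true if a small segment has been sent but not yet ACKed
     * @return true if the segment may be sent now, false if it must wait
     */
    static boolean maySendNow(int len, int mss, boolean fin,
                              boolean noDelay, boolean corked, boolean unackedSmall) {
        if (len >= mss) {
            return true;                 // rule (1): full-sized segment
        }
        if (fin) {
            return true;                 // rule (2): FIN is never held back
        }
        if (noDelay) {
            return true;                 // rule (3): Nagle disabled by the user
        }
        if (!corked && !unackedSmall) {
            return true;                 // rule (4): nothing small in flight
        }
        return false;                    // wait for an ACK or for the timeout of rule (5)
    }
}
```

For instance, a 4-byte write on a connection with a 1460-byte MSS is held back while an earlier small segment is still unacknowledged, and released as soon as either that ACK arrives or TCP_NODELAY is set.
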
The Nagle algorithm allows only one unacknowledged packet in the network at a time and does not control the size of that packet, so it is effectively an extended stop-and-wait protocol, one that stops and waits per packet rather than per byte. Its pacing is driven entirely by TCP's ACK mechanism, which leads to some problems. For example, if the peer returns ACKs very quickly, Nagle does not actually coalesce many packets: network congestion is avoided, but overall network utilization remains low. The Nagle algorithm is one half of silly window syndrome (SWS) avoidance. SWS avoidance prevents small amounts of data from being sent: the Nagle algorithm is its implementation on the sender, while the receiver's part is not to advertise a small increase in buffer space, that is, not to announce a small window until the buffer space has grown significantly. "Significantly" here means by a full-sized segment (MSS) or to half of the maximum window.
Note: BSD implementations allow the final small segment of a large write on an idle connection to be sent without waiting. That is, when more than one MSS of data is written, the kernel first sends n full MSS-sized packets and then sends the small trailing packet without any further delay (assuming the network is not congested and the receive window is large enough).
As an example, take the experiment from an earlier post. The client calls write on the socket to send an int (call it block A). Because the connection is idle (there is no unacknowledged small segment), the int is sent to the server immediately. The client then calls write again to send '\r\n' (block B for short). At this point the ACK for block A has not come back, so there is an unacknowledged small segment and block B is not sent immediately; it goes out only once the ACK for block A arrives, about 40 ms later.
This raises a question: why does the ACK for block A take 40 ms to arrive? The reason is that TCP/IP has not only the Nagle algorithm but also a delayed acknowledgement mechanism. When the server receives data, it does not send the ACK immediately. It holds the ACK back for a period (call it T), in the hope that the server will have reply data for the client within T, so that the ACK can piggyback on the reply. In my earlier tests, T was about 40 ms, which explains why '\r\n' (block B) always followed block A by 40 ms.
Of course, the 40 ms acknowledgement delay is not a constant. The delayed-ACK timeout of a TCP connection is generally initialized to a minimum of 40 ms and then adjusted continuously according to parameters such as the connection's retransmission timeout (RTO) and the interval between the previous packet received and the current one. You can also turn off delayed acknowledgement by setting the TCP_QUICKACK option.
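
The block A / block B timing can be put into numbers with a back-of-the-envelope calculation. The sketch below assumes an idle connection, two small writes issued back to back at time zero, a symmetric network path, and a server that sends no reply data of its own; the class WriteWriteTimeline and its parameters are hypothetical and only illustrate the arithmetic.

```java
// Rough timeline for the block A / block B example. Hypothetical names;
// an arithmetic illustration, not a measurement.
class WriteWriteTimeline {
    /**
     * Returns the time, in ms after the first write, at which block B leaves the client.
     *
     * @param nagleOn    false if TCP_NODELAY is set on the client
     * @param rttMs      round-trip time of the path
     * @param ackDelayMs how long the server holds the ACK (T in the text);
     *                   0 models a receiver that acknowledges immediately
     */
    static double blockBDeparture(boolean nagleOn, double rttMs, double ackDelayMs) {
        if (!nagleOn) {
            return 0.0;                          // B follows A at once
        }
        double aArrives = rttMs / 2;             // block A reaches the server
        double ackSent = aArrives + ackDelayMs;  // server holds the ACK for T
        return ackSent + rttMs / 2;              // ACK reaches the client, B is released
    }
}
```

With a 1 ms round trip and T = 40 ms, block B leaves at 41 ms, which matches the "about 40 ms" gap observed above. Setting T to 0 or disabling Nagle removes almost all of that delay.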


Cork algorithm

The Nagle algorithm and the cork algorithm are very similar, but their emphasis differs. The Nagle algorithm mainly prevents the network from becoming congested with too many small packets (in which protocol headers make up a large proportion of the bytes), while the cork algorithm aims to improve network utilization by keeping the overall proportion of header bytes as small as possible. Seen this way, the two agree on avoiding small packets. At the level of user control, the Nagle algorithm cannot be tuned through the socket at all: you can only set TCP_NODELAY to disable it. The cork algorithm is likewise enabled or disabled by setting or clearing TCP_CORK. However, the Nagle algorithm cares about network congestion and sends as soon as all ACKs have come back, whereas the cork algorithm lets you care about the content. Provided the interval between successive sends is very short (this is important, otherwise the kernel will send your scattered packets separately), you can send several small pieces separately and still have the cork algorithm combine them into a single packet. With the Nagle algorithm alone, you may not be able to achieve this.


In Java programming, you can use the following method of java.net.Socket to set whether TCP_NODELAY is enabled:

setTcpNoDelay
public void setTcpNoDelay(boolean on) throws SocketException
Enable/disable TCP_NODELAY (disable/enable Nagle's algorithm).

Parameters:
on - true to enable TCP_NODELAY, false to disable.
Throws:
SocketException - if there is an error in the underlying protocol, such as a TCP error.
Since:
JDK1.1
See also:
getTcpNoDelay()

Leaving TCP_NODELAY disabled keeps the Nagle algorithm active, which increases network throughput while reducing real-time responsiveness. Enabling it trades some of that throughput for lower latency.
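
As a minimal sketch of calling this method, the example below opens a loopback connection inside a single process, sets the flag on the client socket, and reads it back with getTcpNoDelay(). The class NoDelayDemo and the helper connectWithNoDelay are hypothetical names chosen for illustration; a real client would set the option on its own connected socket in the same way.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class: sets TCP_NODELAY on a loopback connection.
class NoDelayDemo {
    // Connects to a throwaway local server, applies the flag,
    // and returns the value the socket reports afterwards.
    static boolean connectWithNoDelay(boolean on) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            client.setTcpNoDelay(on);   // true disables the Nagle algorithm on this socket
            return client.getTcpNoDelay();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("TCP_NODELAY enabled: " + connectWithNoDelay(true));
    }
}
```

The option applies per socket, so it has to be set on every connection that needs low latency. The standard java.net.Socket API has no setter for TCP_CORK.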











