Efficiency issues when transmitting small packets over TCP (translated from MSDN)


Http://www.ftpff.com/blog/?q=node/16

Summary: When small packets are transmitted over TCP, the design of the program matters a great deal. If the design does not take delayed acknowledgment, the Nagle algorithm, and Winsock buffering into account, the program's performance can suffer badly. This article discusses these issues, presents two cases, and gives some optimized designs for transmitting small packets.

Background: When the Microsoft TCP stack receives a packet, it starts a 200-millisecond delay timer. When the ACK is eventually sent, the timer is reset, and a new 200-millisecond timer starts when the next packet is received. To improve application performance on both intranets and the Internet, the Microsoft TCP stack uses the following rules to decide when to send an ACK for a received packet:
1. If a second packet is received before the 200-millisecond timer expires, an ACK is sent immediately.
2. If there is data to send in the same direction as the ACK (back to the sender) before a second packet arrives and before the timer expires, the ACK is piggybacked on that data and sent immediately.
3. When the timer expires, the ACK is sent immediately.
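
The three rules above can be sketched as a small decision function. This is only an illustrative model, not the stack's actual code; the function name `ack_action` and its parameters are hypothetical.

```python
DELAYED_ACK_TIMEOUT_MS = 200  # delay quoted in the article for the Microsoft TCP stack


def ack_action(ms_since_first_unacked, second_packet_arrived, has_outgoing_data):
    """Return how a receiver following the three rules would acknowledge.

    ms_since_first_unacked: time since the first unacknowledged packet arrived.
    second_packet_arrived:  True if another packet arrived before the timer expired.
    has_outgoing_data:      True if the receiver has data of its own to send back.
    """
    if second_packet_arrived:
        return "send ACK now"                  # rule 1
    if has_outgoing_data:
        return "piggyback ACK on data"         # rule 2
    if ms_since_first_unacked >= DELAYED_ACK_TIMEOUT_MS:
        return "send ACK now (timer expired)"  # rule 3
    return "keep waiting"
```

A receiver that gets a single small packet and has nothing to send back falls through to rule 3, which is the situation both cases below run into.
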
To keep small packets from congesting the network, the Microsoft TCP stack enables the Nagle algorithm by default. The algorithm coalesces the data from multiple send calls and holds it until the ACK for the previously sent packet arrives, then sends it together. There are two exceptions to the Nagle algorithm:
1. If the stack has coalesced more data than the MTU value, a full-sized packet is sent immediately without waiting for the ACK of the previous data. On Ethernet, the MTU (Maximum Transmission Unit) value for TCP is 1460 bytes, that is, the 1500-byte Ethernet MTU minus 40 bytes of IP and TCP headers.
2. If the TCP_NODELAY option is set, the Nagle algorithm is disabled, and the data from each send call is put on the network immediately, without delay.
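
As a rough illustration, the Nagle rule and its two exceptions can be written as a predicate. This is a simplified model under the assumptions stated in the comments; `nagle_may_send` and its parameters are hypothetical names, and a real stack tracks more state than this.

```python
ETHERNET_TCP_PAYLOAD = 1460  # bytes; the figure the article gives for Ethernet


def nagle_may_send(buffered_bytes, unacked_data_in_flight, nodelay=False,
                   full_packet=ETHERNET_TCP_PAYLOAD):
    """Return True if buffered data may be put on the network right now."""
    if buffered_bytes == 0:
        return False                     # nothing to send
    if nodelay:
        return True                      # exception 2: TCP_NODELAY disables Nagle
    if buffered_bytes >= full_packet:
        return True                      # exception 1: a full-sized packet is ready
    return not unacked_data_in_flight    # otherwise wait for the previous ACK
```

With 50 bytes buffered and an earlier packet still unacknowledged, the function returns False, which is the stall described in the cases below.
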
To optimize performance at the application layer, Winsock copies the data passed to each send call from the application's buffer into a Winsock kernel buffer. The Microsoft TCP stack then uses its own heuristics, such as the Nagle algorithm, to decide when to actually put the data on the network. The kernel buffer is 8K by default, and its size can be changed with the SO_SNDBUF option. If necessary, Winsock can buffer more data than the SO_SNDBUF size. In most cases, completion of a send call only indicates that the data has been copied to the Winsock kernel buffer, not that it has actually been sent on the network. The only exception is when the Winsock kernel buffer is disabled by setting SO_SNDBUF to 0.
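
For reference, here is a minimal sketch of how the two socket options mentioned so far can be set. It uses Python's standard `socket` module, which exposes the same option names (on Windows it sits on top of Winsock); the helper `make_tcp_socket` is a hypothetical name, and the value the operating system reports for SO_SNDBUF may differ from the value requested.

```python
import socket


def make_tcp_socket(nodelay=False, sndbuf=None):
    """Create an unconnected TCP socket with the options the article discusses."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if nodelay:
        # Disable the Nagle algorithm for this socket.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    if sndbuf is not None:
        # Request a send-buffer size in bytes; the OS may adjust the value.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
    return sock


sock = make_tcp_socket(nodelay=True)
print("TCP_NODELAY:", sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
print("SO_SNDBUF:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
sock.close()
```

Leaving `sndbuf` as `None` keeps the platform default, which is what the article recommends unless testing shows otherwise.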

Winsock uses the following rules to indicate completion of a send call to the application:
1. If the socket is still within the SO_SNDBUF limit, Winsock copies the application's data to the kernel buffer and completes the send call.
2. If the socket is over the SO_SNDBUF limit and only one previously buffered send remains in the kernel buffer, Winsock copies the application's data to the kernel buffer and completes the send call.
3. If the socket is over the SO_SNDBUF limit and more than one previously buffered send remains in the kernel buffer, Winsock copies the application's data to the kernel buffer but does not complete the send call until the stack has sent enough data to bring the socket back within the SO_SNDBUF limit, or until only one buffered send remains.

Case 1
A Winsock TCP client needs to send 10,000 records to a Winsock TCP server, which saves them to a database. The records vary in size from 20 to 100 bytes. To keep the application logic simple, the design is as follows:
1. The client sends in blocking mode, and the server receives in blocking mode.
2. The client sets SO_SNDBUF to 0 so that each record is sent as a separate packet.
3. The server calls recv in a loop to receive the packets. It passes a 200-byte buffer to recv so that each record can be received in a single recv call.

Performance:
Testing showed that the client could send only 5 records per second to the server. The 10,000 records, at most about 976K bytes in total, took more than half an hour to upload to the server.

Analysis:
Because the client does not set the TCP_NODELAY option, the Nagle algorithm forces the TCP stack to wait for the ACK of the previous packet before sending the next one. The client has also disabled the kernel buffer by setting SO_SNDBUF to 0. As a result, the 10,000 send calls have to be sent and acknowledged one packet at a time. Each ACK is delayed by 200 milliseconds for the following reasons:
1. When the server receives a packet, it starts a 200-millisecond timer.
2. The server has no data to send to the client, so the ACK cannot be piggybacked on a returning packet.
3. The client cannot send the next packet until the previous one has been acknowledged.
4. When the timer on the server expires, the ACK is sent to the client.
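
The figures reported for this case follow directly from the 200-millisecond delay. The short calculation below is a back-of-the-envelope check that assumes each record costs exactly one delayed ACK and ignores network round-trip time; the constant names are illustrative.

```python
DELAYED_ACK_S = 0.2        # 200 ms delayed-ACK timer
RECORDS = 10_000
MAX_RECORD_BYTES = 100

records_per_second = 1 / DELAYED_ACK_S            # one record per ACK delay
total_seconds = RECORDS / records_per_second
max_kbytes = RECORDS * MAX_RECORD_BYTES / 1024    # upper bound on the data sent

print(f"{records_per_second:.0f} records per second")
print(f"{total_seconds / 60:.1f} minutes in total")
print(f"{max_kbytes:.1f}K bytes at most")
```

This gives 5 records per second and roughly 33 minutes for at most about 976K bytes, matching the observed results.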

How to improve performance:
There are two problems with this design. First, there is the delay problem. The client needs to be able to send two packets to the server within 200 milliseconds. Because the client uses the Nagle algorithm by default, it should keep the default kernel buffer and not set SO_SNDBUF to 0. Once the TCP stack has coalesced more data than the MTU value, a full-sized packet is sent immediately without waiting for the previous ACK. Second, the design calls send once for every small record, and sending such small packets is not efficient. In this case, each record should be padded to 100 bytes and each send call should send 80 records. So that the server knows how many records to expect, the client can start with a fixed-size header that contains the number of records to follow.
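
One possible way to implement the padding and batching is sketched below. The wire format is an assumption made for illustration: a 4-byte big-endian record count as the header and zero-byte padding, which only works if records never end in zero bytes (otherwise each record would need its own length field). The names `pad_record` and `build_batches` are hypothetical.

```python
import struct

RECORD_SIZE = 100             # each record is padded to this many bytes
RECORDS_PER_SEND = 80         # batch size suggested in the article
HEADER = struct.Struct("!I")  # assumed header: 4-byte big-endian record count


def pad_record(record: bytes) -> bytes:
    """Pad a record with zero bytes up to RECORD_SIZE."""
    if len(record) > RECORD_SIZE:
        raise ValueError("record longer than RECORD_SIZE")
    return record.ljust(RECORD_SIZE, b"\x00")


def build_batches(records):
    """Yield the header with the total count, then one buffer per 80 records."""
    yield HEADER.pack(len(records))
    for start in range(0, len(records), RECORDS_PER_SEND):
        chunk = records[start:start + RECORDS_PER_SEND]
        yield b"".join(pad_record(r) for r in chunk)
```

The client would then pass each buffer to one send call, for example `for buf in build_batches(records): sock.sendall(buf)`. A full batch is 8000 bytes, which is just under the default 8K kernel buffer.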

Case 2
A Winsock TCP client opens two connections to a Winsock TCP server that provides a stock quote service. The first connection is used as a command channel to send stock symbols to the server. The second connection is used as a data channel to receive stock quotes. After both connections are established, the client sends a stock symbol to the server over the command channel and waits for the quote to come back on the data channel. The client sends the next stock symbol request only after it has received the first quote. Neither the client nor the server sets the SO_SNDBUF or TCP_NODELAY option.

Performance:
Testing showed that the client could obtain only 5 quotes per second.

Analysis:
This design allows only one outstanding quote request at a time. The first stock symbol is sent to the server over the command channel, and the server immediately returns the quote over the data channel. The client then immediately sends the second request, and the send call returns at once because the data is copied to the kernel buffer. However, the client's TCP stack cannot put this packet on the network yet, because the ACK for the previous packet has not been received. After 200 milliseconds, the server's timer expires and the ACK for the first request is sent back to the client, and only then does the client's stack send the second request. The quote for the second request is returned immediately over the data channel, because by this time the client's own timer has expired and the ACK for the first quote has already been sent to the server. This cycle repeats for every request.

How to improve performance:
The two-connection design is unnecessary here. If a single connection is used both to request and to receive quotes, the ACK for a request is piggybacked on the returned quote and comes back immediately. To improve performance further, the client should send multiple stock requests in one send call, and the server should return multiple quotes at a time. If two one-way connections must be used for some special reason, both the client and the server should set the TCP_NODELAY option so that small packets are sent immediately without waiting for the ACK of the previous packet.
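
Sending several requests in one send call needs some way to tell the requests apart. The sketch below assumes a simple newline-delimited ASCII format, which is not specified in the original; `encode_requests` and `decode_requests` are hypothetical helpers, and the symbols in the usage note are placeholders.

```python
def encode_requests(symbols):
    """Join several stock symbols into one newline-terminated request buffer."""
    return "".join(f"{symbol}\n" for symbol in symbols).encode("ascii")


def decode_requests(buffer):
    """Split a complete request buffer back into individual symbols."""
    return buffer.decode("ascii").splitlines()
```

The client could then issue a single call such as `sock.sendall(encode_requests(["AAA", "BBB", "CCC"]))` on the one shared connection. Because TCP is a byte stream, a real server would keep reading until it has a complete line before decoding.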

Recommendations for improving performance:
The two cases above illustrate some worst-case scenarios. When designing a solution that sends and receives large numbers of small packets, follow these recommendations:
1. If the data fragments do not need to be sent urgently, the application should coalesce them into a larger block and then call send. Because the send buffer is likely to be copied to the kernel buffer, the block should not be too large; a little less than 8K is usually efficient. As long as the Winsock kernel buffer receives a block larger than the MTU, the stack sends several full-sized packets followed by a last packet with whatever remains. Except for that last packet, the sender is not held up by the 200-millisecond timer.
2. If possible, avoid sockets with one-way data flow.
3. Do not set SO_SNDBUF to 0 unless you need to guarantee that the data has been put on the network when the send call completes. In fact, the 8K buffer suits most situations and should not be changed unless testing shows that the new buffer size really is more efficient than the default.
4. If the data transfer does not need guaranteed reliability, use UDP.
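
The first recommendation, coalescing fragments in the application before calling send, might look like the following. This is a minimal sketch, not a tuned implementation; the class name `CoalescingSender` and the 8000-byte default are illustrative choices based on the "a little less than 8K" guidance.

```python
class CoalescingSender:
    """Collect small fragments and pass them to `send` in larger blocks."""

    def __init__(self, send, limit=8000):
        self._send = send      # callable taking one bytes object, e.g. sock.sendall
        self._limit = limit    # a little under 8K, as the article suggests
        self._parts = []
        self._size = 0

    def write(self, fragment: bytes):
        # Flush first if adding this fragment would push the block past the limit.
        if self._size and self._size + len(fragment) > self._limit:
            self.flush()
        self._parts.append(fragment)
        self._size += len(fragment)

    def flush(self):
        # Hand everything collected so far to the send callable as one block.
        if self._parts:
            self._send(b"".join(self._parts))
            self._parts.clear()
            self._size = 0
```

A caller would create it with `CoalescingSender(sock.sendall)`, call `write` for each fragment, and call `flush` once at the end so the final partial block is not left behind.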
