Analysis of the "sticky packet" problem in TCP data transmission


Over the past two days I saw several questions on CSDN about socket sticky packets and socket buffer settings. I realized I was not very clear on these issues myself, so I looked up some material and am recording it here:

1. Two simple concepts: persistent connections and short connections
1. Persistent connection

The client establishes a connection with the server. Once established, the connection stays open and messages are sent and received over it.

2. Short connection

The client and server establish a connection only when they need to exchange a message, and close it as soon as the exchange completes. This approach is typically used for point-to-multipoint communication, such as many clients connecting to one server.
 

2. When do sticky packets need to be considered?

1: If a new TCP connection is established for every transmission, and the connection is closed after both parties have sent one piece of data, the sticky packet problem does not arise, because the connection carries only one message (similar to the HTTP protocol). Closing the connection requires both parties to send a close request (see the TCP connection termination procedure). For example, if A needs to send a string to B, A connects to B and sends a string in the format both sides have agreed on, such as "hello Give me something about yourself". B reads the data from its receive buffer and then closes the connection. Because B knows the connection carries exactly one message, sticky packets need not be considered.
2: If the data has no structure, as in file transfer, the sender just sends and the receiver just receives and stores, so sticky packets need not be considered either.
3: If the two parties keep the connection open and send several differently structured messages over it, sticky packets must be handled. For example, suppose these messages are sent after connecting:
(1) "hello Give me something about yourself"
(2) "Don't give me something about yourself"
If the sender sends the two messages back to back, the receiver may read "hello Give me something about yourselfDon't give me something about yourself" in a single receive. The receiver cannot tell what this is, because the protocol defines no such string. Both parties therefore need to agree on a packet structure that lets the receiver split the stream into messages, for example by adding a header that contains the data length.
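
The length-header idea can be sketched as follows. This is a minimal sketch, not the original author's code: it assumes a 4-byte big-endian length prefix, and the names frame_message and split_frames are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Prepend a 4-byte big-endian length to the payload (assumed header layout).
std::string frame_message(const std::string& payload) {
    assert(payload.size() <= 0xFFFFFFFFu);  // length must fit in the prefix
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    std::string out;
    out.push_back(static_cast<char>((n >> 24) & 0xFF));
    out.push_back(static_cast<char>((n >> 16) & 0xFF));
    out.push_back(static_cast<char>((n >> 8) & 0xFF));
    out.push_back(static_cast<char>(n & 0xFF));
    out += payload;
    return out;
}

// Split a byte stream made of such frames back into the original messages.
// Trailing bytes that do not yet form a whole frame are left out.
std::vector<std::string> split_frames(const std::string& stream) {
    std::vector<std::string> messages;
    std::size_t pos = 0;
    while (stream.size() - pos >= 4) {
        std::uint32_t n = 0;
        for (int i = 0; i < 4; ++i) {
            n = (n << 8) | static_cast<unsigned char>(stream[pos + i]);
        }
        if (stream.size() - pos - 4 < n) break;  // incomplete frame
        messages.push_back(stream.substr(pos + 4, n));
        pos += 4 + n;
    }
    return messages;
}
```

With this framing, the two example strings stay separable even when they arrive concatenated in one receive, because the receiver reads the length first and then takes exactly that many bytes.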
 

3. Causes of sticky packets: they occur in stream transmission. UDP does not produce sticky packets because it preserves message boundaries (see Windows Network Programming)
1. The sender waits until its buffer is full before sending, so several messages go out together.
2. The receiver does not read data from its buffer in time, so a single receive returns several messages.

Solution:
Three measures can be taken against sticky packets. First, sticky packets caused by the sender can be avoided through a programming setting: TCP provides an operation that forces data to be transmitted immediately, so that the TCP software sends the data at once instead of waiting for the send buffer to fill. Second, sticky packets caused by the receiver can be reduced by optimizing the program design, keeping the receiving process's workload small, and raising its priority so that it reads data promptly. Third, the receiver can take control: it reads each packet in several receives according to the fields of the packet structure and then assembles the pieces, so that packet boundaries are restored by the application.

All three measures have shortcomings. The first avoids sticky packets caused by the sender, but it disables the optimization algorithm, lowers network sending efficiency, and hurts application performance, so it is generally not recommended. The second only reduces the likelihood of sticky packets and cannot eliminate them: when the sending rate is high, or a network burst delivers several packets to the receiver within a short time, the receiver may still fail to keep up. The third avoids sticky packets, but the application is less efficient, which makes it a poor fit for real-time applications.
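
The text does not name the programming setting used for the first measure. One commonly used option that matches the description "disables the optimization algorithm" is TCP_NODELAY, which turns off the Nagle algorithm; treating it as the setting meant here is an assumption. The sketch below uses POSIX sockets rather than Winsock, and the function names are hypothetical.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Disable the Nagle algorithm on a TCP socket so that small writes are not
// held back to be coalesced. Returns true on success.
bool disable_nagle(int fd) {
    assert(fd >= 0);
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) == 0;
}

// Read the option back; returns true if TCP_NODELAY is currently set.
bool nagle_disabled(int fd) {
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) != 0) {
        return false;
    }
    return value != 0;
}
```

Even with this option set, the receiver can still read several messages in one receive if it falls behind, so the option does not replace a packet structure with a length field.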
Source: http://blog.csdn.net/binghuazh/archive/2009/05/28/4222516.aspx
================================================================================

Packing and unpacking in network communication
TCP-based communication programs have to solve one very important problem: packing and unpacking.

1. Why TCP-based communication programs need packing and unpacking

TCP is a "stream" protocol. A stream is a sequence of bytes with no boundaries, like water flowing in a river: there are no dividing lines in it. Communication programs, however, usually need to define independent data packets, such as a login packet and a logout packet. Because of TCP's stream nature and varying network conditions, the following situations can occur during transmission.
Assume we call send twice in a row to send two pieces of data, data1 and data2. The receiving end may then see any of the following (there are more possibilities; only representative ones are listed).
A. data1 is received first, then data2.
B. Part of data1 is received first, then the rest of data1 together with all of data2.
C. All of data1 and part of data2 are received first, then the rest of data2.
D. All of data1 and data2 are received at once.

Case A is exactly what we want and needs no further discussion. Cases B, C, and D are what is usually called "sticky packets": the received data must be split into independent packets. To make that unpacking possible, the sender must pack the data first.

Note that UDP has no unpacking problem, because UDP is a "datagram" protocol: there is a boundary between two pieces of data. The receiver either gets nothing or gets one complete piece of data, never less and never more.

2. Why cases B, C, and D occur
Sticky packets can originate at the sender or at the receiver.
1. Sticky packets at the sender caused by the Nagle algorithm: the Nagle algorithm improves network transmission efficiency. Put simply, when we hand TCP a piece of data to send, TCP does not send it immediately but waits a short while to see whether more data arrives in the meantime; if so, both pieces are sent together. This is only a simplified description of the Nagle algorithm; see the relevant books for details. Cases such as C and D may be caused by the Nagle algorithm.
2. Sticky packets at the receiver caused by not reading in time: TCP stores received data in its own buffer and then notifies the application layer to fetch it. If the application layer cannot fetch the data in time for some reason, several pieces of data accumulate in the TCP buffer.

3. How to pack and unpack
When I first ran into the sticky packet problem, I called sleep between two sends to pause for a short time. The drawbacks are obvious: it greatly reduces transmission efficiency and is not reliable. Later I used acknowledgements, with the sender waiting for a reply before sending the next piece. This worked most of the time, but it cannot handle case B, and the acknowledgements add traffic and network load. The approach I finally settled on is to pack and unpack the data.
Packing:
Packing means adding a header to a piece of data, so that a packet consists of two parts: the header and the body. (Later, for filtering out illegal packets, a "packet tail" is added as well.) The header is a fixed-size struct in which one member holds the length of the packet body. This is a very important field; the other members can be defined as needed. Given the fixed header length and the body length stored in the header, complete packets can be split out of the stream correctly.
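
A header of this kind can be sketched as follows. The layout is an assumption made for illustration (a 2-byte message type followed by a 4-byte body length, both big-endian), and PacketHeader, pack, and parse_header are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-size header: message type plus body length.
struct PacketHeader {
    std::uint16_t type;
    std::uint32_t body_len;
};

constexpr std::size_t kHeaderSize = 6;  // bytes on the wire, no padding

// Serialize header and body into one packet, fields in big-endian order.
std::vector<unsigned char> pack(std::uint16_t type,
                                const std::vector<unsigned char>& body) {
    assert(body.size() <= 0xFFFFFFFFu);  // length must fit in 4 bytes
    const std::uint32_t n = static_cast<std::uint32_t>(body.size());
    std::vector<unsigned char> out;
    out.reserve(kHeaderSize + body.size());
    out.push_back(static_cast<unsigned char>(type >> 8));
    out.push_back(static_cast<unsigned char>(type & 0xFF));
    out.push_back(static_cast<unsigned char>(n >> 24));
    out.push_back(static_cast<unsigned char>((n >> 16) & 0xFF));
    out.push_back(static_cast<unsigned char>((n >> 8) & 0xFF));
    out.push_back(static_cast<unsigned char>(n & 0xFF));
    out.insert(out.end(), body.begin(), body.end());
    return out;
}

// Decode the header from the first kHeaderSize bytes of a packet.
PacketHeader parse_header(const unsigned char* p) {
    PacketHeader h;
    h.type = static_cast<std::uint16_t>((p[0] << 8) | p[1]);
    h.body_len = (static_cast<std::uint32_t>(p[2]) << 24) |
                 (static_cast<std::uint32_t>(p[3]) << 16) |
                 (static_cast<std::uint32_t>(p[4]) << 8) |
                 static_cast<std::uint32_t>(p[5]);
    return h;
}
```

Writing the fields byte by byte, instead of copying the struct's memory, keeps the wire format independent of compiler padding and of the byte order of either machine.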
Two unpacking methods are currently the most common.
1. Dynamic buffer method. The buffer is called dynamic because it is enlarged whenever the data to be buffered exceeds its current length.
The process is as follows:
A. Dynamically allocate a buffer for each connection and associate it with the socket, typically through a struct.
B. When data arrives, first append it to the buffer.
C. Check whether the buffer holds enough data for a header. If not, do not unpack yet.
D. Parse the body length field from the header.
E. Check whether the data after the header is enough for one packet body. If not, do not unpack yet.
F. Take out the whole packet. "Take out" here means not only copying the packet from the buffer but also deleting it from the buffer, which is done by moving the data after this packet to the start of the buffer.
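
Steps A to F can be sketched as below. This is a minimal sketch under stated assumptions: the header is taken to be just a 4-byte big-endian body length, std::string stands in for the dynamically growing buffer, and Connection and on_receive are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Step A: one buffer per connection. std::string grows on demand, which
// plays the role of the dynamic buffer.
struct Connection {
    std::string buffer;
};

constexpr std::size_t kHeaderLen = 4;  // assumed: 4-byte big-endian body length

// Steps B to F: append newly received bytes, then extract every complete packet.
std::vector<std::string> on_receive(Connection& conn, const char* data,
                                    std::size_t len) {
    assert(data != nullptr || len == 0);
    if (len > 0) conn.buffer.append(data, len);            // B: store in buffer
    std::vector<std::string> packets;
    for (;;) {
        if (conn.buffer.size() < kHeaderLen) break;        // C: header incomplete
        std::uint32_t body_len = 0;                        // D: parse the length
        for (std::size_t i = 0; i < kHeaderLen; ++i) {
            body_len = (body_len << 8) |
                       static_cast<unsigned char>(conn.buffer[i]);
        }
        if (conn.buffer.size() - kHeaderLen < body_len) break;  // E: body incomplete
        packets.push_back(conn.buffer.substr(kHeaderLen, body_len));  // F: copy out
        conn.buffer.erase(0, kHeaderLen + body_len);       // F: delete, rest moves to front
    }
    return packets;
}
```

Calling on_receive once per completed receive handles cases B, C, and D alike: a partial packet simply stays in the buffer until the rest arrives.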

This method has two drawbacks. 1. A buffer is dynamically allocated for each connection, which increases memory usage. 2. Data is copied in three places: when storing data in the buffer, when copying the complete packet out of the buffer, and when deleting the packet from the buffer. The second unpacking method addresses these drawbacks.

Before that, here is an improvement to this method: a ring buffer. It still leaves the first drawback and the first copy in place; it only removes the third copy (which is where the most data gets copied). The first drawback and the first copy are addressed by the second unpacking method.
The ring buffer is implemented with two pointers that point to the head and the tail of the valid data. Storing and deleting data only moves the head and tail pointers.
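
A ring buffer of this kind might look like the sketch below. It is an illustration, not the author's implementation: the capacity is fixed for brevity, and instead of a separate tail pointer it stores the head index and the byte count, deriving the tail from them (this avoids ambiguity between a full and an empty buffer). The class name RingBuffer is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed-capacity ring buffer. head_ is the index of the first valid byte;
// the tail (next free slot) is derived as (head_ + size_) % capacity.
class RingBuffer {
public:
    explicit RingBuffer(std::size_t capacity)
        : buf_(capacity), head_(0), size_(0) {
        assert(capacity > 0);
    }
    std::size_t size() const { return size_; }
    std::size_t free_space() const { return buf_.size() - size_; }

    // Append len bytes; returns false (and stores nothing) if they do not fit.
    bool write(const char* data, std::size_t len) {
        if (len > free_space()) return false;
        std::size_t tail = (head_ + size_) % buf_.size();
        for (std::size_t i = 0; i < len; ++i) {
            buf_[tail] = data[i];
            tail = (tail + 1) % buf_.size();
        }
        size_ += len;
        return true;
    }
    // Copy len bytes starting at offset from the head, without consuming them.
    bool peek(std::size_t offset, char* out, std::size_t len) const {
        if (offset + len > size_) return false;
        for (std::size_t i = 0; i < len; ++i) {
            out[i] = buf_[(head_ + offset + i) % buf_.size()];
        }
        return true;
    }
    // Delete len bytes from the front by moving the head only; no data moves.
    bool consume(std::size_t len) {
        if (len > size_) return false;
        head_ = (head_ + len) % buf_.size();
        size_ -= len;
        return true;
    }

private:
    std::vector<char> buf_;
    std::size_t head_;
    std::size_t size_;
};
```

A packet would be taken out by peeking at the header, peeking at the body once enough bytes are present, and then calling consume, which replaces the block move used by the plain dynamic buffer.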

2. Unpacking with the underlying buffer
TCP itself maintains a buffer, so we can let the TCP buffer hold our data and avoid allocating a buffer per connection. In addition, both recv and WSARecv take a parameter that specifies how much data to receive. These two facts let us improve on the first method.
For a blocking socket, we can loop until a header's worth of data has been received, parse the body length from it, and then loop until that many bytes of body have been received.
The related code is as follows:

char packagehead[1024];
char packagecontext[1024 * 20];

int len;
package_head *ppackagehead;
while (m_bclose == false)
{
    // Receive exactly one header.
    memset(packagehead, 0, sizeof(package_head));
    len = m_tcpsock.receivesize((char *)packagehead, sizeof(package_head));
    if (len == SOCKET_ERROR)
    {
        break;
    }
    if (len == 0)
    {
        break;
    }
    ppackagehead = (package_head *)packagehead;
    memset(packagecontext, 0, sizeof(packagecontext));
    // Reject a length that would overflow the body buffer.
    if (ppackagehead->ndatalen > (int)sizeof(packagecontext))
    {
        break;
    }
    if (ppackagehead->ndatalen > 0)
    {
        // Receive exactly the body announced by the header.
        len = m_tcpsock.receivesize((char *)packagecontext, ppackagehead->ndatalen);
        if (len == SOCKET_ERROR || len == 0)
        {
            break;
        }
    }
}

m_tcpsock is an object of a class that wraps the socket. receivesize receives a specified number of bytes: it returns only after that many bytes have been received, the connection has been closed, or a network error has occurred.

int winsocket::receivesize(char *strdata, int ilen)
{
    if (strdata == NULL)
        return err_badparam;
    char *p = strdata;
    int len = ilen;      // bytes still to receive
    int ret = 0;
    int returnlen = 0;   // bytes received so far
    while (len > 0)
    {
        // recv may return fewer bytes than requested, so keep asking for the rest.
        ret = recv(m_hsocket, p + returnlen, len, 0);
        if (ret == SOCKET_ERROR || ret == 0)
        {
            // Error, or the peer closed the connection.
            return ret;
        }

        len -= ret;
        returnlen += ret;
    }

    return returnlen;
}
For non-blocking sockets, for example with an I/O completion port, we can post a request to receive a header's worth of data. When GetQueuedCompletionStatus returns, we check whether the number of bytes received equals the header length. If it does, we post a request to receive the packet body; if it does not, we post a request to receive the rest of the header. The packet body is received in the same way.
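
The bookkeeping behind this scheme can be sketched independently of any socket API. The class below only tracks how many bytes the next receive request should ask for; posting the request and waiting for its completion are left to the caller. It assumes a 4-byte big-endian body length as the whole header, and FrameReader, wanted, and on_bytes are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

constexpr std::size_t kReaderHeaderLen = 4;  // assumed header: 4-byte big-endian length

// Tracks whether a header or a body is being received and how much is missing.
class FrameReader {
public:
    // Bytes still missing from the part (header or body) being received.
    // This is the length to pass to the next receive request.
    std::size_t wanted() const { return target_ - part_.size(); }

    // Call with the bytes delivered by one completed receive (n <= wanted()).
    // Returns true when a whole packet body is available through body().
    bool on_bytes(const char* data, std::size_t n) {
        assert(n <= wanted());
        if (n > 0) part_.append(data, n);
        if (part_.size() < target_) return false;  // short read: request the rest
        if (in_header_) {
            std::uint32_t len = 0;
            for (char c : part_) {
                len = (len << 8) | static_cast<unsigned char>(c);
            }
            in_header_ = false;
            target_ = len;
            part_.clear();
            if (len != 0) return false;            // next, request the body
        }
        body_ = part_;                             // body complete (possibly empty)
        in_header_ = true;
        target_ = kReaderHeaderLen;
        part_.clear();
        return true;
    }
    const std::string& body() const { return body_; }

private:
    bool in_header_ = true;
    std::size_t target_ = kReaderHeaderLen;
    std::string part_;
    std::string body_;
};
```

Each completion handler would pass the received bytes to on_bytes and then post a new receive for wanted() bytes, so a receive never reads past the end of the current packet.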

Source: http://blog.csdn.net/fjcailei/archive/2009/06/17/4276463.aspx
================================================================================
Several questions: http://www.qqgb.com/Program/VC/VCJQ/Program_200509.html
This discussion grew out of several programming questions:
1. When sending data on a TCP socket, the error WSAEWOULDBLOCK occurs. Doesn't TCP guarantee that sent data reaches the receiving end reliably? There is also a window mechanism to keep the sender from going too fast. Why does the error still occur?

2. With TCP, if each send call sends one packet, does the receiving end receive exactly one complete packet? If one packet sent means one packet received, why do sticky packets occur? How does this actually work?

3. Can the number of bytes actually sent be smaller than the number requested only in non-blocking mode? In blocking mode, can it also be smaller, or is it all or nothing? In non-blocking mode, if only part of the data is sent, how should that be handled? That is, what should be done when send returns a value smaller than the requested length?

4. What is the relationship between TCP/IP and sockets? Are sockets an implementation of TCP/IP? Why can a send on a TCP socket fail? (This brings me back to the first question.)

I am rather confused. If any of my questions or assumptions are wrong, please point it out. Thank you.

--------------------------------------------------------------------------------

Answer 1:
1. Your buffer is not large enough.
2. TCP is a stream with no boundaries, so there is no such thing as a packet at the TCP level.
3. This can also happen in blocking mode. When it does, keep sending the remainder.
4. TCP is a protocol and the socket is an interface; the two are not necessarily tied together. The error depends on the interface you are using and has nothing to do with TCP.

--------------------------------------------------------------------------------

Answer 2:
1. Your buffer is not large enough.
2. TCP is a stream with no boundaries, so the notion of a packet does not apply to it.
3. This can also happen in blocking mode. When it does, keep sending the remainder.
4. TCP is a protocol and the socket is an interface; the two are not necessarily tied together. The error depends on the interface you are using and has nothing to do with TCP.

--------------------------------------------------------------------------------

Answer 3:
1. It should not be a matter of buffer size. I tried setting the buffer size, and oddly, even setting it to several GB returns success. How can it really be set that large?

3. Do I have to resend the unsent part myself? Is there a concrete code example?

4. When sending on a TCP socket, doesn't TCP's window mechanism already keep the sender from going too fast? Why doesn't the socket layer handle WSAEWOULDBLOCK itself?

--------------------------------------------------------------------------------

Answer 4:
1. In non-blocking mode, this error occurs when the system send buffer is full because the data in it has not yet been delivered to the peer. You can simply retry.
3. If only part of the data was sent, send the remaining part.

--------------------------------------------------------------------------------

Answer 5:
1. In non-blocking mode, an operation that cannot complete immediately returns failure with the error code WSAEWOULDBLOCK. This is normal: the program can do other work first and retry the operation after a while.
2. Sends and receives do not correspond one to one. TCP may merge or split the data it is given, but the order of the bytes is preserved.
3. In every case, use the return value of send to determine how much data was actually sent, and then send the data that follows.
4. The socket is a network programming interface provided by Windows, while TCP/IP is a network transmission protocol. Sockets can be used with several protocols, including TCP/IP.
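
Points 1 and 3 of this answer can be sketched as a send loop. The sketch uses POSIX sockets rather than Winsock, which is an assumption made so that it is self-contained; on POSIX the counterpart of WSAEWOULDBLOCK is the errno value EWOULDBLOCK (or EAGAIN). The name send_all is hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Send exactly len bytes on fd, continuing after partial sends. Works for
// blocking and non-blocking sockets. Returns true once all bytes have been
// handed to the kernel, false on an unrecoverable error.
bool send_all(int fd, const char* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    std::size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(fd, data + sent, len - sent, 0);
        if (n > 0) {
            sent += static_cast<std::size_t>(n);  // partial send: go on from here
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            // Send buffer full: wait until the socket is writable, then retry.
            struct pollfd p;
            p.fd = fd;
            p.events = POLLOUT;
            p.revents = 0;
            if (poll(&p, 1, -1) < 0 && errno != EINTR) return false;
        } else if (n < 0 && errno == EINTR) {
            continue;                             // interrupted by a signal, retry
        } else {
            return false;                         // unrecoverable error
        }
    }
    return true;
}
```

Waiting with poll is one choice; a program with other work to do could instead return to its event loop and resume from the saved offset when the socket becomes writable, as the answer suggests.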

--------------------------------------------------------------------------------


Answer 7:
Sending happens in two steps: from the application into the send buffer, and from the buffer onto the network.
Both WSAEWOULDBLOCK and sticky packets arise in the first step, when data is placed into the buffer.

================================================================================
Related article: Solving the TCP network transmission "sticky packet" problem http://blog.csdn.net/michelsn/archive/2008/01/02/2009894.aspx

This article is from: I love R & D network (52rd.com)-R & D Base Camp
Detailed Source: http://www.52rd.com/Blog/Archive_Thread.asp? SID = 1, 22621
