Socket programming practices in Linux (4): TCP sticky-packet problems and common solutions

Source: Internet
Author: User

Causes of the TCP sticky-packet problem

Because TCP is a byte-stream protocol that preserves no message boundaries, messages can easily run together at the receiver, which is known as the sticky-packet problem. Part of the problem originates at the sender and is caused by TCP itself: to improve transmission efficiency, the sender tends to gather enough data before sending a TCP segment. If small amounts of data are written several times in a row, TCP will usually combine them into a single segment and send them together. The receiver, however, does not know how many bytes belong to each message, so a single read may return the data of several messages stuck together, or only part of one message.

 

Assume that host A sends two messages, M1 and M2, of 10 KB each to host B. Because host B cannot know in advance how many bytes each read will return, it may receive the data in any of the following ways:

• One read that returns all 20 KB
• Two reads: 5 KB the first time, 15 KB the second time
• Two reads: 15 KB the first time, 5 KB the second time
• Two reads: 10 KB the first time, 10 KB the second time (the only case in which each read matches one message)
• Three reads: 6 KB the first time, 8 KB the second time, 6 KB the third time
• Any other combination
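
The effect can be reproduced locally without a network. The following is a minimal sketch, assuming a Linux system; it uses a UNIX-domain stream socket pair as a stand-in for a TCP connection (both are byte streams), and the names StickyResult and demo_sticky_read are made up for this illustration. Two 10-byte messages are written separately, and the first read on the other end is free to return both of them at once.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical result type for this sketch.
struct StickyResult {
    ssize_t firstRead;  // bytes returned by the first read()
    ssize_t total;      // bytes received overall
};

// Writes two 10-byte messages to one end of a stream socket pair, then
// reads from the other end until end-of-stream.
StickyResult demo_sticky_read()
{
    StickyResult result = {-1, -1};
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
        return result;

    const char m1[] = "AAAAAAAAAA";  // message M1, 10 bytes
    const char m2[] = "BBBBBBBBBB";  // message M2, 10 bytes
    if (write(fds[0], m1, 10) != 10 || write(fds[0], m2, 10) != 10) {
        close(fds[0]);
        close(fds[1]);
        return result;
    }
    close(fds[0]);  // the reader sees end-of-stream after the queued data

    char buf[64];
    ssize_t n = read(fds[1], buf, sizeof(buf));
    result.firstRead = n;  // may cover M1 and M2 together
    result.total = 0;
    while (n > 0) {
        result.total += n;
        n = read(fds[1], buf, sizeof(buf));
    }
    close(fds[1]);
    return result;
}
```

All 20 bytes always arrive, but nothing in the stream says where M1 ends and M2 begins; how many bytes firstRead covers is up to the kernel, not the sender.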

The sticky-packet problem has several causes:

 

1. The socket itself has limited buffer sizes (the send buffer, SO_SNDBUF, and the receive buffer, SO_RCVBUF)

2. The TCP maximum segment size (MSS) limits how much data a single segment can carry

3. The link layer has an MTU limit. If a packet is larger than the MTU, it must be fragmented at the IP layer, which splits the data.

4. TCP flow control and congestion control may also cause packets to stick together

5. The TCP delayed acknowledgment mechanism, together with the sender-side merging of small writes (Nagle's algorithm) described at the beginning of the article

 

Note: About MTU and MSS

MSS (maximum segment size) is a TCP concept: the largest amount of payload a single TCP segment may carry. MTU (maximum transmission unit) is not tied to one particular OSI layer or to one particular protocol. Layer 2 has an MTU, layer 3 has an MTU, and a protocol such as MPLS also has its own MTU value. The limits at the different layers are related to each other. As an analogy, suppose you are moving house: you pack your things into boxes and transport the boxes by car. The size of the car is limited by the width of the road, the size of the box is limited by the car, and the size of the items you can carry is limited by the box. Here the width of the road corresponds to the layer 2 MTU, the size of the car to the layer 3 MTU, the size of the box to the layer 4 MTU, and the items being carried to the MSS.
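
As a worked example of how the two values relate, the sketch below derives the MSS from an MTU. It assumes IPv4 and TCP headers of 20 bytes each (no options); the function name mss_from_mtu and the constants are made up for this illustration.

```cpp
#include <cstddef>

// Assumed header sizes: IPv4 and TCP headers without options.
const std::size_t kIpv4HeaderBytes = 20;
const std::size_t kTcpHeaderBytes = 20;

// Largest TCP payload that fits into one IP packet of the given MTU.
// Returns 0 if the MTU cannot even hold the two headers.
std::size_t mss_from_mtu(std::size_t mtu)
{
    const std::size_t headers = kIpv4HeaderBytes + kTcpHeaderBytes;
    if (mtu <= headers)
        return 0;
    return mtu - headers;
}
```

With the common Ethernet MTU of 1500 bytes this gives 1500 - 20 - 20 = 1460 bytes. IP or TCP options reduce the value further.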

Solutions to the sticky-packet problem (in essence, the boundaries between messages must be maintained at the application layer)

 

(1) Fixed-length packets

This method is rarely practical: if the fixed length is too large, the padding wastes network bandwidth and adds load to the network; if it is too small, one message has to be split across several packets, and the application layer on top of TCP pays the extra cost of merging them again.

(2) Append \r\n to the end of each packet (the approach FTP uses)

If the message body itself can contain \r\n, the message boundary becomes ambiguous again.

(3) Packet length + packet content (a custom packet structure)

(4) More complex application-layer protocols
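
Options (1) and (2) can be sketched on an in-memory byte stream, independent of any socket. This is only an illustration: the frame size of 16 bytes and the names make_fixed_frame, split_fixed_frames, and LineSplitter are assumptions chosen for the example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

const std::size_t kFrameSize = 16;  // assumed fixed frame size

// Option (1): pad msg with '\0' up to kFrameSize.
// Returns an empty string if msg does not fit into one frame.
std::string make_fixed_frame(const std::string &msg)
{
    if (msg.size() > kFrameSize)
        return std::string();
    std::string frame = msg;
    frame.resize(kFrameSize, '\0');
    return frame;
}

// Option (1), receiving side: cut whole frames off the front of `stream`;
// an incomplete tail stays in `stream` until more bytes arrive.
std::vector<std::string> split_fixed_frames(std::string &stream)
{
    std::vector<std::string> frames;
    while (stream.size() >= kFrameSize) {
        frames.push_back(stream.substr(0, kFrameSize));
        stream.erase(0, kFrameSize);
    }
    return frames;
}

// Option (2): collect received chunks and hand out complete lines
// (without the trailing "\r\n"); a partial line is kept for the next call.
class LineSplitter {
public:
    std::vector<std::string> feed(const std::string &chunk)
    {
        pending_ += chunk;
        std::vector<std::string> lines;
        std::string::size_type pos;
        while ((pos = pending_.find("\r\n")) != std::string::npos) {
            lines.push_back(pending_.substr(0, pos));
            pending_.erase(0, pos + 2);
        }
        return lines;
    }

private:
    std::string pending_;
};
```

In both cases the receiver keeps its own buffer and only hands out a message once it is complete, so it no longer matters how the bytes were grouped on the wire.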

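Note: Enabling TCP_NODELAY with setsockopt disables Nagle's algorithm, which addresses cause 5 above (small writes being held back and merged by the sender). It does not restore message boundaries, so framing at the application layer is still required.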

 

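    #include <netinet/in.h>   // IPPROTO_TCP
    #include <netinet/tcp.h>  // TCP_NODELAY
    #include <sys/socket.h>   // setsockopt

    // Disable Nagle's algorithm on a TCP socket
    static void _set_tcp_nodelay(int fd)
    {
        int enable = 1;
        setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, (void*)&enable, sizeof(enable));
    }

The well-known Nginx server enables this option by default (its tcp_nodelay directive defaults to on).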

 

 

Because TCP is stream-oriented, read and write calls may return fewer bytes than the count passed as a parameter. For a read call on a blocking socket, if the receive buffer holds 20 bytes and 100 bytes are requested, read returns 20. For a write call on a blocking socket, if 100 bytes are requested but the send buffer has only 20 bytes of free space, write blocks until all 100 bytes have been handed over to the send buffer. Either call can also be interrupted by a signal, in which case it must be retried. To keep this handling out of the main program logic and to guarantee that the requested number of bytes is actually read or written, we implement two wrapper functions, readn and writen, shown below.

 

    #include <errno.h>
    #include <unistd.h>

    /** Implementation: both functions simply call read/write
        repeatedly until count bytes have been read/written. **/
    /** Return value:
        == count: success, count bytes were actually read
        == -1:    an error occurred while reading
        <  count: end of file was reached (the peer closed the connection)
    **/
    ssize_t readn(int fd, void *buf, size_t count)
    {
        size_t nLeft = count;
        ssize_t nRead = 0;
        char *pBuf = (char *)buf;
        while (nLeft > 0)
        {
            if ((nRead = read(fd, pBuf, nLeft)) < 0)
            {
                // Interrupted by a signal: retry the read
                if (errno == EINTR)
                    continue;
                // Any other error
                else
                    return -1;
            }
            // End of file: return the number of bytes read so far
            else if (nRead == 0)
                return count - nLeft;

            // Normal read
            nLeft -= nRead;
            pBuf += nRead;
        }
        return count;
    }

    /** Return value:
        == count: success, count bytes were actually written
        == -1:    an error occurred while writing
    **/
    ssize_t writen(int fd, const void *buf, size_t count)
    {
        size_t nLeft = count;
        ssize_t nWritten = 0;
        const char *pBuf = (const char *)buf;
        while (nLeft > 0)
        {
            if ((nWritten = write(fd, pBuf, nLeft)) < 0)
            {
                // Interrupted by a signal: retry the write
                if (errno == EINTR)
                    continue;
                // Any other error
                else
                    return -1;
            }
            // Nothing was written: try again
            else if (nWritten == 0)
                continue;

            // Normal write
            nLeft -= nWritten;
            pBuf += nWritten;
        }
        return count;
    }

 

Packet length + packet content (custom packet structure)

When sending a message: a four-byte length header followed by the message content, sent in a single call.

When receiving a message: first read the four-byte header to obtain the length of the message content, then read exactly that many bytes.
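
The framing rule itself can be tried out on an in-memory buffer before any socket code is involved. The following is a minimal sketch under stated assumptions: the names encode_frame and decode_frames and the 1024-byte payload limit are chosen for this illustration, and the length header is carried in network byte order.

```cpp
#include <arpa/inet.h>  // htonl, ntohl
#include <stdint.h>
#include <cstring>      // memcpy
#include <string>
#include <vector>

const uint32_t kMaxPayload = 1024;  // assumed upper bound for one message

// Sender side: 4-byte length (network byte order) followed by the payload.
std::string encode_frame(const std::string &payload)
{
    uint32_t lenNet = htonl(static_cast<uint32_t>(payload.size()));
    std::string frame(reinterpret_cast<const char *>(&lenNet), sizeof(lenNet));
    frame += payload;
    return frame;
}

// Receiver side: move every complete message from the front of `stream`
// into `out`; an incomplete header or body stays in `stream`.
// Returns false if a header announces more than kMaxPayload bytes.
bool decode_frames(std::string &stream, std::vector<std::string> &out)
{
    while (stream.size() >= sizeof(uint32_t)) {
        uint32_t lenNet;
        std::memcpy(&lenNet, stream.data(), sizeof(lenNet));
        uint32_t len = ntohl(lenNet);
        if (len > kMaxPayload)
            return false;   // corrupt or hostile header
        if (stream.size() < sizeof(lenNet) + len)
            break;          // body not complete yet
        out.push_back(stream.substr(sizeof(lenNet), len));
        stream.erase(0, sizeof(lenNet) + len);
    }
    return true;
}
```

Appending whatever each read returns to `stream` and calling decode_frames afterwards yields the original messages regardless of how the bytes were split or merged in transit. The limit check matters because the length comes from the peer and should not be trusted blindly.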

Custom packet structure:

    struct Packet
    {
        unsigned int msgLen;   // Length of the data part (note: in network byte order)
        char text[1024];       // Data part of the message
    };
    // Echo client: sending and receiving code
    ...
    struct Packet buf;
    memset(&buf, 0, sizeof(buf));
    while (fgets(buf.text, sizeof(buf.text), stdin) != NULL)
    {
        /** write part **/
        unsigned int lenHost = strlen(buf.text);
        buf.msgLen = htonl(lenHost);
        if (writen(sockfd, &buf, sizeof(buf.msgLen) + lenHost) == -1)
            err_exit("writen socket error");

        /** read part **/
        memset(&buf, 0, sizeof(buf));
        // First read the header
        ssize_t readBytes = readn(sockfd, &buf.msgLen, sizeof(buf.msgLen));
        if (readBytes == -1)
            err_exit("read socket error");
        else if (readBytes != (ssize_t)sizeof(buf.msgLen))
        {
            cerr << "server connect closed... \nexiting..." << endl;
            break;
        }

        // Then read the data part
        lenHost = ntohl(buf.msgLen);
        // Reject lengths that would overflow text or leave no room for '\0'
        if (lenHost >= sizeof(buf.text))
            err_exit("message too long");
        readBytes = readn(sockfd, buf.text, lenHost);
        if (readBytes == -1)
            err_exit("read socket error");
        else if (readBytes != (ssize_t)lenHost)
        {
            cerr << "server connect closed... \nexiting..." << endl;
            break;
        }
        // Print the data part
        cout << buf.text;
        memset(&buf, 0, sizeof(buf));
    }
    ...

    // The improved server-side echo code
    void echo(int clientfd)
    {
        struct Packet buf;
        ssize_t readBytes;
        memset(&buf, 0, sizeof(buf));
        // First read the header; stop on error, close, or a truncated header
        while ((readBytes = readn(clientfd, &buf.msgLen, sizeof(buf.msgLen)))
               == (ssize_t)sizeof(buf.msgLen))
        {
            // Network byte order -> host byte order
            unsigned int lenHost = ntohl(buf.msgLen);
            // Reject lengths that would overflow text or leave no room for '\0'
            if (lenHost >= sizeof(buf.text))
                err_exit("message too long");

            // Then read the data part
            readBytes = readn(clientfd, buf.text, lenHost);
            if (readBytes == -1)
                err_exit("readn socket error");
            else if (readBytes != (ssize_t)lenHost)
            {
                cerr << "client connect closed..." << endl;
                return;
            }
            cout << buf.text;

            // Then write it back to the socket
            if (writen(clientfd, &buf, sizeof(buf.msgLen) + lenHost) == -1)
                err_exit("write socket error");
            memset(&buf, 0, sizeof(buf));
        }
        if (readBytes == -1)
            err_exit("read socket error");
        else
            cerr << "client connect closed..." << endl;
    }
Note: The length field must be converted from network byte order to host byte order before it is used.

Read by line (lines delimited by \r\n)

    ssize_t recv(int sockfd, void *buf, size_t len, int flags);
    ssize_t send(int sockfd, const void *buf, size_t len, int flags);

Compared with read, recv can only be used on socket file descriptors, but it takes an additional flags argument, and this argument can help us solve the sticky-packet problem.

MSG_PEEK lets you look at the data in the receive buffer without removing it (just a peek). This makes reading by line convenient. Reading one character at a time would also find the '\n', but calling read once per character is inefficient.

This flag causes the receive operation to return data from the beginning of the receive queue without removing that data from the queue. Thus, a subsequent receive call will return the same data.
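
The behavior described in the manual page can be checked with a short experiment. The sketch below assumes a Linux system and uses a UNIX-domain stream socket pair in place of a TCP connection; the function name peek_then_read_matches is made up for this illustration.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstring>

// Sends 5 bytes, looks at them with MSG_PEEK, then reads them normally.
// Returns true if both calls returned the same 5 bytes.
bool peek_then_read_matches()
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
        return false;

    bool ok = false;
    char peeked[16] = {0};
    char consumed[16] = {0};
    if (send(fds[0], "hello", 5, 0) == 5) {
        // MSG_PEEK copies the data but leaves it in the receive queue
        ssize_t n1 = recv(fds[1], peeked, sizeof(peeked), MSG_PEEK);
        // A normal recv returns the same bytes and removes them
        ssize_t n2 = recv(fds[1], consumed, sizeof(consumed), 0);
        ok = (n1 == 5 && n2 == 5 && std::memcmp(peeked, consumed, 5) == 0);
    }
    close(fds[0]);
    close(fds[1]);
    return ok;
}
```

Because the peeked bytes are still queued, a caller can inspect them for a '\n' first and then consume only the part that belongs to the current line.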

 

How readline is implemented:

In the readline function, we first call recv_peek to look at the data currently in the socket buffer, copying it to pBuf without consuming it, and then check whether it contains a newline '\n'. If it does, we call readn to read everything up to and including the newline, which removes that data from the socket buffer. If it does not, we call readn to consume everything that was peeked, advance pBuf past it, and go back to the top of the while loop. Note that the socket buffer is only drained by readn, because readn calls read. Also note that when the '\n' is only found on a later pass, returnCount holds the number of bytes consumed on the earlier passes, so the value returned must be ret plus returnCount.


 

    #include <errno.h>
    #include <stdlib.h>
    #include <sys/socket.h>

    /** Example: wrap MSG_PEEK in a recv_peek function
        (look at the data without taking it away) **/
    ssize_t recv_peek(int sockfd, void *buf, size_t len)
    {
        while (true)
        {
            ssize_t ret = recv(sockfd, buf, len, MSG_PEEK);
            // If recv was interrupted by a signal, peek again
            if (ret == -1 && errno == EINTR)
                continue;
            return ret;
        }
    }

    /** Use recv_peek to read one line at a time (sockets only) **/
    /** Return value:
        == 0:  the peer closed the connection
        == -1: read error
        other: number of bytes in the line (including '\n')
    **/
    ssize_t readline(int sockfd, void *buf, size_t maxline)
    {
        ssize_t ret;
        ssize_t nRead = 0;
        ssize_t returnCount = 0;
        char *pBuf = (char *)buf;
        size_t nLeft = maxline;
        while (nLeft > 0)
        {
            ret = recv_peek(sockfd, pBuf, nLeft);
            // Peek failed or the peer closed the connection
            if (ret <= 0)
                return ret;

            nRead = ret;
            for (ssize_t i = 0; i < nRead; ++i)
                // The peeked data contains '\n', so a full line is available
                if (pBuf[i] == '\n')
                {
                    // Consume the line from the socket buffer
                    // Note the i + 1: the '\n' is read as well
                    ret = readn(sockfd, pBuf, i + 1);
                    if (ret != i + 1)
                        exit(EXIT_FAILURE);
                    return ret + returnCount;
                }

            // No '\n' found, so the line is not complete yet:
            // consume what was peeked, then keep looking
            ret = readn(sockfd, pBuf, nRead);
            if (ret != nRead)
                exit(EXIT_FAILURE);
            pBuf += nRead;
            nLeft -= nRead;
            returnCount += nRead;
        }
        // Reached only if maxline bytes were read without finding '\n'
        return -1;
    }

Client:
    ...
    char buf[512] = {0};
    memset(buf, 0, sizeof(buf));
    while (fgets(buf, sizeof(buf), stdin) != NULL)
    {
        if (writen(sockfd, buf, strlen(buf)) == -1)
            err_exit("writen error");
        memset(buf, 0, sizeof(buf));
        int readBytes = readline(sockfd, buf, sizeof(buf));
        if (readBytes == -1)
            err_exit("readline error");
        else if (readBytes == 0)
        {
            cerr << "server connect closed..." << endl;
            break;
        }
        cout << buf;
        memset(buf, 0, sizeof(buf));
    }
    ...

Server:
    void echo(int clientfd)
    {
        char buf[512] = {0};
        int readBytes;
        while ((readBytes = readline(clientfd, buf, sizeof(buf))) > 0)
        {
            cout << buf;
            if (writen(clientfd, buf, readBytes) == -1)
                err_exit("writen error");
            memset(buf, 0, sizeof(buf));
        }
        if (readBytes == -1)
            err_exit("readline error");
        else if (readBytes == 0)
            cerr << "client connect closed..." << endl;
    }
