A detailed description of the sticky packet problem in Python socket network programming

Source: Internet
Author: User
This article introduces the sticky packet problem in Python socket network programming and is shared here for reference.

One, the sticky packet problem in detail

1, only TCP has the sticky packet phenomenon; UDP never has it

Your program does not have the right to operate the network card directly. It reaches the network card through the interfaces that the operating system exposes to user programs. Every time your program sends data to a remote host, the data is copied from user space to kernel space, and such copies cost resources and time. Frequent exchanges between kernel space and user space inevitably lower transmission efficiency, so to improve efficiency the sending side often collects enough data before sending it to the other party. If the data to be sent is small several times in a row, TCP will usually merge it into a single TCP segment according to its optimization algorithm before sending, and the receiver then receives sticky packet data.

2, first understand how a socket sends and receives messages

The sender may send data 1 KB at a time while the receiving application extracts it 2 KB at a time; it may just as well extract 3 KB or more at a time. In other words, the application sees the data as one continuous whole and cannot see message boundaries. TCP is therefore a stream-oriented protocol, which is why it is prone to sticky packets. UDP is a connectionless, message-oriented protocol: each UDP segment is one message, and the application must extract data in units of whole messages rather than an arbitrary number of bytes at a time, which is very different from TCP. How is a message defined? Treat the data that the other side passes to a single write/send call as one message. What needs to be understood is that, no matter how the lower layers segment or fragment a message, the TCP layer puts the segments back in order before presenting the data in the kernel buffer, but it does not record where one message ends and the next begins.

For example, when a TCP-based socket client uploads a file to a server, the file content is sent as a byte stream, one piece after another. The receiver has no way of knowing where the file's byte stream begins or ends.
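The missing boundaries can be observed in a single process. The following is a minimal sketch, assuming `socket.socketpair()` as a stand-in for a real client and server; the helper name `collect` is hypothetical and only serves the illustration.

```python
import socket

def collect(sock, total):
    """Call recv() until `total` bytes have arrived; return the chunks."""
    chunks = []
    received = 0
    while received < total:
        chunk = sock.recv(1024)
        if not chunk:              # peer closed early
            break
        chunks.append(chunk)
        received += len(chunk)
    return chunks

# socketpair() returns two already-connected stream sockets.
sender, receiver = socket.socketpair()
sender.send(b'hello')              # first "message"
sender.send(b'feng')               # second "message"

chunks = collect(receiver, len(b'hello') + len(b'feng'))
stream = b''.join(chunks)
print(chunks)                      # how the bytes were split across recv calls
print(stream)                      # the bytes themselves, in order

sender.close()
receiver.close()
```

On most systems the list of chunks holds a single element containing both messages, but the split is not guaranteed. Only the order of the bytes is.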

3, the causes of sticky packets

3-1 Direct cause

The sticky packet problem arises mainly because the receiver does not know the boundaries between messages and therefore does not know how many bytes to extract at a time.

3-2 Root cause

Sticky packets caused by the sender come from the TCP protocol itself: to improve transmission efficiency, the sender often waits until it has collected enough data before sending a TCP segment. If the data to be sent is small several times in a row, TCP will usually merge it into one TCP segment according to its optimization algorithm, and the receiver receives sticky packet data.

3-3 Summary

    1. TCP (Transmission Control Protocol) is connection-oriented and stream-oriented and provides a highly reliable service. Both ends (client and server) hold a socket, paired one to one. To deliver multiple packets to the receiver more efficiently, the sending side uses an optimization (the Nagle algorithm) that merges several small pieces of data sent at short intervals into one larger block before sending it. The receiving end then cannot tell the pieces apart, so a sound unpacking mechanism must be provided. That is, stream-oriented communication does not preserve message boundaries.

    2. UDP (User Datagram Protocol) is connectionless and message-oriented and provides an efficient service. It does not use a block-merging optimization. Because UDP supports a one-to-many pattern, the receiver's skbuff (socket buffer) uses a chained structure to record each incoming UDP packet, and each UDP packet carries a message header (source address, port and other information), so the receiving end can easily tell the packets apart. That is, message-oriented communication preserves message boundaries.

    3. TCP is based on a data stream, so a message that is sent and received must not be empty. Both the client and the server need a mechanism for handling empty messages to keep the program from hanging. UDP is based on datagrams: even if you send empty content (just pressing Enter), it is not an empty message, because the UDP protocol still wraps it in a message header. The experiment is omitted here.

UDP's recvfrom is blocking. One recvfrom(x) corresponds to exactly one sendto(y): once x bytes have been received the call is complete, and if y > x the excess data is lost. This means UDP never produces sticky packets, but it can lose data and is unreliable.
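A small loopback experiment illustrates the recvfrom/sendto pairing. This is a sketch under stated assumptions: it relies on the silent truncation seen on Linux and similar systems (Windows reports an error instead of truncating), and the port is chosen by the operating system.

```python
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(('127.0.0.1', 0))       # port 0: let the OS pick a free port
receiver.settimeout(2)                # do not hang if a datagram goes missing
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b'hello', addr)         # datagram 1, y = 5 bytes
sender.sendto(b'feng', addr)          # datagram 2

first, _ = receiver.recvfrom(4)       # x = 4 < y: the rest of datagram 1 is dropped
second, _ = receiver.recvfrom(1024)   # reads datagram 2, not the leftover byte
print(first, second)

sender.close()
receiver.close()
```

The second call never returns the leftover byte of the first datagram, which is the sense in which UDP does not stick but can lose data.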

TCP does not lose data. If one recv does not take everything, the next recv continues from where the last one stopped, and the sending side clears the data from its buffer only after it receives the ACK. The data is reliable, but packets can stick together.
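For comparison, the same kind of experiment on a stream socket shows that nothing is dropped when a recv asks for less than was sent. A minimal sketch, again assuming `socket.socketpair()` in place of a real connection:

```python
import socket

a, b = socket.socketpair()            # two connected stream sockets
a.sendall(b'hello feng')
a.close()                             # no more data will follow

first = b.recv(2)                     # take at most 2 bytes
rest = b''
while True:
    chunk = b.recv(10)                # continues where the previous recv stopped
    if not chunk:                     # b'' means the peer has closed
        break
    rest += chunk
b.close()
print(first, rest)
```

The bytes left behind by the short recv are returned by the later calls, so the stream is complete even though its original message boundary is gone.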

Two, sticky packets occur in two cases:

1, the sending side waits until it has collected enough data in its local buffer before sending, which produces sticky packets (the sends are very close together and the data is very small, so TCP's optimization algorithm merges them into one packet)

Client

# _*_ coding:utf-8 _*_
import socket

BUFSIZE = 1024
ip_port = ('127.0.0.1', 8080)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
res = s.connect_ex(ip_port)

s.send('hello'.encode('utf-8'))
s.send('feng'.encode('utf-8'))

Server

# _*_ coding:utf-8 _*_
from socket import *

ip_port = ('127.0.0.1', 8080)

tcp_socket_server = socket(AF_INET, SOCK_STREAM)
tcp_socket_server.bind(ip_port)
tcp_socket_server.listen(5)

conn, addr = tcp_socket_server.accept()

data1 = conn.recv(10)
data2 = conn.recv(10)

print('----->', data1.decode('utf-8'))
print('----->', data2.decode('utf-8'))

conn.close()

2, the receiver does not take data out of the buffer in time, so several packets are received together (the client sends a piece of data, the server takes only a small part of it, and on the next recv the server first takes the data left over in the buffer, which produces sticky packets)

Client

# _*_ coding:utf-8 _*_
import socket

BUFSIZE = 1024
ip_port = ('127.0.0.1', 8080)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
res = s.connect_ex(ip_port)

s.send('hello feng'.encode('utf-8'))

Server

# _*_ coding:utf-8 _*_
from socket import *

ip_port = ('127.0.0.1', 8080)

tcp_socket_server = socket(AF_INET, SOCK_STREAM)
tcp_socket_server.bind(ip_port)
tcp_socket_server.listen(5)

conn, addr = tcp_socket_server.accept()

data1 = conn.recv(2)   # does not take everything at once
data2 = conn.recv(10)  # the next recv takes the leftover old data first, then new data

print('----->', data1.decode('utf-8'))
print('----->', data2.decode('utf-8'))

conn.close()

Three, a sticky packet example:

Server

import socket

din = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
ip_port = ('127.0.0.1', 8080)
din.bind(ip_port)
din.listen(5)
conn, deer = din.accept()

data1 = conn.recv(1024)
data2 = conn.recv(1024)

print(data1)
print(data2)

Client:

import socket

din = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
ip_port = ('127.0.0.1', 8080)
din.connect(ip_port)

din.send('helloworld'.encode('utf-8'))
din.send('sb'.encode('utf-8'))

Four, when packets are split

When the data in the send-side buffer is longer than the MTU of the network card allows in one packet, TCP splits the data into several packets and sends them one after another.
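As a rough worked example of the split, the number of segments can be estimated from the payload length. The figures below are assumptions for illustration only: an Ethernet MTU of 1500 bytes and 20-byte IPv4 and TCP headers without options. The function name `segments_needed` is hypothetical.

```python
import math

def segments_needed(payload_len, mtu=1500, ip_header=20, tcp_header=20):
    """Estimate how many TCP segments a payload is split into."""
    mss = mtu - ip_header - tcp_header    # largest payload one segment can carry
    return math.ceil(payload_len / mss)

print(segments_needed(1000))   # small enough for one segment
print(segments_needed(3000))   # has to be split
```

Real stacks negotiate the segment size per connection, so this is an estimate and not a prediction of what a packet capture will show.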

Supplementary question one: why is TCP reliable and UDP unreliable?

When TCP transmits data, the sending side first puts the data into its own buffer, and the protocol then sends the buffered data to the peer. When the peer returns an acknowledgment (ACK), the sending side clears that data from its buffer; if no acknowledgment arrives in time, the sending side sends the data again. That is why TCP is reliable.

When UDP sends data, the peer does not return any acknowledgment, so UDP is unreliable.

Supplementary question two: what do send(byte stream), recv(1024) and sendall mean?

The 1024 passed to recv means that at most 1024 bytes are taken out of the buffer at a time.

The byte stream given to send is first put into the local buffer, and the protocol then sends the buffer content to the peer. If the byte stream is larger than the free space in the buffer, send may accept only part of it; it returns the number of bytes actually accepted, and the remainder is not sent unless send is called again. sendall calls send in a loop until all of the data has been sent.
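The difference between send and sendall can be sketched by writing the loop by hand. The helpers `send_all` and `drain` below are hypothetical names used for illustration, and `socket.socketpair()` with a reader thread stands in for a real peer.

```python
import socket
import threading

def send_all(sock, data):
    """Keep calling send() until every byte has been handed to the kernel."""
    view = memoryview(data)
    sent_total = 0
    while sent_total < len(data):
        sent = sock.send(view[sent_total:])   # may accept only part of the data
        sent_total += sent
    return sent_total

def drain(sock, out):
    """Read until the peer closes, appending each chunk to `out`."""
    while True:
        chunk = sock.recv(65536)
        if not chunk:
            break
        out.append(chunk)

a, b = socket.socketpair()
received = []
reader = threading.Thread(target=drain, args=(b, received))
reader.start()

payload = b'x' * 1_000_000           # larger than a typical socket buffer
sent = send_all(a, payload)
a.close()                            # tells the reader the stream has ended
reader.join()
b.close()
print(sent, sum(len(c) for c in received))
```

In everyday code `sock.sendall(data)` does the same job; the manual loop only makes the return value of send visible.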

Five, how to solve the sticky packet problem?

The root of the problem is that the receiver does not know how long the byte stream from the sender will be. The solution therefore revolves around letting the sending side tell the receiving end the total number of bytes before sending the data, after which the receiving end receives in a loop until all of the data has arrived.

5-1 A simple solution (treating the symptom):

Add a time.sleep between the client's sends to avoid the sticky packet phenomenon. The server also sleeps between its receives, which in this example avoids sticky packets.

Client:

# Client
import socket
import time

din = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
ip_port = ('127.0.0.1', 8080)
din.connect(ip_port)

din.send('helloworld'.encode('utf-8'))
time.sleep(3)
din.send('sb'.encode('utf-8'))

Server:

# Server
import socket
import time

din = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
ip_port = ('127.0.0.1', 8080)
din.bind(ip_port)
din.listen(5)
conn, deer = din.accept()

data1 = conn.recv(1024)
time.sleep(4)
data2 = conn.recv(1024)

print(data1)
print(data2)

This solution has many flaws. Because you do not know when a transmission will finish, any sleep duration is a problem: too long is inefficient and too short does not prevent sticking, so this method is not suitable.

5-2 A common solution (treating the root cause):

Add a custom fixed-length header to the byte stream. The header contains the length of the byte stream and is sent to the peer first. The peer reads the fixed-length header from its buffer and then fetches the real data.

Use the struct module to pack a number into a fixed length of 4 or 8 bytes. With the format 'i', struct.pack can only pack a number of up to 10 digits (at most 2147483647). For anything larger, you can put the length into a dictionary, serialize the dictionary to a JSON string, and pack the length of that string instead.
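The fixed-length header idea can be condensed into a pair of helper functions. This is a minimal sketch and not the article's own listing: the names `recv_exact`, `send_msg` and `recv_msg` are hypothetical, the format '!I' (unsigned, network byte order) is swapped in for 'i', and `socket.socketpair()` replaces a real client and server.

```python
import socket
import struct

HEADER = struct.Struct('!I')   # 4-byte unsigned length, network byte order

def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError('peer closed before %d bytes arrived' % n)
        buf.extend(chunk)
    return bytes(buf)

def send_msg(sock, payload):
    """Send the length header followed by the payload."""
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_msg(sock):
    """Read the header, then exactly as many bytes as it announces."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)

a, b = socket.socketpair()
send_msg(a, b'hello')
send_msg(a, b'feng')
print(recv_msg(b))   # b'hello'
print(recv_msg(b))   # b'feng'
a.close()
b.close()
```

Reading the header with `recv_exact` as well matters, because even a 4-byte recv is allowed to return fewer bytes than requested.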

Basic client

# _*_ coding:utf-8 _*_
import socket
import struct

phone = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
phone.connect(('127.0.0.1', 8880))  # connect to the server

while True:
    # send the command
    cmd = input('Please enter command >>: ').strip()
    if not cmd:
        continue
    phone.send(cmd.encode('utf-8'))

    # receive the header first
    header_struct = phone.recv(4)  # 4 bytes
    unpack_res = struct.unpack('i', header_struct)
    total_size = unpack_res[0]  # total length

    # then receive the data
    recv_size = 0
    total_data = b''
    while recv_size < total_size:  # receive in a loop
        recv_data = phone.recv(1024)  # 1024 is only an upper limit
        recv_size += len(recv_data)
        total_data += recv_data
    print('Message returned: %s' % total_data.decode('gbk'))

phone.close()

Basic server

# _*_ coding:utf-8 _*_
import socket
import subprocess
import struct

phone = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # buy a phone
phone.bind(('127.0.0.1', 8880))  # insert the SIM card
phone.listen(5)  # maximum number of pending connections
print('Start running...')

while True:  # connection loop
    coon, addr = phone.accept()  # wait for a call
    print(coon, addr)
    while True:  # communication loop
        # send and receive messages
        cmd = coon.recv(1024)  # receive at most 1024 bytes
        if not cmd:  # empty bytes: the client closed the connection
            break
        print('Received: %s' % cmd.decode('utf-8'))

        # run the command
        res = subprocess.Popen(cmd.decode('utf-8'),
                               shell=True,
                               stdout=subprocess.PIPE,  # standard output
                               stderr=subprocess.PIPE)  # standard error
        stdout = res.stdout.read()
        stderr = res.stderr.read()

        # send the header first (struct turns the length into fixed-length bytes)
        header = struct.pack('i', len(stdout) + len(stderr))  # build the header
        coon.send(header)

        # then send the command output; sendall loops until everything is sent
        coon.sendall(stdout)
        coon.sendall(stderr)
    coon.close()

phone.close()


5-3 An optimized solution (treating the root cause)

The optimized approach is for the server to improve the header: the content to be sent is described in a dictionary. A dictionary cannot be sent over the network directly, so it is serialized into a JSON string and then encoded as bytes. Because the length of the JSON bytes is not fixed, the struct module is used to pack that length into a fixed-length value, which is sent first. The client receives the fixed-length value, uses it to read the JSON header, and uses the header to read the complete data.

Final version of the client

# _*_ coding:utf-8 _*_
import socket
import struct
import json

phone = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
phone.connect(('127.0.0.1', 8080))  # connect to the server

while True:
    # send the command
    cmd = input('Please enter command >>: ').strip()
    if not cmd:
        continue
    phone.send(cmd.encode('utf-8'))

    # first receive the length of the header
    header_len = struct.unpack('i', phone.recv(4))[0]  # unpack the bytes into a number

    # then receive the header
    header_bytes = phone.recv(header_len)  # what arrives is also bytes
    header_json = header_bytes.decode('utf-8')  # the header as a JSON string
    header_dic = json.loads(header_json)  # deserialize to get the dictionary
    total_size = header_dic['total_size']  # total length of the data

    # finally receive the data
    recv_size = 0
    total_data = b''
    while recv_size < total_size:  # receive in a loop
        recv_data = phone.recv(1024)  # 1024 is only an upper limit
        recv_size += len(recv_data)  # a recv may return fewer than 1024 bytes, so add the actual length
        total_data += recv_data  # the final result
    print('Message returned: %s' % total_data.decode('gbk'))

phone.close()

Final version of the server

# _*_ coding:utf-8 _*_
import socket
import subprocess
import struct
import json

phone = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # buy a phone
phone.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
phone.bind(('127.0.0.1', 8080))  # insert the SIM card
phone.listen(5)  # maximum number of pending connections
print('Start running...')

while True:  # connection loop
    coon, addr = phone.accept()  # wait for a call
    print(coon, addr)
    while True:  # communication loop
        # send and receive messages
        cmd = coon.recv(1024)  # receive at most 1024 bytes
        if not cmd:  # empty bytes: the client closed the connection
            break
        print('Received: %s' % cmd.decode('utf-8'))

        # run the command
        res = subprocess.Popen(cmd.decode('utf-8'),
                               shell=True,
                               stdout=subprocess.PIPE,  # standard output
                               stderr=subprocess.PIPE)  # standard error
        stdout = res.stdout.read()
        stderr = res.stderr.read()

        # build the header
        header_dic = {
            'total_size': len(stdout) + len(stderr),  # total size
            'filename': None,
            'md5': None,
        }
        header_json = json.dumps(header_dic)  # str
        header_bytes = header_json.encode('utf-8')  # bytes (the length varies)

        # first send the length of the header, packed to a fixed 4 bytes
        coon.send(struct.pack('i', len(header_bytes)))
        # then send the header
        coon.send(header_bytes)
        # finally send the command output; sendall loops until everything is sent
        coon.sendall(stdout)
        coon.sendall(stderr)
    coon.close()

phone.close()
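The three-part layout used above (4-byte header length, JSON header, body) can be exercised without a network or a shell command. The sketch below is an illustration under assumptions: `send_with_header`, `recv_with_header` and `recv_exact` are hypothetical helper names, and `socket.socketpair()` stands in for the client and server.

```python
import json
import socket
import struct

def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError('peer closed early')
        buf.extend(chunk)
    return bytes(buf)

def send_with_header(sock, payload, **meta):
    """Send: packed header length, JSON header, then the payload."""
    header = dict(meta, total_size=len(payload))
    header_bytes = json.dumps(header).encode('utf-8')
    sock.sendall(struct.pack('i', len(header_bytes)))   # fixed 4 bytes
    sock.sendall(header_bytes)                           # variable-length JSON
    sock.sendall(payload)                                # the real data

def recv_with_header(sock):
    """Receive in the same order and return (header dict, payload)."""
    header_len = struct.unpack('i', recv_exact(sock, 4))[0]
    header = json.loads(recv_exact(sock, header_len).decode('utf-8'))
    return header, recv_exact(sock, header['total_size'])

a, b = socket.socketpair()
send_with_header(a, b'command output', filename=None, md5=None)
send_with_header(a, b'second reply', filename='a.txt', md5=None)
print(recv_with_header(b))
print(recv_with_header(b))
a.close()
b.close()
```

Because every read asks for an exact byte count, two replies sent back to back are still separated correctly on the receiving side.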

Six, the struct module

Anyone who knows C knows the role of struct: it defines a structure containing data of different types (int, char, bool and so on) so that they can be handled as one object. In network communication, most data is transmitted as a binary stream. Passing strings raises few problems, but passing basic data such as int or char requires a mechanism that packs the values into a binary byte string for transfer over the network, and the receiving end needs a matching mechanism to unpack the original data. The struct module in Python provides this: it performs conversions between Python values and C structs represented as Python bytes objects. The struct module offers only a small number of functions, and a few examples follow.

1, basic pack and unpack

struct packs and unpacks data according to a format specifier. For example:

# This module turns a value, such as a number, into a fixed-length bytes object
import struct

# res = struct.pack('i', 12345)
# print(res, len(res), type(res))  # the length is 4

res2 = struct.pack('i', 12345111)
print(res2, len(res2), type(res2))  # the length is also 4

unpack_res = struct.unpack('i', res2)
print(unpack_res)  # (12345111,)
# print(unpack_res[0])  # 12345111

As a further example, take a tuple of data containing an int, a string and a float, and a struct object with the format 'i3sf': i stands for int, 3s for a string three characters long, and f for float. Packing and unpacking are done with the pack and unpack methods of the struct object. The output of pack is a binary byte string, and unpack converts that byte string back into a tuple. Note that the float's precision changes, because f stores the value as a 4-byte single-precision float. The number of bytes used after packing matches the layout of the corresponding struct in C.
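The 'i3sf' case can be tried directly. This is a small sketch; the sample values are assumptions chosen for illustration.

```python
import struct

values = (1, b'abc', 2.7)            # int, 3-byte string, float
packer = struct.Struct('i3sf')       # native byte order and alignment

packed = packer.pack(*values)
unpacked = packer.unpack(packed)

print(packed)                        # binary byte string
print(len(packed), packer.size)      # packed length equals the struct size
print(unpacked)                      # the float is close to, but not exactly, 2.7
```

The int and the bytes come back unchanged, while the float comes back as the nearest single-precision value.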

2, for the format characters, refer to the table in the official documentation of the struct module

3, basic usage
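A few of the most common format characters can be checked with struct.calcsize. The sketch below uses the '!' prefix (network byte order, standard sizes, no padding) so that the sizes do not depend on the platform; the selection of characters is only a sample of the full table.

```python
import struct

# Size in bytes of some common format characters in standard mode.
sizes = {code: struct.calcsize('!' + code) for code in 'bhiqfd'}
for code, size in sizes.items():
    print(code, size)

# 'i' is a signed 32-bit integer, so 2**31 does not fit.
try:
    struct.pack('!i', 2**31)
    overflowed = False
except struct.error:
    overflowed = True

# 'q' is 8 bytes wide and can hold a length such as 1073741824000.
big = struct.pack('!q', 1073741824000)
print(overflowed, len(big))
```

This is why a length that may exceed the range of 'i' is either packed with a wider format or carried inside a JSON header.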

import json, struct

# Suppose the client uploads a 1 TB file (1073741824000 bytes), a.txt

# To avoid sticky packets, a custom header is required
header = {'file_size': 1073741824000, 'file_name': '/a/b/c/d/e/a.txt', 'md5': '8F6FBF8347FAA4924A76856701EDB0F3'}  # 1 TB of data, file path and MD5 value

# To transmit the header, serialize it and convert it to bytes
head_bytes = bytes(json.dumps(header), encoding='utf-8')

# So that the receiver knows the header length, use struct to turn the length into a fixed 4 bytes
head_len_bytes = struct.pack('i', len(head_bytes))  # these 4 bytes hold one number: the header length

# The client starts sending (conn is a connected socket)
conn.send(head_len_bytes)  # first the header length, 4 bytes
conn.send(head_bytes)  # then the header bytes
conn.sendall(file_content)  # then the real content as bytes

# The server starts receiving (s is a connected socket)
head_len_bytes = s.recv(4)  # first 4 bytes: the packed header length
x = struct.unpack('i', head_len_bytes)[0]  # extract the header length

head_bytes = s.recv(x)  # receive the header bytes, x bytes long
header = json.loads(head_bytes.decode('utf-8'))  # extract the header

# Finally, read the real data according to the header
real_data_len = header['file_size']
s.recv(real_data_len)  # in practice, loop until real_data_len bytes have arrived

