Using struct to process binary data in Python (pack and unpack usage)

Sometimes Python has to process binary data, for example when reading and writing files or working with sockets. The struct module handles this case: it converts between Python values and the byte layout of a C struct.
The three most important functions in the struct module are pack(), unpack(), and calcsize():
pack(fmt, v1, v2, ...) packs the values into a byte string according to the given format (a bytes object in Python 3, a str in Python 2), laid out like a C struct.
unpack(fmt, buffer) parses the byte string buffer according to the given format (fmt) and returns the parsed values as a tuple. The length of buffer must match the size of the format.
calcsize(fmt) returns the number of bytes occupied by the given format (fmt).
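The three functions can be tried together in a few lines. This is a minimal sketch for Python 3; the format "hhl" (two shorts and a long) and the values 1, 2, 3 are chosen only for illustration.

```python
import struct

# "hhl" = two C shorts followed by a C long, using native size and alignment.
fmt = "hhl"

packed = struct.pack(fmt, 1, 2, 3)     # values -> bytes
size = struct.calcsize(fmt)            # number of bytes the format occupies
values = struct.unpack(fmt, packed)    # bytes -> tuple of values

print(type(packed).__name__, len(packed) == size)
print(values)
```

The exact value of size depends on the platform here, because no byte-order character is given and native sizes and alignment apply.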
The following table lists the format characters supported by struct. Sizes are the standard sizes in bytes; the bytes entries are str in Python 2.

Format  C type              Python type        Size
x       pad byte            no value           1
c       char                bytes of length 1  1
b       signed char         integer            1
B       unsigned char       integer            1
?       _Bool               bool               1
h       short               integer            2
H       unsigned short      integer            2
i       int                 integer            4
I       unsigned int        integer            4
l       long                integer            4
L       unsigned long       integer            4
q       long long           integer            8
Q       unsigned long long  integer            8
f       float               float              4
d       double              float              8
s       char[]              bytes              1 per character
p       char[]              bytes              1 per character
P       void *              integer            native pointer size
Note 1. In native mode, q and Q are only available when the platform's C compiler supports long long (64-bit integers).
Note 2. A format character can be preceded by a number that gives its repeat count; for example, 4h means the same as hhhh.
Note 3. For the s format the number is the length of the string rather than a repeat count: 4s is a single string of 4 bytes. The p format is a Pascal string, whose first byte stores the length.
Note 4. P converts a pointer. Its size depends on the machine's word length.
Note 5. P is only available in native mode; it occupies 4 bytes on a 32-bit build and 8 bytes on a 64-bit build.
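The notes can be checked directly in an interpreter. The following sketch assumes Python 3, where the s and p formats take bytes objects; the sample values are made up for illustration.

```python
import struct

# A repeat count: "3h" is the same as "hhh".
assert struct.calcsize("3h") == struct.calcsize("hhh")
three_shorts = struct.unpack("3h", struct.pack("hhh", 1, 2, 3))

# For "s" the number is a length, not a repeat count: "4s" is ONE 4-byte field.
# Shorter input is padded with null bytes, longer input is truncated.
padded = struct.pack("4s", b"ab")          # b'ab\x00\x00'
truncated = struct.pack("4s", b"abcdef")   # b'abcd'

# "p" is a Pascal string: the first byte stores the length, then the data.
pascal = struct.pack("4p", b"ab")          # b'\x02ab\x00'

# "P" (pointer) works only in native mode; its size follows the platform.
pointer_size = struct.calcsize("P")        # typically 4 (32-bit) or 8 (64-bit)
```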
C and C++ compilers usually align struct members in memory (for example, to 4-byte boundaries on a 32-bit system). To exchange data with a C struct, the struct module therefore uses the byte order, sizes, and alignment of the local machine by default. The first character of the format string can change this behavior. It is defined as follows:
Character  Byte order              Size and alignment
@          native                  native (for example, aligned to 4 bytes)
=          native                  standard
<          little-endian           standard
>          big-endian              standard
!          network (= big-endian)  standard

"Standard" means that the size is based on the raw byte count of each type, with no padding bytes added for alignment.
This character goes at the first position of fmt, as in '@5s6sif'.
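The effect of the first character is easiest to see by comparing sizes and packed bytes. A small sketch, reusing the format 5s6sif from the text; the integer value 1 is arbitrary.

```python
import struct

fmt = "5s6sif"

# Standard sizes, no padding: 5 + 6 + 4 + 4 bytes.
standard_size = struct.calcsize("=" + fmt)   # 19

# Native mode may insert padding so that the int is aligned as in a C struct;
# the exact value depends on the platform (commonly 20).
native_size = struct.calcsize("@" + fmt)

# The same integer in both byte orders.
little = struct.pack("<I", 1)    # b'\x01\x00\x00\x00'
big = struct.pack(">I", 1)       # b'\x00\x00\x00\x01'
network = struct.pack("!I", 1)   # same bytes as big-endian
```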
Example 1:
The struct is as follows:
struct Header {
    unsigned short id;
    char tag[4];
    unsigned int version;
    unsigned int count;
};
Suppose this struct has been received through socket.recv and is stored in the byte string s. To parse it, use the unpack() function:
import struct
id, tag, version, count = struct.unpack("!H4s2I", s)
In the format string above, ! means that the data is interpreted in network byte order, because it was received from the network and is transmitted in that order. H stands for the unsigned short id, 4s for a 4-byte string, and 2I for two unsigned int values.
After this single unpack call, the values are stored in id, tag, version, and count.
Similarly, you can easily pack local data into the struct format:
ss = struct.pack("!H4s2I", id, tag, version, count)
The pack function converts id, tag, version, and count into the Header layout given by the format. ss is now a byte string (laid out like the C struct) and can be sent with socket.send(ss).
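Example 1 can be tried without a socket by packing a header first and unpacking it again. This is a sketch for Python 3; the field values are hypothetical stand-ins for received data, and the name header_id replaces id only to avoid shadowing the built-in function.

```python
import struct

HEADER_FMT = "!H4s2I"   # id, tag, version, count in network byte order

# Hypothetical values standing in for data received from a socket.
packet = struct.pack(HEADER_FMT, 7, b"DATA", 1, 42)

# "!" uses standard sizes without padding: 2 + 4 + 4 + 4 = 14 bytes.
assert len(packet) == struct.calcsize(HEADER_FMT) == 14

# In Python 3 the 4s field comes back as a bytes object.
header_id, tag, version, count = struct.unpack(HEADER_FMT, packet)
print(header_id, tag, version, count)
```

Note that the 14 bytes contain no padding. A C compiler with typical 4-byte int alignment would probably lay out struct Header in 16 bytes, so the sender has to serialize the fields one by one (or use a packed struct) for this format string to match.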
Example 2:
import struct
a = 12  # the 'i' format requires an integer
# Convert a to binary
data = struct.pack('i', a)
In this case, data is a byte string whose bytes are the same as the binary representation of a in memory.
The reverse operation converts the existing binary data (a byte string) back to a Python value:
# Note: unpack returns a tuple!
a, = struct.unpack('i', data)
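Because unpack always returns a tuple, a single value has to be extracted from it explicitly. The sketch below shows two equivalent ways, using an arbitrary integer 12.

```python
import struct

packed = struct.pack("i", 12)

result = struct.unpack("i", packed)          # a 1-tuple: (12,)
(value,) = result                            # tuple unpacking extracts the int
same_value = struct.unpack("i", packed)[0]   # indexing works as well
```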
If the data consists of several values, you can do this:
a = b'hello'
b = b'world!'
c = 2
d = 45.123
data = struct.pack('5s6sif', a, b, c, d)
Now data is binary data that can be written directly to a file, for example with binfile.write(data).
It can be read back when needed with data = binfile.read().
The Python values are then decoded with struct.unpack:
a, b, c, d = struct.unpack('5s6sif', data)
'5s6sif' is the format string (fmt). It consists of numbers and format characters: 5s is a 5-byte string, 2i would mean two integers, and so on. The available characters and their types are listed in the table above; each C type corresponds to a Python type.
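The write and read steps can be combined into one round trip. This sketch assumes Python 3 and uses a temporary directory; the file name record.bin is made up for the example.

```python
import os
import struct
import tempfile

fmt = "5s6sif"
record = (b"hello", b"world!", 2, 45.123)

# Write the packed record to a temporary file in binary mode.
path = os.path.join(tempfile.mkdtemp(), "record.bin")
with open(path, "wb") as binfile:
    binfile.write(struct.pack(fmt, *record))

# Read it back and decode it with the same format string.
with open(path, "rb") as binfile:
    a, b, c, d = struct.unpack(fmt, binfile.read())

# "f" is a 4-byte float, so d is only approximately 45.123 after the round trip.
print(a, b, c, round(d, 3))
```

If full double precision matters, the d format (8 bytes) can be used instead of f.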
Note: problems encountered during binary file processing
When processing binary files, open them in binary mode:
binfile = open(filepath, 'rb')  # read a binary file
binfile = open(filepath, 'wb')  # write a binary file
So how does this differ from binfile = open(filepath, 'r')?
There are two differences:
First, in 'r' mode the byte 0x1a is treated as the end of the file (EOF) on Windows in Python 2, where text mode is handled by the C runtime. This problem does not exist in 'rb' mode. So if you write binary data and then read it back in text mode, only part of the file is read when it contains 0x1a, whereas 'rb' always reads to the end of the file.
Second, the string x = 'ABC\ndef' has length 7, as len(x) shows. \n is the line feed character, 0x0a. When the string is written in text mode 'w' on Windows, each 0x0a is automatically replaced by the two bytes 0x0d 0x0a, so the file is actually 8 bytes long. When the file is read in text mode 'r', the pair is converted back to a single line feed. In binary mode 'wb' the character is written unchanged and read back as is. Therefore, if you write data in text mode and read it in binary mode, account for the extra byte. 0x0d is the carriage return. On Linux nothing changes, because Linux uses only 0x0a to mark a line break.
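The line-break translation can be reproduced on any platform. The sketch below assumes Python 3 and passes newline="\r\n" to open() to force the translation that text mode applies by default on Windows; the file name is made up for the example.

```python
import os
import tempfile

text = "ABC\ndef"   # 7 characters
path = os.path.join(tempfile.mkdtemp(), "newline_demo.txt")

# newline="\r\n" makes text mode write each "\n" as "\r\n", as on Windows.
with open(path, "w", newline="\r\n") as f:
    f.write(text)

# Binary mode shows the bytes that are really in the file.
with open(path, "rb") as f:
    raw = f.read()          # b'ABC\r\ndef', 8 bytes

# Default text-mode reading maps "\r\n" back to "\n".
with open(path, "r") as f:
    restored = f.read()     # 'ABC\ndef'

print(len(text), len(raw), restored == text)
```

For data produced by struct.pack, opening the file with 'wb' and 'rb' avoids this translation entirely.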