Recently, while studying Python network programming, I came across the struct module in a simple piece of socket communication code. At the time I did not understand what the module was for, so I looked up the relevant material. This article introduces how the struct module operates on byte streams (binary streams) in Python.
Preface
Recently, I used Python to parse the MNIST dataset, which is stored in the IDX file format, and needed to read binary files, so I used the struct module. Many tutorials online are well written but not very friendly to newcomers, so I reorganized my notes into a quick-start guide.
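For context, here is a minimal sketch of the kind of header parsing the MNIST task involves. It assumes the published IDX layout for image files: four big-endian unsigned 32-bit integers (magic number 2051, image count, rows, columns) followed by the pixel bytes. The function name parse_idx_images_header and the synthetic header are hypothetical; the '>' prefix selects big-endian byte order, which is covered in the byte-order section further down.

```python
import struct

def parse_idx_images_header(data: bytes):
    """Hypothetical helper: read the 16-byte header of an IDX image file.

    Assumes four big-endian unsigned 32-bit integers:
    magic number, number of images, rows, columns.
    """
    magic, count, rows, cols = struct.unpack('>IIII', data[:16])
    if magic != 2051:  # 0x00000803: unsigned-byte data with 3 dimensions
        raise ValueError(f'unexpected magic number {magic:#010x}')
    return count, rows, cols

# Synthetic header for illustration: 2 images of 28 x 28 pixels
header = struct.pack('>IIII', 2051, 2, 28, 28)
print(parse_idx_images_header(header))  # (2, 28, 28)
```

With a real file, you would pass the first 16 bytes read in binary mode ('rb'); the pixels that follow are count * rows * cols unsigned bytes.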
Note: the following four terms are used interchangeably here: binary stream, binary array, byte stream, and byte array.
Quick start
In the struct module, when an integer, a floating-point number, or a character stream (character array) is converted to a byte stream (byte array), you must pass a format string fmt that tells struct the type of each object being converted: for example, 'i' for an integer, 'f' for a floating-point number ('d' for a double), and 's' for a string of ASCII characters.
import struct

def demo1():
    # bin_buf = struct.pack(fmt, buf) converts buf into the binary array bin_buf
    # buf = struct.unpack(fmt, bin_buf) converts bin_buf back into a tuple of values

    # integer -> binary stream
    buf1 = 256
    bin_buf1 = struct.pack('i', buf1)  # 'i' means int
    ret1 = struct.unpack('i', bin_buf1)
    print(bin_buf1, '<===>', ret1)

    # floating-point number -> binary stream
    buf2 = 3.1415
    bin_buf2 = struct.pack('d', buf2)  # 'd' means double
    ret2 = struct.unpack('d', bin_buf2)
    print(bin_buf2, '<===>', ret2)

    # string -> binary stream (Python 3 requires bytes, not str)
    buf3 = b'Hello world'
    bin_buf3 = struct.pack('11s', buf3)  # '11s' means an 11-byte char array
    ret3 = struct.unpack('11s', bin_buf3)
    print(bin_buf3, '<===>', ret3)

    # C struct -> binary stream
    # Assume the following C struct:
    # struct header {
    #     int buf1;
    #     double buf2;
    #     char buf3[11];
    # };
    bin_buf_all = struct.pack('id11s', buf1, buf2, buf3)
    ret_all = struct.unpack('id11s', bin_buf_all)
    print(bin_buf_all, '<===>', ret_all)

demo1()
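Running demo1() prints each packed byte string next to the tuple returned by unpack(). Note that unpack() always returns a tuple, even for a single value, and that the exact bytes for the integer and the struct depend on the machine's native byte order and alignment.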
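Since the motivation here was socket code, a common application is length-prefixed framing: each message is sent as a 4-byte length in network byte order ('!I') followed by the payload, so the receiver knows how many bytes belong to each message. The helper names frame_message and read_messages are hypothetical, the sketch works on an in-memory buffer rather than a real socket, and it uses struct.Struct, which precompiles a format string once for reuse.

```python
import struct

HEADER = struct.Struct('!I')  # 4-byte unsigned length, network (big-endian) byte order

def frame_message(payload: bytes) -> bytes:
    """Prefix payload with its length so a receiver can split the stream."""
    return HEADER.pack(len(payload)) + payload

def read_messages(stream: bytes) -> list:
    """Split a buffer of concatenated frames back into payloads."""
    messages = []
    offset = 0
    while offset + HEADER.size <= len(stream):
        (length,) = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        if offset + length > len(stream):
            break  # incomplete trailing frame is ignored in this sketch
        messages.append(stream[offset:offset + length])
        offset += length
    return messages

buffer = frame_message(b'hello') + frame_message(b'struct')
print(buffer[:4])             # b'\x00\x00\x00\x05'
print(read_messages(buffer))  # [b'hello', b'struct']
```

In real socket code, the receiver would first read exactly 4 bytes, unpack the length, and then read that many payload bytes.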
Detailed description of the struct module
Main functions
The three most important functions in the struct module are pack(), unpack(), and calcsize():
import struct

# Pack v1, v2, ... into a bytes object (a byte stream laid out like a C struct) according to fmt
data = struct.pack(fmt, v1, v2, ...)
# Parse the byte stream data according to fmt and return the values as a tuple
values = struct.unpack(fmt, data)
# Return the size in bytes of the structure described by fmt
size = struct.calcsize(fmt)
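A quick way to see how the three functions relate: the length of the bytes returned by pack() equals calcsize() for the same format, and unpack() returns a tuple even for a single value. The format 'id11s' is the one from the quick-start example; its native size depends on platform alignment, so only the '<' (standard size, no padding) result is fixed.

```python
import struct

fmt = 'id11s'
data = struct.pack(fmt, 256, 3.1415, b'Hello world')

# pack() output length always matches calcsize() for the same format
print(len(data) == struct.calcsize(fmt))  # True
print(struct.calcsize('<' + fmt))         # 23: 4 + 8 + 11, no padding
print(struct.unpack(fmt, data))           # (256, 3.1415, b'Hello world')

# unpack() returns a tuple even for one value
(value,) = struct.unpack('<i', struct.pack('<i', 256))
print(value)                              # 256
```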
Format string in struct
The following table lists the formats supported by struct:
Format | C type | Python type | Standard size (bytes)
---|---|---|---
x | pad byte | no value | 1
c | char | bytes of length 1 | 1
b | signed char | integer | 1
B | unsigned char | integer | 1
? | _Bool | bool | 1
h | short | integer | 2
H | unsigned short | integer | 2
i | int | integer | 4
I | unsigned int | integer | 4
l | long | integer | 4
L | unsigned long | integer | 4
q | long long | integer | 8
Q | unsigned long long | integer | 8
f | float | float | 4
d | double | float | 8
s | char[] | bytes | 1 per character
p | char[] | bytes | 1 per character
P | void * | integer | native (4 or 8)
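To tie the table to real calls, the sketch below packs one value for most of the codes using standard sizes and checks that they round-trip. The specific values are arbitrary, and Python 3 is assumed, so 'c' and 's' take bytes rather than str.

```python
import struct

# One value per code, standard sizes ('<' = little-endian, no padding)
fmt = '<c?bBhHiIqQfd'
values = (b'A', True, -1, 255, -2, 65535, -3, 4294967295, -4, 2**64 - 1, 1.5, 2.25)
data = struct.pack(fmt, *values)

print(struct.calcsize(fmt))                # 44: sum of the standard sizes in the table
print(struct.unpack(fmt, data) == values)  # True: 1.5 and 2.25 are exact in binary

# 'x' inserts a pad byte and consumes no value
print(struct.pack('<BxB', 1, 2))           # b'\x01\x00\x02'
```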
Note 1: q and Q are 64-bit integers. In native mode they require a C compiler that supports long long; with the standard-size prefixes (=, <, >, !) they are always available.
Note 2: A format character may be preceded by an integer repeat count; for example, '4h' means the same as 'hhhh'.
Note 3: For s, the number is a length rather than a repeat count: '4s' means one 4-byte string, whereas '4i' means four integers. p denotes a Pascal string, whose first byte stores the string's length.
Note 4: P packs a pointer (void *). Its size depends on the machine's word size, for example 4 bytes on a 32-bit system and 8 bytes on a 64-bit system, and it is only available in native mode.
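The repeat-count and string notes are easiest to see side by side. This is a small sketch in Python 3 (bytes arguments assumed) showing how 's' pads or truncates to its length and how 'p' stores a length byte first.

```python
import struct

# A repeat count before a code means "this many values": '3i' == 'iii'
print(struct.pack('<3i', 1, 2, 3) == struct.pack('<iii', 1, 2, 3))  # True

# For 's', the count is the byte length of ONE string: padded with b'\x00' or truncated
print(struct.pack('4s', b'ab'))      # b'ab\x00\x00'
print(struct.pack('4s', b'abcdef'))  # b'abcd'

# 'p' (Pascal string): first byte holds the length, so at most count - 1 data bytes fit
print(struct.pack('4p', b'ab'))            # b'\x02ab\x00'
print(struct.unpack('4p', b'\x02ab\x00'))  # (b'ab',)
```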
When exchanging data with C structs, keep in mind that C and C++ compilers align struct members to natural boundaries (on a typical 32-bit system, a 4-byte int starts at an offset that is a multiple of 4). By default, struct therefore follows the local machine's native byte order, sizes, and alignment. The first character of the format string changes this behavior, as defined below:
Character | Byte order | Size and alignment
---|---|---
@ | native | native sizes, padded to native alignment
= | native | standard sizes, no alignment padding
< | little-endian | standard sizes, no alignment padding
> | big-endian | standard sizes, no alignment padding
! | network (= big-endian) | standard sizes, no alignment padding
Put this character at the very start of fmt, for example '@5s6sif'.
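A short sketch of the prefixes in action, packing the integer 1 in different byte orders and comparing native and standard sizes. The '@ci' size is hedged as "typically 8" because it assumes a platform where int is 4 bytes with 4-byte alignment, which holds on common desktop and server platforms.

```python
import struct

n = 1
print(struct.pack('<I', n))  # b'\x01\x00\x00\x00'  little-endian
print(struct.pack('>I', n))  # b'\x00\x00\x00\x01'  big-endian
print(struct.pack('!I', n) == struct.pack('>I', n))  # True: network order is big-endian

# Native mode ('@', the default) pads so the int starts on its natural boundary;
# the other prefixes use standard sizes with no padding.
print(struct.calcsize('=ci'))  # 5: 1-byte char + 4-byte int
print(struct.calcsize('@ci'))  # typically 8: 1 + 3 padding + 4
```

This is why binary file formats and network protocols are usually parsed with an explicit '<', '>', or '!' prefix: the result no longer depends on the machine running the code.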