Tutorial on working with byte streams (binary streams) using Python's struct module
Preface
Recently, I used Python to parse the MNIST dataset, which is stored in the IDX file format, and needed to read binary files, so I used the struct module. I found many tutorials online; they are well written, but not very friendly to newcomers. So I reorganized my notes into this quick start.
Note: The following four terms are used interchangeably in this article: binary stream, binary array, byte stream, and byte array.
Quick Start
In the struct module, converting an integer, a floating point number, or a character stream (character array) to a byte stream (byte array) requires a format string, fmt, that tells struct the type of the object being converted. For example, an integer is 'i', a floating point number is 'f' (or 'd' for double precision), and an ASCII character array is 's'.
    import struct

    def demo1():
        # bin_buf = struct.pack(fmt, buf) converts buf into the byte array bin_buf
        # buf = struct.unpack(fmt, bin_buf) converts bin_buf back into buf (as a tuple)

        # integer -> binary stream
        buf1 = 256
        bin_buf1 = struct.pack('i', buf1)  # 'i' stands for 'integer'
        ret1 = struct.unpack('i', bin_buf1)
        print(bin_buf1, '<===>', ret1)

        # floating point number -> binary stream
        buf2 = 3.1415
        bin_buf2 = struct.pack('d', buf2)  # 'd' stands for 'double'
        ret2 = struct.unpack('d', bin_buf2)
        print(bin_buf2, '<===>', ret2)

        # string -> binary stream
        buf3 = b'Hello world'  # in Python 3, 's' takes bytes rather than str
        bin_buf3 = struct.pack('11s', buf3)  # '11s' stands for an 11-byte character array
        ret3 = struct.unpack('11s', bin_buf3)
        print(bin_buf3, '<===>', ret3)

        # struct -> binary stream
        # Assume there is a C struct:
        # struct header {
        #     int buf1;
        #     double buf2;
        #     char buf3[11];
        # };
        bin_buf_all = struct.pack('id11s', buf1, buf2, buf3)
        ret_all = struct.unpack('id11s', bin_buf_all)
        print(bin_buf_all, '<===>', ret_all)

    demo1()
The output is as follows:
[Figure: demo1 output]
Detailed description of the struct module
Main functions
The three most important functions in the struct module are pack(), unpack(), and calcsize().
    # Pack the values into a byte stream (laid out like a C struct) according to the format string fmt.
    data = struct.pack(fmt, v1, v2, ...)

    # Parse the byte stream according to the format string fmt; the result is always a tuple.
    values = struct.unpack(fmt, data)

    # Calculate the number of bytes occupied by the format string fmt.
    size = struct.calcsize(fmt)
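As a minimal sketch of how the three functions fit together, the following packs two integers, checks the size with calcsize(), and unpacks the result. The record layout and the names RECORD_FMT, packed, size, and values are made up for illustration; the leading '<' pins little-endian byte order so that the bytes are the same on every machine.

```python
import struct

# Hypothetical record layout (an assumption for this example):
# '<' = little-endian with standard sizes and no padding,
# 'H' = unsigned short (2 bytes), 'I' = unsigned int (4 bytes).
RECORD_FMT = '<HI'

packed = struct.pack(RECORD_FMT, 1, 256)     # values -> bytes
size = struct.calcsize(RECORD_FMT)           # bytes needed for the format
values = struct.unpack(RECORD_FMT, packed)   # bytes -> tuple of values

print(packed.hex())   # 010000010000
print(size)           # 6
print(values)         # (1, 256)
```

Note that unpack() returns a tuple even when the format describes a single value, so a lone result is usually read as `struct.unpack(fmt, data)[0]`.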
Format String in struct
The following table lists the formats supported by struct:
Format | C Type | Python type | Standard size (bytes)
x | pad byte | no value | 1
c | char | bytes of length 1 | 1
b | signed char | integer | 1
B | unsigned char | integer | 1
? | _Bool | bool | 1
h | short | integer | 2
H | unsigned short | integer | 2
i | int | integer | 4
I | unsigned int | integer | 4
l | long | integer | 4
L | unsigned long | integer | 4
q | long long | integer | 8
Q | unsigned long long | integer | 8
f | float | float | 4
d | double | float | 8
s | char[] | bytes | count (default 1)
p | char[] | bytes | count (default 1)
P | void * | integer | platform dependent
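Note 1: In native mode, q and Q are only available when the platform's C compiler supports a 64-bit long long type; in the standard modes they are always available.
Note 2: A format character can be preceded by a number that gives the repeat count. For example, '4h' means the same as 'hhhh'.
Note 3: For the s format, the number is the length of the string rather than a repeat count: '4s' is a single string of 4 bytes. The p format is a Pascal string, whose first byte stores the length of the string.
Note 4: P is used to convert a pointer. Its size depends on the word size of the machine, and it is only available with native byte order.
Note 5: The last entry in the table (P) represents the pointer type. It occupies 4 bytes on a 32-bit platform and 8 bytes on a 64-bit platform.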
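The notes above can be checked with a short sketch. The variable names are made up for illustration, and the size printed for 'P' depends on the platform, so no fixed value is claimed for it.

```python
import struct

# A repeat count before a numeric code repeats the item: '3h' equals 'hhh'.
three_shorts = struct.unpack('<3h', struct.pack('<hhh', 1, 2, 3))
print(three_shorts)    # (1, 2, 3)

# For 's' the number is a length, not a repeat count: '4s' is ONE 4-byte item.
# Shorter input is padded with null bytes; longer input is truncated.
padded = struct.pack('4s', b'ab')
truncated = struct.pack('4s', b'abcdef')
print(padded)          # b'ab\x00\x00'
print(truncated)       # b'abcd'

# 'p' is a Pascal string: the first byte stores the length, the data follows.
pascal = struct.pack('4p', b'ab')
unpacked_pascal = struct.unpack('4p', pascal)
print(pascal)          # b'\x02ab\x00'
print(unpacked_pascal) # (b'ab',)

# 'P' (pointer) is only valid in native mode; its size depends on the platform
# (typically 4 bytes on 32-bit builds and 8 bytes on 64-bit builds).
pointer_size = struct.calcsize('P')
print(pointer_size)
```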
When exchanging data with structs in C, keep in mind that C and C++ compilers align struct members in memory, typically on 4-byte boundaries on a 32-bit system. By default, struct therefore converts data using the byte order, sizes, and alignment of the local machine. The first character of the format string can be used to change the byte order, size, and alignment. It is defined as follows:
Character | Byte order | Size and alignment
@ | native | native sizes, padded for native alignment (for example, to 4-byte boundaries)
= | native | standard sizes, no alignment padding
< | little-endian | standard sizes, no alignment padding
> | big-endian | standard sizes, no alignment padding
! | network (= big-endian) | standard sizes, no alignment padding
The character is placed at the first position of fmt, for example '@5s6sif'. If it is omitted, '@' is assumed.
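The effect of the first character can be seen in a small sketch. The variable names are made up for illustration. The last part assumes the header layout of the MNIST IDX image files (four big-endian unsigned integers: magic number, number of images, rows, columns); it builds such a header in memory rather than reading a real file, so treat it as an illustration and not as a complete IDX parser.

```python
import struct

# The same value packed with different byte orders.
little = struct.pack('<I', 1)
big = struct.pack('>I', 1)
network = struct.pack('!I', 1)
print(little)           # b'\x01\x00\x00\x00'
print(big)              # b'\x00\x00\x00\x01'
print(network == big)   # True

# Standard modes never insert padding: 4 + 8 + 11 = 23 bytes.
standard_size = struct.calcsize('=id11s')
print(standard_size)    # 23

# Native mode ('@', the default) may insert padding so that the double
# is aligned; the exact size is platform dependent.
native_size = struct.calcsize('@id11s')
print(native_size)

# Assumed IDX-style header: four big-endian unsigned ints.
header = struct.pack('>IIII', 2051, 60000, 28, 28)
magic, count, rows, cols = struct.unpack('>IIII', header)
print(magic, count, rows, cols)   # 2051 60000 28 28
```

When reading files written on another machine or defined by a file format, it is usually safer to spell out '<' or '>' explicitly than to rely on the native default.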
Summary
That is all for this article. I hope it helps you in your study or work. If you have any questions, feel free to leave a comment.