While recently studying network programming in Python and writing some simple socket communication code, I ran into the struct module. At the time I was not sure what it was for, so I later looked up the relevant material. This article introduces the main operations of the struct module in Python and is offered as a reference.
Objective
I recently used Python to parse the MNIST dataset, which is stored in the IDX file format. This requires reading binary files, and I used the struct module to do it. Many of the tutorials online are well written but not very friendly to beginners, so I reorganized my notes into a quick-start guide.
Note: the following four terms are used interchangeably in this tutorial: binary stream, binary array, byte stream, byte array.
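As a taste of where this is heading, here is a minimal sketch of reading an IDX image-file header. It assumes the standard MNIST layout of four big-endian 32-bit unsigned integers (magic number 2051, image count, rows, columns). The header bytes below are synthetic stand-ins; with a real file you would read the first 16 bytes from disk instead.

```python
import struct

# Synthetic 16-byte IDX image header (an assumption for illustration):
# magic 0x00000803 (2051), 2 images, 28 rows, 28 columns.
header = bytes.fromhex('00000803' '00000002' '0000001c' '0000001c')

# '>' selects big-endian byte order; each 'I' is a 4-byte unsigned int.
magic, count, rows, cols = struct.unpack('>IIII', header)

print(magic, count, rows, cols)
```

The format characters and the leading '>' are explained in the sections that follow.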
Get started quickly
When the struct module converts an integer, a floating-point number, or a character stream (an array of characters) into a byte stream (an array of bytes), you use the format string fmt to tell it what type of object is being converted. For example, an integer is 'i', a floating-point number is 'f', and an array of ASCII characters is 's'.
import struct

def demo1():
    # Use bin_buf = struct.pack(fmt, buf) to convert buf into the binary array bin_buf
    # Use buf = struct.unpack(fmt, bin_buf) to convert the binary array bin_buf back into buf

    # integer <-> binary stream
    buf1 = 256  # sample value; any integer that fits in a C int works
    bin_buf1 = struct.pack('i', buf1)  # 'i' stands for 'integer'
    ret1 = struct.unpack('i', bin_buf1)
    print(bin_buf1, '<====>', ret1)

    # floating point <-> binary stream
    buf2 = 3.1415
    bin_buf2 = struct.pack('d', buf2)  # 'd' stands for 'double'
    ret2 = struct.unpack('d', bin_buf2)
    print(bin_buf2, '<====>', ret2)

    # string <-> binary stream
    buf3 = b'Hello world'  # in Python 3 the 's' format requires bytes
    bin_buf3 = struct.pack('11s', buf3)  # '11s' stands for a character array of length 11
    ret3 = struct.unpack('11s', bin_buf3)
    print(bin_buf3, '<====>', ret3)

    # struct <-> binary stream
    # Suppose there is a C struct:
    # struct header {
    #     int buf1;
    #     double buf2;
    #     char buf3[11];
    # }
    bin_buf_all = struct.pack('id11s', buf1, buf2, buf3)
    ret_all = struct.unpack('id11s', bin_buf_all)
    print(bin_buf_all, '<====>', ret_all)
The output results are as follows:
[Figure: demo1 output results]
The struct module in detail
Main functions
The three most important functions in the struct module are pack(), unpack(), and calcsize().
# Pack the values into a string (actually a byte stream, similar to a C struct) according to the given format string
bin_buf = struct.pack(fmt, v1, v2, ...)
# Parse a byte stream according to the given format (fmt) and return the parsed values as a tuple
values = struct.unpack(fmt, bin_buf)
# Compute the number of bytes that the given format (fmt) occupies in memory
size = struct.calcsize(fmt)
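The three functions can be tried together in a short round trip. This is only a sketch: the format '<ih' and the values 1 and 2 are arbitrary choices for illustration, and the '<' prefix (little-endian, no padding) is used so the result is the same on every machine.

```python
import struct

fmt = '<ih'  # little-endian: one 4-byte int followed by one 2-byte short

# pack() converts Python values into a bytes object.
packed = struct.pack(fmt, 1, 2)

# calcsize() reports how many bytes the format occupies.
size = struct.calcsize(fmt)

# unpack() always returns a tuple, even when the format holds one value.
values = struct.unpack(fmt, packed)

print(packed, size, values)
```

Note that the length of the packed data always equals calcsize(fmt), which is handy when deciding how many bytes to read from a file or socket before calling unpack().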
Format strings in struct
The formats supported by struct are listed in the following table:
Format | C Type | Python | Number of bytes |
x | pad byte | no value | 1 |
c | char | string of length 1 (bytes in Python 3) | 1 |
b | signed char | integer | 1 |
B | unsigned char | integer | 1 |
? | _Bool | bool | 1 |
h | short | integer | 2 |
H | unsigned short | integer | 2 |
i | int | integer | 4 |
I | unsigned int | integer | 4 |
l | long | integer | 4 |
L | unsigned long | integer | 4 |
q | long long | integer | 8 |
Q | unsigned long long | integer | 8 |
f | float | float | 4 |
d | double | float | 8 |
s | char[] | string (bytes in Python 3) | 1 |
p | char[] | string (bytes in Python 3) | 1 |
P | void * | integer | native pointer size |
Note 1: in native mode, q and Q are only available when the platform's C compiler supports 64-bit long long.
Note 2: each format character can be preceded by a number indicating the repeat count; for example, '4h' is the same as 'hhhh'.
Note 3: the s format represents a string of fixed length, so '4s' is a string of length 4, whereas p represents a Pascal string (the first byte stores the length).
Note 4: P is used to convert a pointer, and its length depends on the machine word size.
Note 5: the last format in the table (P) represents a pointer type, which occupies 4 bytes on a 32-bit system and 8 bytes on a 64-bit system.
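The notes above can be checked with a short sketch. The values and the lengths 3 and 5 are arbitrary choices for illustration; the pointer size printed at the end depends on the machine running the code.

```python
import struct

# Note 2: a repeat count. '<3h' packs three shorts, the same as '<hhh'.
three_shorts = struct.pack('<3h', 1, 2, 3)
same_layout = struct.pack('<hhh', 1, 2, 3)

# Note 3: for 's' the number is a length, not a repeat count.
# Input shorter than the length is padded with null bytes.
fixed = struct.pack('5s', b'abc')

# Note 3: 'p' is a Pascal string. The first byte holds the length,
# and the count (5) includes that length byte.
pascal = struct.pack('5p', b'abc')

# Notes 4 and 5: 'P' follows the native pointer size of this machine.
pointer_size = struct.calcsize('P')

print(three_shorts == same_layout, fixed, pascal, pointer_size)
```

Unpacking the Pascal string with struct.unpack('5p', pascal) gives back only the stored characters, without the padding.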
To exchange data with structs in C, you also need to take into account that C and C++ compilers align struct members in memory (commonly on 4-byte boundaries on 32-bit systems). For this reason, struct by default converts data using the byte order, sizes, and alignment of the local machine. You can change this behavior with the first character of the format string, defined as follows:
Character | Byte Order | Size and Alignment |
@ | native | native, padded to native alignment (e.g. 4 bytes) |
= | native | standard, by original number of bytes (no padding) |
< | little-endian | standard, by original number of bytes (no padding) |
> | big-endian | standard, by original number of bytes (no padding) |
! | network (= big-endian) | standard, by original number of bytes (no padding) |
To use one of these characters, place it in the first position of fmt, for example '@5s6sif'.
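The effect of the first character can be seen in a small sketch. The format 'ci' (one char followed by one int) is an arbitrary example; the native size it reports depends on the platform, so no fixed value is assumed for it here.

```python
import struct

# '@' (native) may insert padding so that the int is aligned;
# '=' uses standard sizes with no padding: 1 + 4 = 5 bytes.
native_size = struct.calcsize('@ci')
standard_size = struct.calcsize('=ci')

# The same integer packed with different byte orders.
little = struct.pack('<I', 1)   # least significant byte first
big = struct.pack('>I', 1)      # most significant byte first
network = struct.pack('!I', 1)  # network order is big-endian

print(native_size, standard_size, little, big, network)
```

For data that leaves the local machine, such as files or socket messages, an explicit '<', '>' or '!' avoids depending on whatever the native layout happens to be.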