Python provides the struct module for packing and unpacking binary data
---------------------------------------------------------------------------
The module's main functions are:
struct.pack(fmt, v1, v2, ...)
Packs the values v1, v2, ... into a sequence of bytes laid out as described by the format string fmt. The arguments must match fmt exactly, in both number and type. The return value is the packed data: a bytes object in Python 3 (a str in Python 2).
For example:
>>> import struct
>>> a = 20
>>> b = 200
>>> buff = struct.pack('ii', a, b)  # convert to a byte stream that can be sent as a network packet
>>> len(buff)  # 8 bytes: exactly the size of two C ints
8
>>> buff  # raw binary is unreadable as text; the repr shows the byte values
b'\x14\x00\x00\x00\xc8\x00\x00\x00'
On this little-endian machine, the bytes 14 00 00 00 and c8 00 00 00 are the hexadecimal values 0x00000014 and 0x000000c8, i.e. 20 and 200.
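The comment in the example above mentions sending packed data over a network. One common way to do that is to prefix each message with its length, so the receiver can split the byte stream back into messages. This is sketched here as an assumption, not something the original prescribes. The helper names frame_message and read_messages and the 4-byte "!I" header are hypothetical choices. The "!" prefix (network byte order) and unpack_from are both described later in this article.

```python
import struct

# Hypothetical framing scheme: 4-byte unsigned length header in network byte order
HEADER_FMT = "!I"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # always 4 with "!" (standard size)


def frame_message(payload: bytes) -> bytes:
    """Prefix payload with its length so the receiver knows where it ends."""
    return struct.pack(HEADER_FMT, len(payload)) + payload


def read_messages(stream: bytes) -> list:
    """Split a byte stream of back-to-back framed messages.

    Assumes the stream holds only complete messages.
    """
    messages = []
    offset = 0
    while offset + HEADER_SIZE <= len(stream):
        (length,) = struct.unpack_from(HEADER_FMT, stream, offset)
        offset += HEADER_SIZE
        messages.append(stream[offset:offset + length])
        offset += length
    return messages


# The two ints from the example above ("<ii" pins little-endian, standard size)
body = struct.pack("<ii", 20, 200)
wire = frame_message(body) + frame_message(b"hello")
print(read_messages(wire))  # [b'\x14\x00\x00\x00\xc8\x00\x00\x00', b'hello']
```

A real socket reader would also have to buffer partial messages. This sketch leaves that out.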
struct.unpack(fmt, buffer)
This is the inverse of pack: data packed with pack() is unpacked with unpack(). It returns a tuple of the unpacked values, even when there is only one value. len(buffer) must equal calcsize(fmt).
For example:
>>> struct.unpack('ii', buff)  # buff is the packed data from the example above
(20, 200)
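This small sketch illustrates the two rules just stated. A single value still comes back as a 1-tuple, and a buffer whose length differs from calcsize(fmt) raises struct.error. It uses the "<" prefix (little-endian, standard size, described later in this article) so the result does not depend on the machine.

```python
import struct

packed = struct.pack("<i", 20)

# Even a single value comes back as a 1-tuple
result = struct.unpack("<i", packed)
print(result)      # (20,)
(value,) = result  # tuple unpacking extracts the lone value
print(value)       # 20

# The buffer length must equal calcsize(fmt); one extra byte raises struct.error
try:
    struct.unpack("<i", packed + b"\x00")
except struct.error as exc:
    print("error:", exc)
```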
struct.calcsize(fmt)
Returns the size in bytes of the structure described by fmt, which is also the length of the bytes that pack(fmt, ...) produces.
For example:
>>> struct.calcsize('ii')
8
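calcsize is handy when reading fixed-size records from a file or socket, because it tells you exactly how many bytes one record occupies. The sketch below assumes a hypothetical record layout "<ih" (an int followed by a short). It uses io.BytesIO to stand in for a real file.

```python
import io
import struct

RECORD_FMT = "<ih"  # hypothetical record layout: int id, short value
RECORD_SIZE = struct.calcsize(RECORD_FMT)  # 4 + 2 = 6 bytes, no padding with "<"

# Build a small in-memory "file" holding three back-to-back records
raw = b"".join(struct.pack(RECORD_FMT, i, i * 10) for i in range(3))
stream = io.BytesIO(raw)

records = []
while True:
    chunk = stream.read(RECORD_SIZE)  # calcsize tells us how much to read
    if len(chunk) < RECORD_SIZE:      # stop at end of data (or a partial record)
        break
    records.append(struct.unpack(RECORD_FMT, chunk))

print(RECORD_SIZE, records)  # 6 [(0, 0), (1, 10), (2, 20)]
```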
struct.unpack_from(fmt, buffer, offset=0)
This also unpacks, like struct.unpack(fmt, buffer), but starts reading at position offset of buffer. The buffer only needs to contain at least calcsize(fmt) bytes from offset onward.
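Here is a sketch of unpack_from skipping a leading header. The 2-byte header and its contents are made up for illustration. Unlike unpack, the buffer may be longer than calcsize(fmt), as long as enough bytes remain after offset.

```python
import struct

# Hypothetical buffer: a 2-byte header followed by two little-endian ints
data = b"\xab\xcd" + struct.pack("<ii", 20, 200)

# Skip the 2-byte header without slicing
values = struct.unpack_from("<ii", data, 2)
print(values)  # (20, 200)

# Read only the second int, which starts at offset 2 + 4 = 6
(second,) = struct.unpack_from("<i", data, 6)
print(second)  # 200
```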
struct.pack_into(fmt, buffer, offset, v1, v2, ...)
This also packs, like struct.pack(fmt, v1, v2, ...), but instead of returning new bytes it writes the packed data into a writable buffer, starting at position offset.
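pack_into writes into an existing writable buffer, such as a bytearray, and returns None rather than a new bytes object. This sketch fills a hypothetical 10-byte buffer with a 2-byte marker and then the two ints from the earlier example.

```python
import struct

# pack_into needs a writable buffer; a bytearray is the simplest choice
buf = bytearray(10)

# Write a 2-byte big-endian marker at offset 0, then two ints at offset 2
struct.pack_into(">H", buf, 0, 0xABCD)
struct.pack_into("<ii", buf, 2, 20, 200)

print(bytes(buf))  # b'\xab\xcd\x14\x00\x00\x00\xc8\x00\x00\x00'
print(struct.unpack_from("<ii", buf, 2))  # (20, 200)
```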
---------------------------------------------------------------------------
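A format string is composed of one or more format characters. See the Python manual for full descriptions of each one; they are summarized below. The Python column lists Python 3 types. In Python 2, c, s and p correspond to str, and several integer codes (such as I, L, q, Q and P) may produce long values.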
| Format | C type | Python type | Notes |
|---|---|---|---|
| x | pad byte | no value | consumes no argument; packs a zero byte |
| c | char | bytes of length 1 | |
| b | signed char | integer | |
| B | unsigned char | integer | |
| ? | _Bool | bool | C99 _Bool |
| h | short | integer | |
| H | unsigned short | integer | |
| i | int | integer | |
| I | unsigned int | integer | |
| l | long | integer | |
| L | unsigned long | integer | |
| q | long long | integer | 8 bytes in standard mode |
| Q | unsigned long long | integer | 8 bytes in standard mode |
| f | float | float | |
| d | double | float | |
| s | char[] | bytes | a count gives the length, e.g. 4s |
| p | char[] | bytes | Pascal string; the first byte holds the length |
| P | void* | integer | native mode only |
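Here is a sketch that exercises several of the less common format characters from the table. It uses the "<" prefix so every field has its standard size and no padding. The values are arbitrary.

```python
import struct

# Standard sizes ("<"), so the layout does not depend on the machine:
# ? = bool (1), x = pad byte (1), h = short (2), 4s = 4-byte string, q = long long (8)
fmt = "<?xh4sq"
packed = struct.pack(fmt, True, -2, b"abcd", 2**40)  # "x" takes no argument
print(struct.calcsize(fmt), packed)

flag, short, tag, big = struct.unpack(fmt, packed)
print(flag, short, tag, big)  # True -2 b'abcd' 1099511627776

# "p" stores a Pascal string: the first byte holds the length
pas = struct.pack("5p", b"hi")
print(pas)                       # b'\x02hi\x00\x00'
print(struct.unpack("5p", pas))  # (b'hi',)
```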
---------------------------------------------------------------------------
An example
```python
import struct

# Native byte order
buffer = struct.pack("ihb", 1, 2, 3)
print(repr(buffer))
print(struct.unpack("ihb", buffer))

# Data from a sequence, network byte order
data = [1, 2, 3]
buffer = struct.pack("!ihb", *data)
print(repr(buffer))
print(struct.unpack("!ihb", buffer))
```
Output (on a little-endian machine):
b'\x01\x00\x00\x00\x02\x00\x03'
(1, 2, 3)
b'\x00\x00\x00\x01\x00\x02\x03'
(1, 2, 3)
First, the arguments are packed. Before packing they are ordinary Python integers; after pack they become the binary layout of a C struct, which Python displays as b'\x01\x00\x00\x00\x02\x00\x03'. This machine is little-endian, so the least significant byte of each value is stored at the lowest address. (Search for "endianness" for the difference between little-endian and big-endian.) i is a C int, which occupies 4 bytes on this machine, so 1 is stored as 01 00 00 00. h is a C short, which occupies 2 bytes, so 2 is stored as 02 00. b is a C signed char, which occupies 1 byte, so 3 is stored as 03.
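To check the byte-by-byte explanation above without depending on the host machine, the sketch below packs the same values with "<ihb" (little-endian, standard sizes). It then decodes each field by hand with int.from_bytes. sys.byteorder reports the host's native order.

```python
import struct
import sys

# "<" forces little-endian with standard sizes: i = 4 bytes, h = 2, b = 1
packed = struct.pack("<ihb", 1, 2, 3)

# Slice out each field and decode it by hand to check the layout
i_bytes, h_bytes, b_bytes = packed[0:4], packed[4:6], packed[6:7]
print(i_bytes, int.from_bytes(i_bytes, "little"))  # b'\x01\x00\x00\x00' 1
print(h_bytes, int.from_bytes(h_bytes, "little"))  # b'\x02\x00' 2
print(b_bytes, int.from_bytes(b_bytes, "little", signed=True))  # b'\x03' 3

# Which byte order does this machine use natively?
print(sys.byteorder)  # 'little' on x86 machines
```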
---------------------------------------------------------------------------
The first character of the format string may optionally select the byte order, size and alignment, as listed below:
| Character | Byte order | Size and alignment |
|---|---|---|
| @ | native | native |
| = | native | standard |
| < | little-endian | standard |
| > | big-endian | standard |
| ! | network (= big-endian) | standard |
If no prefix is given, the default is @. The machine's native byte order (big- or little-endian) is used, and the sizes and in-memory alignment of the C types also follow the native platform. For example, an int is 2 bytes on some machines and 4 bytes on others, and alignment rules likewise vary between platforms.
The other prefixes use standard size and alignment. Each type has a fixed standard size (for example, i is always 4 bytes), and no padding is inserted for alignment, whatever the type.
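The difference between native and standard alignment shows up in calcsize. The sizes in the comments assume a common platform where int is 4 bytes and 4-byte aligned. Only the standard-mode results are guaranteed.

```python
import struct

# Native mode ("@", the default): padding is inserted so the int is aligned
native = struct.calcsize("@bi")
# Standard mode ("="): no padding at all, just 1 + 4 bytes
standard = struct.calcsize("=bi")
print(native, standard)  # typically 8 5

# Padding only goes between fields, never after the last one...
print(struct.calcsize("@ib"))    # typically 5
# ...unless requested with a zero-count code for the type to align to
print(struct.calcsize("@ib0i"))  # typically 8
```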
The second half of the example program uses ! as the first character of the format string. That means big-endian byte order with standard size and alignment, so the output is b'\x00\x00\x00\x01\x00\x02\x03': the most significant byte of each value is stored at the lowest memory address.
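Here is a quick comparison of the byte-order prefixes on one 4-byte unsigned value. The "=" result depends on the host, so the sketch compares it against "<" instead of hard-coding it.

```python
import struct
import sys

value = 0x01020304

little = struct.pack("<I", value)   # least significant byte first
big = struct.pack(">I", value)      # most significant byte first
network = struct.pack("!I", value)  # "!" is an alias for big-endian
native = struct.pack("=I", value)   # host byte order, standard size

print(little, big)                      # b'\x04\x03\x02\x01' b'\x01\x02\x03\x04'
print(network == big)                   # True
print(sys.byteorder, native == little)  # e.g. little True on x86
```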