Python uses struct to process binary instances.

Last Update:2017-09-27 Source: Internet

Author: User

How Python uses struct to process binary data

Sometimes you need to process binary data in Python, for example when reading and writing files or working with sockets. The struct module handles this: it converts between Python values and the byte layout of a C struct.

The three most important functions in the struct module are pack(), unpack(), and calcsize():
pack(fmt, v1, v2, ...) packs the values into a byte string according to the format fmt (a byte stream laid out like a C struct; a bytes object in Python 3).
unpack(fmt, string) parses the byte string according to the format fmt and returns the parsed values as a tuple.
calcsize(fmt) returns the number of bytes occupied by the format fmt.
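
As a minimal sketch of how the three functions fit together (the format string and the values are chosen only for illustration, and the leading "<" is an assumption that fixes little-endian byte order so the result is the same on every machine):

```python
import struct

# "<hHi": little-endian, one signed short, one unsigned short, one signed int.
fmt = "<hHi"

packed = struct.pack(fmt, -2, 65535, 100000)  # Python values -> bytes
size = struct.calcsize(fmt)                   # 2 + 2 + 4 bytes
values = struct.unpack(fmt, packed)           # bytes -> tuple of Python values

print(packed, size, values)
```

The length of packed always equals calcsize(fmt), and unpack() returns the values in the same order in which they were packed.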

The following table lists the formats supported by struct:

Format	C Type	Python	Bytes
x	pad byte	no value	1
c	char	bytes of length 1	1
b	signed char	integer	1
B	unsigned char	integer	1
?	_Bool	bool	1
h	short	integer	2
H	unsigned short	integer	2
i	int	integer	4
I	unsigned int	integer	4
l	long	integer	4
L	unsigned long	integer	4
q	long long	integer	8
Q	unsigned long long	integer	8
f	float	float	4
d	double	float	8
s	char[]	bytes	1 (times the count)
p	char[]	bytes	1 (times the count)
P	void *	integer	machine word

Note 1. In native mode, q and Q are available only when the platform's C compiler supports 64-bit long long; in the standard modes they are always available.
Note 2. A number before a format character is a repeat count; for example, 4h means the same as hhhh.
Note 3. For the s format the number is a length, not a repeat count: 4s is a single 4-byte string. p denotes a Pascal string, whose first byte stores the length.
Note 4. P is used to convert a pointer. Its length depends on the machine's word length.
Note 5. The last entry in the table, P, represents a pointer type. It occupies 4 bytes on a 32-bit machine and is available only in native mode.
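
The notes on counts and on s versus p can be checked with a short sketch. The values are illustrative only, and "<" is added as an assumption so that no platform-specific padding is involved:

```python
import struct

# A count before a numeric code repeats it: "3h" takes three arguments.
three_shorts = struct.pack("<3h", 1, 2, 3)

# A count before "s" is a byte length, so "4s" takes ONE bytes argument.
padded = struct.pack("<4s", b"ab")        # shorter input is padded with NUL bytes
cut = struct.pack("<4s", b"abcdefgh")     # longer input is truncated to 4 bytes

# "p" is a Pascal string: the first byte of the field stores the length.
pascal = struct.pack("<5p", b"ab")
(decoded,) = struct.unpack("<5p", pascal)

print(three_shorts, padded, cut, pascal, decoded)
```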

When exchanging data with a C struct, keep in mind that C and C++ compilers usually align struct members (to 4 bytes on a typical 32-bit system) and store values in the machine's byte order. By default, struct therefore uses the byte order, sizes, and alignment of the local machine. The first character of the format string can change this behavior, as defined below:

Character	Byte order	Size and alignment
@	native	native size, native alignment (padding added as needed)
=	native	standard size, no alignment
<	little-endian	standard size, no alignment
>	big-endian	standard size, no alignment
!	network (= big-endian)	standard size, no alignment

The character is placed at the first position of fmt, for example '@5s6sif'. If it is omitted, '@' is assumed.
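
A small sketch of the effect of the prefix character (the value 0x0102 and the format "ci" are arbitrary choices for illustration). The native size is deliberately not assumed to be a particular number, because it depends on the platform and compiler:

```python
import struct

value = 0x0102

little = struct.pack("<H", value)   # least significant byte first
big = struct.pack(">H", value)      # most significant byte first
network = struct.pack("!H", value)  # network order is big-endian

# Standard modes add no padding: 1 byte (char) + 4 bytes (int).
standard_size = struct.calcsize("=ci")

# Native mode may insert padding so that the int is aligned; the result
# is platform-dependent (commonly 8).
native_size = struct.calcsize("@ci")

print(little, big, network, standard_size, native_size)
```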

Example 1:

For example, suppose there is a C struct:

struct Header {
    unsigned short id;
    char tag[4];
    unsigned int version;
    unsigned int count;
};

Suppose the struct data above has been received through socket.recv() and is stored in the byte string s. To parse it, use the unpack() function:

import struct
id, tag, version, count = struct.unpack("!H4s2I", s)

In the format string above, ! means that the data is parsed in network byte order, because it was received from the network and was encoded in that order for transmission. H is an unsigned short (id), 4s is a 4-byte string (tag), and 2I is two unsigned ints (version and count). Note that s must contain exactly struct.calcsize("!H4s2I") bytes, which is 14 here.

After a single call to unpack(), the information is stored in id, tag, version, and count.

Similarly, you can easily pack local data into the struct format.

ss = struct.pack("!H4s2I", id, tag, version, count)

The pack() function converts id, tag, version, and count into a Header struct in the specified format. ss is now a byte string (a byte stream laid out like a C struct) and can be sent with socket.send(ss).
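
The round trip for this header can be sketched as follows. The helper names pack_header and unpack_header and the sample values are hypothetical, introduced only for illustration, and no real socket is used:

```python
import struct

HEADER_FMT = "!H4s2I"  # id, tag, version, count in network byte order
HEADER_SIZE = struct.calcsize(HEADER_FMT)


def pack_header(id_, tag, version, count):
    """Build the header bytes (hypothetical helper)."""
    return struct.pack(HEADER_FMT, id_, tag, version, count)


def unpack_header(data):
    """Parse the header from the start of data (hypothetical helper)."""
    # unpack() needs exactly HEADER_SIZE bytes, so slice off any payload.
    return struct.unpack(HEADER_FMT, data[:HEADER_SIZE])


raw = pack_header(7, b"DATA", 1, 42)
id_, tag, version, count = unpack_header(raw + b"payload")
print(HEADER_SIZE, raw, id_, tag, version, count)
```

Because "!" implies standard sizes without padding, this describes a packed wire layout. A C compiler would commonly insert padding after tag in the in-memory struct, so the two layouts match only if the sender also writes the fields without padding.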

Example 2:

import struct
a = 12.34
# convert a to binary
bytes = struct.pack('f', a)

In this case, bytes is a byte string whose contents are the same as the binary storage of a as a 4-byte C float.

Then perform the reverse operation and convert the existing binary data bytes back to a Python value:

a, = struct.unpack('f', bytes)

Note that unpack() always returns a tuple, even when the format describes only one value.

So if only one variable was packed:

bytes = struct.pack('f', a)

it must be decoded like this:

a, = struct.unpack('f', bytes) or (a,) = struct.unpack('f', bytes)

If a = struct.unpack('f', bytes) is used directly, a is a one-element tuple, roughly (12.34,), instead of a floating-point number. A 4-byte float cannot store 12.34 exactly, so the value read back is only approximately 12.34; use 'd' (double) if more precision is needed.
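
A minimal sketch of the single-value case, assuming little-endian formats for reproducibility and using the names packed, result, value, and approx only for illustration:

```python
import struct

a = 12.34

packed = struct.pack("<d", a)          # an 8-byte double keeps the value exactly
result = struct.unpack("<d", packed)   # always a tuple, here with one element
(value,) = result                      # tuple unpacking extracts the element

# A 4-byte float cannot represent 12.34 exactly, so it round-trips
# only approximately.
(approx,) = struct.unpack("<f", struct.pack("<f", a))

print(result, value, approx)
```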

If the data consists of several values, you can do this:

a = b'hello'
b = b'world!'
c = 2
d = 45.123
bytes = struct.pack('5s6sif', a, b, c, d)

In this case, bytes is binary data that can be written directly to a file, for example with binfile.write(bytes).

It can then be read back when needed with bytes = binfile.read().

The Python values are then decoded with struct.unpack():

a,b,c,d=struct.unpack('5s6sif',bytes)

'5s6sif' is the format string (fmt). It consists of numbers and format characters: 5s is a 5-byte string, 6s is a 6-byte string, i is one integer, and f is one float; a count such as 2i would mean two integers, and so on. The available characters and the C and Python types they correspond to are listed in the table above.
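
The write-then-read cycle can be sketched with a temporary file. This is an illustration under two assumptions: the format uses a "<" prefix so the record is 19 bytes on every platform (the native '5s6sif' may include padding), and the temporary file stands in for whatever file binfile refers to:

```python
import os
import struct
import tempfile

FMT = "<5s6sif"  # 5-byte string, 6-byte string, int, float; "<" avoids padding
record = struct.pack(FMT, b"hello", b"world!", 2, 45.123)

# Write the record in binary mode, then read the same number of bytes back.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "wb") as binfile:
        binfile.write(record)
    with open(path, "rb") as binfile:
        data = binfile.read(struct.calcsize(FMT))
finally:
    os.remove(path)

a, b, c, d = struct.unpack(FMT, data)
print(a, b, c, d)
```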

Note: problems encountered when processing binary files

When processing binary files, use the following method:

binfile = open(filepath, 'rb')  # read a binary file
binfile = open(filepath, 'wb')  # write a binary file

So how does the result differ from binfile = open(filepath, 'r')?

There are two differences:

First, on Windows, where text mode has historically relied on the C runtime (as in Python 2), the byte 0x1a is treated as the end of the file (EOF) when reading with 'r'. This problem does not exist with 'rb'. In other words, if data written in binary mode is read back in text mode and contains 0x1a, only part of the file is read. With 'rb', the file is always read to the end.

Second, for the string x = 'abc\ndef', len(x) returns 7. \n is the line feed character, 0x0a. When the file is written in text mode 'w' on Windows, each 0x0a is automatically converted to the two bytes 0x0d 0x0a, so the file is actually 8 bytes long. When it is read in text mode 'r', the pair is converted back to a single line feed. If the file is written in binary mode 'wb', the byte is kept unchanged and read back as is. Therefore, if you write in text mode and read in binary mode, account for the extra byte. 0x0d is the carriage return character. On Linux nothing changes, because Linux uses only 0x0a to mark a line break.
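
The newline translation can be reproduced on any platform with a short sketch. It assumes Python 3, where open() accepts a newline argument; passing newline="\r\n" forces the Windows-style translation so the extra byte is visible even on Linux:

```python
import os
import tempfile

text = "abc\ndef"  # 7 characters

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # Text mode with forced Windows-style line endings: "\n" is written as "\r\n".
    with open(path, "w", newline="\r\n") as f:
        f.write(text)
    # Binary mode returns the bytes exactly as stored on disk.
    with open(path, "rb") as f:
        raw = f.read()
    # Text mode translates "\r\n" back to "\n" when reading.
    with open(path, "r") as f:
        decoded = f.read()
finally:
    os.remove(path)

print(len(text), len(raw), raw, decoded == text)
```

Data produced by struct.pack() should therefore always be written and read in binary mode, since any byte that happens to equal 0x0a would otherwise be altered.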
