Using struct to process binary data in Python (pack and unpack usage)

Source: Internet
Author: User
Sometimes you need to process binary data in Python, for example when reading and writing files or working with sockets. In these cases you can use Python's struct module, which handles data laid out like a struct in the C language.

The three most important functions in the struct module are pack(), unpack(), and calcsize():

pack(fmt, v1, v2, ...) packs the values into a bytes object (a str in Python 2) according to the given format. The result is a byte stream laid out like a C struct.

unpack(fmt, buffer) parses the byte stream according to the given format (fmt) and returns the parsed values as a tuple.

calcsize(fmt) returns the number of bytes that the given format (fmt) occupies.
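As a minimal sketch of the three functions together (the format string and values are chosen only for illustration, and Python 3 is assumed):

```python
import struct

# Pack an unsigned short and an unsigned int in little-endian order.
fmt = "<HI"
packed = struct.pack(fmt, 1, 2)

# calcsize reports how many bytes the format occupies: 2 + 4 = 6.
size = struct.calcsize(fmt)

# unpack always returns a tuple, even when the format holds one value.
values = struct.unpack(fmt, packed)

print(packed)  # b'\x01\x00\x02\x00\x00\x00'
print(size)    # 6
print(values)  # (1, 2)
```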

The following table lists the format characters supported by struct:

Format   C type               Python type          Size in bytes
x        pad byte             no value             1
c        char                 bytes of length 1    1
b        signed char          integer              1
B        unsigned char        integer              1
?        _Bool                bool                 1
h        short                integer              2
H        unsigned short       integer              2
i        int                  integer              4
I        unsigned int         integer              4
l        long                 integer              4
L        unsigned long        integer              4
q        long long            integer              8
Q        unsigned long long   integer              8
f        float                float                4
d        double               float                8
s        char[]               bytes                given by the count
p        char[]               bytes                given by the count
P        void *               integer              platform-dependent

Note 1. In native mode, q and Q are available only if the platform's C compiler supports 64-bit long long.

Note 2. A number before a format character is a repeat count; for example, 4h means the same as hhhh.

Note 3. For s, the number is the length of the string rather than a repeat count: 4s is a single 4-byte string. p denotes a Pascal string, whose first byte stores the length.

Note 4. P converts a pointer. Its size depends on the machine's word length, so it is only available in native mode.

Note 5. As a result, P occupies 4 bytes on a 32-bit platform and 8 bytes on a 64-bit platform.
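The notes above can be checked with a short sketch. The byte strings are arbitrary sample values, and Python 3 is assumed:

```python
import struct

# A count before most codes repeats the code: "3h" is the same as "hhh".
repeat_size = struct.calcsize("<3h")
expanded_size = struct.calcsize("<hhh")

# For "s" the count is the field length in bytes, not a repeat count.
# Short values are padded with NUL bytes and long values are truncated.
padded = struct.pack("4s", b"ab")
truncated = struct.pack("4s", b"abcdef")

# "p" stores a Pascal string: one length byte followed by the data.
pascal = struct.pack("4p", b"ab")

# "P" is only valid in native mode; its size is the pointer size
# (4 on 32-bit builds, 8 on 64-bit builds).
pointer_size = struct.calcsize("P")

print(padded)     # b'ab\x00\x00'
print(truncated)  # b'abcd'
print(pascal)     # b'\x02ab\x00'
```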

To exchange data with a C struct, keep in mind that C and C++ compilers align fields on byte boundaries (typically 4 bytes on a 32-bit system). By default, struct therefore converts data using the byte order, sizes, and alignment of the local machine. The first character of the format string changes this behavior. The options are defined as follows:

Character   Byte order               Size and alignment
@           native                   native (fields padded to the native alignment, for example 4 bytes)
=           native                   standard (original byte counts, no padding)
<           little-endian            standard (original byte counts, no padding)
>           big-endian               standard (original byte counts, no padding)
!           network (= big-endian)   standard (original byte counts, no padding)

The character goes in the first position of fmt, as in '@5s6sif'. If it is omitted, '@' is assumed.
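The effect of the prefix character can be seen in a small sketch. The formats are chosen only for illustration, and the native size is left unprinted because it depends on the platform:

```python
import struct

# Native mode ("@", the default) may insert padding so that each field is
# aligned the way the C compiler would align it; the result varies by platform.
native_size = struct.calcsize("@hi")

# Standard modes never pad: short (2) + int (4) = 6 bytes everywhere.
standard_size = struct.calcsize("=hi")

# The same value packed in each byte order.
little = struct.pack("<I", 1)
big = struct.pack(">I", 1)
network = struct.pack("!I", 1)  # identical to big-endian

print(standard_size)  # 6
print(little)         # b'\x01\x00\x00\x00'
print(big)            # b'\x00\x00\x00\x01'
```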

Example 1:

The struct is as follows:

struct Header {
    unsigned short id;
    char tag[4];
    unsigned int version;
    unsigned int count;
};

Suppose the struct data above has been received through socket.recv and is stored in s. To parse it, use the unpack() function:

import struct
id, tag, version, count = struct.unpack("!H4s2I", s)

In the format string above, ! selects network byte order, because the data was received from the network and is transmitted in network byte order. H indicates the unsigned short id, 4s indicates a 4-byte string, and 2I indicates two unsigned int values.

After a single unpack call, the information is stored in id, tag, version, and count.

Similarly, you can easily pack local data into the struct format:

ss = struct.pack("!H4s2I", id, tag, version, count)

The pack function converts id, tag, version, and count into a Header struct in the specified format. ss is now a bytes object (a byte stream laid out like a C struct) and can be sent with socket.send(ss).
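A self-contained round trip of the Header format might look like the sketch below. The field values are hypothetical, no real socket is involved, and Python 3 is assumed, so the tag is a bytes object:

```python
import struct

HEADER_FMT = "!H4s2I"  # id, tag, version, count in network byte order

# Hypothetical sample values, used here only for illustration.
packet = struct.pack(HEADER_FMT, 7, b"DATA", 1, 3)

# 2 + 4 + 4 + 4 = 14 bytes: "!" uses standard sizes and adds no padding.
header_size = struct.calcsize(HEADER_FMT)

# Unpack exactly header_size bytes, as one would from a receive buffer.
msg_id, tag, version, count = struct.unpack(HEADER_FMT, packet[:header_size])

print(header_size)                  # 14
print(msg_id, tag, version, count)  # 7 b'DATA' 1 3
```

Because "!" adds no padding, this layout matches the C struct only if the sender also writes the fields without padding. A C compiler using its default alignment would typically insert two padding bytes before version.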

Example 2:

import struct
a = 12
# Convert a to binary data
data = struct.pack('i', a)

Here data is a bytes object whose bytes are the same as the binary storage of a. The 'i' format requires an integer; a float such as 12.34 would be packed with 'f' or 'd' instead.

Then perform the reverse operation to convert the existing binary data back to a Python data type:

# Note: unpack returns a tuple!
a, = struct.unpack('i', data)
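For floating-point values, the choice between 'f' and 'd' affects precision. The following sketch, using an arbitrary sample value, shows the difference on a round trip:

```python
import struct

value = 12.34

# "f" is a 4-byte IEEE 754 single, so the value is rounded when packed.
single, = struct.unpack("<f", struct.pack("<f", value))

# "d" is an 8-byte double, the same representation Python floats use.
double, = struct.unpack("<d", struct.pack("<d", value))

print(single == value)  # False
print(double == value)  # True
```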

If the data is composed of multiple values, you can do this:

a = b'hello'
b = b'world!'
c = 2
d = 45.123
data = struct.pack('5s6sif', a, b, c, d)

Here data is binary data that can be written directly to a file, for example with binfile.write(data).

It can then be read back when needed with data = binfile.read().

Finally, struct.unpack decodes it into Python variables:

a, b, c, d = struct.unpack('5s6sif', data)

'5s6sif' is called fmt: a format string made up of numbers and format characters. 5s indicates a 5-byte string, 6s a 6-byte string, i an integer, and f a float; similarly, 2i would indicate two integers. The available characters and their C types, which map onto Python types, are listed in the table above.
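When a file holds several records of the same layout, a precompiled struct.Struct object can be convenient. The sketch below assumes Python 3.4 or later (for iter_unpack), uses a hypothetical file name inside a temporary directory, and adds a '<' prefix so that the record size does not depend on native padding:

```python
import os
import struct
import tempfile

# Same fields as '5s6sif', with a fixed byte order and no padding.
record = struct.Struct("<5s6sif")

# Sample rows; the floats are exactly representable in 4 bytes.
rows = [(b"hello", b"world!", 2, 45.5), (b"abcde", b"fghijk", 3, 1.25)]

path = os.path.join(tempfile.mkdtemp(), "records.bin")  # hypothetical name
with open(path, "wb") as binfile:
    for row in rows:
        binfile.write(record.pack(*row))

with open(path, "rb") as binfile:
    data = binfile.read()

# iter_unpack walks the buffer in steps of record.size bytes.
decoded = list(record.iter_unpack(data))

print(record.size)  # 19
print(decoded[0])   # (b'hello', b'world!', 2, 45.5)
```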

Note: problems encountered during binary file processing

When processing binary files, use the following method:

binfile = open(filepath, 'rb')  # read a binary file
binfile = open(filepath, 'wb')  # write a binary file

So how does this differ from binfile = open(filepath, 'r')?

There are two differences:

First, on Windows, a file opened with 'r' in Python 2 treats the byte 0x1a as the end of the file (EOF). This problem does not occur with 'rb'. In other words, if binary data containing 0x1a is written and then read back in text mode, only part of the file is read. With 'rb', reading always continues to the real end of the file.

Second, for the string x = 'abc\ndef', len(x) returns 7. \n is the line feed character, 0x0a. When the string is written in text mode with 'w' on Windows, each 0x0a is automatically converted to the two characters 0x0d and 0x0a, so the file is actually 8 bytes long. When the file is read in text mode with 'r', the pair is converted back to the original line feed. If the string is written in binary mode with 'wb', the single character is kept unchanged and read back as is. Therefore, if you write data in text mode and read it in binary mode, account for the extra byte. 0x0d is also called a carriage return. Linux makes no such change, because it uses only 0x0a to indicate a line break.
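The newline difference can be reproduced on any platform with the sketch below. It assumes Python 3, where passing newline="\r\n" to open() applies the same translation that text mode uses by default on Windows. The file names are hypothetical:

```python
import os
import tempfile

text = "abc\ndef"
folder = tempfile.mkdtemp()

# Binary mode writes the bytes unchanged: 7 bytes on every platform.
bin_path = os.path.join(folder, "binary.dat")
with open(bin_path, "wb") as f:
    f.write(text.encode("ascii"))

# Text mode with newline="\r\n" mimics the Windows default translation.
txt_path = os.path.join(folder, "text.txt")
with open(txt_path, "w", newline="\r\n") as f:
    f.write(text)

bin_size = os.path.getsize(bin_path)
txt_size = os.path.getsize(txt_path)

# Reading the text file in binary mode exposes the extra 0x0d byte.
with open(txt_path, "rb") as f:
    raw = f.read()

print(bin_size, txt_size)  # 7 8
print(raw)                 # b'abc\r\ndef'
```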

