Processing binary data in Python with struct, with examples

Source: Internet
Author: User
Tags unpack

How Python uses struct to process binary data

Sometimes you need to process binary data in Python, for example when reading and writing files or working with sockets. Python's struct module handles this: it converts between Python values and byte layouts that match structs in C.

  • The three most important functions in the struct module are pack(), unpack(), and calcsize().
  • pack(fmt, v1, v2, ...) packs the values into a bytes object according to the given format (a byte stream laid out like a C struct; in Python 2 the result is a str).
  • unpack(fmt, buffer) parses the byte stream according to the given format (fmt) and returns the parsed values as a tuple.
  • calcsize(fmt) returns the number of bytes occupied by the given format (fmt).
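The three functions can be tried together in a few lines. This is a minimal sketch for Python 3; the format string "<HI" and the values 1 and 2 are arbitrary choices for illustration, not taken from the article.

```python
import struct

# Little-endian layout: one unsigned short followed by one unsigned int.
fmt = "<HI"

# pack() turns Python values into a bytes object.
packed = struct.pack(fmt, 1, 2)

# calcsize() reports how many bytes the format occupies (2 + 4 here).
size = struct.calcsize(fmt)

# unpack() always returns a tuple, one item per field.
values = struct.unpack(fmt, packed)

print(packed)  # b'\x01\x00\x02\x00\x00\x00'
print(size)    # 6
print(values)  # (1, 2)
```

The length of the buffer passed to unpack() must equal calcsize(fmt), otherwise struct.error is raised.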

The following table lists the formats supported by struct:

Format  C type               Python type         Bytes
x       pad byte             no value            1
c       char                 bytes of length 1   1
b       signed char          integer             1
B       unsigned char        integer             1
?       _Bool                bool                1
h       short                integer             2
H       unsigned short       integer             2
i       int                  integer             4
I       unsigned int         integer             4
l       long                 integer             4
L       unsigned long        integer             4
q       long long            integer             8
Q       unsigned long long   integer             8
f       float                float               4
d       double               float               8
s       char[]               bytes               1
p       char[]               bytes               1
P       void *               integer

Note 1. q and Q are only of interest when the machine supports 64-bit operations.
Note 2. Each format character can be preceded by a number that gives its repeat count, so 3h means the same as hhh.
Note 3. For the s format the number is a length rather than a repeat count: 4s is a single 4-byte string. The p format is a Pascal string, whose first byte stores the length.
Note 4. P converts a pointer. Its size depends on the machine's word length, and it is only available in native mode.
Note 5. P, the last entry in the table, therefore has no fixed size: it occupies 4 bytes on a 32-bit system and 8 bytes on a 64-bit system.
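The notes on repeat counts, s, p, and P can be checked directly. The following is a small sketch for Python 3; the sample values are made up for illustration.

```python
import struct

# A repeat count applies to the following format character: "3h" == "hhh".
three_shorts = struct.pack("<3h", 1, 2, 3)

# For "s" the number is a length: "4s" is ONE 4-byte field.
# Input shorter than the field is padded with null bytes.
fixed = struct.pack("4s", b"ab")

# "p" is a Pascal string: the first byte holds the length of the data.
pascal = struct.pack("4p", b"ab")

# "P" (pointer) works only in native mode; its size depends on the platform.
pointer_size = struct.calcsize("P")

print(three_shorts)  # b'\x01\x00\x02\x00\x03\x00'
print(fixed)         # b'ab\x00\x00'
print(pascal)        # b'\x02ab\x00'
print(pointer_size)  # platform dependent, usually 4 or 8
```

Unpacking the Pascal string with struct.unpack("4p", pascal) gives back (b'ab',), because the stored length byte says how much of the field is real data.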

When exchanging data with a C struct, keep in mind that C and C++ compilers align struct members, typically on 4-byte boundaries on a 32-bit system. For that reason struct converts by default using the local machine's byte order, sizes, and alignment. The first character of the format string can be used to change the byte order and alignment. It is defined as follows:

Character  Byte order              Size      Alignment
@          native                  native    native (padding added, e.g. to 4-byte boundaries)
=          native                  standard  none
<          little-endian           standard  none
>          big-endian              standard  none
!          network (= big-endian)  standard  none

The character goes at the very start of fmt, for example '@5s6sif'. If it is omitted, '@' is assumed.
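The effect of the prefix character is easiest to see by packing the same value in different byte orders and by comparing sizes with and without native alignment. This is an illustrative sketch; the value 0x01020304 is arbitrary, and the native size depends on the platform.

```python
import struct

value = 0x01020304

little = struct.pack("<I", value)   # least significant byte first
big = struct.pack(">I", value)      # most significant byte first
network = struct.pack("!I", value)  # network order is big-endian

print(little)   # b'\x04\x03\x02\x01'
print(big)      # b'\x01\x02\x03\x04'
print(network)  # b'\x01\x02\x03\x04'

# Standard mode has no padding: 5 + 6 + 4 + 4 bytes.
standard_size = struct.calcsize("=5s6sif")

# Native mode may insert padding so the int starts on an aligned offset.
native_size = struct.calcsize("@5s6sif")

print(standard_size)  # 19
print(native_size)    # platform dependent, typically 20
```

For data that leaves the machine (files shared between systems, network packets), an explicit '<', '>' or '!' keeps the layout the same on every platform.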

Example 1:

For example, suppose there is a C struct:

struct Header {
    unsigned short id;
    char tag[4];
    unsigned int version;
    unsigned int count;
};

Suppose the struct data above was received with socket.recv and is held in the bytes object s. To parse it, use the unpack() function.

import struct

id, tag, version, count = struct.unpack("!H4s2I", s)

In the format string above, ! means the data is interpreted in network byte order, because it was received from the network and network transmissions use that order. H is the unsigned short id, 4s is a 4-byte string, and 2I stands for two unsigned int values.

After this single unpack call, the fields are stored in id, tag, version, and count.

Similarly, you can easily pack local data into the struct format.

ss = struct.pack("!H4s2I", id, tag, version, count)

The pack function converts id, tag, version, and count into a Header struct according to the given format. ss is now a bytes object (a byte stream laid out like the C struct) and can be sent with socket.send(ss).
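The Header example can be run without a socket by packing a header first and then unpacking the result. This is a sketch only: the field values below are hypothetical, and in the article's scenario the bytes would come from socket.recv instead.

```python
import struct

# Network byte order: id (H), tag (4s), version and count (2I).
HEADER_FORMAT = "!H4s2I"

# Hypothetical field values, chosen only for illustration.
packed = struct.pack(HEADER_FORMAT, 7, b"DATA", 1, 42)

# No padding in network (standard) mode: 2 + 4 + 4 + 4 bytes.
header_size = struct.calcsize(HEADER_FORMAT)

header_id, tag, version, count = struct.unpack(HEADER_FORMAT, packed)

print(header_size)                     # 14
print(header_id, tag, version, count)  # 7 b'DATA' 1 42
```

Note that a C compiler using default alignment would usually pad this struct to 16 bytes (the native format "@H4s2I" reflects that), so the sending side must write the fields without padding for the 14-byte "!H4s2I" layout to match.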

Example 2:

import struct

a = 12.34
# convert a to its binary representation
data = struct.pack('d', a)

Here data is a bytes object whose contents match the binary representation of a as a C double. The 'd' format is used because a is a floating-point number: an integer format such as 'i' would reject it, and the 4-byte 'f' format would round it to single precision.

Now perform the reverse operation and convert the binary data back to a Python value:

a, = struct.unpack('d', data)

Note that unpack always returns a tuple. So even when only a single value was packed:

data = struct.pack('d', a)

it must be decoded like this:

a, = struct.unpack('d', data)

or

(a,) = struct.unpack('d', data)

If you write a = struct.unpack('d', data) directly, then a is the tuple (12.34,) rather than the original floating-point number.
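The single-value case can be sketched as follows. The example assumes the 'd' (double) format so that the float round-trips exactly, and it adds an 'f' round trip to show the precision loss of a 4-byte float; the variable names are illustrative.

```python
import struct

a = 12.34
data = struct.pack("d", a)            # 8 bytes holding a C double

result = struct.unpack("d", data)     # a one-element tuple
(value,) = struct.unpack("d", data)   # tuple unpacking yields the float

print(result)  # (12.34,)
print(value)   # 12.34

# A 4-byte "f" float cannot hold 12.34 exactly, so the round trip
# returns a nearby value instead of the original.
(approx,) = struct.unpack("f", struct.pack("f", a))
print(approx == a)  # False
```

Indexing also works (struct.unpack("d", data)[0]), but the trailing comma makes it obvious that exactly one field is expected.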

To pack several values at once, you can do this:

a = b'hello'
b = b'world!'
c = 2
d = 45.123
data = struct.pack('5s6sif', a, b, c, d)

data now holds the binary record and can be written directly to a file opened in binary mode, for example binfile.write(data)

When it is needed again, read it back with data = binfile.read()

Then decode it into Python values with struct.unpack():

a, b, c, d = struct.unpack('5s6sif', data)

'5s6sif' is the format string (fmt), made up of digits and format characters. 5s means a 5-byte string, 2i would mean two integers, and so on. The available characters and the C types they map to are listed in the format table earlier in this article, together with the corresponding Python types.
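The write-then-read cycle described above can be sketched end to end. The temporary file name record.bin is a hypothetical choice for this example, and the strings are written as bytes literals because Python 3 requires bytes for the s format.

```python
import os
import struct
import tempfile

FMT = "5s6sif"
record = struct.pack(FMT, b"hello", b"world!", 2, 45.123)

# Hypothetical file location used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "record.bin")

# Write the packed record in binary mode.
with open(path, "wb") as binfile:
    binfile.write(record)

# Read it back in binary mode and decode it with the same format.
with open(path, "rb") as binfile:
    raw = binfile.read()

a, b, c, d = struct.unpack(FMT, raw)

print(a, b, c)  # b'hello' b'world!' 2
print(d)        # close to 45.123, rounded to single precision by "f"
```

The same format string must be used on both sides; struct stores no type information in the data itself.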

Note: pitfalls when processing binary files

When processing binary files, open them like this:

binfile = open(filepath, 'rb')  # read a binary file
binfile = open(filepath, 'wb')  # write a binary file

So how does this differ from opening the file in text mode with binfile = open(filepath, 'r')?

There are two differences:

First, in text mode ('r') on Windows, older Python versions (Python 2) treat the byte 0x1a as the end of the file (EOF). This problem does not exist with 'rb'. In other words, if you write binary data and then read it back in text mode, only part of the file is read when it contains 0x1a, whereas 'rb' always reads to the real end of the file. In Python 3, text mode additionally decodes the bytes into str, so data read that way cannot be passed to struct.unpack() at all.

Second, for the string x = 'abc\ndef', len(x) gives a length of 7. \n is the line feed character, 0x0a. When the string is written in text mode ('w') on Windows, 0x0a is automatically replaced by the two bytes 0x0d 0x0a, so the file is actually 8 bytes long. When the file is read in text mode ('r'), the pair is converted back to the original line feed. If the string is written in binary mode ('wb'), the single byte is kept unchanged and read back as is. Therefore, if you write data in text mode and read it in binary mode, you must account for the extra byte. 0x0d is the carriage return character. On Linux nothing changes, because Linux uses only 0x0a to mark a line break.
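The extra byte can be reproduced on any platform by asking for the Windows-style translation explicitly. This sketch assumes Python 3, where open() accepts a newline argument; passing newline="\r\n" forces the conversion that text mode applies by default on Windows. The file name newline_demo.txt is hypothetical.

```python
import os
import tempfile

text = "abc\ndef"
path = os.path.join(tempfile.mkdtemp(), "newline_demo.txt")

# Text mode with newline="\r\n" writes 0x0d 0x0a for every "\n".
with open(path, "w", newline="\r\n") as f:
    f.write(text)

# Binary mode shows the bytes exactly as they are stored on disk.
with open(path, "rb") as f:
    raw = f.read()

# Default text mode translates "\r\n" back to "\n" when reading.
with open(path, "r") as f:
    round_trip = f.read()

print(len(text), len(raw))  # 7 8
print(raw)                  # b'abc\r\ndef'
print(round_trip == text)   # True
```

For data produced by struct.pack(), always use 'wb' and 'rb' so that no byte is added, removed, or decoded on the way.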
