Tutorial: working with byte streams (binary streams) using Python's struct module

Source: Internet
Author: User

Preface

Recently, I used Python to parse the MNIST dataset, which is stored in the IDX file format, so I needed to read binary files. I used the struct module for this. I found many tutorials online that were well written but not very friendly to beginners, so I reorganized my notes into a quick-start guide.

Note: in this article the following four terms are used interchangeably: binary stream, binary array, byte stream, and byte array.

Quick Start

In the struct module, to convert an integer, a floating-point number, or a character stream (character array) into a byte stream (byte array), you use a format string fmt to tell the struct module the type of the object being converted. For example, an integer is 'i', a floating-point number is 'f' (or 'd' for a double), and an ASCII character array is 's'.

    import struct

    def demo1():
        # bin_buf = struct.pack(fmt, buf) converts buf into the byte array bin_buf
        # buf = struct.unpack(fmt, bin_buf) converts the byte array bin_buf back into buf

        # integer -> binary stream
        buf1 = 256
        bin_buf1 = struct.pack('i', buf1)  # 'i' stands for 'integer'
        ret1 = struct.unpack('i', bin_buf1)
        print(bin_buf1, '<====>', ret1)

        # floating-point number -> binary stream
        buf2 = 3.1415
        bin_buf2 = struct.pack('d', buf2)  # 'd' stands for 'double'
        ret2 = struct.unpack('d', bin_buf2)
        print(bin_buf2, '<====>', ret2)

        # string -> binary stream
        buf3 = b'Hello world'  # 's' expects bytes in Python 3
        bin_buf3 = struct.pack('11s', buf3)  # '11s' is a character array of length 11
        ret3 = struct.unpack('11s', bin_buf3)
        print(bin_buf3, '<====>', ret3)

        # struct -> binary stream
        # Assume there is a C struct:
        # struct header {
        #     int buf1;
        #     double buf2;
        #     char buf3[11];
        # };
        bin_buf_all = struct.pack('id11s', buf1, buf2, buf3)
        ret_all = struct.unpack('id11s', bin_buf_all)
        print(bin_buf_all, '<====>', ret_all)

    demo1()

The output is as follows:

[Figure: demo1 output result]

Detailed description of struct module

Main functions

The three most important functions in the struct module are pack(), unpack(), and calcsize().

    # Pack the values into a byte stream (laid out like a C struct) according to the format string fmt.
    data = struct.pack(fmt, v1, v2, ...)

    # Parse the byte stream according to the format string fmt and return the parsed values as a tuple.
    values = struct.unpack(fmt, data)

    # Calculate how many bytes of memory the format string fmt occupies.
    size = struct.calcsize(fmt)
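As a small sketch of how the three functions fit together, the example below packs three values and checks that the packed length agrees with calcsize(). The '<' prefix is my own choice here (it is not in the listing above); it selects standard sizes with no padding so that the size is the same on every machine.

```python
import struct

# Little-endian, standard sizes, no padding: int, double, 11-byte string.
fmt = '<id11s'

# pack() returns a bytes object.
packed = struct.pack(fmt, 256, 3.1415, b'Hello world')

# calcsize() reports the number of bytes the format occupies: 4 + 8 + 11.
size = struct.calcsize(fmt)

# unpack() always returns a tuple, even for a single field.
values = struct.unpack(fmt, packed)

print(size, len(packed))  # 23 23
print(values)             # (256, 3.1415, b'Hello world')
```

Note that unpack() requires the buffer length to match calcsize(fmt) exactly; otherwise it raises struct.error.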

Format String in struct

The following table lists the formats supported by struct:

Format  C Type              Python              Bytes
x       pad byte            no value            1
c       char                bytes of length 1   1
b       signed char         integer             1
B       unsigned char       integer             1
?       _Bool               bool                1
h       short               integer             2
H       unsigned short      integer             2
i       int                 integer             4
I       unsigned int        integer             4
l       long                integer             4
L       unsigned long       integer             4
q       long long           integer             8
Q       unsigned long long  integer             8
f       float               float               4
d       double              float               8
s       char[]              bytes               1
p       char[]              bytes               1
P       void *              integer

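Note 1: In native mode, q and Q are only available when the platform's C compiler supports the 64-bit long long type.

Note 2: Each format character can be preceded by a number that gives the repeat count. For example, '4h' means the same as 'hhhh'.

Note 3: For the s format, the number is the length of the string rather than a repeat count: '4s' is a single string of 4 bytes. The p format is a Pascal string, which stores its length in the first byte.

Note 4: P is used to convert a pointer. Its size depends on the machine word size, and it is only available in native mode.

Note 5: The last format in the table (P) represents a pointer type. It occupies 4 bytes on a 32-bit system and 8 bytes on a 64-bit system.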
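The notes above can be checked with a short sketch. The variable names are only for illustration, and the '<' prefix on the first example is my own addition so that the byte layout is predictable.

```python
import struct

# Repeat count: '3h' is equivalent to 'hhh' and takes three values.
three_shorts = struct.pack('<3h', 1, 2, 3)
same_thing = struct.pack('<hhh', 1, 2, 3)

# For 's' the number is a length: '4s' is ONE field of 4 bytes.
four_bytes = struct.pack('4s', b'abcd')
padded = struct.pack('6s', b'ab')       # short input is padded with null bytes
truncated = struct.pack('2s', b'abcd')  # long input is cut to fit

# 'p' is a Pascal string: the first byte holds the length of the data.
pascal = struct.pack('5p', b'ab')
(unpacked_pascal,) = struct.unpack('5p', pascal)

# 'P' is a pointer; its size depends on the machine (4 or 8 bytes).
pointer_size = struct.calcsize('P')

print(three_shorts == same_thing)  # True
print(padded, truncated)           # b'ab\x00\x00\x00\x00' b'ab'
print(pascal, unpacked_pascal)     # b'\x02ab\x00\x00' b'ab'
print(pointer_size)
```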

To exchange data with structs in C, keep in mind that C and C++ compilers usually apply byte alignment (typically on 4-byte boundaries on a 32-bit system). By default, struct therefore converts data using the byte order, sizes, and alignment of the local machine. You can change this with the first character of the format string, defined as follows:

Character  Byte order              Size and alignment
@          native                  native size, aligned as in C (for example, padded to 4 bytes)
=          native                  standard size, no padding
<          little-endian           standard size, no padding
>          big-endian              standard size, no padding
!          network (= big-endian)  standard size, no padding
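To make the difference concrete, here is a minimal sketch comparing native and standard layouts and the two byte orders. The native size shown in the comment is what a typical platform gives; it is not guaranteed everywhere.

```python
import struct

# Native mode ('@') may insert padding after the char so the int is aligned
# as a C compiler would align it. Typically this is 1 + 3 (padding) + 4 = 8.
native_size = struct.calcsize('@ci')

# Standard mode ('=') never pads: 1 + 4 = 5 on every machine.
standard_size = struct.calcsize('=ci')

# The same value packed with different byte orders.
little = struct.pack('<I', 1)   # least significant byte first
big = struct.pack('>I', 1)      # most significant byte first
network = struct.pack('!I', 1)  # network order is big-endian

print(native_size, standard_size)
print(little)  # b'\x01\x00\x00\x00'
print(big)     # b'\x00\x00\x00\x01'
```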

The character is placed at the first position of fmt, for example '@5s6sif'.
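As an illustration of a prefix in practice, the sketch below parses a header in the style of the IDX files mentioned in the preface. It assumes the commonly documented IDX layout: a big-endian magic number made of two zero bytes, one type-code byte, and one dimension-count byte, followed by one 4-byte big-endian unsigned int per dimension. The helper name parse_idx_header is hypothetical, and the header is built in memory rather than read from a real file.

```python
import struct

def parse_idx_header(data):
    """Hypothetical helper: return (type_code, dims) from IDX-style bytes."""
    # Magic number: two zero bytes, a type code, and the number of dimensions.
    zero, type_code, ndim = struct.unpack_from('>HBB', data, 0)
    if zero != 0:
        raise ValueError('not an IDX header')
    # Each dimension size is a 4-byte big-endian unsigned int.
    dims = struct.unpack_from('>%dI' % ndim, data, 4)
    return type_code, dims

# Assumed example values: magic 2051 (0x00000803), then 60000 x 28 x 28.
fake_header = struct.pack('>IIII', 2051, 60000, 28, 28)
type_code, dims = parse_idx_header(fake_header)

print(type_code)  # 8
print(dims)       # (60000, 28, 28)
```

Using '>' here matters: with the default native mode on a little-endian machine, the same bytes would be decoded as very different numbers.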

Summary

The above is all the content of this article. I hope the content of this article will help you in your study or work. If you have any questions, you can leave a message.
