A simple way to read and write binary files using Python

Python has no built-in syntax for reading or writing C-style binary data, but the standard library compensates for this with the struct module.

Binary data is stored as a sequence of bytes. In Python 3 this is the bytes type; in Python 2 the ordinary str type served the same purpose, since each of its characters is exactly 1 byte.

import struct

a = 12.34

# convert a to its binary representation
data = struct.pack('d', a)

Because a is a floating-point number, the format character is 'd' (double); an integer format such as 'i' would not preserve 12.34. At this point, data is a bytes object (a str in Python 2) holding the same 8 bytes that a C double with this value occupies in memory.

The reverse operation takes existing binary data and converts it back into a Python value:

a, = struct.unpack('d', data)

Note that unpack always returns a tuple, even when the format describes a single value.

So if only one variable was packed:

data = struct.pack('d', a)

then decoding has to unpack a one-element tuple:

a, = struct.unpack('d', data)

or

(a,) = struct.unpack('d', data)

If you write a = struct.unpack('d', data) directly, then a == (12.34,), which is a tuple instead of the original floating-point number.
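The round trip above can be sketched as follows. This is a minimal example assuming Python 3; the names value, packed, as_tuple, and restored are chosen here for illustration only.

```python
import struct

value = 12.34

# 'd' packs the float as an 8-byte C double.
packed = struct.pack('d', value)

# unpack always returns a tuple, even for a single value.
as_tuple = struct.unpack('d', packed)

# The trailing comma unpacks the one-element tuple into a plain float.
restored, = struct.unpack('d', packed)

print(type(packed).__name__, len(packed))
print(as_tuple)
print(restored == value)
```

Forgetting the trailing comma is not an error in itself, which is why the mistake tends to surface only later, when the tuple is used where a number was expected.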

If the data is composed of several values, you can pack them together:

a = b'hello'
b = b'world!'
c = 2
d = 45.123
data = struct.pack('5s6sif', a, b, c, d)

At this point data is the binary form of the values, and you can write it directly to a file, for example binfile.write(data).

Later, when the values are needed, read the bytes back with data = binfile.read()

and decode them into Python variables with struct.unpack():

a, b, c, d = struct.unpack('5s6sif', data)
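A minimal sketch of the whole write-then-read cycle, assuming Python 3. The scratch file path, and the names fmt, record, and raw, are hypothetical and exist only for this illustration.

```python
import os
import struct
import tempfile

fmt = '5s6sif'
record = (b'hello', b'world!', 2, 45.123)

# Hypothetical scratch file; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), 'record.bin')

# Write the packed record in binary mode.
with open(path, 'wb') as binfile:
    binfile.write(struct.pack(fmt, *record))

# Read the raw bytes back, again in binary mode.
with open(path, 'rb') as binfile:
    raw = binfile.read()

a, b, c, d = struct.unpack(fmt, raw)

print(a, b, c)
# 'f' stores a 4-byte single-precision float, so d is only close to 45.123.
print(abs(d - 45.123) < 1e-5)
```

If a file holds several records of the same layout, one option is to read struct.calcsize(fmt) bytes at a time and unpack each chunk separately.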

'5s6sif' is called the fmt, the format string. It consists of optional counts followed by format characters: 5s represents a 5-byte string, 2i represents 2 integers, and so on. The available format characters are listed below; each one corresponds to a C type and to a Python type.
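The difference between a count on a numeric code and a count on 's' can be checked with a short sketch (Python 3 assumed; the variable names are illustrative):

```python
import struct

# A count before a numeric code repeats it: '2i' means the same as 'ii'.
two_ints = struct.pack('2i', 7, 8)
same = struct.pack('ii', 7, 8)

# A count before 's' is a length in bytes, so '5s' is one 5-byte string.
one_string = struct.pack('5s', b'hello')

# Shorter input is padded with null bytes; longer input is truncated.
padded = struct.pack('5s', b'hi')
truncated = struct.pack('5s', b'hello world')

print(two_ints == same)
print(struct.unpack('2i', two_ints))
print(one_string, padded, truncated)
```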


Format  C type              Python type         Standard size (bytes)
x       pad byte            no value            1
c       char                bytes of length 1   1
b       signed char         integer             1
B       unsigned char       integer             1
?       _Bool               bool                1
h       short               integer             2
H       unsigned short      integer             2
i       int                 integer             4
I       unsigned int        integer             4
l       long                integer             4
L       unsigned long       integer             4
q       long long           integer             8
Q       unsigned long long  integer             8
f       float               float               4
d       double              float               8
s       char[]              bytes               1 per count
p       char[]              bytes               1 per count
P       void *              integer             native only

The last one, P, represents a pointer type. It is available only in native mode and takes 4 bytes on a 32-bit system (8 bytes on a 64-bit system).
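The sizes in the table can be verified with struct.calcsize. The sketch below assumes Python 3 and uses the '<' prefix (one of the byte-order prefixes the article covers next) so that standard rather than platform-dependent sizes are reported; the names standard_sizes and pointer_size are illustrative.

```python
import struct

# With the '<' prefix, struct reports the standard size of each code.
standard_sizes = {code: struct.calcsize('<' + code) for code in 'xcbB?hHiIlLqQfd'}

for code, size in standard_sizes.items():
    print(code, size)

# 'P' (pointer) exists only in native mode; its size depends on the platform.
pointer_size = struct.calcsize('P')
print('P', pointer_size)
```

Without a prefix, codes such as l and L use the platform's native size, which may differ from the standard 4 bytes.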

To exchange data with structs in C, you also need to consider that C and C++ compilers align struct members, typically on 4-byte boundaries on 32-bit systems. The struct module therefore provides the following prefix characters:

Character  Byte order              Size      Alignment
@          native                  native    native (padded)
=          native                  standard  none
<          little-endian           standard  none
>          big-endian              standard  none
!          network (= big-endian)  standard  none

The prefix goes in the first position of the fmt, for example '@5s6sif'. If no prefix is given, '@' is assumed.
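The effect of the prefix on both byte order and padding can be seen in a short sketch (Python 3 assumed; variable names are illustrative). The native size is printed rather than stated, because it depends on the platform's alignment rules.

```python
import struct

# Byte order: the same integer packed little-endian, big-endian, and network.
little = struct.pack('<i', 1)
big = struct.pack('>i', 1)
network = struct.pack('!i', 1)

print(little)
print(big)

# '=' uses standard sizes with no padding: 5 + 6 + 4 + 4 bytes.
unpadded = struct.calcsize('=5s6sif')

# '@' (the default) may insert padding so that 'i' starts on an aligned offset.
native = struct.calcsize('@5s6sif')

print(unpadded, native)
```

When the bytes are meant for another machine or a file format, an explicit prefix such as '<' or '!' keeps the layout independent of the platform that wrote it.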

-----problems encountered while processing binary files-----

When working with binary files, open them in binary mode:

binfile = open(filepath, 'rb')    # read a binary file

or

binfile = open(filepath, 'wb')    # write a binary file

So how do the results differ from binfile = open(filepath, 'r')?

There are two differences:

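First, on Windows, reading in text mode ('r') treats the byte 0x1A as the end of the file (EOF). This behavior comes from the C runtime's text mode, which Python 2 relies on. So if a file was written in binary mode and is then read in text mode, only part of it is read when it contains 0x1A. With 'rb' there is no such problem, and reading always continues to the real end of the file. In Python 3, text mode additionally decodes the bytes into str, which is another reason to use 'rb' for binary data.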

Second, take the string x = 'abc\ndef'. len(x) returns 7, because \n, the newline character, is the single byte 0x0A. On Windows, writing x in text mode ('w') automatically converts 0x0A into the two bytes 0x0D 0x0A, so the file is actually 8 bytes long. Reading it back in text mode ('r') converts the pair back into the original newline character. Writing in binary mode ('wb') instead keeps the single byte intact, and 'rb' reads it back as is. So if you write in text mode and read in binary mode, account for the extra byte. 0x0D is also known as carriage return.
Linux makes no such change, because it uses 0x0A alone to represent a line break.
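The extra byte can be reproduced on any platform with a small sketch. It assumes Python 3, where passing newline='\r\n' to open() forces the Windows-style translation described above; the scratch file path is hypothetical.

```python
import os
import tempfile

x = 'abc\ndef'

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'newline.txt')

# newline='\r\n' forces Windows-style translation on any platform.
with open(path, 'w', newline='\r\n') as f:
    f.write(x)

# Binary mode shows the bytes exactly as stored, including the extra 0x0D.
with open(path, 'rb') as f:
    raw = f.read()

# Default text mode translates '\r\n' back to '\n' on reading.
with open(path, 'r') as f:
    text = f.read()

print(len(x), len(raw), len(text))
print(raw)
```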
