A simple way to read and write binary files using Python

Last Update:2017-02-23 Source: Internet

Author: User



Python has no dedicated syntax for reading or writing C-style binary records, but it provides a module that fills the gap: the struct module.

In Python 2 there is no separate binary type; binary data is stored in an ordinary string (str), which works because each character occupies exactly 1 byte. Python 3 uses the bytes type for the same purpose.

import struct

a = 12.34

# convert a to binary

bytes = struct.pack('d', a)

At this point, bytes is a byte string whose contents match the way a is stored in memory as a C double.

Now reverse the operation.

Given existing binary data bytes, which is really just a byte string, convert it back into a Python data type:

a, = struct.unpack('d', bytes)

Note that unpack returns a tuple.

So if only one variable was packed:

bytes = struct.pack('d', a)

then decoding it looks like this:

a, = struct.unpack('d', bytes) or (a,) = struct.unpack('d', bytes)

If you write a = struct.unpack('d', bytes) directly, then a == (12.34,), a tuple rather than the original floating-point number.
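
The round trip described above can be sketched as follows. This is a minimal example, assuming Python 3 and the 'd' (C double) format code; the names value, packed, as_tuple and restored are illustrative only.

```python
import struct

value = 12.34

# 'd' packs a C double (8 bytes), the same precision as a Python float.
packed = struct.pack('d', value)

# unpack always returns a tuple, even when the format has a single field.
as_tuple = struct.unpack('d', packed)

# A trailing comma on the left-hand side unpacks the one-element tuple.
restored, = struct.unpack('d', packed)

print(len(packed))  # 8
print(as_tuple)     # (12.34,)
print(restored)     # 12.34
```

Packing the same value with 'f' (C float) would keep only single precision, so the restored number would be close to, but not exactly, 12.34.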

If the data consists of several values, you can pack them together:

a = b'hello'
b = b'world!'
c = 2
d = 45.123
bytes = struct.pack('5s6sif', a, b, c, d)

At this point bytes holds the binary form of the data, and you can write it straight to a file, for example binfile.write(bytes)

Later, when we need it, we can read it back with bytes = binfile.read()

and then decode it into Python variables with struct.unpack():

a, b, c, d = struct.unpack('5s6sif', bytes)
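
Putting the pack, write, read and unpack steps together might look like the sketch below. It assumes Python 3 (so the 's' fields take bytes literals) and writes to a scratch file in a temporary directory; the file name record.bin and the variable names are hypothetical.

```python
import os
import struct
import tempfile

fmt = '5s6sif'
record = struct.pack(fmt, b'hello', b'world!', 2, 45.123)

# Hypothetical scratch file; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), 'record.bin')

# Write the packed bytes in binary mode.
with open(path, 'wb') as binfile:
    binfile.write(record)

# Read them back, again in binary mode.
with open(path, 'rb') as binfile:
    raw = binfile.read()

a, b, c, d = struct.unpack(fmt, raw)
print(a, b, c)      # b'hello' b'world!' 2
print(round(d, 3))  # 45.123
```

The last value is rounded before printing because 'f' stores a 4-byte float, so d comes back only approximately equal to 45.123.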

'5s6sif' is called fmt, the format string. It consists of numbers plus characters: 5s represents a 5-byte string, 2i represents 2 integers, and so on. The available characters and types are listed below; each C type corresponds one-to-one to a Python type.

Format	C Type	Python	Number of bytes
x	pad byte	no value	1
c	char	bytes of length 1	1
b	signed char	integer	1
B	unsigned char	integer	1
?	_Bool	bool	1
h	short	integer	2
H	unsigned short	integer	2
i	int	integer	4
I	unsigned int	integer	4
l	long	integer	4
L	unsigned long	integer	4
q	long long	integer	8
Q	unsigned long long	integer	8
f	float	float	4
d	double	float	8
s	char[]	bytes	1
p	char[]	bytes	1
P	void *	integer

The last one, P, can be used to represent a pointer type; it occupies 4 bytes on a 32-bit system (8 bytes on a 64-bit system).
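
The sizes in the table can be checked with struct.calcsize. The sketch below assumes Python 3 and uses the '=' prefix so that the standard sizes are reported regardless of platform; the names sizes, two_ints, five_chars and pointer_size are illustrative only.

```python
import struct

# '=' selects standard sizes, so these results do not depend on the platform.
sizes = {code: struct.calcsize('=' + code) for code in 'xcbB?hHiIlLqQfd'}
print(sizes)

# A repeat count multiplies the field: '2i' is two ints, '5s' is one 5-byte string.
two_ints = struct.calcsize('=2i')
five_chars = struct.calcsize('=5s')
print(two_ints, five_chars)  # 8 5

# 'P' is only available in native mode; its size follows the pointer width.
pointer_size = struct.calcsize('P')
print(pointer_size)
```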

To exchange data with structs in C, you also need to take into account that some C and C++ compilers align fields on byte boundaries, typically in 4-byte units on 32-bit systems. The struct module therefore provides the following prefix characters:

Character	Byte Order	Size and Alignment
@	Native	Native (padded to alignment boundaries, e.g. 4 bytes)
=	Native	Standard (original number of bytes, no padding)
<	Little-endian	Standard (original number of bytes, no padding)
>	Big-endian	Standard (original number of bytes, no padding)
!	Network (= Big-endian)	Standard (original number of bytes, no padding)

To use one of these, place it at the first position of fmt, as in '@5s6sif'
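
A short sketch of how the prefix character changes the result, assuming Python 3; the variable names are illustrative only.

```python
import struct

n = 1

little = struct.pack('<I', n)   # least significant byte first
big = struct.pack('>I', n)      # most significant byte first
network = struct.pack('!I', n)  # network order, identical to big-endian

print(little)  # b'\x01\x00\x00\x00'
print(big)     # b'\x00\x00\x00\x01'

# Standard modes add no padding: 5 + 6 + 4 + 4 bytes.
packed_size = struct.calcsize('=5s6sif')
# Native mode may insert padding so that the int starts on an aligned offset.
native_size = struct.calcsize('@5s6sif')

print(packed_size)  # 19
print(native_size)  # platform dependent, never smaller than packed_size
```

When the bytes must match a C struct compiled on the same machine, the native '@' mode is usually the one to try first; for files or network protocols, an explicit '<', '>' or '!' keeps the layout the same on every platform.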

-----Problems encountered while processing binary files-----

When we work with binary files, we need to open them as follows:

binfile = open(filepath, 'rb')  # read a binary file

binfile = open(filepath, 'wb')  # write a binary file

So how does this differ from binfile = open(filepath, 'r')?

There are two differences:

First, when a file is read with 'r' in Python 2 on Windows, the byte 0x1A is treated as the end of the file (EOF). 'rb' has no such problem. In other words, if a file written in binary mode is read back in text mode and contains 0x1A, only part of the file is read. With 'rb', the file is always read to the end.

Second, for the string x = 'abc\ndef', len(x) gives a length of 7. \n is what we call the newline character, which is actually 0x0A. When the string is written as text with 'w' on Windows, 0x0A is automatically converted into the two bytes 0x0D 0x0A, so the file length becomes 8. When the file is read back with 'r', the pair is automatically converted back to the original newline. With 'wb', the single byte is kept intact and read back as is. So if you write in text mode and read in binary mode, account for the extra byte. 0x0D is also called carriage return.
On Linux nothing changes, because Linux uses only 0x0A to represent line breaks.
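
The extra byte can be reproduced on any platform with the sketch below. It assumes Python 3 and passes newline='\r\n' to open() to imitate Windows text mode; the scratch file name newline.txt is hypothetical.

```python
import os
import tempfile

x = 'abc\ndef'
path = os.path.join(tempfile.mkdtemp(), 'newline.txt')  # hypothetical scratch file

# newline='\r\n' makes Python 3 translate each '\n' to '\r\n' on write,
# imitating Windows text mode regardless of the actual platform.
with open(path, 'w', newline='\r\n') as f:
    f.write(x)

# Binary mode shows the bytes exactly as they are stored on disk.
with open(path, 'rb') as f:
    raw = f.read()

# Text mode with default settings converts '\r\n' back to '\n'.
with open(path, 'r') as f:
    text = f.read()

print(len(x))     # 7
print(len(raw))   # 8
print(raw)        # b'abc\r\ndef'
print(text == x)  # True
```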


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
