A simple way to read and write binary files using Python (recommended)


Last Update:2017-01-18 Source: Internet

Author: User

Tags pack unpack in python


A common impression is that Python itself has no built-in support for binary data, but it provides a module that fills the gap: the struct module.

Python (2.x) has no separate binary type, but it can still store binary data: the data is kept in an ordinary string, which works because each character of a string is 1 byte. In Python 3 the same role is played by the bytes type, which is what struct.pack returns.

import struct
a = 12.34
# Convert a to its binary representation
data = struct.pack('d', a)

The 'd' format (C double) is used because a is a floating-point number; an integer format such as 'i' does not accept a float. At this point data is a byte string holding the same 8 bytes as the binary representation of a.

Now reverse the operation. Given existing binary data in data (in fact, a byte string), convert it back to a Python data type:

a, = struct.unpack('d', data)

Note that unpack always returns a tuple. So if only one value was packed:

data = struct.pack('d', a)

it has to be decoded like this:

a, = struct.unpack('d', data)    or    (a,) = struct.unpack('d', data)

If a = struct.unpack('d', data) is used directly, then a == (12.34,), which is a tuple rather than the original floating-point number.
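The single-value round trip described above can be sketched as follows. This is a minimal sketch assuming Python 3, where struct.pack returns a bytes object; it uses the 'd' (double) format because the value is a float, and the names packed, as_tuple and restored are illustrative only.

```python
import struct

a = 12.34

# Pack the float as a C double (8 bytes), which round-trips exactly.
packed = struct.pack('d', a)

# unpack always returns a tuple, even when only one value was packed.
as_tuple = struct.unpack('d', packed)

# A trailing comma on the left-hand side extracts the single element.
restored, = struct.unpack('d', packed)

print(len(packed))      # number of bytes produced
print(as_tuple)         # a one-element tuple
print(restored == a)    # whether the original float was recovered
```

Forgetting the trailing comma is the usual source of confusion here: the result is then a tuple and comparisons against the original number fail.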

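If the data is made up of several values, you can do this:

a = b'hello'
b = b'world!'
c = 2
d = 45.123
data = struct.pack('5s6sif', a, b, c, d)

(The b'' prefix makes the strings byte strings, which the 's' format requires in Python 3; it is also accepted by Python 2.6 and later.)

At this point data is the binary form of the values, and it can be written directly to a file, for example binfile.write(data).

Later, when the values are needed again, read the data back with data = binfile.read() and decode it into Python variables with struct.unpack():

a, b, c, d = struct.unpack('5s6sif', data)

Because 'f' stores a 4-byte single-precision float, d comes back only approximately equal to 45.123.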
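Putting the steps together, the following sketch packs several values, writes them to a file, reads them back and unpacks them. It assumes Python 3 (hence the b'' byte strings for the 's' fields); the scratch file path and the names record and raw are hypothetical and exist only for this illustration.

```python
import os
import struct
import tempfile

fmt = '5s6sif'

# Pack two byte strings, an int and a float into one binary record.
record = struct.pack(fmt, b'hello', b'world!', 2, 45.123)

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'demo.bin')

# Write the record in binary mode.
with open(path, 'wb') as binfile:
    binfile.write(record)

# Read the raw bytes back in binary mode.
with open(path, 'rb') as binfile:
    raw = binfile.read()

# Decode the bytes into Python values using the same format string.
a, b, c, d = struct.unpack(fmt, raw)
print(a, b, c)
print(round(d, 3))  # 'f' is single precision, so d is only approximately 45.123
```

The same format string must be used for packing and unpacking; struct.unpack raises struct.error if the data length does not match the format.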

The string '5s6sif' is called fmt, the format string. It is made up of format characters, each optionally preceded by a count: 5s represents a 5-byte string, 2i represents 2 integers, and so on. The available format characters are listed below; each one corresponds to a C type and to a Python type.

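Format	C Type	Python	Number of bytes
x	pad byte	no value	1
c	char	string of length 1	1
b	signed char	integer	1
B	unsigned char	integer	1
?	_Bool	bool	1
h	short	integer	2
H	unsigned short	integer	2
i	int	integer	4
I	unsigned int	integer or long	4
l	long	integer	4
L	unsigned long	long	4
q	long long	long	8
Q	unsigned long long	long	8
f	float	float	4
d	double	float	8
s	char[]	string	1 per character
p	char[]	string	1 per byte, including a leading length byte
P	void *	long	platform-dependent

The last one, P, represents a pointer type. It takes 4 bytes on a 32-bit system (8 bytes on a 64-bit system) and is only available with native byte order. The "integer" and "long" entries are Python 2 types; in Python 3 both are int.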
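The sizes in the table can be checked with struct.calcsize. The sketch below assumes Python 3 and uses the '=' prefix so that the standard sizes are reported regardless of platform; the names standard_sizes, pair and name are illustrative only.

```python
import struct

# Standard sizes in bytes for a few format characters ('=' selects standard sizes).
standard_sizes = {code: struct.calcsize('=' + code) for code in 'bhiqfd'}
print(standard_sizes)

# A count before most codes repeats them: '2i' means two ints.
pair = struct.unpack('=2i', struct.pack('=2i', 7, -7))
print(pair)

# For 's', the count is the length of one byte string, not a repeat count.
name, = struct.unpack('5s', b'hello')
print(name)
```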

To exchange data with a struct in C, you also have to take into account that C and C++ compilers align struct members, typically on 4-byte boundaries on a 32-bit system. The struct module therefore provides a prefix character that selects the byte order, size and alignment:

Character	Byte order	Size and alignment
@	native	native (members are aligned, for example to 4 bytes)
=	native	standard (original number of bytes, no padding)
<	little-endian	standard (original number of bytes, no padding)
>	big-endian	standard (original number of bytes, no padding)
!	network (= big-endian)	standard (original number of bytes, no padding)

To use one, put it in the first position of fmt, for example '@5s6sif'.
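The effect of the prefix character can be seen by comparing sizes and byte layouts. This is a minimal sketch assuming Python 3; the native size depends on the platform and compiler (typically 20 bytes for this format), so only the standard size is fixed.

```python
import struct

# Native mode ('@', also the default) may insert padding so the int is aligned.
native_size = struct.calcsize('@5s6sif')

# Standard mode ('=') packs the fields back to back: 5 + 6 + 4 + 4 bytes.
standard_size = struct.calcsize('=5s6sif')

print(native_size, standard_size)

# The same integer in different byte orders.
little = struct.pack('<i', 1)    # least significant byte first
big = struct.pack('>i', 1)       # most significant byte first
network = struct.pack('!i', 1)   # network order, same layout as big-endian

print(little, big, network)
```

When the file format is defined independently of any C compiler, an explicit '<' or '>' prefix is usually the safer choice, because the layout then no longer depends on the machine that runs the code.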

-----Binary file processing issues-----

When working with binary files, open them in one of the following ways:

binfile = open(filepath, 'rb')    # read a binary file

binfile = open(filepath, 'wb')    # write a binary file

So how does the result differ from binfile = open(filepath, 'r')?

There are two differences:

First, on Windows, if a 0x1A byte is encountered while reading in 'r' mode, it is treated as the end of the file (EOF). 'rb' has no such problem. In other words, if you write a file in binary mode and then read it back in text mode, and the data contains 0x1A, you will only read part of the file. With 'rb' the file is always read through to its real end.

Second, take the string x = 'abc\ndef'. len(x) gives a length of 7; \n is the line feed (newline) character, which is 0x0A. When the string is written in 'w' (text) mode on Windows, each 0x0A is automatically turned into two bytes, 0x0D 0x0A, so the file is actually 8 bytes long. When the file is read in 'r' (text) mode, the pair is automatically converted back to the original newline. If the string is written in 'wb' (binary) mode, the single byte is kept unchanged and is read back as is. So if you write in text mode and read in binary mode, you have to account for the extra byte. 0x0D is also known as the carriage return.
Linux makes no such change, because Linux uses only 0x0A to represent a line break.
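The extra byte can be reproduced on any platform. The sketch below assumes Python 3, where open() accepts a newline argument; newline='\r\n' is used here to force the Windows-style translation even on Linux. The scratch file path is hypothetical.

```python
import os
import tempfile

x = 'abc\ndef'

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'newline_demo.txt')

# Text mode with newline='\r\n' translates every '\n' into '\r\n' on write.
with open(path, 'w', newline='\r\n') as textfile:
    textfile.write(x)

# Binary mode returns the bytes exactly as stored, including the extra 0x0D.
with open(path, 'rb') as binfile:
    raw = binfile.read()

# Text mode with default newline handling converts '\r\n' back to '\n' on read.
with open(path, 'r') as textfile:
    text = textfile.read()

print(len(x), len(raw), len(text))
```

For data produced by struct.pack, always use 'wb' and 'rb', so that no byte that happens to equal 0x0A is rewritten.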

That is all for this simple method of reading and writing binary files in Python. I hope it serves as a useful reference.

This article is an English version of an article originally written in Chinese on aliyun.com.
