Python File Reading and Writing, and How to Set File Character Encoding

Last Update: 2017-08-18 Source: Internet

Author: User


Reading and writing files is an important and very common part of every programming language. This article describes Python's file read and write operations in detail, along with the points to watch out for.

I. Opening a file in Python

The code is as follows:

f = open(r"D:\test.txt", "w")  # raw string, so "\t" is not read as a tab character

Description

The first parameter is the file name, including the path.

The second parameter is the mode in which the file is opened:

'r': read-only (the default; raises an error if the file does not exist)

'w': write-only (creates the file if it does not exist and truncates it if it does)

'a': append to the end of the file (creates the file if it does not exist)

'r+': read and write (the file must already exist)

To open the file in binary mode, append the character 'b' to the mode, e.g. 'rb' or 'wb'.
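The sketch below is in Python 3 rather than the Python 2 used elsewhere in this article. It tries each mode on a throwaway file. The temporary directory and the file name pythontab.txt are assumptions for illustration only.

```python
import os
import tempfile

# Hypothetical sample file in a throwaway directory (stands in for ./pythontab.txt).
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "pythontab.txt")

    with open(path, "w") as f:      # 'w': creates the file (or truncates it)
        f.write("first\n")
    with open(path, "w") as f:      # opening with 'w' again discards "first"
        f.write("second\n")
    with open(path, "a") as f:      # 'a': writes go to the end of the file
        f.write("third\n")

    with open(path, "r+") as f:     # 'r+': read and write an existing file
        text = f.read()

    with open(path, "rb") as f:     # 'rb': binary mode returns bytes, not str
        raw = f.read()

    try:
        open(os.path.join(tmpdir, "missing.txt"), "r")
        missing_raised = False
    except FileNotFoundError:       # 'r' never creates a missing file
        missing_raised = True

print(repr(text))  # 'second\nthird\n'
```

On Windows the binary read shows "\r\n" line endings, because text mode translates "\n" on write. Text-mode reads translate them back.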

II. Reading file content: f.read([size])

The optional size parameter limits how much is read. If it is omitted, the entire contents of the file are read.

f.readline() reads a single line, and f.readlines() reads all lines into a list: [line1, line2, ..., lineN].

f = open('./pythontab.txt', 'r')
content = f.read()
print content
f.close()

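Reading line by line with f.readline(), or by iterating over the file object, is often used to improve efficiency. It avoids loading the entire file into memory at once, which f.read() and f.readlines() both do.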
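Here is a Python 3 sketch of the reading calls above. It uses a made-up three-line temporary file, and shows that each call continues from where the previous one stopped.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "pythontab.txt")   # hypothetical sample file
    with open(path, "w") as f:
        f.write("line1\nline2\nline3\n")

    with open(path) as f:
        head = f.read(3)              # at most 3 characters: 'lin'
        rest = f.readline()           # rest of the current line: 'e1\n'
        remaining = f.readlines()     # all remaining lines as a list

    # Iterating over the file object yields one line at a time,
    # so only the current line needs to be held in memory.
    lengths = []
    with open(path) as f:
        for line in f:
            lengths.append(len(line.rstrip("\n")))

print(head, repr(rest), remaining, lengths)
```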

III. Writing to a file: f.write(string)

f.write() writes a string to the file:

f = open('./pythontab.txt', 'r+')
f.write('Hello, pythontab.com')
f.close()

Note: to end a line, append "\n" to the string. When you are done, close the file with f.close(). Otherwise buffered data may never reach the disk and errors can occur, especially under high concurrency.
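A common alternative to calling f.close() by hand is the with statement, which closes the file when the block exits, even if an exception is raised. This is a Python 3 sketch; the temporary path is an assumption for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "pythontab.txt")   # hypothetical sample file
    with open(path, "w") as f:
        f.write("Hello, pythontab.com\n")   # "\n" ends the line explicitly
        f.write("second line\n")
    closed_after_block = f.closed           # True: closed automatically on exit

    with open(path) as f:
        lines = f.read().splitlines()

print(closed_after_block, lines)
```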

IV. Positioning within a file

After f.read(), the file pointer is at the end of the file, so calling f.read() again returns an empty string. To read the whole content again, move the pointer back to the beginning of the file:

f.seek(0)

The signature is f.seek(offset, from_what), with positions measured in bytes. from_what is the reference point: 0 is the beginning of the file, 1 is the current position and 2 is the end of the file. offset is the distance to move from that point. For example, f.seek(10, 1) moves 10 bytes forward from the current position, and f.seek(-3, 2) moves to 3 bytes before the end.

from_what defaults to 0 (the beginning of the file), so it can be omitted. A complete example:

f = open('./pythontab.txt', 'r+')
f.write('Hello, pythontab.com')
f.seek(5)      # move to the 6th byte
f.read(1)
f.seek(-3, 2)  # move to 3 bytes before the end of the file
f.read(1)
f.close()
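In Python 3, text-mode files reject nonzero offsets relative to the current position or the end. This Python 3 sketch therefore repeats the example in binary mode 'wb+' on a temporary file, which is an assumption for illustration. io.SEEK_SET, io.SEEK_CUR and io.SEEK_END are the named forms of from_what 0, 1 and 2.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "pythontab.txt")   # hypothetical sample file
    with open(path, "wb+") as f:           # binary read/write; creates or truncates
        f.write(b"Hello, pythontab.com")
        f.seek(5)                          # 5 bytes from the start (from_what = 0)
        sixth = f.read(1)                  # the 6th byte: b','
        f.seek(-3, io.SEEK_END)            # 3 bytes before the end (from_what = 2)
        third_from_end = f.read(1)         # b'c'
        f.seek(2, io.SEEK_CUR)             # 2 bytes past the current position (from_what = 1)
        pos = f.tell()                     # 20, the length of the content
        f.seek(0)                          # back to the beginning
        everything = f.read()

print(sixth, third_from_end, pos)  # b',' b'c' 20
```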

V. Closing a file

When you have finished working with a file, always close it with f.close() to release the resource for other programs. Reading and writing files that contain only ASCII or GBK-encoded text is relatively simple:

# coding=gbk
f = open('./pythontab.txt', 'r')  # 'r' opens the file read-only
s1 = f.read()        # the whole file
s2 = f.readline()    # '' here, because read() already reached the end of the file
s3 = f.readlines()   # [] here, for the same reason
f.close()
f = open('./pythontab.txt', 'w')  # 'w' opens the file for writing (truncating it)
f.write(s1)
f.writelines(s3)     # there is no writeline(); writelines() takes a list of strings
f.close()

Note: f.writelines() does not add line breaks. Each string must already end with "\n" if you want separate lines.
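This Python 3 sketch shows the behavior on a temporary file; the word list is made-up sample data. writelines() writes the strings back to back, so any newlines have to be added by the caller.

```python
import os
import tempfile

words = ["alpha", "beta", "gamma"]   # hypothetical sample data

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "pythontab.txt")
    with open(path, "w") as f:
        f.writelines(words)                     # no separators are added
    with open(path) as f:
        joined = f.read()                       # 'alphabetagamma'

    with open(path, "w") as f:
        f.writelines(w + "\n" for w in words)   # add the line breaks yourself
    with open(path) as f:
        lines = f.readlines()                   # ['alpha\n', 'beta\n', 'gamma\n']
```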

VI. Reading and writing Unicode files in Python

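# coding=gbk
import codecs
f = codecs.open('./pythontab.txt', 'a', 'utf-8')  # encodes Unicode as UTF-8 on write
f.write(u'中文')            # u'中文' ("Chinese") is already Unicode
s = '中文'                  # a GBK byte string, because the source file is GBK
f.write(s.decode('gbk'))    # decode it to Unicode before writing
f.close()
f = codecs.open('./pythontab.txt', 'r', 'utf-8')  # decodes UTF-8 to Unicode on read
s = f.readlines()
f.close()
for line in s:
    print line.encode('gbk')  # encode for a GBK console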
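In Python 3, the built-in open() accepts an encoding argument that plays the role of codecs.open() above. Text is encoded on write and decoded on read. This is a sketch on a temporary file, which is an assumption for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "pythontab.txt")   # hypothetical sample file
    # encoding="utf-8": str is encoded to UTF-8 bytes on write...
    with open(path, "a", encoding="utf-8") as f:
        f.write("中文\n")
    # ...and UTF-8 bytes are decoded back to str on read.
    with open(path, "r", encoding="utf-8") as f:
        lines = f.readlines()
    with open(path, "rb") as f:                    # the raw bytes on disk
        raw = f.read()

# Counterpart of line.encode('gbk') in the Python 2 loop above.
gbk_lines = [line.encode("gbk") for line in lines]
```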

VII. Encoding of Python source files

In Python 2, a .py file is treated as ASCII by default. A source file that contains Chinese characters but no encoding declaration therefore fails with SyntaxError: Non-ASCII character. Add an encoding declaration on the first or second line of the file:

# coding=utf-8    # Chinese characters in this file are stored as UTF-8

A string typed directly, as in print '中文', is stored in the encoding of the source file. To get a Unicode string instead, there are two ways:

s1 = u'中文'  # the u prefix stores the text as a Unicode string

s2 = unicode('中文', 'utf-8')  # decode the literal's bytes using the file's encoding

unicode() is a built-in function; its second argument specifies the encoding of the source string.

decode() is a method of every string. It converts the string to Unicode, and its argument specifies the encoding of the source string.

encode() is also a string method. It converts a Unicode string into a byte string in the encoding given by its argument.
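Python 3 renames these types: str is the Unicode type, bytes holds encoded data, and source files default to UTF-8. The following sketch maps the three Python 2 operations onto their Python 3 counterparts. str(data, encoding) stands in for unicode().

```python
text = "中文"                       # a Unicode string (Python 3 str)
utf8 = text.encode("utf-8")        # str -> bytes, 3 bytes per character here
gbk = text.encode("gbk")           # same text, different bytes (2 per character)

from_ctor = str(gbk, "gbk")        # like Python 2's unicode(s, 'gbk')
from_method = gbk.decode("gbk")    # bytes -> str, naming the source encoding

print(len(text), len(utf8), len(gbk))  # 2 6 4
```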

Encoding of Python strings

A literal written with the u prefix, such as u'中文', has type unicode; without the prefix, the literal has type str.

The encoding of a str depends on the system environment and is generally the value returned by sys.getfilesystemencoding().

So to convert from unicode to str, use the encode() method.

To convert from str to unicode, use decode().

For example:

# coding=utf-8          # the source file is encoded as UTF-8
s = u'中文'              # Unicode text
print s.encode('utf-8')  # encode to UTF-8, then print
print s                  # same output: print encodes Unicode text for the terminal automatically
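The direction rule (encode() goes from Unicode to bytes, decode() goes from bytes to Unicode) is often wrapped in two small helpers. The Python 3 sketch below uses hypothetical helper names, to_text and to_bytes, with UTF-8 as an assumed default. It also prints the platform-dependent sys.getfilesystemencoding() value mentioned above.

```python
import sys


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: bytes -> str goes through decode()."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding="utf-8"):
    """Hypothetical helper: str -> bytes goes through encode()."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value


fs_encoding = sys.getfilesystemencoding()   # platform-dependent, often 'utf-8'
print(fs_encoding)
print(to_bytes("中文"))                      # b'\xe4\xb8\xad\xe6\x96\x87'
```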

Summary:

u = u'中文'  # Unicode text

g = u.encode('gbk')  # convert to GBK

print g  # garbled, because the current environment is UTF-8 and cannot display GBK-encoded text

s = g.decode('gbk').encode('utf-8')  # read g as GBK (it is GBK-encoded), then convert to UTF-8 for output

print s  # the Chinese text displays correctly
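The summary can be reproduced in Python 3 without a terminal, as in this sketch. GBK bytes that are misread as UTF-8 turn into replacement characters (errors="replace" is used here so the misreading does not raise). Decoding with the encoding the bytes actually use, then re-encoding, gives correct UTF-8.

```python
text = "中文"
gbk = text.encode("gbk")                          # like u.encode('gbk')

# Misreading GBK bytes as UTF-8 is what produces garbled output.
misread = gbk.decode("utf-8", errors="replace")

# The fix from the summary: decode as GBK, then encode as UTF-8.
utf8 = gbk.decode("gbk").encode("utf-8")
```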

A safer approach:

s.decode('gbk', 'ignore').encode('utf-8')  # read as GBK, skip invalid bytes, then convert to UTF-8 for output

The prototype of decode is decode([encoding], [errors='strict']), so the second argument controls the error-handling policy. The default, 'strict', raises an exception when an invalid character is encountered;

if set to 'ignore', invalid characters are skipped;

if set to 'replace', invalid characters are replaced with a placeholder (U+FFFD when decoding, '?' when encoding);

if set to 'xmlcharrefreplace', characters are replaced with XML character references such as &#20013;. This option applies only to encode(); decode() does not accept it.
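Each error-handling policy can be tried on a small input, as in this Python 3 sketch. The input is hypothetical: valid GBK text followed by a stray 0xFF byte, which is not valid GBK. 'xmlcharrefreplace' is shown with encode(), the only direction that accepts it.

```python
data = "中文".encode("gbk") + b"\xff"    # hypothetical input: valid GBK plus one bad byte

try:
    data.decode("gbk")                   # errors='strict' is the default
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

ignored = data.decode("gbk", "ignore")   # bad byte dropped: '中文'
replaced = data.decode("gbk", "replace") # bad byte becomes U+FFFD

# Encoding-side handlers, for characters ASCII cannot represent:
as_xml = "中文".encode("ascii", "xmlcharrefreplace")   # b'&#20013;&#25991;'
as_question = "中文".encode("ascii", "replace")        # b'??'
```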

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
