Reading and writing files in Python, and setting file character encodings

Source: Internet
Author: User
Reading and writing files is an important and very common task in any programming language. This article explains in detail how to read and write files in Python, and the points to watch out for.

One. Opening a file in Python

The code is as follows:

f = open("D:\\test.txt", "w")  # the backslash is doubled so that "\t" is not read as a tab

Description:

The first parameter is the file name, including the path.

The second parameter is the mode in which the file is opened:

'r': read-only (the default; an error is raised if the file does not exist)

'w': write-only (the file is created if it does not exist; if it already exists, its contents are erased)

'a': append to the end of the file

'r+': read and write

To open the file in binary mode, add the character "b" after the mode, for example "rb" or "wb".
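The modes listed above can be tried out with a short sketch like the following. It is only an illustration: the scratch file name modes_demo.txt is hypothetical, the file is created in a temporary directory so nothing on disk is overwritten, and the code is written to run on both Python 2.7 and Python 3.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory, used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

f = open(path, "w")   # 'w' creates the file, or empties it if it already exists
f.write("first\n")
f.close()

f = open(path, "a")   # 'a' keeps the existing content and writes at the end
f.write("second\n")
f.close()

f = open(path, "r")   # 'r' is read-only and requires the file to exist
text = f.read()
f.close()

f = open(path, "rb")  # adding 'b' opens the file in binary mode
raw = f.read()
f.close()

print(repr(text))
print(repr(raw))
```

Opening the same file with "w" a second time would discard both lines, which is why "a" is used for the second write.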

Two. Reading file content in Python

f.read([size])

The size parameter is the number of bytes to read and can be omitted. If it is omitted, the entire contents of the file are read.

f.readline() reads a single line from the file.

f.readlines() reads all lines into a list: [line1, line2, ... lineN].

f = open('./pythontab.txt', 'r')
content = f.read()
print content

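Note that f.read() without a size loads the entire file into memory. For large files, reading one line at a time with f.readline(), or iterating over the file object, avoids loading all of the content into memory and is often used to improve efficiency.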
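As a rough illustration of how the reading methods differ, the sketch below creates a small sample file and reads it in several ways. The file name read_demo.txt and the variable names are assumptions made for this example, and the code is written to run on both Python 2.7 and Python 3.

```python
import os
import tempfile

# Hypothetical sample file created just for this demo.
path = os.path.join(tempfile.mkdtemp(), "read_demo.txt")
f = open(path, "w")
f.write("line1\nline2\nline3\n")
f.close()

f = open(path, "r")
head = f.read(3)              # reads at most 3 characters
rest_of_line = f.readline()   # continues from the current position to the end of the line
remaining = f.readlines()     # reads all remaining lines into a list
f.close()

# Iterating over the file object yields one line at a time,
# so the whole file is never held in memory at once.
line_count = 0
f = open(path, "r")
for line in f:
    line_count += 1
f.close()

print(repr(head))
print(repr(rest_of_line))
print(remaining)
print(line_count)
```

Each call continues from where the previous one stopped, so the three reads together cover the file exactly once.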

Three. Writing to a file in Python

f.write(string)

Writes a string to the file.

f = open('./pythontab.txt', 'r+')
f.write('Hello, pythontab.com')
f.close()

Note: write() does not add a line break; to end a line, add "\n" at the end of the string. When writing is finished, the file must be closed with f.close(). Otherwise buffered data may not reach the disk and exceptions can occur, especially under high concurrency.
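One common way to make sure the file is always closed is the with statement, which calls close() automatically when the block ends. The following is a minimal sketch rather than the article's own method; the file name write_demo.txt is hypothetical, and the code is written to run on both Python 2.7 and Python 3.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "write_demo.txt")

# The file is closed automatically when the with block ends,
# even if an exception is raised inside it.
with open(path, "w") as f:
    f.write("Hello, pythontab.com")  # write() adds no line break by itself
    f.write("\n")                    # so the line break is written explicitly
    f.write("second line\n")

closed_after_with = f.closed

with open(path, "r") as f:
    lines = f.readlines()

print(closed_after_with)
print(lines)
```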

Four. Positioning within a file

After f.read() finishes, the file pointer is at the end of the file, and calling f.read() again returns empty content. To read the whole content again, you must first move the pointer back to the beginning of the file:

f.seek(0)

The format of this function is as follows (offsets are in bytes):

f.seek(offset, from_what)

from_what is the reference position, and offset is the distance to move from that position. from_what can be 0 (the beginning of the file), 1 (the current position), or 2 (the end of the file). For example, f.seek(10, 1) moves 10 bytes forward from the current position.

from_what can be omitted; it defaults to 0, the beginning of the file. A complete example is given below:

f = open('./pythontab.txt', 'r+')
f.write('Hello, pythontab.com')
f.seek(5)      # move to the 6th byte
f.read(1)      # read one byte
f.seek(-3, 2)  # move to the 3rd byte before the end of the file
f.read(1)
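The same idea can be checked with a self-contained sketch that also uses f.tell() to report the current position. It opens the file in binary mode because Python 3 only allows seeking relative to the current position or the end of the file in binary mode; that choice, and the file name seek_demo.txt, are assumptions made for this example. The code is written to run on both Python 2.7 and Python 3.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "seek_demo.txt")

f = open(path, "wb+")            # binary read/write mode
f.write(b"Hello, pythontab.com")

f.seek(5)                        # from_what defaults to 0: 5 bytes from the start
sixth = f.read(1)                # the 6th byte
position = f.tell()              # reading one byte advanced the pointer by one

f.seek(-3, 2)                    # 3 bytes before the end of the file
third_from_end = f.read(1)

f.seek(0)                        # back to the beginning
start = f.read(5)
f.seek(2, 1)                     # skip 2 bytes from the current position
word = f.read(6)
f.close()

print(sixth, position, third_from_end, start, word)
```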

Five. Closing files

When you have finished working with a file, be sure to close it with f.close(), so that its resources are released for other programs to use.

Reading and writing files that contain only ASCII or GBK encoded text is relatively simple:

# coding=gbk
f = open('./pythontab.txt', 'r')  # 'r' opens the file read-only
s1 = f.read()
s2 = f.readline()   # returns '' here, because read() already reached the end of the file
s3 = f.readlines()  # reads all remaining lines (none are left after read())
f.close()
f = open('./pythontab.txt', 'w')  # 'w' opens the file for writing
f.write(s1)
f.writelines(s2)  # there is no writeline method
f.close()

Six. f.writelines does not output line breaks
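A small sketch makes this behaviour visible. The file name writelines_demo.txt and the sample strings are assumptions made for this example, and the code is written to run on both Python 2.7 and Python 3.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "writelines_demo.txt")
items = ["a", "b", "c"]

# writelines() writes the strings one after another with nothing in between.
with open(path, "w") as f:
    f.writelines(items)
with open(path, "r") as f:
    joined = f.read()

# To get one item per line, the line breaks have to be added by the caller.
with open(path, "w") as f:
    f.writelines(item + "\n" for item in items)
with open(path, "r") as f:
    separate = f.readlines()

print(repr(joined))
print(separate)
```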

Reading and writing Unicode files in Python:

# coding=gbk
import codecs
f = codecs.open('./pythontab.txt', 'a', 'utf-8')
f.write(u'中文')
s = '中文'
f.write(s.decode('gbk'))  # s is a GBK byte string, so decode it to Unicode before writing
f.close()
f = codecs.open('./pythontab.txt', 'r', 'utf-8')
s = f.readlines()
f.close()
for line in s:
    print line.encode('gbk')
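As an alternative sketch, the io.open function accepts an encoding argument and behaves the same way on Python 2.7 and Python 3 (where it is the built-in open). This is not the approach used above, only a comparable one; the file name unicode_demo.txt is hypothetical, and the snippet assumes it is saved as UTF-8.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical scratch file used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "unicode_demo.txt")

# Text written through io.open is encoded to UTF-8 on the way to disk.
with io.open(path, "a", encoding="utf-8") as f:
    f.write(u"中文\n")

# Reading with the same encoding returns Unicode text again.
with io.open(path, "r", encoding="utf-8") as f:
    lines = f.readlines()

# In binary mode the raw UTF-8 bytes are visible instead.
with io.open(path, "rb") as f:
    raw = f.read()

print(lines == [u"中文\n"])
print(repr(raw))
```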

Seven. Encoding of Python code files

A .py file is treated as ASCII encoded by default. If it contains Chinese characters and has no encoding declaration, an error occurs: SyntaxError: Non-ASCII character. To avoid this, add an encoding declaration on the first or second line of the code file:

# coding=utf-8  # store Chinese characters using UTF-8 encoding

print '中文'

A string literal entered directly, as above, is handled according to the encoding of the code file. To obtain a Unicode string instead, there are the following 2 ways:

s1 = u'中文'  # the u prefix stores the text as Unicode

s2 = unicode('中文', 'gbk')  # the second argument must match the encoding of the code file ('gbk' assumes the file is saved as GBK)

unicode is a built-in function; its second parameter indicates the encoding of the source string.

decode is a method available on every string; it converts the string to Unicode, and its parameter indicates the encoding of the source string.

encode is also a method of every string; it converts the string to the encoding specified by its parameter.
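The relationship between encode and decode can be sketched as a round trip. The variable names below are assumptions made for this example; the snippet assumes it is saved as UTF-8 and is written to run on both Python 2.7 and Python 3 (where the unicode type is simply called str and encoded text is bytes).

```python
# -*- coding: utf-8 -*-
text = u"中文"                      # Unicode text

gbk_bytes = text.encode("gbk")      # Unicode -> GBK bytes
utf8_bytes = text.encode("utf-8")   # Unicode -> UTF-8 bytes

# decode must be given the encoding the bytes were actually written in.
round_trip = gbk_bytes.decode("gbk")

# Converting between two encodings goes through Unicode in the middle.
converted = gbk_bytes.decode("gbk").encode("utf-8")

print(len(gbk_bytes), len(utf8_bytes))
print(round_trip == text)
print(converted == utf8_bytes)
```

The same text takes a different number of bytes in each encoding, which is why bytes encoded one way cannot simply be displayed as if they were encoded another way.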

Encoding of Python strings

A literal with the u prefix, such as u'中文', constructs a unicode object; without the u prefix, a str object is constructed.

The encoding of a str depends on the system environment; it is generally the value returned by sys.getfilesystemencoding().

So to convert from unicode to str, use the encode method.

To convert from str to unicode, use decode.

For example:

# coding=utf-8   # the default encoding is utf-8
s = u'中文'  # Unicode text
print s.encode('utf-8')  # convert to UTF-8 and output
print s  # same result as above; the text appears to be converted to the specified encoding by default

Summary:

u = u'中文'  # Unicode text

g = u.encode('gbk')  # convert to GBK

print g  # garbled: the current environment is UTF-8, so GBK encoded text displays incorrectly

out = g.decode('gbk').encode('utf-8')  # read g as GBK (because it is GBK encoded) and convert it to UTF-8

print out  # the Chinese text displays correctly

A safe approach:

s.decode('gbk', 'ignore').encode('utf-8')  # read s as GBK (the source must actually be GBK encoded), ignore invalid bytes, and convert to UTF-8

The function prototype of decode is decode([encoding], [errors='strict']), so the second parameter can be used to control the error-handling policy. The default is strict, which raises an exception when an illegal character is encountered;

If set to ignore, illegal characters are ignored;

If set to replace, illegal characters are replaced with a substitute character (U+FFFD when decoding);

If set to xmlcharrefreplace, XML character references are used (this handler applies to encode only).
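The effect of each policy can be seen in a short sketch. The byte string below is an assumption made for this example: two bytes that are valid GBK, followed by a stray byte that is not. The snippet assumes it is saved as UTF-8 and is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# b"\xd6\xd0" is the GBK encoding of 中; the trailing b"\xff" is not valid GBK.
bad = b"\xd6\xd0\xff"

try:
    bad.decode("gbk")                 # errors defaults to 'strict'
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

ignored = bad.decode("gbk", "ignore")    # the invalid byte is dropped
replaced = bad.decode("gbk", "replace")  # the invalid byte becomes U+FFFD

# xmlcharrefreplace works in the other direction: when a character cannot be
# encoded, it is written as an XML character reference.
xml = u"中".encode("ascii", "xmlcharrefreplace")

print(strict_failed)
print(ignored == u"中")
print(repr(replaced))
print(xml)
```

Because ignore silently discards data, replace is often the safer choice when it matters that a problem in the input stays visible.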
