Reading and writing TXT files in different encodings with Python

Last update: 2017-02-28. Source: Internet

Author: User

Tags: python


This article shows how to read and write TXT files in different encodings in Python, with code for each encoding. Readers who need this can use it as a reference.

The code is as follows:

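import os
import codecs

# List every entry in the current working directory.
filenames = os.listdir(os.getcwd())
out = file("name.txt", "w")
for filename in filenames:
    # Convert each GB2312 byte-string name to UTF-8, one name per line.
    out.write(filename.decode("gb2312").encode("utf-8") + "\n")
out.close()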

This writes the names of all files in the current working directory to name.txt and saves the file in UTF-8. The names are first decoded from GB2312, the encoding they have on a Chinese-language Windows system.

To save the names in ANSI encoding instead (the system's default code page, e.g. GBK on Chinese Windows), write the raw byte string without converting it:

The code is as follows:

out.write(filename)
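The Python 2 code above relies on file() and byte-string file names. On Python 3, where file() is gone and os.listdir() already returns str, a minimal sketch of the same task might instead pass the target encoding to open(). The helper write_names and its parameters are hypothetical, and using "gbk" to stand for ANSI assumes a Chinese-locale Windows code page.

```python
import os


def write_names(directory, out_path, encoding="utf-8"):
    """Write each entry name in `directory` to `out_path`, one per line.

    Hypothetical helper: `encoding` picks the on-disk encoding, e.g.
    "utf-8", or "gbk" to mimic ANSI on a Chinese-locale Windows system.
    """
    # Python 3 already returns the names as str, so no .decode() step is needed;
    # they are sorted here only to get a stable order.
    names = sorted(os.listdir(directory))
    # The encoding argument controls which bytes end up in the file.
    with open(out_path, "w", encoding=encoding) as out:
        for name in names:
            out.write(name + "\n")
    return names
```

For example, write_names(os.getcwd(), "name.txt") saves UTF-8, and adding encoding="gbk" saves GBK instead.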

Opening a file with a specific encoding and writing to it

This uses the codecs module, which I am not yet familiar with. I am recording the method here and will study the module's functions and usage when I have time.

The code is as follows:

import codecs

# codecs.open encodes the Unicode text as UTF-8 while writing.
f = codecs.open("lol.txt", "w", "utf-8")
f.write(u"i")
f.close()
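io.open, available since Python 2.6 and the same object as the built-in open() in Python 3, also accepts an encoding argument and is a common alternative to codecs.open. The sketch below assumes Python 3; the helper round_trip is hypothetical and writes lol.txt into a temporary directory so the stored bytes can be inspected.

```python
import io
import os
import tempfile


def round_trip(text, encoding="utf-8"):
    """Write `text` with `encoding`, then return (raw bytes on disk, decoded text)."""
    path = os.path.join(tempfile.mkdtemp(), "lol.txt")
    # Text mode encodes the str into bytes on write.
    with io.open(path, "w", encoding=encoding) as f:
        f.write(text)
    # Binary mode shows the bytes actually stored in the file.
    with io.open(path, "rb") as f:
        raw = f.read()
    # Text mode with the same encoding decodes them back.
    with io.open(path, "r", encoding=encoding) as f:
        decoded = f.read()
    return raw, decoded
```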

Reading ANSI-encoded and UTF-8-encoded text files

Reading ANSI-encoded files

Create a file test.txt, saved in ANSI encoding, with the content:

The code is as follows:

ABC Chinese

Read it with Python:

The code is as follows:

# coding=gbk
print open("test.txt").read()

Result: ABC Chinese

Reading UTF-8-encoded files (without BOM)

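Now save test.txt as UTF-8 and run the same code again.

Result: ABC followed by garbled characters, because the raw UTF-8 bytes are printed to a console that displays them as GBK.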

The bytes have to be decoded from UTF-8 first:

The code is as follows:

# -*- coding: utf-8 -*-
import codecs
print open("test.txt").read().decode("utf-8")

Result: ABC Chinese

Reading UTF-8-encoded files (with BOM)

When saving a file as UTF-8, some software inserts three invisible bytes (0xEF 0xBB 0xBF, the BOM) at the beginning of the file; some programs let you choose whether to insert it. If the file has a BOM, you need to strip these bytes when reading. Python's codecs module defines the constant codecs.BOM_UTF8 for them:

The code is as follows:

# -*- coding: utf-8 -*-
import codecs

data = open("test.txt").read()
# Strip the three-byte UTF-8 BOM if the file starts with one.
if data[:3] == codecs.BOM_UTF8:
    data = data[3:]
print data.decode("utf-8")

Result: ABC Chinese

Look at the following example:

The code is as follows:

# -*- coding: utf-8 -*-
data = open("name_utf8.txt").read()
u = data.decode("utf-8")
print u[1:]

When you read a UTF-8 file and decode the byte string, you get a Unicode object. However, the three extra BOM bytes are decoded into a single Unicode character (U+FEFF), which cannot be printed. To display the text properly, use u[1:] to drop that first character. This assumes the file really starts with a BOM; otherwise u[1:] removes a real character.
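A small Python 3 sketch shows the U+FEFF character directly and, as a safer variant of the unconditional u[1:], removes it only when it is actually present. The name drop_bom_char is hypothetical.

```python
import codecs

BOM_CHAR = "\ufeff"  # what the three BOM bytes decode to


def drop_bom_char(u):
    """Drop a leading U+FEFF only if it is there.

    Unlike an unconditional u[1:], this keeps the first real character
    of text that was saved without a BOM.
    """
    return u[1:] if u.startswith(BOM_CHAR) else u


with_bom = (codecs.BOM_UTF8 + b"abc").decode("utf-8")
without_bom = b"abc".decode("utf-8")
```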

Note: before outputting a Unicode string that contains Chinese characters, first call encode() on it to convert it to a byte encoding such as UTF-8 or GBK.

Setting Python's default encoding

The code is as follows:

import sys
reload(sys)
sys.setdefaultencoding("utf-8")
print sys.getdefaultencoding()
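This technique is specific to Python 2. On Python 3 the default encoding is always UTF-8, sys.setdefaultencoding() does not exist, and reload() has moved to importlib.reload. A quick Python 3 check:

```python
import sys

# Python 3 fixes the default str <-> bytes encoding at UTF-8 ...
default_encoding = sys.getdefaultencoding()
# ... and has no sys.setdefaultencoding(), so the reload(sys) trick cannot work.
can_change = hasattr(sys, "setdefaultencoding")
```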

Today I ran into a Python encoding problem. The error message was:

The code is as follows:

Traceback (most recent call last):
  File "ntpath.pyc", line 108, in join
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa1 in position 36: ordinal not in range(128)

The current encoding is clearly ASCII, which cannot decode byte 0xa1 (decimal 161, beyond ASCII's range of 0 to 127). Checking in the Python console confirmed that the default encoding is indeed ASCII.
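The same failure can be reproduced in isolation on Python 3 by decoding a byte string that contains 0xa1 with the ASCII codec. The helper try_ascii_decode and the sample bytes are hypothetical; start, object, encoding and reason are standard UnicodeDecodeError attributes.

```python
def try_ascii_decode(raw):
    """Decode bytes as ASCII, returning the error instead of raising it."""
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError as err:
        return err


# 0xa1 (161) is outside ASCII's range(128), so decoding stops at index 3.
err = try_ascii_decode(b"abc\xa1")
```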

In Python 2.6, sys.setdefaultencoding() cannot simply be called to change the default encoding. Python runs site.py at startup, and after site.py sets the default encoding it deletes sys.setdefaultencoding, so the function is no longer available. Once sys has been imported, you can reload the sys module and then call sys.setdefaultencoding('utf8'):

The code is as follows:

import sys
reload(sys)
sys.setdefaultencoding("utf-8")
print sys.getdefaultencoding()

This really works. According to Limodou, site.py is a script that the Python interpreter loads by default after it starts; if you start Python with python -S, site.py is not loaded automatically.
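The effect of -S is visible in the sys.flags.no_site flag. This Python 3.7+ sketch (subprocess.run with capture_output requires 3.7) starts a child interpreter and reports that flag; no_site_flag is a hypothetical helper.

```python
import subprocess
import sys


def no_site_flag(*flags):
    """Run a child interpreter with `flags` and return its sys.flags.no_site."""
    cmd = [sys.executable, *flags, "-c", "import sys; print(sys.flags.no_site)"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return int(result.stdout.strip())
```

For example, no_site_flag("-S") reports 1, meaning site.py was skipped.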

It's kind of long-winded.

==================================

How can you set the default encoding to UTF-8 permanently? There are two ways:

==================================

Method 1 (not recommended): edit site.py and modify the setencoding() function to force the encoding to UTF-8.

Method 2 (recommended): add a file named sitecustomize.py, preferably in the site-packages directory.

sitecustomize.py is executed from within site.py, and site.py deletes sys.setdefaultencoding() only at the very end, so sitecustomize.py can still call sys.setdefaultencoding().

The code is as follows:

import sys
sys.setdefaultencoding('utf-8')

Because sitecustomize.py is loaded automatically, you can also use it to configure other settings besides the encoding.
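That sitecustomize.py is picked up automatically can be checked on Python 3 by placing one in a temporary directory on PYTHONPATH and asking a child interpreter whether it was imported at startup. MARKER and sitecustomize_autoloads are made-up names for this demo; on Python 3 the file could not call sys.setdefaultencoding(), which no longer exists.

```python
import os
import subprocess
import sys
import tempfile


def sitecustomize_autoloads():
    """Return True if a sitecustomize.py on PYTHONPATH is imported at startup."""
    d = tempfile.mkdtemp()
    with open(os.path.join(d, "sitecustomize.py"), "w") as f:
        f.write("MARKER = 'loaded'\n")  # hypothetical marker attribute
    # The child only inspects sys.modules and never imports sitecustomize itself,
    # so the marker can only appear if site.py loaded the file.
    code = ("import sys; m = sys.modules.get('sitecustomize'); "
            "print(getattr(m, 'MARKER', None))")
    env = dict(os.environ, PYTHONPATH=d)
    result = subprocess.run([sys.executable, "-c", code], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip() == "loaded"
```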

Encoding of strings

The code is as follows:

s1 = 'Chinese'

A string literal written directly like the one above is a byte string in the source file's encoding. To get a Unicode string instead, there are three ways:

The code is as follows:

u1 = u'Chinese'                 # 1. Unicode literal
u2 = unicode('Chinese', 'gbk')  # 2. built-in unicode() with the source encoding
u3 = s1.decode('gbk')           # 3. decode the byte string s1 from above

unicode() is a built-in function; its second argument gives the encoding of the source string.

decode() is a string method that converts a byte string to Unicode; its argument gives the encoding of the source string.

encode() is also a string method; it converts a Unicode string into the byte encoding given by its argument.
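In Python 3 the split is explicit: str holds Unicode text and bytes holds encoded data. A sketch of the three approaches above translated to Python 3, assuming a GBK-encoded byte string containing 中文:

```python
raw = "中文".encode("gbk")  # a GBK byte string, like s1 in a GBK source file

# Python 3 equivalents of the three Python 2 approaches:
u1 = "中文"              # 1. str literals are already Unicode (the u prefix is optional)
u2 = str(raw, "gbk")     # 2. str(bytes, encoding) replaces unicode(bytes, encoding)
u3 = raw.decode("gbk")   # 3. bytes.decode(encoding), as in Python 2

# encode() goes the other way: Unicode str -> bytes in the named encoding.
back = u3.encode("gbk")
```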

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
