Reading and Writing txt Files in Different Encodings in Python

Last Update:2015-05-30 Source: Internet

Author: User


This article describes how to read and write txt files in different encodings in Python, with code for each case. The examples are written for Python 2.

The code is as follows:

import os

filenames = os.listdir(os.getcwd())
out = open("name.txt", "w")
for filename in filenames:
    # os.listdir returns byte strings here; convert them from GB2312 to UTF-8
    out.write(filename.decode("gb2312").encode("utf-8") + "\n")
out.close()

This writes the names of the files and folders in the current working directory to name.txt, one per line, encoded as UTF-8.
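For readers on Python 3, the same task can be sketched as follows. This is an illustration, not the article's original method: in Python 3, os.listdir returns text strings, so no manual decode step is needed, and the target encoding is passed to open. The function name write_listing is hypothetical.

```python
import os


def write_listing(directory, out_path, encoding="utf-8"):
    """Write the sorted names found in `directory` to `out_path`, one per line."""
    names = sorted(os.listdir(directory))
    # Passing encoding= makes open() encode the text as it is written.
    with open(out_path, "w", encoding=encoding) as out:
        for name in names:
            out.write(name + "\n")
    return names
```

A call such as write_listing(os.getcwd(), "name.txt") would mirror the example above.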

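To save the file in ANSI encoding instead (the system's local code page, for example GBK on Chinese-language Windows), write the file name without converting it:

The code is as follows:

out.write(filename + "\n")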

Open the file and write

This approach uses the codecs module. I am not yet familiar with the module, so the method is recorded here as a starting point for learning its functions and usage.

The code is as follows:

import codecs

# codecs.open returns a file object that encodes unicode strings on write
f = codecs.open("lol.txt", "w", "utf-8")
f.write(u"I")
f.close()
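As a point of comparison, the io module offers a similar interface: io.open accepts an encoding argument and is available in Python 2.6 and later as well as in Python 3, where it is the same function as the built-in open. The sketch below writes a unicode string and reads it back; the helper name write_then_read and the file path are placeholders.

```python
import io


def write_then_read(path, text, encoding="utf-8"):
    """Write `text` to `path` in `encoding`, then read it back as text."""
    with io.open(path, "w", encoding=encoding) as f:
        f.write(text)  # the text is encoded to bytes on the way out
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()  # the bytes are decoded back to text on the way in
```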

Read ANSI-encoded text files and UTF-8-encoded files

Read ANSI encoded files

Create a file named test.txt in ANSI format with the following content (the word "Chinese" here and in the results below stands for Chinese characters in the original file):

abc Chinese

Read the file with Python:

The code is as follows:

# coding=gbk
print open("test.txt").read()

Result: abc (Chinese)
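In Python 3 the decoding step is usually expressed by passing the encoding to open. A minimal sketch, assuming the ANSI code page is GBK (as on Chinese-language Windows); the helper name read_text is hypothetical.

```python
def read_text(path, encoding="gbk"):
    """Return the contents of `path` decoded with `encoding`."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()
```

Naming the encoding explicitly avoids depending on the platform default, which differs between systems.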

Read UTF-8 encoded files (without BOM)

Convert the file to UTF-8 and run the same code again. The output is now garbled:

Result: abc followed by garbled characters

Obviously, decoding is required here:

The code is as follows:

# -*- coding: utf-8 -*-
# Decode the UTF-8 bytes into a unicode object before printing
print open("test.txt").read().decode("utf-8")

Result: abc (Chinese)
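The garbled output comes from interpreting UTF-8 bytes as if they were GBK. The sketch below shows the mismatch on an in-memory byte string. It is written for Python 3, and the sample text (abc plus two Chinese characters, written with escapes) is an assumption standing in for the file's content.

```python
# "abc" followed by two Chinese characters, written with escapes.
text = "abc\u4e2d\u6587"
utf8_bytes = text.encode("utf-8")

# Decoding with the wrong codec yields different characters (mojibake).
# errors="replace" keeps the call from raising on byte pairs invalid in GBK.
wrong = utf8_bytes.decode("gbk", errors="replace")

# Decoding with the codec the bytes were written in recovers the text.
right = utf8_bytes.decode("utf-8")
```

The ASCII part survives either way, which is why only the Chinese characters appear damaged.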

Read UTF-8 encoded files (with BOM)

Some software inserts three invisible bytes (0xEF 0xBB 0xBF, the BOM) at the beginning of a file by default when saving it as UTF-8, and some software lets you choose whether to insert the BOM. These bytes need to be removed when reading such a file. The codecs module in Python defines a constant for them, codecs.BOM_UTF8:

The code is as follows:

# -*- coding: utf-8 -*-
import codecs

data = open("test.txt").read()
# Strip the three BOM bytes if the file starts with them
if data[:3] == codecs.BOM_UTF8:
    data = data[3:]
print data.decode("utf-8")

Result: abc (Chinese)
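Python also ships a utf-8-sig codec (Python 2.5 and later) that skips a leading BOM if one is present and otherwise behaves like utf-8. The sketch below, written for Python 3, shows it next to the manual three-byte check; decode_utf8 and strip_bom_manually are hypothetical helper names.

```python
import codecs


def decode_utf8(data):
    """Decode UTF-8 bytes, dropping a leading BOM if the data has one."""
    return data.decode("utf-8-sig")


def strip_bom_manually(data):
    """Equivalent manual approach: compare the first three bytes with the BOM."""
    if data[:3] == codecs.BOM_UTF8:
        data = data[3:]
    return data.decode("utf-8")
```

Both helpers accept files saved with or without a BOM, so the caller does not need to know which kind it has.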

Here is another example:

The code is as follows:

# -*- coding: utf-8 -*-
data = open("name_utf8.txt").read()
u = data.decode("utf-8")
# Skip the first character, which is the decoded BOM
print u[1:]

This opens a UTF-8 file, reads its contents as a byte string, and decodes them into a unicode object. The three BOM bytes decode to a single unicode character (U+FEFF), which cannot be printed. To display the text normally, use u[1:] to skip the first character. This assumes the file really starts with a BOM.
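Slicing off the first character unconditionally would also remove a real character from a file saved without a BOM. A slightly more defensive sketch (Python 3; skip_bom_char is a hypothetical name) removes the character only when it is there.

```python
import codecs


def skip_bom_char(text):
    """Remove a leading U+FEFF from already-decoded text, if present."""
    if text.startswith("\ufeff"):
        return text[1:]
    return text


# The three BOM bytes decode to exactly one character, U+FEFF.
bom_as_text = codecs.BOM_UTF8.decode("utf-8")
```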

Note: before outputting a unicode string that contains Chinese characters, call its encode method to convert it to the target encoding.

Set Python's default encoding

The code is as follows:

import sys

# reload restores sys.setdefaultencoding, which site.py removes at startup
reload(sys)
sys.setdefaultencoding("utf-8")
print sys.getdefaultencoding()

Today I encountered a python encoding problem. The error message is as follows:

Traceback (most recent call last):
  File "ntpath.pyc", line 108, in join
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa1 in position 36: ordinal not in range(128)

Clearly, the current default encoding is ascii, which cannot decode the byte 0xa1 (decimal 161, outside the ASCII range of 0 to 127). Checking sys.getdefaultencoding() in the Python console confirms that the default encoding is ascii.
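The failure can be reproduced in isolation by decoding the byte 0xa1 with the ascii codec. A small sketch for Python 3, where the codec is named explicitly rather than taken from the default encoding; try_ascii_decode is a hypothetical helper name.

```python
def try_ascii_decode(data):
    """Return the decoded text, or the UnicodeDecodeError that was raised."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc


result = try_ascii_decode(b"path\xa1")
# result is a UnicodeDecodeError; its .start attribute is 4,
# the position of the offending byte.
```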

The default encoding cannot be changed by calling sys.setdefaultencoding() directly, because Python runs the site.py file at startup, and after setting the default encoding that file deletes the setdefaultencoding method from sys, so it cannot be called again. Once sys has been imported, however, you can reload the sys module and then call sys.setdefaultencoding('utf8'):

The code is as follows:

import sys

reload(sys)
sys.setdefaultencoding("utf-8")
print sys.getdefaultencoding()

This works. According to limodou, site.py is a script that is loaded by default when the Python interpreter starts. If the interpreter is started with python -S, site.py is not loaded automatically.

The above is pretty cool.

========================================

How can I set the default encoding to UTF-8 permanently? There are two methods:

========================================

Method 1 (not recommended): edit site.py and modify the setencoding() function so that it sets UTF-8.

Method 2 (recommended): add a sitecustomize.py file. The recommended location is the site-packages directory.

sitecustomize.py is imported and executed by site.py. Because sys.setdefaultencoding() is only deleted at the end of site.py, it can still be called from within sitecustomize.py.

The code is as follows:

import sys

sys.setdefaultencoding('utf-8')

Since sitecustomize.py is loaded automatically, it can also be used to configure other things besides the encoding.
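Note that this technique is specific to Python 2. In Python 3, sys.getdefaultencoding() already returns utf-8 and sys.setdefaultencoding() does not exist, so neither reload(sys) nor sitecustomize.py is needed for this purpose. A quick check, as a sketch (default_encoding_info is a hypothetical helper name):

```python
import sys


def default_encoding_info():
    """Report the default encoding and whether it can be changed at runtime."""
    return {
        "default": sys.getdefaultencoding(),
        "can_set": hasattr(sys, "setdefaultencoding"),
    }
```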

String Encoding

The code is as follows:

s1 = 'Chinese'

A string literal written this way is a byte string in the encoding of the source file. There are three ways to obtain a unicode string:

The code is as follows:

u1 = u'Chinese'
u2 = unicode('Chinese', 'gbk')
u3 = s1.decode('gbk')

unicode is a built-in function. Its second parameter gives the encoding of the source string.

decode is a string method that converts a byte string to a unicode object. Its parameter gives the encoding of the source string.

encode is also a string method. It converts a unicode string to a byte string in the encoding given by its parameter.
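In Python 3 the same operations map onto two types: str (text, the counterpart of Python 2's unicode) and bytes. A sketch of the round trip, assuming GBK as the source encoding; the sample text is written with escapes and stands in for any Chinese string.

```python
text = "\u4e2d\u6587"               # two Chinese characters as text (str)

gbk_bytes = text.encode("gbk")      # text -> bytes, 2 bytes per character here
utf8_bytes = text.encode("utf-8")   # text -> bytes, 3 bytes per character here

# bytes -> text: both spellings name the encoding of the source bytes.
from_method = gbk_bytes.decode("gbk")
from_constructor = str(gbk_bytes, "gbk")

# Converting between two encodings always goes through text.
gbk_to_utf8 = gbk_bytes.decode("gbk").encode("utf-8")
```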

