Read and write txt files in Python using different encodings

Source: Internet
Author: User
The code is as follows:

import os

filenames = os.listdir(os.getcwd())

out = open("Name.txt", "w")
for filename in filenames:
    # The names are GB2312 byte strings here; decode them, then re-encode as UTF-8
    out.write(filename.decode("gb2312").encode("utf-8"))
out.close()

This writes the names of the files and directories in the current working directory to Name.txt, encoded as UTF-8.
To save with ANSI encoding instead, write each name unchanged:

out.write(filename)
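For comparison, here is a minimal Python 3 sketch of the same task (the article's own snippets are Python 2). In Python 3, os.listdir() returns text strings, so there is no decode step and the target encoding is passed to open(). The helper name write_listing, the scratch directory, and the one-name-per-line layout are assumptions made for illustration.

```python
import os
import tempfile


def write_listing(directory, out_path, encoding="utf-8"):
    """Write the names found in `directory` to `out_path`, one per line (hypothetical helper)."""
    names = sorted(os.listdir(directory))
    # open() encodes the text while writing; passing "gbk" would give an ANSI-style file
    with open(out_path, "w", encoding=encoding) as out:
        for name in names:
            out.write(name + "\n")
    return names


# Usage: list a scratch directory holding two empty files
workdir = tempfile.mkdtemp()
for name in ("a.txt", "b.txt"):
    open(os.path.join(workdir, name), "w").close()

out_path = os.path.join(tempfile.mkdtemp(), "Name.txt")
written = write_listing(workdir, out_path)
```

Passing os.getcwd() instead of the scratch directory mirrors what the original snippet lists.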

Open the file and write

This uses the codecs module, which I am not yet familiar with. The method is recorded here; the module's other functions and usage can be looked up in its documentation.
The code is as follows:

import codecs

f = codecs.open("Lol.txt", "w", "utf-8")
f.write(u"i")
f.close()
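As a rough illustration of what codecs.open does, the Python 3 sketch below writes text through a UTF-8 codec, reads it back, and then inspects the raw bytes on disk. The sample text and the temporary path are assumptions chosen for the example.

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "Lol.txt")

# codecs.open returns a wrapper that encodes text to UTF-8 on write
with codecs.open(path, "w", "utf-8") as f:
    f.write(u"abc中文")

# Opened for reading with the same codec, it decodes the bytes back to text
with codecs.open(path, "r", "utf-8") as f:
    text = f.read()

# Raw bytes on disk: 3 ASCII bytes plus 3 bytes for each Chinese character
with open(path, "rb") as f:
    raw = f.read()
```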

Read ANSI-encoded and UTF-8-encoded text files

Reading ANSI-encoded files

Create a file test.txt, saved in ANSI format (on Simplified Chinese Windows this is the GBK code page), with the following content:

abc中文

Read it with Python:

# coding=gbk
print open("test.txt").read()

Result: abc中文

Read UTF-8 encoded files (no BOM)

Change the file format to UTF-8 without a BOM and run the same code:

Result: abc涓枃 (garbled)

The UTF-8 bytes are being displayed as if they were GBK, so they need to be decoded first:

# -*- coding: utf-8 -*-
print open("test.txt").read().decode("utf-8")

Result: abc中文
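The mismatch behind the garbled result can be reproduced without any files. The following Python 3 sketch, which assumes the sample text "abc中文", encodes it both ways and shows what happens when the wrong codec is used to decode.

```python
text = "abc中文"

gbk_bytes = text.encode("gbk")     # what an ANSI (GBK) file would contain
utf8_bytes = text.encode("utf-8")  # what a UTF-8 file would contain

# Decoding with the matching codec recovers the text
assert gbk_bytes.decode("gbk") == text
assert utf8_bytes.decode("utf-8") == text

# Decoding GBK bytes as UTF-8 fails outright
try:
    gbk_bytes.decode("utf-8")
    mismatch_error = None
except UnicodeDecodeError as exc:
    mismatch_error = exc

# Decoding UTF-8 bytes as GBK yields different characters;
# errors="replace" substitutes a marker for any byte pair GBK cannot map
garbled = utf8_bytes.decode("gbk", errors="replace")
```

The ASCII prefix survives in every case because ASCII bytes mean the same thing in both encodings; only the Chinese characters are affected.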

Read UTF-8 encoded files (with BOM)

Some software, when saving a UTF-8 encoded file, by default inserts three invisible bytes (0xEF 0xBB 0xBF, the BOM) at the start of the file; some software lets you control whether the BOM is written. If a BOM is present, these bytes must be removed when reading. The codecs module in Python defines a constant for them:

# -*- coding: utf-8 -*-
import codecs

data = open("test.txt").read()
# Strip the 3-byte UTF-8 BOM if the file starts with one
if data[:3] == codecs.BOM_UTF8:
    data = data[3:]
print data.decode("utf-8")

Result: abc中文
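A Python 3 sketch of the same BOM handling, working on an in-memory byte string rather than a file. It shows the manual comparison against codecs.BOM_UTF8 next to the standard "utf-8-sig" codec, which skips a leading BOM when one is present. The sample text is an assumption.

```python
import codecs

plain = "abc中文".encode("utf-8")
raw = codecs.BOM_UTF8 + plain  # bytes of a UTF-8 file saved with a BOM

# Manual approach: drop the three BOM bytes if present, then decode
body = raw[len(codecs.BOM_UTF8):] if raw.startswith(codecs.BOM_UTF8) else raw
manual = body.decode("utf-8")

# Codec approach: "utf-8-sig" removes a leading BOM during decoding
auto = raw.decode("utf-8-sig")

# The same codec also accepts input that has no BOM
no_bom = plain.decode("utf-8-sig")
```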
Look at the following example:

# -*- coding: utf-8 -*-
data = open("Name_utf8.txt").read()
u = data.decode("utf-8")
print u[1:]

After the UTF-8 file is opened and its contents are read, decoding turns the byte string into a unicode object. The three extra BOM bytes, however, decode to a single Unicode character (U+FEFF) that cannot be printed, so u[1:] is used to skip the first character for normal display.
Note: when outputting a unicode string that contains Chinese, first call encode() to convert it to the target encoding.
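The unprintable first character described above can be inspected directly. This Python 3 sketch decodes a BOM-prefixed byte string and compares the unconditional slice with a variant that strips only when the BOM is really there; the sample bytes are an assumption.

```python
import codecs

decoded = (codecs.BOM_UTF8 + b"abc").decode("utf-8")

first = decoded[0]  # the three BOM bytes have become one code point
rest = decoded[1:]  # the same slice as u[1:] in the example

# Safer variant: strip only when the text actually starts with the BOM,
# so a file saved without one does not lose its first character
stripped = decoded[1:] if decoded.startswith("\ufeff") else decoded
```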

Set Python default encoding

The code is as follows:

import sys
reload(sys)
sys.setdefaultencoding("utf-8")
print sys.getdefaultencoding()

Today I ran into a Python encoding problem. The error message was as follows:

Traceback (most recent call last):
  File "ntpath.pyc", line 108, in join
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa1 in position 36: ordinal not in range(128)

Evidently the current default encoding is ASCII, which cannot decode 0xa1 (decimal 161, beyond the ASCII range of 0 to 127). Checking sys.getdefaultencoding() in the Python console confirms that the default encoding is indeed ASCII.
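The same class of error can be reproduced on purpose. In Python 2 it usually comes from an implicit ASCII decode when byte strings and unicode objects are mixed; the Python 3 sketch below triggers it explicitly on GBK-encoded bytes, which are an assumed stand-in for the path component in the traceback.

```python
data = "中文".encode("gbk")  # non-ASCII bytes, e.g. part of a GBK-encoded path

try:
    data.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

# Naming the real encoding of the bytes avoids the error
text = data.decode("gbk")
```

Decoding explicitly at the point where the bytes enter the program is an alternative to changing the interpreter-wide default.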
In Python 2.6, sys.setdefaultencoding() cannot be called directly to change the default encoding. Python runs site.py at startup, and after that file sets the default encoding it deletes the setdefaultencoding attribute from the sys module, so the function can no longer be called. Once sys has been imported, you can reload the sys module and then call sys.setdefaultencoding('utf8'):

import sys
reload(sys)
sys.setdefaultencoding("utf-8")
print sys.getdefaultencoding()

This does work. According to Limodou, site.py is a script that is loaded by default when the Python interpreter starts. If you start the interpreter with python -S, site.py is not loaded automatically.

That was a rather long-winded explanation.

==================================
How do I permanently set the default encoding to UTF-8? There are two ways to do this:
==================================

The first method (not recommended): edit site.py and modify the setencoding() function to force the encoding to UTF-8.
The second method (recommended): add a file named sitecustomize.py; the recommended location is the site-packages directory.
site.py imports sitecustomize.py, and because sys.setdefaultencoding() is deleted only at the very end of site.py, it can still be called from within sitecustomize.py.
The code is as follows:

import sys
sys.setdefaultencoding('utf-8')


Since sitecustomize.py is loaded automatically, it can also be used to set up things other than the encoding.

Encoding of strings

The code is as follows:

s1 = '中文'


A string literal written directly as above is a byte string in the encoding of the source file. To get a unicode string instead, there are three ways:

# coding=gbk
s = '中文'                     # byte string in the source file's encoding
s1 = u'中文'                   # unicode literal
s2 = unicode('中文', 'gbk')    # built-in unicode() with the source encoding
s3 = s.decode('gbk')           # decode() method of the byte string

unicode() is a built-in function; its second parameter gives the encoding of the source string.
decode() is a method available on every string; it converts the string to unicode, and its parameter gives the encoding of the source string.
encode() is likewise a method of every string; it converts the string into the encoding specified by its parameter.
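Chaining decode and encode converts bytes from one encoding to another, which is the pattern behind filename.decode("gb2312").encode("utf-8") in the first snippet. A minimal Python 3 sketch follows; the helper name transcode is hypothetical, and in Python 3 decode() lives on bytes while encode() lives on str.

```python
def transcode(data, source_encoding, target_encoding):
    """Re-encode `data` from one encoding to another (hypothetical helper)."""
    # bytes -> text using the source encoding, then text -> bytes in the target
    return data.decode(source_encoding).encode(target_encoding)


gbk_bytes = "中文".encode("gbk")
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")
round_trip = transcode(utf8_bytes, "utf-8", "gbk")
```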