How Python Handles Chinese Encoding: A Summary from Reading and Writing Text Files

Source: Internet
Author: User

I. Introduction

No matter which programming language you learn, strings are always an important data type. When I recently started learning Python and tried to display Chinese characters, however, I kept getting all kinds of garbled output. I searched the Internet and tried many methods: sometimes the text displayed correctly and sometimes it was still garbled, and I could not understand why. So I used Python to read and write Chinese text files as a way to explore how Python handles Chinese encoding. In the end, I could read and display the Chinese data in a text file correctly and write Chinese results back to a text file. Note that this article only summarizes how I fixed the garbled-text problems; it does not explain the underlying encoding principles. All code in this article is Python 2. Let's get started.

II. Preparations

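1. First, create a text file, raindata.txt, saved in the system's default ("ANSI") encoding, which on Chinese Windows is GBK/GB2312. The header row is shown here in English; the site names are Chinese (南京 Nanjing, 北京 Beijing, 上海 Shanghai), and the fields are separated by full-width commas (，). The content of the file is as follows:

Number，Rainfall，Site location
1，10.2，南京
2，45，北京
3，78，上海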

2. Create a class that stores one row of the file, that is, a record class:

    class Rain:
        def __init__(self, id, acc, site):
            self.id = id      # record number
            self.acc = acc    # rainfall
            self.site = site  # site location
III. Errors and corrections

  1. Error 1

First, create a .py file for the code. After creating mine, I did not write any code at all: I only wrote two lines of comments in Chinese and saved the file. Even so, the Console pane reported an error.

  Cause analysis:

In Python 2, source files are assumed to be ASCII unless an encoding is declared. The comments in my source file contain Chinese characters, which ASCII cannot represent, so the interpreter rejects the file with a SyntaxError complaining about a non-ASCII character and no declared encoding.

  Solution:

Add the following line to the file. It must be the first or the second line; anywhere else it has no effect:

  #coding:utf-8

 

This declaration tells the interpreter to decode the source file as UTF-8, so the file itself must also be saved in UTF-8.
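For comparison, the same rule can be checked in Python 3 (the article's code is Python 2, whose default source encoding is ASCII; Python 3 defaults to UTF-8). This minimal sketch assumes a one-line source file containing a Chinese string literal and compiles it from bytes: the GBK-encoded bytes are rejected without a declaration and accepted once a matching coding line is added, as described in PEP 263. The helper run_source is hypothetical, written only for this illustration.

```python
# Python 3 sketch (the article's code is Python 2): the same source bytes are
# accepted or rejected depending on whether the declared encoding matches them.
SOURCE = 's = "南京"\n'           # hypothetical one-line file with a Chinese literal
gbk_bytes = SOURCE.encode("gbk")  # the file as it would be saved in GBK


def run_source(source: bytes):
    """Compile and run source bytes; return the value of s, or None on a decode error."""
    namespace = {}
    try:
        exec(compile(source, "<demo>", "exec"), namespace)
    except (SyntaxError, UnicodeDecodeError):
        return None
    return namespace["s"]


# No declaration: Python 3 assumes UTF-8, and these GBK bytes are not valid UTF-8.
print(run_source(gbk_bytes))                                     # None
# A coding line that matches how the file was saved fixes it.
print(run_source(b"# coding: gbk\n" + gbk_bytes))                # 南京
# The article's approach: save the file as UTF-8 and declare UTF-8.
print(run_source(b"#coding:utf-8\n" + SOURCE.encode("utf-8")))  # 南京
```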

  2. Error 2

Next, the .py file reads the text file. The code is as follows:

    f = open("raindata.txt", "r")
    f.readline()  # skip the first row (the header) so reading starts at the second row
    for line in f:
        print line
    f.close()

 

The Chinese text in the output is garbled, as shown below:

1��10.2���Ͼ�

2��45������

3��78���Ϻ�

  Cause:

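The text file is not ASCII: it is GB2312-encoded, so each Chinese character is stored as two bytes. print writes these bytes to the console unchanged, and the console interprets them in a different encoding, so they appear garbled. The bytes read from the file must first be decoded with the file's own encoding before they can be displayed properly.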
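The garbled rows are consistent with GB2312 bytes being interpreted as UTF-8, with each invalid byte replaced by �. The sketch below reproduces the first row in Python 3, assuming (this is not stated explicitly in the text) that the row is 1，10.2，南京 with full-width commas and that the console decoded its input as UTF-8.

```python
# Python 3 sketch: GB2312 bytes misread as UTF-8 produce the garbled text above.
# Assumption: the first data row is "1，10.2，南京" (full-width commas) saved as GB2312.
row = "1，10.2，南京"
raw = row.encode("gb2312")                       # bytes as stored in the file
garbled = raw.decode("utf-8", errors="replace")  # wrong codec: invalid bytes become U+FFFD
print(garbled)                                   # 1��10.2���Ͼ�
print(raw.decode("gb2312"))                      # the right codec recovers 1，10.2，南京
```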

  Solution:

Decode each line with the file's encoding, GB2312, before printing it. The code is as follows:

    f = open("raindata.txt", "r")
    f.readline()  # skip the header row
    for line in f:
        print line.decode("gb2312")  # bytes -> unicode using the file's encoding
    f.close()

 

The result is as follows:

1，10.2，南京

2，45，北京

3，78，上海
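In Python 3 the decoding step is usually handed to open() through its encoding parameter, so every line arrives already decoded. This sketch first creates a sample raindata.txt in a temporary directory; the rows are assumptions modeled on the article's data (header translated into English, full-width commas as separators).

```python
# Python 3 sketch: pass the file's encoding to open() instead of decoding each line.
import os
import tempfile

# Assumed sample content modeled on the article's file.
rows = ["Number，Rainfall，Site location", "1，10.2，南京", "2，45，北京", "3，78，上海"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "raindata.txt")
    with open(path, "w", encoding="gb2312") as f:  # save as GB2312, like the article's file
        f.write("\n".join(rows) + "\n")

    with open(path, "r", encoding="gb2312") as f:  # decode while reading
        f.readline()                                # skip the header row
        decoded = [line.rstrip("\n") for line in f]

for line in decoded:
    print(line)
```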

   3. Error 3

With the data read correctly, the next step is to store each row in a Rain object. The code is as follows:

    data = []
    f = open("raindata.txt", "r")
    f.readline()  # skip the header row
    for line in f:
        lines = line.decode("gb2312").split("，")  # split on the full-width comma
        obj = Rain(lines[0], lines[1], lines[2])
        data.append(obj)
    f.close()

 

The call to split("，") fails with the following error:

lines=line.decode("gb2312").split("，")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)

  Cause:

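The explanation I found online is as follows. The #coding:utf-8 line only tells the interpreter how to read the source file; the "，" literal in the code is still a byte string holding the UTF-8 bytes 0xEF 0xBC 0x8C. line.decode("gb2312"), however, returns a unicode string, and when a unicode string method receives a byte string, Python 2 implicitly decodes that byte string with the default encoding, ASCII. The first byte, 0xEF, is outside the ASCII range, which produces the error above.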
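The byte and position in the error message can be checked directly. Python 3 no longer performs this implicit conversion, but the bytes are the same, so a short Python 3 sketch can reproduce the decode that Python 2 attempted behind the scenes:

```python
# Python 3 sketch: reproduce the decode that Python 2 performed implicitly.
sep_bytes = "，".encode("utf-8")  # the "，" literal as stored in a UTF-8 source file
print(sep_bytes)                  # b'\xef\xbc\x8c'

try:
    sep_bytes.decode("ascii")     # Python 2 did this when mixing it with a unicode string
    message = None
except UnicodeDecodeError as err:
    message = str(err)

print(message)  # 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)
```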

  Solution:

Now that the cause is known, one fix is to change the default encoding of the whole environment to UTF-8. The modified code is as follows:

  

    import sys

    default_encoding = "utf-8"
    if default_encoding != sys.getdefaultencoding():
        reload(sys)  # site.py deletes sys.setdefaultencoding at startup; reload restores it
        sys.setdefaultencoding(default_encoding)

    data = []
    f = open("raindata.txt", "r")
    f.readline()  # skip the header row
    for line in f:
        lines = line.decode("gb2312").split("，")
        obj = Rain(lines[0], lines[1], lines[2])
        data.append(obj)
    f.close()
    print len(data)

 

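This solves the problem. Writing the separator as a unicode literal, split(u"，"), also avoids the implicit ASCII conversion without changing the interpreter-wide default.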
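For comparison, a minimal Python 3 version of the parsing step needs no change to the default encoding (Python 3 has no sys.setdefaultencoding): str is already Unicode, and so is the "，" literal, so split involves no implicit conversion. The sketch reuses the article's Rain class and assumes the lines have already been decoded, for example by open(..., encoding="gb2312"); parse_rows is a hypothetical helper.

```python
# Python 3 sketch: parse decoded lines into Rain records; no default-encoding tweak needed.
class Rain:
    def __init__(self, id, acc, site):
        self.id = id
        self.acc = acc
        self.site = site


def parse_rows(lines):
    """Split each decoded line on the full-width comma and build Rain objects."""
    data = []
    for line in lines:
        fields = line.rstrip("\n").split("，")  # str.split with a str separator: no implicit decode
        data.append(Rain(fields[0], fields[1], fields[2]))
    return data


data = parse_rows(["1，10.2，南京\n", "2，45，北京\n", "3，78，上海\n"])
print(len(data))     # 3
print(data[0].site)  # 南京
```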

   4. Error 4

After I re-saved the text file in UTF-8, the code above failed with the following error:

Traceback (most recent call last):
File "D:\program\Java\PythonOne\src\Test\FileHandle.py", line 24, in <module>
lines=line.decode("gb2312").split("，")
UnicodeDecodeError: 'gb2312' codec can't decode bytes in position 3-4: illegal multibyte sequence

  Cause:

The file is now UTF-8, so decoding it as GB2312 is bound to fail. Because the source file is UTF-8 as well, the line read from the file and the "，" literal are both UTF-8 byte strings, so split can work on the raw bytes and the decode("gb2312") call can simply be removed. The code is as follows:

    import sys

    default_encoding = "utf-8"
    if default_encoding != sys.getdefaultencoding():
        reload(sys)
        sys.setdefaultencoding(default_encoding)

    data = []
    f = open("raindata.txt", "r")
    f.readline()  # skip the header row
    for line in f:
        lines = line.split("，")  # file and source are both UTF-8: no decoding needed
        obj = Rain(lines[0], lines[1], lines[2])
        data.append(obj)
    f.close()

 

This handles text files saved in UTF-8.
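The same mismatch can be reproduced in Python 3: bytes saved as UTF-8 cannot be decoded as GB2312, while decoding with the codec the file was actually saved in works. The sample row below is an assumption modeled on the article's data.

```python
# Python 3 sketch: decoding must use the codec the file was actually saved with.
utf8_bytes = "1，10.2，南京\n".encode("utf-8")  # assumed row after re-saving the file as UTF-8

try:
    utf8_bytes.decode("gb2312")  # wrong codec, as in Error 4
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True

print(wrong_codec_failed)                      # True
print(utf8_bytes.decode("utf-8").split("，"))  # ['1', '10.2', '南京\n']
```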

   5. Error 5

After reading the text data, write it to a result file. The code is as follows:

    f1 = open('result.txt', 'w')
    for vs in data:
        f1.write(vs.id + "," + vs.acc + "," + vs.site)
        f1.write("\n")
    f1.close()

 

How this code behaves depends on the situation:

1. If the interpreter's default encoding is UTF-8 and the text file is also UTF-8, the data can be written directly, and the result file will not be garbled.

2. If the text file is in ANSI (GB2312) encoding, the data was decoded into unicode strings when it was read, so it must be encoded before writing. The modified code is as follows:

    f1 = open('result.txt', 'w')
    for vs in data:
        f1.write((vs.id + "," + vs.acc + "," + vs.site).encode("gb2312"))
        f1.write("\n")
    f1.close()
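In Python 3 the output encoding is normally chosen once, when the file is opened, instead of calling encode() on every string. A minimal sketch, assuming the records are plain (id, rainfall, site) tuples rather than Rain objects and that the result file should be GB2312:

```python
# Python 3 sketch: choose the output encoding when opening the result file.
import os
import tempfile

# Assumed records standing in for the article's Rain objects.
records = [("1", "10.2", "南京"), ("2", "45", "北京"), ("3", "78", "上海")]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "result.txt")
    with open(path, "w", encoding="gb2312", newline="\n") as f1:  # text is encoded on write
        for id_, acc, site in records:
            f1.write(id_ + "," + acc + "," + site + "\n")
    with open(path, "rb") as f:  # inspect the raw bytes that were written
        raw = f.read()

print(raw.decode("gb2312").splitlines()[0])  # 1,10.2,南京
```

newline="\n" is used only so that the written bytes are the same on every platform.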

At this point, Python reads and writes the files correctly. Printing a list that contains Chinese characters, however, still produces unreadable output.

   6. Error 6

Even with UTF-8 encoding in place, printing a list that contains Chinese characters does not show the characters themselves. The code is as follows:

    strs = ['你好', 'hello']
    print strs

 

The result is as follows:

 ['\xe4\xbd\xa0\xe5\xa5\xbd', 'hello']

Solution:

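Printing a list shows the repr() of each element, and in Python 2 the repr() of a byte string escapes every non-ASCII byte as \xNN. Printing the items individually works fine. The code is as follows: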

    strs = ['你好', 'hello']
    print strs[0], strs[1]

Result:

你好 hello
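Python 3 changes this behavior: the repr() of a str keeps printable non-ASCII characters, so a list of Chinese strings prints readably. Byte strings still print escaped, which is the closest Python 3 equivalent of what the article saw with Python 2 str objects. A short sketch:

```python
# Python 3 sketch: printing a list shows repr() of each element.
strs = ["你好", "hello"]
print(strs)              # ['你好', 'hello']  (str repr keeps printable non-ASCII text)
print(strs[0], strs[1])  # 你好 hello

# Python 2's str was a byte string; bytes in Python 3 are escaped the same way.
encoded = [s.encode("utf-8") for s in strs]
print(encoded)           # [b'\xe4\xbd\xa0\xe5\xa5\xbd', b'hello']
```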
IV. Summary

After working through all these errors, I finally read the text data correctly and wrote the results to a text file. In summary, three encodings need attention:

 
 
  • The encoding of the source code file
  • The default encoding of the environment
  • The encoding of the result file
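The three points can be made explicit in a small Python 3 sketch (an illustration, not the article's code) that converts a GB2312 text file to UTF-8. The file names mirror the article's; the single sample row and the convert helper are hypothetical. newline="" keeps the original line endings unchanged during the conversion.

```python
# Python 3 sketch of the three encodings from the summary.
import locale
import os
import tempfile

# 1. Source code: Python 3 reads .py files as UTF-8 unless a coding line says otherwise.
# 2. Environment: open() without encoding= falls back to the locale's preferred
#    encoding (or UTF-8 in UTF-8 Mode), so passing encoding= explicitly is safer.
print("default for open():", locale.getpreferredencoding(False))


def convert(src_path, dst_path, src_encoding="gb2312", dst_encoding="utf-8"):
    """3. Data files: decode with the input file's codec, encode with the result file's."""
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)
    return text


with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "raindata.txt")
    dst_path = os.path.join(tmp, "result.txt")
    with open(src_path, "w", encoding="gb2312", newline="\n") as f:
        f.write("1，10.2，南京\n")  # hypothetical sample row
    convert(src_path, dst_path)
    with open(dst_path, "rb") as f:
        converted = f.read()

print(converted.decode("utf-8"))
```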

I am still a Python beginner, and my explanations of why these fixes work may be wrong. If you spot a mistake, please point it out; I would be very grateful.

 

  
