Simple Solution for Chinese encoding of Python files

Last Update:2018-07-18 Source: Internet

Author: User


This article describes a simple way to solve Chinese encoding problems in Python files.

Read and Write Chinese

To read a UTF-8 encoded Chinese file, first use Sublime Text to save it as UTF-8 without a BOM, and then use the following code:

import codecs

with codecs.open(note_path, 'r+', 'utf-8') as f:
    line = f.readline()
    print line

In this way, the Chinese characters in the file can be correctly read.
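If re-saving the file without a BOM is not convenient, the 'utf-8-sig' codec may be an alternative. The following is a minimal sketch, written so that it should run on both Python 2.7 and Python 3; the function name read_first_line is hypothetical and not part of the original code.

```python
import io


def read_first_line(path):
    """Return the first line of a UTF-8 text file as Unicode text.

    The 'utf-8-sig' codec skips a leading BOM if one is present and
    otherwise behaves like plain 'utf-8'.
    """
    with io.open(path, 'r', encoding='utf-8-sig') as f:
        return f.readline()
```

With this approach the same call works whether or not the editor wrote a BOM at the start of the file.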

Similarly, to write Chinese characters to a file, open the file in the same way:

with codecs.open(st, 'a+', 'utf-8') as book_note:
    book_note.write(st)
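The write side can be sketched in the same way with io.open, which is available in both Python 2.7 and Python 3. The helper name append_line is an assumption made for illustration, and the text passed in is assumed to be a Unicode object (a u'...' string in Python 2).

```python
import io


def append_line(path, text):
    """Append one line of Unicode text to a file, encoded as UTF-8."""
    # Mode 'a' creates the file if it does not exist yet.
    with io.open(path, 'a', encoding='utf-8') as f:
        f.write(text + u'\n')
```

Passing the encoding to the open call keeps the conversion from Unicode to bytes in one place, so the rest of the program can work with Unicode text only.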

Create a File with a Chinese Name

Create a file with the read characters as the file name.

If you build the file name directly from the string read above, the following code fails with an error:

st = digest_path + "\\" + onenote[0] + ".txt"
print st
with open(st, 'a+') as book_note:
    pass  # the error is raised by open() itself

Debugging shows that the cause is the line break at the end of the string that was read. Strip it when you build the name, and the file is created correctly:

st=digest_path+"\\"+onenote[0].strip()+".txt"
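The fix can be wrapped in a small helper. This is only a sketch: the name build_note_path and its parameters are hypothetical, and os.path.join is used here instead of the hard-coded "\\" separator of the original snippet.

```python
import os


def build_note_path(directory, raw_title):
    """Build '<directory>/<title>.txt' from a title read from a file.

    strip() removes the trailing line break (and any surrounding
    whitespace) that readline() leaves on the string.
    """
    return os.path.join(directory, raw_title.strip() + u'.txt')
```

For example, build_note_path(digest_path, onenote[0]) would return a path with no line break in it even when onenote[0] still ends with one.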

Chinese encoding is a problem that Chinese programmers run into again and again, and Python is no exception. How can we understand and solve Python's encoding problems?

First, note that Python 2 uses Unicode internally, while the outside world uses a variety of encodings, such as GBK, GB2312, and UTF-8, which are common in Chinese programs. How are these encodings converted to the internal Unicode?

Let's start with strings in a source code file. A source code file is a text file, so its content must be stored in some encoding. By default, Python treats the source code file as ASCII encoded. For example, the code contains a variable assignment:

s1 = 'a'
print s1

Python regards this 'a' as an ASCII encoded character. When only English characters are used, everything works normally. However, if Chinese characters are used, for example:

s1 = '哈'
print s1

Running this file fails with a SyntaxError that reports a non-ASCII character in a file with no declared encoding. Python treats the content of the code file as ASCII by default, and ASCII does not contain Chinese characters, so an exception is thrown.

The solution is to tell Python which encoding the file uses. For Chinese, common encodings such as UTF-8, GBK, and GB2312 can be used. You only need to add the following at the top of the code file, on the first or second line:

# -*- coding: utf-8 -*-

This tells Python that the text in this file is encoded in UTF-8. Python then interprets the characters in the file as UTF-8 and can convert them to Unicode for internal processing.
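To see which encoding a source file declares, the declaration can be searched for with the pattern given in the Python language reference. The sketch below is simplified: it only looks for a comment on the first two lines, and the names CODING_RE and declared_encoding are assumptions made for illustration.

```python
import re

# Pattern for the declaration, as given in the Python language reference.
CODING_RE = re.compile(r'coding[=:]\s*([-\w.]+)')


def declared_encoding(source_lines):
    """Return the encoding named on the first two lines, or None."""
    for line in source_lines[:2]:
        stripped = line.lstrip()
        # The declaration only counts when it appears in a comment.
        if stripped.startswith('#'):
            match = CODING_RE.search(stripped)
            if match:
                return match.group(1)
    return None
```

Because the pattern only requires the text "coding:" or "coding=" followed by a name, the decorative "-*-" markers are optional; they exist so that editors such as Emacs recognize the same line.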

However, if you run this code in the Windows console, the program runs but the Chinese text is not displayed correctly. This is because the encoding of the string does not match the encoding of the console. The Windows console (on Chinese-language Windows) uses GBK, while the code uses UTF-8. Python sends UTF-8 encoded bytes to a console that expects GBK, so the two do not match and the correct Chinese characters cannot be printed.
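The mismatch can be reproduced without a Windows console by encoding text one way and decoding the bytes another way. This is a minimal sketch that should run on both Python 2.7 and Python 3; the function name as_seen_by_console is hypothetical, and decoding with 'replace' is only a stand-in for what a real console does with invalid bytes.

```python
# -*- coding: utf-8 -*-


def as_seen_by_console(text, source_encoding, console_encoding):
    """Encode text one way, then decode the bytes the way a console would.

    Bytes that are invalid in console_encoding become U+FFFD.
    """
    raw = text.encode(source_encoding)
    return raw.decode(console_encoding, 'replace')


# UTF-8 bytes read as GBK: the result is not the original character.
garbled = as_seen_by_console(u'哈', 'utf-8', 'gbk')
# GBK bytes read as GBK: the character survives.
intact = as_seen_by_console(u'哈', 'gbk', 'gbk')
```

Here garbled differs from the original character while intact equals it, which mirrors the difference between the UTF-8 source file and the GBK source file discussed in the text.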

One solution is to save the source file in GBK and change the first line of the code to:

# -*- coding: gbk -*-

Another way is to keep the source code file in UTF-8, but add the u prefix before '哈', that is:

s1 = u'哈'
print s1

In this way, the character '哈' is printed correctly.

Here, the u prefix means that the string that follows is stored in Unicode format. Python reads the Chinese character '哈' in the code according to the UTF-8 encoding declared in the first line and converts it to a unicode object. If we check the types, type('哈') gives <type 'str'> and type(u'哈') gives <type 'unicode'>. That is, adding u before the string makes it a unicode object, which exists in memory in Unicode format. Without the u, it is only an encoded string, whose encoding depends on the source file encoding that Python recognizes. Here that is UTF-8.
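The difference between a unicode object and an encoded string also shows up in their lengths. The short sketch below should run on both Python 2.7 and Python 3 (where the unicode type is called str and the encoded string is called bytes); the variable names are chosen for illustration only.

```python
# -*- coding: utf-8 -*-

text = u'哈'                        # Unicode object: one character
utf8_bytes = text.encode('utf-8')   # encoded string: three bytes
gbk_bytes = text.encode('gbk')      # encoded string: two bytes

lengths = (len(text), len(utf8_bytes), len(gbk_bytes))
print(lengths)  # prints (1, 3, 2)
```

The unicode object counts characters, while each encoded string counts the bytes of one particular encoding, which is why the same character has a different length in UTF-8 and in GBK.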

When Python outputs a unicode object to the console, it automatically converts the object to the encoding of the output environment. However, if the output is not a unicode object but an ordinary string, its bytes are written to the console as they are, in the string's own encoding. This is what causes the garbled output described above.

In addition to the u prefix, you can also use the unicode class and the encode and decode methods of strings.

The constructor of the unicode class accepts a string parameter and an encoding parameter, and wraps the string in a unicode object. Because we use UTF-8 here, we pass 'utf-8' as the encoding parameter to turn the string into a unicode object, which can then be output to the console correctly:

s1 = unicode('哈', 'utf-8')
print s1

In addition, you can use the decode method to convert an ordinary string to a unicode object. Many people find the decode and encode methods of Python strings confusing, so here is a brief description.

decode parses an ordinary string according to the encoding given in the parameter and generates the corresponding unicode object. For example, our code here uses UTF-8, so a string is converted to unicode as follows:

s2 = '哈'.decode('utf-8')

At this point, s2 is a unicode object that stores the character '哈'. It is the same as unicode('哈', 'utf-8') and u'哈'.

The encode method does the opposite. It converts a unicode object to an ordinary string in the encoding given in the parameter. For example:

s3 = unicode('哈', 'utf-8').encode('utf-8')

s3 is now the UTF-8 encoded string '哈'.
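Chaining decode and encode gives a simple way to convert an encoded string from one encoding to another, for example from UTF-8 to GBK. The following is a minimal sketch that should run on both Python 2.7 and Python 3; the function name transcode is an assumption made for illustration.

```python
def transcode(data, source_encoding, target_encoding):
    """Convert an encoded byte string from one encoding to another.

    decode() turns the bytes into a unicode object; encode() turns
    that object back into bytes in the target encoding.
    """
    return data.decode(source_encoding).encode(target_encoding)
```

For example, transcode(s3, 'utf-8', 'gbk') would return the GBK encoded form of the same character, with the unicode object acting as the intermediate step between the two encodings.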

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
