Simple Solution for Chinese encoding of Python files

Source: Internet
Author: User
This article describes how to solve Chinese encoding problems in Python files.

Reading and Writing Chinese

To read a UTF-8 encoded file that contains Chinese text, first use Sublime Text to re-save it as UTF-8 without a BOM, and then use the following code:

import codecs

# Open the file as UTF-8 so that readline() returns decoded text.
with codecs.open(note_path, 'r+', 'utf-8') as f:
    line = f.readline()
    print line

In this way, the Chinese characters in the file can be correctly read.
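The BOM step can also be handled in code rather than in an editor. The following is a minimal sketch, assuming a throwaway file in a temporary directory; the file name note.txt and the helper read_first_line are illustrative and not part of the original code. It relies on the standard 'utf-8-sig' codec, which decodes UTF-8 and skips a leading BOM if one is present. The code uses only syntax that both Python 2.7 and Python 3 should accept.

```python
# -*- coding: utf-8 -*-
import codecs
import io
import os
import tempfile


def read_first_line(path):
    """Return the first line of a UTF-8 file, with or without a BOM."""
    # 'utf-8-sig' decodes UTF-8 and silently skips a leading BOM.
    with io.open(path, 'r', encoding='utf-8-sig') as f:
        return f.readline()


# Illustrative file: a UTF-8 BOM followed by two lines of Chinese text.
note_path = os.path.join(tempfile.mkdtemp(), 'note.txt')
with open(note_path, 'wb') as f:
    f.write(codecs.BOM_UTF8 + u'你好\n第二行\n'.encode('utf-8'))

line = read_first_line(note_path)
print(line == u'你好\n')  # True: the BOM is not part of the line
```

Reading the same file with plain 'utf-8' would leave the BOM character at the start of the first line, which is why the article re-saves the file without a BOM.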

Similarly, to write Chinese characters to a file, open it in the same way:

with codecs.open(st, 'a+', 'utf-8') as book_note:
    book_note.write(st)
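As a rough illustration of the write path, the sketch below appends two lines of Chinese text with codecs.open and reads them back. The path book_path and the sample text are assumptions made for the example; the original builds its path from other variables.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile

# Illustrative path in a temporary directory.
book_path = os.path.join(tempfile.mkdtemp(), 'book_note.txt')

# 'a' appends; codecs.open encodes each unicode string as UTF-8 on write.
for text in (u'第一条笔记\n', u'第二条笔记\n'):
    with codecs.open(book_path, 'a', 'utf-8') as book_note:
        book_note.write(text)

# Reading with the same encoding returns the original unicode strings.
with codecs.open(book_path, 'r', 'utf-8') as book_note:
    lines = book_note.readlines()

print(len(lines))  # 2
```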

Create a Chinese File

Next, create a file that uses the characters read above as its name.

If the file name is built directly from the string that was read, the following code fails with an error:

st = digest_path + "\\" + onenote[0] + ".txt"
print st
with open(st, 'a+') as book_note:
    pass  # file operations go here

Debugging shows that the cause is the trailing line break in the string that was read. Strip it with strip() when building the file name, and the file can be created:

st=digest_path+"\\"+onenote[0].strip()+".txt"
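The fix can be sketched as follows, assuming a temporary directory in place of the original digest_path and a hard-coded title in place of onenote[0] (both are stand-ins for illustration). os.path.join is used instead of concatenating "\\" so that the example also runs outside Windows.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

digest_path = tempfile.mkdtemp()  # stand-in for the original directory

# A line as readline() returns it: the title plus a trailing line break.
raw_title = u'读书笔记\r\n'

# strip() removes the line break (and any surrounding whitespace).
file_name = raw_title.strip() + u'.txt'
st = os.path.join(digest_path, file_name)

with io.open(st, 'a', encoding='utf-8') as book_note:
    book_note.write(u'内容\n')

print(os.path.exists(st))  # True
```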

Chinese encoding is a frequent source of trouble for programmers who work with Chinese text, and Python is no exception. How can we understand and solve Python's encoding problems?

The first thing to know is that Python uses unicode internally, while the outside world uses a variety of encodings, such as gbk, gb2312, and utf8, which are common in Chinese programs. How are these encodings converted to the internal unicode?

First, let's look at how strings are used in a source code file. A source file is a text file, so its code must be stored in some encoding. By default, Python 2 treats the source file as ASCII-encoded. For example, suppose the code contains a variable assignment:

s1 = 'a'
print s1

Python regards this 'a' as an ASCII-encoded character. When only English characters are used, everything works normally. However, suppose a Chinese character is used, for example '哈' (ha):

s1 = '哈'
print s1

Executing this file fails with an encoding error. Python treats the content of the source file as ASCII by default, but ASCII does not contain Chinese characters, so an exception is raised.

The solution is to tell Python which encoding the file uses. For Chinese, common encodings such as UTF-8, gbk, and gb2312 can be used. You only need to add the following line at the top of the source file (it must be the first or second line):

# -*- coding: utf-8 -*-

This tells Python that the text in this file is encoded as UTF-8. Python then interprets the characters in the file as UTF-8 and converts them to unicode for internal processing.
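To make the declaration less mysterious, here is a small sketch of how such a line can be recognized. The helper declared_encoding is hypothetical and only approximates what the interpreter does; the regular expression follows the form given in PEP 263, and the default of 'ascii' mirrors the Python 2 behavior described above.

```python
# -*- coding: utf-8 -*-
import re

# Pattern for the declaration, following the form given in PEP 263.
CODING_RE = re.compile(br'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')


def declared_encoding(source_bytes, default='ascii'):
    """Return the encoding named in the first two lines, else the default."""
    for line in source_bytes.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode('ascii')
    return default


print(declared_encoding(b"# -*- coding: utf-8 -*-\ns1 = 'a'\n"))    # utf-8
print(declared_encoding(b"#!/usr/bin/env python\n# coding=gbk\n"))  # gbk
print(declared_encoding(b"s1 = 'a'\n"))                             # ascii
```

Only the first two lines are inspected, which is why the declaration has to sit at the top of the file. Python 3 assumes UTF-8 when no declaration is present, so the default would differ there.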

However, if you run this code in the Windows console, the program executes but the Chinese text is not displayed correctly. This is because the encoding Python uses does not match the console encoding. The console on Chinese-language Windows uses gbk, while the code uses UTF-8. Python sends UTF-8 encoded bytes to a gbk console, so the two are inconsistent and the correct Chinese characters cannot be printed.
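The mismatch can be reproduced without a Windows console. The sketch below is only a simulation: it takes the UTF-8 bytes of a sample string (the text '中文' is an arbitrary choice) and decodes them the way a gbk console would interpret them.

```python
# -*- coding: utf-8 -*-
text = u'中文'

# The bytes a UTF-8 source file stores for the string.
utf8_bytes = text.encode('utf-8')

# What a gbk console makes of those bytes: the wrong characters.
# errors='replace' keeps byte sequences that are invalid in gbk from raising.
shown = utf8_bytes.decode('gbk', 'replace')

# Decoding with the encoding that produced the bytes restores the text.
restored = utf8_bytes.decode('utf-8')

print(shown == text)     # False: the console shows garbled characters
print(restored == text)  # True
```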

One solution is to change the source file encoding to gbk, that is, to change the first line of the code to:

# -*- coding: gbk -*-

Another way is to keep the source file in UTF-8 but add a u prefix before '哈' (ha), that is:

s1 = u'哈'
print s1

In this way, the character '哈' is printed correctly.

Here, the u prefix means that the string that follows is stored as unicode. Python decodes the Chinese character '哈' (ha) in the code according to the UTF-8 encoding declared on the first line and converts it to a unicode object. If we check the types, type('哈') gives <type 'str'> and type(u'哈') gives <type 'unicode'>. That is, adding u before the string makes it a unicode object, which is held in memory in unicode format. Without the u, it is only an encoded byte string, whose encoding is the one Python recognizes for the source file, here UTF-8.

When Python outputs a unicode object to the console, it automatically converts the object to the encoding of the output environment. However, if the output is not a unicode object but a plain string, its bytes are written out as they are, in the string's own encoding, which causes the problem described above.
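The difference between a unicode object and its encoded forms can be seen by comparing lengths. This is a minimal sketch; the variable names are illustrative, and the fallback to 'utf-8' when no console encoding is reported is an assumption made for the example, not something Python does on its own.

```python
# -*- coding: utf-8 -*-
import sys

text = u'哈'  # one unicode character, independent of any encoding

utf8_bytes = text.encode('utf-8')  # what a UTF-8 file or terminal expects
gbk_bytes = text.encode('gbk')     # what a gbk console expects

print(len(text))        # 1 character
print(len(utf8_bytes))  # 3 bytes in UTF-8
print(len(gbk_bytes))   # 2 bytes in gbk

# The encoding Python reports for console output. It may be missing or
# None when output is redirected, so this example falls back to UTF-8.
console_encoding = getattr(sys.stdout, 'encoding', None) or 'utf-8'
console_bytes = text.encode(console_encoding, 'replace')
```

The same unicode object produces different bytes for different targets, which is the conversion Python performs automatically when a unicode object is printed.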

In addition to the u prefix, you can also use the unicode class and the string encode and decode methods.

The constructor of the unicode class accepts a string and an encoding, and wraps the string as a unicode object. For example, because our source file uses UTF-8, we pass 'utf-8' as the encoding to wrap the character '哈' (ha) in a unicode object, which is then printed to the console correctly:

s1 = unicode('哈', 'utf-8')
print s1

In addition, you can use the decode method to convert a plain string to a unicode object. Many people find the decode and encode methods of Python strings hard to understand, so here is a brief description.

decode parses a plain string according to the encoding given as its argument and produces the corresponding unicode object. For example, because our code uses UTF-8, a string is converted to unicode as follows:

s2 = '哈'.decode('utf-8')

At this point, s2 is a unicode object that holds the character '哈' (ha). It is the same as unicode('哈', 'utf-8') and u'哈'.
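The argument to decode has to name the encoding that the bytes are actually in. The sketch below, written in syntax that both Python 2.7 and Python 3 should accept, shows what happens otherwise; the flag decoded_ok is just an illustrative name.

```python
# -*- coding: utf-8 -*-
gbk_bytes = u'哈'.encode('gbk')  # bytes as a gbk file would store them

try:
    gbk_bytes.decode('utf-8')  # wrong codec for these bytes
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print(decoded_ok)                        # False
print(gbk_bytes.decode('gbk') == u'哈')  # True
```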

The encode method does the opposite. It converts a unicode object to a plain string in the encoding given as its argument. For example:

s3 = unicode('哈', 'utf-8').encode('utf-8')

s3 now holds the UTF-8 encoded string for '哈' (ha).
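Putting decode and encode together gives a simple way to convert between encodings by passing through a unicode object. The helper transcode below is a hypothetical name introduced for this sketch, not a library function.

```python
# -*- coding: utf-8 -*-
def transcode(data, source_encoding, target_encoding):
    """Re-encode a byte string by going through a unicode object."""
    return data.decode(source_encoding).encode(target_encoding)


utf8_bytes = u'哈'.encode('utf-8')
gbk_bytes = transcode(utf8_bytes, 'utf-8', 'gbk')

print(gbk_bytes == u'哈'.encode('gbk'))                    # True
print(transcode(gbk_bytes, 'gbk', 'utf-8') == utf8_bytes)  # True
```

This is the same conversion needed when UTF-8 text has to be shown on a gbk console: decode with the encoding the bytes came in, then encode with the encoding the destination expects.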
