Problems every Python beginner runs into: encode, decode, and garbled Chinese characters

Source: Internet
Author: User

I am using Python 2.6. Python 3.0 has largely solved this problem.

When I started out, this problem drove me to despair as well.


The following article explains the problem well:



Why does Python report the error "UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)"? This article focuses on that question.
Python uses Unicode as its internal string representation. Therefore, when converting between encodings, Unicode usually serves as the intermediate form: first decode the string from its source encoding into Unicode, then encode the Unicode into the target encoding.

decode converts a string in some other encoding to Unicode. For example, str1.decode('gb2312') converts the gb2312-encoded string str1 to Unicode.

encode converts Unicode to a string in some other encoding. For example, str2.encode('gb2312') converts the Unicode string str2 to gb2312.

Therefore, when transcoding, you must first know which encoding the str is in, decode it into Unicode, and then encode it into the target encoding.
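As a minimal sketch of this decode-then-encode route, the snippet below converts bytes from gb2312 to UTF-8. The helper name transcode is made up for illustration, and the byte literal is assumed to be the gb2312 form of the two-character word 中文. It uses an explicit b'' literal so that it should behave the same on Python 2.6+ and Python 3.

```python
def transcode(data, source_encoding, target_encoding):
    """Decode bytes into Unicode, then encode the Unicode into the target encoding."""
    text = data.decode(source_encoding)     # bytes -> Unicode (intermediate form)
    return text.encode(target_encoding)     # Unicode -> bytes in the new encoding


# Assumed sample input: the two Chinese characters of the example, as gb2312 bytes.
gb_bytes = b'\xd6\xd0\xce\xc4'
utf8_bytes = transcode(gb_bytes, 'gb2312', 'utf-8')

print(repr(utf8_bytes))
```

Going back the other way is the same call with the two encoding names swapped.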

By default, a string literal in source code has the same encoding as the source file itself.

For example: s = '中文'

If this line is in a UTF-8 file, the string is UTF-8 encoded; if it is in a gb2312 file, the string is gb2312 encoded. In either case, to convert it you must first use decode to turn it into Unicode and then use encode to turn it into the other encoding. Generally, if no encoding is specified, the source file is created in the system's default encoding.

If the string is defined as follows: s = u'中文'

then the string is explicitly Unicode, Python's internal encoding, regardless of the encoding of the source file. To convert a string like this, you only need to call encode directly with the target encoding.
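To make the difference between the two kinds of literal concrete, here is a small sketch. It writes the characters as \u escapes rather than typing them, so the result does not depend on the encoding of the file you paste it into. The variable names are only illustrative.

```python
text = u'\u4e2d\u6587'               # Unicode string: two characters

# In Python 2, a plain '...' literal holds whichever of these byte
# sequences matches the encoding of the source file.
utf8_bytes = text.encode('utf-8')    # bytes as stored in a UTF-8 file
gb_bytes = text.encode('gb2312')     # bytes as stored in a gb2312 file

print('%d characters, %d UTF-8 bytes, %d gb2312 bytes'
      % (len(text), len(utf8_bytes), len(gb_bytes)))
```

The same two characters take a different number of bytes in each encoding, which is why the encoding of a plain str has to be known before it can be decoded.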

Calling decode on a string that is already Unicode can cause an error. Therefore, you usually need to check first whether the string is Unicode:

isinstance(s, unicode)  # checks whether s is a unicode object

Likewise, calling encode on a str that is not Unicode reports an error when it contains non-ASCII bytes, because Python 2 first decodes it implicitly with the default encoding.
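The check above can be wrapped in a small helper so that decode is only ever called on byte strings. This is a sketch under assumptions: to_unicode and text_type are hypothetical names, and the try/except is there only so the same code also runs on Python 3, where the unicode type no longer exists and str takes its place.

```python
try:
    text_type = unicode          # Python 2: the Unicode string type
except NameError:
    text_type = str              # Python 3: str is already Unicode


def to_unicode(value, encoding='utf-8'):
    """Return value as Unicode, decoding only when it is a byte string."""
    if isinstance(value, text_type):
        return value             # already Unicode, decoding again would be wrong
    return value.decode(encoding)


print(repr(to_unicode(b'\xd6\xd0\xce\xc4', 'gb2312')))
print(repr(to_unicode(u'\u4e2d\u6587')))
```

The caller still has to supply the correct encoding for byte strings. The helper only removes the risk of decoding something twice.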

How do I obtain the default encoding of the system?

#!/usr/bin/env python
# coding=utf-8
import sys
print sys.getdefaultencoding()

On Windows XP, this program prints ascii.
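The default encoding is only part of the picture, since the console has an encoding of its own. The sketch below reads both values. It assumes sys.stdout may lack a usable encoding attribute (for example, when output is redirected in Python 2 the value can be None), so it falls back to None instead of failing.

```python
import sys

# Codec used for implicit str/unicode conversions.
default_encoding = sys.getdefaultencoding()

# Encoding the interpreter believes the console uses; may be None.
console_encoding = getattr(sys.stdout, 'encoding', None)

print('default encoding: %s' % default_encoding)
print('console encoding: %s' % console_encoding)
```

On Python 3 the first value is normally utf-8 rather than ascii, which is one reason this class of error is rarer there.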

In some IDEs, string output is always garbled or even raises an error. In fact, the cause is that the IDE's output console cannot display the string's encoding, not the program itself.

For example, run the following code in UliPad:

s = u"中文"
print s

The following message is displayed: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128). This happens because the UliPad console output window on Windows XP writes its output as ASCII (the default encoding on an English system is ASCII), while the string in the code above is Unicode, so the output step fails.
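The same failure can be reproduced without any IDE by encoding the Unicode string to ASCII by hand, which is roughly what an ASCII-only console has to do. This is only a sketch of the mechanism, not UliPad's actual code. The variable names are illustrative.

```python
s = u'\u4e2d\u6587'              # the two Chinese characters from the example

try:
    s.encode('ascii')            # what an ASCII-only console has to attempt
    message = None
except UnicodeEncodeError as exc:
    message = str(exc)

print(message)

# Passing 'replace' substitutes '?' for each unencodable character
# instead of raising.
fallback = s.encode('ascii', 'replace')
print(repr(fallback))
```

The "position 0-1" in the message refers to the two characters, neither of which has a code point below 128.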

Change the last line to print s.encode('gb2312')

The Chinese text is then output correctly.

If the last line is changed to print s.encode('utf8')

the output is \xe4\xb8\xad\xe6\x96\x87. This is how the console output window, which works in ASCII, displays the bytes of the UTF-8 encoded string.
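A short sketch of why the choice of codec matters on the way back in: the UTF-8 bytes only turn back into the original text when they are decoded as UTF-8. The 'replace' argument is used here only so that the wrong-codec decode does not raise on byte sequences that are invalid in gb2312. The exact garbled characters are not shown because they depend on the codec tables.

```python
text = u'\u4e2d\u6587'
utf8_bytes = text.encode('utf-8')

# repr() shows the raw bytes as escapes, much like the console output above.
print(repr(utf8_bytes))

# Decoding with the matching codec restores the original text.
restored = utf8_bytes.decode('utf-8')

# Decoding with a mismatched codec yields garbled characters.
garbled = utf8_bytes.decode('gb2312', 'replace')

print(restored == text)
print(garbled == text)
```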

unicode(str, 'gb2312') is the same as str.decode('gb2312').

You can use s.__class__ to check whether a string s is a str or a unicode object.
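Both points can be checked in a few lines. This is a sketch under assumptions: text_type is a hypothetical alias that resolves to unicode on Python 2 and to str on Python 3, so the constructor call runs on either version. The sample bytes are assumed to be gb2312.

```python
try:
    text_type = unicode          # Python 2
except NameError:
    text_type = str              # Python 3

gb_bytes = b'\xd6\xd0\xce\xc4'   # assumed gb2312 input

via_constructor = text_type(gb_bytes, 'gb2312')   # unicode(str, 'gb2312') on Python 2
via_method = gb_bytes.decode('gb2312')

print(via_constructor == via_method)

# __class__ reports the type of the object, not the encoding of its bytes.
print(gb_bytes.__class__.__name__)
print(via_method.__class__.__name__)
```

Note that __class__ tells you whether you hold bytes or Unicode. It cannot tell you which encoding a byte string was written in.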
