I was using Python 2.6. In fact, this problem has been solved in Python 3.0.
At first, this problem nearly drove me crazy too.
This is a good article for understanding the problem:
(Reposted from): http://www.jb51.net/article/17560.htm
Why does Python report the error "UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)"? This article focuses on that question.
Python represents strings internally as unicode. Encoding conversions therefore usually go through unicode as the intermediate form: first decode the string from its original encoding into unicode, then encode the unicode into the target encoding.
decode converts a string in some other encoding into unicode. For example, str1.decode('gb2312') converts the gb2312-encoded string str1 into unicode.
encode converts unicode into a string in some other encoding. For example, str2.encode('gb2312') converts the unicode string str2 into gb2312.
So when transcoding, you must first know which encoding the string str is in, decode it into unicode, and then encode it into the target encoding.
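As a minimal sketch of the decode-then-encode round trip described above: the variable names are made up for illustration, and the code is written so that it also runs on Python 3, where the byte string type is called bytes and the unicode type is called str.

```python
# -*- coding: utf-8 -*-
# These four bytes are the gb2312 encoding of the characters U+4E2D U+6587.
gb_bytes = b'\xd6\xd0\xce\xc4'

# Step 1: decode from the known source encoding into unicode.
text = gb_bytes.decode('gb2312')

# Step 2: encode the unicode into the target encoding.
utf8_bytes = text.encode('utf-8')

# Print the escaped bytes only, so the output is safe on an ASCII console.
print(repr(utf8_bytes))
```

The same two characters take four bytes in gb2312 and six bytes in UTF-8, which is why you must know the source encoding before converting.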
By default, a string literal in source code has the same encoding as the source file itself.
For example: s = '中文'
If this line is in a UTF-8 file, the string is UTF-8 encoded. If it is in a gb2312 file, it is gb2312 encoded. In this case, you must first use decode to convert the string to unicode, and then use encode to convert it to another encoding. Generally, when no encoding is specified, the source file is saved in the system default encoding.
If the string is defined like this instead: s = u'中文'
then the string is unicode, that is, Python's internal representation, regardless of the encoding of the source file. To convert it in this case, you only need to call encode directly with the target encoding.
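If a string is already unicode and you call decode on it, an error occurs when it contains non-ASCII characters, because Python 2 first tries to encode it with the default ASCII codec. So you usually need to check whether the string is unicode:
isinstance(s, unicode)  # checks whether s is unicode
Likewise, calling encode on a str that is not unicode raises an error when it contains non-ASCII characters.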
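One way to avoid both mistakes is to check the type before converting. The helpers below are a hypothetical sketch (to_unicode and to_bytes are names invented here, not part of any library), and they use the bytes name so the same code runs on Python 2.6+ and Python 3.

```python
def to_unicode(value, encoding='utf-8'):
    """Return value as unicode, decoding only if it is a byte string."""
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2.6+
        return value.decode(encoding)
    return value  # already unicode, leave it untouched


def to_bytes(value, encoding='utf-8'):
    """Return value as a byte string, encoding only if it is unicode."""
    if isinstance(value, bytes):
        return value  # already encoded, leave it untouched
    return value.encode(encoding)


utf8_bytes = b'\xe4\xb8\xad\xe6\x96\x87'
text = to_unicode(utf8_bytes)            # decoded once
text_again = to_unicode(text)            # no second decode, no error
gb_bytes = to_bytes(text, 'gb2312')      # encoded once
```

Note that the caller still has to supply the correct encoding; the type check only prevents decoding or encoding twice.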
How do you get the system default encoding?
#!/usr/bin/env python
# coding=utf-8
import sys
print sys.getdefaultencoding()
On Windows XP, this program prints ascii.
In some IDEs, string output is always garbled or even raises an error. Usually the cause is not the program itself: the IDE's output console cannot display the encoding of the string.
For example, run the following code in UliPad:
s = u"中文"
print s
The following message is displayed: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128). This is because the UliPad console output window on Windows XP writes its output as ASCII (the default encoding on an English system is ASCII), while the string in the code above is unicode, so the error occurs on output.
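The error can be reproduced without UliPad by encoding to ASCII explicitly, which is roughly what print does when the output stream is ASCII. This is a small sketch under that assumption; the variable names are illustrative, and the code runs on both Python 2.6+ and Python 3.

```python
s = u'\u4e2d\u6587'  # the same two Chinese characters, written as escapes

message = None
try:
    # ASCII covers only code points 0-127, so this cannot succeed.
    s.encode('ascii')
except UnicodeEncodeError as err:
    message = str(err)
    print(message)

# Encoding to a codec that does contain these characters works fine.
gb_bytes = s.encode('gb2312')
```

The "position 0-1" in the message refers to the two characters of the string, neither of which exists in ASCII.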
Change the last line to: print s.encode('gb2312')
Then the Chinese text is output correctly.
If the last line is changed to: print s.encode('utf8')
the output is \xe4\xb8\xad\xe6\x96\x87. This is how the UTF-8 encoded string looks when the console output window treats it as ASCII.
unicode(str, 'gb2312') is the same as str.decode('gb2312').
You can use str.__class__ to check whether str is a byte string (str) or unicode. It shows the type, not which encoding the bytes are in.
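A short sketch of these two points follows. To keep it runnable on both Python 2 and Python 3 it uses type(u'') in place of the name unicode; text_type is a name invented here for illustration.

```python
text_type = type(u'')  # unicode on Python 2, str on Python 3

gb_bytes = b'\xd6\xd0\xce\xc4'  # gb2312 bytes for U+4E2D U+6587

# The constructor form and the decode method give the same result.
via_constructor = text_type(gb_bytes, 'gb2312')
via_decode = gb_bytes.decode('gb2312')
same = (via_constructor == via_decode)
print(same)

# __class__ tells you the type of the object, not the encoding of its bytes.
print(gb_bytes.__class__.__name__)
print(via_decode.__class__.__name__)
```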