Why do you get the error "UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)"? This article examines the problem.
Python represents strings internally as Unicode, so Unicode usually serves as the intermediate form in encoding conversions: a string in some other encoding is first decoded (decode) into Unicode, and then encoded (encode) from Unicode into the target encoding.
decode converts a string in some other encoding into Unicode. For example, str1.decode('gb2312') converts the gb2312-encoded string str1 into Unicode.
encode converts a Unicode string into a string in some other encoding. For example, str2.encode('gb2312') converts the Unicode string str2 into gb2312.
Therefore, the important step is to work out which encoding the string is in, decode it into Unicode, and then encode it into the other encoding.
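The decode-then-encode round trip described above can be sketched as follows. This is a minimal illustration, not part of the original article: the helper name transcode is hypothetical, the sample bytes are the gb2312 encoding of "中文", and the snippet is written so that it runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def transcode(data, source_encoding, target_encoding):
    """Convert encoded bytes from one encoding to another via Unicode.

    `transcode` is a hypothetical helper name used only for this sketch.
    """
    text = data.decode(source_encoding)   # encoded bytes -> Unicode
    return text.encode(target_encoding)   # Unicode -> encoded bytes


# The two characters "中文" as gb2312 bytes.
gb_bytes = b"\xd6\xd0\xce\xc4"

utf8_bytes = transcode(gb_bytes, "gb2312", "utf-8")
print(repr(utf8_bytes))
```

Calling transcode(utf8_bytes, "utf-8", "gb2312") converts the result back to the original gb2312 bytes, which shows that Unicode acts purely as the intermediate form.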
By default, a string literal in code has the same encoding as the code file itself.
For example: s = '中文'
If this line is in a UTF-8 file, the string is UTF-8 encoded; if it is in a gb2312 file, the string is gb2312 encoded. In this case, to convert the encoding you must first use the decode method to turn the string into Unicode, and then use the encode method to turn it into the other encoding. When no encoding is specified, a code file is typically saved in the system's default encoding.
If the string is defined like this instead: s = u'中文'
then the string is explicitly Unicode, Python's internal representation, regardless of the encoding of the code file itself. To convert such a string, simply call the encode method directly with the target encoding.
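To make the difference between a Unicode string and its encoded forms concrete, here is a small sketch. It is an illustration only: it writes the two characters with \u escapes so that the result does not depend on the encoding of the file, and it runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# The two characters "中文", written with escapes so the literal is
# independent of the source file's encoding.
text = u"\u4e2d\u6587"

# A Unicode string can be encoded directly, with no decode step first.
utf8_bytes = text.encode("utf-8")
gb_bytes = text.encode("gb2312")

print(len(text))        # number of characters in the Unicode string
print(len(utf8_bytes))  # number of bytes in the UTF-8 form
print(len(gb_bytes))    # number of bytes in the gb2312 form
```

The same two characters occupy a different number of bytes in each encoding, which is why the encoding of a byte string has to be known before it can be converted.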
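If a string is already Unicode, calling decode on it can raise an error (for non-ASCII text, Python first tries to encode it with the default ASCII codec), so it is common to check first whether the string is Unicode:
isinstance(s, unicode)  # True if s is a Unicode string
Likewise, calling encode on a non-Unicode str that contains non-ASCII bytes raises an error.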
How do you get the system's default encoding?
#!/usr/bin/env python
# coding=utf-8
import sys
print sys.getdefaultencoding()
On English Windows XP, this program prints: ascii
In some IDEs, string output is always garbled, or even raises an error. This happens because the IDE's output console cannot display the string's encoding; it is not a problem in the program itself.
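When output looks wrong, it can help to inspect both the interpreter's default encoding and the encoding the output stream reports. The following is a minimal sketch under stated assumptions: describe_encodings is a hypothetical helper name, and the stream's encoding attribute may be None or missing when output is redirected or replaced by an IDE, so it is read defensively.

```python
from __future__ import print_function
import sys


def describe_encodings():
    """Return the interpreter default encoding and the stdout encoding.

    `describe_encodings` is a hypothetical helper for this sketch. The
    stdout value may be None if the stream does not report an encoding.
    """
    return {
        "default": sys.getdefaultencoding(),
        "stdout": getattr(sys.stdout, "encoding", None),
    }


info = describe_encodings()
print("default encoding:", info["default"])
print("stdout encoding:", info["stdout"])
```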
For example, if you run the following code in UliPad:
s = u"中文"
print s
it reports: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128). This is because UliPad's console output window on English Windows XP writes its output in ASCII (the default encoding of the English system is ASCII), while the string in the code above is Unicode, so the output fails.
Replace the last statement with: print s.encode('gb2312')
and the two characters "中文" are output correctly.
If the last statement is instead changed to: print s.encode('utf8')
the output is \xe4\xb8\xad\xe6\x96\x87, which is what the console output window shows when it renders a UTF-8 encoded string as ASCII.
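The error can be reproduced without any particular console by encoding the Unicode string to ASCII explicitly, which is roughly what an ASCII-only output window ends up doing. This is a sketch for illustration, written to run on both Python 2 and Python 3; the backslashreplace error handler shown at the end is a standard codec option and is not something the original article uses.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

text = u"\u4e2d\u6587"  # the two characters "中文"

# Encoding non-ASCII characters with the ASCII codec fails.
try:
    text.encode("ascii")
    message = None
except UnicodeEncodeError as exc:
    message = str(exc)
print(message)

# Encoding with a codec that covers the characters succeeds.
utf8_bytes = text.encode("utf-8")
print(repr(utf8_bytes))

# An error handler can substitute escapes instead of raising.
escaped = text.encode("ascii", "backslashreplace")
print(repr(escaped))
```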
unicode(str, 'gb2312') is equivalent to str.decode('gb2312'): both convert the gb2312-encoded str into Unicode.
Use str.__class__ to check whether str is a plain str or a unicode object. Note that this shows only the type, not which specific encoding a byte string uses.
After all that theory, here is a cure-all snippet to finish:
#!/usr/bin/env python
# coding=utf-8
s = "中文"
if isinstance(s, unicode):
    # Case: s = u"中文" (already Unicode, so encode directly)
    print s.encode('gb2312')
else:
    # Case: s = "中文" (UTF-8 bytes from this file, so decode first)
    print s.decode('utf-8').encode('gb2312')
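The same check can be wrapped in a reusable function. The sketch below is one possible shape rather than a definitive implementation: the name to_encoding and its parameters are hypothetical, the source encoding defaults to UTF-8 as an assumption matching the script above, and type(u"") is used in place of unicode so that the code also runs on Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# `unicode` on Python 2, `str` on Python 3.
text_type = type(u"")


def to_encoding(value, target="gb2312", source="utf-8"):
    """Return `value` as bytes in the `target` encoding.

    Hypothetical helper: a byte string is assumed to be in the `source`
    encoding and is decoded first; a Unicode string is encoded directly.
    """
    if not isinstance(value, text_type):
        value = value.decode(source)
    return value.encode(target)


from_unicode = to_encoding(u"\u4e2d\u6587")
from_utf8 = to_encoding(b"\xe4\xb8\xad\xe6\x96\x87")
print(repr(from_unicode))
print(repr(from_utf8))
```

Both calls produce the same gb2312 bytes, whether the input was already Unicode or still UTF-8 encoded.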