Encoding and decoding Python strings: how to fix garbled text

Why do you get the error "UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)"? This article looks into the problem.

Python represents text internally as Unicode (the code in this article is Python 2), so Unicode usually serves as the intermediate encoding in any conversion: a string in some other encoding is first decoded (decode) into Unicode, and then encoded (encode) from Unicode into the target encoding.

decode converts a string in some other encoding into Unicode. For example, str1.decode('gb2312') converts the gb2312-encoded string str1 into Unicode.

encode converts Unicode into a string in some other encoding. For example, str2.encode('gb2312') converts the Unicode string str2 into gb2312.

So the important thing is to work out which encoding a string is in, then decode it into Unicode, and then encode it into the other encoding.
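The two-step conversion described above can be sketched as follows. This is a minimal illustration, not the article's own code: the variable names are made up, the sample text 中文 is written with escapes so the file encoding does not matter, and the snippet is written to run on both Python 2 and Python 3.

```python
# Sketch: transcode GB2312 bytes to UTF-8, using Unicode as the intermediate form.
# Variable names are illustrative only.
gb_bytes = b"\xd6\xd0\xce\xc4"        # the two characters U+4E2D U+6587 in GB2312

text = gb_bytes.decode("gb2312")      # step 1: decode the bytes into Unicode
utf8_bytes = text.encode("utf-8")     # step 2: encode Unicode into the target encoding

print(repr(text))
print(repr(utf8_bytes))
```

The same text ends up as four bytes in GB2312 and six bytes in UTF-8; only the Unicode value in the middle is independent of the encoding.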

By default, a string literal in code has the same encoding as the code file itself.

For example: s = '中文'

If the file is saved as UTF-8, the string is UTF-8 encoded; if the file is saved as gb2312, the string is gb2312 encoded. To convert it, first use decode to turn it into Unicode, then use encode to turn it into the other encoding. When no encoding is specified, a code file is typically created in the system default encoding.

If the string is defined like this: s = u'中文'

then the string is explicitly Unicode, Python's internal encoding, regardless of the encoding of the code file itself. To convert it in this case, just call encode directly to turn it into the target encoding.
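As a small illustration of encoding a Unicode literal directly, the sketch below encodes the same text into two encodings without any decode step. The names are hypothetical, and the text is written with \u escapes (U+4E2D U+6587, i.e. 中文) so that the example runs unchanged on Python 2 and Python 3.

```python
from __future__ import print_function

# A Unicode literal: its value does not depend on how the source file is saved.
text = u"\u4e2d\u6587"

as_gb2312 = text.encode("gb2312")   # no decode step needed
as_utf8 = text.encode("utf-8")

# Number of characters, then number of bytes in each encoding.
print(len(text), len(as_gb2312), len(as_utf8))
```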

If a string is already Unicode, decoding it again typically raises an error, so it is usual to check first whether it is Unicode:

isinstance(s, unicode)  # checks whether s is Unicode

Likewise, calling encode on a str that is not Unicode typically raises an error when it contains non-ASCII bytes.
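The check above can be wrapped in a small helper. This is only a sketch: is_unicode and text_type are hypothetical names, and type(u"") is used instead of the name unicode so that the snippet also runs on Python 3, where the text type is called str.

```python
# `unicode` on Python 2, `str` on Python 3.
text_type = type(u"")

def is_unicode(value):
    """Return True if value is Unicode text rather than an encoded byte string."""
    return isinstance(value, text_type)

print(is_unicode(u"\u4e2d\u6587"))        # Unicode text
print(is_unicode(b"\xd6\xd0\xce\xc4"))    # GB2312-encoded bytes
```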

How do you get the system's default encoding?

#!/usr/bin/env python
# coding=utf-8
import sys
print sys.getdefaultencoding()

On English Windows XP, this program prints: ascii

In some IDEs, string output always comes out garbled or even raises an error. The cause is that the IDE's output console cannot display the string's encoding; it is not a problem in the program itself.
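When output looks garbled, it can help to compare Python's default encoding with the encoding the console reports. The sketch below assumes the console exposes it as sys.stdout.encoding; some IDE consoles replace sys.stdout with an object that lacks this attribute, which is why getattr is used. The variable names are illustrative.

```python
from __future__ import print_function
import sys

default_encoding = sys.getdefaultencoding()

# May be None (or missing entirely) when stdout is redirected or replaced by an IDE.
console_encoding = getattr(sys.stdout, "encoding", None)

print("default encoding:", default_encoding)
print("console encoding:", console_encoding)
```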

For example, if you run the following code in UliPad:

s = u"中文"
print s

it fails with: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128). This is because UliPad's console output window on Windows XP writes its output in ASCII (the default encoding on an English system is ASCII), while the string in the code above is Unicode, so the output fails.
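The same error can be reproduced without UliPad by encoding the text to ASCII explicitly, which is roughly what an ASCII-only console does behind the scenes. This is a sketch under that assumption; the variable names are made up, and the text is written with escapes so the snippet runs on both Python 2 and Python 3.

```python
text = u"\u4e2d\u6587"   # two non-ASCII characters (U+4E2D U+6587)

message = None
try:
    text.encode("ascii")             # neither character has an ASCII representation
except UnicodeEncodeError as err:
    message = str(err)

print(message)
```

The reported position 0-1 covers both characters of the string, since neither fits into the 0-127 range that ASCII can represent.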

Replace the print statement in the UliPad example with: print s.encode('gb2312')

This correctly prints the two characters "中文".

If the print statement is changed instead to: print s.encode('utf8')

the output is \xe4\xb8\xad\xe6\x96\x87, which is what the console output window shows when it renders a UTF-8 encoded string as ASCII.
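To see where those escapes come from, the sketch below builds the same escaped form from the UTF-8 bytes and confirms that the bytes cannot be read as ASCII. The names are hypothetical; bytearray is used because it yields integers on both Python 2 and Python 3.

```python
utf8_bytes = u"\u4e2d\u6587".encode("utf-8")

# Render every byte as a \xNN escape, the way an ASCII-only display would.
escaped = "".join("\\x%02x" % byte for byte in bytearray(utf8_bytes))
print(escaped)

# None of the six bytes is below 128, so an ASCII decode cannot succeed.
try:
    utf8_bytes.decode("ascii")
    decodable_as_ascii = True
except UnicodeDecodeError:
    decodable_as_ascii = False
print(decodable_as_ascii)
```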

unicode(str, 'gb2312') is the same as str.decode('gb2312'): both convert the gb2312-encoded str into Unicode.

str.__class__ shows the type of str (str or unicode), which tells you whether it is an encoded byte string or Unicode.
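Both points can be checked in a few lines. This is a sketch with illustrative names; type(u"") stands in for unicode so that the snippet also runs on Python 3, where the equivalent constructor is str.

```python
from __future__ import print_function

text_type = type(u"")          # `unicode` on Python 2, `str` on Python 3
raw = b"\xd6\xd0\xce\xc4"      # GB2312-encoded bytes

via_constructor = text_type(raw, "gb2312")   # unicode(raw, 'gb2312') on Python 2
via_method = raw.decode("gb2312")

print(via_constructor == via_method)

# __class__ distinguishes the encoded byte string from the Unicode text.
print(raw.__class__.__name__, via_method.__class__.__name__)
```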

After all that theory, here is a cure-all snippet:

#!/usr/bin/env python
# coding=utf-8
s = "中文"

if isinstance(s, unicode):
    # s = u"中文"
    print s.encode('gb2312')
else:
    # s = "中文"
    print s.decode('utf-8').encode('gb2312')
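The same logic can be packaged as a reusable function. The version below is a sketch, not the article's code: to_encoding and its target and source parameters are hypothetical, the source encoding of byte strings is assumed to be known (UTF-8 by default), and type(u"") replaces unicode so the function works on both Python 2 and Python 3.

```python
text_type = type(u"")   # `unicode` on Python 2, `str` on Python 3

def to_encoding(value, target="gb2312", source="utf-8"):
    """Return value as bytes in `target`, decoding from `source` first if needed."""
    if isinstance(value, text_type):
        # Already Unicode: encode directly.
        return value.encode(target)
    # Encoded bytes: go through Unicode as the intermediate form.
    return value.decode(source).encode(target)

print(repr(to_encoding(u"\u4e2d\u6587")))
print(repr(to_encoding(b"\xe4\xb8\xad\xe6\x96\x87")))
```

Both calls produce the same GB2312 bytes. The function cannot guess the encoding of a byte string, so a wrong source value will either raise an error or produce garbled text.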
