A summary of garbled Chinese text problems in Python

Last Update:2016-10-19 Source: Internet

Author: User


Suppose you run code like the following:

#!/usr/bin/env python
s = "中文"
print s

Recently I have frequently run into the following errors:

Problem one: SyntaxError: Non-ASCII character '\xe4' in file E:\coding\python\Untitled 6.py on line 3, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

Problem two: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 108: ordinal not in range(128)

Problem three: UnicodeEncodeError: 'gb2312' codec can't encode character u'\u2014' in position 72366: illegal multibyte sequence

These are all character encoding problems, and they are very frustrating: Chinese text never seems to display correctly. I looked up many solutions; here are some that I found a few days ago, shared for everyone.

Inside Python, strings are represented as Unicode, so an encoding conversion usually uses Unicode as the intermediate encoding: first decode the string from its current encoding into Unicode (decode), then encode it from Unicode into the target encoding (encode).

decode converts a string in some other encoding into Unicode. For example, str1.decode('gb2312') converts the GB2312-encoded string str1 into Unicode.

encode converts a Unicode string into a string in another encoding. For example, str2.encode('gb2312') converts the Unicode string str2 into GB2312.
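The decode-then-encode round trip described above can be sketched as follows. This is a minimal illustration rather than code from the original article: the helper name transcode is hypothetical, and the snippet is written so that it should run under both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def transcode(data, source_encoding, target_encoding):
    """Convert a byte string between encodings, using Unicode as the pivot.

    `transcode` is a hypothetical helper name used only for this sketch.
    """
    text = data.decode(source_encoding)   # bytes -> Unicode
    return text.encode(target_encoding)   # Unicode -> bytes


gb_bytes = u"\u4e2d\u6587".encode("gb2312")          # u"中文" as GB2312 bytes
utf8_bytes = transcode(gb_bytes, "gb2312", "utf-8")  # same text as UTF-8 bytes

print(repr(gb_bytes))
print(repr(utf8_bytes))
```

The same two characters take four bytes in GB2312 and six bytes in UTF-8, which is why the bytes cannot simply be relabeled from one encoding to the other.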

In some IDEs, the output of a string always appears garbled, or even raises an error. This is because the IDE's output console cannot display the string's encoding, not because of a problem in the program itself.

For example, run the following code in UliPad:

s = u"中文"
print s

It reports: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128). This is because UliPad's console output window on English Windows XP outputs in ASCII (the default encoding of the English system is ASCII), while the string in the code above is Unicode, so the output raises an error.

Replace the last statement with: print s.encode('gb2312')

Then the two characters "中文" are output correctly.

If the last statement is instead changed to: print s.encode('utf8')

the output is: \xe4\xb8\xad\xe6\x96\x87, which is what the console output window shows when a UTF-8 encoded string is output as ASCII.
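The three outcomes above (failure with ASCII, GB2312 bytes, UTF-8 bytes) can be reproduced without any particular console by encoding the string explicitly. This is a small sketch that assumes only the standard codecs; the variable names are illustrative, and it should run under Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

s = u"\u4e2d\u6587"  # the Unicode string u"中文"

# An ASCII-only output window cannot represent these characters.
try:
    s.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print("ascii:", exc)

gb2312_bytes = s.encode("gb2312")  # the bytes a GB2312 console expects
utf8_bytes = s.encode("utf-8")     # the six bytes \xe4\xb8\xad\xe6\x96\x87

print("gb2312:", repr(gb2312_bytes))
print("utf-8: ", repr(utf8_bytes))
```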

The following code may be more general:

#!/usr/bin/env python
# coding=utf-8

s = "中文"

if isinstance(s, unicode):
    # s = u"中文"
    print s.encode('gb2312')
else:
    # s = "中文"
    print s.decode('utf-8').encode('gb2312')
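One possible variation on this pattern, sketched under some assumptions, is to wrap the check in a function and take the target encoding from sys.stdout.encoding instead of hard-coding gb2312. The helper name to_console_bytes and its parameters are hypothetical, and the fallback to utf-8 when the console encoding is unknown is a choice made for this sketch, not something the original article prescribes.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

import sys


def to_console_bytes(value, source_encoding="utf-8", console_encoding=None):
    """Return `value` as bytes in the console's encoding.

    Hypothetical helper for this sketch: `value` may be Unicode text or a
    byte string encoded with `source_encoding`.
    """
    if console_encoding is None:
        # sys.stdout.encoding can be None (for example when output is piped).
        console_encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
    if isinstance(value, bytes):
        value = value.decode(source_encoding)  # bytes -> Unicode
    # "replace" turns unencodable characters into "?" instead of raising.
    return value.encode(console_encoding, "replace")


from_text = to_console_bytes(u"\u4e2d\u6587", console_encoding="gb2312")
from_utf8 = to_console_bytes(b"\xe4\xb8\xad\xe6\x96\x87", console_encoding="gb2312")
print(repr(from_text), repr(from_utf8))
```

Both calls produce the same GB2312 bytes, whether the input started as Unicode text or as UTF-8 bytes.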

Now look at the following code:

#!/usr/bin/env python
# coding=utf-8
# python version: 2.7.4
# system: Windows XP

import httplib2

def getPageContent(url):
    '''
    Use httplib2 to fetch web page content from a URL programmatically.
    The content is returned in bytes form, as a utf-8 encoded string.
    '''
    # Use IE9's User-Agent; if no User-Agent is set, access is refused with 403 Forbidden
    headers = {'user-agent': 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)',
               'cache-control': 'no-cache'}
    if url:
        response, content = httplib2.Http().request(url, headers=headers)
        if response.status == 200:
            return content

import sys
reload(sys)
sys.setdefaultencoding('utf-8')  # change the default encoding; the default is ascii
print sys.getdefaultencoding()

content = getPageContent("http://www.oschina.net/")
print content.decode('utf-8').encode('gb2312')

The code above requests the home page of www.oschina.net. Printed as utf-8, the Chinese text cannot be displayed, so I tried to convert it from utf-8 to gb2312, which produced Problem three.

When I changed print content.decode('utf-8').encode('gb2312') to print content.decode('utf-8').encode('gb2312', 'ignore'), it worked and Chinese was displayed. However, I am not sure whether everything was shown; it seems only part of it was, since some characters cannot be encoded in gb2312.

However, when I changed the site to www.soso.com, there was no need to convert to gb2312; the Chinese text displayed normally as utf-8.
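The effect of the 'ignore' argument can be seen on a short string that contains U+2014, the character named in the gb2312 UnicodeEncodeError quoted earlier. The sketch below is illustrative only: the sample string is made up, and using gb18030 as a wider target encoding is a suggestion that does not come from the original article. It should run under Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Two Chinese characters with U+2014 (the character from the error) between them.
text = u"\u4e2d\u2014\u6587"

# The default "strict" handler raises on the first unencodable character.
try:
    text.encode("gb2312")
    strict_failed = False
except UnicodeEncodeError as exc:
    strict_failed = True
    print("strict:", exc)

ignored = text.encode("gb2312", "ignore")    # silently drops U+2014
replaced = text.encode("gb2312", "replace")  # substitutes "?" for U+2014
widened = text.encode("gb18030")             # wider codec, nothing is lost

print(repr(ignored), repr(replaced), repr(widened))
```

With 'ignore' the output is shorter than the input, which matches the observation that only part of the page seemed to be displayed.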

To summarize:

Writing a Unicode string directly to a file throws the same exception. When processing a Unicode Chinese string, you must first call encode to convert it to some other encoding before output. This is true in every environment. In Python, a "str" object is just an array of bytes; whether its contents form a valid string, and which encoding (GBK, UTF-8, Unicode) they use, is of no concern to it. The user has to keep track of that and judge it. The same restrictions apply to "unicode" objects: remember that the content of a "unicode" object is not necessarily a valid Unicode string. The Windows console supports GBK-encoded str objects and Unicode-encoded unicode objects.
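Both points in this summary, encoding before writing to a file and keeping track of which encoding a byte string uses, can be illustrated with a short sketch. It assumes a temporary file and uses io.open with an explicit encoding, which is one way to do the encoding step and is not taken from the original article. It should run under Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

import io
import os
import tempfile

text = u"\u4e2d\u6587"  # u"中文"

# Write the text through a file object that encodes it explicitly as UTF-8,
# then read the file back as raw bytes.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    with io.open(path, "rb") as handle:
        raw = handle.read()
finally:
    os.remove(path)

# The raw bytes carry no label; their meaning depends on the codec chosen.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError as exc:
    ascii_failed = True
    print("ascii:", exc)

as_utf8 = raw.decode("utf-8")      # the right codec recovers the text
as_latin1 = raw.decode("latin-1")  # a wrong codec "succeeds" but gives garbage

print(as_utf8 == text, as_latin1 == text)
```

Decoding with the wrong codec does not always raise an error, so a garbled result is often the only sign that the assumed encoding was wrong.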



This article is an English version of an article originally written in Chinese on aliyun.com.
