A summary of garbled Chinese text problems in Python

Source: Internet
Author: User
Suppose you run code like this:

    #!/usr/bin/env python

    s = "中文"
    print s

Recently I have run into the following problems again and again:

Problem one: SyntaxError: Non-ASCII character '\xe4' in file E:\coding\python\Untitled 6.py on line 3, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

Problem two: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 108: ordinal not in range(128)

Problem three: UnicodeEncodeError: 'gb2312' codec can't encode character u'\u2014' in position 72366: illegal multibyte sequence

All of these are character encoding problems, and they are very frustrating: Chinese text is always the thing that breaks, and there are many competing solutions to search through. Here are some of the solutions I found a few days ago, shared for everyone.
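As a minimal sketch of the first two problems (written for Python 3, where bytes plays the role of the Python 2 str; the helper name try_decode is hypothetical): the first error goes away once the file carries a PEP 263 encoding declaration on its first or second line, and the second error appears whenever non-ASCII bytes are decoded with the ascii codec.

```python
# -*- coding: utf-8 -*-
# The line above is a PEP 263 encoding declaration. Python 2 needs it before
# it accepts non-ASCII characters in the source file.

# UTF-8 bytes, as they might arrive from a file or a web page.
data = u"中文".encode("utf-8")


def try_decode(raw, codec):
    """Decode raw bytes with codec; report the failure reason instead of raising."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError as exc:
        return "failed: %s" % exc.reason


print(try_decode(data, "ascii"))   # the ascii codec rejects bytes >= 0x80
print(try_decode(data, "utf-8"))   # the matching codec succeeds
```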

Python represents strings internally as Unicode, so a conversion between encodings normally uses Unicode as the intermediate form: first decode the string from its source encoding into Unicode, then encode from Unicode into the target encoding.

decode converts a string in some other encoding into Unicode. For example, str1.decode('gb2312') converts the gb2312-encoded string str1 into Unicode.

encode converts a Unicode string into some other encoding. For example, str2.encode('gb2312') converts the Unicode string str2 into gb2312.
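The two-step conversion described above can be sketched as follows. This is an illustration written for Python 3, and the variable names are made up for the example.

```python
# -*- coding: utf-8 -*-

# Start from gb2312-encoded bytes (what Python 2 would call a str).
gb_bytes = b"\xd6\xd0\xce\xc4"

# Step 1: decode from the source encoding into Unicode text.
text = gb_bytes.decode("gb2312")

# Step 2: encode from Unicode into the target encoding.
utf8_bytes = text.encode("utf-8")

print(repr(utf8_bytes))
```

There is no direct gb2312-to-utf-8 call in this sketch; the Unicode text is always the hub between the two byte encodings.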

In some IDEs, printed strings always come out garbled, or even raise an error. The cause is that the IDE's output console cannot display the string's encoding; the program itself is not at fault.

For example, run the following code in UliPad:

s = u"中文"

print s

The result is: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128). This happens because UliPad's console output window on an English Windows XP system writes its output as ASCII (the default encoding on an English system is ASCII), while the string in the code above is Unicode, so printing it raises an error.
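The same failure can be reproduced without UliPad by encoding the string to ASCII by hand, which is roughly what an ASCII-only console has to do. This is a sketch written for Python 3, not a description of UliPad's internals.

```python
# -*- coding: utf-8 -*-

s = u"中文"

try:
    s.encode("ascii")
    failure = None
    message = ""
except UnicodeEncodeError as exc:
    # start/end delimit the offending characters (end is exclusive).
    failure = (exc.encoding, exc.start, exc.end, exc.reason)
    message = str(exc)

print(failure)
print(message)
```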

Replace the last line with: print s.encode('gb2312')

This correctly prints the two characters "中文".

If the last line is instead changed to: print s.encode('utf8')

the output is \xe4\xb8\xad\xe6\x96\x87, which is what the console output window produces when it writes a UTF-8 encoded string as ASCII.
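To make the difference concrete, the sketch below (Python 3, illustrative variable names) encodes the same two characters both ways and then decodes the UTF-8 bytes with the wrong codec, which is how garbled output arises.

```python
# -*- coding: utf-8 -*-

s = u"中文"

gb = s.encode("gb2312")    # 2 bytes per character
utf8 = s.encode("utf-8")   # 3 bytes per character for these two

print(repr(gb))
print(repr(utf8))

# Interpreting UTF-8 bytes as gb2312 does not give back the original text.
# "replace" substitutes a placeholder for undecodable bytes instead of raising.
garbled = utf8.decode("gb2312", "replace")
print(garbled != s)
```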

The following code may be more general:

    #!/usr/bin/env python
    #coding=utf-8

    s = "中文"
    if isinstance(s, unicode):
        # s = u"中文"
        print s.encode('gb2312')
    else:
        # s = "中文"
        print s.decode('utf-8').encode('gb2312')
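The same type check can be wrapped in a small function. The version below is a sketch for Python 3, where the check is against bytes rather than unicode; the name to_gb2312 and its source_encoding parameter are hypothetical, not part of the original code.

```python
# -*- coding: utf-8 -*-


def to_gb2312(value, source_encoding="utf-8"):
    """Return gb2312 bytes for either Unicode text or already-encoded bytes."""
    if isinstance(value, bytes):
        # Encoded input: go through Unicode first.
        value = value.decode(source_encoding)
    return value.encode("gb2312")


print(repr(to_gb2312(u"中文")))
print(repr(to_gb2312(u"中文".encode("utf-8"))))
```

The caller still has to know the source encoding of any bytes it passes in; the function only assumes utf-8 by default.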


Take a look at the following code:

    #!/usr/bin/env python
    #coding=utf-8
    #python version: 2.7.4
    #system: Windows XP

    import httplib2

    def getpagecontent(url):
        '''
        Use httplib2 to fetch web content from a URL programmatically
        and convert the content from bytes into a utf-8 string
        '''
        # Use IE9's user-agent; without a user-agent the request gets 403 Forbidden
        headers = {'user-agent': 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)',
                   'cache-control': 'no-cache'}
        if url:
            response, content = httplib2.Http().request(url, headers=headers)
            # The status value was lost in the source; 200 (OK) is assumed here
            if response.status == 200:
                return content

    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')   # change the default encoding; the default is ascii
    print sys.getdefaultencoding()

    content = getpagecontent("http://www.oschina.net/")
    print content.decode('utf-8').encode('gb2312')


The code above requests the home page of www.oschina.net. Printed as utf-8, the page does not display Chinese, so I tried to convert it from utf-8 to gb2312, and that is where the third problem (the UnicodeEncodeError) appears.

When I changed print content.decode('utf-8').encode('gb2312') to print content.decode('utf-8').encode('gb2312', 'ignore'), it worked and Chinese was displayed. I am not sure that everything is shown, though. It seems that only part of the text comes through, because some characters cannot be encoded in gb2312.
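What 'ignore' does can be seen on a short string containing U+2014, the character named in the third error message. This is a sketch for Python 3. The 'replace' handler and the gbk codec are not part of the original article and are shown only as options that might be worth trying.

```python
# -*- coding: utf-8 -*-

text = u"中\u2014文"   # U+2014 sits between the two Chinese characters

try:
    text.encode("gb2312")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# 'ignore' silently drops characters the codec cannot represent.
ignored = text.encode("gb2312", "ignore")

# 'replace' keeps a visible '?' marker where a character was lost.
replaced = text.encode("gb2312", "replace")

# gbk covers more characters than gb2312, including this one (assumption:
# the output device accepts gbk).
round_trip = text.encode("gbk").decode("gbk")

print(strict_ok, repr(ignored), repr(replaced))
```

With 'ignore' the output is shorter than the input and nothing marks the loss, which matches the impression that only part of the page was displayed.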

However, when I changed the site to www.soso.com, no conversion to gb2312 was needed: the utf-8 output displayed Chinese correctly.

To summarize:

Writing a Unicode string (ss) directly to a file throws the same exception. When processing a Unicode Chinese string, you must first call encode to convert it to some other encoding before output. This holds in every environment. In Python, a "str" object is just an array of bytes; the object does not care whether its contents form a valid string or which encoding (GBK, UTF-8, Unicode) they use. The user has to keep track of and judge that. The same restrictions apply to "unicode" objects: the content of a "unicode" object is not necessarily a valid Unicode string. The Windows console supports both GBK-encoded str objects and unicode objects.
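For file output, one way to apply the "encode before output" rule is to let the file object do the encoding. The sketch below uses io.open with an explicit encoding; the helper names write_text and read_raw and the temporary path are made up for the example.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def write_text(path, text, encoding="gb2312"):
    """Write Unicode text to path, encoding it explicitly on the way out."""
    with io.open(path, "w", encoding=encoding) as handle:
        handle.write(text)


def read_raw(path):
    """Return the file's bytes exactly as stored on disk."""
    with io.open(path, "rb") as handle:
        return handle.read()


demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
write_text(demo_path, u"中文")
print(repr(read_raw(demo_path)))
```

Reading the file back in binary mode shows the stored bytes, so the encoding that was actually used can be checked rather than guessed.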
