Suppose you run code like this:
    #!/usr/bin/env python
    s = "中文"
    print s
Recently I have been running into the following problems frequently:
Problem 1: SyntaxError: Non-ASCII character '\xe4' in file E:\coding\python\Untitled 6.py on line 3, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
Problem 2: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 108: ordinal not in range(128)
Problem 3: UnicodeEncodeError: 'gb2312' codec can't encode character u'\u2014' in position 72366: illegal multibyte sequence
These are all character encoding problems, and frustrating ones, because solutions for Chinese text are hard to track down. Here are some solutions I found a few days ago, shared for everyone.
Python represents strings internally as Unicode, so an encoding conversion normally uses Unicode as the intermediate form: first decode the string from its original encoding into Unicode, then encode the Unicode string into the target encoding.
decode converts a string in some other encoding into Unicode. For example, str1.decode('gb2312') converts the gb2312-encoded string str1 into Unicode.
encode converts a Unicode string into some other encoding. For example, str2.encode('gb2312') converts the Unicode string str2 into gb2312.
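As a minimal sketch of that decode-then-encode round trip: the snippet below is written so that it also runs on Python 3, where bytes plays the role of Python 2's str and str plays the role of unicode. The variable names are illustrative and do not come from the original code.

```python
# -*- coding: utf-8 -*-
# GB2312-encoded bytes for the two characters U+4E2D U+6587 ("Chinese").
gb_bytes = b'\xd6\xd0\xce\xc4'

# Step 1: decode from the source encoding into Unicode text.
text = gb_bytes.decode('gb2312')

# Step 2: encode the Unicode text into the target encoding.
utf8_bytes = text.encode('utf-8')

print(repr(text))
print(repr(utf8_bytes))
```

Going directly from one byte encoding to another is not possible; the Unicode step in the middle is what makes the conversion well defined.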
In some IDEs, string output is garbled or even raises an error. The cause is that the IDE's output console cannot display the encoding of the string; it is not a problem with the program itself.
For example, run the following code in UliPad:
    s = u"中文"
    print s
The result is: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128). This happens because UliPad's console output window on English Windows XP writes its output as ASCII (the default encoding on an English system is ASCII), while the string in the code above is Unicode, so the output raises an error.
Replace the last statement with: print s.encode('gb2312')
and the two characters "中文" are output correctly.
If the last statement is changed to: print s.encode('utf8')
the output is \xe4\xb8\xad\xe6\x96\x87, which is what the console output window shows when it writes a UTF-8 encoded string as ASCII.
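To make the three outcomes above concrete, here is a small sketch that encodes the same two-character string with each codec. It assumes nothing about UliPad itself; it only shows what bytes each encoding produces and that the ASCII codec has no representation for these characters.

```python
# -*- coding: utf-8 -*-
text = u'\u4e2d\u6587'  # the two-character string used in the example above

utf8_bytes = text.encode('utf-8')   # three bytes per character
gb_bytes = text.encode('gb2312')    # two bytes per character

# ASCII cannot represent either character, so strict encoding fails.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(repr(utf8_bytes))
print(repr(gb_bytes))
print(ascii_ok)
```

Which of these byte sequences displays correctly depends entirely on what encoding the console expects.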
The following code may be more general:
    #!/usr/bin/env python
    #coding=utf-8
    s = "中文"
    if isinstance(s, unicode):
        # s = u"中文"
        print s.encode('gb2312')
    else:
        # s = "中文"
        print s.decode('utf-8').encode('gb2312')
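The same branching can be wrapped in a small function. This is only a sketch: to_gb2312 and its source_encoding parameter are hypothetical names introduced here, and the isinstance check uses bytes so that the snippet runs on both Python 2.7 (where bytes is an alias for str) and Python 3.

```python
# -*- coding: utf-8 -*-
def to_gb2312(value, source_encoding='utf-8'):
    """Return GB2312 bytes for either Unicode text or encoded bytes.

    Hypothetical helper: byte strings are assumed to be in
    source_encoding and are decoded to Unicode first.
    """
    if isinstance(value, bytes):
        value = value.decode(source_encoding)
    return value.encode('gb2312')


from_text = to_gb2312(u'\u4e2d\u6587')
from_bytes = to_gb2312(b'\xe4\xb8\xad\xe6\x96\x87')
print(from_text == from_bytes)
```

The caller still has to know the real encoding of any byte string passed in; the function cannot detect it.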
Now take a look at the following code:
    #!/usr/bin/env python
    #coding=utf-8
    #python version: 2.7.4
    #system: Windows XP
    import sys
    import httplib2

    def getpagecontent(url):
        '''
        Use httplib2 to fetch web content from a URL programmatically;
        the content is returned as bytes, to be converted into a utf-8 string
        '''
        # Use IE9's User-Agent; without a User-Agent the request gets 403 Forbidden
        headers = {'user-agent': 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)',
                   'cache-control': 'no-cache'}
        if url:
            response, content = httplib2.Http().request(url, headers=headers)
            # The status value was lost in the source text; 200 (HTTP OK) is assumed here
            if response.status == 200:
                return content

    reload(sys)
    sys.setdefaultencoding('utf-8')  # change the default encoding; the default is ascii
    print sys.getdefaultencoding()
    content = getpagecontent("http://www.oschina.net/")
    print content.decode('utf-8').encode('gb2312')
The code above requests the home page of www.oschina.net. Printed as utf-8, the Chinese text does not display, so I tried to convert the utf-8 content to gb2312, which produced problem 3.
When I changed print content.decode('utf-8').encode('gb2312') to print content.decode('utf-8').encode('gb2312', 'ignore'), it worked and the Chinese displayed. I am not sure all of it is shown, though; apparently only part is, because some characters cannot be encoded in gb2312.
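The effect of the 'ignore' argument can be seen on a short string. The sketch below uses the character U+2014 from the error message in problem 3; the sample string is made up for illustration. Encoding to GBK is included only as a possible alternative to try, since Python's gbk codec does accept this character; whether it suits a given console is a separate question.

```python
# -*- coding: utf-8 -*-
text = u'\u4e2d\u6587\u2014'  # two Chinese characters followed by U+2014

# Default (strict) error handling raises on the unencodable character.
try:
    text.encode('gb2312')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

ignored = text.encode('gb2312', 'ignore')    # silently drops U+2014
replaced = text.encode('gb2312', 'replace')  # substitutes '?' for U+2014
gbk_bytes = text.encode('gbk')               # gbk can encode U+2014

print(strict_failed)
print(repr(ignored))
print(repr(replaced))
```

With 'ignore' the output is shorter than the input and nothing signals the loss, which matches the suspicion above that only part of the page is displayed.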
However, when I changed the site to www.soso.com, the Chinese displayed correctly as utf-8, with no conversion to gb2312.
To summarize:
Writing ss directly to a file raises the same exception. When processing a Unicode Chinese string, you must first call encode to convert it to another encoding before outputting it. This is the same in every environment. In Python, a "str" object is just an array of bytes. Whether its contents form a valid string, and which encoding that string uses (gbk, utf-8, unicode), does not matter to Python; the user has to keep track of this and judge it. The same caveat applies to "unicode" objects: the contents of a "unicode" object are by no means guaranteed to be a valid Unicode string, as a few cases will quickly show. The Windows console supports gbk-encoded str objects and unicode objects.
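One such case, as a hedged sketch: decoding bytes with the wrong codec can succeed without any error and still produce text that is not what was meant. Latin-1 is used here only because it accepts every byte value; it does not appear in the original article.

```python
# -*- coding: utf-8 -*-
gb_bytes = u'\u4e2d\u6587'.encode('gb2312')  # b'\xd6\xd0\xce\xc4'

# Decoding with the wrong codec raises no error, but the result is
# four unrelated Latin characters rather than two Chinese ones.
wrong = gb_bytes.decode('latin-1')

# Decoding with the codec the bytes were actually written in recovers the text.
right = gb_bytes.decode('gb2312')

print(repr(wrong))
print(repr(right))
```

Python cannot tell these two results apart in terms of validity, which is why the encoding of each byte string has to be recorded by the programmer.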