Python encoding: [Decode error - output not utf-8]
Anyone who runs Python 2 from Sublime Text 2 knows how big a pitfall this error is. Python 3 strings are Unicode by default, while the default encoding of Python 2 is ASCII. I searched through a lot of material and summarize it here.
First, the role of each setting.
1. # -*- coding: UTF-8 -*-
This declares the encoding used by the source file. Without this declaration, Python 2 reports "SyntaxError: Non-ASCII character" when the source file contains Chinese characters.
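To see how the interpreter reads such a declaration, here is a minimal sketch using the standard library's tokenize.detect_encoding. It assumes Python 3, where this function is available; the helper name declared_encoding and the sample sources are made up for illustration. Note that without a declaration Python 3 falls back to UTF-8, whereas Python 2 falls back to ASCII, which is why Python 2 raises the SyntaxError described above.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding the tokenizer detects for this source (hypothetical helper)."""
    # detect_encoding reads at most the first two lines, looking for a coding declaration.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# Sample sources, invented for this sketch.
with_declaration = b"# -*- coding: gbk -*-\nprint('hi')\n"
without_declaration = b"print('hi')\n"

print(declared_encoding(with_declaration))
print(declared_encoding(without_declaration))
```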
2. sys.setdefaultencoding('utf-8')
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
This sets the default encoding to UTF-8. Because setdefaultencoding is deleted from the sys module during interpreter startup, sys has to be reloaded first, and only then can the default encoding be set again.
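As an alternative to changing the default, the conversion can be written out explicitly. The sketch below is not from the original setup; it only illustrates what the default encoding is used for. The byte string is an assumed sample (the UTF-8 bytes of the first two characters of the test string used later), and the code should run on both Python 2.7 and Python 3.

```python
# UTF-8 bytes for the two characters U+56FD U+5185 (sample data for this sketch).
raw = b"\xe5\x9b\xbd\xe5\x86\x85"

# Explicit decode: the result does not depend on the default encoding.
text = raw.decode("utf-8")
round_trip = text.encode("utf-8")

# This is roughly what Python 2 attempts implicitly when the default is ASCII.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print("%d bytes, %d characters, ascii failed: %s" % (len(raw), len(text), ascii_failed))
```

With explicit decode and encode calls at the boundaries, the reload(sys) trick is usually not needed.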
3. "default_encoding": "UTF-8"
In Preferences > Settings - Default in Sublime, "default_encoding": "UTF-8" sets the encoding used by the Sublime editor itself.
4. Settings in Python.sublime-build
{
    "cmd": ["python", "-u", "$file"],
    "path": "C:/Python27",
    "file_regex": "^[ ]*File \"(...*?)\", line ([0-9]*)",
    "selector": "source.python",
    "shell": "true",
    "encoding": "UTF-8"
}
This file controls how Python code is run from within Sublime. The "encoding": "UTF-8" entry tells Sublime to treat the output of the program as UTF-8.
I believe many people have configured things the same way I did, and with these settings Chinese text usually decodes without problems.
But then the trouble came.
While crawling zhihu, I found that the Chinese characters in the JSON string returned by a zhihu request are written as Unicode escapes, and when I printed them the error still appeared:
[Decode error - output not utf-8]
import requests
import ConfigParser
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
print sys.getdefaultencoding()
print sys.stdin.encoding
print sys.stdout.encoding
f = '\u56fd\u5185\u6709\u54ea\u4e9b\u51b7\u95e8\u4f46\u6709\u7279\u8272\u7684\u65c5\u6e38\u5730\u70b9\uff1f'
print(f.decode('unicode-escape'))
When I print the data like this, the decoding goes wrong. To turn the Unicode escapes into Chinese, you can use decode('unicode-escape').
However, [Decode error - output not utf-8] is displayed.
What hurts most is that it is inconsistent: sometimes the Chinese text prints normally ("What are some little-known but distinctive travel destinations in China?"), and sometimes the decode error is reported.
Meanwhile, the same script runs normally when started directly in a DOS window or in the PyDev environment of Eclipse.
So why does [Decode error - output not utf-8] appear?
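The conversion itself can be sketched in isolation. The snippet below assumes the escaped text arrives as plain ASCII bytes (shortened here to the first two characters of the string above) and shows two ways to turn it into a Unicode string: the unicode_escape codec, and json.loads, which may be the more natural choice when the data is JSON in the first place. It should run on both Python 2.7 and Python 3.

```python
import json

# Literal backslash-u sequences, as they would appear inside the JSON text.
escaped = br"\u56fd\u5185"

# Option 1: the unicode_escape codec interprets the \uXXXX sequences.
via_codec = escaped.decode("unicode_escape")

# Option 2: wrap the text in quotes and let the JSON parser handle the escapes.
via_json = json.loads('"' + escaped.decode("ascii") + '"')

print(via_codec == via_json)
```

Both results are Unicode strings; whether they can then be printed depends on the encoding of the output stream, which is a separate question.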
Note that I added two extra statements:
print sys.stdin.encoding
print sys.stdout.encoding
In Sublime Text 2 the output is None for both.
In Eclipse the output is UTF-8 and UTF-8.
In DOS the output is cp936 and cp936.
These two statements report the encodings of standard input and standard output. So when the script is run with Ctrl+B in Sublime, the encoding of the output stream is unknown.
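The difference between the environments can be reproduced outside any editor. The following sketch starts a child interpreter whose stdout is a pipe (which is presumably also how Sublime captures build output) and asks it for sys.stdout.encoding, once with the inherited environment and once with the PYTHONIOENCODING environment variable set. The helper name child_stdout_encoding is made up for illustration; PYTHONIOENCODING is a real interpreter setting that overrides the encoding of stdin, stdout, and stderr.

```python
import os
import subprocess
import sys


def child_stdout_encoding(extra_env=None):
    """Run a child interpreter with piped stdout and return its sys.stdout.encoding."""
    env = dict(os.environ)
    env.update(extra_env or {})
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
    )
    return out.decode("ascii").strip()


# With a pipe, Python 2 reports None; Python 3 reports a locale-dependent value.
inherited = child_stdout_encoding()
# Forcing the stream encoding through the environment.
forced = child_stdout_encoding({"PYTHONIOENCODING": "utf-8"})

print("inherited: %s, forced: %s" % (inherited, forced))
```

Setting this variable for the build may be one way to make the Ctrl+B output predictable, though I have not verified it in Sublime Text 2 itself.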
For the moment I have no good solution.
By chance, though, I found that when the script is written like this, the error is no longer reported.
import requests
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
print sys.getdefaultencoding()
print sys.stdin.encoding
print sys.stdout.encoding
f = '\u56fd\u5185\u6709\u54ea\u4e9b\u51b7\u95e8\u4f46\u6709\u7279\u8272\u7684\u65c5\u6e38\u5730\u70b9\uff1f'
print type(f.decode('unicode-escape'))
f = '\u56fd\u5185\u6709\u54ea\u4e9b\u51b7\u95e8\u4f46\u6709\u7279\u8272\u7684\u65c5\u6e38\u5730\u70b9\uff1f'
print(f.decode('unicode-escape'))
The printed type is 'unicode', and the decode that follows no longer reports an error. I still don't understand why, but Sublime is still pleasant to use once a few shortcut keys are set up.
If you don't want to deal with this, you can switch to another IDE.
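For anyone who stays with Sublime, one option is to stop relying on the stream's encoding altogether and encode the text by hand before printing. The sketch below is only an assumption about what might help, not a confirmed fix: the helper encode_for_stream and the stand-in class PipeLikeStream are invented for illustration, and the fallback of UTF-8 is chosen to match the "encoding": "UTF-8" entry in the build file.

```python
import sys


def encode_for_stream(text, stream=None, fallback="utf-8"):
    """Encode text with the stream's encoding, or with the fallback if it has none."""
    stream = sys.stdout if stream is None else stream
    # A piped stdout under Python 2 has encoding None, hence the fallback.
    encoding = getattr(stream, "encoding", None) or fallback
    return text.encode(encoding, "replace")


class PipeLikeStream(object):
    """Stand-in for a stdout whose encoding is unknown (hypothetical)."""
    encoding = None


encoded = encode_for_stream(u"\u56fd\u5185", PipeLikeStream())
print(repr(encoded))
```

In Python 2 the resulting byte string can be passed to print directly, so the bytes that reach Sublime are UTF-8 regardless of what sys.stdout.encoding reports.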