Use UTF-8 encoding in source code
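Add the comment # coding: UTF-8 or # -*- coding: UTF-8 -*- at the top of the source file. Python only recognizes the declaration on the first or second line, so it must come before the code, as shown below:
    # -*- coding: UTF-8 -*-
    __author__ = 'webcler'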
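As a minimal sketch of what the declaration makes possible, the file below contains a non-ASCII string literal. The names greeting and utf8_bytes are only illustrative and are not from the original article. The u prefix keeps the sketch working on both Python 2 and Python 3.

```python
# -*- coding: UTF-8 -*-
# With the declaration above, the non-ASCII literal below is read as UTF-8.
greeting = u'你好'

# Encoding the text back to UTF-8 gives the bytes stored in the file.
utf8_bytes = greeting.encode('utf-8')

print(len(greeting))    # 2 characters
print(len(utf8_bytes))  # 6 bytes, three per character in UTF-8
```

Without the declaration, Python 2 rejects a source file that contains non-ASCII bytes like these.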
Non-ASCII Encoding
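In Python 2, the default encoding is ascii, so implicit conversions between str and unicode fail on non-ASCII data. To process other encodings, you can set Python's default encoding to the one you need. (In Python 3 the default is utf-8 and sys.setdefaultencoding no longer exists, so the methods below apply to Python 2 only.) There are two ways to do this: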
1. Setting it in a specific script
    import sys
    # site.py removes setdefaultencoding at startup; reload(sys) restores it
    reload(sys)
    sys.setdefaultencoding('gb2312')
2. Global settings
Create a sitecustomize.py file in the Lib\site-packages folder of your Python installation. sitecustomize.py is a special file: Python tries to import it at startup, so its code runs before every program and the encoding is set automatically.
    import sys
    # No reload(sys) is needed here: setdefaultencoding is still available while sitecustomize runs
    sys.setdefaultencoding('gb2312')
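The location of the site-packages folder varies between installations, so it can help to ask the interpreter where it is. The following is a rough sketch using the standard sysconfig module. The variable names site_packages and target are illustrative, and on some systems the reported folder has a different name (for example dist-packages).

```python
import os
import sysconfig

# 'purelib' is the directory where pure-Python packages are installed,
# which is usually this interpreter's site-packages folder.
site_packages = sysconfig.get_paths()['purelib']

# Hypothetical target path for the sitecustomize.py file described above.
target = os.path.join(site_packages, 'sitecustomize.py')

print(target)
print(os.path.exists(target))  # whether such a file is already present
```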
3. Checking the current encoding
    import sys
    print sys.getdefaultencoding()
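To illustrate why the ascii default causes trouble, the sketch below decodes GB2312 bytes once with ascii and once with the correct codec. The helper try_decode is a hypothetical name introduced for this example. Naming the codec explicitly, as done here, is one alternative to changing the default. The code should run on both Python 2 and Python 3.

```python
import sys

# u'\u4e2d\u6587' is the word for "Chinese"; escapes keep this file pure ASCII.
text = u'\u4e2d\u6587'
raw = text.encode('gb2312')  # four bytes, as they would be read from a GB2312 file


def try_decode(data, encoding):
    """Return the decoded text, or None if data is not valid in encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


print(sys.getdefaultencoding())           # 'ascii' on Python 2, 'utf-8' on Python 3
print(try_decode(raw, 'ascii') is None)   # True: the bytes are outside ASCII
print(try_decode(raw, 'gb2312') == text)  # True: the right codec recovers the text
```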
Character encoding detection
The chardet library can be used to detect the encoding of strings and files.
1. chardet Installation
You can use the easy_install tool to quickly install chardet. The command is as follows:
    easy_install.exe chardet
2. Using chardet
You can call chardet's detect function directly to detect the encoding of a given byte string. It returns a dictionary that includes the confidence of the detection and the detected encoding.
    import urllib
    import chardet
    rawdata = urllib.urlopen('http://www.sina.com.cn/').read()
    print chardet.detect(rawdata)
    # result: {'confidence': 0.99, 'encoding': 'gb2312'}
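Once the encoding is known, a typical next step is to decode the bytes with it. The following is a minimal sketch, not part of the original article: decode_bytes and its fallback parameter are hypothetical names, and the import guard is there only so the sketch still runs (without detection) when chardet is not installed.

```python
try:
    import chardet  # third-party package, installed as described above
except ImportError:
    chardet = None  # the sketch then skips detection and uses the fallback


def decode_bytes(data, fallback='utf-8'):
    """Decode data using the encoding chardet reports, else the fallback."""
    encoding = None
    if chardet is not None:
        # detect() may report None when it cannot decide (e.g. empty input)
        encoding = chardet.detect(data)['encoding']
    try:
        return data.decode(encoding or fallback, 'replace')
    except LookupError:
        # chardet named a codec this Python installation does not know
        return data.decode(fallback, 'replace')


print(decode_bytes(b'hello world'))
```

Detection is statistical, so short inputs can be misidentified. For data that matters, checking the confidence value before trusting the result is a reasonable precaution.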