A few quick notes, written down so I don't forget them later:
1. The default encoding in Python 2 is ASCII
    In [1]: import sys
    In [2]: sys.getdefaultencoding()
    Out[2]: 'ascii'
2. Set the default encoding in Python
    In [1]: import sys
    In [2]: reload(sys)
    Out[2]: <module 'sys' (built-in)>
    In [3]: sys.setdefaultencoding('utf-8')
    In [4]: sys.getdefaultencoding()
    Out[4]: 'utf-8'
3. The encoding declaration at the top of a Python file, # _*_ coding:utf-8 _*_, does not change Python's default encoding
    #!/usr/bin/env python
    # _*_ coding:utf-8 _*_
    import sys
    print sys.getdefaultencoding()
Running this still prints ascii.
So what is the encoding declaration at the top of the file actually for?
#1. The declaration is required if the source contains non-ASCII characters, for example Chinese comments
#2. More capable editors (such as my Emacs) read the header declaration and treat the file as being in that encoding
#3. The interpreter decodes the source file using the header declaration when it builds Unicode objects from literals such as u"人生苦短" ("Life is too short"), so the header declaration must match the encoding the file is actually saved in
The points above come from this article: http://python.jobbole.com/81244/
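Point #3 can be illustrated with a small sketch. It is written for Python 3, where bytes and text are separate types and the decoding step is explicit. It only simulates what the interpreter does with a source file, and the variable names are invented for the illustration.

```python
# Python 3 sketch (assumption: simulates how the bytes of a source file are decoded).
literal = "人生苦短"  # "Life is too short"

# Pretend the editor saved the file as GBK.
saved_bytes = literal.encode("gbk")

# If the header also says GBK, decoding recovers the original text.
matching = saved_bytes.decode("gbk")

# If the header says UTF-8 instead, decoding the same bytes fails.
try:
    saved_bytes.decode("utf-8")
    mismatch_error = None
except UnicodeDecodeError as exc:
    mismatch_error = exc

print(matching == literal)
print(type(mismatch_error).__name__)
```

The same bytes only make sense under the codec they were written with, which is why the declaration and the saved encoding have to agree.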
Let's do a test:
    #!/usr/bin/env python
    # _*_ coding:utf-8 _*_
    import sys
    print sys.getdefaultencoding()
    #reload(sys)
    #sys.setdefaultencoding('utf-8')
    # a unicode object (the text is Chinese for "this is a test 1")
    s1 = u"这是一个测试1"
    # a byte string (str); Python treats it as ASCII when it has to decode it implicitly
    s2 = "这是一个测试2"
    s1.encode('gbk')
    s2.encode('gbk')
    print s1
    print s2
The test output:
    ascii
    Traceback (most recent call last):
      File "testunicoding.py", line, in <module>
        s2.encode('gbk')
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 0: ordinal not in range(128)
The culprit is s2. It is a plain byte string (str), so calling encode() on it makes Python first decode it to Unicode using the default encoding, ASCII. Its bytes are not ASCII, so that implicit decode fails.
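The implicit decode step can be made visible with a short sketch. It is written for Python 3, where the decode has to be spelled out, so it only simulates what Python 2 does behind the scenes. The byte string is assumed to be UTF-8 text starting with byte 0xe8, as in the traceback, and the variable names are illustrative.

```python
# Python 3 sketch (assumption: s2 stands in for a Python 2 byte string from a UTF-8 source file).
s2 = "这是一个测试2".encode("utf-8")

# Python 2's str.encode() first decodes with the default codec (ASCII); simulate that step.
try:
    s2.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Decoding explicitly with the real encoding works without touching the default encoding.
gbk_bytes = s2.decode("utf-8").encode("gbk")

print(ascii_error)
print(gbk_bytes.decode("gbk"))
```

Naming the source encoding explicitly, as in the last step, is an alternative to changing the interpreter-wide default.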
After changing the default encoding to UTF-8:
    #!/usr/bin/env python
    # _*_ coding:utf-8 _*_
    import sys
    print sys.getdefaultencoding()
    reload(sys)
    sys.setdefaultencoding('utf-8')
    print sys.getdefaultencoding()
    # a unicode object (the text is Chinese for "this is a test 1")
    s1 = u"这是一个测试1"
    # a byte string (str); now decoded implicitly as UTF-8 instead of ASCII
    s2 = "这是一个测试2"
    s1.encode('gbk')
    s2.encode('gbk')
    print s1
    print s2
Execution result:
    ascii
    utf-8
    这是一个测试1
    这是一个测试2
How do you convert GBK-encoded text to UTF-8 in Python?
    # s is a byte string assumed to hold GBK-encoded text
    s = "Test"
    s.decode('gbk').encode('utf-8')
That is, first decode s from GBK into Unicode, then encode the Unicode object as UTF-8.
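The same two-step conversion, sketched for Python 3 where the byte string and the text are distinct types. The sample text and variable names are assumptions made for the illustration.

```python
# Python 3 sketch: transcode GBK bytes to UTF-8 bytes by going through text (Unicode).
text = "测试"  # "test"

gbk_bytes = text.encode("gbk")         # what a GBK-encoded input would contain
unicode_text = gbk_bytes.decode("gbk")  # step 1: GBK bytes -> Unicode text
utf8_bytes = unicode_text.encode("utf-8")  # step 2: Unicode text -> UTF-8 bytes

print(len(gbk_bytes), len(utf8_bytes))
print(utf8_bytes.decode("utf-8") == text)
```

Unicode is the intermediate form here: there is no direct GBK to UTF-8 step, only a decode followed by an encode.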