Because Chinese text needs multi-byte encodings, all sorts of encoding problems come up when working with Python2 and Python3. If the relationships between the encodings involved are not clear, this remains a big pit to fall into. To avoid being left baffled, this memo works through the situations encountered so far and sorts out how the encodings relate and differ between Python2 and Python3.
First of all, encoding comes into play in several places:
- Script character encoding: the declaration often seen at the top of a script file, # -*- coding: utf-8 -*-. Without an explicit declaration, Python2 defaults to ASCII and Python3 defaults to UTF-8;
- Interpreter character encoding: can be viewed with the function sys.getdefaultencoding(). Python2 defaults to ascii and Python3 defaults to utf-8;
- Script file storage encoding: the format in which the .py script file itself is stored on the physical medium, commonly ASCII, GBK, UTF-8 and so on.
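As a rough illustration of the first two items, the sketch below asks the interpreter for its default encoding and uses the standard-library tokenize.detect_encoding to report which encoding a script declares. It assumes Python3; the helper name declared_encoding and the sample byte strings are made up for this example.

```python
import io
import sys
import tokenize


def declared_encoding(source_bytes):
    """Return the source encoding Python 3 would use for these bytes."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# Hypothetical script contents, built in memory instead of read from disk.
with_gbk_header = b"#coding:gbk\nimport sys\n"
with_utf8_header = b"# -*- coding: utf-8 -*-\nimport sys\n"
without_header = b"import sys\n"

print(sys.getdefaultencoding())             # interpreter character encoding
print(declared_encoding(with_gbk_header))   # explicit GBK declaration
print(declared_encoding(with_utf8_header))  # explicit UTF-8 declaration
print(declared_encoding(without_header))    # no declaration: Python 3 default
```

Neither call says anything about the third item: the storage encoding is only a property of the bytes on disk, which is why a mismatch shows up only when the file is decoded.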
Let's use Python2.6 and Python3.4 to see what actually happens when the encodings above are combined in a script.
1. Default script file encoding + file storage using GBK
Script content:
    import sys
    print(sys.getdefaultencoding())
    print('中文')
Running it with Python2.6 fails, reporting the GBK-encoded byte \xd6 as a non-ASCII character:
    > python26 test_gbk.py
      File "test_gbk.py", line 4
    SyntaxError: Non-ASCII character '\xd6' in file test_gbk.py on line 4, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
Running it with Python3.4 also fails, reporting the GBK-encoded byte \xd6 as not being valid UTF-8:
    > python34 test_gbk.py
      File "test_gbk.py", line 4
    SyntaxError: Non-UTF-8 code starting with '\xd6' in file test_gbk.py on line 4, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
Conclusion: Chinese text stored as GBK is recognized neither by the Python2 default (ASCII) nor by the Python3 default (UTF-8). ASCII contains no Chinese characters at all, and UTF-8 encodes these characters in 3 bytes while GBK uses 2 bytes, so the GBK byte sequences are not valid UTF-8.
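The byte-level difference behind this conclusion can be checked directly. The following is a small sketch for Python3; the helper can_decode is a name invented for this example.

```python
text = "中文"

gbk_bytes = text.encode("gbk")
utf8_bytes = text.encode("utf-8")

print(len(gbk_bytes), gbk_bytes)    # GBK: 2 bytes per character
print(len(utf8_bytes), utf8_bytes)  # UTF-8: 3 bytes per character


def can_decode(data, encoding):
    """Return True if the bytes are valid in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# The GBK bytes are rejected by both default source encodings.
print(can_decode(gbk_bytes, "ascii"))
print(can_decode(gbk_bytes, "utf-8"))
print(can_decode(gbk_bytes, "gbk"))
```

The first GBK byte is 0xd6, which is the byte named in both error messages above.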
2. Script file encoding GBK + file storage using GBK
The script explicitly declares GBK in its header:
    #coding:gbk
    import sys
    print(sys.getdefaultencoding())
    print('中文')
Result of running with Python2.6:
    > python26 test_gbk.py
    ascii
    中文
Result of running with Python3.4:
    > python34 test_gbk.py
    utf-8
    中文
Conclusion: When the file is stored as GBK and the script file encoding is explicitly declared as gbk, both Python2 and Python3 handle it normally.
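The matching case can be reproduced without touching the disk, since the built-in compile() accepts bytes and applies the coding declaration it finds in them. This is only a sketch for Python3; the in-memory source stands in for test_gbk.py.

```python
import contextlib
import io

# Hypothetical in-memory copy of test_gbk.py: GBK header, GBK-encoded body.
source = "#coding:gbk\nprint('中文')\n".encode("gbk")

# compile() reads the coding declaration and decodes the bytes with it.
code = compile(source, "test_gbk.py", "exec")

# Capture what the script prints so it can be inspected as text.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    exec(code)

output = captured.getvalue()
print(output)
```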
3. Script file encoding Utf-8 + file storage using GBK
The script explicitly declares UTF-8 in its header:
    # -*- coding: utf-8 -*-
    import sys
    print(sys.getdefaultencoding())
    print('中文')
The result of running with Python2.6 is normal:
    > python26 test_gbk.py
    ascii
    中文
Running it with Python3.4 fails with an exception when it tries to decode the byte 0xd6 as UTF-8:
    > python34 test_gbk.py
      File "test_gbk.py", line 6
    SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xd6 in position 0: invalid continuation byte
Conclusion: When the file is stored as GBK but explicitly declared as UTF-8, Python2 still prints correctly on the Windows platform, because the bytes of the string literal are output unchanged and the console displays them as GBK. Python3 decodes the source as UTF-8 and raises an exception.
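The two behaviours can be imitated in Python3 with plain byte strings. This is a sketch, not a Python2 run: the "Python2 side" below only emulates bytes being passed through to a console that assumes GBK.

```python
# GBK bytes of the literal, as they sit in the file on disk.
literal_bytes = "中文".encode("gbk")

# Python 3 side: the declared encoding (UTF-8) is applied and fails.
try:
    literal_bytes.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc
    print(exc)

# Python 2 side (emulated): the bytes are written out untouched, and a
# console that assumes GBK turns them back into the right characters.
shown_on_gbk_console = literal_bytes.decode("gbk")
print(shown_on_gbk_console)
```

The decode error carries the same reason as the Python3.4 traceback: 0xd6 announces a two-byte UTF-8 sequence, but the byte after it is not a continuation byte.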
4. Default script file encoding + file storage using Utf-8
Remove the header declaration from the previous script and store the file as UTF-8. (Note: do not force a change of storage encoding on the existing file, because the Chinese text can end up garbled; it is recommended to create a new UTF-8 file and then type in the Chinese.)
    import sys
    print(sys.getdefaultencoding())
    print('中文')
Running it with Python2.6 fails, since ASCII does not recognize the UTF-8 byte \xe4 either:
    > python26 test.py
      File "test.py", line 4
    SyntaxError: Non-ASCII character '\xe4' in file test.py on line 4, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
Python3.4 runs it normally, because Python3 uses UTF-8 by default:
    > python34 test.py
    utf-8
    中文
Conclusion: Chinese text stored as UTF-8 with no declaration is read as ASCII by Python2 and therefore not recognized; Python3 recognizes it normally.
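Regarding the note about changing the storage encoding: when retyping the file is not practical, one possible approach is to decode the file with the encoding it is actually stored in and write the text back out in the new one. The sketch below assumes Python3 and a file that really is GBK; convert_encoding and the file names are hypothetical.

```python
import os
import tempfile


def convert_encoding(src_path, dst_path, src_encoding="gbk", dst_encoding="utf-8"):
    """Decode a file with its real encoding, then write it in the new one."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)


# Demo with throwaway files; the names are made up for this sketch.
workdir = tempfile.mkdtemp()
gbk_path = os.path.join(workdir, "test_gbk.py")
utf8_path = os.path.join(workdir, "test.py")

with open(gbk_path, "wb") as f:
    f.write("print('中文')\n".encode("gbk"))

convert_encoding(gbk_path, utf8_path)

with open(utf8_path, "rb") as f:
    converted = f.read()
print(converted)
```

Garbling typically happens when the bytes are relabelled without being decoded first, so the key step is reading with the source encoding stated explicitly.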
5. Script file encoding GBK + file storage using Utf-8
The script header explicitly declares GBK, while the file is stored as UTF-8:
    #coding:gbk
    import sys
    print(sys.getdefaultencoding())
    print('中文')
Running it with Python2.6 fails, because the UTF-8 content cannot be read as GBK:
    > python26 test.py
      File "test.py", line 6
    SyntaxError: 'gbk' codec can't decode bytes in position 9-10: illegal multibyte sequence
Running it with Python3.4 fails for the same reason, with a more direct message:
    > python34 test.py
      File "test.py", line 1
    SyntaxError: encoding problem: gbk
Conclusion: When Chinese text is stored as UTF-8 but the script explicitly asks for it to be read as GBK, neither Python2 nor Python3 can handle it.
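The position reported in the Python2.6 traceback can be traced back to the bytes. A sketch for Python3, decoding only the print line of the script by hand:

```python
# The print line of test.py as stored on disk in UTF-8.
line = "print('中文')".encode("utf-8")

try:
    line.decode("gbk")
    failure = None
except UnicodeDecodeError as exc:
    failure = exc
    # Codec name, offset of the offending byte, and the byte itself.
    print(exc.encoding, exc.start, hex(line[exc.start]))
```

The first two UTF-8 bytes of the literal happen to form a valid GBK pair, so the failure only appears at offset 9, in line with the "position 9-10" in the traceback.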
6. Script file encoding Utf-8 + file storage using Utf-8
The script header explicitly declares UTF-8, and the file is stored as UTF-8:
    # -*- coding: utf-8 -*-
    import sys
    print(sys.getdefaultencoding())
    print('中文')
Running it with Python2.6 gives the result below. The file is read correctly, but on Windows the UTF-8 bytes of the Chinese text are interpreted as GBK on output, so the output is garbled:
    > python26 test.py
    ascii
    涓枃
The result of running with Python3.4 is normal:
    > python34 test.py
    utf-8
    中文
Conclusion: Even though the storage encoding and the declared script encoding are both UTF-8, on the Windows platform the Chinese text printed by Python2 is interpreted as GBK, so the output is garbled. This can be solved by prefixing the literal with u, as in u'中文', or by explicitly decoding the string with UTF-8.
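Both the symptom and the decode-based fix can be imitated with byte strings in Python3. This is a sketch of the idea only; in Python2 the equivalent fix would be the u prefix or a call to decode on the byte string.

```python
# What Python 2 holds for '中文' in a UTF-8 file: six raw bytes.
raw = "中文".encode("utf-8")

# Symptom: a GBK console pairs the bytes up differently.
garbled = raw.decode("gbk", errors="replace")

# Fix: decode with the encoding the bytes were written in, so the
# program holds real text and the output layer picks the byte encoding.
fixed = raw.decode("utf-8")

print(repr(garbled))
print(fixed)
```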
Summarizing the verification results gives the following table:
| Results of Python3 and Python2 under different combinations | Python3 | Python2 |
|---|---|---|
| Default script file encoding + file storage using GBK | SyntaxError, parsing error | SyntaxError, parsing error |
| Script file encoding GBK + file storage using GBK | Normal Chinese output | Normal Chinese output |
| Script file encoding Utf-8 + file storage using GBK | SyntaxError, parsing error | Normal Chinese output |
| Default script file encoding + file storage using Utf-8 | Normal Chinese output | SyntaxError, parsing error |
| Script file encoding GBK + file storage using Utf-8 | SyntaxError, parsing error | SyntaxError, encoding error |
| Script file encoding Utf-8 + file storage using Utf-8 | Normal Chinese output | Garbled Chinese output |
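The Python3 column of the table can be approximated in a few lines. The sketch below runs on Python3 and emulates only the decoding step (find the declared encoding, then decode the stored bytes with it); it does not reproduce the Python2 column, which would need a Python2 interpreter and a GBK console. The names HEADERS, BODY and python3_outcome are invented for this example.

```python
import io
import tokenize

BODY = "import sys\nprint(sys.getdefaultencoding())\nprint('中文')\n"
HEADERS = {
    None: "",                              # no declaration
    "gbk": "#coding:gbk\n",
    "utf-8": "# -*- coding: utf-8 -*-\n",
}


def python3_outcome(declared, storage):
    """Emulate how Python 3 reads a script with this declaration and storage."""
    source = (HEADERS[declared] + BODY).encode(storage)
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    try:
        source.decode(encoding)
        return "ok"
    except UnicodeDecodeError:
        return "error"


for storage in ("gbk", "utf-8"):
    for declared in (None, "gbk", "utf-8"):
        print(declared or "default", storage, python3_outcome(declared, storage))
```

Only the rows where the effective declaration matches the storage encoding come out as "ok".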
Final conclusions:
- If you use Python2 (on Windows, as in these tests), be sure to store files in GBK format;
- If you use Python2 and store files as GBK, also explicitly declare the script file encoding as GBK, which makes later compatibility with Python3 easier;
- If you use Python3, then whatever format the files are stored in, make sure the explicitly declared script file encoding matches the storage format;
- Whether you use Python2 or Python3, keep the good habit of explicitly declaring the script file encoding;
- If the script has cross-platform requirements, the recommended combination is Python3 + script file encoding Utf-8 + files stored in Utf-8 format.