Get a thorough understanding of Python encoding

Source: Internet
Author: User

Chinese text needs special multi-byte encodings, which leads to all kinds of encoding problems when working with Python 2 and Python 3. If the relationships among the encodings involved are not clear, this becomes a big pit that is easy to fall into. To avoid being baffled by it again, this memo works through the situations encountered so far and sorts out how Python 2 and Python 3 relate and differ in their handling of encodings.

First of all, an encoding format is involved in several places:

    1. Script character encoding: the declaration often seen at the top of a script file, # -*- coding: utf-8 -*-. Without an explicit declaration, Python 2 assumes ASCII and Python 3 assumes UTF-8;
    2. Interpreter character encoding: can be viewed with the sys.getdefaultencoding() function. The default is ascii in Python 2 and utf-8 in Python 3;
    3. Script file storage encoding: the format in which the .py file itself is stored on the physical medium, usually ASCII, GBK, UTF-8, or a similar format.
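
As a rough illustration of the first two items, the following Python 3 sketch prints the interpreter's default encoding and uses the standard tokenize.detect_encoding() function to report which encoding the first lines of a script declare. The helper name declared_encoding and the sample byte strings are made up for this example.

```python
import io
import sys
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding Python 3 would use to decode this source."""
    # detect_encoding() looks at no more than the first two lines for a BOM
    # or a PEP 263 "coding" comment; without either it returns 'utf-8'.
    readline = io.BytesIO(source_bytes).readline
    encoding, _first_lines = tokenize.detect_encoding(readline)
    return encoding


print(sys.getdefaultencoding())                                      # interpreter encoding
print(declared_encoding(b"#coding:gbk\nimport sys\n"))               # explicit GBK
print(declared_encoding(b"# -*- coding: utf-8 -*-\nimport sys\n"))   # explicit UTF-8
print(declared_encoding(b"import sys\n"))                            # no declaration
```

The third item, the storage encoding, is not visible from the declaration at all; it is a property of the bytes on disk, which is why the two can disagree.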

Let's use Python 2.6 and Python 3.4 to see what actually happens when these encodings are combined in a script.

1. Default script file encoding + file storage using GBK

Script content:

import sys

print(sys.getdefaultencoding())
print('中文')

Running it with Python 2.6 gives the following result. The GBK-encoded byte \xd6 is reported as a non-ASCII character:

> python26 test_gbk.py
  File "test_gbk.py", line 4
SyntaxError: Non-ASCII character '\xd6' in file test_gbk.py on line 4, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

Running it with Python 3.4 gives the following result. The GBK-encoded byte \xd6 is reported as not being valid UTF-8:

> python34 test_gbk.py
  File "test_gbk.py", line 4
SyntaxError: Non-UTF-8 code starting with '\xd6' in file test_gbk.py on line 4, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

Conclusion: Chinese text stored as GBK with no declaration is recognized by neither the Python 2 default (ASCII) nor the Python 3 default (UTF-8). ASCII contains no Chinese characters at all, and UTF-8 encodes these Chinese characters in 3 bytes each while GBK uses 2 bytes each, so the GBK bytes are not valid UTF-8.
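
The byte-level difference can be checked directly. The following Python 3 sketch, added here as an illustration and not part of the original test, encodes the same two characters both ways and then tries to decode the GBK bytes with each default encoding.

```python
text = "中文"

gbk_bytes = text.encode("gbk")       # 2 bytes per character for this text
utf8_bytes = text.encode("utf-8")    # 3 bytes per character for this text
print(gbk_bytes, len(gbk_bytes))
print(utf8_bytes, len(utf8_bytes))

# Neither default can decode the GBK bytes; both stop at the first byte, 0xd6.
failures = {}
for default in ("ascii", "utf-8"):
    try:
        gbk_bytes.decode(default)
    except UnicodeDecodeError as exc:
        failures[default] = exc.reason
        print(default, "->", exc.reason)
```

The first GBK byte is 0xd6 and the first UTF-8 byte is 0xe4, which are the bytes named in the error messages of the test cases.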

2. Script file encoding GBK + file storage using GBK

The script header now explicitly declares the GBK format:

#coding:gbk

import sys

print(sys.getdefaultencoding())
print('中文')

Result of running with Python 2.6:

> python26 test_gbk.py
ascii
中文

Result of running with Python 3.4:

> python34 test_gbk.py
utf-8
中文

Conclusion: When the file is stored as GBK and the script explicitly declares its encoding as gbk, both Python 2 and Python 3 handle it normally.
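
The same behaviour can be observed from inside Python 3. In this minimal sketch, which is an illustration and not part of the original test, a small script is built as GBK bytes in memory and handed to the built-in compile(), which honours the coding declaration on the first line. The variable names and the one-line script body are assumptions made for the example.

```python
# A small script as raw GBK bytes, the way it would be stored on disk.
source_bytes = "#coding:gbk\ntext = '中文'\n".encode("gbk")

# compile() accepts bytes and applies the declared encoding before parsing,
# so the GBK-encoded literal is decoded into the intended characters.
namespace = {}
exec(compile(source_bytes, "test_gbk.py", "exec"), namespace)

print(namespace["text"] == "中文")
```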

3. Script file encoding UTF-8 + file storage using GBK

The script header now explicitly declares the UTF-8 format, while the file is still stored as GBK:

# -*- coding: utf-8 -*-

import sys

print(sys.getdefaultencoding())
print('中文')

Running it with Python 2.6 works normally:

> python26 test_gbk.py
ascii
中文

Running it with Python 3.4 gives the following result. An exception is raised when it tries to decode the byte 0xd6 as UTF-8:

> python34 test_gbk.py
  File "test_gbk.py", line 6
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xd6 in position 0: invalid continuation byte

Conclusion: When the file is stored as GBK but the script explicitly declares UTF-8, Python 2 still prints correctly on the Windows platform because the output there is displayed as GBK, while Python 3 decodes the source as UTF-8 and raises an exception.

4. Default script file encoding + file storage using UTF-8

Remove the header declaration from the previous script and store the file in UTF-8 format. (Note: do not force the existing file into a different storage encoding, because the Chinese text will become garbled. It is better to create a new UTF-8 file and type the Chinese text again.)

import sys

print(sys.getdefaultencoding())
print('中文')

Running it with Python 2.6 gives the following result. ASCII does not recognize the UTF-8 byte \xe4 either:

> python26 test.py
  File "test.py", line 4
SyntaxError: Non-ASCII character '\xe4' in file test.py on line 4, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

With Python 3.4 the script runs normally, because Python 3 uses UTF-8 by default:

> python34 test.py
utf-8
中文

Conclusion: For Chinese text stored as UTF-8 with no declaration, Python 2 reads the file as ASCII by default and cannot recognize it, while Python 3 recognizes it normally.

5. Script file encoding GBK + file storage using UTF-8

The script header explicitly declares the GBK format, but the file is stored as UTF-8:

#coding:gbk

import sys

print(sys.getdefaultencoding())
print('中文')

Running it with Python 2.6 gives the following result. The UTF-8 content cannot be read as GBK:

> python26 test.py
  File "test.py", line 6
SyntaxError: 'gbk' codec can't decode bytes in position 9-10: illegal multibyte sequence

Python 3.4 fails for the same reason, but the message is more direct:

> python34 test.py
  File "test.py", line 1
SyntaxError: encoding problem: gbk

Conclusion: For Chinese text stored as UTF-8, neither Python 2 nor Python 3 can read the script if it explicitly declares GBK.

6. Script file encoding UTF-8 + file storage using UTF-8

The script header explicitly declares the UTF-8 format and the file is stored as UTF-8:

# -*- coding: utf-8 -*-

import sys

print(sys.getdefaultencoding())
print('中文')

Running it with Python 2.6 gives the following result. The file is read correctly, but on a Windows system the output of Python 2 is decoded as GBK by default, so the Chinese text comes out garbled:

> python26 test.py
ascii
涓枃

Running it with Python 3.4 works normally:

> python34 test.py
utf-8
中文

Conclusion: Even though both the storage encoding and the declared script encoding are UTF-8, on the Windows platform the Chinese output of Python 2 is interpreted as GBK, so it appears garbled. This can be solved by prefixing the Chinese string with u, as in u'中文', or by explicitly decoding it from UTF-8.
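
The mismatch and the fix can be modelled in Python 3. This sketch is only an approximation of the situation above: it assumes that the Python 2 byte string holds the UTF-8 bytes of the literal and that the Windows console interprets whatever it receives as GBK. The variable names are made up for the example.

```python
text = "中文"
utf8_bytes = text.encode("utf-8")    # what the Python 2 byte string would hold

# A console that interprets these bytes as GBK cannot show the original text.
# errors="replace" is used only so that the sketch does not raise.
shown = utf8_bytes.decode("gbk", errors="replace")
print(shown == text)

# Decoding from UTF-8 first recovers the characters, which can then be
# encoded to the encoding the console expects.
recovered = utf8_bytes.decode("utf-8")
console_bytes = recovered.encode("gbk")
print(recovered == text, console_bytes)
```

Both suggested fixes amount to the same thing: the text is held as Unicode characters, not as UTF-8 bytes, before it is written out.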

Summarizing the verification results gives the following table:

Results of Python 3 and Python 2 under different combinations

Combination                                             | Python 3                   | Python 2
--------------------------------------------------------+----------------------------+---------------------------
Default script file encoding + file storage using GBK   | SyntaxError, parsing error | SyntaxError, parsing error
Script file encoding GBK + file storage using GBK       | Normal Chinese output      | Normal Chinese output
Script file encoding UTF-8 + file storage using GBK     | SyntaxError, parsing error | Normal Chinese output
Default script file encoding + file storage using UTF-8 | Normal Chinese output      | SyntaxError, parsing error
Script file encoding GBK + file storage using UTF-8     | SyntaxError, parsing error | SyntaxError, coding error
Script file encoding UTF-8 + file storage using UTF-8   | Normal Chinese output      | Garbled Chinese output
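
The Python 3 column can be approximated with a small model. The sketch below assumes that reading a script amounts to decoding the stored bytes with the declared encoding, or with UTF-8 when nothing is declared; the function name python3_outcome and the sample line are hypothetical, and the real interpreter reports the failure as a SyntaxError, not a UnicodeDecodeError.

```python
SAMPLE_LINE = "print('中文')"


def python3_outcome(declared, stored):
    """Model Python 3 reading a script stored as `stored` and declared as `declared`."""
    source_bytes = SAMPLE_LINE.encode(stored)
    try:
        # Without a declaration, Python 3 assumes UTF-8.
        source_bytes.decode(declared or "utf-8")
    except UnicodeDecodeError:
        return "SyntaxError"
    return "normal"


# (declared encoding, storage encoding), in the same order as the six tests.
cases = [
    (None, "gbk"), ("gbk", "gbk"), ("utf-8", "gbk"),
    (None, "utf-8"), ("gbk", "utf-8"), ("utf-8", "utf-8"),
]
for declared, stored in cases:
    print(declared or "default", "+", stored, "->", python3_outcome(declared, stored))
```

The Python 2 column cannot be modelled this way, because there the outcome also depends on how the console interprets the printed bytes.
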

Final conclusions:
    1. If you use Python 2 on Windows, as in the tests above, be sure to store files in GBK format;
    2. If you use Python 2, store files as GBK and explicitly declare the script file encoding as GBK, which eases later compatibility with Python 3;
    3. If you use Python 3, whichever format the file is stored in, be sure to explicitly declare a script file encoding that matches the storage format;
    4. Whether you use Python 2 or Python 3, keep the good habit of explicitly declaring the script file encoding;
    5. If the script has cross-platform requirements, the recommended combination is Python 3 + script file encoding UTF-8 + UTF-8 file storage.
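
One way to keep the declaration and the storage format consistent is to write the file from Python with an explicit encoding and check it afterwards. The following Python 3 sketch is one possible approach and not the method used in the tests above; the script text and the temporary file are placeholders. It relies on the standard tokenize.open() function, which opens a source file using the encoding found in its coding declaration.

```python
import os
import tempfile
import tokenize

SCRIPT = "# -*- coding: utf-8 -*-\nprint('中文')\n"

# Store the script in the same encoding that its header declares.
fd, path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as handle:
    handle.write(SCRIPT)

# tokenize.open() decodes the file according to its coding declaration,
# so the text comes back unchanged when storage and declaration agree.
with tokenize.open(path) as handle:
    detected = handle.encoding
    round_tripped = handle.read()
os.remove(path)

print(detected, round_tripped == SCRIPT)
```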
