Python default Character Set
This article briefly introduces the history of the character set Python uses to parse source files, and the ways to configure it.
Background: When writing scripts, you will inevitably need variables whose content contains Chinese characters. For a Python beginner (myself included), configuring Python to correctly recognize Chinese content in a program is a headache. This article briefly describes how to configure the Python character set, along with some historical background.
Python default Character Set
The default character set of Python has changed across several major versions. The defaults for each version are:
- Python 2.1 and earlier: Latin-1
- Python 2.3 and later, before Python 2.5: Latin-1 (but a warning is issued when the source contains non-ASCII characters)
- Python 2.5 and later: ASCII
In addition, a PEP proposed changing the default character set to UTF-8 in later versions (PEP 3120, which took effect in Python 3.0).
How to configure the default character set (before Python2.5)
Before 2.5, configuring the default character set used to parse the current Python script file is awkward, because the oldest of these versions do not support an encoding declaration placed next to the shebang (PEP 263 introduced it in Python 2.3). Although versions earlier than 2.5 are out of date, it is still worth describing how the character set is configured in them. The underlying mechanism is the sys.setdefaultencoding() function, which sets the default encoding used for implicit string conversions. The catch is that site.py (a script that runs automatically when Python starts) deletes this function from the sys module. As a result, the following workarounds circulate on the Internet:
- Call reload(sys) so that sys.setdefaultencoding() becomes available again
- Modify sitecustomize.py to configure the global default character set
Both methods merely work; neither is elegant. For more detailed instructions, refer to the discussion on Stack Overflow.
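The first workaround can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name set_default_encoding is hypothetical, and the version guard is added here so that the snippet also runs harmlessly on Python 3, where sys.setdefaultencoding() does not exist and the default is already UTF-8.

```python
import sys


def set_default_encoding(name="utf-8"):
    """Hypothetical helper: apply the reload(sys) workaround on Python 2 only."""
    if sys.version_info[0] == 2:
        # site.py removed sys.setdefaultencoding at startup;
        # reloading the sys module brings the attribute back.
        reload(sys)  # builtin in Python 2 only, never reached on Python 3
        sys.setdefaultencoding(name)
    # Report whatever default encoding is now in effect.
    return sys.getdefaultencoding()


print(set_default_encoding("utf-8"))
```

For the second workaround, the same sys.setdefaultencoding() call would be placed in sitecustomize.py, which Python 2 imports at startup before the function is removed, so no reload is needed there.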
How to configure the default character set (Python2.5 and later)
From Python 2.5 onward, configuring the default character set is much simpler. Just add a character set declaration line right after the shebang (that is, the #!/usr/bin/python line); the declaration must appear on the first or second line of the file. The declaration line must match the regular expression coding[:=]\s*([-\w.]+). In other words, all of the following forms take effect:
#!/usr/bin/python
# coding=utf8
Or
#!/usr/bin/python
# -*- coding: utf8 -*-
Or
#!/usr/bin/python
# vim: set fileencoding=<encoding name> :
All of these can work.
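To see why all three forms are accepted, the regular expression quoted above can be applied to each line directly. The following is a minimal sketch, not the interpreter's actual implementation; the function name declared_encoding is hypothetical, and latin-1 stands in for the <encoding name> placeholder.

```python
import re

# The pattern quoted above for encoding declaration lines.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def declared_encoding(line):
    """Return the encoding name declared on a comment line, or None."""
    match = CODING_RE.search(line)
    return match.group(1) if match else None


samples = [
    "#!/usr/bin/python",
    "# coding=utf8",
    "# -*- coding: utf8 -*-",
    "# vim: set fileencoding=latin-1 :",
]
for line in samples:
    print(repr(line), "->", declared_encoding(line))
```

The vim form matches because the word fileencoding ends in "coding", which is all the pattern looks for; the shebang line itself contains no match and yields None.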