Source file default encoding and Python's internal default encoding
1. In Python 2, a source file is assumed to be ASCII by default. If you do not declare which encoding the file is written in, Python parses it as ASCII, and any UTF-8 encoded non-ASCII text in the file causes an error, because the ASCII codec cannot decode those bytes.
File test.py, saved as UTF-8:
a = 'a'
b = '好'
Running it gives:
SyntaxError: Non-ASCII character '\xe5' in file test.py on line 2, but no encoding declared; see http: for details
The error says that the character '\xe5' is non-ASCII: '\xe5' is the first byte of the UTF-8 byte string for '好' ("good").
As shown below, in a terminal whose encoding is UTF-8, '好' is this byte string:
>>> a = '好'
>>> a
'\xe5\xa5\xbd'
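The failure can be reproduced with explicit byte strings. The sketch below is illustrative, not the author's code: it writes '好' as the escape u'\u597d' so that it runs unchanged on Python 2.7 and Python 3, and the variable names are made up for the example.

```python
text = u'\u597d'                      # the character 好
utf8_bytes = text.encode('utf-8')     # its UTF-8 form is three bytes long

try:
    utf8_bytes.decode('ascii')        # what an ASCII-only parser would attempt
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False                  # 0xe5 lies outside the ASCII range 0-127

print(repr(utf8_bytes), ascii_ok)
```

The first byte, 0xe5, is the one the error message complains about.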
So whatever encoding the source file is edited in, be sure to declare it. For example:
# coding=utf-8  # declare the encoding here first
# program code goes here
2. Note that when you type code in a terminal's interactive mode, there is no need to declare the encoding. My guess is that in interactive mode Python uses the encoding of the current system directly.
For example, in Windows cmd, where the terminal encoding is GBK, look at the character '好':
>>> a = '好'
>>> a
'\xba\xc3'
You can see that in cmd '好' is two bytes, unlike the three bytes in the UTF-8 terminal above.
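To compare the two encodings side by side without depending on a terminal, the same character can be encoded explicitly. This is a small sketch with illustrative variable names; it runs on Python 2.7 and Python 3.

```python
text = u'\u597d'                     # 好
utf8_bytes = text.encode('utf-8')    # three bytes
gbk_bytes = text.encode('gbk')       # two bytes

# Both byte strings decode back to the same character.
same_text = utf8_bytes.decode('utf-8') == gbk_bytes.decode('gbk')

print(len(utf8_bytes), len(gbk_bytes), same_text)
```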
3. The internal default encoding is ASCII, which needs attention when using certain functions such as str and unicode.
Sometimes a script that we saved as UTF-8, as we usually do, prints garbled Chinese when run in Windows cmd. The default encoding of cmd is GBK, so it interprets the UTF-8 byte string as GBK and the output is naturally a mess.
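The garbling is simply the result of decoding bytes with the wrong codec. A minimal sketch, assuming the terminal behaves like a GBK decoder; the 'replace' error handler is used here only so that an incomplete byte pair does not raise:

```python
text = u'\u597d'                                  # 好
utf8_bytes = text.encode('utf-8')                 # what a UTF-8 script emits

garbled = utf8_bytes.decode('gbk', 'replace')     # how a GBK terminal reads it
correct = utf8_bytes.decode('utf-8')              # how it was meant to be read

print(garbled == text, correct == text)
```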
So how can we make the printed output readable regardless of the terminal encoding?
1. Change the terminal's default encoding.
2. Make the script cater to the terminal's taste. Either plan A, save the script as GBK, or plan B, transcode at the places where output goes to the terminal. Here is plan B:
# coding=utf-8
import sys
a = "好"  # this file is saved as UTF-8; to display correctly in cmd, convert to GBK
aunicode = a.decode("utf-8")  # decode to Unicode first, telling Python that a is a UTF-8 byte string, not an ASCII one
agbk = aunicode.encode("gbk")  # re-encode the Unicode string as GBK
print agbk
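The decode-then-encode step of plan B can be wrapped in a small helper. The function name `transcode` and its defaults are hypothetical, chosen for this sketch; it operates on byte strings and runs on Python 2.7 and Python 3.

```python
def transcode(raw, source='utf-8', target='gbk'):
    """Decode raw bytes from `source`, then re-encode them as `target`."""
    return raw.decode(source).encode(target)


utf8_bytes = u'\u597d'.encode('utf-8')   # 好 as stored in a UTF-8 file
gbk_bytes = transcode(utf8_bytes)        # 好 as a GBK terminal expects it

print(repr(gbk_bytes))
```

Passing the codecs as parameters keeps the helper usable in the other direction, for example `transcode(gbk_bytes, 'gbk', 'utf-8')`.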
A side note on the above: can't I just call a.encode('gbk') directly? No, because encode is meant for Unicode strings. If you abruptly call it on a byte string, as in '我'.encode('gbk'), you get an error.
As follows:
>>> a = "好"
>>> a.encode("gbk")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xba in position 0: ordinal not in range(128)
>>>
See, Python raised a Unicode decoding exception, and it also mentions 'ascii'. Why?
Because encode only applies to Unicode strings. If you call encode on a byte string, Python first decodes the byte string implicitly, that is,
a.encode('gbk') == a.decode('default encoding').encode('gbk')
The Unicode decoding exception above is thrown during that implicit decode. Python assumes a is encoded in the default encoding, ASCII, but a is actually a GBK byte string here (or a UTF-8 one in a UTF-8 file), and '好' does not exist in ASCII, so it fails.
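The implicit decode can be made visible by spelling it out. The helper below is a hypothetical emulation of the Python 2 behaviour described above, not a real library function; Python 3 byte strings have no encode method, so the emulation also lets the sketch run there.

```python
def py2_style_encode(raw, target, default='ascii'):
    """Emulate Python 2's str.encode: decode with the default codec, then encode."""
    return raw.decode(default).encode(target)


gbk_bytes = u'\u597d'.encode('gbk')      # 好 as typed in a GBK terminal

try:
    py2_style_encode(gbk_bytes, 'gbk')
    failed_with = None
except UnicodeError as exc:
    failed_with = type(exc).__name__     # the decode step fails, not the encode

ascii_result = py2_style_encode(b'abc', 'gbk')   # pure ASCII passes through

print(failed_with, repr(ascii_result))
```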
3. I don't care what your terminal encoding is; the terminal has to display my text properly.
Then use a Unicode string directly and let Python print it according to the system's current encoding.
# coding=utf-8
print u'我'
When a Unicode string is printed, Python encodes it to the system's (terminal's) encoding before writing it out, so it is not garbled.
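Roughly speaking, printing a Unicode string amounts to encoding it with the terminal's codec. The sketch below approximates that with a hypothetical helper, `encode_for_terminal`; the fallback to UTF-8 and the 'replace' error handler are assumptions of this example, not what Python itself does.

```python
import sys


def encode_for_terminal(text, encoding=None):
    """Encode text with the given codec, or with the one stdout reports."""
    if encoding is None:
        # sys.stdout.encoding can be missing or None when output is redirected.
        encoding = getattr(sys.stdout, 'encoding', None) or 'utf-8'
    return text.encode(encoding, 'replace')


print(repr(encode_for_terminal(u'\u597d', 'gbk')))     # bytes for a GBK terminal
print(repr(encode_for_terminal(u'\u597d', 'utf-8')))   # bytes for a UTF-8 terminal
```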
A small related problem: if I just want to see what a variable looks like as Unicode, I use the repr function (it returns the string form of an object).
>>> a = u'好'
>>> a
u'\u597d'
>>> print repr(a)
u'\u597d'
A few words about the str function
The str function returns a representation of an object as a string (as I understand it, the representation a person can read).
str behaves differently for different objects. For a string, it returns the string itself.
For a function, str returns a string that shows the function's location in memory.
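A quick illustration of both cases, using a throwaway function named `greet` that exists only for this sketch. The exact memory address differs from run to run, so only the general shape of the result is worth relying on.

```python
def greet():
    return 'hi'


same = str('abc')          # a string comes back unchanged
described = str(greet)     # something like '<function greet at 0x...>'

print(same)
print(described)
```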
When using str on strings, note that if the argument is a Unicode string, the current default encoding matters.
Below I am in cmd, whose encoding is GBK, while Python's default encoding is ASCII:
>>> a = u"好"
>>> str(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u597d' in position 0: ordinal not in range(128)
>>>
When str is applied to a Unicode string as above, the conversion involves the default encoding: it first performs unicodestr.encode(defaultencoding).
If the default encoding cannot represent the characters in the string, an exception is thrown.
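The encode step that str performs on a Unicode string in Python 2 can be reproduced explicitly. This sketch assumes the default encoding is ASCII, as in the session above, and passes the codec name by hand so that it also runs on Python 3.

```python
text = u'\u597d'                 # 好

try:
    text.encode('ascii')         # what str(text) attempts under an ASCII default
    error = None
except UnicodeEncodeError as exc:
    error = exc                  # keep the exception to inspect its details

explicit = text.encode('gbk')    # naming a capable codec avoids the failure

print(error.encoding, error.start, repr(explicit))
```

Calling encode with an explicit codec, as in the last assignment, is one way to avoid depending on the default encoding at all.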
So set the default encoding, as follows:
>>> import sys
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding('gbk')  # set the default encoding to GBK
>>> a = u"啦"
>>> str(a)  # no error this time
'\xc0\xb2'
>>> str(a) == a
True
Python encoding problems: from a dull ache to removing the root cause