I recently heard Alex talk about Python encoding and made a point of reading a blog post that explains it. The problem turned out to be a serious one, so I went through several more blogs. This post is a first, simple summary of encoding errors, and follow-ups will come slowly. Working, studying, and writing a blog all at once leaves me a little overwhelmed, so experts, please go easy on me. I use a Mac, and my terminal's default encoding is UTF-8.
Original article:
Python encoding errors and workarounds
Strings are the most commonly used data type in Python, and they often contain characters outside the ASCII character set. When that happens, Python may throw an exception such as UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 10: ordinal not in range(128). This exception is easy to run into in Python, especially in Python 2.x.
Python represents text internally as Unicode, so encoding conversion normally uses Unicode as the intermediate form: first decode the string from its current encoding into Unicode (decode), then encode the Unicode string into the target encoding (encode). However, the default encoding in Python 2.x is ASCII. Unless the source file declares an encoding, the interpreter assumes that every character in the source code is ASCII, and implicit conversions between byte strings and Unicode also use ASCII. This is the root cause of the UnicodeDecodeError and UnicodeEncodeError exceptions that come up so often in Python 2.x.
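The decode-then-encode route described above can be sketched as follows. This is a minimal illustration written for Python 3, where byte strings (bytes) and Unicode text (str) are separate types; the sample text and the variable names are assumptions made for the example, not something from the original post.

```python
# Transcode GBK bytes to UTF-8 bytes, using Unicode as the intermediate form.
text = '中文'                              # sample Unicode text (an assumption)
gbk_bytes = text.encode('gbk')             # pretend these bytes arrived from a GBK source

unicode_text = gbk_bytes.decode('gbk')     # step 1: decode GBK bytes into Unicode
utf8_bytes = unicode_text.encode('utf-8')  # step 2: encode Unicode into UTF-8 bytes

# The same two characters take 4 bytes in GBK and 6 bytes in UTF-8.
print(len(gbk_bytes), len(utf8_bytes))
```

There is no direct GBK-to-UTF-8 step: the bytes always pass through Unicode in the middle.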
The unicode type: to handle Unicode data while staying compatible with some of Python's internal modules, Python 2.x provides unicode as a separate data type, and the decode and encode methods convert between other encodings and Unicode.
Common encoding exceptions in Python (almost all occur in Python 2.x)
Common encoding exceptions in Python include SyntaxError: Non-ASCII character, UnicodeDecodeError, and UnicodeEncodeError.
1. SyntaxError: Non-ASCII character
This exception is not very common, but it is the easiest one to fix. It occurs whenever a Python source file contains non-ASCII characters and does not declare the encoding of the source code, for example:
# In Python 2.x, no encoding is declared at the head of the file
s = '土耳其大骗子'
print s
# SyntaxError: Non-ASCII character '\xe5' in file /xxx/xxx/exercise-unicode.py on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
Workaround: declare the encoding at the head of the file with # -*- coding: utf-8 -*- or # coding: utf-8
In Python 2.x, non-ASCII characters cannot appear in a source file unless the encoding is declared on the first or second line of that file. This is because the Python 2.x interpreter assumes by default that source code is ASCII encoded.
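To see which encoding the interpreter reads from such a declaration, Python 3's standard library exposes tokenize.detect_encoding. The sketch below is only an illustration: the helper name declared_encoding and the sample sources are assumptions, and note that Python 3 falls back to UTF-8 when nothing is declared, whereas Python 2.x falls back to ASCII.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the source encoding that Python 3's tokenizer detects."""
    readline = io.BytesIO(source_bytes).readline
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding


# Declaration on the first line.
print(declared_encoding(b"# -*- coding: gbk -*-\ns = 1\n"))
# Declaration on the second line, after a shebang line.
print(declared_encoding(b"#!/usr/bin/python\n# coding: utf-8\ns = 1\n"))
# No declaration: Python 3 assumes UTF-8 (Python 2.x would assume ASCII).
print(declared_encoding(b"s = 1\n"))
```

Only the first two lines are inspected, which is why the declaration has to sit at the very top of the file.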
2. UnicodeDecodeError
This exception occurs when the decode method is called: Python tries to convert a string from some other encoding to Unicode, but the actual encoding of the string does not match the encoding passed to decode. For example:
# In Python 2.x
# -*- coding: utf-8 -*-
s = '土耳其大骗子'
s.decode('gbk')
# raises UnicodeDecodeError: 'gbk' codec can't decode bytes in position 4-5: illegal multibyte sequence
The string s in the code above is encoded as UTF-8 (the encoding declaration at the top means that all characters in the current .py file are UTF-8 encoded), but the encoding passed to decode is GBK, so the conversion to Unicode throws UnicodeDecodeError. The same exception can also occur when calling encode:
# In Python 2.x
# -*- coding: utf-8 -*-
s = '土耳其大骗子'
s.encode('gbk')
# raises UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)
Here s is UTF-8 encoded. Calling s.encode('gbk') directly makes Python first decode s with the system default encoding (defaultencoding), which is equivalent to
s.decode(defaultencoding).encode('gbk')
The exception is thrown because the actual encoding of s (UTF-8) differs from defaultencoding (ASCII by default in Python 2.x).
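Both failures can be reproduced with explicit calls in Python 3, where nothing is decoded implicitly. The following is a rough sketch under that assumption; try_decode is a hypothetical helper written for this illustration, and the explicit decode('ascii') stands in for the hidden step that Python 2.x performs before encode('gbk').

```python
# UTF-8 bytes, i.e. what a plain '...' literal holds in a UTF-8 Python 2.x source file.
utf8_bytes = '土耳其大骗子'.encode('utf-8')


def try_decode(data, encoding):
    """Return the decoded text, or the UnicodeDecodeError if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


wrong_codec = try_decode(utf8_bytes, 'gbk')      # codec does not match the real encoding
implicit_step = try_decode(utf8_bytes, 'ascii')  # the hidden step before encode('gbk')
correct = try_decode(utf8_bytes, 'utf-8').encode('gbk')  # decode with the real encoding first

print(type(wrong_codec).__name__)
print(type(implicit_step).__name__)
print(len(correct))
```

Decoding with the encoding the bytes really use, and only then encoding to GBK, is the version that succeeds.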
3. UnicodeEncodeError
Incorrect use of the decode and encode methods can also cause this exception, for example when calling decode on a string that is already Unicode:
# -*- coding: utf-8 -*-
s = u'土耳其大骗子'
s.decode('utf-8')
# raises UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)
In Python 2.x a string can be converted to Unicode with unicode('xxx'), u'xxx', or 'xxx'.decode('utf-8'). This example, however, calls decode on a string that is already Unicode. Python 2.x first has to turn it back into bytes, which it does implicitly with the default ASCII codec, and that step throws UnicodeEncodeError.
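The hidden encode step can be made visible by writing it out. This sketch assumes Python 3, where str has no decode method at all, so the explicit encode('ascii') below only imitates what Python 2.x does behind the scenes.

```python
text = '土耳其大骗子'   # already Unicode, like u'...' in Python 2.x

try:
    text.encode('ascii')   # imitates the implicit step inside unicode.decode()
    error = None
except UnicodeEncodeError as exc:
    error = exc

# The error covers all six characters, positions 0 through 5.
print(error)
```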
Encoding conventions in Python
1. Follow PEP 0263 and declare the encoding (recommended)
The most basic solution to Python encoding problems is given in PEP 0263, Defining Python Source Code Encodings: declare the encoding in the Python source file. The most common declaration is:
#!/usr/bin/python
# -*- coding: utf-8 -*-
This declares that the strings in the current .py file are encoded as UTF-8. It does not mean that files the program reads are decoded as UTF-8.
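Because the source declaration does not affect file input, the encoding of a data file has to be named when the file is opened. A minimal sketch, assuming a temporary GBK-encoded file created only for this example; io.open accepts an encoding argument in both Python 2.x and Python 3.

```python
import io
import os
import tempfile

text = '中文'

# Create a small GBK-encoded file (hypothetical sample data).
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with io.open(path, 'w', encoding='gbk') as handle:
    handle.write(text)

# Read it back, naming the file's own encoding explicitly.
with io.open(path, 'r', encoding='gbk') as handle:
    read_back = handle.read()

os.remove(path)
print(read_back == text)
```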
2. Use u'中文' instead of '中文' (Python 2.x)
s1 = '中文'
s2 = u'中文'
There are two ways to declare a string variable in Python 2.x, and the main difference between them is the encoding: s1 is a byte string encoded according to the encoding declared at the head of the source file, while s2 is a Unicode string. If a string contains non-ASCII characters, it is best to use the s2 form, so that you can operate on the string directly without calling decode and avoid the exceptions that come with it.
Note: Python 3.0 through 3.2 do not accept the u'xx' syntax. Python 3.3 and later accept it again, but it is redundant there because every str is already Unicode.
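The practical difference between the two forms shows up as soon as the string is measured or sliced. The sketch below uses Python 3, where a plain literal behaves like s2 and the bytes type behaves like the Python 2.x s1; the variable names are illustrative.

```python
text = '中文'                 # Unicode string: operations work on characters
raw = text.encode('utf-8')    # byte string: operations work on bytes

print(len(text), len(raw))    # 2 characters, 6 bytes
print(text[0] == '中')        # indexing Unicode yields a whole character
print(raw[:1])                # slicing bytes yields a fragment of a character
```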
3. Reset the default encoding
The root cause of so many encoding problems in Python is that the default encoding in Python 2.x is ASCII, so you can change the default encoding as follows:
import sys
reload(sys)  # Python 2.x only: restores setdefaultencoding, which site.py removes at startup
sys.setdefaultencoding('utf-8')
This method can solve some encoding problems, but it introduces many others, so the cost outweighs the benefit. It is recommended not to use it.
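One alternative to changing the global default is to decode explicitly wherever bytes enter the program. This is only a sketch of that idea, not something taken from the original article; to_text is a hypothetical helper, written here for Python 3.

```python
def to_text(value, encoding='utf-8'):
    """Return Unicode text, decoding byte strings with an explicit encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


print(to_text(b'\xe4\xb8\xad\xe6\x96\x87') == '中文')   # bytes are decoded
print(to_text('中文') == '中文')                         # text passes through unchanged
```

Because the encoding is an argument at each call site, a wrong guess fails where the bytes come in instead of somewhere far away in the program.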