In Python 2, encoding and decoding mean converting between the unicode and str types: encoding goes unicode -> str, and decoding goes str -> unicode. The remaining question is when to encode and when to decode.

First, the "encoding declaration" at the top of a file, that is, the # -*- coding: <encoding-name> -*- line. By default Python 2 treats a script file as ASCII-encoded, so when the file contains characters outside the ASCII range, the encoding declaration is needed to correct this.

Next, sys.defaultencoding (the value reported by sys.getdefaultencoding()). It is used whenever a decode happens without the decoding method being stated explicitly. For example, I have the following code:
# -*- coding: utf-8 -*-
s = '中文'
s.encode('gb18030')
This code re-encodes s as GB18030, which is a unicode -> str conversion. Because s is itself a str, Python first decodes s to unicode automatically and then encodes the result as GB18030. Since that decode is automatic and we never specified a decoding method, Python uses the one given by sys.defaultencoding. In many cases sys.defaultencoding is ASCII, and if s is not ASCII-encoded the decode fails. In the example above, my sys.defaultencoding is ASCII, while s is encoded the same way as the file, in UTF-8, so the error is:
UnicodeDecodeError: 'ascii' codec can't decode byte ...: ordinal not in range(128)
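The hidden decode step can be made visible by writing it out by hand. The sketch below is only an illustration: implicit_encode is a hypothetical helper, not something Python provides, and it assumes the default encoding is ASCII and that s holds UTF-8 bytes. It is written so that it should behave the same way under Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch: spell out the decode that Python 2 performs implicitly.
# Assumption: the default encoding is ASCII and s holds UTF-8 bytes.

s = u'中文'.encode('utf-8')  # UTF-8 bytes of a two-character Chinese string


def implicit_encode(data, target, default='ascii'):
    """Hypothetical helper: decode with the default codec, then encode."""
    return data.decode(default).encode(target)


try:
    implicit_encode(s, 'gb18030')
    failed = False
    message = ''
except UnicodeDecodeError as exc:
    # The ASCII codec rejects the first byte, which is >= 128.
    failed = True
    message = str(exc)

print(failed)
print(message)
```

The failure comes from the decode half of the call, not from the GB18030 encode, which is why the error names the ASCII codec.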
In this case, we have two methods to correct the error:
One is to explicitly indicate the encoding of s:
# -*- coding: utf-8 -*-
s = '中文'
s.decode('utf-8').encode('gb18030')
The second is to change sys.defaultencoding to the encoding used by the file:
# -*- coding: utf-8 -*-
import sys
reload(sys)  # setdefaultencoding is removed from sys at startup; reload restores it
sys.setdefaultencoding('utf-8')
s = '中文'
s.encode('gb18030')
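Of the two fixes, the explicit one is easier to carry around, because it does not depend on a process-wide setting (and sys.setdefaultencoding is not available in Python 3). A minimal sketch of that approach follows; transcode is a hypothetical helper name, and the UTF-8 source and GB18030 target are simply the encodings assumed in this example.

```python
# -*- coding: utf-8 -*-
# Sketch: re-encode bytes by naming both encodings explicitly.


def transcode(data, source='utf-8', target='gb18030'):
    """Hypothetical helper: bytes in `source` -> bytes in `target`."""
    text = data.decode(source)   # bytes -> Unicode text
    return text.encode(target)   # Unicode text -> bytes


utf8_bytes = u'中文'.encode('utf-8')
gb_bytes = transcode(utf8_bytes)

# Round trip: decoding the result as GB18030 gives back the original text.
print(gb_bytes.decode('gb18030') == u'中文')
```

Because both encodings are passed in, the result does not change when the script runs on a machine with a different default encoding.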