In Python 2, encoding and decoding are conversions between encodings, with Unicode as the intermediate form: encoding converts unicode→str, and decoding converts str→unicode, where str is a byte stream. s.decode(encoding) interprets the byte stream s using the given encoding and produces a unicode object; u.encode(encoding) converts the unicode object u into a byte stream (str) using the given encoding. In other words, encode is meant to be called on a unicode object and produces a byte stream, while decode is meant to be called on a str object (a byte stream) and produces a unicode object. If encode is called on a str object, Python first decodes the str into a unicode object using the system default encoding and then encodes the result. Overlooking this implicit decode step is a common source of errors.
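The two directions described above can be sketched as follows. This is a minimal illustration rather than code from the original article: the variable names are made up for the example, and the text is written with \u escapes and explicit u'' / b'' literals so the snippet behaves the same on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Illustrative names only; none of them come from the original article.
text = u'\u4e2d\u6587'               # a unicode object holding two Chinese characters

raw_utf8 = text.encode('utf-8')      # encode: unicode -> byte stream (UTF-8)
raw_gb = text.encode('gb2312')       # same text, different byte stream (GB2312)

restored = raw_gb.decode('gb2312')   # decode: byte stream -> unicode

print(repr(raw_utf8))
print(repr(raw_gb))
print(restored == text)
```

The same text yields different byte streams under different encodings, and decoding with the matching encoding recovers the original unicode object.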
For example, there is the following code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
s = '中文字符'  # here s is of type str (a byte stream), not unicode
s.encode('gb2312')
This code re-encodes s as gb2312, which is a unicode→str conversion. Because s is itself a str, Python automatically decodes s to unicode first and then encodes the result as gb2312. Since the decoding happens automatically and we did not specify an encoding for it, Python uses the encoding returned by sys.getdefaultencoding(). In many cases that default is ASCII, so if s contains non-ASCII bytes the decode fails:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
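To see where this error comes from, the implicit step can be written out by hand. The sketch below is an assumption-laden illustration, not the interpreter's actual code path: it decodes a UTF-8 byte string with the ASCII codec explicitly, which is what the hidden decode amounts to when the default encoding is ASCII. The names raw and error are hypothetical.

```python
# -*- coding: utf-8 -*-
# Hypothetical reproduction of the hidden decode step, spelled out explicitly.
raw = b'\xe4\xb8\xad\xe6\x96\x87'    # UTF-8 bytes of u'\u4e2d\u6587'

error = None
try:
    raw.decode('ascii')              # fails: 0xe4 is outside the ASCII range
except UnicodeDecodeError as exc:
    error = exc

print(error)
```

The exception object records which codec failed (error.encoding) and at which byte offset (error.start), which helps when tracking down the source of an unexpected implicit decode.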
In this case, we have two methods to correct the error:
1. Explicitly specify the encoding of s
#!/usr/bin/env python
# -*- coding: utf-8 -*-
s = '中文字符'
s.decode('utf-8').encode('gb2312')  # decode explicitly as UTF-8, then encode as gb2312
2. Change the default encoding to match the encoding of the file
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
reload(sys)  # site.py deletes sys.setdefaultencoding during startup, so reload sys to restore it
sys.setdefaultencoding('utf-8')
s = '中文字符'
s.encode('gb2312')  # the implicit decode now uses UTF-8 instead of ASCII
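The first method can be wrapped in a small helper so that the source encoding is always stated and nothing depends on the process-wide default. This is only a sketch: transcode and its parameter names are hypothetical, not part of any library, and byte literals are used so it runs the same way on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
def transcode(data, source_encoding, target_encoding):
    """Re-encode a byte string, decoding it explicitly first.

    Hypothetical helper for illustration: naming the source encoding
    avoids any reliance on the interpreter's default encoding.
    """
    return data.decode(source_encoding).encode(target_encoding)


utf8_bytes = b'\xe4\xb8\xad\xe6\x96\x87'              # UTF-8 bytes of u'\u4e2d\u6587'
gb_bytes = transcode(utf8_bytes, 'utf-8', 'gb2312')   # now GB2312 bytes
print(repr(gb_bytes))
```

Compared with changing the default encoding, this keeps the conversion local to the call site, so other code that relies on the default is not affected.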
The role of sys.setdefaultencoding('utf-8') in Python