Character encoding
Computers store characters in a specific encoded form, from the early ASCII to the later Unicode and UTF-8. In Python 2, a byte string (str) likewise carries a particular encoding, and the unicode type acts as the bridge between strings in different encodings.
str and unicode
Converting a str to unicode requires decoding (decode), while converting unicode to a str requires encoding (encode):
str --(decode)--> unicode
unicode --(encode)--> str
A str also has an encode method, which appears to convert a str in one encoding directly into a str in another encoding. In fact this seemingly direct transcoding still decodes first and then encodes. That is, str.encode(enc) is equivalent to str.decode(sys.getdefaultencoding()).encode(enc), where sys.getdefaultencoding() returns Python's default encoding, which is generally ASCII. As a result, calling encode on a str that contains non-ASCII bytes raises UnicodeDecodeError during the implicit decoding step. Similarly, for the unicode type, unicode.decode(enc) is equivalent to unicode.encode(sys.getdefaultencoding()).decode(enc).
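The two-step conversion described above can be sketched explicitly. This is only an illustration, not part of the original example: the helper name transcode is hypothetical, and the b'...' literal is used so that the same code runs on Python 2.7 (where it is a str) and on Python 3 (where it is bytes).

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def transcode(data, src, dst):
    """Hypothetical helper: re-encode a byte string from src to dst via Unicode."""
    return data.decode(src).encode(dst)


utf8_bytes = b'\xe4\xb8\xa5'  # UTF-8 bytes of the character U+4E25

# Explicit two-step conversion: UTF-8 bytes -> Unicode -> GB2312 bytes.
gb_bytes = transcode(utf8_bytes, 'utf-8', 'gb2312')
print(repr(gb_bytes))

# Decoding with ASCII (what the implicit step does when the default
# encoding is ASCII) fails on non-ASCII bytes.
try:
    utf8_bytes.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print(ascii_ok)
```

Naming the source encoding explicitly, rather than relying on the default encoding, is what keeps the conversion from failing on non-ASCII input.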
Example code:
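#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time: 2017/7/17 22:09
# @Author: Dswang

import sys

if __name__ == '__main__':
    # Default encoding used for implicit str/unicode conversions
    print sys.getdefaultencoding()

    # A str literal: holds the UTF-8 bytes stored in this source file
    x = '严'
    print type(x)
    print repr(x)

    # str -> unicode: decode using the encoding of the bytes
    y = x.decode('utf-8')
    print type(y)
    print repr(y)

    # unicode -> str: encode into another encoding
    z = y.encode('gb2312')
    print type(z)
    print repr(z)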
The result is:
ascii
<type 'str'>
'\xe4\xb8\xa5'
<type 'unicode'>
u'\u4e25'
<type 'str'>
'\xd1\xcf'
Note: The -*- coding: utf-8 -*- comment in the file header declares that the source file is encoded in UTF-8. A str literal in the code holds the bytes exactly as they are stored in the file, so str literals here are UTF-8 encoded as well.
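To illustrate why the file encoding matters, the sketch below shows the bytes a str literal for the character U+4E25 would hold if the source file were saved as UTF-8 versus GB2312. It simulates the two cases by encoding a unicode value rather than by actually saving two files, and the variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

char = u'\u4e25'  # the character from the output above

# Bytes a str literal would contain under each source file encoding.
as_utf8 = char.encode('utf-8')
as_gb2312 = char.encode('gb2312')
print(len(as_utf8), len(as_gb2312))

# Decoding with an encoding that does not match the bytes fails.
try:
    wrong = as_gb2312.decode('utf-8')
except UnicodeDecodeError:
    wrong = None
print(wrong)
```

The same character occupies three bytes in UTF-8 and two in GB2312, so the argument passed to decode has to match the encoding the bytes were actually written in.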