Converting between character encodings in Python 3
a = '我很好'  ### '我很好' means "I am fine". In Python 3, str holds Unicode text by default.
### unicode > gb2312
unicode_gb2312 = a.encode('gb2312')  ### a is already a unicode str, so no decode() is needed; encode() turns it into gb2312 bytes.
print('I am gb2312', unicode_gb2312)  ### prints: I am gb2312 b'\xce\xd2\xba\xdc\xba\xc3'
### gb2312 > utf8
gb2312_utf8 = unicode_gb2312.decode('gb2312').encode('utf-8')  # The bytes are currently gb2312, so first decode them to unicode (the argument to decode() is the encoding the bytes are currently in), then encode the result to UTF-8.
print('I am UTF-8', gb2312_utf8)  ### prints: I am UTF-8 b'\xe6\x88\x91\xe5\xbe\x88\xe5\xa5\xbd'
### utf8 > gbk
utf8_gbk = gb2312_utf8.decode('utf-8').encode('gbk')  # The bytes are currently UTF-8, so decode them to unicode, then encode the result to gbk.
print('I am gbk', utf8_gbk)  ### prints: I am gbk b'\xce\xd2\xba\xdc\xba\xc3'
### gbk > unicode
utf8_unicode = utf8_gbk.decode('gbk')  #### decoding already yields a unicode str, so no encode() is needed here.
print('I am unicode', utf8_unicode)  ### prints: I am unicode 我很好
### unicode > gb18030
unicode_gb18030 = utf8_unicode.encode('gb18030')
print('I am gb18030', unicode_gb18030)  ### prints: I am gb18030 b'\xce\xd2\xba\xdc\xba\xc3'
### Summary: every conversion goes through unicode. Decode the bytes from their current encoding into a unicode str, then encode that str into the target encoding.
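The summary rule can be wrapped in a small helper. This is a minimal sketch: the function name `convert` is hypothetical and not part of Python's standard library. It decodes the bytes with the source encoding to get a `str`, then encodes that `str` with the target encoding. Decoding with the wrong source encoding either raises an error or silently produces the wrong text, which is why the current encoding of the bytes must be known.

```python
def convert(data: bytes, src: str, dst: str) -> bytes:
    """Hypothetical helper: re-encode bytes from encoding `src` to encoding `dst`.

    Step 1: decode the bytes into a str (Unicode text) using the source encoding.
    Step 2: encode that str into bytes using the target encoding.
    """
    text = data.decode(src)
    return text.encode(dst)


gb2312_bytes = '我很好'.encode('gb2312')  # '我很好' means "I am fine"
utf8_bytes = convert(gb2312_bytes, 'gb2312', 'utf-8')
print(utf8_bytes)  # b'\xe6\x88\x91\xe5\xbe\x88\xe5\xa5\xbd'

# Decoding with the wrong source encoding fails: these GB2312 bytes are not valid UTF-8.
try:
    convert(gb2312_bytes, 'utf-8', 'gbk')
except UnicodeDecodeError as exc:
    print('wrong source encoding:', exc.reason)
```

Going the other way, `convert(utf8_bytes, 'utf-8', 'gbk')` gives back the same bytes that gb2312 produced.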
# As the output shows, gb2312, gbk and gb18030 produce identical bytes here. All three are Chinese encodings and each newer one extends the older one (GBK extends GB2312, and GB18030 extends GBK), so characters that already exist in GB2312 are encoded the same way in all three.
# The first of these Chinese standards was GB2312, followed by GBK and then GB18030. Character coverage grows in that order: GB2312 covers 6,763 Chinese characters (nearly 7,000), GBK about 21,000, and GB18030 can encode every Unicode code point.
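The compatibility claim can be checked directly. This is a minimal sketch, assuming Python's built-in `gb2312`, `gbk` and `gb18030` codecs; the helper `can_encode` and the sample characters are chosen here for illustration. The characters in 我很好 get identical bytes in all three encodings, the traditional character 們 is missing from GB2312 but present in GBK, and an emoji (outside the Basic Multilingual Plane) can only be encoded in GB18030.

```python
ENCODINGS = ('gb2312', 'gbk', 'gb18030')


def can_encode(text: str, encoding: str) -> bool:
    """Hypothetical helper: True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


common = '我很好'      # all in GB2312, so also in GBK and GB18030
traditional = '們'     # traditional form of 们; not in GB2312
emoji = '\U0001F600'   # grinning face emoji, outside the BMP

# Characters already in GB2312 get the same bytes in all three encodings.
same = {enc: common.encode(enc) for enc in ENCODINGS}
print(same)

# Which encodings can represent each sample?
for sample in (common, traditional, emoji):
    print(repr(sample), [can_encode(sample, enc) for enc in ENCODINGS])
```

Because GB18030 covers all of Unicode, it is the safe choice among the three when the text may contain characters outside GB2312 or GBK.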