Python 2 has two string types, str and unicode. The encoding of a str literal is determined by the encoding of the source file, which today is usually UTF-8, so declare the encoding at the top of the .py file:
# -*- coding: utf-8 -*-
Inside a Python program, text is usually handled as unicode, which is an in-memory representation of the characters. When the data is stored in a file or written to a log, the unicode string has to be converted to the storage encoding of a particular character set, for example UTF-8 or GBK. Python programmers often run into garbled text, and the methods and tricks below should help you solve those problems.
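The step from in-memory unicode to a storage encoding can be sketched as follows. This is a minimal illustration, not a prescribed method: the helper name roundtrip is hypothetical, io.open is used because it accepts an encoding argument on both Python 2.7 and Python 3, and the text is written with escapes so the snippet does not depend on the source file encoding.

```python
import io
import os
import tempfile

# The four code points of the sample text, written as escapes.
text = u'\u4e2d\u6587\u6d4b\u8bd5'

def roundtrip(text, encoding):
    """Write text in the given storage encoding, then read it back.

    Returns (raw bytes on disk, text decoded with the same encoding).
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with io.open(path, 'w', encoding=encoding) as f:
            f.write(text)
        with io.open(path, 'rb') as f:
            raw = f.read()
        with io.open(path, 'r', encoding=encoding) as f:
            decoded = f.read()
    finally:
        os.remove(path)
    return raw, decoded

utf8_raw, utf8_text = roundtrip(text, 'utf-8')
gbk_raw, gbk_text = roundtrip(text, 'gbk')
```

The same unicode string produces different bytes on disk for each storage encoding, and it comes back intact only when it is read with the encoding it was written in.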
unicode to UTF-8: unicodestr.encode('utf-8'), for example:
>>> u'中文测试'.encode('utf-8')
'\xe4\xb8\xad\xe6\x96\x87\xe6\xb5\x8b\xe8\xaf\x95'
UTF-8 to unicode: utf8str.decode('utf-8'), for example:
>>> '中文测试'.decode('utf-8')
u'\u4e2d\u6587\u6d4b\u8bd5'
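Garbled text usually comes from decoding bytes with a codec other than the one that produced them. The sketch below repeats the UTF-8 round trip and then decodes the same bytes with two wrong codecs. It uses u'' and b'' literals so that it runs on Python 2.7 and Python 3; the variable names are illustrative only.

```python
text = u'\u4e2d\u6587\u6d4b\u8bd5'

# unicode -> UTF-8 bytes -> unicode
utf8_bytes = text.encode('utf-8')
restored = utf8_bytes.decode('utf-8')

# latin-1 maps every byte to one character, so decoding never fails,
# but the result is 12 unrelated characters instead of the original 4.
garbled = utf8_bytes.decode('latin-1')

# ascii rejects any byte above 0x7f, so decoding raises an error instead.
try:
    utf8_bytes.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

When output looks garbled, checking which codec wrote the bytes and which one is reading them is usually the quickest way to find the mismatch.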
There is also another way to store unicode as text: instead of converting it to the character set actually used for storage, write out the Unicode code point values as escape sequences, then reverse the conversion when the file is read. This is what the unicode-escape codec does.
unicode to unicode-escape: unicodestr.encode('unicode-escape'), for example:
>>> u'中文测试'.encode('unicode-escape')
'\\u4e2d\\u6587\\u6d4b\\u8bd5'
unicode-escape to unicode: unicodeescapestr.decode('unicode-escape'), for example:
>>> '\\u4e2d\\u6587\\u6d4b\\u8bd5'.decode('unicode-escape')
u'\u4e2d\u6587\u6d4b\u8bd5'
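One place the unicode-escape form can be useful is a log line that has to stay pure ASCII. The sketch below assumes such a use case (the log line content is made up for illustration) and runs on both Python 2.7 and Python 3.

```python
text = u'\u4e2d\u6587\u6d4b\u8bd5'

# Each non-ASCII character becomes a six-byte \uXXXX escape.
escaped = text.encode('unicode-escape')
restored = escaped.decode('unicode-escape')

# ASCII characters pass through unchanged, so a mixed line stays readable.
log_line = (u'user=' + text).encode('unicode-escape')
is_ascii_only = all(b < 0x80 for b in bytearray(log_line))
```

The escaped form is larger than UTF-8 for Chinese text (six bytes per character instead of three), which is the price of staying within ASCII.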
UTF-8 encoded strings are usually stored as they are, but there is also a way to store the byte values of the UTF-8 encoding as escape sequences: the string-escape codec.
UTF-8 to string-escape: utf8str.encode('string-escape'), for example:
>>> '中文测试'.encode('string-escape')
'\\xe4\\xb8\\xad\\xe6\\x96\\x87\\xe6\\xb5\\x8b\\xe8\\xaf\\x95'
>>> print '\\xe4\\xb8\\xad\\xe6\\x96\\x87\\xe6\\xb5\\x8b\\xe8\\xaf\\x95'
\xe4\xb8\xad\xe6\x96\x87\xe6\xb5\x8b\xe8\xaf\x95
string-escape to UTF-8: strescapestr.decode('string-escape'), for example:
>>> '\\xe4\\xb8\\xad\\xe6\\x96\\x87\\xe6\\xb5\\x8b\\xe8\\xaf\\x95'.decode('string-escape')
'\xe4\xb8\xad\xe6\x96\x87\xe6\xb5\x8b\xe8\xaf\x95'
>>> print '\xe4\xb8\xad\xe6\x96\x87\xe6\xb5\x8b\xe8\xaf\x95'
中文测试
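The string-escape codec exists only in Python 2. For code that also has to run on Python 3, a similar effect can be approximated by going through latin-1 and unicode_escape. The two helper names below are hypothetical, and this is a sketch rather than an exact replacement: for instance, string-escape escapes single quotes while unicode_escape does not.

```python
def string_escape_encode(data):
    """Turn raw bytes into ASCII escape text such as \\xe4\\xb8\\xad."""
    # latin-1 maps each byte to the code point with the same value,
    # and unicode_escape writes code points 128..255 as \xNN.
    return data.decode('latin-1').encode('unicode_escape')

def string_escape_decode(escaped):
    """Reverse string_escape_encode: escape text back to raw bytes."""
    return escaped.decode('unicode_escape').encode('latin-1')

utf8_bytes = u'\u4e2d\u6587\u6d4b\u8bd5'.encode('utf-8')
escaped = string_escape_encode(utf8_bytes)
raw_again = string_escape_decode(escaped)
text_again = raw_again.decode('utf-8')
```

Each UTF-8 byte becomes a four-character \xNN escape, so the 12 bytes of the sample text grow to 48 ASCII bytes.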
Note:
unicode strings also support these codecs:
idna
raw_unicode_escape
utf_8_sig
Strings encoded in UTF-8, GBK, and so on (str) also support these codecs:
base64
quopri
bz2
hex
unicode_internal
uu
zlib
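A few of these codecs can be tried with the codecs module, as in the sketch below. In Python 2 the same conversions are also available directly as str.encode('hex') and so on; going through codecs.encode and codecs.decode is an assumption made here so that the example also runs on Python 3, where the byte-to-byte codecs are only reachable that way.

```python
import codecs

text = u'\u4e2d\u6587\u6d4b\u8bd5'
utf8_bytes = text.encode('utf-8')

# Byte-to-byte codecs: the input and the output are both byte strings.
hex_bytes = codecs.encode(utf8_bytes, 'hex')
b64_bytes = codecs.encode(utf8_bytes, 'base64')  # ends with a newline
zlib_bytes = codecs.encode(utf8_bytes, 'zlib')

hex_back = codecs.decode(hex_bytes, 'hex')
b64_back = codecs.decode(b64_bytes, 'base64')
zlib_back = codecs.decode(zlib_bytes, 'zlib')

# utf_8_sig is a text codec: it prepends the UTF-8 byte order mark
# when encoding and strips it again when decoding.
bom_bytes = text.encode('utf_8_sig')
has_bom = bom_bytes.startswith(codecs.BOM_UTF8)
bom_back = bom_bytes.decode('utf_8_sig')
```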