Handling Chinese character encoding is a perennial headache in Python programming. This article summarizes the main points in detail, as follows:
When the string is: '\u4e2d\u56fd'
>>> s = ['\u4e2d\u56fd', '\u6e05\u534e\u5927\u5b66']
>>> decoded = s[0].decode('unicode_escape')  # .encode("euc_kr")
>>> print decoded
中国
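The session above is Python 2. As a rough sketch of the same idea in Python 3 (an assumption, since the article itself only shows Python 2), the text has to be turned into bytes before the unicode_escape codec can interpret the literal backslash-u sequences. The helper name unescape is hypothetical and only used for illustration.

```python
# Python 3 sketch: the input holds literal backslash-u escapes
# (the raw characters \, u, 4, e, 2, d, ...), not real Chinese characters.
escaped = [r'\u4e2d\u56fd', r'\u6e05\u534e\u5927\u5b66']


def unescape(text):
    """Turn literal backslash-u sequences into the characters they name."""
    # Encode to ASCII bytes first, then let unicode_escape interpret them.
    return text.encode('ascii').decode('unicode_escape')


decoded = [unescape(item) for item in escaped]
print(decoded[0])  # 中国
print(decoded[1])  # 清华大学
```

This only works when the escaped input is pure ASCII, which is the case for the strings shown here.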
When the characters are given as decimal code points, such as 19996:
>>> print unichr(19996)
东
ord() also supports Unicode and returns the code point of a given character; for example, ord(u'东') returns 19996.
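The round trip between a code point and its character can be sketched as follows. This assumes Python 3, where chr() takes over the role of Python 2's unichr(); the variable names are illustrative only.

```python
# Python 3 sketch: chr() plays the role of Python 2's unichr().
code_point = 19996

char = chr(code_point)   # code point -> character
back = ord(char)         # character -> code point

print(char)       # 东
print(back)       # 19996
print(hex(back))  # 0x4e1c
```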
Concatenating a plain string with a Unicode string always produces a Unicode string. For example:
>>> 'help'
'help'
>>> 'help,' + u'python'
u'help,python'
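The implicit promotion shown above is Python 2 behavior. As a hedged sketch of how this looks in Python 3 (an assumption beyond the original article), bytes and text are not mixed automatically, so the byte string has to be decoded explicitly before joining:

```python
raw = b'help,'    # bytes, comparable to a plain Python 2 string
text = 'python'   # Unicode text, comparable to a Python 2 u'...' literal

try:
    raw + text
    mixed_ok = True
except TypeError:
    # Python 3 refuses to concatenate bytes and str.
    mixed_ok = False

# Decode the bytes explicitly, then join two text strings.
joined = raw.decode('ascii') + text

print(mixed_ok)  # False
print(joined)    # help,python
```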
For strings that contain only ASCII (7-bit) characters, the built-in str() function converts a Unicode string to an ASCII string. For example:
>>> str(u'Hello World')
'Hello World'
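The ASCII-only restriction matters as soon as Chinese text is involved. The sketch below assumes Python 3, where the closest equivalent of the conversion is an explicit encode('ascii'); the helper to_ascii_bytes is a hypothetical name used only for this illustration.

```python
def to_ascii_bytes(text):
    """Return the text as ASCII bytes, or None if it has non-ASCII characters."""
    try:
        return text.encode('ascii')
    except UnicodeEncodeError:
        return None


ascii_result = to_ascii_bytes('Hello World')
chinese_result = to_ascii_bytes('\u4e2d\u56fd')  # 中国

print(ascii_result)    # b'Hello World'
print(chinese_result)  # None
```

For Chinese text, an encoding such as UTF-8 or GBK is needed instead of ASCII.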
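Some key concepts:
ASCII maps each numeric code to a character, as shown in the following figure:
[Figure: ASCII code table]
Chinese characters are identified by their own numeric codes instead, such as the row-cell (location) codes of GB2312. For example, the character '好' ("good") has the code 22909, which is its Unicode code point (U+597D), not an ASCII code.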
Unicode assigns each script its own block of code points. It has several encoding forms, including UTF-8, UTF-16, and UTF-32.
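To make the difference between the encoding forms concrete, the following sketch (assuming Python 3) encodes a single Chinese character, '好' (U+597D), in each form and compares the byte lengths. The big-endian codec variants are used so that no byte order mark is added to the output.

```python
char = '\u597d'  # 好, code point 22909

sizes = {}
for name in ('utf-8', 'utf-16-be', 'utf-32-be'):
    data = char.encode(name)
    sizes[name] = len(data)
    print(name, len(data), data.hex())

# utf-8 3 e5a5bd
# utf-16-be 2 597d
# utf-32-be 4 0000597d
```

The code point is the same in all three cases; only the byte representation differs.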
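Chinese characters fall in the range 4E00-9FBF, within the CJK Unified Ideographs block (which spans U+4E00 to U+9FFF). The Chinese characters of GB2312, and nearly all of those in GBK, map to code points in this range.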
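A range like this is handy for detecting Chinese characters in a string. The following is a minimal sketch, assuming Python 3; the function and constant names are hypothetical, and the upper bound is the one quoted in this article rather than the full extent of the Unicode block.

```python
CJK_START = 0x4E00
CJK_END = 0x9FBF  # bound quoted in the article; the Unicode block runs to 0x9FFF


def is_chinese_char(ch):
    """Return True if a single character lies in the quoted range."""
    return CJK_START <= ord(ch) <= CJK_END


def contains_chinese(text):
    """Return True if any character of the text lies in the quoted range."""
    return any(is_chinese_char(ch) for ch in text)


print(contains_chinese('Python'))              # False
print(contains_chinese('Python\u4e2d\u56fd'))  # True
```

Chinese punctuation and rarer characters in the extension blocks lie outside this range, so the check is only an approximation.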
UTF-8 is a Unicode-based encoding that is well suited to international use.
GBK and GB2312 are both GB (Chinese national standard) encodings. They were mainly used earlier for encoding and decoding common Chinese characters.
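The relationship between the two GB encodings can be sketched as follows, assuming Python 3 and its built-in gb2312 and gbk codecs. GBK extends GB2312, so bytes produced by GB2312 also decode correctly as GBK, while a traditional character such as '國' is only available in GBK.

```python
simplified = '\u4e2d\u56fd'  # 中国
traditional = '\u570b'       # 國, a traditional form

gb2312_bytes = simplified.encode('gb2312')
print(gb2312_bytes.hex())                        # d6d0b9fa
print(gb2312_bytes.decode('gbk') == simplified)  # True

try:
    traditional.encode('gb2312')
    in_gb2312 = True
except UnicodeEncodeError:
    # GB2312 has no code for this character.
    in_gb2312 = False

gbk_round_trip = traditional.encode('gbk').decode('gbk')

print(in_gb2312)                      # False
print(gbk_round_trip == traditional)  # True
```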
I hope this article will help you with your Python programming.