The problem of Chinese coding has always been a headache in Python program design, and this paper summarizes it in detail. Specific as follows:
When the string is: ' \U4E2D\U56FD '
>>>s=[' \u4e2d\u56fd ', ' \u6e05\u534e\u5927\u5b66 ']>>>str=s[0].decode (' Unicode_escape ') #.encode ("EUC_KR") >>>print str China
When the string is: ' East Asian Studies Group One '
>>>print UNICHR (19996) east
Ord () supports Unicode, which can display Unicode numbers for specific characters, such as:
>>>print Ord (' A ') 65
Unicode strings are generated as long as they are connected to Unicode. Such as:
>>> ' Help ' "Help" >>> ' help, ' + u ' python ' u ' help,python '
For ASCII (7-bit)-compatible strings, the Unicode string can be converted to an ASCII string with the built-in STR () function. Such as:
>>> str (U ' Hello World ') ' Hello World '
An understanding of several concepts:
The ASCII code is indicated by the corresponding word such as with the data word:
And Chinese is the location code corresponding to Chinese characters. For example: "Good" ASCII code: 22909
Unicode encoding is a piece of each country. It has UTF-8, UTF-16, UTF-32 and other forms
Chinese range 4E00-9FBF: There are gbk,gb2312 in this range,
Utf-8 is a Unicode-based international occasion for use
GB2312 and gb2312 are both of them. Earlier mainly used to encode and decode commonly used Chinese characters
Hopefully this article will help you with Python programming.