ASCII is a character set that includes the uppercase and lowercase English letters, the digits, punctuation, and control characters. Each character is stored in one byte, and its value ranges from 0 to 127.
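A minimal sketch in Python 3 (the text below discusses Python 2; the helper name ascii_codes is hypothetical) showing that every ASCII character fits in one byte with a value from 0 to 127, and that a character outside that range cannot be encoded as ASCII:

```python
def ascii_codes(text):
    """Return the one-byte ASCII value of each character in text.

    Raises UnicodeEncodeError if text contains a character outside 0..127.
    """
    data = text.encode("ascii")   # one byte per character
    return list(data)


codes = ascii_codes("Hi!\n")
print(codes)                      # [72, 105, 33, 10]
assert all(0 <= c <= 127 for c in codes)

try:
    ascii_codes("\u00e9")         # U+00E9 (e with acute accent) is outside ASCII
except UnicodeEncodeError as exc:
    print("not ASCII:", exc.reason)
```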
Because ASCII covers very few characters, many countries and regions defined their own character sets on top of it. For example, GB2312, which is widely used in China, adds encodings for Chinese characters, each represented in two bytes.
These character sets are incompatible with each other: the same byte value may stand for different characters in different sets, which makes exchanging information troublesome.
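To make the incompatibility concrete, here is a small Python 3 sketch: two bytes are produced by encoding 你 with GB2312, then read back once as GB2312 and once as Latin-1 (ISO-8859-1, chosen here only as an example of another character set). The bytes are identical, but the resulting characters are not:

```python
# Encode one Chinese character with GB2312 (two bytes per Chinese character).
raw = "你".encode("gb2312")
print(len(raw), raw.hex())

# Interpret the very same bytes under two different character sets.
as_gb2312 = raw.decode("gb2312")   # back to the original character
as_latin1 = raw.decode("latin-1")  # two unrelated Western European letters

print(as_gb2312, as_latin1)
assert as_gb2312 == "你"
assert as_latin1 != as_gb2312
```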
Unicode is a character set that maps every character in the world to a unique number, called a code point; for example, code point 0x0041 (usually written U+0041) corresponds to the letter A. Unicode is still evolving, and new characters continue to be added.
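In Python 3, ord() returns a character's code point and chr() maps a code point back to a character, so the mapping described above can be checked directly (a small illustrative sketch):

```python
# Code point of the letter A, printed in the usual U+XXXX notation.
print(f"U+{ord('A'):04X}")        # U+0041
assert chr(0x0041) == "A"

# The same mapping covers non-Latin scripts, e.g. the two characters of 你好.
for ch in "你好":
    print(ch, f"U+{ord(ch):04X}")
```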
Storing Unicode characters also requires an encoding scheme. UCS-2, for example, uses two bytes for every character, so it can only represent code points up to U+FFFF. UTF-8 is another encoding of the Unicode character set, and it is variable-length: the original design allowed up to 6 bytes per character, but RFC 3629 now limits UTF-8 to 1 to 4 bytes. Characters with values 0 to 127 are encoded in a single byte identical to their ASCII encoding, so UTF-8 is backward compatible with ASCII: English text encoded in ASCII can be processed as UTF-8 without modification. This compatibility is one reason UTF-8 is so widely used.
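The variable-length scheme can be sketched by hand. The function utf8_encode below is a hypothetical illustration in Python 3 (real code should simply call str.encode("utf-8")); it follows the modern 1 to 4 byte layout and checks itself against Python's built-in codec. Note that single-byte output for values below 0x80 is exactly what makes UTF-8 ASCII-compatible:

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 by hand (1-4 byte form, RFC 3629).

    Surrogates (U+D800..U+DFFF) are not rejected here, to keep the sketch short.
    """
    if cp < 0x80:                       # 0xxxxxxx: identical to ASCII
        return bytes([cp])
    if cp < 0x800:                      # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                    # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:                  # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError(f"code point out of range: {cp:#x}")


# A (1 byte), e-acute (2 bytes), 你 (3 bytes), an emoji outside the BMP (4 bytes)
for ch in "A\u00e9\u4f60\U0001F600":
    mine = utf8_encode(ord(ch))
    assert mine == ch.encode("utf-8")   # matches Python's built-in codec
    print(f"U+{ord(ch):04X}", len(mine), "byte(s):", mine.hex())

# For comparison, a UCS-2-style fixed two-byte encoding of a BMP character
# (UTF-16-BE coincides with UCS-2 for characters up to U+FFFF).
print("你".encode("utf-16-be").hex())   # 4f60
```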
Python has supported Unicode since version 2.0. In Python 2, the decode(charset) method converts a byte string in some other encoding into a Unicode string, and the encode(charset) method converts a Unicode string into the given encoding. A Unicode string here means a sequence of code points, stored internally as UCS-2 or UCS-4 depending on how Python was built.
For example, if '你好' is a GB2312-encoded byte string (e.g. in a source file saved as GB2312), '你好'.decode('gb2312') returns u'\u4f60\u597d'; that is, the Unicode code points of 你 ("you") and 好 ("good") are 0x4F60 and 0x597D respectively.
Then u'\u4f60\u597d'.encode('utf-8') returns '\xe4\xbd\xa0\xe5\xa5\xbd', which is the UTF-8 encoding of 你好.
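In Python 3 the same round trip looks slightly different: byte strings (bytes) are decoded into text (str, a sequence of code points), and text is encoded back into bytes. The sketch below assumes the input bytes really are GB2312; the helper name gb2312_to_utf8 is hypothetical:

```python
def gb2312_to_utf8(data: bytes) -> bytes:
    """Transcode GB2312 bytes to UTF-8 by way of Unicode text."""
    text = data.decode("gb2312")     # GB2312 bytes -> Unicode code points
    return text.encode("utf-8")      # Unicode code points -> UTF-8 bytes


gb_bytes = "你好".encode("gb2312")    # stands in for bytes read from a GB2312 file
text = gb_bytes.decode("gb2312")
print(ascii(text))                    # '\u4f60\u597d'

utf8_bytes = gb2312_to_utf8(gb_bytes)
print(utf8_bytes)                     # b'\xe4\xbd\xa0\xe5\xa5\xbd'
```

Because ASCII bytes mean the same thing in GB2312 and UTF-8, plain English input passes through gb2312_to_utf8 unchanged.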