ANSI, UTF-8, Unicode, and ASCII Encoding
1. ASCII and ANSI Encoding
The character internal code (character code) is the numeric code used internally to represent a character. Documents are entered and stored using the internal code. Internal codes fall into two kinds:
Single-byte internal code: single-byte character sets (SBCS), which can encode up to 256 characters.
Double-byte internal code: double-byte character sets (DBCS), which can encode roughly 65,000 characters.
ASCII belongs to the former (it defines only 128 characters, using 7 bits), and the latter corresponds to the ANSI code pages used for East Asian languages.
For example, the Simplified Chinese encoding GB2312 corresponds to ANSI code page 936 (which Windows implements as GBK, a superset of GB2312).
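As a small illustration of single-byte versus double-byte internal codes, the sketch below uses Python's built-in codecs. It assumes the codec names `ascii`, `gb2312` and `cp936` from the standard CPython distribution; the sample character (U+6C49) is chosen only for illustration.

```python
# Encode sample text with a single-byte and a double-byte character set.
sample = "\u6c49"  # a common Simplified Chinese character (U+6C49)

ascii_bytes = "A".encode("ascii")      # one byte per character
gb_bytes = sample.encode("gb2312")     # two bytes for a Chinese character
cp936_bytes = sample.encode("cp936")   # code page 936 gives the same bytes here

print(ascii_bytes.hex(" "))   # 41
print(gb_bytes.hex(" "))      # ba ba
print(gb_bytes == cp936_bytes)  # True
```

The GB2312 and code page 936 results agree for this character because code page 936 contains GB2312 as a subset.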
2. Unicode
As noted above, ANSI comprises many code pages, and text stored in the internal codes of one code page is not displayed correctly when it is interpreted under another. This is why Japanese or Traditional Chinese games cannot be displayed correctly on a Simplified Chinese platform without conversion.
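The code page mismatch can be reproduced in a few lines. This is a minimal sketch, assuming Python's `cp932` (Japanese) and `cp936` (Simplified Chinese) codecs stand in for the two platforms; the sample string is hypothetical.

```python
# Bytes written under one code page, then read back under another.
text = "\u65e5\u672c\u8a9e"  # the Japanese word for "Japanese language"

raw = text.encode("cp932")                        # stored by a Japanese program
garbled = raw.decode("cp936", errors="replace")   # read as Simplified Chinese
restored = raw.decode("cp932")                    # read with the right code page

print(garbled == text)    # False: the same bytes map to different characters
print(restored == text)   # True
```

The bytes themselves are unchanged in both cases; only the code page used to interpret them differs.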
Unicode is also a character encoding scheme, but it is designed by international organizations to accommodate all of the world's languages and scripts. It was originally a 2-byte encoding providing 65,536 characters. That number is not enough to represent every character (Chinese alone has more than 55,000), so an additional 1,048,576 code points are made available through surrogate pairs, ensuring that every character has a unique code.
3. Unicode and BigEndianUnicode
The two differ only in the order in which the bytes are stored. For example, the letter "A" (code point 65 decimal, 41 hex) is stored in the Unicode encoding as the bytes 41 00.
Its BigEndianUnicode encoding is 00 41.
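The byte order difference, and the pairs of 16-bit units used for characters beyond 65,536, can be sketched as follows. The helper names `utf16_code_units` and `to_bytes` are hypothetical, introduced only for this example, and the sketch assumes that "Unicode" here means little-endian UTF-16.

```python
def utf16_code_units(code_point):
    """Return the 16-bit code units that represent one code point."""
    if code_point < 0x10000:
        return [code_point]              # fits in a single 16-bit unit
    offset = code_point - 0x10000        # 20 bits split across two units
    high = 0xD800 + (offset >> 10)       # high (leading) surrogate
    low = 0xDC00 + (offset & 0x3FF)      # low (trailing) surrogate
    return [high, low]

def to_bytes(units, byteorder):
    """Serialize code units as bytes, 'little' or 'big' endian."""
    return b"".join(unit.to_bytes(2, byteorder) for unit in units)

units_a = utf16_code_units(ord("A"))
print(to_bytes(units_a, "little").hex(" "))   # 41 00
print(to_bytes(units_a, "big").hex(" "))      # 00 41

units_big = utf16_code_units(0x1F600)         # a code point above FFFF
print([hex(unit) for unit in units_big])      # ['0xd83d', '0xde00']
```

The results can be checked against Python's own `utf-16-le` and `utf-16-be` codecs, which should produce the same bytes.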
4. UTF-8
UTF-8 is an encoding designed for transmission; the same UTF family also includes UTF-7 and UTF-16.
UTF-16 is essentially the same as the Unicode encoding described above, while UTF-8 uses 8-bit units and a variable number of bytes per character. Unicode codes are converted to UTF-8 as follows:
Unicode code (hex)    UTF-8 byte stream (binary)
0000-007F             0xxxxxxx
0080-07FF             110xxxxx 10xxxxxx
0800-FFFF             1110xxxx 10xxxxxx 10xxxxxx
For example, the Unicode code of the Chinese character 汉 is 6C49. 6C49 lies in the range 0800-FFFF, so it uses the 3-byte template: 1110xxxx 10xxxxxx 10xxxxxx. Written in binary and grouped as 4, 6 and 6 bits, 6C49 is 0110 110001 001001. Substituting these bits for the x's in the template gives 11100110 10110001 10001001, that is, E6 B1 89.
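The template substitution above can be written out as a short function. This is a minimal sketch covering only the three ranges in the table (codes up to FFFF); the name `utf8_encode` is hypothetical, and the sketch does not reject the surrogate range D800-DFFF as a full encoder would.

```python
def utf8_encode(code_point):
    """Encode one Unicode code (0000-FFFF) using the templates in the table."""
    if code_point <= 0x7F:
        # 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("codes above FFFF are outside this sketch")

encoded = utf8_encode(0x6C49)
print(encoded.hex(" "))   # e6 b1 89
```

Comparing the result with `"\u6c49".encode("utf-8")` is a quick way to confirm that the hand-written templates agree with Python's built-in codec.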