Reference: http://www.cnblogs.com/lizhenghn/p/3690406.html
GB 2312 (GB 2312-80), in effect since May 1981
1, GB: the pinyin initials of "Guobiao" (national standard);
2, contains 6,763 Chinese characters and 682 non-Chinese graphic characters;
GBK, released December 1995
1, GBK: the pinyin initials of "Guobiao" (national standard) and "Kuozhan" (extension);
2, GBK is downward compatible with the GB 2312 encoding and upward supports the ISO 10646.1 international standard (ISO 10646.1 is equivalent to GB 13000.1);
3, contains 21,003 Chinese characters in total, including every character in GB2312 and every Chinese character in the BIG5 code;
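The relationship between GB2312 and GBK can be checked with a short Python sketch. It assumes Python's built-in `gb2312` and `gbk` codecs are a fair stand-in for the two standards; the helper name `encodable` is made up for this illustration.

```python
def encodable(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


simplified = "汉"   # Simplified form, part of GB2312
traditional = "漢"  # Traditional form, added by GBK

for ch in (simplified, traditional):
    print(ch, "gb2312:", encodable(ch, "gb2312"), "gbk:", encodable(ch, "gbk"))

# A character present in both standards keeps the same two bytes.
print(simplified.encode("gb2312") == simplified.encode("gbk"))
```

The Traditional character fails under `gb2312` but succeeds under `gbk`, while the Simplified character encodes to the same bytes in both.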
GB18030 (GB18030-2000, published in 2000; GB18030-2005, published in 2005)
1, backward compatible with the GBK and GB2312 standards;
2, GB18030 is a variable-length encoding that uses 1, 2, or 4 bytes per character;
3, contains more than 70,000 Chinese characters and supports the Tibetan, Mongolian, Dai, Yi, Korean, and Uyghur scripts;
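The 1, 2, or 4 byte lengths can be observed directly. This is a minimal sketch that assumes Python's built-in `gb18030` codec; the helper `gb18030_length` and the sample characters are chosen only for illustration.

```python
def gb18030_length(ch: str) -> int:
    """Number of bytes GB18030 uses for a single character."""
    return len(ch.encode("gb18030"))


samples = {
    "ASCII letter": "A",          # one byte, same as ASCII
    "Chinese character": "汉",     # two bytes, same as GBK
    "Tibetan letter": "\u0f40",   # four bytes, not available in GBK
}

for label, ch in samples.items():
    encoded = ch.encode("gb18030")
    print(f"{label}: U+{ord(ch):04X} -> {encoded.hex()} ({gb18030_length(ch)} bytes)")
```

Characters that already exist in GBK keep their one-byte or two-byte codes; the four-byte form is used for everything else.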
Unicode, first released in 1991 (version 1.0)
- Unicode is a character encoding standard, developed by an international consortium, that aims to cover all of the world's scripts and symbols;
UTF-8, created by Ken Thompson in 1992
1, UTF-8 stands for Unicode Transformation Format, 8-bit;
2, it is the Unicode implementation most widely used on the Internet;
3, the defining feature of UTF-8 is that it is a variable-length encoding. It uses 1 to 4 bytes per symbol, choosing the length according to the symbol, which saves storage space;
4, English letters use 8 bits (one byte); most Chinese characters use 24 bits (three bytes);
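To make the variable length concrete, here is a hand-written sketch of the UTF-8 bit layout, compared against Python's built-in encoder. The function name `utf8_encode` is hypothetical, and the sketch assumes a valid code point (it does not reject surrogates or values above U+10FFFF).

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative, no validation)."""
    if cp < 0x80:          # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:         # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:       # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for ch in ("A", "é", "汉", "😀"):
    encoded = utf8_encode(ord(ch))
    # The manual result should match the standard library's encoder.
    print(ch, encoded.hex(), len(encoded), encoded == ch.encode("utf-8"))
```

The English letter takes one byte and the Chinese character takes three, matching the list above.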
- ASCII represents English characters using 7 bits, which gives 128 characters; its extensions use 8 bits, which gives 256 characters;
- GB2312 is a Simplified Chinese encoding that supports only 6,763 commonly used Chinese characters;
- GBK is an extension of GB2312 that stays compatible with the GB2312 standard, adds many more Chinese characters, and supports both Simplified Chinese and Traditional Chinese;
- GBK is less universal than UTF-8, but Chinese text stored as UTF-8 takes more space in a database than the same text stored as GBK (three bytes per character instead of two);
- GB2312 and GBK are double-byte character sets (DBCS); GB18030 keeps their two-byte codes and adds four-byte codes for the remaining characters;
- From ASCII through GB2312 and GBK to GB18030, each encoding is backward compatible with the previous one: the same character always has the same code in all of them, and each later standard supports more characters. English and Chinese text can therefore be handled in a unified manner. A Chinese character is recognized by the highest bit of its high (leading) byte being 1 rather than 0;
Introduction to character encoding