Introduction to character encoding

Source: Internet
Author: User

Reference: http://www.cnblogs.com/lizhenghn/p/3690406.html

GB 2312 (released May 1981)

1. GB: the pinyin initials of "Guobiao" (国标, "national standard");

2. It contains 6,763 Chinese characters and 682 non-Chinese graphic characters;

GBK (released December 1995)

1. GBK: the pinyin initials of "Guobiao" (国标, "national standard") and "Kuozhan" (扩展, "extension");

2. GBK is backward compatible with the GB 2312 encoding and, looking forward, supports the character repertoire of the international standard ISO 10646.1 (equivalent to GB 13000.1);

3. It contains 21,003 Chinese characters in total, including all Chinese characters in GB 2312 and all Chinese characters in the Big5 code;
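
The compatibility described above can be checked with a short Python sketch. It assumes Python's built-in `gb2312` and `gbk` codecs are acceptable stand-ins for the two standards; the helper `can_encode` and the sample characters are chosen here only for illustration.

```python
# Sketch: GBK keeps GB 2312 codes unchanged and adds more characters.
# Assumes Python's built-in "gb2312" and "gbk" codecs.

def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

simplified = "中"   # a Simplified Chinese character, present in GB 2312
traditional = "龍"  # a Traditional Chinese character, absent from GB 2312

# A GB 2312 character has identical bytes under GBK.
same_bytes = simplified.encode("gb2312") == simplified.encode("gbk")

print(simplified.encode("gb2312").hex())   # d6d0
print(same_bytes)                          # True
print(can_encode(traditional, "gb2312"))   # False
print(can_encode(traditional, "gbk"))      # True
```

The traditional character only becomes encodable once the GBK extension is used, while the simplified one keeps its two-byte code.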

GB 18030 (GB 18030-2000 published in 2000; GB 18030-2005 published in 2005)

1. Backward compatible with the GBK and GB 2312 standards;

2. GB 18030 is a variable-length encoding that uses 1, 2, or 4 bytes per character;

3. It contains more than 70,000 Chinese characters and supports minority scripts such as Tibetan, Mongolian, Dai, Yi, Korean, and Uyghur;
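
To make the 1, 2, or 4 byte lengths concrete, here is a minimal sketch that relies on Python's built-in `gb18030` codec. The sample characters are illustrative choices, not taken from the standard itself.

```python
# Sketch: byte lengths under GB 18030, using Python's built-in "gb18030" codec.
samples = {
    "A": "ASCII letter",
    "中": "Chinese character also found in GB 2312",
    "\u0f40": "Tibetan letter KA (U+0F40)",
}

# Encode each sample and record how many bytes it takes.
lengths = {ch: len(ch.encode("gb18030")) for ch in samples}

for ch, label in samples.items():
    print(f"{label}: {lengths[ch]} byte(s)")

# The one- and two-byte codes are the same bytes that GBK produces.
assert "A中".encode("gb18030") == "A中".encode("gbk")
```

ASCII stays at one byte, characters inherited from GBK stay at two, and scripts that GBK has no room for, such as Tibetan, fall into the four-byte range.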

Unicode (version 1.0 released in 1991)

1. Unicode is a character encoding standard, developed by an international consortium, that aims to cover all the writing systems and symbols in the world;
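
As a small illustration, the sketch below shows that Unicode assigns each character a number (a code point), and that turning this number into bytes is a separate step that depends on the chosen encoding. It uses only Python built-ins; the sample character is arbitrary.

```python
# Sketch: a Unicode code point is a number, separate from any byte encoding.
ch = "中"
code_point = ord(ch)                  # integer code point of the character

print(f"U+{code_point:04X}")          # U+4E2D
print(chr(0x4E2D) == ch)              # True

# The same code point becomes different bytes in different encodings.
print(ch.encode("utf-8").hex())       # e4b8ad
print(ch.encode("gbk").hex())         # d6d0
print(ch.encode("utf-16-be").hex())   # 4e2d
```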

UTF-8 (created by Ken Thompson in 1992)

1. UTF-8 stands for Unicode Transformation Format, 8-bit;

2. It is the Unicode encoding form most widely used on the Internet;

3. One of UTF-8's most important features is that it is a variable-length encoding. It uses 1 to 4 bytes per symbol, with the length depending on the symbol, which saves storage space;

4. English letters are encoded in 8 bits (one byte), and most Chinese characters in 24 bits (three bytes);
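
The variable-length rule can be sketched by encoding code points by hand and comparing the result with Python's own UTF-8 encoder. The function name `utf8_encode` is hypothetical, and the sketch skips validation (it does not reject surrogates or values above U+10FFFF).

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point into UTF-8 by hand (no validation)."""
    if code_point < 0x80:            # 1 byte: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:           # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:         # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])

for ch in ["A", "é", "中", "😀"]:
    encoded = utf8_encode(ord(ch))
    # The hand-rolled result should match the built-in encoder.
    assert encoded == ch.encode("utf-8")
    print(ch, len(encoded), encoded.hex())
```

The leading bits of the first byte tell a decoder how many bytes belong to the character, which is why an English letter fits in one byte while a Chinese character such as 中 needs three.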

    • Summary
    1. ASCII represents English characters using 7 bits, which gives 128 characters; its 8-bit extensions represent 256 characters;
    2. GB 2312 is a Simplified Chinese encoding that supports only 6,763 commonly used Chinese characters;
    3. GBK is an extension of GB 2312 that stays compatible with it, covers far more Chinese characters (21,003), and supports both Simplified and Traditional Chinese;
    4. UTF-8 is more universal than GBK, but for Chinese text a UTF-8 database takes more space than a GBK one (three bytes per Chinese character instead of two);
    5. GB 2312 and GBK are double-byte character sets (DBCS); GB 18030 keeps their two-byte codes and adds four-byte codes;
    6. From ASCII through GB 2312 and GBK to GB 18030, each encoding is backward compatible with the previous one: the same character always has the same code in all of them, and each later standard supports more characters. English and Chinese text can therefore be handled in a uniform way. A Chinese character is recognized by its first (high) byte, whose most significant bit is 1 rather than 0.
