About encodings: ANSI, GBK, gb2312, UTF8, UTF16, UTF32, Unicode

Source: Internet
Author: User

Ever since I started programming, I have had only a superficial grasp of character encodings and have never really understood the fundamentals.
For example: how are ANSI and GBK related? How are GBK and GB2312 related? What is the difference between ANSI and UTF-8? How are Unicode and UTF-8 related? And how do you convert between ANSI, GBK, GB2312, UTF-8 (with or without a BOM), UTF-16, UTF-32, and Unicode? I have never had a chance to clear up these doubts, and I hope to get a satisfactory answer on Segmentfault.

If there is a book that covers this, even better (ideally one that uses JavaScript, since that is where I ran into the problem).

Summary

Books
"Fonts & Encodings" (only an English edition could be found)
Xu Lingbo, "In-Depth Analysis of Java Web Technology Internals", Section 3.3

Articles
Character encoding notes: ASCII, Unicode, and UTF-8
A discussion of Unicode encoding
Differences between Unicode, GBK, and UTF-8
The relationship between the Unicode character set and multi-byte character sets
A discussion of Unicode encoding, with brief explanations of terms such as UCS, UTF, BMP, and BOM
Why the two characters "联通" (Unicom) come out garbled in Notepad, part 1
Why the two characters "联通" (Unicom) come out garbled in Notepad, part 2
JavaScript bit arithmetic of small code brainiac

Other:
CJK (Chinese, Japanese, Korean) character Unicode code chart, for font editing

Reply content:

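Strictly speaking, ANSI is the American National Standards Institute, a standards body whose scope is broad, much like the GB (Guobiao) national standards in mainland China.
On Windows, "ANSI" has a narrower meaning: it refers to the system's current default encoding, that is, the active code page (for example, code page 936, which corresponds to GBK, on Simplified Chinese systems).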
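
To make the code page point concrete, here is a small Python sketch (Python is my choice for illustration, not something from the thread). It decodes the same two bytes under two different Windows code pages, cp936 and cp1252, picked only as examples of what "ANSI" might mean on two differently configured machines.

```python
# The same two bytes, read under two different Windows "ANSI" code pages.
raw = b"\xd6\xd0"

as_cp936 = raw.decode("cp936")    # Simplified Chinese code page (GBK)
as_cp1252 = raw.decode("cp1252")  # Western European code page

print(as_cp936)   # 中
print(as_cp1252)  # ÖÐ
```

This is why a file saved as "ANSI" on one machine can turn into garbage on another: the bytes are unchanged, only the code page used to interpret them differs.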

GB2312 is a national standard that defines a double-byte character set. It is an early standard and covers relatively few characters (punctuation and symbols included).
GBK extends GB2312, using code positions that GB2312 left unassigned to add more Chinese characters and symbols, so in general you should use GBK rather than GB2312. Web pages declared as gb2312 usually display fine even when they contain characters outside the GB2312 range, because browsers in practice decode such pages as GBK, and current client fonts generally cover the whole GBK repertoire rather than just the small GB2312 set. In code, however, the two must be kept distinct: GB2312 has so few characters that transcoding errors occur easily, so use GBK.
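
The difference can be checked with a rough Python sketch. The helper name `can_encode` and the two sample characters are my own choices for illustration; "镕" is used as an example of a character that GBK contains and GB2312 does not.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


common = "中"  # present in both GB2312 and GBK
rare = "镕"    # present in GBK, absent from GB2312

print(can_encode(common, "gb2312"), can_encode(common, "gbk"))  # True True
print(can_encode(rare, "gb2312"), can_encode(rare, "gbk"))      # False True

# Where both encodings define a character, the bytes are identical.
print(common.encode("gb2312") == common.encode("gbk"))  # True
```

The last line is the sense in which GBK is a superset: text that fits in GB2312 produces the same bytes under GBK, so choosing GBK costs nothing and avoids the encode error.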

Unicode is a character set, in effect a code table that assigns each character a number (a code point); it is not itself a concrete byte encoding. The concrete encodings are UCS-2, UCS-4, UTF-7, UTF-8, UTF-16, UTF-32, and so on. UCS-2 and UCS-4 are fixed-width, as is UTF-32: every character takes the same number of bytes. UTF-8 and UTF-16 are variable-width: the number of bytes depends on which range of Unicode the character's code point falls in.
On Windows, "Unicode" usually means UTF-16 (little-endian), which is a rather confusing use of the term.
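
A short Python sketch of the "one code point, several encodings" idea. The helper `encoded_lengths` and the three sample characters are illustrative choices of mine; the little-endian variants are used so that no BOM is added to the output.

```python
def encoded_lengths(ch: str) -> tuple:
    """Return the byte lengths of one character in UTF-8, UTF-16 and UTF-32."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
        len(ch.encode("utf-32-le")),
    )


# An ASCII letter, a common Chinese character, and a character outside the BMP.
for ch in ("A", "中", "\U0001F600"):
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))
# U+0041 (1, 2, 4)
# U+4E2D (3, 2, 4)
# U+1F600 (4, 4, 4)

print("中".encode("utf-8").hex(" "))               # e4 b8 ad
print("中".encode("utf-16-le").hex(" "))           # 2d 4e
print("\U0001F600".encode("utf-16-le").hex(" "))   # 3d d8 00 de
```

The code point stays the same in every row; only the byte layout changes. The last line shows that UTF-16 is also variable-width: a character above U+FFFF takes two 16-bit units (a surrogate pair).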
A BOM (byte order mark) is a few specific bytes placed at the very start of a file so that the text can be recognized as a particular Unicode encoding.
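
As a sketch of what a BOM looks like in practice, the Python below writes UTF-8 with and without one and then guesses an encoding from the leading bytes. The function `sniff_bom` is a hypothetical helper written for this example; it is a heuristic, not a complete encoding detector.

```python
import codecs

text = "中"

with_bom = text.encode("utf-8-sig")  # UTF-8 with a BOM prepended
without_bom = text.encode("utf-8")   # plain UTF-8

print(with_bom.hex(" "))     # ef bb bf e4 b8 ad
print(without_bom.hex(" "))  # e4 b8 ad


def sniff_bom(data: bytes):
    """Guess the encoding from a leading BOM, or return None if there is none."""
    # UTF-32 is checked before UTF-16 because the UTF-32-LE BOM (FF FE 00 00)
    # begins with the UTF-16-LE BOM (FF FE).
    candidates = (
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    )
    for bom, name in candidates:
        if data.startswith(bom):
            return name
    return None


print(sniff_bom(with_bom))     # utf-8
print(sniff_bom(without_bom))  # None
```

When reading, the "utf-8-sig" codec strips the BOM if it is present and also accepts files without one, which makes it a convenient choice for "UTF-8 with or without BOM" input.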

Converting between encodings is done through mapping tables between the code tables. Unless you want to study this in depth, just use an existing library or interface, such as iconv.
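
In Python the same conversion can be sketched with the built-in codecs. The function name `transcode` is my own; the idea is simply to decode the source bytes to Unicode text and then encode that text in the target encoding.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes from one encoding to another, going through Unicode text."""
    return data.decode(src).encode(dst)


gbk_bytes = b"\xd6\xd0"  # "中" in GBK
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")

print(utf8_bytes.hex(" "))                                 # e4 b8 ad
print(transcode(utf8_bytes, "utf-8", "gbk") == gbk_bytes)  # True
```

On the command line, iconv does the equivalent with its -f (from) and -t (to) options, for example `iconv -f GBK -t UTF-8 in.txt > out.txt`, where the file names are placeholders.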

Section 3.3 of "In-Depth Analysis of Java Web Technology Internals" by Xu Lingbo.
That book has the clearest explanation of encodings I have seen. Look it up.

A quick Baidu search turns up plenty of blog posts on this.

http://blog.csdn.net/garfield2005/article/details/7681299

http://www.cnblogs.com/cy163/archive/2007/05/31/766886.html

...

Recommended: an introductory article by Ruan Yifeng: http://www.ruanyifeng.com/blog/2007/10/ascii_unicode_and_utf-8.html
