About encodings: ANSI, GBK, gb2312, UTF8, UTF16, UTF32, Unicode

Last Update:2016-06-06 Source: Internet

Author: User


Ever since I started programming, I have had only a smattering of knowledge about character encodings and have never grasped the essentials.
For example: what is the relationship between ANSI and GBK, and between GBK and GB2312? What is the difference between ANSI and UTF-8? What is the relationship between Unicode and UTF-8? And how do you convert between ANSI, GBK, GB2312, UTF-8 (with or without a BOM), UTF-16, UTF-32, and Unicode? I have never had a chance to resolve these doubts, and I hope to get a satisfactory answer on Segmentfault.

If there is a book on this, so much the better (ideally one that uses JavaScript, because I happened to run into the problem in JavaScript).

Summary

Books
"Fonts and Encodings" (I could only find the English edition)
Xilingpo, "In-depth analysis of Java Web Technology Insider", Section 3.3

Article
Character encoding notes: ASCII, Unicode, and UTF-8
Talk about Unicode encoding
Differences between Unicode, GBK, and UTF-8
The relationship between the Unicode character set and multi-byte character sets
Talk about Unicode encoding, with brief explanations of UCS, UTF, BMP, BOM, and other terms
Why typing the two characters "unicom" in Notepad produces garbled text, part 1
Why typing the two characters "unicom" in Notepad produces garbled text, part 2
JavaScript bit arithmetic of small code brainiac

Other:
Chinese-Japanese-Korean Character Unicode encoding table for font editing

Reply content:

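In the broad sense, ANSI is the American National Standards Institute, a national standards body whose standards cover a wide range of fields, comparable to GB (the national standards) in mainland China.
On Windows, "ANSI" has a much narrower meaning: it refers to the system's current default encoding, that is, the active code page (for example, code page 936, which is GBK, on a Simplified Chinese system).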
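
To make the "ANSI means the current code page" point concrete, here is a minimal Python sketch. It assumes two example code pages, cp936 (Simplified Chinese Windows) and cp1252 (Western European Windows), and shows that the same bytes read as different text depending on which one is active. The variable names are only for illustration.

```python
import locale

# The same two bytes, interpreted under two different Windows "ANSI" code pages.
raw = b"\xd6\xd0"

as_cp936 = raw.decode("cp936")    # code page 936 (GBK), Simplified Chinese
as_cp1252 = raw.decode("cp1252")  # code page 1252, Western European

print(as_cp936)   # 中
print(as_cp1252)  # ÖÐ

# The encoding this machine prefers by default; the value varies per system.
print(locale.getpreferredencoding(False))
```

This is why a file saved as "ANSI" on one machine can turn into garbage on another: the bytes are unchanged, only the assumed code page differs.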

GB2312 is a national standard: a double-byte character set, but it dates from 1980 and includes relatively few characters (6,763 Chinese characters, plus punctuation and other symbols).
GBK extends GB2312, using code positions that GB2312 left unused to add more Chinese characters and symbols, so in general you should use GBK rather than GB2312.
Web pages that declare gb2312 usually cause no trouble because a page is only displayed, and display depends on the client's fonts: even if a character falls outside the GB2312 range, it can be shown as long as the font contains it, and current client fonts generally cover all of GBK rather than just the small GB2312 set. (In practice, browsers also decode pages labeled gb2312 as GBK.)
In a program, however, the two must be kept clearly apart: because GB2312 has so few characters, it easily causes transcoding errors, so use GBK.
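
The transcoding error mentioned above can be reproduced with a short Python sketch. It assumes Python's built-in `gb2312` and `gbk` codecs, and uses 镕 as an example of a character that is in GBK but not in GB2312; `can_encode` is a hypothetical helper written for this illustration.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

common = "中"  # present in both GB2312 and GBK
rare = "镕"    # present in GBK only

# GBK keeps every GB2312 character at the same byte value.
print(common.encode("gb2312") == common.encode("gbk"))  # True

print(can_encode(rare, "gb2312"))  # False
print(can_encode(rare, "gbk"))     # True
```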

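Unicode is a character set: in effect a code table that assigns every character a number (a code point), not a concrete byte encoding. The concrete encodings are UCS-2, UCS-4, UTF-7, UTF-8, UTF-16, UTF-32, and so on. UCS-2 and UCS-4 are fixed-length: every character takes the same number of bytes (2 and 4 respectively). UTF-8 and UTF-16 are variable-length: the number of bytes per character depends on which range of Unicode its code point falls in. UTF-32, despite the UTF name, is fixed at 4 bytes per character.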
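
As a rough illustration of "one code table, several encodings", the Python sketch below encodes three sample characters and counts the bytes each encoding needs. The `-le` codec variants are used so that no BOM is added to the counts; `byte_lengths` is a hypothetical helper name.

```python
def byte_lengths(ch: str) -> dict:
    """Number of bytes one character occupies in each UTF encoding (no BOM)."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(ch.encode(enc)) for enc in encodings}

for ch in ("A", "中", "😀"):
    # ord() gives the Unicode code point, which is the same for every encoding.
    print(f"U+{ord(ch):04X}", byte_lengths(ch))

# U+0041 {'utf-8': 1, 'utf-16-le': 2, 'utf-32-le': 4}
# U+4E2D {'utf-8': 3, 'utf-16-le': 2, 'utf-32-le': 4}
# U+1F600 {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
```

The code point never changes; only the byte representation does. UTF-8 and UTF-16 grow with the code point, while UTF-32 stays at 4 bytes.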
On Windows, "Unicode" usually means UTF-16 (little-endian), which makes the terminology somewhat confusing.
A BOM (byte order mark) is a few specific bytes placed at the very start of a file so that the text can be identified as a particular Unicode encoding (and, for UTF-16 and UTF-32, its byte order).
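
A minimal sketch of BOM detection in Python, using the BOM constants from the standard `codecs` module. `detect_bom` is a hypothetical helper, not a library function, and it only recognizes files that actually start with a BOM.

```python
import codecs

# UTF-32 LE must be checked before UTF-16 LE: its BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def detect_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if there is no BOM."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

with_bom = "中".encode("utf-8-sig")   # the utf-8-sig codec prepends EF BB BF
without_bom = "中".encode("utf-8")

print(detect_bom(with_bom))     # ('utf-8', 3)
print(detect_bom(without_bom))  # (None, 0)

encoding, skip = detect_bom(with_bom)
print(with_bom[skip:].decode(encoding))  # 中
```

A file without a BOM tells you nothing this way; in that case the encoding has to come from elsewhere (a declaration, a convention, or a guess).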

Conversion between encodings generally goes through a common code table (Unicode) as the intermediate form. Unless you want to study the details, just use an existing library or interface, such as iconv.
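
The "go through a common code table" idea can be sketched in a few lines of Python, where the `str` type plays the role of the Unicode intermediate. `transcode` is a hypothetical helper for illustration; it does roughly what `iconv -f GBK -t UTF-8` does on the command line.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes from one encoding to another via Unicode text."""
    text = data.decode(src)   # source bytes -> Unicode code points
    return text.encode(dst)   # Unicode code points -> target bytes

gbk_bytes = b"\xd6\xd0\xce\xc4"   # "中文" encoded as GBK
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")

print(utf8_bytes)                   # b'\xe4\xb8\xad\xe6\x96\x87'
print(utf8_bytes.decode("utf-8"))   # 中文
```

The conversion only works if `src` really is the encoding of the input; decoding with the wrong source encoding either raises an error or silently produces garbled text.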

Section 3.3 of "In-depth analysis of Java Web Technology Insider" by Xilingpo.
That book gives the clearest explanation of encodings I have ever seen. Look it up.

Search Baidu for it; plenty of blog posts cover this.

http://blog.csdn.net/garfield2005/article/details/7681299

http://www.cnblogs.com/cy163/archive/2007/05/31/766886.html

...

I recommend this introductory article by Ruan Yifeng: http://www.ruanyifeng.com/blog/2007/10/ascii_unicode_and_utf-8.html



This article is an English version of an article originally written in Chinese on aliyun.com.
