Ever since I started programming, my understanding of character encodings has been superficial; I have never really grasped the essentials.
For example: what is the relationship between ANSI and GBK? Between GBK and GB2312? What is the difference between ANSI and UTF-8? What is the relationship between Unicode and UTF-8? And how do you convert between ANSI, GBK, GB2312, UTF-8 (with or without BOM), UTF-16, UTF-32 and Unicode? These questions have bothered me for a long time without a chance to resolve them, and I hope to get a satisfactory answer here on SegmentFault.
If there's a book on this, even better (ideally one that uses JavaScript, since that's where I ran into the problem)!
Summary
Books
"Fonts and Encodings" (I could only find the English edition)
Xu Lingbo, In-depth Analysis of Java Web Technology Internals, Section 3.3
Articles
Notes on character encoding: ASCII, Unicode and UTF-8
On Unicode encoding
Differences between Unicode, GBK and UTF-8
The relationship between the Unicode character set and multi-byte character sets
On Unicode encoding: a brief explanation of UCS, UTF, BMP, BOM and related terms
Why Notepad garbles a file containing only the two characters "联通" (China Unicom), part 1
Why Notepad garbles a file containing only the two characters "联通" (China Unicom), part 2
A small brain-teaser on bitwise operations in JavaScript
Other:
Unicode code table of CJK (Chinese, Japanese, Korean) characters, for font editing
Replies:
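ANSI is the American National Standards Institute, and by extension the standards it publishes. They cover a wide range of areas, roughly like China's GB national standards.
On Windows, "ANSI" has a narrower meaning: the system's current default encoding, i.e. the active code page. On Simplified Chinese Windows that is code page 936, which is GBK.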
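To make "ANSI means the current code page" concrete, here is a minimal Python sketch (Python is my choice for illustration, not something the thread uses; only standard-library codecs are involved). It encodes the same text as GBK, as code page 936 and as UTF-8, and shows what happens when GBK bytes are misread as UTF-8. The value printed by `locale.getpreferredencoding` depends on the machine running it.

```python
import locale

# On Windows, "ANSI" means the active code page; on Simplified Chinese Windows
# that is code page 936 (GBK). What this prints depends on the machine and OS.
print("default encoding here:", locale.getpreferredencoding(False))

text = "中文"
gbk_bytes = text.encode("gbk")      # 2 bytes per Chinese character
cp936_bytes = text.encode("cp936")  # Python's name for Windows code page 936
utf8_bytes = text.encode("utf-8")   # 3 bytes per character for these two

print(gbk_bytes.hex(" "), cp936_bytes == gbk_bytes)
print(utf8_bytes.hex(" "))

# Reading GBK ("ANSI" on Chinese Windows) bytes as UTF-8 does not work:
# invalid sequences are replaced with U+FFFD, the usual source of mojibake.
garbled = gbk_bytes.decode("utf-8", errors="replace")
print(garbled)
```

The same two characters are `d6 d0 ce c4` in GBK but `e4 b8 ad e6 96 87` in UTF-8, which is the practical difference between "ANSI" (on a Chinese system) and UTF-8.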
GB2312 is a Chinese national-standard double-byte character set. It was published early (1980) and includes relatively few characters (6,763 Chinese characters plus punctuation and symbols).
GBK extends GB2312, using code points GB2312 left unused to add more Chinese characters and symbols, so in general you should use GBK rather than GB2312. Web pages labelled gb2312 work fine because a page only has to be displayed: rendering depends on the client's fonts, so even characters outside the GB2312 range show up as long as the font contains them, and today's client fonts basically cover all of GBK, not just GB2312's small set. So display is not a problem. When programming, however, you must keep the two apart: GB2312 has few characters, which easily causes transcoding errors, so use GBK.
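The "GBK is a superset of GB2312" point can be checked with a small Python sketch (standard-library codecs; `try_encode` is a hypothetical helper written for this example). Characters GB2312 already covers get identical bytes in both, while traditional forms such as 體 and 們 only exist in GBK, which is exactly where GB2312 transcoding errors come from.

```python
from typing import Optional

def try_encode(text: str, encoding: str) -> Optional[bytes]:
    """Return text encoded with the given codec, or None if it has no mapping."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

# Characters that GB2312 covers get the same bytes in GBK (GBK extends GB2312).
print(try_encode("中文", "gb2312") == try_encode("中文", "gbk"))

# Traditional forms such as 體 and 們 are outside GB2312 but inside GBK.
for ch in ["體", "們"]:
    print(ch, "gb2312:", try_encode(ch, "gb2312"), "gbk:", try_encode(ch, "gbk"))
```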
Unicode is a character set, essentially a code table, not a concrete encoding. The concrete encodings are UCS-2, UCS-4, UTF-7, UTF-8, UTF-16, UTF-32 and so on. UCS-2 and UCS-4 are fixed-length: every character takes the same number of bytes. UTF-8 and UTF-16 are variable-length: the number of bytes depends on which range of Unicode the character's code point falls in. (UTF-32 is the exception among the UTFs: it is fixed-length, 4 bytes per character.)
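The split between the code table (code points) and the encodings (byte sequences) is easy to see in a short Python sketch. `byte_lengths` is a hypothetical helper; the `-le` codec variants are used so that no BOM is counted.

```python
def byte_lengths(ch: str) -> dict:
    """Bytes needed for one character in three Unicode encodings (no BOM)."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

# One code table (Unicode), several encodings of each code point.
samples = [
    "A",           # U+0041, ASCII
    "\u00e9",      # U+00E9, e with acute accent
    "\u4e2d",      # U+4E2D, the Chinese character "zhong"
    "\U0001F600",  # U+1F600, an emoji outside the Basic Multilingual Plane
]
for ch in samples:
    print(f"U+{ord(ch):04X}", byte_lengths(ch))
```

UTF-8 takes 1 to 4 bytes per character and UTF-16 takes 2 or 4 (a surrogate pair above U+FFFF), while UTF-32 always takes 4. UCS-2 cannot represent U+1F600 at all. This is also why, in JavaScript, whose strings are sequences of UTF-16 code units, `"😀".length` is 2.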
On Windows, "Unicode" usually means UTF-16 (little-endian), which makes the terminology confusing.
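As a sketch of what Windows tools such as Notepad have traditionally saved under the label "Unicode", the Python snippet below builds UTF-16 little-endian bytes preceded by the BOM `FF FE`. It then decodes them with the generic `utf-16` codec, which reads the BOM to choose the byte order.

```python
import codecs

text = "中文"

# What Windows has traditionally labelled "Unicode": UTF-16, little-endian,
# usually preceded by the BOM FF FE when written to a file.
windows_unicode = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
print(windows_unicode.hex(" "))  # ff fe 2d 4e 87 65

# The generic "utf-16" decoder uses the BOM to pick the byte order, then drops it.
print(windows_unicode.decode("utf-16"))  # 中文
```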
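A BOM (byte order mark) is a few specific bytes added at the very start of a file so that Unicode-encoded text (and, for UTF-16 and UTF-32, its byte order) can be recognised. In UTF-8 the BOM is the three bytes EF BB BF; "UTF-8 without BOM" simply omits them.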
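A minimal BOM sniffer in Python could look like this. `sniff_bom` and `decode_with_bom` are hypothetical helpers, and the UTF-8 fallback for BOM-less data is an assumption, not a rule. The one subtlety is ordering: the UTF-32-LE BOM (`FF FE 00 00`) begins with the UTF-16-LE BOM (`FF FE`), so the longer one must be checked first.

```python
import codecs
from typing import Optional

# Longest BOMs first: the UTF-32-LE BOM (FF FE 00 00) starts with the UTF-16-LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding named by a leading BOM, or None if there is no BOM."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None

def decode_with_bom(data: bytes, fallback: str = "utf-8") -> str:
    """Decode using the BOM if present (stripping it), else the fallback encoding."""
    enc = sniff_bom(data)
    if enc is None:
        return data.decode(fallback)
    bom = next(b for b, name in BOMS if name == enc)
    return data[len(bom):].decode(enc)

# "utf-8-sig" writes EF BB BF in front; plain "utf-8" writes no BOM.
for sample in ("中文".encode("utf-8-sig"), "中文".encode("utf-8")):
    print(sample.hex(" "), sniff_bom(sample), decode_with_bom(sample))
```

Without a BOM the encoding has to be guessed, which is why a BOM-less file such as the "联通" example can be misdetected by Notepad.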
Converting between encodings generally goes through a code table. Unless you want to study it, just use an existing library or interface such as iconv.
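In Python the "existing interface" is the codec machinery itself: decode the bytes from the source encoding into Unicode text, then encode that text into the target encoding. The sketch below wraps this in a hypothetical `convert` helper. On the command line, iconv does the same job, e.g. `iconv -f GBK -t UTF-8 in.txt > out.txt`.

```python
def convert(data: bytes, src: str, dst: str, errors: str = "strict") -> bytes:
    """Transcode bytes: decode from src into Unicode text, then encode to dst."""
    return data.decode(src, errors=errors).encode(dst, errors=errors)

gbk_data = "中文".encode("gbk")
utf8_data = convert(gbk_data, "gbk", "utf-8")          # GBK -> UTF-8, no BOM
utf8_bom_data = convert(gbk_data, "gbk", "utf-8-sig")  # GBK -> UTF-8 with BOM
utf16_data = convert(utf8_data, "utf-8", "utf-16-le")  # UTF-8 -> UTF-16LE
back = convert(utf16_data, "utf-16-le", "gbk")         # and back to GBK

print(utf8_data.hex(" "), utf8_bom_data.hex(" "), back == gbk_data)

# Converting into a smaller character set can fail (the GB2312 pitfall):
try:
    convert("體".encode("gbk"), "gbk", "gb2312")
except UnicodeEncodeError as exc:
    print("cannot convert to GB2312:", exc.reason)
```

Passing `errors="replace"` trades the exception for `?` placeholders; whether that is acceptable depends on the data.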
In-depth analysis of Java Web Technology Insider Xilingpo wrote section 3.3
The book above is the clearest thing I've ever seen on the code interpretation. Look for it.
Search Baidu and you'll find plenty of blog posts that cover this.
http://blog.csdn.net/garfield2005/article/details/7681299
http://www.cnblogs.com/cy163/archive/2007/05/31/766886.html
...
I recommend Ruan Yifeng's introductory article: http://www.ruanyifeng.com/blog/2007/10/ascii_unicode_and_utf-8.html