Ever since I started programming, character encodings have been something I only half understand; I have never grasped the essentials.
For example: what is the relationship between ANSI and GBK? Between GBK and GB2312? What is the difference between ANSI and UTF-8? How does Unicode relate to UTF-8? And how do ANSI, GBK, GB2312, UTF-8 (with or without BOM), UTF-16 and UTF-32 convert to and from Unicode? I have never had the chance to sort these questions out, so I hope to get a satisfactory answer here on SegmentFault.
If you can recommend books on this topic (ideally JavaScript-oriented, since that is where I happen to be running into the problem), even better!
Summary
Books
Fonts & Encodings (only available in English)
Xu Lingbo, In-Depth Analysis of Java Web Technology Internals, Section 3.3
Articles
Character encoding notes: ASCII, Unicode and UTF-8
Unicode encoding
The differences between Unicode, GBK and UTF-8
Unicode Character Set and Multi-Byte Character Set
On Unicode encoding: a brief explanation of terms such as UCS, UTF, BMP and BOM
Why the word "联通" (China Unicom) typed into Notepad comes out garbled, part 1
Why the word "联通" (China Unicom) typed into Notepad comes out garbled, part 2
Small code University question-JavaScript bit operation
Others:
Chinese and Japanese character Unicode encoding table for font editing
Replies:
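In general, ANSI is the American National Standards Institute, which publishes standards covering many fields, much like China's national (GB) standards.
In Windows, "ANSI" has a narrower meaning: it refers to the system's current default encoding, i.e. its active code page. On Simplified Chinese Windows, for example, that is code page 936 (GBK).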
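To see that "ANSI" is not one fixed encoding, you can decode the same bytes under two different Windows code pages. The following is a minimal Python sketch, assuming Python 3 and its built-in `cp936` (Simplified Chinese Windows) and `cp1252` (Western European Windows) codecs; `read_as_ansi` is a hypothetical helper name, not a Windows API.

```python
def read_as_ansi(data: bytes, code_page: int) -> str:
    """Decode bytes the way a Windows machine with this ANSI code page would.

    Hypothetical helper: it just maps the code page number to Python's
    built-in 'cpNNN' codec name.
    """
    return data.decode(f"cp{code_page}")


# A file saved as "ANSI" on Simplified Chinese Windows (code page 936).
saved = "中".encode("cp936")               # b'\xd6\xd0'

# Opened on the same kind of system: decodes correctly.
same_locale = read_as_ansi(saved, 936)     # '中'

# Opened on Western European Windows (code page 1252): same bytes, other text.
western = read_as_ansi(saved, 1252)        # 'ÖÐ'
```

This is the classic case of a file that looks fine on one machine and garbled on another: the bytes never changed, only the code page used to interpret them.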
GB2312 is a Chinese national standard and a double-byte character set. It dates from 1980, however, and contains relatively few characters: 6,763 Chinese characters plus a set of symbols and punctuation marks.
GBK extends GB2312, using code points that GB2312 left unused to add many more Chinese characters and symbols, so GBK is now generally used in place of GB2312. Declaring gb2312 on a web page is usually harmless, because the page is only displayed: whether a character appears depends on the client's fonts, and today's fonts easily cover GBK, not just the small GB2312 repertoire. (Browsers also treat the gb2312 label as an alias for GBK, so characters outside GB2312's range still decode.) In code, however, keep the two strictly apart: GB2312 covers so few characters that converting to it easily fails, so use GBK.
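The superset relationship, and the reason converting to GB2312 fails more often, can be checked with Python's built-in `gb2312` and `gbk` codecs. A minimal sketch: `encodable` is a hypothetical helper, and 國 (the traditional form of 国) is just one example of a character that GBK added.

```python
def encodable(text: str, codec: str) -> bool:
    """Return True if every character of `text` has a code in `codec`."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# Characters in the shared range get identical bytes in both encodings.
common_gb2312 = "中".encode("gb2312")   # b'\xd6\xd0'
common_gbk = "中".encode("gbk")         # b'\xd6\xd0'

# 國 was added by GBK; GB2312 has no code for it.
in_gb2312 = encodable("國", "gb2312")   # False
in_gbk = encodable("國", "gbk")         # True
```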
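Unicode is a character set: in effect a code table that assigns every character a number (its code point). It is not itself a concrete encoding; the concrete encodings are UCS-2, UCS-4, UTF-7, UTF-8, UTF-16, UTF-32 and so on. UCS-2 and UCS-4 are fixed-length: every character takes the same number of bytes (2 and 4 respectively). UTF-8 and UTF-16 are variable-length: how many bytes a character needs depends on which range of code points it falls in. UTF-32 is the exception among the UTFs, always using 4 bytes per character.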
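A short Python sketch can make the character-set versus encoding distinction concrete: each character below has exactly one Unicode code point but a different byte count in each encoding. It assumes only Python 3's built-in codecs; the `-le` variants are used so that no BOM is counted, and `byte_lengths` is a hypothetical name.

```python
def byte_lengths(ch: str) -> dict:
    """Code point of one character and its size in three Unicode encodings."""
    return {
        "code point": f"U+{ord(ch):04X}",
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),  # '-le': no BOM added
        "utf-32": len(ch.encode("utf-32-le")),
    }


for ch in ["A", "\u00e9", "中", "\U0001F600"]:  # A, é, 中, 😀
    print(byte_lengths(ch))

# {'code point': 'U+0041', 'utf-8': 1, 'utf-16': 2, 'utf-32': 4}
# {'code point': 'U+00E9', 'utf-8': 2, 'utf-16': 2, 'utf-32': 4}
# {'code point': 'U+4E2D', 'utf-8': 3, 'utf-16': 2, 'utf-32': 4}
# {'code point': 'U+1F600', 'utf-8': 4, 'utf-16': 4, 'utf-32': 4}
```

The emoji needs 4 bytes in UTF-16 because it lies outside the BMP and is stored as a surrogate pair. That is also why `'😀'.length` is 2 in JavaScript: JS strings are sequences of UTF-16 code units, not of characters.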
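When Windows says "Unicode" (for example, the "Unicode" option in Notepad's Save As dialog), it means UTF-16 in little-endian byte order, which is a confusing use of the name.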
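"Little-endian" means the low byte of each 16-bit unit is stored first. A minimal Python sketch comparing both byte orders for a single character, using only the built-in `utf-16-le` and `utf-16-be` codecs (nothing here calls Windows itself):

```python
text = "中"  # one code point: U+4E2D

# What Windows calls "Unicode": UTF-16, little-endian (low byte 0x2D first).
le = text.encode("utf-16-le")
# The big-endian variant, for comparison (high byte 0x4E first).
be = text.encode("utf-16-be")

print(le.hex(), be.hex())  # 2d4e 4e2d
```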
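A BOM (byte order mark) is a short, fixed byte sequence placed at the very beginning of a file so that a reader can identify which Unicode encoding the text uses and, for UTF-16 and UTF-32, its byte order. In UTF-8 the BOM is EF BB BF; UTF-8 has no byte order to signal, so the BOM only marks the file as UTF-8, and that is the whole difference between "UTF-8 with BOM" and "UTF-8 without BOM".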
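Detecting a BOM amounts to comparing the first few bytes against the known marks. A minimal Python sketch, assuming Python 3's `codecs` module constants (`codecs.BOM_UTF8` and friends); `sniff_bom` and `decode_with_bom` are hypothetical helpers, and falling back to UTF-8 when there is no BOM is an assumption you may want to change (for example to GBK for legacy Chinese files).

```python
import codecs

# Longer marks first: the UTF-32-LE BOM (FF FE 00 00) starts with the UTF-16-LE one (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> tuple:
    """Return (codec name, BOM length) for a leading BOM, or (None, 0) if none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes, fallback: str = "utf-8") -> str:
    """Decode using the BOM if present, else the (assumed) fallback encoding."""
    name, skip = sniff_bom(data)
    return data[skip:].decode(name or fallback)


# "UTF-8 with BOM" is just EF BB BF followed by ordinary UTF-8.
with_bom = "中".encode("utf-8-sig")     # b'\xef\xbb\xbf\xe4\xb8\xad'
without_bom = "中".encode("utf-8")      # b'\xe4\xb8\xad'
# A Windows "Unicode" file: BOM FF FE followed by UTF-16-LE data.
windows_unicode = codecs.BOM_UTF16_LE + "中".encode("utf-16-le")
```

Because the check is purely on leading bytes, a UTF-16-LE file whose first character is U+0000 would be misread as UTF-32-LE, so a BOM is a strong hint rather than a guarantee.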
Conversion between encodings is generally done with mapping tables. If you don't want to study them yourself, use an existing library or interface such as iconv.
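In Python the built-in codecs play roughly the role that iconv plays in C or on the command line (`iconv -f GBK -t UTF-8`): decode the source bytes to Unicode, then encode them in the target encoding. A minimal sketch, not iconv's own API; `transcode` is a hypothetical helper.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes from encoding `src` to encoding `dst` via Unicode.

    Raises UnicodeDecodeError / UnicodeEncodeError when a byte sequence
    or a character has no mapping in the source or target table.
    """
    return data.decode(src).encode(dst)


gbk_bytes = "中文".encode("gbk")
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")   # same text, UTF-8 bytes

# Converting *down* to GB2312 fails for characters only GBK has.
converted = True
try:
    transcode("國".encode("gbk"), "gbk", "gb2312")
except UnicodeEncodeError:
    converted = False  # 國 exists in GBK but has no GB2312 code
```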
Section 3.3 of Xu Lingbo's In-Depth Analysis of Java Web Technology Internals
That book has the clearest explanation of character encoding I have ever seen. It is worth tracking down.
Search Baidu; plenty of blog posts cover this.
http://blog.csdn.net/garfield2005/article/details/7681299
http://www.cnblogs.com/cy163/archive/2007/05/31/766886.html
...
I recommend an introductory article by Ruan Yifeng: http://www.ruanyifeng.com/blog/2007/10/ascii_unicode_and_utf-8.html