UTF-8, GBK and GB2312

Source: Internet
Author: User

Original address: http://blog.163.com/dangzhengtao@yeah/blog/static/7780087420102132421629?fromdm&fromSearch&isFromSearchEngine=yes

1. Encoding
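The commonly used encodings are UTF-8, GBK, GB2312 and ISO-8859-1. Apart from ISO-8859-1, the other three all support Chinese well, and all four are compatible with ASCII, the 7-bit subset that also forms the first half of ISO-8859-1. That is, whichever of these encodings is used, an ASCII character is stored as the same single byte and will never be garbled. Strictly speaking, the compatibility covers only that ASCII half: a character from the upper half of ISO-8859-1, such as "é", is encoded differently in UTF-8 and in the GB encodings.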
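A minimal Python sketch of where these encodings agree and where they do not, using only the standard-library codecs. The sample strings are arbitrary, and `encode_all` is a hypothetical helper; `errors="replace"` is used so that a character missing from a charset becomes `?` instead of raising an exception. The shared ground is ASCII: plain ASCII text produces identical bytes everywhere, while "é" (byte 0xE9 in ISO-8859-1) does not.

```python
ENCODINGS = ["utf-8", "gbk", "gb2312", "iso-8859-1"]

def encode_all(text):
    """Hypothetical helper: return {encoding: bytes} for every encoding.

    Characters that a charset cannot represent are replaced with b'?'.
    """
    return {enc: text.encode(enc, errors="replace") for enc in ENCODINGS}

# Pure ASCII text: every encoding produces exactly the same bytes.
ascii_bytes = encode_all("Hello, 123")
print(ascii_bytes)

# "é" is in ISO-8859-1 (byte 0xE9) but is not ASCII, so the other
# encodings store it with different bytes (or as '?' if they lack it).
e_acute = encode_all("é")
print(e_acute)
```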
Of these four, GB2312 is China's national standard character set for simplified Chinese. GBK is an extension of GB2312: it remains compatible with GB2312 and adds traditional Chinese characters and many rarer characters. UTF-8 also supports Chinese, but it is not compatible with the GB encodings, because it assigns Chinese characters different byte values.

UTF-8 is a variable-length encoding of Unicode in which a character takes 1 to 4 bytes. An ASCII character takes a single byte with the same value as in ASCII, while most Chinese characters take three bytes. The advantages of UTF-8 are: 1. It does not depend on CPU byte order, so text can be exchanged between different platforms. 2. It is highly fault tolerant: if a byte is damaged, at most one character is lost and the error does not cascade (in some other encodings a single bad byte can garble the entire rest of the line). For these reasons UTF-8 is generally recommended for internationalized text.
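The relationships between GB2312, GBK and UTF-8 can be checked directly with Python's built-in codecs. This is a minimal sketch with arbitrarily chosen sample characters: simplified "中" has the same two bytes in GB2312 and GBK but different bytes in UTF-8, traditional "國" exists only in GBK, and UTF-8 lengths range from 1 to 4 bytes.

```python
# Simplified "中" is in GB2312, and GBK keeps the same two bytes for it.
zhong_gb2312 = "中".encode("gb2312")
zhong_gbk = "中".encode("gbk")
zhong_utf8 = "中".encode("utf-8")
print(zhong_gb2312, zhong_gbk, zhong_utf8)  # b'\xd6\xd0' b'\xd6\xd0' b'\xe4\xb8\xad'

# Traditional "國" can be encoded in GBK but not in GB2312.
guo_gbk = "國".encode("gbk")
try:
    "國".encode("gb2312")
    guo_in_gb2312 = True
except UnicodeEncodeError:
    guo_in_gb2312 = False
print(len(guo_gbk), guo_in_gb2312)  # 2 False

# UTF-8 is variable length: 1 to 4 bytes per character.
samples = ["A", "é", "中", "\U0001F600"]  # ASCII, Latin-1, CJK, emoji
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]
print(utf8_lengths)  # [1, 2, 3, 4]
```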
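The fault-tolerance point can be illustrated by losing one byte from the same text encoded both ways. This sketch simulates damage with a hypothetical helper, `drop_first_byte`, and decodes with `errors="replace"` so that bad bytes become U+FFFD instead of raising. UTF-8 lead bytes and continuation bytes occupy disjoint ranges, so the decoder can resynchronize; in GBK a trail byte can also be a valid lead byte, so the rest of the text can be read out of alignment.

```python
def drop_first_byte(data: bytes) -> bytes:
    """Hypothetical helper: simulate damage by losing the first byte."""
    return data[1:]

text = "中文编码"

# UTF-8: stray continuation bytes become U+FFFD, then decoding resynchronizes
# at the next lead byte, so only the damaged character is lost.
utf8_damaged = drop_first_byte(text.encode("utf-8")).decode("utf-8", errors="replace")

# GBK: a trail byte can also be a valid lead byte, so in this example every
# following pair of bytes is read out of alignment.
gbk_damaged = drop_first_byte(text.encode("gbk")).decode("gbk", errors="replace")

# ascii() keeps the output printable on any console.
print(ascii(utf8_damaged))
print(ascii(gbk_damaged))
```

Here the UTF-8 result still ends with "文编码", while none of the original characters survive in the GBK result.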

2. File encoding
The two most commonly used file encodings are ANSI and UTF-8. ANSI is the default encoding used when saving a file, while UTF-8 has to be selected explicitly. I have used Notepad and Eclipse to change file encodings. Notepad is the easiest: open the file, choose Save As, and select the desired encoding in the dialog; it handles encodings very well. In Eclipse the encoding takes only a small settings change: open Window > Preferences, select General > Content Types, select on the right the file type whose encoding you want to change, enter the new value in the Default encoding field below, and click Update.
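Programs that write files face the same choice as an editor's Save As dialog. A minimal Python sketch, assuming a temporary directory and an arbitrary sample string: it saves the same text once as UTF-8 and once as GBK, reads each file back with the matching encoding, and compares the file sizes (each Chinese character takes 3 bytes in UTF-8 and 2 in GBK).

```python
import os
import tempfile

text = "中文 encoding test"  # 2 Chinese characters + 14 ASCII characters

sizes = {}
round_trip_ok = {}
with tempfile.TemporaryDirectory() as tmp:
    for enc in ["utf-8", "gbk"]:
        path = os.path.join(tmp, f"sample-{enc}.txt")
        # Equivalent of "Save As" with an explicitly chosen encoding.
        with open(path, "w", encoding=enc) as f:
            f.write(text)
        sizes[enc] = os.path.getsize(path)
        # Reading back with the same encoding restores the text exactly.
        with open(path, encoding=enc) as f:
            round_trip_ok[enc] = f.read() == text

print(sizes)  # {'utf-8': 20, 'gbk': 18}
print(round_trip_ok)  # {'utf-8': True, 'gbk': True}
```

Passing `encoding=` explicitly matters: without it, `open()` in current Python versions falls back to a locale-dependent default, which is the same trap as relying on an editor's default encoding.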


In other editors, files are saved as GB2312 or GBK by default (which corresponds to ANSI in Notepad). As explained above, UTF-8 assigns different byte values from GBK, GB2312 and the other encodings, so if a file is saved as UTF-8, it must also be read with UTF-8 as the character encoding; otherwise the mismatched byte values produce garbled characters. This is the root cause of why so many people still get garbled text even when they use UTF-8.
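The mismatch can be reproduced in a few lines of Python. This minimal sketch uses an arbitrary sample string; exactly which wrong characters appear depends on the decoder, so it only checks that they are not the original ones. Reading UTF-8 bytes as GBK silently produces other characters, reading GBK bytes as UTF-8 raises an error here, and only matching encodings round-trip.

```python
original = "中文"

utf8_bytes = original.encode("utf-8")  # what a UTF-8 file contains
gbk_bytes = original.encode("gbk")     # what an ANSI/GBK file contains

# UTF-8 bytes read as GBK: no loud failure, just different characters.
misread = utf8_bytes.decode("gbk", errors="replace")

# GBK bytes read as UTF-8: these bytes are not valid UTF-8, so decoding fails.
try:
    gbk_bytes.decode("utf-8")
    gbk_is_valid_utf8 = True
except UnicodeDecodeError:
    gbk_is_valid_utf8 = False

# Using the same encoding on both sides restores the text.
restored = utf8_bytes.decode("utf-8")

# ascii() keeps the garbled output printable on any console.
print(ascii(misread), gbk_is_valid_utf8, restored == original)
```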
