The difference between GB2312 and UTF-8

Source: Internet
Author: User

GB2312 contains 6,763 Chinese characters (not counting symbols). The first byte ranges over B0-F7 and the second byte over A1-FE (when the first byte is D7, the second byte only runs A1-F9). Together with the non-Chinese characters such as punctuation, Latin, Greek, Cyrillic, and kana, the standard defines 7,445 characters in total.
GBK is an extension of GB2312 that accommodates more Chinese characters. It only widens the code range and does not change any existing assignment: every GB2312 code is preserved. Including symbols, it encodes roughly 22,000 characters.
GB18030 is an extension of GBK. Because two-byte codes alone cannot hold all the Chinese characters required, it mixes two-byte and four-byte codes (ASCII still takes a single byte). It keeps the original GBK two-byte codes, so files encoded in GB2312 or GBK remain compatible. It holds approximately 55,657 codes (including symbols).
Unicode (the "universal code") aims to represent the scripts of every country under one unified standard, and UTF-8 is one of its encodings. UTF-8 is variable-length, using one to four bytes per character; most Chinese characters take three bytes. Unicode covers every Chinese character in GBK and more. However, the three-byte form used for Chinese brings a compatibility problem: files already encoded in GB2312, GBK, or GB18030 cannot be read as UTF-8 without conversion, so there is still a long way to go.
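
The ranges and the compatibility chain described above can be checked with a short Python sketch. It assumes the published GB2312 layout, in which the five positions D7FA-D7FE are unassigned, and it relies on the `gb2312`, `gbk`, `gb18030`, and `utf-8` codecs that ship with Python. The helper name `is_gb2312_hanzi` and the sample characters are illustrative choices, not part of any standard.

```python
def is_gb2312_hanzi(lead: int, trail: int) -> bool:
    """Return True if the byte pair lies in the GB2312 Chinese-character area."""
    if not (0xB0 <= lead <= 0xF7 and 0xA1 <= trail <= 0xFE):
        return False
    # Assumption from the published layout: D7FA-D7FE are unassigned.
    return not (lead == 0xD7 and trail > 0xF9)


# Count every two-byte combination that falls in the Chinese-character area.
hanzi_count = sum(
    is_gb2312_hanzi(lead, trail)
    for lead in range(0x100)
    for trail in range(0x100)
)
print(hanzi_count)  # 72 rows * 94 cells - 5 unassigned cells

# The same character keeps its two-byte code across GB2312, GBK and GB18030,
# while UTF-8 uses a different, three-byte sequence.
ch = "中"
for name in ("gb2312", "gbk", "gb18030", "utf-8"):
    print(name, ch.encode(name).hex())

# A character outside the two-byte range needs four bytes in GB18030.
rare = "\U00020000"
print(len(rare.encode("gb18030")))
```

The loop over encodings makes the "extension" relationship concrete: the GB family agrees byte for byte on characters they share, which is why old GB2312 files open correctly as GBK or GB18030 but not as UTF-8.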


So in the end, UTF-8 or GB2312?


I lean toward GB2312

I currently use the English version of Windows 2000. Apart from small interface differences, such as the Start button being labeled in English instead of Chinese, I notice no difference. Right after installing it, I casually visited a Chinese website and was asked whether to install Simplified Chinese support. I clicked Yes, it installed, and I could read Chinese right away without even reopening IE. If reading Chinese on an English system is that simple, reading Simplified Chinese on a Traditional Chinese system should be no different, and garbled text should not appear.

Displaying several languages on the same screen with UTF-8 is a genuinely interesting novelty. But using it means worrying about compatibility all the time, and I rarely need what UTF-8 offers: I write in Simplified Chinese, and my readers generally run systems that are either Simplified Chinese or able to display it. A Japanese or Korean reader will never turn up, so UTF-8 is of little use to me.

XML and DVD are both very good things, yet after many years neither has become the "default configuration". GB2312 is to UTF-8 as VCD is to DVD: DVD is good, but at present almost all software still ships on 650 MB CD-ROMs. A home computer can get by without reading DVDs, but it must be able to read ordinary CD-ROMs, or even installing a system becomes very difficult.


UTF-8 or GB2312?
Anyone who was online in the early years knows that early versions of Netscape and IE did not support browsing in multiple languages. To view Traditional Chinese, Japanese, or other foreign-language sites you needed add-on software such as "Chinese Star" or "RichWin". Browsers were gradually upgraded, and today almost all of them support multilingual text and can display websites in any country's language. The arrival of blogs, and of trackback in particular, has moved international use of the web from passively browsing information to actively exchanging it, and with that a new language barrier has appeared.

The problem mainly shows up in interactive features such as trackback (reference), ping (notice), and notification. Until now, our understanding and use of network interaction was mostly confined to the client-server (C-S) model, that is, interaction between a client (an individual) and a server (a website): publishing an article online, for example, or replying to a forum post. This kind of interaction rarely runs into incompatible languages. Trackback in blogs, however, is not only client-server. It is also server-to-server (S-S, between blog sites), and it can be one-to-many: when you publish an article you can post it to one or more other blogs at the same time, or send an update to designated people, and you can let more people subscribe to and aggregate your RSS updates. Blogs offer more ways to interact, and more flexible ones. This interaction is not unlimited, though, and character encoding is a major obstacle. If your blog system is encoded in Simplified Chinese GB2312, then every trackback and ping partner is limited to domestic users who also use GB2312. Your GB2312-encoded blog will not be able to interact with users in Taiwan, Japan, and elsewhere.

A better solution is to use UTF-8. It takes up somewhat more space (a Chinese character needs 3 bytes), but it solves the internationalization problem: UTF-8 can represent the characters covered by GB2312, BIG5, EUC-JP, and the other national encodings. In my tests, interaction and communication between UTF-8 blogs worked without any problems. In fact, more than 90% of Taiwan's blogs have abandoned BIG5 in favor of UTF-8, while mainland blogs are almost all still on GB2312. It seems Taiwan is well ahead in internationalization.
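
The trade-off described here, more bytes per Chinese character in exchange for covering every language, can be seen in a small Python sketch. The helper `can_encode` and the sample strings are illustrative assumptions; the codecs are the ones bundled with Python.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


simplified = "中文博客"   # four Simplified Chinese characters
korean = "한국어"         # Hangul, which GB2312 does not contain

# Storage cost: two bytes per character in GB2312, three in UTF-8.
print(len(simplified.encode("gb2312")))
print(len(simplified.encode("utf-8")))

# Coverage: only UTF-8 can hold both scripts in one document.
mixed = simplified + " " + korean
for name in ("gb2312", "utf-8"):
    print(name, can_encode(mixed, name))
```

For a page that is purely Chinese, UTF-8 costs about 50% more bytes for the text itself; ASCII markup around it costs the same in both encodings.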

A few days ago I converted my blog from GB2312 to UTF-8, then sent trackbacks and pings to the blogs of a few Taiwanese friends and found no problems. The "internationalization" problem seemed to be solved, but a new one appeared: my blog can no longer interact with domestic GB2312 blogs, which was of course inevitable. The information I pinged to online-edu.org (a site that uses GB2312) came out garbled.
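
The garbled ping can be reproduced in a few lines of Python. This is a minimal sketch of the general mechanism, not of what online-edu.org actually runs: it assumes the receiving side decodes incoming bytes as GB2312 without checking what the sender used. The helper name `transcode` is hypothetical.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one character encoding to another."""
    return data.decode(source).encode(target)


text = "中文博客"
utf8_bytes = text.encode("utf-8")

# What a GB2312-only receiver sees if it decodes UTF-8 bytes as GB2312.
# errors="replace" substitutes a marker for byte sequences it cannot map.
garbled = utf8_bytes.decode("gb2312", errors="replace")
print(garbled)

# Converting before sending restores the text for that receiver.
gb_bytes = transcode(utf8_bytes, "utf-8", "gb2312")
print(gb_bytes.decode("gb2312"))
```

The bytes themselves are never damaged in transit; the text only looks broken because sender and receiver disagree about which encoding the bytes are in.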

I do not think this is really a technical problem. If your site or blog needs to communicate internationally, UTF-8 solves that; if it does not, staying with GB2312 does no serious harm. To the reader the two look the same, and the encoding is only a behind-the-scenes matter. Still, I hope bloggers will choose UTF-8, because your blog has trackback and ping. They hold tickets for international flights, and using them only for domestic trips is a real waste.

About Trackback
Trackback is a small feature that first appeared in Movable Type. It is fair to say that this small feature set off a revolution in the blogosphere.

Trackback is a feature that connects countless blogs around the world. Suppose you read an article on a website and want to write down your thoughts about it. The most common way is to post through the comment feature the site provides. But then you have only contributed your comment to someone else's site, and nothing is left in your own hands.

Trackback is very different. You write your comment on your own website, then send the URL of that page to the server hosting the original article, along with information such as the title, part of the body, and the name of your site (see note). This step is simply called "sending a TrackBack ping", but through it the URL, title, and an excerpt of your comment are left alongside the original article. Other people can of course send TrackBack pings to the same article, so the original article ends up with a record of every comment, yours included.

In addition, if your own site accepts TrackBack pings, anyone can comment on your articles by sending one. In this way many sites become linked through related topics, and comments of all kinds are connected across the Internet like a mesh. This creates a culture completely different from that of diary sites.

Note: The ping is sent to the URL specified in the original article, which is called the "TrackBack Ping URL." The number at the end of that URL ("128" in the original example) is specific to the original article and is called the "TrackBack ID." In addition, the TrackBack technical specification is published on the "lowlife.jp" blog site.
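
As a rough illustration of the ping described above, the sketch below builds (but does not send) the HTTP request in Python. It assumes the form fields `url`, `title`, `excerpt`, and `blog_name` and a form-encoded POST, as in the published TrackBack specification. The function name `build_trackback_ping` and every URL and value in the usage are hypothetical placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_trackback_ping(ping_url, url, title, excerpt, blog_name, charset="utf-8"):
    """Build a TrackBack ping as a form-encoded POST request (not sent here)."""
    fields = {"url": url, "title": title, "excerpt": excerpt, "blog_name": blog_name}
    # Percent-encode the fields using the bytes of the chosen character set.
    body = urlencode(fields, encoding=charset).encode("ascii")
    headers = {"Content-Type": f"application/x-www-form-urlencoded; charset={charset}"}
    return Request(ping_url, data=body, headers=headers, method="POST")


# Hypothetical values; the trailing 128 stands in for a TrackBack ID.
req = build_trackback_ping(
    "http://example.com/mt-tb.cgi/128",
    url="http://myblog.example/entry/1",
    title="中文",
    excerpt="My comment on the original article",
    blog_name="My Blog",
)
print(req.get_method(), req.full_url)
print(req.data)
```

Passing `charset="gb2312"` instead produces different percent-encoded bytes for the same title, which is exactly where two blogs using different encodings stop understanding each other unless the receiver honors the declared charset.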
