Correctly understand and use GBK and UTF-8 page encoding

Source: Internet
Author: User

Web page encoding (also called website page encoding) is the character encoding a Web page uses for its text. The page declares it so that the browser knows how to turn the stored bytes back into characters.

GBK is an extension of the Chinese national standard GB2312 and is backward compatible with it. In GBK, each Chinese character is encoded in two bytes. The lead byte always has its highest bit set to 1, which distinguishes it from English letters, digits and other ASCII characters; those keep their single-byte ASCII values. GBK covers more than 20,000 Chinese characters, including traditional ones. Because it is a Chinese national encoding, it is less universal than UTF-8, but Chinese text stored in GBK takes less database space than the same text in UTF-8.

UTF-8 (Unicode Transformation Format, 8-bit) is a variable-length, multi-byte encoding designed for international text. It permits a byte order mark (BOM) but usually does not include one. English (ASCII) characters take 8 bits (one byte), and common Chinese characters take 24 bits (three bytes). UTF-8 can encode every Unicode character, so it covers the characters used by every country and is a highly universal international encoding. Any browser that supports the UTF-8 character set can display UTF-8 text, whatever its locale. For example, a UTF-8 page can show Chinese even in the English version of IE, without the user downloading IE's Chinese language support package.
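The byte widths and the optional BOM can be checked with Python's built-in codecs. In the sketch below, the sample characters are chosen for illustration. The last one, U+20000, shows that rare Chinese characters outside the Basic Multilingual Plane take four bytes rather than three. The `utf-8-sig` codec is Python's name for "UTF-8 with BOM".

```python
import codecs

# "A" (ASCII), "é" (accented Latin letter), "中" (common Chinese),
# U+20000 (a rare Chinese character outside the Basic Multilingual Plane)
for ch in ["A", "\u00e9", "\u4e2d", "\U00020000"]:
    print(f"U+{ord(ch):04X}", len(ch.encode("utf-8")), "byte(s)")
# prints 1, 2, 3 and 4 bytes respectively

plain = "中文".encode("utf-8")         # plain UTF-8: no BOM
with_bom = "中文".encode("utf-8-sig")  # same text with the BOM EF BB BF prepended
print(with_bom.startswith(codecs.BOM_UTF8), len(with_bom) - len(plain))  # True 3
print(with_bom.decode("utf-8-sig") == "中文")  # BOM is stripped on decode: True
```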

Although UTF-8 has good international compatibility, Chinese text needs 50% more database storage in UTF-8 than in GBK or BIG5 (three bytes per character instead of two). UTF-8 is therefore generally recommended only for users with a specific need for international compatibility. In short, GBK saves database space for mostly Chinese sites. For mostly English sites, UTF-8 is the better fit and costs no extra space, because ASCII characters take one byte in both encodings.

How do you convert between GBK/GB2312 and UTF-8? Conversion always goes through Unicode: GBK/GB2312 → Unicode → UTF-8, and UTF-8 → Unicode → GBK/GB2312. In Windows Notepad, the Save As dialog converts a file among ANSI (which is GBK on Simplified Chinese Windows), Unicode, Unicode big endian and UTF-8.
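In Python, the decoded `str` type plays the role of the Unicode middle step. The helper name `convert` below is hypothetical. It decodes bytes from the source charset into Unicode text, then encodes that text into the target charset.

```python
def convert(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode bytes from charset `src` to charset `dst` via Unicode."""
    text = data.decode(src)  # src bytes -> Unicode text
    return text.encode(dst)  # Unicode text -> dst bytes


gbk_bytes = "网页编码".encode("gbk")
utf8_bytes = convert(gbk_bytes, "gbk", "utf-8")            # GBK -> Unicode -> UTF-8
print(len(gbk_bytes), len(utf8_bytes))                     # 8 12
print(convert(utf8_bytes, "utf-8", "gb2312") == "网页编码".encode("gb2312"))  # True
```

To convert a whole file the way Notepad's Save As does, the same function can be applied to `Path(...).read_bytes()`, and the result written back with `write_bytes`.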

How do you make the browser recognize the page encoding correctly? A page should normally contain a declaration such as <meta http-equiv="Content-Type" content="text/html; charset=gb2312">. This states that the page's character set is GB2312 (use charset=utf-8 for UTF-8).
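The declaration only helps if the file's bytes really are in the declared encoding. The following sketch builds a page and encodes it with the same charset it declares, so the two cannot disagree. It also places the <meta> tag first in <head>, before <title>. The function `build_page` is hypothetical, and it does no HTML escaping of `title` or `body`.

```python
def build_page(title: str, body: str, charset: str = "utf-8") -> bytes:
    """Return HTML bytes whose <meta> charset matches the encoding actually used.

    No HTML escaping is done on title or body; this is a sketch only.
    """
    html = (
        "<html><head>"
        # Declare the charset first, before <title>.
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">'
        f"<title>{title}</title>"
        "</head><body>"
        f"{body}"
        "</body></html>"
    )
    return html.encode(charset)  # encode with exactly the charset we declared


page = build_page("编码", "你好", charset="gb2312")
print(page.decode("gb2312"))
```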

Why is a page sometimes garbled even though it declares an encoding? Usually the declared encoding does not match the encoding the file is actually saved in. This most often happens in two ways:

  • The page was opened in the wrong encoding and then saved.
  • The file was edited directly on the server with an FTP client such as CuteFTP whose encoding settings were wrong, so the file was silently converted to the wrong encoding.

To fix it, open the file in Windows Notepad and use Save As to save it in the encoding the page declares.
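A quick way to spot this mismatch is to read the charset a file declares and try decoding the file's bytes with it. The sketch below does exactly that. The regular expression and both function names are hypothetical and only handle simple `charset=...` declarations. A clean decode does not prove the file is correct, because some byte sequences are valid in more than one encoding, but a decode error does prove a mismatch.

```python
import re
from typing import Optional

# Matches charset=gb2312, charset="utf-8", charset='utf-8' and similar.
CHARSET_RE = re.compile(rb"""charset\s*=\s*["']?([A-Za-z0-9_-]+)""", re.IGNORECASE)


def declared_charset(data: bytes) -> Optional[str]:
    """Return the charset named in the page's declaration, or None."""
    m = CHARSET_RE.search(data)
    return m.group(1).decode("ascii").lower() if m else None


def matches_declaration(data: bytes) -> bool:
    """True if the bytes decode without error using the declared charset."""
    charset = declared_charset(data)
    if charset is None:
        return False
    try:
        data.decode(charset)
    except (UnicodeDecodeError, LookupError):  # bad bytes or unknown charset name
        return False
    return True


page = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><p>中文</p>'
print(matches_declaration(page.encode("utf-8")))  # True: saved as declared
print(matches_declaration(page.encode("gbk")))    # False: declared UTF-8, saved as GBK
```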

With IE on Windows, a common problem is that the browser fails to recognize the encoding of a UTF-8 page automatically, even when the page declares it: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />. As a result, some UTF-8 pages containing Chinese are displayed blank. Firefox and Safari do not have this problem. According to the author, this is because IE gives the encoding declared in the HTML tag priority over the HTTP header, whereas Mozilla-based browsers do the opposite.
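Since browsers may consult the HTTP header and the <meta> tag in different orders, one safe approach is to send the same charset in both places, so it does not matter which one is read first. The sketch below is a minimal WSGI application built on Python's standard WSGI interface. The page content is illustrative.

```python
def app(environ, start_response):
    """Serve a page that declares UTF-8 both in the HTTP header and in <meta>."""
    body = (
        "<html><head>"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
        "<title>中文</title>"
        "</head><body>你好</body></html>"
    ).encode("utf-8")  # the bytes really are UTF-8, matching both declarations
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, run `from wsgiref.simple_server import make_server` and then `make_server("", 8000, app).serve_forever()`.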

Here is why. UTF-8 uses three bytes per Chinese character, whereas GB2312 and BIG5 use two. Suppose IE has not yet recognized the page as UTF-8 and is still reading it as a double-byte encoding when it parses <title></title>. If an odd number of full-width (Chinese) characters comes before </title>, their byte count is also odd, because 3 × odd is odd. One stray byte from half a character is then left over. IE combines that stray byte with the < of </title> into a single garbled character, so it cannot find the end of <title> and displays the whole page blank. If you view the source, you will see that the entire page was actually sent; the browser simply does not display it. The simplest fix is to put <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> before <title></title>.
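The odd/even effect can be reproduced with a deliberately naive model of a double-byte reader. The model assumes that any byte with the high bit set always swallows the byte after it as its second half. The function `naive_dbcs_view` is hypothetical and only illustrates the failure described above; it is not IE's actual parser. Python's own `gbk` codec handles invalid trail bytes differently, so it is not used here.

```python
def naive_dbcs_view(data: bytes) -> str:
    """Show how a naive double-byte reader would split the bytes.

    A byte >= 0x80 is taken as the first half of a two-byte character and
    always consumes the following byte; each such pair is shown as '?'.
    Bytes below 0x80 are shown as ASCII characters.
    """
    out = []
    i = 0
    while i < len(data):
        if data[i] >= 0x80:
            out.append("?")  # lead byte plus whatever byte follows it
            i += 2
        else:
            out.append(chr(data[i]))
            i += 1
    return "".join(out)


odd = "<title>中</title>".encode("utf-8")     # 1 Chinese character = 3 bytes (odd)
even = "<title>中文</title>".encode("utf-8")  # 2 Chinese characters = 6 bytes (even)
print(naive_dbcs_view(odd))   # <title>??/title>   the "<" of </title> was swallowed
print(naive_dbcs_view(even))  # <title>???</title> the closing tag survives
```

With the <meta> declaration placed before <title>, the browser knows the page is UTF-8 before it reaches the title, so this double-byte misreading does not occur.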
