HTML5 web page encoding and HTML5 encoding

Source: Internet
Author: User

How do you specify the encoding of a page? Do you know how the browser recognizes the encoding?

First, let's use a minimal HTML page to see how browsers differ:

<!DOCTYPE html>

This is the simplest possible HTML page: even the <body> has no content, and the server does not send any encoding declaration. Open it locally and check the page encoding reported by each browser:

Browser     | Displayed encoding           | Remarks
IE6         | UTF-8                        |
IE8         | UTF-8                        |
IE9         | GB2312                       | Default character set
Firefox 3.5 | GB2312                       | Default character set
Firefox 4.0 | ISO-8859-1                   | Western European; the default encoding for English
Chrome      | GBK                          | Default character set
Opera       | Chinese, automatic detection | Should also be GB2312

As the table shows, browsers resolve the encoding differently when a page does nothing to declare it. For this minimal page the choice makes no difference, as long as the encoding is an ASCII superset, but the result is enough to show why setting the encoding correctly matters.
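
To make this concrete, here is a minimal Python sketch that decodes the same bytes under several encodings. The sample strings and the helper name `decode_all` are assumptions chosen for illustration; they are not part of any browser.

```python
# Decode the same bytes under several encodings to see where they diverge.
ENCODINGS = ["utf-8", "gbk", "iso-8859-1"]

def decode_all(data: bytes) -> dict:
    """Return {encoding: decoded text}, replacing undecodable bytes."""
    return {name: data.decode(name, errors="replace") for name in ENCODINGS}

ascii_page = "<!DOCTYPE html>".encode("utf-8")
accented = "café".encode("utf-8")

# ASCII-only bytes decode identically under every ASCII superset.
print(decode_all(ascii_page))
# Non-ASCII bytes are read differently depending on the assumed encoding.
print(decode_all(accented))
```

The ASCII-only page is identical under all three encodings, while the single non-ASCII character is read correctly only under the encoding it was written in.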

Encoding declaration

HTML4 and HTML5 each devote a section of the specification to how the encoding is declared.

Source: http://www.otakustay.com/learning-html5-charset

First, what is an encoding? Declaring an encoding tells the browser (or user agent) which algorithm to use to turn the byte stream into the correct content. In the HTML standards, an encoding can be referred to by an alias. The aliases are defined by IANA, and only encodings in that list can be recognized by the browser, so if UTF-8 is written as UTF8, the browser may ignore the declaration entirely. Encoding aliases are case-insensitive.
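
A case-insensitive alias lookup can be sketched as follows. The table `ALIASES` is a small hypothetical excerpt, not the real IANA registry, and `resolve_label` is an illustrative name; actual browsers may accept more labels than this table does.

```python
# Hypothetical excerpt of an alias table; the real IANA registry is far larger.
ALIASES = {
    "utf-8": "UTF-8",
    "gb2312": "GB2312",
    "iso-8859-1": "ISO-8859-1",
    "latin1": "ISO-8859-1",
}

def resolve_label(label: str):
    """Map a declared label to a canonical name, or None if it is unknown."""
    return ALIASES.get(label.strip().lower())

print(resolve_label("UTF-8"))   # matched case-insensitively
print(resolve_label("Latin1"))  # alias of ISO-8859-1
print(resolve_label("UTF8"))    # not in this table, so None
```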

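In HTML4, three methods are proposed to specify the page encoding. The priority, from highest to lowest, is as follows:

  • The charset parameter of the Content-Type field in the HTTP header.
  • A <meta> declaration with http-equiv set to Content-Type and a charset value.
  • The charset attribute on an element that references an external resource.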

There is no doubt about this. Note, however, that when the encoding is declared with <meta http-equiv="Content-Type"> and the browser finds that the encoding it has been using does not match the declaration, it goes back to the beginning and reparses the page, so part of the page is parsed twice. If you declare the encoding with a tag, therefore, write that tag as early as possible. One best practice is to write the <meta> tag before any other tags. Google PageSpeed also recommends this.
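
The HTML4 priority (HTTP header first, then the <meta> declaration, then the charset attribute on an element that references an external resource) can be sketched in Python. The function names and parameters below are hypothetical; only `email.message.Message` comes from the standard library, and it is used here simply as a convenient Content-Type parser.

```python
from email.message import Message

def charset_from_content_type(value: str):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()  # lowercased, or None if absent

def choose_encoding(http_content_type=None, meta_charset=None, link_charset=None):
    """Apply the HTML4 priority: HTTP header, then <meta>, then charset attribute."""
    if http_content_type:
        from_header = charset_from_content_type(http_content_type)
        if from_header:
            return from_header
    if meta_charset:
        return meta_charset.lower()
    if link_charset:
        return link_charset.lower()
    return None  # nothing declared: left to the browser's default

print(choose_encoding("text/html; charset=GBK", meta_charset="utf-8"))
```

In this sketch the header wins whenever it carries a charset parameter, which is why a <meta> tag cannot override a wrong header.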

Evolution over time

Over time, however, developers noticed something: just as with the simplified DOCTYPE declaration, browsers do not read the encoding from the <meta> tag strictly according to the standard. The page encoding must be determined before the tokenizer stage of HTML parsing, so the browser cannot first analyze the structure of the <meta> tag the way it builds the DOM tree, read its http-equiv and content attributes, and only then determine the encoding.

In reality, the browser does something very simple to read the encoding defined by the <meta> tag:

From the above algorithm, it is easy to see that all of the following forms let the browser identify the encoding correctly:

  • <meta http-equiv="Cotnent-Type" content="text/html; charset=utf-8" />
  • <meta charset="utf-8" />
  • <meta charset=utf-8 />
  • ... and many other odd forms.
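
A loose scan of this kind can be sketched in Python. This is an assumption about the general approach, not the algorithm of any particular browser: the pattern, the function name `sniff_meta_charset`, and the 1024-byte `limit` are all illustrative.

```python
import re

# Loose pattern: inside a <meta ...> tag, find "charset", "=", then the value.
META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([^\s\"'/>;]+)",
    re.IGNORECASE,
)

def sniff_meta_charset(head: bytes, limit: int = 1024):
    """Return the charset named in the first matching <meta> tag, or None."""
    match = META_CHARSET.search(head[:limit])
    if not match:
        return None
    return match.group(1).decode("ascii", errors="replace").lower()

print(sniff_meta_charset(b'<meta charset="utf-8" />'))
```

Because the scan never inspects http-equiv or the overall attribute structure, every form listed above yields the same result.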

Eventually the browser vendors sat down together to discuss the issue. To their surprise, they found that their implementations were very similar (perhaps they had simply borrowed from each other), so they decided to turn this approach into a standard. After a long discussion, HTML5's now-popular encoding declaration was born. In HTML5 it is called the "meta charset element". Its simplest form is as follows:

<meta charset=utf-8>

Of course, this is the HTML syntax. If you follow XHTML and find its syntax more comfortable, writing <meta charset="utf-8" /> is no problem either.

The algorithm for extracting the encoding, described above, is also specified in detail in the HTML5 standard.

In the HTML5 era, the standard once again corrected and refined how the encoding is declared. In general, the differences are as follows:

  • HTML5 allows a BOM to determine the encoding, but only supports the UTF-16 BOM (that is, U+FEFF), and does not explain what priority the BOM has among the ways of specifying the encoding.
  • HTML5 adds the meta charset tag.
  • HTML5 requires that ASCII be used when no encoding is specified for a page, whereas HTML4 lets the browser choose an encoding based on its environment.
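
BOM-based detection for UTF-16 can be sketched in Python as follows. The function name `detect_utf16_bom` is hypothetical; the BOM constants come from the standard `codecs` module.

```python
import codecs

def detect_utf16_bom(data: bytes):
    """Return the UTF-16 variant indicated by a leading BOM, or None."""
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be"
    return None

sample = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")
print(detect_utf16_bom(sample))
```

The same character, U+FEFF, serializes to different byte orders, which is how the BOM reveals whether the file is little endian or big endian.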

Miscellaneous

Beyond the basic ways of declaring the encoding, the standard contains many other details worth knowing:

  • If you declare the encoding with a <meta> tag, the encoding must be an ASCII superset. Put simply, an ASCII superset is an encoding that represents the 128 ASCII characters with the same byte values as ASCII.
  • HTML5 strongly recommends UTF-8.
  • HTML5 discourages the UTF-32, JIS_C6226-1983, JIS_X0212-1990, HZ-GB-2312, and JOHAB character sets, and prohibits CESU-8, UTF-7, BOCU-1, and SCSU. In practice, however, browsers can at least recognize UTF-7.
  • Developers who want to follow XHTML strictly should use the XML declaration to specify the encoding, that is, <?xml version="1.0" encoding="UTF-8" standalone="no" ?>. However, this interferes with DOCTYPE handling in IE6, so developers may as well compromise on this point and use the HTML declaration instead.
  • The real-world priority of each declaration method, along with some other details that need attention, is worth reading about separately.
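
The ASCII-superset requirement can be checked roughly in Python. This is a simplified byte-level test under the assumption that comparing the decoding of bytes 0x00 to 0x7F is enough; the function name `is_ascii_superset` is hypothetical.

```python
ASCII_BYTES = bytes(range(128))

def is_ascii_superset(codec_name: str) -> bool:
    """True if bytes 0x00-0x7F decode to the same characters as in ASCII."""
    try:
        return ASCII_BYTES.decode(codec_name) == ASCII_BYTES.decode("ascii")
    except UnicodeDecodeError:
        return False

for name in ["utf-8", "iso-8859-1", "utf-16", "cp037"]:
    print(name, is_ascii_superset(name))
```

UTF-16 fails this check because it uses two bytes per character, which is why it has to be announced by a BOM or the HTTP header rather than by a <meta> tag.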

Best practices

  • Use the HTTP header to specify the encoding.
  • Use UTF-8 wherever possible, or at least use one encoding consistently for all resources on the site.
  • If you want to use UTF-16, add a BOM to the file so that little endian and big endian can be told apart.
  • If you use a <meta> tag to specify the encoding, you do not need the http-equiv form, but the tag should appear as early as possible, at least before any non-ASCII characters.
  • When linking to an external script whose encoding cannot be guaranteed to match the page's, add the charset attribute.
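
The placement rule for the <meta> tag can be checked with a short Python sketch. The function name `meta_declared_early` and the 1024-byte `limit` are assumptions made for illustration, and the check only looks for the word "charset" rather than parsing the tag.

```python
def meta_declared_early(page: bytes, limit: int = 1024) -> bool:
    """True if 'charset' appears within `limit` bytes and before any non-ASCII byte."""
    pos = page.lower().find(b"charset")
    if pos == -1 or pos >= limit:
        return False
    return all(byte < 0x80 for byte in page[:pos])

good = '<!DOCTYPE html><meta charset="utf-8"><title>编码</title>'.encode("utf-8")
bad = '<!DOCTYPE html><title>编码</title><meta charset="utf-8">'.encode("utf-8")
print(meta_declared_early(good), meta_declared_early(bad))
```

In the second page the non-ASCII title comes before the declaration, so a browser would have to guess an encoding to read it and might then need to reparse.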
