Test page: http://127.0.0.1/bom.html
Response header: Content-Type: text/html; charset=UTF-8
Page content:
<meta http-equiv="Content-Type" content="text/html; charset=big5">
<script>
alert(document.charset); // IE, Chrome
alert(document.defaultCharset); // IE, Chrome
alert(document.characterSet); // Firefox
</script>
Note that bom.html itself is saved as "Unicode" (UTF-16LE), so the file begins with the BOM bytes FF FE.
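A BOM is just a fixed byte signature at the start of the file, so detecting it is a prefix check. Below is a minimal Python sketch. The function name sniff_bom is hypothetical, and it only covers the UTF-8 and UTF-16 signatures:

```python
import codecs

# BOM signatures, longest first so the 3-byte UTF-8 BOM is tested before
# the 2-byte UTF-16 ones. UTF-32 signatures are deliberately not handled.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16le"),   # FF FE  (what bom.html starts with)
    (codecs.BOM_UTF16_BE, "utf-16be"),   # FE FF
]

def sniff_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a BOM, else (None, 0)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

For bom.html, sniff_bom returns ("utf-16le", 2) no matter what the header or the meta tag says. The header and meta values only matter when a browser decides to rank them above this check.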
Open this page in IE, Chrome, Opera and Firefox.
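To reproduce the setup, the page must be served with a header that contradicts its BOM. Below is a minimal sketch using Python's standard http.server. The port 8000 and the names build_page, BomHandler and serve are my own assumptions and are not part of the original test:

```python
import codecs
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical reproduction: the body is UTF-16LE with a BOM, the meta tag
# claims big5 and the HTTP header claims UTF-8, so all three sources disagree.
PAGE = """<meta http-equiv="Content-Type" content="text/html; charset=big5">
<script>
alert(document.charset);
alert(document.defaultCharset);
alert(document.characterSet);
</script>
"""

def build_page() -> bytes:
    # The "utf-16-le" codec writes no BOM, so prepend FF FE explicitly.
    return codecs.BOM_UTF16_LE + PAGE.encode("utf-16-le")

class BomHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = build_page()
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port: int = 8000) -> None:
    # Answers every path, including /bom.html, with the same page.
    HTTPServer(("127.0.0.1", port), BomHandler).serve_forever()
```

Call serve() and open http://127.0.0.1:8000/bom.html in each browser.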
The result: not only IE (tested in IE8) but also Chrome ignores the header and picks the charset based on the BOM. Only Opera and Firefox use the charset from the header. (Update, March 13 / February 14: Opera follows the header.)
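The two behaviors amount to two different priority orders over the same three sources. The Python sketch below models them. The names are hypothetical. The test only compares the BOM with the header, so placing meta last in both orders is an assumption. For comparison, the later WHATWG HTML encoding sniffing algorithm also checks the BOM before the transport layer, which matches the IE/Chrome order.

```python
from typing import Dict, Optional, Sequence

# The three charset sources of the test page http://127.0.0.1/bom.html.
TEST_PAGE = {"bom": "utf-16le", "header": "utf-8", "meta": "big5"}

# Observed here: IE8/Chrome let the BOM beat the header, Firefox/Opera let the
# header beat the BOM. Putting "meta" last in both orders is an assumption.
BOM_FIRST = ("bom", "header", "meta")
HEADER_FIRST = ("header", "bom", "meta")

def pick_charset(sources: Dict[str, Optional[str]], order: Sequence[str]) -> Optional[str]:
    """Return the charset from the first source in `order` that declares one."""
    for key in order:
        value = sources.get(key)
        if value:
            return value
    return None

print(pick_charset(TEST_PAGE, BOM_FIRST))     # utf-16le
print(pick_charset(TEST_PAGE, HEADER_FIRST))  # utf-8
```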
I checked the W3C documents and found no clear statement of where the BOM ranks in HTML4. (If you find one, please let me know. Thank you very much.)
http://www.w3.org/TR/html4/charset.html#h-5.2.2
5.2.2 Specifying the character encoding
To sum up, conforming user agents must observe the following priorities when determining a document's character encoding (from highest priority to lowest):
1. An HTTP "charset" parameter in a "Content-Type" field.
2. A META declaration with "http-equiv" set to "Content-Type" and a value set for "charset".
3. The charset attribute set on an element that designates an external resource.
In addition to this list of priorities, the user agent may use heuristics and user settings.
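HTML4's list is a simple first-match rule, and the BOM does not appear in it. Below is a Python sketch. The name html4_charset and the fallback parameter are hypothetical. The fallback stands in for the "heuristics and user settings" that the spec leaves open.

```python
from typing import Optional

def html4_charset(http_charset: Optional[str],
                  meta_charset: Optional[str],
                  link_charset: Optional[str],
                  fallback: str) -> str:
    """First-match resolution following HTML4 section 5.2.2.

    link_charset is the charset attribute on the element that pointed to
    this resource. The BOM is not consulted at all, because HTML4 does not
    rank it.
    """
    for declared in (http_charset, meta_charset, link_charset):
        if declared:
            return declared.strip().lower()
    # No declaration: HTML4 leaves this to heuristics and user settings.
    return fallback
```

Under this rule the test page would be decoded as UTF-8, the Firefox/Opera result, because the header outranks the meta tag and nothing looks at the BOM.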
The HTML5 documents (http://www.w3.org/TR/2011/WD-html5-diff-20110113/) say:
For the HTML syntax of HTML5, authors have three means of setting the character encoding:
At the transport level. By using the HTTP Content-Type header for instance.
Using a Unicode Byte Order Mark (BOM) character at the start of the file. This character provides a signature for the encoding used.
Using a meta element with a charset attribute that specifies the encoding within the first 512 bytes of the document. E.g. <meta charset="UTF-8"> could be used to specify the UTF-8 encoding. This replaces the need for <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> although that syntax is still allowed.
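The meta route only works if the declaration appears early in the file. Below is a deliberately simplified Python sketch. A regex over the first 512 bytes stands in for HTML5's real byte-level prescan, so edge cases such as a "charset=" inside an unrelated content attribute are not handled correctly. The name prescan_meta is hypothetical.

```python
import re
from typing import Optional

# Matches both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?\s*([A-Za-z0-9._:-]+)""",
    re.IGNORECASE,
)

def prescan_meta(data: bytes, limit: int = 512) -> Optional[str]:
    """Return the charset declared by a meta tag within the first `limit` bytes."""
    match = META_CHARSET.search(data[:limit])
    return match.group(1).decode("ascii").lower() if match else None
```

A meta tag that starts past byte 512 is simply not seen. The HTML5 wording is meant to encourage exactly this behavior, so that authors keep the declaration near the top.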
The passage above does not state a priority either. Several other parts of the HTML5 documents read similarly, and none of them clearly ranks the HTTP Content-Type header against the BOM (although one would expect the HTTP Content-Type header to take precedence). What is certain is that the BOM takes priority over the meta element.
There are many more related cases to test, such as how the charset of a parent page affects a child page.
Also, if the browser discovers during rendering that it detected the encoding incorrectly, it will reload the page.
http://code.google.com/intl/zh-CN/speed/page-speed/docs/rendering.html
Browsers differ with respect to the number of bytes buffered and the default encoding assumed if no character set is found. However, once they have buffered the requisite number of bytes and begun to render the page, if they encounter a character set specification that doesn't match their default, they need to reparse the input and redraw the page. Sometimes, they may even have to rerequest resources, if the mismatch affects the URLs of external resources.
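The buffering behavior quoted above can be modeled as "decode with the default, then decode again if a later declaration disagrees". Below is a Python sketch. The names load, declared_charset, buffer_size and default are hypothetical, a regex replaces real parsing, and re-requesting external resources is not modeled.

```python
import codecs
import re
from typing import Optional, Tuple

# Hypothetical helper: any "charset=..." declaration in the given bytes.
DECL = re.compile(rb"""charset\s*=\s*["']?([A-Za-z0-9._:-]+)""", re.IGNORECASE)

def declared_charset(data: bytes) -> Optional[str]:
    m = DECL.search(data)
    return m.group(1).decode("ascii").lower() if m else None

def load(data: bytes, default: str, buffer_size: int) -> Tuple[str, bool]:
    """Return (decoded text, whether a reparse was needed)."""
    early = declared_charset(data[:buffer_size])
    if early:
        # Declaration found while buffering: decode once, no reparse.
        return data.decode(early, errors="replace"), False
    # Rendering starts with the default encoding...
    text = data.decode(default, errors="replace")
    late = declared_charset(data[buffer_size:])
    # ...and a later declaration naming a different codec forces a reparse.
    if late and codecs.lookup(late).name != codecs.lookup(default).name:
        return data.decode(late, errors="replace"), True
    return text, False
```

With a 1024-byte buffer, a page whose Big5 meta tag appears after 2000 bytes of padding is first decoded as windows-1252 and then decoded again as Big5. This is the costly path that the Page Speed advice tries to avoid.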
http://simon.html5.org/test/html/parsing/encoding/charset-reload-200k.htm