Interpreting HTML: Namespaces and character encodings

Source: Internet
Author: User
Tags: character set, xmlns

In project work, we often agree on specifications so that team members can cooperate and deliver the project more smoothly. We also hear about many protocols. One example is the open XMPP protocol used by Google's IM software Gtalk: any other IM client that follows XMPP can interoperate with Gtalk. Likewise, the Internet holds a countless amount of information, each piece independent of the others; linking those pieces together and presenting them to the user requires the HTTP protocol.

In the same way, browsers are built on different engines and render default styles differently, so a rule that every browser follows is needed to keep the same Web document looking consistent across browsers. That rule is the DOCTYPE declaration, which tells the browser which document type to render the page against.

Because the Internet is interconnected, any two or more Web pages may exchange data. XML lets users define their own tags, so two exchanged documents may contain tags with the same name but different meanings, and those tags conflict. A namespace is needed to tell such same-named tags apart.
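
The name conflict described above can be sketched in a few lines of Python using the standard library's xml.etree.ElementTree. This is only an illustration: the document, the h and f prefixes, and the http://example.com/furniture URI are made up for the example. Both vocabularies define a table element, and the parser keeps them apart by prepending each element's namespace URI to its name.

```python
import xml.etree.ElementTree as ET

# Two vocabularies both define <table>; the namespace URIs keep them apart.
# "http://example.com/furniture" is a made-up URI used only for this sketch.
DOC = """
<root xmlns:h="http://www.w3.org/1999/xhtml"
      xmlns:f="http://example.com/furniture">
  <h:table><h:tr><h:td>cell</h:td></h:tr></h:table>
  <f:table><f:material>oak</f:material></f:table>
</root>
"""

root = ET.fromstring(DOC.strip())

# ElementTree reports each tag as "{namespace-uri}local-name".
tags = [child.tag for child in root]
for tag in tags:
    print(tag)
# {http://www.w3.org/1999/xhtml}table
# {http://example.com/furniture}table
```

Although both elements are written as table, their fully qualified names differ, so a program exchanging this document can treat them as two unrelated tags.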

XHTML is a transitional language between HTML and XML. It does not support the user-defined tags that XML allows, so the namespace in every XHTML document is the same:

<html xmlns="http://www.w3.org/1999/xhtml">

xmlns is short for "XML namespace". Like the DOCTYPE declaration, xmlns is also a declaration. Unlike DOCTYPE, which is still present in HTML documents, xmlns is normally absent from HTML documents; we usually see it in XHTML documents.
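
To see what the xmlns declaration does in practice, here is a minimal sketch that parses a tiny XHTML fragment with Python's xml.etree.ElementTree. The page content and the x prefix used for lookups are assumptions made for the example. It shows that a namespace declared once on the root element applies to every child element as well.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A tiny, made-up XHTML page; only the root element carries xmlns.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Demo</title></head>
  <body><p>Hello</p></body>
</html>"""

root = ET.fromstring(PAGE)

# The default namespace is inherited by all descendants.
root_tag = root.tag
print(root_tag)  # {http://www.w3.org/1999/xhtml}html

# Lookups must therefore be namespace-aware; "x" is an arbitrary local prefix.
title = root.find("x:head/x:title", {"x": XHTML_NS})
title_text = title.text
print(title_text)  # Demo
```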

When building a Web page, besides declaring the DOCTYPE (document type) at the top and, for an XHTML document, the namespace, the third thing to declare is the character encoding of the page:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
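
As a rough illustration of how a tool might read this declaration, the sketch below uses Python's html.parser to pull the charset value out of a Content-Type meta tag. The class and function names (MetaCharsetFinder, find_meta_charset) are hypothetical, and real browsers follow a considerably more detailed detection algorithm than this.

```python
from html.parser import HTMLParser
from typing import Optional


class MetaCharsetFinder(HTMLParser):
    """Records the charset from the first Content-Type meta tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are routed here as well.
        if tag != "meta" or self.charset is not None:
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() != "content-type":
            return
        # content looks like "text/html; charset=utf-8"
        for part in (attr.get("content") or "").split(";"):
            name, _, value = part.strip().partition("=")
            if name.strip().lower() == "charset":
                self.charset = value.strip().lower()


def find_meta_charset(html: str) -> Optional[str]:
    """Hypothetical helper: return the declared charset, or None."""
    finder = MetaCharsetFinder()
    finder.feed(html)
    return finder.charset


print(find_meta_charset(
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
))  # utf-8
```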

Every XHTML document should declare its character encoding so that browsers interpret it correctly and it passes W3C validation. When a Web page shows up garbled, the cause is most often a wrong character encoding declaration.
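
The garbling is easy to reproduce. In the sketch below, which uses an arbitrary sample string, the same bytes are decoded once with the encoding they were written in and once with the wrong one. The errors="replace" argument is used so that undecodable bytes become replacement characters instead of raising an exception, which is roughly what a browser does.

```python
text = "中文网页"

# The page is saved as UTF-8 ...
utf8_bytes = text.encode("utf-8")

# ... and read back with the matching encoding: the text survives.
decoded_right = utf8_bytes.decode("utf-8")

# Read back as gb2312 instead: the bytes are misinterpreted.
decoded_wrong = utf8_bytes.decode("gb2312", errors="replace")

print(decoded_right == text)  # True
print(decoded_wrong == text)  # False

# The mistake works in the other direction too.
gb_bytes = text.encode("gb2312")
decoded_wrong_reverse = gb_bytes.decode("utf-8", errors="replace")
print(decoded_wrong_reverse == text)  # False
```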

UTF-8 is a variable-length encoding of Unicode. As a universal character encoding it is increasingly used in Web documents, and pages encoded in UTF-8 largely avoid the garbled text that users in different regions would otherwise see because of differing local character encodings.
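
"Variable length" means that different characters occupy different numbers of bytes. A short sketch, with sample characters chosen only for illustration:

```python
# One sample character per UTF-8 length; escapes avoid any ambiguity
# about which code point is meant.
samples = [
    ("A", "Latin letter"),
    ("\u00e9", "accented Latin letter (e with acute)"),
    ("\u4e2d", "Chinese character (zhong)"),
    ("\U0001F600", "emoji"),
]

utf8_lengths = [len(ch.encode("utf-8")) for ch, _ in samples]

for (ch, label), size in zip(samples, utf8_lengths):
    print(f"{label}: {size} byte(s)")
# Latin letter: 1 byte(s)
# accented Latin letter (e with acute): 2 byte(s)
# Chinese character (zhong): 3 byte(s)
# emoji: 4 byte(s)
```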

But when we open most Chinese websites, especially large portal sites, the declared character encoding is not utf-8 but gb2312:

<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />

Besides gb2312, some sites use GBK or GB18030; all three are encodings for Simplified Chinese character sets. In other words, if a computer does not have Simplified Chinese support installed, a Chinese page declared as gb2312 will display as garbled text.
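
The three encodings are related: GBK extends gb2312, and GB18030 extends GBK, so a common character gets the same bytes in all of them while the character repertoire grows at each step. The sketch below checks this with Python's built-in codecs. The can_encode helper and the sample characters are chosen for illustration only.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Hypothetical helper: True if every character fits the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


ENCODINGS = ["gb2312", "gbk", "gb18030"]

# A common simplified character gets identical bytes in all three.
common_bytes = {enc: "中".encode(enc) for enc in ENCODINGS}
print(common_bytes)

# A traditional character is outside gb2312 but inside GBK and GB18030.
traditional = {enc: can_encode("體", enc) for enc in ENCODINGS}
print(traditional)  # {'gb2312': False, 'gbk': True, 'gb18030': True}

# An emoji is only representable in GB18030, which covers all of Unicode.
emoji = {enc: can_encode("\U0001F600", enc) for enc in ENCODINGS}
print(emoji)  # {'gb2312': False, 'gbk': False, 'gb18030': True}
```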

If gb2312 can produce garbled text for users in other regions, why not use UTF-8?

One reason may be historical legacy. Another, more important, reason is that the two encodings store characters differently, so the same document ends up a different size.

In gb2312, a Chinese character takes 2 bytes, while in UTF-8 a Chinese character usually takes 3 bytes, and some rarer characters take 4. So for the same Chinese document, the gb2312-encoded file is smaller than the UTF-8-encoded one.
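
The size difference can be measured directly. This sketch encodes an arbitrary sample sentence both ways, then wraps it in a little markup to show that ASCII characters such as tags cost one byte in either encoding, so the gap on a real page is smaller than the raw 2:3 ratio suggests.

```python
sentence = "中文网页的字符编码"  # 9 Chinese characters

gb_size = len(sentence.encode("gb2312"))
utf8_size = len(sentence.encode("utf-8"))
print(gb_size, utf8_size)  # 18 27

# ASCII markup takes 1 byte per character in both encodings.
html = "<p>" + sentence + "</p>"
html_gb_size = len(html.encode("gb2312"))
html_utf8_size = len(html.encode("utf-8"))
print(html_gb_size, html_utf8_size)  # 25 34
```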

For Chinese websites that serve a large amount of text to many visitors, gb2312-encoded pages save download traffic. And because the audience of a Chinese site is almost entirely Chinese users, many sites choose gb2312 over UTF-8.

But few sites in the country actually serve text at that volume, and gb2312 brings the risk of garbled text, so UTF-8 is the recommended encoding when building Web pages.

Of course, whichever encoding you choose, the most important thing is to use it consistently across the whole site.

Besides the character encoding declaration above, you might also see declarations written like this:

<meta http-equiv="Content-language" content="gb2312" />

<meta http-equiv="Content-language" content="ZH-CN" />

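This style of declaration was aimed at older browsers and is deprecated now that browsers have generally been upgraded. Note that Content-Language describes the language of the page (for example zh-CN), not its character encoding, so it is not a substitute for the charset declaration.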
