Thinking XML: The XML flavor of HTML5


For a while there was a contest over the future of markup on the web, namely between XHTML 2 and HTML5, each backed by its own organization: the W3C and the major browser vendors, respectively. First the W3C took over HTML5, and then it recently announced the end of XHTML 2. This makes a huge difference to the future of XML on the web, and HTML5 is now a technology that every XML developer will have to deal with.

However, XML enthusiasts need not despair: HTML5 supports a proper XML serialization. This article covers the XML form of HTML5, including some of the major differences from legacy XHTML rules and how to apply this vocabulary in practice on modern web browsers.

The history of HTML has been contentious. Despite the best efforts of the web's architects, web pages have always been an unruly domain, full of confusing, puzzling, and sometimes downright annoying broken markup (nicknamed "tag soup"). One goal of XML has always been to help clean up this mess, which is why XML was described as "SGML for the web" (SGML is the metalanguage of which HTML is just one application). XML caused a stir as soon as it arrived. The W3C expected XML to succeed in the browser, with XHTML as the most natural evolution, more coherent than HTML. Unfortunately, unexpected problems kept undermining that goal. Seemingly simple concepts, such as namespaces and linking, became technical and political nightmares. The resulting controversy and delay were enough to convince browser developers that XML might help solve existing problems, but only by raising new, unknown ones.

Even without the growing evidence that XML is no panacea, the large body of legacy web pages written as tag soup meant that browser developers kept running into problems whenever they tried to migrate to a strict XML-based path. Also consider Postel's Law, named after the computer scientist Jon Postel. The law states:

Be conservative in what you do; be liberal in what you accept from others.

The strictness of XML is consistent with this rule on the server or database side, where administrators can be conservative in what they produce. That is why XML thrives there. The web browser, though, is perhaps the ultimate example of software that accepts information from others, so this is where XML and Postel's Law are most at odds.

The development of XHTML

The situation looked very grim for a few years. Browser vendors largely ignored the W3C and created their own group, the Web Hypertext Application Technology Working Group (WHATWG), to develop HTML5, while support for the W3C's XHTML stalled. The W3C first acknowledged reality by providing a home for continued HTML5 work, and in 2009 it stopped work on XHTML 2 and accepted defeat. There is no way yet to measure whether this is the end of XHTML in practice. HTML5 was certainly not designed to be XML-friendly, but it at least pays lip service in the form of an XML serialization of HTML (called XHTML5 in this article). However, the matter is far from settled, as a question in the HTML5 FAQ shows:

If I am careful with the syntax I use in an HTML document, can I process it with an XML parser? No. There are significant differences between HTML and XML, particularly in parsing requirements, and you cannot process one using tools designed for the other. However, because HTML5 is defined in terms of the DOM, in most cases you can use either the HTML or the XHTML serialization to represent the same document. There are some differences, described later, that make it impossible to represent some HTML documents accurately as XHTML, and vice versa.

This can be confusing to any developer interested in the future of XML on the web. This article provides a practical guide to using XML in HTML5. It is written for what I call the ultimate web hacker: someone who is not a master of W3C standards but is interested either in generating XHTML5 for the web or in consuming it in simple ways (that is, extracting information rather than worrying about large, complex rendering). I confess that it pains me to make some of these recommendations, because I have long advocated handling XML properly. Remember that HTML5 is still a W3C working draft, and it may be some time before it becomes a full Recommendation. Even so, some of its features are stable and already well implemented on the web.
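To make the FAQ's point about parsers concrete, here is a minimal sketch in Python using only the standard library. The sample markup and the helper names are hypothetical, and html.parser is only a forgiving tokenizer rather than a full HTML5 tree builder, so treat this as an illustration of the difference in strictness, not as a model of browser behavior.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical tag soup: unclosed <p> elements and a bare <br>.
TAG_SOUP = "<p>First paragraph<p>Second paragraph<br>"


class StartTagCollector(HTMLParser):
    """Records every start tag the forgiving HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def start_tags_seen_by_html_parser(markup):
    """Feed markup to the HTML tokenizer and return the start tags it saw."""
    collector = StartTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def accepted_by_xml_parser(markup):
    """Return True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(start_tags_seen_by_html_parser(TAG_SOUP))
print(accepted_by_xml_parser(TAG_SOUP))
print(accepted_by_xml_parser("<p>First paragraph<br/></p>"))
```

The HTML tokenizer reports tags without complaint, while the XML parser rejects the same input outright and accepts only the well-formed variant.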

Serving a document as XHTML5

Unfortunately, I have more bad news. You may not be able to use XHTML5 as officially defined, because the specification stipulates that for a document to be treated as XHTML5 it must be served with the application/xhtml+xml or application/xml MIME type. If you do that, no released version of Microsoft Internet Explorer will display it (other mainstream modern browsers have no problem). The only practical workaround is to serve syntactic XHTML5 with the text/html MIME type. Technically this may violate some versions of the HTML5 specification, but unless you can do without Internet Explorer support, you have little choice. Adding to the confusion, this is a very contentious topic in the relevant working groups, and the language has at least been softened in some drafts. The Internet Explorer 9 beta (also known as "Platform Preview") fully supports XHTML served with XML MIME types, so the problem will go away once users widely have that version. Also, if you need to support Internet Explorer 6 or earlier, the workarounds described in this article are not enough; you can only use HTML 4.x.

Recommendation for the ultimate web hacker: Serve syntactic XHTML5 using the text/html MIME type.

The fun of DOCTYPE

The good news from the ultimate web hacker's point of view is that XHTML5 makes the document type declaration (DTDecl) less of a problem. XHTML 1.x and 2 require a notorious construct such as <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">. The biggest problem is that a naive processor is likely to load this DTD URL, which may be an unwanted network operation. In addition, that URL includes many other URLs, so you can end up downloading multiple files from the W3C site that you do not actually need. At times the files hosted by the W3C have even had problems of their own, leading to issues that are hard to debug.

In XHTML5, the XML nature of a file is determined entirely by the MIME type, and any DTDecl is effectively ignored, so you could leave it out. But HTML5 provides a minimal DTDecl, <!DOCTYPE html>. If you use this DTDecl, almost all browsers switch to standards mode, which is generally more consistent and predictable, even in browsers that do not fully support HTML5. Note that the HTML5 DTDecl does not reference any external file, so you avoid some of the earlier XHTML problems.

Recommendation for the ultimate web hacker: Use the minimal HTML5 document type declaration, <!DOCTYPE html>, in XHTML5.

Because you do not use any external DTD components, you cannot use the common HTML named entities, such as &nbsp; or &copy;. These are defined in the XHTML DTDs, which you are no longer declaring. If you try to use them, an XML processor will fail with an undefined entity error. The only safe named character entities are &lt;, &gt;, &amp;, &quot;, and &apos;. Use numeric equivalents instead: for example, use &#160; rather than &nbsp;, and &#169; rather than &copy;.

Recommendation for the ultimate web hacker: Do not use any named character entities other than &lt;, &gt;, &amp;, &quot;, and &apos;.

Technically, if you serve the document as text/html, following the first recommendation, HTML named character entities will not cause errors in most browsers, but relying on this accident is very brittle. Also, remember that browsers are not the only consumers of XML. Other XML processors will not know what to do with such a document.
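The entity rule is easy to check with a generic XML processor. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the sample fragments and the helper name are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical fragments: the same footer with a named and a numeric entity.
NAMED = '<footer xmlns="%s">&copy; 2010</footer>' % XHTML_NS
NUMERIC = '<footer xmlns="%s">&#169; 2010</footer>' % XHTML_NS


def parse_or_none(markup):
    """Return the parsed root element, or None if the XML parser rejects it."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError:
        # With no DTD declaring it, &copy; is an undefined entity.
        return None


named_root = parse_or_none(NAMED)
numeric_root = parse_or_none(NUMERIC)
print(named_root is None)
print(numeric_root.text)
```

The fragment with the named entity is rejected, while the numeric character reference parses and yields the copyright sign in the element text.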

The pleasures of namespaces

The last layer in understanding the complex mechanics of the XML form is namespaces; the first two are the MIME type and the DTDecl. You are probably used to starting XHTML documents with a line like the following.

<html xmlns="http://www.w3.org/1999/xhtml">

The xmlns="http://www.w3.org/1999/xhtml" attribute is the namespace declaration. In XHTML5, this namespace is still required. If you include other XML vocabularies, such as Scalable Vector Graphics (SVG), place them in their own required namespaces.

Recommendation for the ultimate web hacker: Always include the default namespace at the top of an XHTML5 document, and use the appropriate namespaces for other embedded XML formats.

If you include other vocabularies, their namespace declarations must go on the outermost start tag of the embedded section. If you declare them on the html element, you will run into a conformance error for a text/html document.
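As an illustration of the namespace advice, the sketch below parses a small, hypothetical XHTML5 document that embeds SVG and lists the namespaces a generic XML processor sees. The document content and the function name are assumptions made for this example; the SVG namespace is declared on the svg start tag, not on the html element.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical document: default XHTML namespace at the top, SVG namespace
# declared on the outermost start tag of the embedded section.
DOC = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Namespace demo</title></head>
  <body>
    <p>A circle:</p>
    <svg xmlns="http://www.w3.org/2000/svg" width="20" height="20">
      <circle cx="10" cy="10" r="8"/>
    </svg>
  </body>
</html>"""


def namespaces_used(root):
    """Return the distinct element namespaces in document order."""
    found = []
    for element in root.iter():
        # ElementTree spells qualified names as "{namespace}localname".
        tag = element.tag
        namespace = tag[1:].split("}")[0] if tag.startswith("{") else ""
        if namespace not in found:
            found.append(namespace)
    return found


root = ET.fromstring(DOC)
print(namespaces_used(root))
print(root.find(".//{%s}circle" % SVG_NS).get("r"))
```

Every HTML element lands in the XHTML namespace and the embedded drawing in the SVG namespace, which is what lets a non-browser consumer tell the vocabularies apart.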

Handling XHTML5 Content

XHTML5 lets you specify the character encoding by using a protocol header, such as the HTTP Content-Type header, by using a special character marker called the byte order mark (BOM), or by using the XML declaration. As long as they do not conflict with one another, you can use any of these methods, but the best way to avoid problems is to choose the combination carefully. Unfortunately, the XML declaration is a potential problem because it makes Internet Explorer 8 and earlier switch to quirks mode, which leads to the notorious display anomalies for which that browser is famous.

Recommendation for the ultimate web hacker: Use only Unicode encodings for XHTML5 documents. Omit the XML declaration, and either use UTF-8 or use UTF-16 with a Unicode byte order mark (BOM) at the beginning of the document. If you can, use the Content-Type HTTP header when you serve the document.

The following is an example of such an HTTP header:

Content-Type: text/html; charset=utf-8
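On the server side, the choice of header can be captured in a few lines. This is a minimal sketch, assuming a hypothetical helper named build_response that is not tied to any particular web framework; it encodes the document as UTF-8 without an XML declaration and picks the MIME type according to whether Internet Explorer must be supported.

```python
def build_response(document, support_internet_explorer=True):
    """Return (headers, body) for serving an XHTML5 document.

    The document is expected to be a str without an XML declaration.
    """
    # UTF-8 needs no byte order mark; the charset is stated in the header.
    body = document.encode("utf-8")
    if support_internet_explorer:
        mime_type = "text/html"
    else:
        mime_type = "application/xhtml+xml"
    headers = [
        ("Content-Type", "%s; charset=utf-8" % mime_type),
        ("Content-Length", str(len(body))),
    ]
    return headers, body


headers, body = build_response("<p>\u00a9</p>")
print(headers[0])
```

A list of header tuples in this shape can be handed to most Python HTTP interfaces, but the exact hookup depends on the server in use.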

The new semantic markup elements

HTML5 introduces new elements that give content a more clearly defined semantic structure, such as section and article. These elements are a part of HTML5 that may still change, but the changes are unlikely to be large, and the improved expressiveness these elements provide offsets the risk. One problem is that Internet Explorer does not construct these elements in the DOM, so if you use JavaScript you need yet another workaround. Remy Sharp maintains a JavaScript fix, which you enable by including the following snippet in the document head.

<!--[if IE]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->

You may also want to define CSS rules for these elements, in case a browser treats the document as HTML 4, which renders unknown elements inline. The following CSS should suffice.

header, footer, nav, section, article, figure, aside {
  display: block;
}

Recommendation for the ultimate web hacker: Use the new HTML5 elements, but include the HTML5 shiv JavaScript and default CSS rules to support them.

Putting it all together

I have given a number of separate recommendations, so I will now combine them into a complete example. Listing 1 is XHTML5 that follows these recommendations. When you serve it over HTTP, use the header Content-Type: text/html; charset=utf-8, unless you can do without Internet Explorer support, in which case use the header Content-Type: application/xhtml+xml; charset=utf-8.

Listing 1. The Complete XHTML5 sample

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>A micro blog, in XHTML5</title>
    <style>
      /* Provide a fall-back for browsers that don't understand the new elements */
      header, footer, nav, section, article, figure, aside {
        display: block;
      }
    </style>
    <!-- Hack support for the new elements in JavaScript under Internet Explorer -->
    <!--[if IE]>
    <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
    <![endif]-->
    <script type="application/javascript">
      /* ... Other JavaScript goes here ... */
    </script>
  </head>
  <body>
    <article>
      <section>
        <p>
          There is something important I want to say:
        </p>
        <blockquote>
          A stitch in time saves nine.
        </blockquote>
      </section>
      <section><p>By the way, are you as excited about the World Cup as I am?</p>
      </section>
    </article>
    <article>
      <section>
        <p>
          Welcome to my new XHTML5 weblog
        </p>
      </section>
    </article>
    <aside>
      <ul>
        <li><a href="/2010/04">April 2010</a></li>
        <li><a href="/2010/05">May 2010</a></li>
        <li><a href="/2010/06">June 2010</a></li>
      </ul>
    </aside>
    <footer>&#169; by Uche Ogbuji</footer>
    <nav>
      <ul>
        <li><a href="/">Home</a></li>
        <li><a href="/about">About</a></li>
        <li><a href="/2010/06">Home</a></li>
      </ul>
    </nav>
  </body>
</html>

Listing 1 uses the HTML5 DTDecl and declares the default namespace at the top. The style and script elements in this example exist only to work around real-world browser problems. The script elements are needed only if you use other JavaScript. The document uses a number of the new HTML5 elements, which I do not describe in detail because they are not XML-specific. Note that any empty element, such as img, must use the "self-closing" syntax (in other words, end with />), and that the copyright symbol uses the numeric entity form &#169;.

See Table 1 for an overview of how the example above behaves in different browsers.

Table 1. Browser support for XHTML5 as recommended in this article

Browser | Behavior
Legacy browsers (for example, Internet Explorer 6.x or earlier, Netscape, Firefox 1.x) | Rendering will be unpredictable. For example, the "self-closing" syntax of empty elements might be handled incorrectly. No errors are reported for HTML named entities.
Internet Explorer 7 or 8 | Because the text/html MIME type is used, the document is rendered as regular "tag soup", but the DTDecl triggers "standards mode", such as Internet Explorer provides it. No errors are reported for HTML named entities.
Modern, HTML5-aware browsers such as Firefox 3.x, Safari 4, or recent Opera or Google Chrome | Because of the MIME type, the document is rendered as HTML5 (not XHTML5), but in "standards mode". No errors are reported for HTML named entities.
Any conforming XML 1.x processor | MIME types are not considered. The parser sees all elements in the XHTML namespace. If you use any HTML named entities, you get an error.


Conclusion

An important recent development was the publication of the First Public Working Draft of "Polyglot Markup: HTML-Compatible XHTML Documents" by the W3C HTML Working Group, which is intended to provide more comprehensive, precise, and up-to-date information about XHTML5.

Again, it pains me to make some of the recommendations in this article. These solutions come from long and painful experience, and they are the only way to avoid hard-to-reproduce bugs and strange incompatibilities when mixing XML into the real-world HTML universe. This certainly does not mean that I have stopped advocating careful XML design and best practices. It is best to reserve XHTML5 for the outermost component that connects to the browser. All flavors of XHTML are better as presentation languages than as information-bearing languages. You should carry the core information of most systems in other XML formats and convert it to XHTML5 only at the last moment. You may wonder what the point is of producing XHTML5 at the last minute, but remember Postel's Law, which recommends being strict in what you produce. By generating XHTML5 for browsers, you make it easy for others to extract information from your websites and applications. That is a valuable feature in this age of mashups, web APIs, and data projects.
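To illustrate the point about extracting information, here is a minimal sketch of a non-browser consumer reading archive links out of a page shaped like Listing 1. The abbreviated page content and the function name are assumptions made for this example; any generic XML parser would do, and Python's standard xml.etree.ElementTree is used here.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical, abbreviated page following the article's recommendations.
PAGE = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>A micro blog, in XHTML5</title></head>
  <body>
    <aside>
      <ul>
        <li><a href="/2010/04">April 2010</a></li>
        <li><a href="/2010/05">May 2010</a></li>
      </ul>
    </aside>
    <footer>&#169; by Uche Ogbuji</footer>
  </body>
</html>"""


def archive_links(page):
    """Return (href, label) pairs for every link inside an aside element."""
    root = ET.fromstring(page)
    links = []
    for aside in root.iter(XHTML + "aside"):
        for anchor in aside.iter(XHTML + "a"):
            links.append((anchor.get("href"), anchor.text))
    return links


print(archive_links(PAGE))
```

Because the page is well-formed XML in the XHTML namespace, no HTML-specific scraping library is needed to pull the data back out.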
