The risk of using XHTML correctly


Junchen Note: The Omemo.net site seems to be down and its links are no longer valid. This article is very well written and is one of the best from 456 Berea Street. I have made a few changes to the code and the translation here to stay faithful to the original.

I have been using XHTML for years, but it was not until last summer that I looked into using it correctly, that is, serving it with the application/xhtml+xml MIME type. I was aware of some of the problems this can cause, but far from all of them. As you are about to discover, once you start using real XHTML you can run into quite a few seemingly small but very confusing problems.

Please note that this article does not argue for or against the use of XHTML. I am just writing down the potential pitfalls I know of and leaving it to you to decide what to use: HTML 4.01, XHTML 1.0 served as text/html to all browsers, or XHTML 1.0 served as application/xhtml+xml to browsers that can handle it and as text/html to the others. Or something completely different.

I learned about many of these things only when I ran into the problems myself. In some cases I had to spend a lot of time looking for the cause and asking others for a solution. But I learned a lot from it, and I am going to tell you what I wish I had known before I started using XHTML.

Note that the problems mentioned here only occur in user agents that handle the application/xhtml+xml MIME type correctly and therefore treat XHTML as XML. This is probably also why these problems were rarely mentioned in the early days of XHTML: few people used such browsers, so almost no one was bothered by XHTML served as text/html.

Today, serving XHTML as application/xhtml+xml is slowly becoming more common. I know of two reasons:

The number of people using Firefox, Mozilla, Opera, Safari and other browsers that support XHTML has increased, so you are no longer doing this just for yourself and your friends. Well, maybe you are, but it will affect more people.

Among web developers, there is a growing awareness of what XHTML really is. There have been a lot of heated discussions about using XHTML, especially when it is served as text/html. If you have been involved in any of them, you know what I am talking about.

If you, like me, decide to implement some kind of content negotiation and use the right media type when delivering XHTML, you need to know what can (and will) happen to your published documents and how to avoid the problems. For readers who are interested in content negotiation, I recommend reading Content Negotiation and Serving up XHTML with the correct MIME type. There are many articles of this kind, but these are the two best I have read.
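
As a rough illustration of what such content negotiation involves, the sketch below picks a media type from the request's Accept header. The function name chooseMediaType is hypothetical, and the policy is an assumption rather than anything the recommended articles prescribe: XHTML is served as XML only when the client lists application/xhtml+xml explicitly with a non-zero q-value that is at least as high as the one for text/html.

```javascript
// Hypothetical helper: decide how to label an XHTML document for one request.
// Assumption: a bare "*/*" wildcard does not count as XHTML support.
function chooseMediaType(acceptHeader) {
  const XHTML = "application/xhtml+xml";
  const HTML = "text/html";
  if (!acceptHeader) return HTML;

  // Parse "type/subtype;q=0.9" entries into a map of media type -> q-value.
  const q = new Map();
  for (const part of acceptHeader.split(",")) {
    const [type, ...params] = part
      .split(";")
      .map((s) => s.trim().toLowerCase());
    if (!type) continue;
    let quality = 1; // q defaults to 1 when omitted
    for (const p of params) {
      const m = /^q\s*=\s*([0-9.]+)$/.exec(p);
      if (m) quality = parseFloat(m[1]);
    }
    q.set(type, quality);
  }

  const xhtmlQ = q.has(XHTML) ? q.get(XHTML) : 0;
  const htmlQ = q.has(HTML) ? q.get(HTML) : 0;
  return xhtmlQ > 0 && xhtmlQ >= htmlQ ? XHTML : HTML;
}
```

A server would put the returned value in the Content-Type header. Because the response now depends on the request's Accept header, it would normally also send a Vary: Accept header so that caches keep the two variants apart.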

Every basic tutorial makes the differences between HTML and XHTML clear: element and attribute names must be lowercase, attribute values must always be quoted, attribute minimization is not allowed, all elements must have an end tag, elements must be nested correctly, and so on. However, there is more to be aware of when XHTML is served as application/xhtml+xml.

Well-formedness is a must

The document must be well-formed XML (which is not necessarily the same as valid XHTML). This is required, not optional.

If the document is not well-formed, a conforming browser (the ones I know of are Mozilla, Firefox, Netscape, Camino, Opera, Safari and OmniWeb, which is pretty much every browser except IE) will display an error message and stop rendering the document in one way or another.

This also means that unescaped ampersands (&) are no longer allowed.
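
One way to keep stray ampersands out of generated markup is to escape text before it is written into the document. The sketch below is a minimal example, not part of the original article; escapeXml is a hypothetical name, and it assumes the input is raw text that has not been escaped already (escaped input would be escaped twice).

```javascript
// Hypothetical helper: escape the characters that are special in XML so the
// result can be placed in element content or in a quoted attribute value.
function escapeXml(text) {
  const replacements = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&apos;",
  };
  return String(text).replace(/[&<>"']/g, (ch) => replacements[ch]);
}

console.log(escapeXml("page.php?a=1&b=2")); // prints "page.php?a=1&amp;b=2"
```

The same applies to ampersands inside URLs in href and src attributes, which are a common source of well-formedness errors.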

The XML declaration may be required

If you use a character encoding other than UTF-8 or UTF-16, you must include an XML declaration, unless the encoding is already provided by the HTTP header.

Whether the character encoding should be specified in the HTTP header is somewhat unclear. Architecture of the World Wide Web, Volume One: Media Types for XML says that, in general, you should not specify the character encoding for XML data in protocol headers, because the data is self-describing.

On the other hand, XHTML 1.0, Second Edition: Character Encoding says:

The best way to make sure a document uses a specific character encoding is to ensure that the web server sends the correct headers.

Either way, it is good practice to specify the character encoding in the XML declaration:

<?xml version="1.0" encoding="iso-8859-1"?>

Only five named entities are safe

Only the five predefined named entities (&lt;, &gt;, &amp;, &quot; and &apos;) are guaranteed to be supported. Others may be ignored completely or output as literal text. For example, if an XHTML document contains entities such as &nbsp; or &rdquo;, Safari will output them as literal text. Opera instead chooses to ignore unknown entities, while the Mozilla family recognizes these entities and handles them as in HTML "if the document references a public identifier listed in the browser's pseudo-DTD catalog and the document has not been declared standalone."

Using the UTF-8 character encoding is the recommended approach. It lets you use (almost) any character you need directly in the document, without entities or character references. If you cannot or do not want to use UTF-8, numeric character references are supported and safe to use.
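
For documents that cannot be sent as UTF-8, the numeric references can be generated rather than typed by hand. The sketch below is one possible approach; toNumericReferences is a hypothetical name, and it only rewrites non-ASCII characters, leaving &, < and other markup characters untouched.

```javascript
// Hypothetical helper: replace every non-ASCII character with a decimal
// numeric character reference such as &#160;.
function toNumericReferences(text) {
  let out = "";
  // for...of walks the string by code point, so characters outside the
  // Basic Multilingual Plane are not split into surrogate halves.
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    out += codePoint > 0x7f ? `&#${codePoint};` : ch;
  }
  return out;
}

// U+00A0 is the character behind &nbsp; and U+201D the one behind &rdquo;.
console.log(toNumericReferences("a\u00a0b\u201d")); // prints "a&#160;b&#8221;"
```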

The contents of SGML comments may be ignored

SGML comments (HTML-style comments, <!-- comment -->) may be (and will be) treated as comments by the browser, even inside a script or style element.

In HTML, it has been common to wrap the contents of script and style elements in comments, in order to hide them from browsers that do not know the script or style elements and would otherwise render the contents as plain text on the page.

In XHTML, doing so causes the browser to ignore everything inside the comment.

The habit of hiding script and style contents from old browsers dates back to the mid-1990s. In my experience, browsers that behave this way are now very rare, so you can safely ignore them and stop using SGML comments in scripts and styles, even if you are using HTML.

The contents of script and style elements are also treated as XML

The style and script elements are #PCDATA (parsed character data) blocks, not CDATA (character data) blocks. So anything in them that looks like XML is parsed as XML and will cause errors unless it is well-formed.

To use <, & or -- in a script or style block, you need to wrap the contents in a CDATA section:

<script type="text/javascript">
<![CDATA[
...
]]>
</script>

Inside a CDATA section you can use any sequence of characters without it being parsed as XML (except ]]>, which ends the CDATA section).

In documents that also need to be sent as text/html, the start and end markers of the CDATA section need to be commented out to hide them from browsers that cannot handle CDATA sections:

<script type="text/javascript">
// <![CDATA[
...
// ]]>
</script>
<style type="text/css">
/* <![CDATA[ */
...
/* ]]> */
</style>

If you want to make sure that very old browsers are also shielded from the CDATA sections, you need a more elaborate method, as described in Ian Hickson's Sending XHTML as text/html Considered Harmful:

<script type="text/javascript">
<!--//--><![CDATA[//><!--
...
//--><!]]>
</script>
<style type="text/css">
<!--/*--><![CDATA[/*><!--*/
...
/*]]>*/-->
</style>

A better approach might be to have the content negotiation script remove any CDATA markers before sending a document as text/html.

Of course, the smartest and safest approach is to move all CSS and JavaScript to external files, but that is not always practical.
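
The idea of removing CDATA markers before a document goes out as text/html could be sketched roughly as follows. This is an illustration under stated assumptions, not the method used by any particular negotiation script: stripCdataMarkers is a hypothetical name, the input is assumed to be well-formed XHTML whose script and style elements have explicit end tags, and only those two elements are touched, because CDATA sections elsewhere would need their contents escaped instead.

```javascript
// Hypothetical helper: remove "<![CDATA[" and "]]>" markers from the bodies
// of script and style elements, leaving the rest of the document as it is.
function stripCdataMarkers(xhtml) {
  return xhtml.replace(
    // Groups: 1 = start tag, 2 = element name, 3 = body, 4 = end tag.
    // Self-closing tags such as <script src="x.js" /> are skipped.
    /(<(script|style)\b(?:[^>]*[^\/>])?>)([\s\S]*?)(<\/\2\s*>)/gi,
    (match, open, name, body, close) =>
      open + body.replace(/<!\[CDATA\[|\]\]>/g, "") + close
  );
}
```

If the markers were commented out, as in the examples above, the leftover // or /* */ comments are empty and harmless in the text/html version.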

No elements are inserted automatically

In HTML, if the tbody element of a table is omitted, the browser inserts it automatically. In XHTML it does not. If you do not add tbody explicitly, it will not be in the document. Keep this in mind when writing CSS selectors and JavaScript.

Scripts that use document.write no longer work

Using JavaScript's document.write in XHTML does not work. Ian Hickson explains the reason in Why document.write() doesn't work in XML. You need to use document.createElementNS() instead. You can find more information in a forum thread on Experts Exchange.

This is one of the reasons Google AdSense does not work in XHTML. For those who want to serve XHTML as application/xhtml+xml and still use Google ads, there is a workaround: Simon Jessey's Making AdSense work with XHTML. It is a bit of a hassle, but it works (I use it here as well) and it has Google's approval.
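
As a small sketch of the DOM-based alternative to document.write, the function below builds a paragraph with document.createElementNS() and appends it to a parent node. The helper name appendParagraph and its parameters are hypothetical; the document object is passed in as an argument so the function does not depend on any global.

```javascript
// The XHTML namespace, required so the new element is treated as XHTML.
const XHTML_NS = "http://www.w3.org/1999/xhtml";

// Hypothetical helper: append <p>text</p> to `parent` using DOM methods
// instead of document.write(). `doc` is the document to create nodes with.
function appendParagraph(doc, parent, text) {
  const p = doc.createElementNS(XHTML_NS, "p");
  p.appendChild(doc.createTextNode(text));
  parent.appendChild(p);
  return p;
}

// Possible use in a browser:
//   const body = document.getElementsByTagNameNS(XHTML_NS, "body")[0];
//   appendParagraph(document, body, "Hello");
```

Because the text goes through createTextNode, characters such as & and < need no escaping here.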

Referencing style elements

In XML, style rules are defined with an XML stylesheet declaration. To be compatible with this in XHTML, you should use XML stylesheet declarations as well (see XHTML 1.0, Second Edition: Referencing Style Elements when serving as XML for XML stylesheet declarations, and Associating Style Sheets with XML documents for xml-stylesheet processing instructions). If you use a style element to load an external CSS file, the style element itself should be referenced from an XML stylesheet declaration. To do this, use the id attribute to give the style element a fragment identifier, and then reference that identifier in an XML stylesheet declaration:

<?xml-stylesheet href="stylesheet1.css" type="text/css"?>
<?xml-stylesheet href="#stylesheet2" type="text/css"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>XML stylesheet declaration</title>
<style type="text/css" id="stylesheet2">
@import "stylesheet2.css";
</style>
</head>
<body>
...
</body>
</html>

I do not know how necessary this is in practice, or what goes wrong if you do not use XML stylesheet declarations. Maybe someone can enlighten me.

CSS rules are applied slightly differently

CSS properties applied to the body element do not apply to the whole document in XHTML. The most noticeable case is a background color or image. In HTML, a background applied to the body element covers the entire page. In XHTML, you have to style the html element as well. This behavior is demonstrated in Juicy Studio's CSS body Element Test.

In XHTML, element and attribute names used in CSS selectors are case sensitive (and must be lowercase). The easiest way to avoid problems is to keep everything lowercase in HTML, XHTML and CSS alike.


It's a challenge, but it's not impossible

When I started serving XHTML as application/xhtml+xml to compatible browsers, I would have had far fewer headaches if I had been able to read an article like this before making the decision. I might even have considered using HTML 4.01 Strict. Even so, I learned a lot from the experience, and learning is always a good thing.

If you are using real XHTML or are thinking about it, I hope this article gives you some useful information and a better basis for deciding whether or not to take this route.

There are probably more differences between HTML and XHTML than the ones I have mentioned here, but these are the ones you are likely to encounter when serving XHTML as application/xhtml+xml. If you know of any errors or omissions, be sure to tell me.
