Thinking XML: Good advice for creating XML


XML is used more and more widely, but much of it is not well formed. Even when it is well formed, it is often poorly designed, which makes processing and maintenance very difficult. And much of the infrastructure used with XML makes the problem worse. There has been public discussion of XML best practices, such as Henri Sivonen's article "HOWTO Avoid Being Called a Bozo When Producing XML." Uche Ogbuji has often discussed XML best practices on IBM developerWorks, and here he presents the main points raised in these articles.

For years I've been discussing XML best practices in this column and in other series of articles. Others, like my fellow columnist Elliotte Rusty Harold, have also addressed the issue. The more XML experts take part in discussions of XML design principles, the better the community can provide consistent advice to developers who adopt XML at different levels. This article looks at XML best practices in more detail, drawing on recent and earlier articles.

No more bozos

Henri Sivonen wrote a useful article, "HOWTO Avoid Being Called a Bozo When Producing XML" (see Resources). He starts from the example of XML-based Web feed formats, such as RSS and Atom, and offers a guide to what should and shouldn't be done when generating well-formed XML with namespaces. As he says in the introduction:

Some developers seem to think that well-formedness is very difficult (if not impossible) to get right when producing XML programmatically, while others get it right and wonder why the rest are so incompetent. I assume that no one wants to appear incompetent or to be called names. Therefore, I hope that the following suggestions will help developers move from the first group to the second.

The first piece of advice Henri gives is "Don't think of XML as a text format," which I believe is a dangerous formulation. The basic idea is correct: you should not produce and edit XML as casually as you would a simple text document, but this applies to all structured text formats. Saying that XML is not text, however, betrays one of the most important characteristics of XML, one that is enshrined in the very definition of XML. (A textual object is a well-formed XML document [if it complies with this specification].) Henri's formulation is also confusing because there is a technical definition of text in XML, which is, roughly, the sequence of characters that makes up XML. Text is not just the main content of leaf elements or attributes; technically, that kind of text is called character data. Text is also the main component of all XML entities, so it is contradictory to say that XML is not text. I think it makes more sense to emphasize how XML differs from the text formats that developers are already familiar with.

The comments above suggest that Henri's recommendations may be skewed by his focus on the problem of generating well-formed Web feeds. He is right to warn people that it is dangerous to simply paste strings together and expect the result to be well-formed XML. I have also suggested in an earlier article that people use a mature XML toolkit rather than simple text tools to create XML (see Resources). My concern is that the way Henri puts this advice is a bit confusing and misleading in the broader context of XML processing. He repeats this view in the "Don't use text-based templates" and "Don't print" sections. I think his advice can be generalized to "Do not use a mechanism that does not guarantee well-formed XML," which is indeed very important advice. As Henri mentions, one way to create XML safely is to emit SAX events, in line with his advice to "use a tree or a stack (or an XML parser)." But even this does not give you complete peace of mind. The SAX tools you use do not necessarily perform all the required well-formedness checks. For example, some Unicode characters are prohibited in XML. Additional checks may be needed to address such issues.
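As a minimal sketch of this point, the following Python example emits SAX events through the standard library's `xml.sax.saxutils.XMLGenerator` and adds a separate character check. The helper names `check_xml_chars` and `build_title`, and the `title` element, are hypothetical and chosen only for illustration; the regular expression is meant to mirror the Char production of XML 1.0.

```python
import io
import re
from xml.sax.saxutils import XMLGenerator

# Anything outside the XML 1.0 "Char" production is forbidden in a document.
_FORBIDDEN = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def check_xml_chars(text):
    """Return text unchanged, or raise ValueError on a character XML 1.0 forbids."""
    match = _FORBIDDEN.search(text)
    if match:
        code_point = ord(match.group())
        raise ValueError(f"character U+{code_point:04X} is not allowed in XML 1.0")
    return text


def build_title(title):
    """Serialize <title>...</title> by sending SAX events to XMLGenerator."""
    out = io.StringIO()
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("title", {})
    # XMLGenerator escapes markup characters such as & and <, but it does
    # not reject forbidden characters, so that check is done here.
    gen.characters(check_xml_chars(title))
    gen.endElement("title")
    gen.endDocument()
    return out.getvalue()


print(build_title("Tom & Jerry <live>"))
```

The generator takes care of escaping, while the extra check covers a well-formedness rule that this particular tool leaves to the caller. Other toolkits may behave differently, so it is worth verifying what your own serializer actually checks.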

Henri recommends that users not attempt to manage namespaces manually, which is correct. As I've discussed on developerWorks, you have to be very careful with XML namespaces. He advises developers to think in terms of universal names [namespace Uniform Resource Identifier (URI) plus local name] wherever possible, but sometimes it is unavoidable to deal with prefixes and namespace declarations. In a specification such as XSLT, QNames (prefix/local name combinations) can be used in attribute values, and the prefix is interpreted according to the namespace declarations in scope. This pattern is called QNames in content. In this case, the developer must control the prefixes that are declared, or XML processing will fail. When developers manage namespace declarations themselves, the results tend to be messy because of the complexity of XML namespaces.
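To illustrate the idea of a universal name (namespace URI plus local name), here is a small Python sketch using the standard library's `xml.etree.ElementTree`, which reports element names in `{uri}local` form. The namespace URI and the `feed`, `title`, and `rule` elements are hypothetical, invented for this example.

```python
import xml.etree.ElementTree as ET

NS = "http://example.org/ns/feed"  # hypothetical namespace URI

# The same vocabulary, with three different ways of declaring the namespace.
with_prefix = f'<f:feed xmlns:f="{NS}"><f:title>News</f:title></f:feed>'
other_prefix = f'<x:feed xmlns:x="{NS}"><x:title>News</x:title></x:feed>'
default_ns = f'<feed xmlns="{NS}"><title>News</title></feed>'

# ElementTree exposes universal names as "{uri}local", whatever the prefix.
expected = ET.QName(NS, "feed").text

for source in (with_prefix, other_prefix, default_ns):
    root = ET.fromstring(source)
    assert root.tag == expected
    print(root.tag, root.find(f"{{{NS}}}title").text)

# A QName inside an attribute value is only a string to the parser. The
# application has to resolve the "f" prefix against the declarations in scope.
rule = ET.fromstring(f'<rule xmlns:f="{NS}" match="f:title"/>')
print(rule.get("match"))
```

Code that compares universal names is unaffected by the choice of prefix. The `match` attribute shows why prefixes cannot always be ignored: its value only makes sense while the `f` prefix stays bound to the same namespace.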

Because namespace syntax can become very messy after a document passes through XML processing pipelines, one workaround is to insert a canonicalization step at the end of the pipeline. XML canonicalization eliminates many of the syntactic variations that XML 1.0 and XML namespaces allow, including variations in namespace declarations. Canonicalization does not eliminate every issue that makes namespace declarations dangerous for developers. It also does not solve the problem of QNames in content, because it does not change the prefixes used in the document. It does, however, reduce the confusion around namespace declarations, making it easier to identify problems, or even to write code that corrects them automatically. The GenX library, one of the XML creation tools that Henri recommends, automatically generates canonical XML, and many other toolkits offer canonicalization as an option.
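The canonicalization (normalization) step described above can be sketched with `xml.etree.ElementTree.canonicalize`, which is available in Python 3.8 and later and implements C14N 2.0. This is only one possible tool, not the one the article's sources used, and the `urn:example:a` namespace and element names are hypothetical.

```python
import xml.etree.ElementTree as ET  # canonicalize() needs Python 3.8+

# Two spellings of the same document: different attribute order and quoting,
# a repeated namespace declaration, and two forms of the empty element.
variant_a = (
    '<doc xmlns:a="urn:example:a" b="2" a="1">'
    '<a:item xmlns:a="urn:example:a"/></doc>'
)
variant_b = "<doc a='1' b='2' xmlns:a='urn:example:a'><a:item></a:item></doc>"

canon_a = ET.canonicalize(variant_a)
canon_b = ET.canonicalize(variant_b)
print(canon_a == canon_b)  # the syntactic variations are gone

# By default, prefixes are left alone, so a document that uses another
# prefix for the same namespace still canonicalizes to different text.
renamed = '<doc a="1" b="2" xmlns:z="urn:example:a"><z:item/></doc>'
print(ET.canonicalize(renamed) == canon_a)
```

The first comparison shows the benefit: once both variants are canonical, a plain string comparison is enough to see that they match. The second shows the limit noted above, since prefixes survive canonicalization unless a tool is explicitly asked to rewrite them.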

Henri's recommendations about Unicode and character processing are essentially correct. But I think the section "Avoid adding pretty-printing white space in character data" goes a bit too far. In most cases it is safe to add pretty-printing whitespace between elements, as opposed to inside elements that contain character data. As Henri points out, it is usually unsafe to render Listing 1 as Listing 2.

Listing 1. XML Example

<foo>bar</foo>

Listing 2. An XML example that adds whitespace to the character data

<foo>
 bar
</foo>

But it is usually safe to pretty-print the XML in Listing 3 to produce the output shown in Listing 4.

Listing 3. Another XML example

<doc><foo>bar</foo></doc>

Listing 4. The XML in Listing 3 with whitespace added between elements

<doc>
 <foo>bar</foo>
</doc>
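The difference between the two kinds of whitespace can be checked with a parser. The following sketch feeds Listings 1 through 4 to Python's `xml.etree.ElementTree` and prints the character data that an application would see. The variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

listing1 = "<foo>bar</foo>"
listing2 = "<foo>\n bar\n</foo>"
listing3 = "<doc><foo>bar</foo></doc>"
listing4 = "<doc>\n <foo>bar</foo>\n</doc>"

# Listing 2 changes the character data of foo itself.
print(repr(ET.fromstring(listing1).text))
print(repr(ET.fromstring(listing2).text))

# Listing 4 leaves the character data of foo intact. The added whitespace
# ends up in whitespace-only text between the elements.
doc3 = ET.fromstring(listing3)
doc4 = ET.fromstring(listing4)
print(repr(doc3.find("foo").text), repr(doc4.find("foo").text))
print(repr(doc4.text), repr(doc4.find("foo").tail))
```

An application that reads the content of `foo` gets a different string from Listing 2 than from Listing 1, but the same string from Listings 3 and 4.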

Many XML serialization tools understand the difference between relatively safe and unsafe forms of pretty-printing. It's important to know that in mixed content, even the kind of pretty-printing shown in Listings 3 and 4 can distort the content. Schema-guided serialization can avoid this type of problem. In practice, however, most vocabularies that use mixed content are not very sensitive to whitespace normalization, so you don't have to worry too much about pretty-printing. You should be fully aware of the problem and know how to turn pretty-printing off (preferably it is off by default). Henri recommends the pretty-printing practice shown in Listing 5, but I disagree because I find those ugly tags hard to read.

Listing 5. The pretty-printing style that Henri Sivonen recommends but the author disagrees with

<foo
  >bar</foo
>
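Two of the claims above can also be checked with a parser. This Python sketch, again using `xml.etree.ElementTree`, shows that the style in Listing 5 keeps the line breaks inside the tags, so the character data is untouched, and that indenting mixed content changes the text. The `p` and `b` elements in the mixed-content sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Listing 5: the line breaks sit inside the tags, not in the content.
listing5 = "<foo\n  >bar</foo\n>"
print(repr(ET.fromstring(listing5).text))

# Mixed content: text and child elements are interleaved, so indentation
# added between them becomes part of the text.
mixed = "<p>Some <b>bold</b> text</p>"
indented = "<p>\n  Some\n  <b>bold</b>\n  text\n</p>"

print(repr("".join(ET.fromstring(mixed).itertext())))
print(repr("".join(ET.fromstring(indented).itertext())))
```

Whether the changed whitespace in the second case matters depends on the vocabulary, which is why a serializer cannot decide safely without knowing which elements have mixed content.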

Monastic advice

The second source to explore in this article is "Monastic XML," written by Simon St. Laurent (see Resources). This is a small set of essays on how to make the most of XML, with suggestions on how to process and think about XML. Simon uses monasteries and asceticism as a metaphor to suggest that it is dangerous to load XML with burdens that do not suit its simple textual roots. In "Marking up at the foundation," he discusses the essential roles of character data and markup (elements and attributes). In "Naming things and reading names," he explains why the generic identifier (also known as the element type name) is an important concept and should be the only key component of the structure of markup information. Ideally, if you are using XML namespaces, the key is the universal name (namespace URI plus local name), which is one of the points Simon takes up in "Namespaces as opportunity." "Accepting the discipline of trees" reveals an unfortunate secret of XML: although it seems that XML hierarchies could easily be extended into graph structures, it is hard to use XML to model graphs.

By far the most important advice on the "Monastic XML" site, though, is that "optimizing markup for processing is always premature." XML is a declarative technology, and many developers are misinformed about its strengths and weaknesses. Developers who try to tie XML design closely to processing details often make processing more difficult in the long run. The key to success with XML is to focus on the characteristics of the information that needs to be represented, and to keep that separate from the technical design of the systems that process the information.
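The point about trees and graphs can be made concrete with a small sketch. A cycle cannot be expressed by nesting elements, so a graph usually has to be flattened into sibling elements that refer to each other by identifier, and the application must resolve those references itself. The `graph`, `node`, and `edge` vocabulary below is hypothetical, invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: the tree only holds flat lists of nodes and edges.
# The cycle a -> b -> c -> a exists only through the id references.
source = """
<graph>
  <node id="a"/>
  <node id="b"/>
  <node id="c"/>
  <edge from="a" to="b"/>
  <edge from="b" to="c"/>
  <edge from="c" to="a"/>
</graph>
"""

root = ET.fromstring(source)

# The parser knows nothing about the references, so the application builds
# the graph and checks that every edge points at a known node.
adjacency = {node.get("id"): [] for node in root.iter("node")}
for edge in root.iter("edge"):
    src, dst = edge.get("from"), edge.get("to")
    if src not in adjacency or dst not in adjacency:
        raise ValueError(f"edge refers to an unknown node: {src} -> {dst}")
    adjacency[src].append(dst)

print(adjacency)
```

The XML hierarchy here carries almost none of the graph's structure. The structure lives in attribute values that only the application understands, which is the extra work the tree model imposes.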

Conclusion

There are always differing ideas when XML best practices are discussed, especially in the early stages, but it's good to hear different voices. There are few references on this topic, and I will continue to discuss it in this column. If you have any information or suggestions about best practices, or wish to share your views, please join the discussion at the Thinking XML forum.
