Thinking XML: Good advice for creating XML

Source: Internet
Author: User

XML is widely used, but many XML documents are not well-formed. Even when they are well-formed, they are often poorly designed, which makes processing and maintenance very hard. Much of the infrastructure used for XML makes the problem worse. In response, a public discussion of XML best practices has emerged, including Henri Sivonen's article "HOWTO Avoid Being Called a Bozo When Producing XML". Uche Ogbuji often discusses XML best practices on IBM developerWorks, and in this article he reviews the points raised in such articles.

Over the past few years, I have discussed XML best practices in this column and in other series of articles. Others, such as my fellow columnist Elliotte Rusty Harold, have also addressed the topic. The more XML experts take part in the discussion of XML design principles, the better, because the community can then offer consistent advice to developers who adopt XML at every level. This article looks at recent and older articles by others that present XML best practices.

No more bozos

Henri Sivonen wrote a useful article, "HOWTO Avoid Being Called a Bozo When Producing XML" (see references). Using XML-based Web feed formats (such as RSS and Atom) as his examples, he offers guidelines on what to do and what to avoid in order to produce well-formed XML with namespaces. As he says in his introduction:

There seem to be developers who think that well-formedness is awfully hard (if not impossible) to get right when producing XML programmatically, and developers who can get it right and wonder why the others are so incompetent. I assume no one wants to appear incompetent or to be called names. Therefore, I hope the following advice helps developers move from the first group to the second.

Henri's first piece of advice is "Don't think of XML as a text format", which I think is dangerous advice. Its basic point is sound: you cannot create XML as casually as you would a simple text document. But that caution applies to every structured text format. Moreover, being text is one of the most important characteristics of XML, and it is stated as normative in the XML definition ("A textual object is a well-formed XML document if [it conforms to this specification]"). Henri's wording is also confusing because of the technical definition of text in XML, which is, broadly, the sequence of characters that makes up an XML entity. Text is not just the content of leaf elements or attributes; technically, that kind of text is called character data. Text is what every XML entity is made of. XML and text are therefore not at odds. I think it does more harm than good to exaggerate the differences between XML and the text formats that developers already know.

That said, Henri's advice is probably somewhat overheated out of frustration with the problem of Web feeds that are not well-formed. He is right to warn that it is dangerous to pile up strings and treat the result as well-formed XML. I also recommend that people use mature XML toolkits rather than simple text tools to create XML (see references). My concern is that the way Henri phrases this advice is confusing and may be misread in the broader context of XML processing. He repeats the idea in "Don't use text-based templates" and "Don't print". I think his advice can be restated as "Don't use a mechanism that cannot guarantee well-formed XML." That is indeed very important advice. As Henri mentions under "Use a tree or a stack (or an XML parser)", one way to create XML safely is to emit SAX events. But even then you cannot relax completely. The SAX tools an application uses do not necessarily perform every required well-formedness check. For example, some Unicode characters are not allowed in XML, and additional checks may be needed to catch such problems.
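To make the point about SAX events and extra checks concrete, here is a minimal sketch in Python. It assumes the standard library's `xml.sax.saxutils.XMLGenerator` as the SAX-style serializer; the `entry` and `title` element names and the `check_text` and `write_entry` helpers are hypothetical and chosen only for illustration. The serializer escapes markup characters, but the check for characters outside the XML 1.0 `Char` production is added by hand.

```python
import io
import re
from xml.sax.saxutils import XMLGenerator

# Anything outside the XML 1.0 Char production (for example most C0 controls).
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def check_text(text):
    """Hypothetical helper: reject characters that XML 1.0 forbids."""
    match = _ILLEGAL_XML_CHARS.search(text)
    if match:
        raise ValueError(
            f"character U+{ord(match.group()):04X} is not allowed in XML 1.0"
        )
    return text

def write_entry(title):
    """Serialize a tiny, hypothetical feed entry by emitting SAX events."""
    out = io.StringIO()
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("entry", {})
    gen.startElement("title", {})
    gen.characters(check_text(title))  # the generator escapes &, < and >
    gen.endElement("title")
    gen.endElement("entry")
    gen.endDocument()
    return out.getvalue()
```

Calling `write_entry("AT&T <rocks>")` yields a document whose title is escaped by the serializer, while a title containing a form feed (`"\x0c"`) is rejected by the hand-written check rather than by the SAX tool.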

Henri advises against managing namespaces manually, which is sound. I have argued on developerWorks that XML namespaces must be handled with great care. He suggests that developers think in terms of the universal name, that is, the [namespace Uniform Resource Identifier (URI), local name] pair. Sometimes, however, it is unavoidable to deal with prefixes or namespace declarations. In XSLT, QNames (prefix and local name combinations) can be used in attribute values, and each prefix is assumed to be resolved against the namespace declarations in scope. This pattern is called QNames in content. In such cases the developer must keep control of the declared prefixes; otherwise, XML processing will fail. If developers manage namespace declarations themselves, the results can become messy because of the complexity of XML namespaces.
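The QNames-in-content problem can be sketched as follows. This is an illustration under stated assumptions, not XSLT itself: the `schema` and `field` elements, the `type` attribute, and the `urn:example:inventory` namespace are all hypothetical. The code uses `xml.etree.ElementTree.iterparse` with namespace events to track which prefixes are in scope and resolves the prefix found inside an attribute value.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical documents: the second renames the declared prefix
# but leaves the QName inside the attribute value untouched.
GOOD = (b'<schema xmlns:inv="urn:example:inventory">'
        b'<field name="sku" type="inv:code"/></schema>')
BROKEN = (b'<schema xmlns:i="urn:example:inventory">'
          b'<field name="sku" type="inv:code"/></schema>')

def resolve_type_qnames(xml_bytes):
    """Return (namespace URI, local name) for each type="prefix:local" value."""
    in_scope = []   # stack of (prefix, uri) declarations currently in scope
    resolved = []
    events = ("start-ns", "end-ns", "start")
    for event, payload in ET.iterparse(io.BytesIO(xml_bytes), events=events):
        if event == "start-ns":
            in_scope.append(payload)
        elif event == "end-ns":
            in_scope.pop()
        else:
            value = payload.get("type")
            if value is None or ":" not in value:
                continue
            prefix, local = value.split(":", 1)
            # The innermost declaration of the prefix wins.
            for declared_prefix, uri in reversed(in_scope):
                if declared_prefix == prefix:
                    resolved.append((uri, local))
                    break
            else:
                raise LookupError(f"prefix {prefix!r} is not declared in scope")
    return resolved
```

`resolve_type_qnames(GOOD)` resolves `inv:code` against the declaration in scope, while `resolve_type_qnames(BROKEN)` raises `LookupError`, because a tool that rewrote the declaration had no way of knowing that the attribute value also depended on the prefix.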

Namespace syntax can become messy after a document passes through an XML pipeline. One solution is to insert a canonicalization step at the end of the pipeline. XML canonicalization removes the syntactic variations permitted by XML 1.0 and XML Namespaces, including the different ways of declaring namespaces. Canonicalization cannot eliminate every problem that makes namespace declarations dangerous for developers. It does not solve the problem of QNames in content, because it does not change the prefixes used in the document. It does, however, reduce the confusion in namespace declarations, so that you can spot problems more easily or even write code to correct them automatically. The GenX library, one of the XML creation tools Henri recommends, generates canonical XML automatically. Many other toolkits also offer canonicalization as an option.
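As a rough illustration of what a canonicalization step smooths over, the sketch below uses `xml.etree.ElementTree.canonicalize`, which is available in Python 3.8 and later and implements C14N 2.0. The documents and the `urn:example:doc` namespace are hypothetical. The first two variants differ only in attribute order, quoting, empty-element syntax, and a redundant namespace redeclaration; the third uses a different prefix for the same namespace.

```python
from xml.etree.ElementTree import canonicalize  # Python 3.8+

# Hypothetical inputs that carry the same information in different syntax.
VARIANT_A = ('<p:doc xmlns:p="urn:example:doc" b="2" a="1">'
             '<p:item xmlns:p="urn:example:doc"/></p:doc>')
VARIANT_B = ("<p:doc a='1' b='2' xmlns:p='urn:example:doc' >"
             "<p:item></p:item></p:doc>")
# Same namespace, different prefix.
OTHER_PREFIX = ('<q:doc xmlns:q="urn:example:doc" a="1" b="2">'
                '<q:item/></q:doc>')

def same_canonical_form(left, right):
    """Compare two documents after canonicalization with default options."""
    return canonicalize(left) == canonicalize(right)
```

With the default options, `same_canonical_form(VARIANT_A, VARIANT_B)` is true, while `same_canonical_form(VARIANT_A, OTHER_PREFIX)` is false: prefixes are preserved, which is why canonicalization alone does not address QNames in content.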

Henri's advice on Unicode and character handling is thorough and accurate. However, I think the section "Avoid adding pretty-printing white space in character data" goes a little too far. In most cases, pretty-printing that adds whitespace between elements, rather than inside elements that contain character data, is safe. As Henri states, it is usually unsafe to pretty-print Listing 1 so that it becomes Listing 2.

Listing 1. XML Example

Bar

Listing 2. XML example after adding whitespace to the character data


Bar

However, it is usually safe to pretty-print XML such as Listing 3, which produces the output shown in Listing 4.

Listing 3. Another XML example

Bar

Listing 4. The XML in Listing 3 after pretty-printing adds whitespace


  Bar

Many XML serialization tools understand the difference between relatively safe and unsafe pretty-printing. Be aware that pretty-printing of the kind shown in Listings 3 and 4 can distort a document if whitespace is added to mixed content. You can avoid this kind of problem by using schema-guided serialization. In practice, however, most vocabularies that use mixed content are not very sensitive to whitespace normalization, so you need not worry too much about pretty-printing. You should understand the problem thoroughly and know how to turn pretty-printing off (it is best not to pretty-print by default). Henri proposes the pretty-printing practice shown in Listing 5, but I do not endorse it because I find the awkward-looking tags hard to read.

Listing 5. Pretty-printing approach recommended by Henri Sivonen but not endorsed by the author of this article

> Bar>
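The difference between these pretty-printing styles can be checked with a parser. The following is a minimal sketch using Python's `xml.etree.ElementTree`; the `foo` and `bar` element names and the five sample strings are assumptions made for illustration and are not taken from the listings above. It compares whitespace added inside character data, whitespace added between elements, and line breaks placed inside the tags themselves.

```python
import xml.etree.ElementTree as ET

# Hypothetical samples; element names are assumptions for illustration.
COMPACT_LEAF = "<foo>Bar</foo>"
PADDED_LEAF = "<foo>\n  Bar\n</foo>"             # whitespace inside character data
COMPACT_TREE = "<foo><bar>Bar</bar></foo>"
INDENTED_TREE = "<foo>\n  <bar>Bar</bar>\n</foo>"  # whitespace between elements
IN_TAG_BREAKS = "<foo\n  ><bar>Bar</bar\n></foo>"  # line breaks inside the tags

def root_text(xml_text):
    """Text that directly follows the root start tag (None if there is none)."""
    return ET.fromstring(xml_text).text

def bar_text(xml_text):
    """Character data of the hypothetical <bar> child element."""
    return ET.fromstring(xml_text).find("bar").text
```

Here `root_text(PADDED_LEAF)` is `"\n  Bar\n"` rather than `"Bar"`, so the character data has changed. `bar_text` returns `"Bar"` for both `COMPACT_TREE` and `INDENTED_TREE`, although the indented version gains a whitespace-only text node under the root. With line breaks inside the tags, `root_text(IN_TAG_BREAKS)` is `None`, meaning no text was added at all, at the cost of markup that is harder to read.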

Advice from the monastery

The second work discussed in this article is "Monastic XML" by Simon St. Laurent (see references). It is a collection of short essays that offer advice on how to process and think about XML in order to get the most out of it. Using monasticism and asceticism as a metaphor, Simon argues that it is dangerous to pile excessive burdens on top of the textual roots of XML. In "Marking-up at the foundation", he discusses the essential roles of character data and markup (elements and attributes). In "Naming things and reading names", he explains why the generic identifier (also known as the element type name) is an important concept and should be the single key component for marking up the structure of information. In the ideal case, if XML namespaces are used, the key is the universal name (the namespace URI together with the local name). This complexity is one of the reasons behind Simon's plea in "Namespaces as opportunity". "Accepting the discipline of trees" reveals an unfortunate secret of XML: although XML hierarchies look as if they could easily be extended into graph structures, expressing a graph in XML has proved rather hard. But by far the most important advice on the Monastic XML site is "Optimizing markup for processing is always premature". XML is a declarative technology, and many developers hold mistaken ideas about its strengths and weaknesses. Developers who try to tailor an XML design to the details of processing usually make processing harder in the long run. The key to success with XML is to focus on the characteristics of the information that needs to be modeled, and to keep that separate from the technical design of the systems that process the information.
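The difficulty of fitting a graph into an XML tree can be illustrated with a small sketch. The `graph`, `node`, and `edge` vocabulary below is hypothetical and is not taken from Monastic XML. Because the XML hierarchy can only express containment, the cycle among the three nodes has to be encoded as attribute references, and the application, not the parser, has to resolve and validate them.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: a three-node cycle cannot be nested as a tree,
# so edges are expressed as references between node identifiers.
GRAPH_XML = """
<graph>
  <node id="a"/>
  <node id="b"/>
  <node id="c"/>
  <edge from="a" to="b"/>
  <edge from="b" to="c"/>
  <edge from="c" to="a"/>
</graph>
"""

def load_graph(xml_text):
    """Build an adjacency mapping from the reference-based encoding."""
    root = ET.fromstring(xml_text)
    adjacency = {node.get("id"): [] for node in root.iter("node")}
    for edge in root.iter("edge"):
        source, target = edge.get("from"), edge.get("to")
        # Dangling references are an application-level error.
        if source not in adjacency or target not in adjacency:
            raise ValueError(
                f"edge {source!r} -> {target!r} refers to an unknown node"
            )
        adjacency[source].append(target)
    return adjacency
```

`load_graph(GRAPH_XML)` returns a mapping in which each node points to its successor, but the links exist only by convention in attribute values, which is the extra burden the essay warns about.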
