Decoding XML and DTDs: A beginner's guide to writing well-formed and valid XML


Level: elementary

Jane Fung (jcyfung@ca.ibm.com), VisualAge for Java Support, IBM Canada

July 01, 2001

This introductory article explains how to create an XML Document Type Definition (DTD) and a well-formed XML file that can be validated by the XML parser of your choice. Although you do not have to include a DTD with every XML file you create, doing so makes your life much easier. A DTD not only enforces the grammar you have defined for your XML files, it also allows the files to be checked by a validating XML parser. Code samples include example DTDs and XML documents.

Extensible Markup Language (XML) has been around long enough that most people are familiar with its basic requirements: all XML documents must be well-formed and, if they are to be validated, valid. But how can you determine whether your XML document meets these requirements? The short answer is that you don't have to, or at least not by yourself. Most of the time, you will rely on an XML parser to handle this chore for you.

After a little research (see References), you will find that the market is full of XML parsers, most of which can be downloaded from the Web for free. A validating XML parser enforces both the XML syntax rules (that is, it ensures that the file is well-formed) and validity. XML parsers are available for almost every relevant programming language, including C, C++, Perl, Python, Tcl, and Java.

When it comes to ensuring that an XML document is well-formed, you can more or less point a parser at it and run it. To ensure that the document is valid, however, you must supply the parser with a document type definition, or DTD.

What about schemas?

The W3C recently advanced the long-discussed XML Schema specification to Recommendation status, which means that it is likely to be widely adopted by developers. In some respects, XML Schema replaces the DTD; in others, the DTD is still the best solution. For more on XML Schema and how it compares with DTDs in functionality and processing, see References.

This article reviews what makes an XML document well-formed, and then turns to validation, and more specifically to the DTD. I will discuss why you might want to include a DTD with your XML files, introduce some of the most common DTD syntax, and walk through a few simple samples that teach you how to write your own DTD.

What is well-formed?

When XML developers talk about well-formed and malformed XML, we are not having an aesthetic discussion. Rather, a well-formed XML document meets the following three basic structural requirements:

  • There is one parent (or root) element that contains all other elements.
  • Each start tag has an end tag.
  • All elements are correctly nested.

Listing 1 is an example of well-formed XML. Note that the parent element of this document is <person>, that each start tag has an end tag, and that each end tag matches its start tag exactly, including case. Generally, the start and end tags enclose information or text. In some cases, however, there is nothing between the tags. An empty tag must end with a forward slash: <nothing/> is an empty tag.

Listing 1. Well-formed XML

<person>
<firstname>Jane</firstname>
<lastname>Fung</lastname>
<nothing/>
</person>

Listing 2 is an example of malformed XML. It illustrates three common errors. First, the start and end <firstname> tags do not match exactly (the case differs). Second, the <lastname> tag has no end tag. Finally, the empty tag does not end with a forward slash.

Listing 2. Malformed XML

<person>
<Firstname>Jane</firstname>
<lastname>Fung
<nothing>
</person>
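
A quick way to see the difference between Listings 1 and 2 is to hand both to a parser. The following is a minimal sketch that uses Python's standard xml.etree.ElementTree module; the choice of language and parser is an assumption made for illustration (the article does not prescribe one), and the helper name is_well_formed is hypothetical. The parser used here checks well-formedness only; it does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

# Listing 1: well-formed.
LISTING_1 = """<person>
<firstname>Jane</firstname>
<lastname>Fung</lastname>
<nothing/>
</person>"""

# Listing 2: mismatched case, missing end tag, unclosed empty tag.
LISTING_2 = """<person>
<Firstname>Jane</firstname>
<lastname>Fung
<nothing>
</person>"""


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(LISTING_1))  # True
print(is_well_formed(LISTING_2))  # False
```

The parser stops at the first problem it meets, so fixing Listing 2 usually means correcting one error, running the check again, and repeating.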


What's in a DTD?

XML lets you define your own meaningful tags, so you can customize documents as much as you like. But XML being what it is (extensible) and people being what they are (unruly), things can quickly get out of hand. The solution is the DTD, which specifies the markup of an XML document. In short, a DTD specifies the elements that may appear in the document, the attributes those elements may have, the hierarchy of elements within elements, and the order in which the elements appear throughout the document.

Although a DTD is not required, it is certainly convenient. A DTD serves three basic purposes. It can:

  • Document the markup
  • Enforce internal consistency of tag parameters
  • Enable an XML parser to validate the document

If you do not define a DTD for an XML document, the document cannot be validated by an XML parser, unless you replace the DTD with an XML Schema instance (see the sidebar on XML Schema above). Listing 3 is the DTD for the XML document shown in Listing 1.

Listing 3. A simple DTD for person.xml

<!ELEMENT person (firstname, lastname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT nothing EMPTY>

Some examples

In Listing 3, the first line of the DTD defines the parent element of the XML document: person. The person element has two child elements: firstname and lastname.

The second and third lines use the content type #PCDATA, which indicates that the firstname and lastname elements may contain parsed character data (in this case, text). The last line of the DTD declares an empty element: nothing. Note that the content model of person does not list nothing, so a validating parser would accept the <nothing/> child in Listing 1 only if the first line read (firstname, lastname, nothing).

As you can see from the DTD in Listing 3, anyone who reads our XML document (and any parser that parses it) knows that the person element contains only two text elements: firstname and lastname. In addition, the DTD stipulates that the firstname element must come before the lastname element throughout the document.

Before turning to a more complex example, let's review some of the most common DTD syntax. You can find the complete DTD specification on the W3C site (see References).



Quick Guide to DTD syntax

In the following examples, A, B, C, and D stand for element names.

The element must contain exactly one A, at least one B (indicated by the plus sign), zero or more C (indicated by the asterisk), and zero or one D (indicated by the question mark), in that order:

<!ELEMENT element (A, B+, C*, D?)>

The element must contain exactly one of A, B, or C:

<!ELEMENT element (A | B | C)>
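
The occurrence indicators and separators in these two content models behave much like a regular expression over the sequence of child element names. The following Python sketch makes that idea concrete; it is an illustration only, not how any particular parser is implemented. The function names model_to_regex and children_match are hypothetical, and the sketch handles only element content built from names, commas, vertical bars, parentheses, and the +, *, and ? indicators (not EMPTY, ANY, or #PCDATA).

```python
import re

# Simplified pattern for an element name (assumption: ASCII names only).
NAME = re.compile(r"[A-Za-z_][\w.-]*")


def model_to_regex(model):
    """Translate an element-only content model into a compiled regex.

    Each element name becomes a group that matches "name " so that a
    list of child names can be checked as one space-terminated string.
    """
    compact = re.sub(r"\s+", "", model)  # "(A, B+)" -> "(A,B+)"
    pattern = NAME.sub(lambda m: "(?:%s )" % re.escape(m.group(0)), compact)
    pattern = pattern.replace(",", "")   # a sequence is plain concatenation
    return re.compile(pattern)


def children_match(model, child_names):
    """Check a list of child element names against a content model."""
    text = "".join(name + " " for name in child_names)
    return model_to_regex(model).fullmatch(text) is not None


print(children_match("(A, B+, C*, D?)", ["A", "B", "B", "D"]))  # True
print(children_match("(A, B+, C*, D?)", ["A", "D"]))            # False: B is required
print(children_match("(A | B | C)", ["B"]))                     # True
print(children_match("(A | B | C)", ["A", "B"]))                # False: only one allowed
```

The same check applied to the model (firstname, lastname) accepts the children firstname, lastname and rejects them in the reverse order.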

The element does not contain any content:

<!ELEMENT element EMPTY>

The element can contain text and any elements declared in the DTD:

<!ELEMENT element ANY>

The element may contain parsed character data or another element (element2). The asterisk (*) indicates a mixed content model: the element can contain any mix of text and the listed child elements.

<!ELEMENT element (#PCDATA|element2)*>

In the following example, the text "entity reference" is inserted wherever the entity is referenced in the document:

<!ENTITY element "entity reference">

The entity is referenced in the XML document as follows:

&element;
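
To watch the substitution happen, you can declare the entity in an internal DTD subset and parse a document that references it. The sketch below uses Python's standard xml.etree.ElementTree module, which expands internal entities declared this way; the wrapper element name note is hypothetical and exists only to give the reference somewhere to live.

```python
import xml.etree.ElementTree as ET

# The entity from the example above, declared in an internal subset.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (#PCDATA)>
<!ENTITY element "entity reference">
]>
<note>&element;</note>"""

root = ET.fromstring(DOCUMENT)
print(root.text)  # entity reference
```

If the entity were not declared, the same parser would report an undefined entity error instead of producing text.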

The following example shows an empty element with three attributes: attribute 1 (att1) is an optional ID attribute, attribute 2 (att2) is an attribute with the fixed value "A", and attribute 3 (att3) is a required text attribute.

<!ELEMENT element EMPTY>
<!ATTLIST element
att1 ID #IMPLIED
att2 CDATA #FIXED "A"
att3 CDATA #REQUIRED>

The element is used in the XML document as follows:

<element att2="A" att3="MustHave"/>

The attribute type CDATA indicates that the value is text, so att2 and att3 may contain any string. The attribute type ID indicates that the value must be a unique identifier within the document. Each element can have only one ID attribute.

If the syntax is not yet entirely clear, read on. The examples in the next section should help clear up any doubts.
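
One practical effect of the #FIXED declaration is that a parser that reads the DTD can supply att2 even when the document leaves it out. The following sketch assumes Python's standard xml.etree.ElementTree module, whose underlying expat parser reports attribute defaults declared in the internal subset; other parsers may behave differently. Note that this parser does not validate, so it would not complain if the required att3 were missing.

```python
import xml.etree.ElementTree as ET

# The declarations from the example above, placed in an internal subset.
# The document itself specifies only att3.
DOCUMENT = """<!DOCTYPE element [
<!ELEMENT element EMPTY>
<!ATTLIST element
att1 ID #IMPLIED
att2 CDATA #FIXED "A"
att3 CDATA #REQUIRED>
]>
<element att3="MustHave"/>"""

root = ET.fromstring(DOCUMENT)
print(root.get("att1"))  # optional and not given, so nothing is reported
print(root.get("att2"))  # supplied from the #FIXED declaration
print(root.get("att3"))  # given in the document
```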



Example

You can use Microsoft Internet Explorer 5 or later to view the XML document shown in Listing 4, an expanded version of the person.xml file used in the previous example. If you open people.xml in IE5, a tree structure is displayed. This is because IE5 includes an XML parser that can parse an XML document into an element tree.

You can also find this file and its DTD in References.

Listing 4. Complete listing of people.xml

<?xml version="1.0"?>
<!DOCTYPE people SYSTEM "people.dtd">
<people>
<person>
<name>
<firstname>Jane</firstname>
<lastname>Fung</lastname>
</name>
<look>good-looking</look>
<possession>
<car>
<model>Civic</model>
</car>
<job>&IBM;</job>
</possession>
</person>
<person>
<name>
<firstname>G.I.</firstname>
<lastname>Jane</lastname>
</name>
<look>tough</look>
<possession>
<house>
<townhouse townhouse_type="good" />
</house>
<bankaccount bankaccount_number="sg-123">
<![CDATA[<greeting>5000</greeting>]]>
</bankaccount>
</possession>
<other>
<car>she has a car</car>
<townhouse townhouse_type="good" />
</other>
</person>
</people>
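
The element tree that a parser builds from a document like this one can also be produced without a browser. The following is a small sketch using Python's standard xml.etree.ElementTree module; the excerpt is shortened to one person for brevity, and the helper name outline is hypothetical.

```python
import xml.etree.ElementTree as ET

# A shortened excerpt of Listing 4 (first person only, no DTD reference).
PEOPLE = """<people>
<person>
<name>
<firstname>Jane</firstname>
<lastname>Fung</lastname>
</name>
<look>good-looking</look>
</person>
</people>"""


def outline(element, depth=0):
    """Return one indented line per element, children below their parent."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(ET.fromstring(PEOPLE))))
```

Each level of indentation in the printed outline corresponds to one level of nesting in the document.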

Notes on the XML

The main points to consider on a closer look at the XML are a few items in the document header, starting with the following:

<?xml version="1.0"?>

An XML document should begin with a header like this one, the XML declaration, which tells the parser that it is dealing with an XML document. The declaration can also carry an optional encoding attribute that tells the parser which character encoding the document was created with; XML documents created on UNIX systems and those created on Windows systems may be encoded differently. The next line in the header, the document type declaration, names the root element and tells the parser where to find the DTD:

<!DOCTYPE people SYSTEM "people.dtd">

You can also set the optional standalone attribute in the XML declaration. The default value of standalone is no. The value no indicates that the DTD is defined in a separate file; the value yes indicates that the DTD is defined within the XML document itself. I have not set this attribute in the example; if you want to set it, it should look as follows:

<?xml version="1.0" standalone='yes'?>
<!DOCTYPE people [
<!ELEMENT people (person+)>
<!ELEMENT person (#PCDATA)>
]>

Make sure that the document is well-formed. For example, every empty tag must end with a forward slash, as shown below:

<townhouse townhouse_type="good" />

Note the CDATA section, which is used to escape any data that would otherwise be interpreted as XML markup. For example:

<![CDATA[<greeting>5000</greeting>]]>

When parsed, this line is treated as plain text rather than markup:

<greeting>5000</greeting>

You can learn more by studying the XML file further, or even by running an XML parser on your own files (see References). For now, let's take a look at the DTD for the people.xml file.
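
You can confirm that the parser treats the CDATA section as text rather than as a child element. The following sketch again assumes Python's standard xml.etree.ElementTree module and uses the bankaccount element from Listing 4 on its own.

```python
import xml.etree.ElementTree as ET

# The bankaccount element from Listing 4, parsed on its own.
ACCOUNT = """<bankaccount bankaccount_number="sg-123">
<![CDATA[<greeting>5000</greeting>]]>
</bankaccount>"""

account = ET.fromstring(ACCOUNT)
print(len(account))          # number of child elements: 0
print(account.text.strip())  # <greeting>5000</greeting>
```

Because bankaccount has no child elements, the greeting tags never became markup; they are simply characters in the element's text.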

Listing 5. Complete listing of people.dtd

<!ELEMENT people (person+)>
<!ELEMENT person (name, look*, possession?, other?)>
<!ELEMENT name (firstname, lastname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT look (#PCDATA)>
<!ELEMENT possession (car?, house?, bankaccount?, job?)>
<!ELEMENT car (#PCDATA|model)*>
<!ELEMENT model (#PCDATA)>
<!ELEMENT house (apartment|standalone|townhouse)>
<!ATTLIST house
house_area ID #IMPLIED
country CDATA #FIXED "CANADA"
city CDATA #IMPLIED>
<!ELEMENT apartment EMPTY>
<!ELEMENT standalone EMPTY>
<!ELEMENT townhouse EMPTY>
<!ATTLIST townhouse townhouse_type ID #IMPLIED>
<!ELEMENT bankaccount (#PCDATA)>
<!ATTLIST bankaccount bankaccount_number ID #REQUIRED>
<!ELEMENT job (#PCDATA)>
<!ELEMENT other ANY>
<!ENTITY IBM "Proud to work for IBM">

Some notes about the DTD

The Quick Guide above serves as a reference. By comparing the XML file with its DTD, you should be able to work out the relationship between the DTD and the elements in the XML file easily. However, there are two other points that may interest you.

Listing 4 contains an entity reference:

<job>&IBM;</job>

An entity reference is replaced by a specific character or string defined in the DTD. After parsing, the entity reference reads:

<job>Proud to work for IBM</job>

You should also note that the content type of the <other> tag is ANY. This indicates that <other> may contain any element declared in the DTD. Therefore, the other element may contain car and townhouse elements, as follows:

<other>
<car>she has a car</car>
<townhouse townhouse_type="good" />
</other>
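
One thing a validating parser checks here that is easy to overlook is that values of ID attributes must be unique across the whole document. In Listing 4 as printed, the value "good" appears on two townhouse elements, so a validating parser would report it. The following Python sketch shows the idea of that check; it is not a validator, the table of ID attributes is copied by hand from Listing 5, the fragment is adapted from the second person in Listing 4, and the function name duplicate_ids is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Attributes that Listing 5 declares with type ID, keyed by element name.
ID_ATTRIBUTES = {
    "house": "house_area",
    "townhouse": "townhouse_type",
    "bankaccount": "bankaccount_number",
}

# Fragment adapted from the second person in Listing 4.
FRAGMENT = """<person>
<possession>
<house>
<townhouse townhouse_type="good" />
</house>
<bankaccount bankaccount_number="sg-123">5000</bankaccount>
</possession>
<other>
<car>she has a car</car>
<townhouse townhouse_type="good" />
</other>
</person>"""


def duplicate_ids(root):
    """Return ID values that occur more than once anywhere under root."""
    counts = Counter()
    for element in root.iter():
        attribute = ID_ATTRIBUTES.get(element.tag)
        if attribute and element.get(attribute) is not None:
            counts[element.get(attribute)] += 1
    return sorted(value for value, n in counts.items() if n > 1)


print(duplicate_ids(ET.fromstring(FRAGMENT)))  # ['good']
```

Giving each townhouse a distinct townhouse_type value, or omitting the optional attribute on one of them, would make the list come back empty.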


Conclusion

This concludes our basic introduction to creating well-formed and valid XML files. You may want to continue studying the people.xml and people.dtd files on your own. If you want to run an XML parser on these files, see References for a list of parsers available for download.

References

  • For more information, see the original article on the developerWorks worldwide site.

  • To learn more about XML and DTD syntax, the XML 1.0 W3C Recommendation should be your first stop.
  • Tim Bray is one of the original editors of the XML 1.0 specification. He maintains textuality.com, where you can find his thoughts on XML, DTDs, and other topics. You can also find Bray's XML parsers, Lark and Larval, there.
  • Doug Tidwell's tutorial, Introduction to XML, offers a nearly complete discussion of Extensible Markup Language.
  • You may also want to take a close look at XML for the absolute beginner, published by Mark Johnson on JavaWorld.
  • Download the people.xml and people.dtd files used in this example for further study and analysis.

XML parsers: A short list

  • IBM XML Parser for Java (XML4J), currently at version 3.1.1, is a validating XML parser written in 100% Java. The package (com.ibm.xml.parser) contains classes and methods for parsing, generating, manipulating, and validating XML documents.

  • IBM XML for C++ Parser (XML4C) is based on the Apache Xerces-C XML parser. It is a validating XML parser written in a portable subset of C++.
  • TclXML is an XML parser written entirely in Tcl.
  • Xerces is a Java parser from the Apache Software Foundation, currently at version 1.4.0.
  • Lars Marius Garshol maintains a detailed list of XML parsers and other XML tools as a public service.

Related Links

  • Find the latest information on the developerWorks XML zone page.

  • Like DTDs, style sheets are not required when you create XML files, but they are important if you want to control how documents are displayed in a browser. Alan Knox's developerWorks article, Style sheets can write style sheets, too, describes how to use XSL to transform XML data into sophisticated display markup for your browser.
  • After reading that article, you may want to look at the XSL Editor from IBM alphaWorks.
  • To explore XML Schema and its relationship with DTDs, see the comparison of DTDs and XML Schema in David Mertz's developerWorks column, XML Matters #7. Kevin Williams's Soapbox column on developerWorks argues for using XML Schema for the structural definition of XML documents for data. For a brief description of how to use XML Schema, see the introductory article Basics of using XML Schema to define elements.

About the author

Jane Fung currently works on the IBM VisualAge for Java Technical Support Team, which provides support for enterprise developers who use VisualAge for Java. Jane earned a Bachelor of Applied Science degree in Electrical Engineering from the University of Waterloo, Ontario, Canada, and is a Sun Certified Java 2 Programmer. You can contact Jane at jcyfung@ca.ibm.com.

 
