Comparison and analysis of XML Schema and XML DTD technology



Zhou Jingtao (zhoujtnet@yahoo.com.cn), National CAD/CAM Professional Laboratory, Northwestern Polytechnical University
Wang Ming Micro (wangmv@hotmail.com), National CAD/CAM Professional Laboratory, Northwestern Polytechnical University

July 01, 2002
XML DTD is currently the most widely used XML schema language, while XML Schema has become an official W3C Recommendation and shows a tendency to replace XML DTD. This article examines, from a technical point of view, how XML Schema differs from XML DTD and what advantages it offers.

Introduction

XML DTD (Document Type Definition) has been one of the most widely used schema languages in XML technology in recent years. However, XML DTD does not fully meet the requirements of automated XML processing: for example, schemas belonging to different application modules cannot be coordinated well, and it lacks an adequate way to describe constraints on document structure, attributes, data types and so on. In May 2001 the W3C therefore formally recommended XML Schema as the standard schema language for XML. Clearly, the W3C intends XML Schema to become the mainstream schema description language and gradually replace XML DTD. So what advantages does XML Schema have over XML DTD, and will XML DTD actually fade away from the field of schema description?


Schema languages and the XML format

An XML schema language is a language for describing the structure, constraints and similar properties of XML documents; examples include XML Schema, XML DTD, XDR and SOX. The XML format is the syntax that XML documents themselves use. In this article, "XML Schema" denotes the W3C XML Schema standard, and "schema language" denotes XML schema description languages in general.

As schema description languages, both XML Schema and XML DTD are syntactic schemas. Unlike conceptual schemas, syntactic schemas can describe the same thing with different syntax. For example, when a relational schema is described with either XML Schema or XML DTD, the columns of a relation can be represented either as elements or as attributes.

A schema must be written in some format, and here XML Schema and XML DTD differ greatly. XML Schema is itself an application of XML, which means an XML Schema document uses exactly the same format as any XML document, whereas XML DTD, inherited as a subset of SGML DTD, uses a syntax completely different from XML. This difference brings XML Schema many practical benefits:
XML users do not need to learn a new syntax to understand XML Schema, which saves time;
Because an XML Schema document is itself XML, many XML editing tools, API packages and XML parsers can be applied to it directly without modification;
As an application of XML, XML Schema naturally inherits XML's self-describing nature and extensibility, which makes schemas more readable and flexible;
Because its format is identical to XML, an XML Schema document can be processed like XML and also stored in the same way as the XML documents it describes, which makes management easier;
The shared format makes it easy to exchange schemas between application systems that already use XML for data exchange;
XML places strict requirements on validity. An XML DTD is often used as the basis for validating XML documents, but there is no good mechanism for checking the correctness of the DTD itself, which must be handled separately. XML Schema is different: a schema document is subject to the same well-formedness and validity checking mechanisms as any other XML document.
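The first two points can be seen with nothing more than a standard XML parser. The following Python sketch (standard library only; the helper is_well_formed_xml is our own illustration, not part of any schema tool) shows that xml.etree.ElementTree accepts a small XML Schema document unchanged but rejects a DTD declaration, because DTD syntax is not XML. The schema text is adapted from the Hello.xsd example at the end of this article.

```python
import xml.etree.ElementTree as ET

# A small XML Schema document (adapted from the Hello.xsd example).
XSD_TEXT = """<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="greeting" type="xsd:string"/>
</xsd:schema>"""

# A DTD declaration for the same element: SGML-derived syntax, not an XML document.
DTD_TEXT = "<!ELEMENT greeting (#PCDATA)>"


def is_well_formed_xml(text: str) -> bool:
    """Return True if a plain XML parser accepts the text as a well-formed document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed_xml(XSD_TEXT))  # True
print(is_well_formed_xml(DTD_TEXT))  # False
```

The same parser, and therefore the same tooling, serves both the schema and the instance documents it describes; a DTD needs a separate reader.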


Data types


For many developers, perhaps the most salient feature of XML Schema compared with XML DTD is its support for data types. XML DTD provides only 10 built-in (attribute) data types, such as CDATA, enumerated types, NMTOKEN and NMTOKENS, and so small a set usually cannot meet the needs of document interpretation and data exchange. XML Schema, by contrast, has 44 built-in data types (19 primitive and 25 derived), including common types such as long, int, short and double, and it defines each data type as a triple of value space, lexical space and facets, which gives it greater flexibility. The real flexibility of XML Schema data types, however, comes from its support for user-defined types. XML Schema provides two ways to define a data type.
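To illustrate the distinction between value space, lexical space and facets, here is a small Python sketch. It hand-codes the xsd:boolean mapping (four lexical forms, two values) and uses Python's Decimal as a stand-in for xsd:decimal to show that facets such as minInclusive and maxInclusive constrain values rather than spellings. The function names are illustrative, and Decimal accepts some spellings (exponents, for instance) that xsd:decimal does not, so this is not a validator.

```python
from decimal import Decimal

# xsd:boolean: four lexical forms map onto a value space of just two values.
BOOLEAN_LEXICAL_TO_VALUE = {"true": True, "1": True, "false": False, "0": False}


def boolean_value(lexical: str) -> bool:
    """Map an xsd:boolean lexical form to its value (surrounding whitespace is collapsed)."""
    return BOOLEAN_LEXICAL_TO_VALUE[lexical.strip()]


def decimal_within(lexical: str, min_inclusive: Decimal, max_inclusive: Decimal) -> bool:
    """Check minInclusive/maxInclusive facets against the value, not the spelling."""
    value = Decimal(lexical.strip())  # "1", "1.0" and "01.00" all denote the same value
    return min_inclusive <= value <= max_inclusive


print(boolean_value("1") == boolean_value("true"))          # True
print(decimal_within("01.00", Decimal("0"), Decimal("1")))  # True
print(decimal_within("1.5", Decimal("0"), Decimal("1")))    # False
```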

1) Simple type definition (simpleType): a new data type is derived, by restriction, list or union, from a built-in XML Schema data type or from another simple type that is itself derived from the built-in types.

For example:


Source Code 1: definition by restriction
<simpleType name="SKU">
  <restriction base="string">
    <pattern value="\d{3}-[A-Z]{2}"/>
  </restriction>
</simpleType>
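The restriction above amounts to a regular-expression check on the string value. A minimal Python sketch follows, assuming the intended pattern is \d{3}-[A-Z]{2} (three digits, a hyphen, two upper-case letters, as in the SKU example of the W3C XML Schema Primer). XSD patterns are implicitly anchored to the whole value, so re.fullmatch is used; Python's regex dialect differs from the XSD one in other respects, so treat this as an approximation. The function name is hypothetical.

```python
import re

# XSD patterns are implicitly anchored: the whole value must match,
# so re.fullmatch (not re.search) is the closer analogue.
SKU_PATTERN = re.compile(r"\d{3}-[A-Z]{2}")


def is_valid_sku(value: str) -> bool:
    """Return True if value is in the lexical space of the SKU type."""
    # xsd:string preserves whitespace, so no stripping is done here.
    return SKU_PATTERN.fullmatch(value) is not None


for candidate in ["926-AA", "926-aa", "9265-AA", " 926-AA"]:
    print(repr(candidate), is_valid_sku(candidate))
```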


Source Code 2: definition by list
<simpleType name="listOfDouble">
  <list itemType="double"/>
</simpleType>
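A list type's value is a whitespace-separated sequence of items, each of which must belong to the item type. The Python sketch below parses a listOfDouble value on that basis; the regular expression is our approximation of the XML Schema 1.0 lexical space of xsd:double, and the function name is hypothetical.

```python
import re

# Approximate lexical space of xsd:double in XML Schema 1.0:
# a decimal mantissa with an optional exponent, or one of INF, -INF, NaN.
DOUBLE_LEXICAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?|-?INF|NaN")


def parse_list_of_double(text: str) -> list[float]:
    """Parse a listOfDouble value: whitespace-separated xsd:double items."""
    values = []
    for item in text.split():  # list types collapse whitespace, then split on it
        if DOUBLE_LEXICAL.fullmatch(item) is None:
            raise ValueError(f"not an xsd:double lexical form: {item!r}")
        values.append(float(item))  # float() also understands "INF", "-INF" and "NaN"
    return values


print(parse_list_of_double("1.5  2E3\n-INF"))  # [1.5, 2000.0, -inf]
```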


Source Code 3: definition by union
<xsd:attribute name="Size">
  <xsd:simpleType>
    <xsd:union>
      <xsd:simpleType>
        <xsd:restriction base="xsd:positiveInteger">
          <xsd:minInclusive value="1"/>
          <xsd:maxInclusive value="..."/>
        </xsd:restriction>
      </xsd:simpleType>
      <xsd:simpleType>
        <xsd:restriction base="xsd:string">
          <xsd:enumeration value="Month"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:union>
  </xsd:simpleType>
</xsd:attribute>
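A value of a union type is valid if it is valid for any of the member types. The Python sketch below mirrors the Size union: a bounded positive integer or the string Month. The original maxInclusive value is not recoverable from the listing, so the bound is a parameter, and the value 12 used in the example is purely hypothetical, as are the function names.

```python
import re

# Lexical form of xsd:positiveInteger: optional "+" sign, then digits.
POSITIVE_INTEGER_LEXICAL = re.compile(r"\+?[0-9]+")


def matches_integer_member(value: str, max_inclusive: int) -> bool:
    """First member: xsd:positiveInteger restricted to 1..max_inclusive."""
    token = value.strip()  # integer types collapse surrounding whitespace
    if POSITIVE_INTEGER_LEXICAL.fullmatch(token) is None:
        return False
    return 1 <= int(token) <= max_inclusive


def matches_enumeration_member(value: str) -> bool:
    """Second member: xsd:string restricted to the single value "Month"."""
    return value == "Month"


def is_valid_size(value: str, max_inclusive: int) -> bool:
    """A union-typed value is valid if any member type accepts it."""
    return matches_integer_member(value, max_inclusive) or matches_enumeration_member(value)


# max_inclusive=12 is a hypothetical bound; the article's maxInclusive value was lost.
for candidate in ["3", "+07", "0", "13", "Month", "month"]:
    print(candidate, is_valid_size(candidate, max_inclusive=12))
```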


2) Complex type definition (complexType): a powerful mechanism for defining complex data types, including their structure. The following example uses complexType to describe the structure of a table in a relational schema, table T_c_type(Psign, Count), where Psign has a character data type and Count a numeric data type:

Source Code 4: complexType definition
<!-- table structure type definition -->
<complexType name="T_c_type">
  <sequence minOccurs="0" maxOccurs="unbounded">
    <element name="Psign">
      <complexType>
        <simpleContent>
          <!-- adding an attribute to a simple type requires extension, not restriction -->
          <extension base="string">
            <attribute name="Value" type="string"/>
          </extension>
        </simpleContent>
      </complexType>
    </element>
    <element name="Count" minOccurs="0">
      <complexType>
        <complexContent>
          <restriction base="anyType">
            <attribute name="value" type="int" use="optional"/>
          </restriction>
        </complexContent>
      </complexType>
    </element>
  </sequence>
</complexType>
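Read as a grammar over child element names, this content model is (Psign, Count?)*: any number of Psign elements, each optionally followed by a Count. The Python sketch below checks a hypothetical table row against that reading and checks Count's value attribute against xsd:int. The row element name T_C, the sample data and the function names are assumptions for illustration; in practice a schema-validating parser would do this work.

```python
import re
import xml.etree.ElementTree as ET

INT_MIN, INT_MAX = -2**31, 2**31 - 1  # value space of xsd:int
INT_LEXICAL = re.compile(r"[+-]?[0-9]+")


def is_xsd_int(text: str) -> bool:
    """True if text is a lexical xsd:int whose value fits in 32 bits."""
    token = text.strip()
    return INT_LEXICAL.fullmatch(token) is not None and INT_MIN <= int(token) <= INT_MAX


def matches_t_c_type(row: ET.Element) -> bool:
    """Check the children of row against the content model (Psign, Count?)*."""
    children = list(row)
    i = 0
    while i < len(children):
        if children[i].tag != "Psign":
            return False
        i += 1
        if i < len(children) and children[i].tag == "Count":
            count = children[i]
            value = count.get("value")
            if value is not None and not is_xsd_int(value):
                return False
            if len(count) > 0 or (count.text or "").strip():
                return False  # Count is declared with empty content
            i += 1
    return True


good = ET.fromstring('<T_C><Psign Value="x"/><Count value="3"/><Psign Value="y"/></T_C>')
bad = ET.fromstring('<T_C><Count value="3"/><Psign Value="x"/></T_C>')
print(matches_t_c_type(good), matches_t_c_type(bad))  # True False
```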



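Moreover, XML Schema lets an element be marked as having a null (nil) value, which extends the range of data XML Schema can describe; XML DTD has no equivalent. (A DTD can declare an element EMPTY, but it cannot distinguish an explicitly null value from an empty one.) In the W3C Recommendation the declaration attribute is called nillable, and an instance marks a null element with xsi:nil="true"; the older name nullable comes from pre-Recommendation drafts. For example: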

Source Code 5: declaring an element that may be nil in XML Schema
<element name="test" nillable="true"/>
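On the instance side, a processor tells a nil element from an ordinary empty one by the xsi:nil attribute. A minimal Python sketch with ElementTree follows; the root element name, the sample document and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree expands namespaced attribute names to {namespace-URI}local form.
XSI_NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"

INSTANCE = """<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <test xsi:nil="true"/>
  <test></test>
</root>"""


def is_nil(element: ET.Element) -> bool:
    """True if the element carries xsi:nil="true" (or "1", the other lexical form of true)."""
    return element.get(XSI_NIL, "false").strip() in ("true", "1")


root = ET.fromstring(INSTANCE)
print([is_nil(test) for test in root.findall("test")])  # [True, False]
```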



Support for element order

Both XML DTD and XML Schema can describe the order of child elements, but XML DTD provides no way to say that the order does not matter. To let child elements appear in any order, a DTD must enumerate every possible permutation of them (n child elements require n! alternatives), which is cumbersome and sometimes impractical. For example, if the child elements A and B of table may appear in either order, the XML DTD description is:

Source Code 6: XML DTD definition allowing child elements A and B in any order
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT table ((A, B) | (B, A))>
<!ELEMENT A (#PCDATA)>
<!ELEMENT B (#PCDATA)>



XML Schema provides the <all> element to describe this situation:

Source Code 7: XML Schema definition allowing child elements A and B in any order
<xsd:element name="A" type="xsd:string"/>
<xsd:element name="B" type="xsd:string"/>
<xsd:element name="table">
  <xsd:complexType>
    <xsd:all>
      <xsd:element ref="A"/>
      <xsd:element ref="B"/>
    </xsd:all>
  </xsd:complexType>
</xsd:element>



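Describing unordered child elements is clearly much simpler with XML Schema. (In XML Schema 1.0, <all> has restrictions: each child element may occur at most once, and <all> must be the only group at the top level of a content model.)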
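The cost of the DTD approach grows factorially: n child elements that may appear in any order need n! alternatives. The Python sketch below (the function name is hypothetical) generates such a DTD content model by enumerating permutations; for A and B it yields ((A, B) | (B, A)), and the loop shows how quickly the alternatives multiply, whereas <all> simply lists each element once.

```python
from itertools import permutations


def unordered_dtd_model(names: list[str]) -> str:
    """Build a DTD content model allowing each named child exactly once, in any order,
    by listing every permutation as an alternative."""
    alternatives = ["(" + ", ".join(order) + ")" for order in permutations(names)]
    return "(" + " | ".join(alternatives) + ")"


print(unordered_dtd_model(["A", "B"]))  # ((A, B) | (B, A))

for n in range(2, 7):
    model = unordered_dtd_model([f"E{i}" for i in range(n)])
    print(n, "children:", model.count("|") + 1, "alternatives,", len(model), "characters")
```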


Namespaces

Namespaces were introduced into XML so that common definitions (usually definitions of elements or data types) from one XML vocabulary can be used in other XML documents without naming conflicts. XML DTD does not support namespaces, which further limits its scope of application, whereas XML Schema supports them well.

XML Schema also provides two mechanisms for referencing other schemas: include, which brings in components from a schema with the same target namespace, and import, which brings in components from a different namespace. In the following example, a schema document references the definitions in two other schemas, using import to combine different namespaces. The example also defines a keyref constraint between elements in different namespaces.

Source Code 8: use of namespaces in XML Schema
<schema targetNamespace="http://202.117.84.144"
        xmlns:xs="http://202.117.84.144"
        xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:a="http://202.117.84.228/middlewareSqlServer2000sqlservertest20211784228"
        xmlns:b="http://202.117.84.228/middlewareOracle805ioracletest20211784144"
        elementFormDefault="qualified">
  <import namespace="http://202.117.84.228/middlewareSqlServer2000sqlservertest20211784228"
          schemaLocation="F:/xml schema/middlewaresqlserver2000sqlservertest20211784228.xsd"/>
  <import namespace="http://202.117.84.228/middlewareOracle805ioracletest20211784144"
          schemaLocation="F:/xml schema/middlewareorcal805ioracletest20211784144.xsd"/>
  <annotation>
    <documentation xml:lang="en">
      Schema for middleware
      Copyright 2001 Zhou Jingtao. All rights reserved.
    </documentation>
  </annotation>
  <element name="Combinedatabase">
    <complexType>
      <sequence>
        <element name="Combinglobeschema">
          <complexType>
            <sequence>
              <element ref="a:h-database"/>
              <element ref="b:h-database"/>
            </sequence>
          </complexType>
          <keyref name="sqlservertest_t_c_psign" refer="b:GZ_JGXX_ID_PK">
            <selector xpath="a:h-database/a:sqlservertest/a:t_c/a:count"/>
            <field xpath="@value"/>
          </keyref>
        </element>
      </sequence>
    </complexType>
  </element>
</schema>
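Prefixed references such as ref="a:h-database" are qualified names that must be resolved against the namespace declarations in scope. The Python sketch below does this with ElementTree's start-ns events on a cut-down schema; the urn:example:... namespace URIs, the element name pair and the function name are hypothetical stand-ins for the middleware namespaces above, and the sketch assumes each prefix is bound only once per document.

```python
import io
import xml.etree.ElementTree as ET

# A cut-down schema in the style of Source Code 8; the namespace URIs are hypothetical.
SCHEMA = """<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:a="urn:example:sqlserver" xmlns:b="urn:example:oracle">
  <element name="pair">
    <complexType>
      <sequence>
        <element ref="a:h-database"/>
        <element ref="b:h-database"/>
      </sequence>
    </complexType>
  </element>
</schema>"""


def expanded_refs(schema_text: str) -> list[str]:
    """Expand each ref="prefix:local" QName to {namespace-URI}local."""
    bindings: dict[str, str] = {}  # prefix -> namespace URI ("" is the default namespace)
    refs = []
    for event, item in ET.iterparse(io.StringIO(schema_text), events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = item
            bindings[prefix] = uri
        elif "ref" in item.attrib:
            prefix, _, local = item.attrib["ref"].rpartition(":")
            uri = bindings.get(prefix)
            refs.append(f"{{{uri}}}{local}" if uri else local)
    return refs


print(expanded_refs(SCHEMA))
# ['{urn:example:sqlserver}h-database', '{urn:example:oracle}h-database']
```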



Support for XML APIs

When working with XML technology, DOM and SAX are probably the XML APIs developers use most. DOM and SAX operate only on XML instance documents: although a parser can use an XML DTD to validate an XML document, the core DOM and SAX interfaces do not let us parse the content of the DTD itself, so we cannot use them to obtain the element declarations, attribute declarations and constraints it contains (SAX2 offers only an optional DeclHandler extension that reports declarations). However, in data exchange based on XML plus DTD, some applications need the content and structure of the DTD itself in order to process the data in XML documents; a typical case is mapping an XML DTD to a relational schema when XML documents are stored in a relational database. To interpret XML DTDs, researchers have had to develop new interfaces or special-purpose tools, which is very inconvenient.

Because an XML Schema document is itself an XML document, it can be parsed easily with XML APIs such as DOM, SAX or JDOM. This gives XML documents and the schemas that describe them a consistent representation and makes data transfer and exchange more convenient.
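For example, the element declarations of Source Code 7 can be read with the standard DOM interface. The Python sketch below uses xml.dom.minidom; the schema text wraps that listing in an xsd:schema root (with element names normalized to A, B and table), and the function name is our own.

```python
from xml.dom import minidom

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Source Code 7, wrapped in an xsd:schema root so it is a complete document.
SCHEMA_TEXT = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="A" type="xsd:string"/>
  <xsd:element name="B" type="xsd:string"/>
  <xsd:element name="table">
    <xsd:complexType>
      <xsd:all>
        <xsd:element ref="A"/>
        <xsd:element ref="B"/>
      </xsd:all>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""


def element_declarations(schema_text: str) -> list[tuple[str, str]]:
    """List (name, type) for every named xsd:element; "" means no type attribute."""
    document = minidom.parseString(schema_text)
    declarations = []
    for node in document.getElementsByTagNameNS(XSD_NS, "element"):
        if node.hasAttribute("name"):  # skip ref="..." particles
            declarations.append((node.getAttribute("name"), node.getAttribute("type")))
    return declarations


print(element_declarations(SCHEMA_TEXT))
# [('A', 'xsd:string'), ('B', 'xsd:string'), ('table', '')]
```

Nothing comparable is possible with the DOM for the DTD in Source Code 6: its declarations are not elements at all.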


More explicit control over attribute occurrence, default values and enumerations

XML DTD specifies whether an attribute must appear with the keywords #IMPLIED, #REQUIRED and #FIXED, and it supports default values for attributes. XML Schema uses more explicit markup that is clearer and easier to understand: instead of keywords, the use attribute states whether an attribute is optional (the counterpart of #IMPLIED), required, or prohibited, the last of which has no DTD counterpart. Default and fixed values are written directly with the default and fixed attributes, which is more intuitive.

Source Code 9: attribute occurrence constraints in XML DTD and XML Schema
<!ATTLIST testdtd testAr1 CDATA #IMPLIED>
<!ATTLIST testdtd testAr2 CDATA #REQUIRED>
<!ATTLIST testdtd testAr3 CDATA #FIXED "3">
<!ATTLIST testdtd testAr4 CDATA "3">
<xsd:attribute name="testAr1" type="xsd:string" use="optional" default="3"/>
<xsd:attribute name="testAr2" type="xsd:string" use="prohibited"/>
<xsd:attribute name="testAr3" type="xsd:string" use="required" fixed="3"/>



For the improvements XML Schema makes to enumerations, see the article "XML Problem #7: A comparison of document type definitions (DTDs)", item 9 in the Resources.


Comments

Both XML DTD and XML Schema support comments of the form <!-- comment -->, but XML Schema also provides a more flexible and useful annotation mechanism, the documentation and appinfo elements, which carry annotations for human readers and for applications respectively.

Source Code: annotations in XML Schema
<xsd:annotation>
  <xsd:documentation>Comments for human readers</xsd:documentation>
  <xsd:appinfo><![CDATA[
    /* This is C language code, intended for an application. */
    #include <stdio.h>
    int main(void)
    {
        int i, j;
        i = 1;
        j = i + 1;
        return 0;
    }
  ]]></xsd:appinfo>
</xsd:annotation>
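Because documentation and appinfo are ordinary elements, an application can pick out the part meant for it. A Python sketch with ElementTree follows (the sample schema and function name are illustrative); note that markup-like text such as #include <stdio.h> has to be wrapped in a CDATA section to keep the schema well-formed.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

ANNOTATED = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:annotation>
    <xsd:documentation>Comments for human readers</xsd:documentation>
    <xsd:appinfo><![CDATA[#include <stdio.h>]]></xsd:appinfo>
  </xsd:annotation>
</xsd:schema>"""


def split_annotations(schema_text: str) -> tuple[list[str], list[str]]:
    """Collect documentation text (for people) and appinfo text (for programs) separately."""
    root = ET.fromstring(schema_text)
    docs = [(node.text or "").strip() for node in root.iter(XSD + "documentation")]
    apps = [(node.text or "").strip() for node in root.iter(XSD + "appinfo")]
    return docs, apps


docs, apps = split_annotations(ANNOTATED)
print(docs)  # ['Comments for human readers']
print(apps)  # ['#include <stdio.h>']
```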



Support for databases

How to represent relational data as XML, and how to store, query and update XML data in a relational database, has become a research hotspot. Deutsch, Florescu [5], Kossmann [5], Shanmugasundaram [6,7], D. W. Lee [8] and others have studied the transformation between XML and relational data in depth. However, because XML Schema became a formal Recommendation only recently and XML DTD syntax is relatively simple, most research and applications are still based on XML DTD. Yet XML DTD has obvious shortcomings in describing relational data: its data types cannot be mapped one-to-one onto relational data types, and it cannot express most data rules. XML Schema provides more built-in data types and supports user-defined extensions of them, which largely meets the needs of relational schemas for data description; this may be a major reason why XML Schema is better suited than XML DTD to describing relational data.
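As an illustration of that extra precision, the Python sketch below generates an xsd:simpleType from a relational column definition, turning a VARCHAR length or a DECIMAL precision and scale into XML Schema facets (maxLength, totalDigits, fractionDigits), which an XML DTD cannot express. The SQL-to-XSD type table is an assumption covering only a few common types, and the type and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XSD_NS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xsd", XSD_NS)  # serialize with the xsd: prefix instead of ns0:

# Hypothetical correspondence between a few SQL column types and XML Schema built-ins.
SQL_TO_XSD = {"VARCHAR": "string", "INTEGER": "int", "DECIMAL": "decimal", "DATE": "date"}


def column_simple_type(name: str, sql_type: str, length: Optional[int] = None,
                       precision: Optional[int] = None, scale: Optional[int] = None) -> str:
    """Return an xsd:simpleType that restricts a built-in type to a column's declared size."""
    simple_type = ET.Element(f"{{{XSD_NS}}}simpleType", name=name)
    restriction = ET.SubElement(simple_type, f"{{{XSD_NS}}}restriction",
                                base="xsd:" + SQL_TO_XSD[sql_type])
    facets = {"maxLength": length, "totalDigits": precision, "fractionDigits": scale}
    for facet, value in facets.items():
        if value is not None:
            ET.SubElement(restriction, f"{{{XSD_NS}}}{facet}", value=str(value))
    return ET.tostring(simple_type, encoding="unicode")


print(column_simple_type("PsignType", "VARCHAR", length=20))
print(column_simple_type("AmountType", "DECIMAL", precision=10, scale=2))
```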


Conclusion

The comparison shows that XML Schema is more expressive than XML DTD and better meets the needs of applications in different domains. Does this mean that XML DTD will quickly be replaced by XML Schema and eventually disappear? In the authors' view, although XML Schema tends to replace XML DTD in most application fields, XML DTD still has its own scope of application and cannot be completely replaced by XML Schema:
XML DTD is published as part of the XML 1.0 Recommendation; the W3C does not seem ready to remove it from the standard, and support for XML DTD will continue.
Most existing XML applications support XML DTD, and DTD tools are relatively mature. In general, these applications and tools will not simply replace XML DTD with XML Schema when they are upgraded; more likely they will offer additional choices by supporting both. Of course, for applications with high demands on data exchange or descriptive power, where XML DTD cannot meet the functional requirements, replacing XML DTD with XML Schema has become an inevitable trend.
At present, most research on schema-related algorithms is based on XML DTD. As that research continues, its results will not be discarded, but research on XML Schema has become a new hotspot.
In relatively simple processing environments, XML DTD still holds its place.
As with other technologies, the role of XML DTD will diminish as new standards emerge; but just as hierarchical databases are still in use today, it seems premature to conclude that XML Schema will completely replace XML DTD.

Therefore, as a strong standard, XML Schema is set to become the mainstream schema language, but XML DTD, as one of the simplest schema languages, will continue to play its part for some time.


Note:
DTD and Schema
Some people ask: since DTDs and schemas both constrain XML documents, why is Schema needed in addition to DTD? The answer is that DTDs constrain too weakly: their definitions are not fine-grained enough to place precise semantic restrictions on XML instance documents. A careful reader will notice that for text content a DTD has essentially one data type, character data: PCDATA in elements and CDATA in attributes (attribute types such as ID or NMTOKEN constrain tokens, but none of them expresses a date or a number). Whether you write a date, a number or arbitrary characters there, the DTD accepts it. XML Schema was designed to address these shortcomings: it uses XML itself as its means of description and offers strong descriptive power, extensibility, and ease of processing and maintenance. Let's look at a simple example:
Hello.xml
-------------------
<?xml version= "1.0"?>
<greeting>hello world!! </greeting>


Description
A root element, greeting; it has no attributes and no child elements, and its content is a string.

Hello.xsd
----------
<?xml version= "1.0"?>
<xsd:schema xmlns:xsd= "Http://www.w3.org/2001/XMLSchema" >
<xsd:element name= "Greeting" type= "xsd:string"/>
</xsd:schema>
Description
The file extension of an XML Schema document is .xsd, and the document fully conforms to XML syntax. Its root element is schema, in the namespace declared by xmlns:xsd="http://www.w3.org/2001/XMLSchema", and the <element> element declares the elements of an instance document, such as greeting. xsd:string is a built-in data type; XML Schema has many others, such as int, double, dateTime, boolean, long, integer and float. In short, it has the data types found in Java and other languages, written with the xsd: prefix (the prefix is simply the one bound to the XML Schema namespace in this document).
