Understanding XML to achieve universal data access

Source: Internet
Author: User

Learn how Extensible Markup Language (XML) helps achieve universal data access. XML is a plain-text, Unicode-based metalanguage: a language for defining markup languages. It is not tied to any programming language, operating system, or software vendor. XML provides access to a wide range of technologies for processing, structuring, transforming, and querying data.

Introduction
Extensible Markup Language (XML) was originally conceived as a way to define new document formats for the Web. XML is derived from Standard Generalized Markup Language (SGML) and can be considered a metalanguage: a language for defining markup languages. Both SGML and XML are text-based formats that provide a mechanism for describing the structure of a document using tags (words surrounded by '<' and '>'). Web developers may notice that XML resembles HTML, because both are derived from SGML.

As XML has grown in popularity, it has become widely accepted that XML is useful not only for describing new document formats for the Web but also for describing structured data. Structured data includes the kind of information typically found in spreadsheets, program configuration files, and network protocols.

XML is superior to earlier data formats because it can easily represent both tabular data (such as relational data from a database, or spreadsheets) and semi-structured data (such as Web pages or business documents). Formats already in wide use, such as comma-separated values (CSV) files, handle tabular data well but handle semi-structured data poorly, while Rich Text Format (RTF) is suited only to semi-structured text documents. For this reason XML has been widely adopted as the lingua franca of information exchange.

Ubiquitous XML
In addition to being able to represent structured and semi-structured data, XML has many other features that make it a widely used data representation format. XML is extensible, platform-independent, and supports internationalization due to its full adoption of Unicode. XML is a text-based format, so users can read and edit XML documents using standard text-editing tools as needed.

The extensibility of XML shows up in several ways. First, unlike HTML, XML does not have a fixed vocabulary. Instead, users can use XML to define vocabularies specific to a particular application or industry. Second, applications that process or consume XML are more resistant to changes in the XML structure than applications that use other formats, as long as those changes are additive. For example, if an application primarily processes the customer-id attribute of the <customer> element, adding a last-purchase-date attribute to the <customer> element usually does not break the application. This kind of adaptability is rarely found in other data formats and is a significant advantage of using XML.
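
The resistance to additive changes described above can be sketched in a few lines. This is a minimal illustration in Python's standard library rather than any particular product's API; the function name read_customer_id and the sample attribute values are hypothetical.

```python
import xml.etree.ElementTree as ET

def read_customer_id(xml_text: str) -> str:
    """Return the customer-id attribute of a <customer> element."""
    customer = ET.fromstring(xml_text)
    # Only the attribute this consumer cares about is read;
    # any other attributes on the element are simply ignored.
    return customer.get("customer-id")

# Hypothetical sample data, not taken from a real system.
old_format = '<customer customer-id="cust0921"/>'
new_format = '<customer customer-id="cust0921" last-purchase-date="2002-11-05"/>'

print(read_customer_id(old_format))
print(read_customer_id(new_format))
```

The consumer returns the same identifier for both inputs because it looks the attribute up by name instead of relying on position or on a fixed record layout.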

XML is not dependent on any programming language, operating system, or software vendor. In fact, XML can easily be produced or consumed from a wide variety of programming languages. This platform independence lets XML facilitate interoperability between different programming platforms and operating systems.

Many people have recognized the advantages of publishing data as XML, which has led to a proliferation of XML data sources. Information sources such as business documents, databases, and business communications are being, or already have been, converted to use XML as their representation format. Microsoft products such as Microsoft Office®, Microsoft SQL Server™, and the Microsoft .NET Framework enable end users and developers to produce documents, network messages, and other data as XML, or to consume them as XML.

XML 1.0 Syntax
As mentioned earlier, the W3C XML 1.0 Recommendation describes a text-based format that uses an HTML-like syntax to describe structured and semi-structured data.

Comparison of XML and HTML
Both HTML and XML documents consist of elements. Each element has a start tag (for example, <order>), an end tag (for example, </order>), and the information between the two tags, which is called the content of the element. An element can be annotated with attributes that carry metadata about the element and its content.

There are, however, significant differences between HTML and XML. XML is case-sensitive and HTML is not. In other words, in XML the start tags <Table> and <table> are different, whereas in HTML they are the same. Another difference is that XML introduces the concept of well-formedness. The well-formedness rules eliminate some of the ambiguity inherent in processing markup languages such as HTML: for example, all attribute values must be quoted, and every element must either have a matching start tag and end tag or be explicitly marked as an empty element. For a brief description of well-formedness, see section D.2 of the XML FAQ.
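
The rules above can be checked mechanically. The following is a small sketch that uses the parser in Python's standard library as a stand-in for any conforming XML parser; the helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed('<order id="ord123456"><items/></order>'))  # quoted value, all tags closed
print(is_well_formed('<order id=ord123456></order>'))            # attribute value is not quoted
print(is_well_formed('<order><items></order>'))                  # <items> is never closed
print(is_well_formed('<Table></table>'))                         # tag names differ only by case
```

Only the first document is accepted. An HTML browser would typically tolerate all four, which is exactly the ambiguity the well-formedness rules are meant to remove.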

The most significant difference between HTML and XML is that HTML has predefined elements and attributes whose behavior is fully specified, whereas XML does not. Instead, document authors can create their own XML vocabularies tailored to their application or business needs. XML vocabularies already exist for many industries and applications, from financial reporting (XBRL) and financial services (FpML) to Web documents (XHTML) and network protocols (SOAP). Because XML has no predefined elements and attributes that dictate how a document is rendered or displayed, document authors can concentrate on the semantics of their particular problem domain. XML vocabularies thus separate content from presentation, so that information can be reused more widely.

Anatomy of an XML Document
The following example is an XML document that represents a customer order for a video store. Note that the document represents both rigidly structured data (the disc information) and semi-structured data (the special instructions and notes about a particular customer), and does so quite simply.

<?xml version="1.0" encoding="iso-8859-1"?>
<?xml-stylesheet href="Orders.xsl"?>

<order id="ord123456">
  <customer id="cust0921">
    <first-name>Dare</first-name>
    <last-name>Obasanjo</last-name>
    <address>
      <street>One Microsoft Way</street>
      <city>Redmond</city>
      <state>WA</state>
      <zip>98052</zip>
    </address>
  </customer>
  <items>
    <compact-disc>
      <price>16.95</price>
      <artist>Nelly</artist>
      <title>Nellyville</title>
    </compact-disc>
    <compact-disc>
      <price>17.55</price>
      <artist>Baby D</artist>
      <title>Lil Chopper Toy</title>
    </compact-disc>
  </items>

  <!-- Always go the extra mile for the customer -->
  <special-instructions xmlns:html="http://www.w3.org/1999/xhtml">
    Leave package at one of the following locations, listed in the order
    in which they should be attempted:
    <html:ol>...</html:ol>
    <html:b>Note</html:b> ... to pick up the package.
  </special-instructions>
</order>

The document begins with an optional XML declaration that specifies the version of XML in use, followed by the character encoding of the document. Next comes an xml-stylesheet processing instruction, which binds a style sheet to the document. The style sheet contains formatting instructions that let a user application, such as a Web browser, render the XML document in a more presentable form. Processing instructions are typically used to embed application-specific information in an XML document. For example, most applications that process the document above ignore the xml-stylesheet processing instruction, whereas applications that display XML documents, such as Web browsers, use the information in it to locate the style sheet that contains the instructions for displaying the document.
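
To make the role of processing instructions concrete, the sketch below shows how an application that does care about them might read them. It uses Python's xml.dom.minidom purely as an illustration; the function name stylesheet_instructions and the shortened sample document are assumptions made for this example.

```python
from xml.dom import minidom

# Shortened, hypothetical version of the order document.
DOC = '<?xml version="1.0"?><?xml-stylesheet href="Orders.xsl"?><order id="ord123456"/>'

def stylesheet_instructions(xml_text):
    """Return (target, data) pairs for the top-level processing instructions."""
    doc = minidom.parseString(xml_text)
    return [(node.target, node.data)
            for node in doc.childNodes
            if node.nodeType == node.PROCESSING_INSTRUCTION_NODE]

print(stylesheet_instructions(DOC))
```

An application that has no use for style sheets can skip these nodes entirely and still process the order element.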


Unicode + Angle Brackets = Interop
The XML 1.0 syntax is text-based and easy to parse, which makes XML the preferred data interchange format when interoperability across platforms is required. XML parsers are available on all commonly used operating systems, so disparate components on different platforms can readily standardize on XML as the interchange format when they need to share information.

Because XML is based on Unicode, it is also well suited to sharing information over a global network such as the Web.

The XML Information Set (Infoset) and the XML family of technologies
Using XML as a data representation format already offers a significant advantage: platform interoperability and extensibility, thanks to the text-based XML syntax. For application developers, however, this is only one of the benefits of XML. Another major benefit is access to a wide range of technologies for processing, structuring, transforming, and querying data.

XML Information Set
The W3C XML Information Set Recommendation describes an abstract representation of an XML document. The XML Information Set (Infoset) serves primarily as a set of definitions that other XML technologies use to formally describe the parts of an XML document on which they operate. Several W3C XML technologies are described in terms of the XML Infoset, including SOAP 1.2, XML Schema, and XQuery.

An XML Infoset is a hierarchical (tree) representation of an XML document. The Infoset of a document consists of a number of information items, which are abstract representations of the components of the document: the document itself, its elements, attributes, processing instructions, comments, characters, notations, namespaces, unparsed entities, unexpanded entity references, and the document type declaration. The XML Infoset is the W3C's recommended way of defining which information in an XML document is significant. For example, the Infoset does not distinguish between the two forms of an empty element. According to the XML Infoset, therefore, the following two representations

<test></test>
<test/>

are equivalent. Similarly, the kind of quotation mark used around an attribute value is not significant, so according to the XML Infoset the elements

<test attr='value'/>
<test attr="value"/>

are equivalent. Appendix D of the XML Information Set Recommendation lists the aspects of the XML 1.0 syntax that the Infoset does not consider significant.

The XML Information Set Recommendation also introduces the concept of synthetic Infosets. A synthetic Infoset is an Infoset created by some means other than parsing the text of an XML document. Synthetic Infosets make it possible to process non-XML data with XML technologies, provided that the data can be mapped to an XML Infoset. One example is ObjectXPathNavigator, which lets users query objects in the .NET Framework with XPath or transform them with XSLT.
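
The idea of a synthetic Infoset can be sketched without .NET. The example below is only an analogy to ObjectXPathNavigator, written with Python's standard library: the Disc class and the to_tree function are hypothetical, and ElementTree supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Disc:
    """Hypothetical in-memory object; it is never serialized as XML text."""
    artist: str
    title: str
    price: str

def to_tree(discs):
    """Map a list of Disc objects onto an XML tree without parsing any text."""
    items = ET.Element("items")
    for disc in discs:
        cd = ET.SubElement(items, "compact-disc")
        ET.SubElement(cd, "price").text = disc.price
        ET.SubElement(cd, "artist").text = disc.artist
        ET.SubElement(cd, "title").text = disc.title
    return items

tree = to_tree([Disc("Nelly", "Nellyville", "16.95"),
                Disc("Baby D", "Lil Chopper Toy", "17.55")])

# Query the object data with path expressions, as if it had come from a file.
titles = [t.text for t in tree.findall("./compact-disc/title")]
first_artist = tree.findtext("./compact-disc[1]/artist")
print(titles, first_artist)
```

No angle brackets are ever produced or parsed here; the tree exists only as an in-memory structure, yet the same query machinery applies to it.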

XML schema languages
An XML schema language is used to describe the structure and content of XML documents. For example, you can use a schema to specify that a document contains one or more compact-disc elements, and that each compact-disc element contains the child elements price, title, and artist. When documents are exchanged, an XML schema can describe the contract between the producer and the consumer of the XML, because it describes what constitutes a valid XML message between the two. Although there are a number of schema languages for XML, from DTDs to XDR, the most authoritative at present is the W3C XML Schema Definition Language, commonly known as XSD.

XSD is unique among XML schema languages in that it was the first to extend the role of an XML schema beyond describing the contract between two entities that exchange documents. XSD introduces the concept of the Post-Schema-Validation Infoset (PSVI). A conforming XSD processor accepts an XML Infoset as input and, upon validation, converts it to a PSVI. The PSVI is the original input Infoset with new information items added and new properties added to existing information items. The W3C XML Schema Recommendation lists the components of the PSVI.

Type annotations are a very important class of PSVI components. After validation, elements and attributes are strongly typed and have data type information associated with them. Strongly typed XML can be mapped to objects using technologies such as the XmlSerializer in the .NET Framework, mapped to relational tables using SQLXML and the DataSet in the .NET Framework, or processed with XML query languages that rely on strong typing, such as XPath 2.0 and XQuery.

The following example is a schema fragment that describes the items element of the example order document shown earlier.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="items">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="compact-disc" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="compact-disc">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="price" type="xs:decimal"/>
        <xs:element name="artist" type="xs:string"/>
        <xs:element name="title" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>

Tree-model APIs
A tree-model API presents an XML document as a tree of nodes, which is usually loaded into memory all at once. The most widely used tree-model API for XML is the W3C Document Object Model (DOM). The DOM supports reading, manipulating, and modifying an XML document programmatically.

The following example uses the XmlDocument class in the .NET Framework to obtain the artist name and title of the first compact-disc in the items element.

using System;
using System.Xml;

public class Test {

  public static void Main(string[] args) {

    XmlDocument doc = new XmlDocument();
    doc.Load("Test.xml");

    // The document element is items, so its first child is the first compact-disc
    XmlElement firstCD = (XmlElement) doc.DocumentElement.FirstChild;
    XmlElement artist =
      (XmlElement) firstCD.GetElementsByTagName("artist")[0];
    XmlElement title =
      (XmlElement) firstCD.GetElementsByTagName("title")[0];

    Console.WriteLine("Artist={0}, Title={1}", artist.InnerText, title.InnerText);
  }
}

Cursor-based APIs
An XML cursor API acts like a lens that moves across the XML document, focusing on one part of the document at a time. The XPathNavigator class in the .NET Framework is an XML cursor API. Compared with a tree-model API, a cursor API has the advantage that the entire document need not be loaded into memory, which makes it possible for an XML producer to optimize by generating only the parts of the document that are needed.

The following example uses the XPathNavigator class in the .NET Framework to obtain the artist name and title of the first compact-disc in the items element.

using System;
using System.Xml;
using System.Xml.XPath;

public class Test {

  public static void Main(string[] args) {

    XmlDocument doc = new XmlDocument();
    doc.Load("Test.xml");

    XPathNavigator nav = doc.CreateNavigator();

    nav.MoveToFirstChild(); // move from the root node to the document element (items)
    nav.MoveToFirstChild(); // move from the items element to the first compact-disc element

    // move from the compact-disc element to the artist element
    nav.MoveToFirstChild(); // first child is price
    nav.MoveToNext();       // next sibling is artist
    string artist = nav.Value;

    // move from the artist element to the title element
    nav.MoveToNext();
    string title = nav.Value;

    Console.WriteLine("Artist={0}, Title={1}", artist, title);
  }
}

Streaming API
A streaming API lets users process an XML document while keeping only the context of the node currently being processed in memory. Such APIs can handle large XML files without consuming large amounts of memory. There are two main classes of streaming APIs for XML processing: push-based XML parsers and pull-based XML parsers.

A push-based parser, such as SAX, moves through the XML data stream and "pushes" events to registered event handlers (callback methods) as it encounters XML nodes. A pull-based parser, such as the XmlReader class in the .NET Framework, acts as a forward-only cursor over the XML data stream.
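
The contrast between the two models can be sketched side by side. The example below uses Python's standard library (xml.sax for the push model and xml.dom.pulldom for the pull model) rather than the .NET classes named above; the handler class, the function names, and the one-disc sample document are all made up for illustration.

```python
import xml.sax
from xml.dom import pulldom

DOC = ("<items><compact-disc><price>16.95</price>"
       "<artist>Nelly</artist><title>Nellyville</title></compact-disc></items>")

class ArtistHandler(xml.sax.ContentHandler):
    """Push model: the parser calls these methods as it meets each node."""
    def __init__(self):
        super().__init__()
        self.in_artist = False
        self.artists = []

    def startElement(self, name, attrs):
        if name == "artist":
            self.in_artist = True
            self.artists.append("")

    def characters(self, content):
        if self.in_artist:
            self.artists[-1] += content

    def endElement(self, name):
        if name == "artist":
            self.in_artist = False

def artists_push(xml_text):
    handler = ArtistHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.artists

def first_artist_pull(xml_text):
    """Pull model: the caller asks for the next event and can stop early."""
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "artist":
            events.expandNode(node)  # read only this element's subtree
            node.normalize()         # merge adjacent text nodes, if any
            return node.firstChild.data
    return None

print(artists_push(DOC), first_artist_pull(DOC))
```

In the push version the parser drives the program and the handler has to track its own state; in the pull version the caller drives the loop and can return as soon as it has what it needs.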

The following example uses the XmlReader class in the .NET Framework to obtain the artist name and title of the first compact-disc in the items element.

using System;
using System.Xml;

public class Test {

  public static void Main(string[] args) {

    string artist = null, title = null;
    XmlTextReader reader = new XmlTextReader("Test.xml");

    reader.MoveToContent(); // move from the root node to the document element (items)

    /* keep reading until the first <artist> element is reached */
    while (reader.Read()) {

      if ((reader.NodeType == XmlNodeType.Element) && reader.Name.Equals("artist")) {

        artist = reader.ReadElementString();
        title = reader.ReadElementString(); // title immediately follows artist
        break;
      }
    }
    Console.WriteLine("Artist={0}, Title={1}", artist, title);
  }
}

XML Query
In some cases, extracting information from an XML document through an API is too cumbersome, either because the criteria for locating the data are too complex or because the API does not expose the particular content of the XML document that is being queried. XML query languages, such as XPath 1.0 and XQuery, provide rich mechanisms for extracting information from an XML Infoset.

The following example shows how to use XPath to obtain the artist name and title of the first compact-disc in the items element.

using System;
using System.Xml.XPath;

public class Test {

  public static void Main(string[] args) {

    XPathDocument doc = new XPathDocument("Test.xml");
    XPathNavigator nav = doc.CreateNavigator();

    // The union is returned in document order: artist first, then title
    XPathNodeIterator iterator = nav.Select("/items/compact-disc[1]/artist | /items/compact-disc[1]/title");

    iterator.MoveNext();
    Console.WriteLine("Artist={0}", iterator.Current.Value);

    iterator.MoveNext();
    Console.WriteLine("Title={0}", iterator.Current.Value);
  }
}

XML transformations
Users often need to convert XML documents from one vocabulary to another. Sometimes this is done to render a document in a form suitable for printing or for display in a Web browser, and sometimes a document received from an external entity needs to be converted to a more familiar format.

XSLT is the primary language for transforming XML. A transformation expressed in XSLT describes rules for turning a source tree into a result tree. The transformation is achieved by associating patterns with templates. A pattern is an XPath expression, and it plays a role similar to that of a regular expression: it matches parts of the XML source tree in the way a regular expression matches parts of a string. Patterns are matched against elements in the source tree. When a match succeeds, the associated template is instantiated to create part of the result tree. While the result tree is being built, elements from the source tree can be filtered and reordered, and arbitrary structure can be added.

The following XSLT style sheet transforms the items element into an XHTML Web page that contains a table of the disc information.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" xmlns="http://www.w3.org/1999/xhtml">

  <xsl:output method="xml" indent="yes"
    doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
    doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"/>

  <xsl:template match="/">
    <html>
      <head>
        <title>Order Information - ord123456</title>
      </head>
      <body>
        <table border="1">
          <tr><th>Artist</th><th>Title</th><th>Price</th></tr>

          <xsl:for-each select="items/compact-disc">
            <tr>
              <td><xsl:value-of xmlns="" select="artist"/></td>
              <td><xsl:value-of xmlns="" select="title"/></td>
              <td><xsl:value-of xmlns="" select="price"/></td>
            </tr>
          </xsl:for-each>

        </table>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>

The style sheet above generates the following XHTML document:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Order Information - ord123456</title>
  </head>
  <body>
    <table border="1">
      <tr>
        <th>Artist</th>
        <th>Title</th>
        <th>Price</th>
      </tr>
      <tr>
        <td>Nelly</td>
        <td>Nellyville</td>
        <td>16.95</td>
      </tr>
      <tr>
        <td>Baby D</td>
        <td>Lil Chopper Toy</td>
        <td>17.55</td>
      </tr>
    </table>
  </body>
</html>

A Web browser displays it as follows.

Artist    Title              Price
Nelly     Nellyville         16.95
Baby D    Lil Chopper Toy    17.55

Summary
XML is not only a text format for describing documents but also a mechanism for describing structured and semi-structured data, and it comes with a family of technologies for processing such data. Powerful abstractions such as the XML Infoset make it possible to apply XML technologies to non-textual data such as file systems, the Windows® registry, relational databases, and even programming-language objects. XML brings us one step closer to universal data access.