XML DOM Beginner's Guide (including validating DTDs)

Author: Dong Shentao

Summary: This article explains how to access and maintain XML documents using the XML DOM as implemented by the Microsoft XML parser (Msxml.dll).

Contents:

Introduction

What is the DOM?

How to use the DOM

How to load a document

Handling Errors

How to get information from an XML document

How to traverse an XML document

What to do next

Introduction:

As a Visual Basic developer, you may be handed Extensible Markup Language (XML) documents and need to work with them and integrate them into your application. You could parse such a document programmatically as ordinary text, but that is inefficient and ignores the main strength of XML: it represents data in a structured way.

The best way to get information out of an XML file is to use an XML parser. Simply put, a parser is software that makes the data in an XML file easy to use. As a Visual Basic developer, you will want a parser that supports the Document Object Model (DOM). The DOM defines a standard set of interfaces for accessing XML and HTML documents, which a conforming parser implements. A parser that supports the DOM turns the data in an XML document into a set of objects that you can program against. In this article, you will learn how to use the DOM as implemented by the Microsoft parser (Msxml.dll) to access and maintain XML documents.

Before going further, let's look at a piece of XML to see how the parser makes life easier. The following document, cds.xml, represents a collection of compact discs; each item contains information such as the artist, the title, and the tracks.

<?xml version="1.0"?>
<!DOCTYPE compactdiscs SYSTEM "cds.dtd">
<compactdiscs>
  <compactdisc>
    <artist type="Individual">Frank Sinatra</artist>
    <title numberoftracks="4">In The Wee Small Hours</title>
    <tracks>
      <track>In The Wee Small Hours</track>
      <track>Mood Indigo</track>
      <track>Glad To Be Unhappy</track>
      <track>I Get Along Without You Very Well</track>
    </tracks>
    <price>$12.99</price>
  </compactdisc>
  <compactdisc>
    <artist type="Band">The Offspring</artist>
    <title numberoftracks="5">Americana</title>
    <tracks>
      <track>Welcome</track>
      <track>Have You Ever</track>
      <track>Staring At The Sun</track>
      <track>Pretty Fly (For A White Guy)</track>
    </tracks>
    <price>$12.99</price>
  </compactdisc>
</compactdiscs>

The second line of the document above refers to an external DTD (document type definition), which describes the structure of a particular type of XML document and what it may contain. The XML parser uses the DTD to verify that the XML document is valid. A DTD is only one way to have the parser validate a document; another increasingly popular approach is XML Schemas. Unlike a DTD, which uses its own "interesting" syntax, a schema is itself written in XML.

The following document is cds.dtd, the DTD used by cds.xml.

<!ELEMENT compactdiscs (compactdisc*)>
<!ELEMENT compactdisc (artist, title, tracks, price)>
<!ENTITY % Type "Individual | Band">
<!ELEMENT artist (#PCDATA)>
<!ATTLIST artist type (%Type;) #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ATTLIST title numberoftracks CDATA #REQUIRED>
<!ELEMENT tracks (track*)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT track (#PCDATA)>

This article does not cover DTDs and XML Schemas in depth. The XML Schema reference, which is based on XML-Data, has been submitted to the W3C.

What is the DOM?

The XML DOM exposes the content of an XML document as an object model. The W3C DOM Level 1 specification defines the properties, methods, events, and so on that a DOM implementation provides. Microsoft's DOM implementation fully supports the standard and adds many features that make it easier for programs to work with XML files.

How to use the DOM:

To use the DOM, you need to create an instance of the XML parser. Microsoft exposes the parser through a set of standard COM interfaces in Msxml.dll, which contains the type library and the implementation code you use to work with XML files. If you are using a scripting client, such as VBScript in Internet Explorer, use the CreateObject method to get an instance of the parser.

Set objParser = CreateObject("Microsoft.XMLDOM")

If you are using ASP (Active Server Pages), use the Server.CreateObject method.

Set objParser = Server.CreateObject("Microsoft.XMLDOM")

If you are using Visual Basic, you can set a reference to the MSXML type library to access the DOM. To use MSXML in Visual Basic 6.0, open the Project menu, choose References, and select Microsoft XML, version 2.0 from the list of COM objects. If this item is not listed, you need to obtain Msxml.dll first. You can then create an instance of the parser.

Dim xDoc As MSXML.DOMDocument
Set xDoc = New MSXML.DOMDocument

You can get Msxml.dll in two ways: install Internet Explorer 5.0, which includes the MSXML parser, or download the parser from the relevant website.

Once you have a reference to the type library, you can load and parse documents and, in short, work with XML documents.

You may be unsure where to start. If you open the MSXML library in the Visual Basic 6.0 Object Browser, you will find a rich object model. This article shows how to access XML documents using the DOMDocument class and the IXMLDOMNode interface.

 

How to load a document:

To load an XML document, you must first create an instance of DOMDocument.

Dim xDoc As MSXML.DOMDocument
Set xDoc = New MSXML.DOMDocument

Once you have a valid reference, use the Load method to load a document into it. The parser can load documents from the local hard drive, or from the network through a UNC path or a URL.

To load from the hard drive:

If xDoc.Load("c:/my documents/cds.xml") Then
    ' The document loaded successfully.
    ' Now do something interesting.
Else
    ' The document failed to load.
End If

When you are done with the document, release your reference to it. MSXML does not provide an explicit Close method, so the best you can do is set the reference to Nothing.

Set xDoc = Nothing

By default, the parser loads a document asynchronously; you can change this behavior through the document's async property. Before you manipulate a document that is loading asynchronously, check its readyState property, which returns one of five values.

Property value   State
0                Uninitialized: loading has not started
1                Loading: the Load method is executing
2                Loaded: the Load method has completed
3                Interactive: part of the data has been parsed and the DOM can be read, but it is read-only
4                Completed: the data is fully parsed and can be read and written

The MSXML parser implements a number of useful events that you can use to track the progress of the load when you are loading a large document. These events are also useful when loading documents asynchronously from the Internet.

To open a document on the Internet, you need to provide an absolute URL, including the http:// prefix. Here is an example.

xDoc.async = False
If xDoc.Load("http://www.develop.com/hp/brianr/cds.xml") Then
    ' The document loaded successfully.
    ' Now do something interesting.
Else
    ' The document failed to load.
End If

Setting the async property to False means the parser does not return control to your code until the document has finished loading. If you leave async set to True, you must either check the readyState property before accessing the document or use the DOMDocument events to notify your code when the document is ready.
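As a rough sketch of the event-based approach, the following code loads the document asynchronously and reacts when readyState reaches 4. It assumes the code lives in a form or class module (Visual Basic does not allow WithEvents in a standard module) and that the project references the MSXML type library; the procedure name BeginLoad is made up for this illustration.

```vb
Option Explicit

' WithEvents lets this module receive the DOMDocument events.
Private WithEvents xDoc As MSXML.DOMDocument

' Hypothetical entry point, named for this example only.
Public Sub BeginLoad()
    Set xDoc = New MSXML.DOMDocument
    xDoc.async = True   ' the default, shown here for clarity
    ' Load returns right away; the document is not ready yet.
    xDoc.Load "http://www.develop.com/hp/brianr/cds.xml"
End Sub

Private Sub xDoc_onreadystatechange()
    ' 4 means the document is completely loaded and parsed.
    If xDoc.readyState = 4 Then
        If xDoc.parseError.errorCode = 0 Then
            Debug.Print "Loaded: " & xDoc.documentElement.nodeName
        Else
            Debug.Print "Load failed: " & xDoc.parseError.reason
        End If
    End If
End Sub
```

Because Load returns before parsing has finished, the result of an asynchronous load is best checked inside the event handler through parseError rather than through the return value of Load.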

Handling Errors:

A document can fail to load for a variety of reasons. The most common is that the document name passed to the Load method is incorrect; another common reason is that the XML document is not valid.

By default, the parser validates your document against its DTD or schema. You can tell the parser not to validate by setting the DOMDocument's validateOnParse property to False before calling the Load method.

Dim xDoc As MSXML.DOMDocument
Set xDoc = New MSXML.DOMDocument
xDoc.validateOnParse = False
If xDoc.Load("c:/my documents/cds.xml") Then
    ' The document loaded successfully.
    ' Now do something interesting.
Else
    ' The document failed to load.
End If

Turning off validation is generally not a good idea, because it can cause many problems. At the very least, your application may present incorrectly formatted data to your users.
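To see what validation buys you, here is a minimal sketch that parses a small in-memory document with an internal DTD, first with validation on and then with it off. It uses the loadXML method (which parses a string instead of a file) purely to keep the example self-contained; the sample markup and the procedure name are made up for this illustration, and the exact error text depends on the parser version.

```vb
Public Sub ValidateInlineSample()
    Dim xDoc As MSXML.DOMDocument
    Dim strXml As String

    ' Well formed, but not valid: the DTD allows only <compactdisc>
    ' children inside <compactdiscs>, and <price> is not declared.
    strXml = "<?xml version=""1.0""?>" & _
             "<!DOCTYPE compactdiscs [" & _
             "<!ELEMENT compactdiscs (compactdisc*)>" & _
             "<!ELEMENT compactdisc (#PCDATA)>" & _
             "]>" & _
             "<compactdiscs><price>$12.99</price></compactdiscs>"

    Set xDoc = New MSXML.DOMDocument
    xDoc.async = False
    xDoc.validateOnParse = True   ' the default, shown here for clarity

    If xDoc.loadXML(strXml) Then
        Debug.Print "Document is valid."
    Else
        Debug.Print "Validation failed: " & xDoc.parseError.reason
    End If

    ' With validation off, the text only has to be well formed.
    xDoc.validateOnParse = False
    Debug.Print "Loads without validation: " & xDoc.loadXML(strXml)

    Set xDoc = Nothing
End Sub
```

With validation on, the parser should reject the document; with validation off, the same text should load, which is exactly how malformed data can slip through to your users.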

You can get details about a failure from the parser through the ParseError object. Declare a variable of the IXMLDOMParseError interface and set it to the document's parseError property. The IXMLDOMParseError interface exposes seven properties that help you find the cause of the error.

The following example displays a message box listing all the information available from the ParseError object.

Dim xDoc As MSXML.DOMDocument
Set xDoc = New MSXML.DOMDocument

If xDoc.Load("c:/my documents/cds.xml") Then
    ' The document loaded successfully.
    ' Now do something interesting.
Else
    ' The document failed to load.
    Dim strErrText As String
    Dim xPE As MSXML.IXMLDOMParseError
    ' Get the ParseError object.
    Set xPE = xDoc.parseError
    With xPE
        strErrText = "Your XML document failed to load " & _
            "due to the following error." & vbCrLf & _
            "Error #: " & .errorCode & ": " & .reason & vbCrLf & _
            "Line #: " & .Line & vbCrLf & _
            "Line Position: " & .linepos & vbCrLf & _
            "Position In File: " & .filepos & vbCrLf & _
            "Source Text: " & .srcText & vbCrLf & _
            "Document URL: " & .url
    End With

    MsgBox strErrText, vbExclamation
End If

Set xPE = Nothing

You can use the information from the ParseError object to report the error to your users, write it to a log, or attempt to fix the problem yourself.
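For the logging case, something along these lines may be enough. It is only a sketch: the helper names DescribeParseError and LogParseError and the idea of a plain-text log file are assumptions made for this example, not part of MSXML.

```vb
' Builds a one-line description of a parse error (hypothetical helper).
Public Function DescribeParseError( _
        ByVal xPE As MSXML.IXMLDOMParseError) As String
    If xPE.errorCode = 0 Then
        DescribeParseError = "No error"
    Else
        ' Flatten any line breaks in reason so the entry stays on one line.
        DescribeParseError = "Error " & xPE.errorCode & _
            " at line " & xPE.Line & ", position " & xPE.linepos & _
            " in " & xPE.url & ": " & _
            Trim$(Replace(xPE.reason, vbCrLf, " "))
    End If
End Function

' Appends the description to a text file (hypothetical helper).
Public Sub LogParseError(ByVal xDoc As MSXML.DOMDocument, _
                         ByVal strLogFile As String)
    Dim intFile As Integer
    intFile = FreeFile
    Open strLogFile For Append As #intFile
    Print #intFile, Now & "  " & DescribeParseError(xDoc.parseError)
    Close #intFile
End Sub
```

You would call LogParseError from the Else branch of the load check, passing the document and a log file path of your choice.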

 

How to get information from an XML document:

Once the document has loaded successfully, the next step is to get information out of it. When you work with a document, you will most often use the IXMLDOMNode interface, which lets you read and write individual nodes. Before using it, you need to understand the 13 node types that MSXML supports; the following are the most common ones.

 

DOM node type                  Example
NODE_ELEMENT                   <artist type="Band">The Offspring</artist>
NODE_ATTRIBUTE                 type="Band"
NODE_TEXT                      The Offspring
NODE_PROCESSING_INSTRUCTION    <?xml version="1.0"?>
NODE_DOCUMENT_TYPE             <!DOCTYPE compactdiscs SYSTEM "cds.dtd">

 

You can determine the type of a node through two properties of the IXMLDOMNode interface. The nodeType property returns a value from the DOMNodeType enumeration (some of its members are listed in the table above). Alternatively, the nodeTypeString property returns a string that names the node type.
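A quick way to get a feel for the two properties is to print them for the top-level nodes of a document. The sketch below assumes cds.xml and its DTD sit at the path used earlier; the procedure name is made up for this example.

```vb
Public Sub ListTopLevelNodes()
    Dim xDoc As MSXML.DOMDocument
    Dim xNode As MSXML.IXMLDOMNode

    Set xDoc = New MSXML.DOMDocument
    xDoc.async = False   ' wait until the document is fully parsed

    If xDoc.Load("c:/my documents/cds.xml") Then
        ' Print the numeric type, its string form, and the node name.
        For Each xNode In xDoc.childNodes
            Debug.Print xNode.nodeType & "  " & _
                xNode.nodeTypeString & "  " & xNode.nodeName
        Next xNode
    Else
        Debug.Print "Load failed: " & xDoc.parseError.reason
    End If

    Set xDoc = Nothing
End Sub
```

For cds.xml you would expect one line each for the XML declaration, the document type declaration, and the root element.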

Once you have a reference to the document, you can walk its hierarchy of nodes. The document's childNodes property gives you the top-level list of nodes, from which you can reach every other node. The childNodes property exposes the IXMLDOMNodeList interface, which supports the Visual Basic For Each construct, so you can enumerate all the nodes it contains. IXMLDOMNodeList also implements a length property, which returns the number of nodes in the list.

The Document object is not the only one with a childNodes property; every node has one. Combined with the hasChildNodes property of IXMLDOMNode, this makes it very convenient to traverse a document and access its elements, attributes, and values.
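Besides For Each, a node list can be walked by index using the list's length property and its item method. The following sketch assumes an already loaded document is passed in; it starts from documentElement, the standard DOM property that returns the root element, and the procedure name is made up for this example.

```vb
Public Sub ListRootChildren(ByVal xDoc As MSXML.DOMDocument)
    Dim xList As MSXML.IXMLDOMNodeList
    Dim i As Long

    ' documentElement is the root element, <compactdiscs> in cds.xml.
    Set xList = xDoc.documentElement.childNodes
    Debug.Print "Child nodes of root: " & xList.length

    ' item is zero-based, so the last valid index is length - 1.
    For i = 0 To xList.length - 1
        Debug.Print i & ": " & xList.Item(i).nodeName
    Next i
End Sub
```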

It is worth noting that an element and its value are in a parent-child relationship. For example, in the cds.xml document the <title> element holds the title of the disc; to get the value of <title>, you read the value of its child node of type NODE_TEXT. When you find a node holding data that interests you, you can read its properties, or reach its parent node through the parentNode property.
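The parent-child relationship between an element and its text can be sketched as follows. The example assumes a loaded cds.xml document is passed in and uses the standard DOM method getElementsByTagName to find the title elements; the procedure name is made up for this illustration.

```vb
Public Sub ShowTitles(ByVal xDoc As MSXML.DOMDocument)
    Dim xTitle As MSXML.IXMLDOMNode
    Dim xText As MSXML.IXMLDOMNode

    For Each xTitle In xDoc.getElementsByTagName("title")
        ' The title string lives in a NODE_TEXT child of the element.
        Set xText = xTitle.firstChild
        If Not xText Is Nothing Then
            ' parentNode leads from the text node back to <title>.
            ' numberoftracks is #REQUIRED by cds.dtd, so it is
            ' assumed to be present here.
            Debug.Print xText.parentNode.nodeName & ": " & _
                xText.nodeValue & " (" & _
                xTitle.Attributes.getNamedItem("numberoftracks").nodeValue & _
                " tracks)"
        End If
    Next xTitle
End Sub
```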

How to traverse an XML document:

You traverse the nodes of a document starting from the document object. Because XML is hierarchical by nature, it is easy to write recursive code that visits the entire document.

The LoadDocument procedure opens the XML document and then calls a second procedure, DisplayNode, which displays the structure of the document. LoadDocument passes the childNodes property of the loaded document as the first argument and an integer that indicates the indentation level at which to start as the second. The code uses the second parameter to format the text shown in the Visual Basic Immediate window.

DisplayNode iterates through the nodes looking for those of type NODE_TEXT. When it finds a NODE_TEXT node, it reads the text through the nodeValue property. The parentNode property of that node points to a node of the element type, which implements a nodeName property. DisplayNode prints the nodeName and nodeValue together.

If a node has child nodes, which DisplayNode checks through the hasChildNodes property, DisplayNode calls itself, and so on until the whole document has been traversed.

The following DisplayNode procedure writes its output to the Visual Basic Immediate window using Debug.Print.

Public Sub LoadDocument()
    Dim xDoc As MSXML.DOMDocument
    Set xDoc = New MSXML.DOMDocument
    xDoc.validateOnParse = False
    If xDoc.Load("c:/my documents/sample.xml") Then
        ' The document loaded successfully.
        ' Now do something interesting.
        DisplayNode xDoc.childNodes, 0
    Else
        ' The document failed to load.
        ' See the previous listing for error information.
    End If
End Sub

Public Sub DisplayNode(ByRef Nodes As MSXML.IXMLDOMNodeList, _
                       ByVal Indent As Integer)
    Dim xNode As MSXML.IXMLDOMNode
    ' Indent each level of the hierarchy two more spaces.
    Indent = Indent + 2

    For Each xNode In Nodes
        ' Text nodes carry the data; print it with the parent element's name.
        If xNode.nodeType = NODE_TEXT Then
            Debug.Print Space$(Indent) & xNode.parentNode.nodeName & _
                ":" & xNode.nodeValue
        End If

        ' Recurse into any children of the current node.
        If xNode.hasChildNodes Then
            DisplayNode xNode.childNodes, Indent
        End If
    Next xNode
End Sub

DisplayNode uses the hasChildNodes property to decide whether to call itself again. You could instead check the length property of the node's childNodes list: if it is greater than 0, the node has child nodes.
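DisplayNode prints only text values. If you also want to see attributes, a small companion procedure along the following lines could be called for each node inside the For Each loop. This is a sketch, and DisplayAttributes is a name made up for this example.

```vb
Public Sub DisplayAttributes(ByVal xNode As MSXML.IXMLDOMNode, _
                             ByVal Indent As Integer)
    Dim xAttr As MSXML.IXMLDOMNode

    ' attributes is Nothing for node types that cannot carry
    ' attributes, such as NODE_TEXT.
    If xNode.Attributes Is Nothing Then Exit Sub

    ' Each attribute is itself a node with a name and a value.
    For Each xAttr In xNode.Attributes
        Debug.Print Space$(Indent) & "@" & xAttr.nodeName & _
            " = " & xAttr.nodeValue
    Next xAttr
End Sub
```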

What to do next:

This is only the beginning; from here you can build a deeper understanding of XML and the Microsoft parser (Msxml.dll). Try some more interesting exercises, such as changing the value of a node, searching a document, or creating your own documents. Visit the MSDN Online XML Developer Center for more examples, articles, and downloads.
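As a starting point for the first two exercises, the sketch below searches cds.xml for a price element, changes its value, and saves the result. It relies on the MSXML methods selectSingleNode and save and on the text property; the new price, the output file name, and the procedure name are made up for this illustration.

```vb
Public Sub ChangeFirstPrice()
    Dim xDoc As MSXML.DOMDocument
    Dim xPrice As MSXML.IXMLDOMNode

    Set xDoc = New MSXML.DOMDocument
    xDoc.async = False

    If xDoc.Load("c:/my documents/cds.xml") Then
        ' Search: returns the first node that matches the path,
        ' or Nothing when there is no match.
        Set xPrice = xDoc.selectSingleNode("compactdiscs/compactdisc/price")
        If Not xPrice Is Nothing Then
            xPrice.Text = "$10.99"   ' hypothetical new value
            ' Hypothetical output path, chosen so the original
            ' file is left untouched.
            xDoc.save "c:\my documents\cds-updated.xml"
        End If
    Else
        Debug.Print "Load failed: " & xDoc.parseError.reason
    End If

    Set xDoc = Nothing
End Sub
```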
