Use VC++ and MSXML to parse XML documents

Source: Internet
Author: User
I. Document Object Model (DOM)
DOM is short for Document Object Model. It is an application programming interface (API) for developing applications that work with XML documents. As a cross-platform, language-independent interface specification published by the W3C, DOM provides a standard programming interface across different environments and applications, and it can be implemented in any language.
DOM uses an object model and a series of interfaces to describe the content and structure of XML documents; that is, it models documents with objects. This object model covers:
● The interfaces used to represent and manipulate a document;
● The attributes and methods of those interfaces;
● The relationships between interfaces and how they interoperate.
DOM can parse structured XML documents. All processing instructions, elements, entities, attributes, and other content in a document are represented in the object model. The entire document is treated as a structured tree of information rather than a simple text stream. The generated objects are the nodes of that tree, and each object has both methods and attributes, so all operations on the document are performed on the object tree. In DOM, everything in the tree is an object, whether it is the root node or an attribute.
DOM mainly has the following three objects:
● XML document object
The XML document object represents the XML document as a whole. It consists of the root element and its child elements.
● XML node object
An XML node object represents a node in an XML document, such as an element, a comment, or a namespace.
● XML node list
An XML node list represents a collection of nodes.
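The idea of a document as a tree of node objects can be sketched in a few lines of portable C++. This is only an illustration under stated assumptions: the `Node` struct, `collectNames`, and `buildSampleTree` are hypothetical names invented here, not part of the W3C DOM or of MSXML, and the sample data is a small book record with an author and a title.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a DOM node (not the W3C or MSXML API).
struct Node {
    std::string name;                             // element name, e.g. "book"
    std::string text;                             // text content, may be empty
    std::vector<std::unique_ptr<Node>> children;  // ordered child nodes

    explicit Node(std::string n, std::string t = "")
        : name(std::move(n)), text(std::move(t)) {}

    // Append a child node and return a non-owning pointer to it.
    Node* append(std::string childName, std::string childText = "") {
        assert(!childName.empty());  // every node in this sketch needs a name
        children.push_back(
            std::make_unique<Node>(std::move(childName), std::move(childText)));
        return children.back().get();
    }
};

// Depth-first traversal that records node names in document order.
void collectNames(const Node& node, std::vector<std::string>& out) {
    out.push_back(node.name);
    for (const auto& child : node.children) {
        collectNames(*child, out);
    }
}

// Build a small tree: a "book" root with "author" and "title" children.
std::unique_ptr<Node> buildSampleTree() {
    auto root = std::make_unique<Node>("book");
    root->append("author", "lizlex");
    root->append("title", "XML developer's guide");
    return root;
}
```

Calling `collectNames` on the tree returned by `buildSampleTree` visits the root first and then each child in order, which is the kind of traversal a DOM implementation makes possible.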
With DOM, developers can dynamically create XML documents, traverse their structure, and add, modify, or delete content. Its object-oriented design saves much of the effort involved in XML parsing tasks and fits well with the idea of code reuse.

II. Four basic DOM interfaces (reference: http://bbs.xml.org.cn/dispbbs.asp?boardid=11&id=9220)

The DOM interface specification has four basic interfaces: Document, Node, NodeList, and NamedNodeMap. The Document interface is the entry point for operations on a document, and it inherits from the Node interface. The Node interface is the parent of most other interfaces: Document, Element, Attr, Text, and Comment all inherit from it. The NodeList interface is a collection of nodes, for example all the child nodes of a node. The NamedNodeMap interface is also a collection of nodes; it establishes a one-to-one mapping between node names and nodes, so that a specific node can be accessed directly by name. The four interfaces are described below.
1. Document interface
The Document interface represents the entire XML/HTML document. It is therefore the root of the document tree and provides the entry point for accessing and manipulating the data in the document.
Because elements, text nodes, comments, and processing instructions cannot exist outside the context of a document, the Document interface provides methods for creating other node objects. A node created this way has an ownerDocument attribute that identifies the document that created it and ties the node to that document.
In the DOM tree, the Document node is the root node, that is, the entry node for operations on the XML document. From the Document node you can reach the other nodes in the document, such as processing instructions, comments, the document type, and the root element of the XML document. A Document node can have multiple processing instructions and comments as child nodes, but the document type node and the root element node are each unique.

The IDL (Interface Definition Language) definition of the Document interface, along with its more commonly used attributes and methods, can be found in MSDN.
2. Node interface
The Node interface plays a central role in the DOM tree. A large share of DOM interfaces, such as Element, Attr, and CDATASection, inherit from Node. In the DOM tree, the Node interface represents a single node of the tree.
3. NodeList interface
The NodeList interface provides an abstract definition of a collection of nodes; it does not specify how the collection is implemented. A NodeList represents an ordered group of nodes, such as the sequence of child nodes of a node. It also appears as the return value of some methods, such as getElementsByTagName.
In DOM, a NodeList object is "live": changes to the document are reflected directly in the relevant NodeList object. For example, if a NodeList holding all child nodes of an element is obtained through DOM, and child nodes of that element are later added, deleted, or modified through DOM, the changes appear in the NodeList automatically; the application does not need to do anything further.
Each item in a NodeList can be accessed by index. Index values start at 0.
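The "live" behavior can be illustrated with a small portable C++ sketch. It assumes a deliberately simplified model in which an element only stores the names of its children; `Element` and `LiveNodeList` are hypothetical names made up for this illustration and are not DOM or MSXML types. The list keeps a reference to its owner instead of a copy, which is one simple way to obtain liveness.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplified element: a name plus an ordered list of child names.
struct Element {
    std::string name;
    std::vector<std::string> children;
};

// A "live" list: it refers to the owner's children instead of copying them,
// so later changes to the element are visible through the list.
class LiveNodeList {
public:
    explicit LiveNodeList(const Element& owner) : owner_(owner) {}

    // Number of child nodes at the time of the call.
    std::size_t length() const { return owner_.children.size(); }

    // Zero-based index access, as in the DOM NodeList.
    const std::string& item(std::size_t index) const {
        assert(index < length());
        return owner_.children[index];
    }

private:
    const Element& owner_;  // the element must outlive the list
};
```

If a `LiveNodeList` is created for an element with one child and a second child is then pushed onto the element, `length()` reports two without the list being rebuilt. A snapshot-style list would instead copy the children at construction time and miss the later change.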
4. NamedNodeMap interface
Objects that implement the NamedNodeMap interface hold a set of nodes that can be accessed by name. Note, however, that NamedNodeMap does not inherit from NodeList, and the nodes it contains are unordered. The nodes can also be accessed by index, but this is only a convenient way to enumerate the contents of a NamedNodeMap; it does not mean that the DOM specification defines an order for them.
A NamedNodeMap represents a one-to-one correspondence between a group of nodes and their unique names. This interface is mainly used to represent attribute nodes.
Like NodeList, NamedNodeMap objects in DOM are also "live".
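A name-to-node mapping of this kind can be sketched in portable C++ as follows. The `AttributeMap` class is a hypothetical illustration, not the DOM or MSXML interface; its method names merely echo the DOM names getNamedItem, setNamedItem, and item, and attribute nodes are reduced to plain string values.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>

// Hypothetical sketch of a NamedNodeMap holding attribute values by name.
class AttributeMap {
public:
    // Insert the attribute, or replace it if the name already exists,
    // so each name maps to exactly one value.
    void setNamedItem(const std::string& name, const std::string& value) {
        assert(!name.empty());
        attrs_[name] = value;
    }

    // Look up an attribute by name; returns nullptr when it is absent.
    const std::string* getNamedItem(const std::string& name) const {
        auto it = attrs_.find(name);
        return it == attrs_.end() ? nullptr : &it->second;
    }

    std::size_t length() const { return attrs_.size(); }

    // Index access exists only for enumeration; callers should not
    // attach meaning to the position of an attribute.
    const std::string* item(std::size_t index) const {
        if (index >= attrs_.size()) return nullptr;
        auto it = attrs_.begin();
        std::advance(it, static_cast<std::ptrdiff_t>(index));
        return &it->second;
    }

private:
    std::map<std::string, std::string> attrs_;
};
```

Setting the same name twice leaves `length()` unchanged and replaces the stored value, which mirrors the one-to-one correspondence between names and nodes.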

III. MSXML
In theory, we could write our own XML parser based on the definition of the XML format. In practice, Microsoft already provides one, called MSXML. The msxml.dll dynamic link library is in fact a COM (Component Object Model) object library that encapsulates all the objects needed for XML parsing. Because COM defines language-independent, reusable objects in binary form, you can use any COM-capable language (such as VB, VC, Delphi, C++ Builder, or even a scripting language) to parse XML documents in your application.
msxml.dll includes the following main COM interfaces:
1. IXMLDOMDocument (document interface)
The DOMDocument object is the foundation of the XML DOM. You can use the attributes and methods it exposes to browse, query, and modify the content and structure of an XML document. DOMDocument represents the top-level node of the tree. It implements all the basic DOM document methods and provides additional member functions to support XSL and XSLT. Once a document object has been created, all other objects can be obtained or created from it.
2. IXMLDOMNode (node interface)
IXMLDOMNode is the basic object in the Document Object Model (DOM). Elements, attributes, comments, processing instructions, and every other document component can be treated as an IXMLDOMNode. In fact, the DOMDocument object itself is also an IXMLDOMNode object.
3. IXMLDOMNodeList
IXMLDOMNodeList is a collection of node objects. Nodes that are added, deleted, or changed are reflected in the collection immediately. You can use a loop structure to traverse all of its nodes.
4. IXMLDOMParseError
The IXMLDOMParseError interface returns detailed information about errors encountered during parsing, including the error number, line number, character position, and a text description.
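The kind of information carried by a parse error can be sketched without any COM dependency. The `ParseErrorInfo` struct below is a hypothetical plain-data stand-in whose fields are modeled on the errorCode, line, linepos, and reason properties of IXMLDOMParseError; `formatParseError` is likewise an invented helper, and it assumes the usual convention that an error code of 0 means the document parsed successfully.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical plain-data mirror of a few IXMLDOMParseError properties.
struct ParseErrorInfo {
    long errorCode;      // 0 means no error in this sketch
    long line;           // line number where the error occurred
    long linePos;        // character position within that line
    std::string reason;  // text description of the error
};

// Turn the error details into a single message suitable for display.
std::string formatParseError(const ParseErrorInfo& e) {
    if (e.errorCode == 0) {
        return "no error";
    }
    assert(e.line >= 0 && e.linePos >= 0);
    std::ostringstream out;
    out << "error " << e.errorCode << " at line " << e.line
        << ", column " << e.linePos << ": " << e.reason;
    return out.str();
}
```

In a real MSXML program the values would be read from the parse error object of the document after a failed load; here they are simply passed in as a struct so the formatting logic can be shown on its own.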
In an application, you can load an XML document with the load method of DOMDocument. To query it, use the selectNodes method of IXMLDOMNode (which returns a list of all matching nodes) or the selectSingleNode method (which returns the first matching node when there are several). Use the createNode and appendChild methods to create and append nodes, and the setAttribute and getAttribute methods of IXMLDOMElement to set and read element attributes.


IV. Programming examples

1. Target document:

<book id="bk101">
<author>lizlex</author>
<title>XML developer's guide</title>
</book>

2. Steps:

(1) Import the MSXML dynamic link library (c:\windows\system32\msxml4.dll) in stdafx.h
#import <msxml4.dll>

(2) Interface design:
Place three edit boxes on the dialog, used both to enter data and to display the content of the document. Then add the associated member variables m_strId, m_strAuthor, and m_strTitle, and add an OK button.

(3) Program snippet that generates the document
void CXmlParseDlg::OnButtonGenerate()
{
    UpdateData();

    MSXML2::IXMLDOMDocumentPtr pDoc;
    MSXML2::IXMLDOMElementPtr xmlRoot;

    // Create a DOMDocument object
    HRESULT hr = pDoc.CreateInstance(__uuidof(MSXML2::DOMDocument40));
    if (!SUCCEEDED(hr))
    {
        MessageBox("The DOMDocument object cannot be created. Check whether the MS XML Parser runtime library is installed!");
        return;
    }

    // The root node name is book
    // Create an element and add it to the document
    xmlRoot = pDoc->createElement((_bstr_t)"book");

    // Set attributes
    xmlRoot->setAttribute("id", (const char *)m_strId);
    pDoc->appendChild(xmlRoot);
    MSXML2::IXMLDOMElementPtr pNode;

    // Add the "author" element
    pNode = pDoc->createElement((_bstr_t)"author");
    pNode->Puttext((_bstr_t)(const char *)m_strAuthor);
    xmlRoot->appendChild(pNode);

    // Add the "title" element
    pNode = pDoc->createElement("title");
    pNode->Puttext((const char *)m_strTitle);
    xmlRoot->appendChild(pNode);

    // Save to file
    // The file is created if it does not exist and overwritten if it does.
    // The backslash must be escaped inside a C++ string literal.
    pDoc->save("D:\\he.xml");
}
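For comparison, the same document can be assembled by hand in portable C++ without MSXML. This is only a sketch under stated assumptions: `escapeXml` and `buildBookXml` are hypothetical helpers invented for this illustration, and the exact whitespace written by MSXML when it saves the file may differ from the string built here. The sketch mainly shows the escaping work that a DOM library otherwise takes care of.

```cpp
#include <cassert>
#include <string>

// Escape the characters that are special in XML text and attribute values.
std::string escapeXml(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}

// Build the <book> document as a string from the three input values.
std::string buildBookXml(const std::string& id,
                         const std::string& author,
                         const std::string& title) {
    assert(!id.empty());  // this sketch expects a non-empty id
    std::string xml;
    xml += "<book id=\"" + escapeXml(id) + "\">";
    xml += "<author>" + escapeXml(author) + "</author>";
    xml += "<title>" + escapeXml(title) + "</title>";
    xml += "</book>";
    return xml;
}
```

Building strings this way is easy to get wrong once the structure grows, which is the practical argument for creating nodes through the DOM interfaces instead.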

(4) Program snippet that reads the XML document
void CXmlParseDlg::OnButtonLoad()
{
    MSXML2::IXMLDOMDocumentPtr pDoc;
    HRESULT hr;
    hr = pDoc.CreateInstance(__uuidof(MSXML2::DOMDocument40));
    if (FAILED(hr))
    {
        MessageBox("The DOMDocument object cannot be created. Check whether the MS XML Parser runtime library is installed!");
        return;
    }

    // Load the file; load returns VARIANT_FALSE if it cannot be read or parsed
    if (pDoc->load("D:\\he.xml") == VARIANT_FALSE)
    {
        MessageBox("The XML file could not be loaded.");
        return;
    }

    MSXML2::IXMLDOMNodePtr pNode;

    // Search the tree for a node named book. "//" means search at any level
    pNode = pDoc->selectSingleNode("//book");
    if (pNode == NULL)
    {
        MessageBox("No book node was found.");
        return;
    }

    MSXML2::DOMNodeType nodeType;

    // Obtain the node type
    pNode->get_nodeType(&nodeType);

    // Node name
    CString strName;

    strName = (char *)pNode->GetnodeName();

    // Store the node's attributes in a named node map
    MSXML2::IXMLDOMNamedNodeMapPtr pAttrMap = NULL;
    MSXML2::IXMLDOMNodePtr pAttrItem;
    _variant_t variantValue;
    pNode->get_attributes(&pAttrMap);

    // get_length returns an HRESULT; the attribute count is written to count
    long count = 0;
    pAttrMap->get_length(&count);

    if (count > 0)
    {
        pAttrMap->get_item(0, &pAttrItem);
        // Obtain the node value
        pAttrItem->get_nodeTypedValue(&variantValue);
        m_strId = (char *)(_bstr_t)variantValue;
    }

    UpdateData(FALSE);
}
