Original: http://blog.sina.com.cn/s/blog_48f93b530100ejv9.html
This article follows the previous one, "MSXML Application Summary: Concepts", and mainly summarizes how to use the MSXML DOM interfaces. The DOM (Document Object Model) is a standard API for processing XML documents, and MSXML is Microsoft's implementation of it; we can think of it as a set of interfaces that abstracts the structure of an XML document.
MSXML's DOM conforms to the W3C DOM standard, and on Windows the DOM API is exposed as COM interfaces; please consult other material for details on COM. In a nutshell, COM provides an environment and a set of rules that standardize how interfaces are designed and how objects are created, used, and released. This lets COM components be used from different languages and, more importantly, following the COM rules separates the interfaces of our code from their implementations, which keeps the program framework stable and extensible. For people who only use COM interfaces, things are simpler and more intuitive. A very important COM concept is the reference count (RefCount), the number of outstanding references to an interface object, which is controlled by the two interface functions AddRef and Release. Managing the reference count correctly by hand is difficult, so I recommend using smart pointers: a smart pointer is used just like an ordinary pointer, and we never have to worry about releasing the object it points to.
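To make the reference-counting idea concrete, here is a minimal sketch in portable C++ of the bookkeeping a COM smart pointer such as _com_ptr_t does for you. RefCounted and RefPtr are hypothetical names invented for this illustration; they are not Windows or MSXML types, and real COM objects expose AddRef/Release through IUnknown.

```cpp
#include <cassert>
#include <utility>

// Hypothetical COM-like object: it starts with one reference (held by its
// creator) and deletes itself when Release drops the count to zero.
class RefCounted {
public:
    RefCounted() { ++liveObjects; }
    unsigned long AddRef() { return ++refCount_; }
    unsigned long Release() {
        unsigned long remaining = --refCount_;
        if (remaining == 0) delete this;  // last reference gone
        return remaining;
    }
    inline static int liveObjects = 0;  // lets the example observe destruction

protected:
    virtual ~RefCounted() { --liveObjects; }  // only Release may destroy it

private:
    unsigned long refCount_ = 1;
};

// Minimal smart pointer in the spirit of _com_ptr_t: copies call AddRef,
// destruction calls Release, so the count never has to be managed by hand.
template <typename T>
class RefPtr {
public:
    RefPtr() = default;
    explicit RefPtr(T* p) : p_(p) {}  // adopts the creator's reference, no AddRef
    RefPtr(const RefPtr& other) : p_(other.p_) { if (p_) p_->AddRef(); }
    RefPtr(RefPtr&& other) noexcept : p_(std::exchange(other.p_, nullptr)) {}
    RefPtr& operator=(RefPtr other) noexcept {  // copy-and-swap
        std::swap(p_, other.p_);
        return *this;
    }
    ~RefPtr() { if (p_) p_->Release(); }
    T* operator->() const { return p_; }
    explicit operator bool() const { return p_ != nullptr; }

private:
    T* p_ = nullptr;
};
```

Copying the pointer adds a reference and leaving a scope releases one, so the object is destroyed exactly when the last smart pointer goes away.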
The API version used in this summary is MSXML2.0.
First, let's look at the commonly used interfaces:
IXMLDOMDocument: the XML document interface; it is the root of the DOM tree and the entry point for accessing and manipulating the document;
IXMLDOMNode: the node interface; this is the general node interface from which many other node interfaces, including IXMLDOMDocument, are derived;
IXMLDOMNodeList: the node list interface, which represents an ordered set of related nodes. Items are accessed by index (starting at 0), and the list is live: it is updated as the XML document changes;
IXMLDOMNamedNodeMap: the named node collection interface, which also represents a set of related nodes, but unlike a list the collection is unordered and items are usually looked up by name. It is often used for the set of attributes of a node, and it is also live;
IXMLDOMElement: the element interface, used to represent an element node and access its attributes;
IXMLDOMAttribute: the attribute interface, for accessing and manipulating a node's attributes;
IXMLDOMText: the interface for the text content of a node;
IXMLDOMComment: the interface for comments in an XML document;
IXMLDOMParseError: the error interface, which carries the details of a parse error.
These are the most commonly used DOM interfaces; some others are not listed here. Each interface has a corresponding smart pointer type, usually named by appending Ptr to the interface name; for example, the smart pointer for IXMLDOMDocument is IXMLDOMDocumentPtr. As for inheritance, IXMLDOMDocument, IXMLDOMElement, IXMLDOMAttribute and IXMLDOMCharacterData all derive from IXMLDOMNode, and IXMLDOMText and IXMLDOMComment derive from IXMLDOMCharacterData.
To develop DOM applications in VS2005, first set up the environment for the DOM interfaces by adding this statement to StdAfx.h:
#import <msxml6.dll> raw_interfaces_only
If msxml6.dll is present in your system folder, the #import statement generates type information for the MSXML library: two files, msxml6.tlh and msxml6.tli, normally appear in your project's build output folder. Opening them shows declarations of the COM interface types and functions, together with some library information. In effect, #import exports the type information in the DLL as C++ header files that describe the COM interfaces. With the raw_interfaces_only attribute, only msxml6.tlh is generated, each interface function exists in a single form that returns HRESULT, and the raw_ prefix is omitted. If you remove the attribute, msxml6.tlh declares both the HRESULT-returning functions, whose method names now carry the raw_ prefix (property accessors such as get_documentElement keep their names), and wrapper functions without the prefix; the implementations of those wrappers, which return the result (for example an interface pointer) directly and throw _com_error on failure, are generated in msxml6.tli. This is why, when using the DOM interfaces, you find two sets of functions doing the same job, one returning HRESULT and the other returning the value itself; this is how #import describes COM interfaces in the Windows environment. For a more detailed introduction, see http://www.cnblogs.com/xiaotaoliang/archive/2005/07/20/196257.html. For convenience, the example code below does not always follow the raw_interfaces_only form; I recommend removing that attribute, and it is mentioned here only for explanation.
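The difference between the two sets of functions can be sketched without Windows headers. Assuming the general shape described above (a raw function that returns an HRESULT and passes the result through an out-parameter, plus a wrapper that returns the value and throws on failure), the hypothetical FakeNode below imitates it. ComError stands in for _com_error, and none of these names come from msxml6.tlh.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

using HRESULT = std::int32_t;
constexpr HRESULT S_OK = 0;
constexpr HRESULT E_POINTER = static_cast<HRESULT>(0x80004003u);
constexpr HRESULT E_FAIL = static_cast<HRESULT>(0x80004005u);

// Stand-in for _com_error: carries the HRESULT of the failed call.
struct ComError {
    HRESULT hr;
};

// Hypothetical object exposing the same operation in both styles.
struct FakeNode {
    std::string text = "hello";
    bool broken = false;  // simulates an object whose call fails

    // Raw style: status returned as HRESULT, value through an out-parameter.
    HRESULT raw_GetText(std::string* out) const {
        if (out == nullptr) return E_POINTER;
        if (broken) return E_FAIL;
        *out = text;
        return S_OK;
    }

    // Wrapper style: returns the value and throws if the raw call failed.
    std::string GetText() const {
        std::string result;
        HRESULT hr = raw_GetText(&result);
        if (hr < 0) throw ComError{hr};
        return result;
    }
};
```

With the raw form every call site checks the HRESULT; with the wrapper form the checks move into a try/catch block, which is the style the examples below use.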
Another way to use the DOM interfaces is to add the MSXML library path directly to the project settings and link msxml6.lib; this is not covered in detail here.
After setting up the DOM environment, you also need to initialize COM: call CoInitialize in the thread's initialization function, and call CoUninitialize when the thread exits.
Now we can use the DOM interfaces to manipulate XML files; I'll summarize them by type of operation.
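One way to guarantee that every CoInitialize is matched by CoUninitialize is an RAII guard. The sketch below uses hypothetical FakeCoInitialize/FakeCoUninitialize stand-ins so it compiles anywhere; on Windows you would call the real functions. It treats any non-negative result as success because CoInitialize returns S_FALSE when COM is already initialized on the thread, and that call must still be balanced.

```cpp
#include <cassert>

// Hypothetical stand-ins for CoInitialize / CoUninitialize.
inline int g_comInitCount = 0;
inline long FakeCoInitialize() { ++g_comInitCount; return 0; }  // 0 == S_OK
inline void FakeCoUninitialize() { --g_comInitCount; }

// RAII guard: initializes COM for the current thread in its constructor and
// uninitializes it in its destructor, so every exit path is covered.
class ComInitGuard {
public:
    ComInitGuard() : hr_(FakeCoInitialize()) {}
    ~ComInitGuard() { if (hr_ >= 0) FakeCoUninitialize(); }
    ComInitGuard(const ComInitGuard&) = delete;
    ComInitGuard& operator=(const ComInitGuard&) = delete;
    bool ok() const { return hr_ >= 0; }

private:
    long hr_;
};
```

Declare the guard before any smart pointers in the same scope: locals are destroyed in reverse order, so the interface pointers are released before COM is uninitialized.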
I. Loading and saving XML files
Because the DOM model works on the whole XML document, the only interface we need to create is IXMLDOMDocument; all other interfaces are obtained from it directly or indirectly. Loading and saving XML files are also implemented by the IXMLDOMDocument interface. The code to create the IXMLDOMDocument interface is as follows:
MSXML2::IXMLDOMDocumentPtr pXmlDoc;
HRESULT hr = pXmlDoc.CreateInstance(__uuidof(MSXML2::DOMDocument60), NULL, CLSCTX_INPROC_SERVER);
if (FAILED(hr))
    printf("Failed to create DOM document interface pointer.\n");
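FAILED tests only the sign of the HRESULT. The sketch below mirrors that logic in portable C++ (Succeeded, Failed and HrToHex are hypothetical helper names; S_OK, S_FALSE, E_NOINTERFACE and E_FAIL use their real values) and shows why comparing against S_OK is riskier than using FAILED: S_FALSE is also a success code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// An HRESULT is a 32-bit signed value whose top (severity) bit marks failure.
using HRESULT = std::int32_t;

constexpr HRESULT S_OK = 0x00000000;
constexpr HRESULT S_FALSE = 0x00000001;
constexpr HRESULT E_NOINTERFACE = static_cast<HRESULT>(0x80004002u);
constexpr HRESULT E_FAIL = static_cast<HRESULT>(0x80004005u);

// Same test as the SUCCEEDED / FAILED macros: only the sign matters.
constexpr bool Succeeded(HRESULT hr) { return hr >= 0; }
constexpr bool Failed(HRESULT hr) { return hr < 0; }

// Formats an HRESULT the way the examples print it ("0x%08x").
inline std::string HrToHex(HRESULT hr) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "0x%08x",
                  static_cast<unsigned>(static_cast<std::uint32_t>(hr)));
    return buf;
}
```

For example, HrToHex(E_FAIL) yields "0x80004005", the form in which a failing call typically shows up in the exception messages below.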
The code to load an XML file is:
try
{
    pXmlDoc->async = VARIANT_FALSE;
    pXmlDoc->validateOnParse = VARIANT_FALSE;
    pXmlDoc->resolveExternals = VARIANT_FALSE;
    if (pXmlDoc->load("Test.xml") != VARIANT_TRUE)
    {
        // parseError describes why the document could not be parsed
        printf("Fail Reason: %s\n", (LPCSTR)pXmlDoc->parseError->Getreason());
    }
    else
    {
        // Success
    }
}
catch (const _com_error& errorObject)
{
    printf("Exception, HRESULT = 0x%08x\n", errorObject.Error());
}
In the code above, the first three statements set three properties of the IXMLDOMDocument interface.
async selects between asynchronous and synchronous (blocking) loading. When it is true (asynchronous), load returns immediately, whether or not the file has finished loading; when it is false (synchronous), load returns only after loading has finished. In asynchronous mode you can query the readyState property to find out whether loading is complete, or set an onreadystatechange handler to be notified. The default value of async is true.
validateOnParse indicates whether the parser validates the document (for example against its DTD or schema) while parsing; the default value is true.
resolveExternals indicates whether external definitions, such as external entities and the document type definition (DTD), are resolved during parsing; in MSXML 6.0 the default value is false.
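In asynchronous mode the caller has to check readyState itself. The sketch below imitates the polling approach with a hypothetical FakeAsyncDocument whose state advances each time it is queried; the numeric readyState values (1 = loading through 4 = completed) are the ones documented for MSXML's DOMDocument, everything else is illustrative.

```cpp
#include <cassert>

// readyState values documented for MSXML's DOMDocument.
enum ReadyState { LOADING = 1, LOADED = 2, INTERACTIVE = 3, COMPLETED = 4 };

// Hypothetical stand-in for a document loading asynchronously: load()
// returns at once, and each readyState() query advances the "download".
class FakeAsyncDocument {
public:
    bool load() {
        state_ = LOADING;
        return true;
    }
    ReadyState readyState() {
        ReadyState current = state_;
        if (state_ != COMPLETED) state_ = static_cast<ReadyState>(state_ + 1);
        return current;
    }

private:
    ReadyState state_ = LOADING;
};

// Polls until loading completes and returns how many polls saw it unfinished.
inline int WaitUntilComplete(FakeAsyncDocument& doc) {
    int pendingPolls = 0;
    while (doc.readyState() != COMPLETED) {
        ++pendingPolls;  // a real program would pump messages or do other work here
    }
    return pendingPolls;
}
```

With async set to VARIANT_FALSE, as in the loading code above, no such loop is needed because load itself blocks until the document is complete.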
A word about the VARIANT type, which is widely used in COM. A VARIANT can hold values of many different data types, which makes it convenient for passing arguments through interfaces. It is defined as a struct containing a member that records the actual type of the data and a union of members of the various types; this is how a VARIANT supports many kinds of data. Note that a string in a VARIANT is stored as a BSTR, the general string type in COM programming, which is a Unicode string. Memory for BSTR strings is managed by the system through SysAllocString and SysFreeString. Windows provides dedicated classes for handling VARIANT and BSTR (for example _variant_t and _bstr_t, which the #import-generated code uses); see http://www.vckbase.com/document/viewdoc/?id=1096.
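The tag-plus-union layout can be sketched as follows. SimpleVariant is a deliberately reduced, hypothetical version of VARIANT: the tag values match the real VT_I4, VT_BSTR and VT_BOOL constants, and VARIANT_TRUE/VARIANT_FALSE are -1 and 0 as in the real headers, which is why the loading code compares the result of load against VARIANT_TRUE rather than true.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Tag values copied from the real VARTYPE constants.
enum VarType : std::uint16_t { VT_EMPTY = 0, VT_I4 = 3, VT_BSTR = 8, VT_BOOL = 11 };

// VARIANT_BOOL: -1 means true, 0 means false.
using VariantBool = std::int16_t;
constexpr VariantBool VARIANT_TRUE = -1;
constexpr VariantBool VARIANT_FALSE = 0;

// Reduced, hypothetical VARIANT: a type tag plus a union of payloads.
// A real BSTR member is replaced by a plain UTF-16 pointer here.
struct SimpleVariant {
    VarType vt = VT_EMPTY;
    union {
        std::int32_t lVal;
        VariantBool boolVal;
        const char16_t* bstrVal;
    };
};

inline SimpleVariant MakeInt(std::int32_t value) {
    SimpleVariant v;
    v.vt = VT_I4;
    v.lVal = value;
    return v;
}

inline SimpleVariant MakeBool(bool value) {
    SimpleVariant v;
    v.vt = VT_BOOL;
    v.boolVal = value ? VARIANT_TRUE : VARIANT_FALSE;
    return v;
}

// Readers must check the tag before touching the union.
inline std::string Describe(const SimpleVariant& v) {
    switch (v.vt) {
        case VT_I4:   return "int " + std::to_string(v.lVal);
        case VT_BOOL: return v.boolVal == VARIANT_TRUE ? "bool true" : "bool false";
        case VT_BSTR: return "string";
        default:      return "empty";
    }
}
```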
The load function can load either a local file or a remote file given as a URL (I have not tested the latter). There is also a loadXML function that loads XML directly from a string, but it supports only the UTF-16 and UCS-2 encodings.
The code to save an XML file is:
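loadXML receives its text as a BSTR, which is why the string must be UTF-16. A BSTR points at its first UTF-16 code unit and stores a 4-byte length (in bytes) just before that pointer, with a terminating null after the text. The hypothetical FakeSysAllocString, FakeSysStringLen and FakeSysFreeString functions below recreate that layout for illustration only; in real code use SysAllocString, SysStringLen and SysFreeString, or the _bstr_t wrapper.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical re-creation of the BSTR memory layout (illustration only):
//   [4-byte length in bytes][UTF-16 code units][16-bit null terminator]
// The pointer handed out points at the first character, not at the prefix.
using FakeBstr = char16_t*;

inline FakeBstr FakeSysAllocString(const std::u16string& s) {
    const std::uint32_t bytes = static_cast<std::uint32_t>(s.size() * sizeof(char16_t));
    auto* block = static_cast<unsigned char*>(
        std::malloc(sizeof bytes + bytes + sizeof(char16_t)));
    if (block == nullptr) return nullptr;
    std::memcpy(block, &bytes, sizeof bytes);  // length prefix
    auto* chars = reinterpret_cast<char16_t*>(block + sizeof bytes);
    std::memcpy(chars, s.data(), bytes);       // the text itself
    chars[s.size()] = u'\0';                   // terminator
    return chars;
}

// Character count read from the prefix, so embedded nulls are allowed.
inline std::uint32_t FakeSysStringLen(FakeBstr b) {
    if (b == nullptr) return 0;  // a null BSTR behaves like an empty string
    std::uint32_t bytes = 0;
    std::memcpy(&bytes, reinterpret_cast<unsigned char*>(b) - sizeof bytes, sizeof bytes);
    return bytes / sizeof(char16_t);
}

// Frees the whole block, starting from the hidden prefix.
inline void FakeSysFreeString(FakeBstr b) {
    if (b != nullptr) std::free(reinterpret_cast<unsigned char*>(b) - sizeof(std::uint32_t));
}
```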
try
{
    HRESULT hr = pXmlDoc->save(L"Mydata.xml");
    if (FAILED(hr))
    {
        // parseError only describes load errors, so report the HRESULT instead
        printf("Failed to save, HRESULT = 0x%08x\n", hr);
    }
    else
    {
        // Success
    }
}
catch (const _com_error& errorObject)
{
    printf("Exception, HRESULT = 0x%08x\n", errorObject.Error());
}
II. Getting the root node pointer
With the IXMLDOMDocument interface pointer, getting the root node interface pointer is easy. There are three ways to do it:
MSXML2::IXMLDOMElementPtr pRootNode = pXmlDoc->documentElement;
Or
MSXML2::IXMLDOMElementPtr pRootNode;
pXmlDoc->get_documentElement(&pRootNode);
Or
MSXML2::IXMLDOMNodePtr pRootNode, pNode;
pXmlDoc->get_firstChild(&pRootNode);
while (pRootNode)
{
    MSXML2::DOMNodeType type;
    pRootNode->get_nodeType(&type);
    if (type == MSXML2::NODE_ELEMENT)
        break;
    pNode = pRootNode;
    pNode->get_nextSibling(&pRootNode);
}
The first method is the most common and the simplest. The other two are shown to illustrate two points, and the rest of this summary uses only the most common method.
Note that the second method does not read the documentElement property of IXMLDOMDocument directly but obtains it through a function. For each property of a DOM interface there are usually corresponding get_ and put_ functions for reading and writing it.
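The get/put pairing looks like this in sketch form. FakeDocument is hypothetical, but its shape (an HRESULT return, a pointer out-parameter for reads, a by-value argument for writes) is the COM convention for properties. With the #import wrappers, a statement such as pXmlDoc->async = VARIANT_FALSE is turned, through Visual C++'s __declspec(property) extension, into a call to a generated wrapper that in turn calls put_async.

```cpp
#include <cassert>
#include <cstdint>

using HRESULT = std::int32_t;
using VariantBool = std::int16_t;
constexpr HRESULT S_OK = 0;
constexpr HRESULT E_POINTER = static_cast<HRESULT>(0x80004003u);
constexpr VariantBool VARIANT_TRUE = -1;
constexpr VariantBool VARIANT_FALSE = 0;

// Hypothetical interface-like class: the "async" property is exposed as a
// get_async / put_async pair, the shape COM uses for properties.
class FakeDocument {
public:
    // Read accessor: value comes back through a pointer, status as HRESULT.
    HRESULT get_async(VariantBool* value) const {
        if (value == nullptr) return E_POINTER;
        *value = async_;
        return S_OK;
    }
    // Write accessor: the new value is passed by value.
    HRESULT put_async(VariantBool value) {
        async_ = value;
        return S_OK;
    }

private:
    VariantBool async_ = VARIANT_TRUE;  // async defaults to true, as described above
};
```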
The third method shows once more how the various node types relate to and differ from each other: both IXMLDOMDocument and IXMLDOMElement are IXMLDOMNode objects, so we can find the root node by walking the child nodes of IXMLDOMDocument. Note, however, that the node returned by IXMLDOMDocument's get_firstChild is not necessarily the root; it may be another node, for example the XML declaration (a processing instruction), a comment, or whitespace, so we need to check the node type. The node types are described below:
Node type | Value | Meaning | Allowed child node types | Allowed parent node types
--- | --- | --- | --- | ---
NODE_ELEMENT | 1 | An element | ProcessingInstruction, Text, Comment, CDATASection, EntityReference, Element | Document, DocumentFragment, EntityReference, Element
NODE_ATTRIBUTE | 2 | An attribute of an element | Text, EntityReference | (none)
NODE_TEXT | 3 | The text content of a tag | (none) | Attribute, DocumentFragment, Element, EntityReference
NODE_CDATA_SECTION | 4 | A CDATA section | (none) | DocumentFragment, EntityReference, Element
NODE_ENTITY_REFERENCE | 5 | An entity reference | Element, Text, ProcessingInstruction, Comment, CDATASection, EntityReference | Attribute, DocumentFragment, Element, EntityReference
NODE_ENTITY | 6 | An expanded entity | Nodes that represent the expanded entity (for example Text, EntityReference) | DocumentType
NODE_PROCESSING_INSTRUCTION | 7 | A processing instruction | (none) | Document, DocumentFragment, Element, EntityReference
NODE_COMMENT | 8 | A comment | (none) | Document, DocumentFragment, Element, EntityReference
NODE_DOCUMENT | 9 | The XML document itself | Element (at most one), ProcessingInstruction, Comment, DocumentType | (none)
NODE_DOCUMENT_TYPE | 10 | The document type declaration, written as <!DOCTYPE ...> | Notation, Entity | Document
NODE_DOCUMENT_FRAGMENT | 11 | A document fragment: a node or subtree associated with a document without being contained in it | Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference | (none)
NODE_NOTATION | 12 | A notation declared in the DTD | (none) | DocumentType
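The numeric values in the table and the search performed by the third method can be combined into a small, Windows-free sketch. DomNodeType copies the DOMNodeType values; Node and FindDocumentElement are hypothetical stand-ins for IXMLDOMNode and the get_firstChild/get_nextSibling loop shown earlier.

```cpp
#include <cassert>

// Same numeric values as MSXML's DOMNodeType enumeration.
enum DomNodeType {
    NODE_ELEMENT = 1,
    NODE_ATTRIBUTE = 2,
    NODE_TEXT = 3,
    NODE_CDATA_SECTION = 4,
    NODE_ENTITY_REFERENCE = 5,
    NODE_ENTITY = 6,
    NODE_PROCESSING_INSTRUCTION = 7,
    NODE_COMMENT = 8,
    NODE_DOCUMENT = 9,
    NODE_DOCUMENT_TYPE = 10,
    NODE_DOCUMENT_FRAGMENT = 11,
    NODE_NOTATION = 12
};

// Hypothetical minimal node with firstChild / nextSibling links.
struct Node {
    DomNodeType type;
    Node* firstChild = nullptr;
    Node* nextSibling = nullptr;
};

// Walks the document's children and returns the first element node,
// skipping processing instructions, comments and the DOCTYPE node.
inline const Node* FindDocumentElement(const Node& document) {
    for (const Node* n = document.firstChild; n != nullptr; n = n->nextSibling) {
        if (n->type == NODE_ELEMENT) return n;
    }
    return nullptr;
}
```

For a document laid out as an XML declaration, then a comment, then the root element, the loop skips the first two children and returns the third.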
For a newly created XML document, after creating the IXMLDOMDocument interface, the first element created with createElement and appended to the document with appendChild becomes the root node.
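The rule behind this is that a document can have at most one element child. The hypothetical Document and Element types below model just that rule: createElement only makes a node, appendChild on the document makes it the root, and a second top-level element is refused (MSXML reports this as an error rather than returning false).

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical minimal model of one DOM rule: a document may have at most
// one element child, and that element is the root (document element).
struct Element {
    std::string name;
};

class Document {
public:
    // Like createElement: builds a node that is not yet part of the tree.
    Element createElement(const std::string& name) const { return Element{name}; }

    // Like appendChild on the document node; returns false where the real
    // DOM would report an error for a second top-level element.
    bool appendChild(const Element& element) {
        if (root_.has_value()) return false;
        root_ = element;
        return true;
    }

    // Like documentElement: null until a root has been appended.
    const Element* documentElement() const { return root_ ? &*root_ : nullptr; }

private:
    std::optional<Element> root_;
};
```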
[Reprint] MSXML Application Summary: Development (Part One)