Original: http://blog.sina.com.cn/s/blog_48f93b530100e9tr.html
Microsoft provides a wealth of XML development tools and technologies, and MSXML (Microsoft XML Core Services) is probably the one most commonly used in everyday development. MSXML supports many APIs and XML processing tools, including DOM (Document Object Model), SAX (Simple API for XML), XMLHttpRequest, XPath, and SOM (Schema Object Model). This article introduces the concepts and background needed to use MSXML; the next article summarizes how to process XML documents with the MSXML DOM SDK.
I. MSXML versions
There are currently four versions of MSXML: MSXML3, MSXML4, MSXML5, and MSXML6. Because MSXML mainly exists to support Internet Explorer and Office on Windows, these versions are closely tied to particular browser and Office releases. MSXML3 shipped with IE6, so it is generally available on Windows XP, which has made it the most widely used version. MSXML5 is mainly used to support Microsoft Office, and MSXML4 has quickly been superseded by the newer MSXML6. Microsoft recommends using the latest version, MSXML6, with MSXML3 as the fallback.
II. Introduction to XML (Extensible Markup Language)
Before learning MSXML, you need to understand XML itself. XML is a markup language designed to store and exchange data across networks and platforms, so that data is stored and interpreted consistently. Its syntax resembles the familiar HTML, but the two serve different purposes: the biggest difference is that XML is used to store data and lets users define their own elements. Thanks to its flexible extensibility and simple structure, XML spread very quickly. It is now the most popular format for transferring and exchanging data between programs, and it is becoming a standard for storing and describing information.
For XML terminology you can consult a dedicated tutorial site; here we only briefly describe the structure of an XML document. Let's look at an example:
<?xml version="1.0" encoding="GB2312"?>
<!-- This is an XML example -->
<root>
  <item type="text">Text</item>
  <item type="CDATA">
    <![CDATA[Text in CDATA is a standalone string that is not parsed and can contain special characters]]>
  </item>
  <item type="Sub">
    <subitem>Sub-node</subitem>
  </item>
</root>
Line 1 is the XML declaration, which states the XML version and the encoding the document uses. If the document is not encoded in UTF-8 or UTF-16, the encoding must be declared; otherwise parsing may fail. XML encoding is discussed in more detail below.
Line 2 is a comment, written the same way as in HTML.
Line 3 opens the root element (the document element), root. All other elements are contained between its start and end tags: every XML document must have exactly one root element, and all other elements appear inside it.
Lines 4 through 10 hold the child elements of root, and sibling elements may share the same name. An element can have attributes, text, and child elements. For example, the item element on line 4 has a "type" attribute, and the text between its start and end tags is the element's text; the item element on line 8 has a child element, subitem.
The <![CDATA[ ... ]]> block on line 6 is a special construct called a CDATA section. Its characters are not parsed, so it can hold characters such as '<' and '&' that would otherwise break the structure of the XML.
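To see the structure described above in action, here is a minimal sketch that parses the example with Python's standard `xml.dom.minidom` module. This is a stand-in for MSXML (which is a COM library used from C++ or script on Windows), not the MSXML API itself. One assumption: the GB2312 declaration line is dropped, because Python's expat-based parsers do not support multi-byte declared encodings such as GB2312.

```python
from xml.dom import minidom

# The article's example, minus the GB2312 declaration line (assumption:
# Python's expat-based parsers cannot decode multi-byte declared encodings).
EXAMPLE = """<!-- This is an XML example -->
<root>
  <item type="text">Text</item>
  <item type="CDATA">
    <![CDATA[Text in CDATA is not parsed, so < and & are allowed here]]>
  </item>
  <item type="Sub">
    <subitem>Sub-node</subitem>
  </item>
</root>"""

doc = minidom.parseString(EXAMPLE)
root = doc.documentElement  # the single root (document) element
print(root.tagName)  # root

# Element children of root; whitespace-only text nodes between tags are skipped.
items = [n for n in root.childNodes if n.nodeType == n.ELEMENT_NODE]
print([item.getAttribute("type") for item in items])  # ['text', 'CDATA', 'Sub']

# Character data of the CDATA item: the '<' and '&' come back unescaped.
cdata_text = "".join(
    n.data for n in items[1].childNodes
    if n.nodeType in (n.TEXT_NODE, n.CDATA_SECTION_NODE)
).strip()
print(cdata_text)

print(items[0].firstChild.data)  # Text (the element's text)
print(items[2].getElementsByTagName("subitem")[0].firstChild.data)  # Sub-node
```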
There are also a few things to note:
1. XML tags are case-sensitive;
2. When a character with special meaning is meant literally, write it as an entity reference: '<' becomes &lt;, '>' becomes &gt;, '&' becomes &amp;, the apostrophe (') becomes &apos;, and the double quote (") becomes &quot;;
3. Use attributes sparingly in XML and prefer child elements.
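The first two rules can be checked with Python's standard library (used here as an illustrative stand-in, not MSXML). `xml.sax.saxutils.escape` replaces `&`, `<`, and `>` by default; the extra mapping passed to it covers the two quote characters, so all five predefined entities are produced. The last part shows that a start tag and end tag differing only in case are rejected.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "<tag attr=\"x\"> & 'y'"
# Default escaping handles &, < and >; the dict adds the two quote entities.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(escaped)  # &lt;tag attr=&quot;x&quot;&gt; &amp; &apos;y&apos;

# A parser turns the entity references back into the original characters.
restored = ET.fromstring("<p>" + escaped + "</p>").text
print(restored == raw)  # True

# Tags are case-sensitive: <Item> ... </item> is a mismatched pair.
try:
    ET.fromstring("<Item>x</item>")
    case_mismatch_parsed = True
except ET.ParseError:
    case_mismatch_parsed = False
print(case_mismatch_parsed)  # False
```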
III. Encoding in XML
Character sets are a headache for every developer. For historical reasons, and because different applications have different needs, many character encodings exist. They fall broadly into single-byte encodings, typified by ASCII, and multibyte encodings, typified by Unicode. Beyond these two there are many other character sets, and Unicode alone has three encoding forms: UTF-8, UTF-16, and UTF-32. For details on character sets, consult other references.
The XML specification handles character encoding as follows. If the document declares an encoding, it is processed in that encoding, so you must make sure the bytes are actually stored in the declared encoding and that the parser supports it; otherwise the parser reports an error. If no encoding is declared, the parser assumes UTF-8 (or UTF-16 when the document begins with a byte order mark), so an undeclared document stored in any other encoding will generally fail to parse.
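To make the difference between these encodings concrete, the following sketch (plain Python, not MSXML) counts how many bytes one Latin letter and one Chinese character take in each encoding. GB2312 is included because the example document above declares it. The `-le` (little-endian) codecs are used so that no byte order mark is counted.

```python
# Bytes needed for the same character in several encodings.
# The "-le" variants are used so no byte order mark is counted.
ENCODINGS = ("ascii", "gb2312", "utf-8", "utf-16-le", "utf-32-le")

lengths = {}
for ch in ("A", "\u4e2d"):  # the letter A and the Chinese character 中
    for enc in ENCODINGS:
        try:
            lengths[(ch, enc)] = len(ch.encode(enc))
        except UnicodeEncodeError:
            lengths[(ch, enc)] = None  # not representable in this encoding
    print(ch, [lengths[(ch, enc)] for enc in ENCODINGS])
# A [1, 1, 1, 2, 4]
# 中 [None, 2, 3, 2, 4]
```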
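The declaration rule can be demonstrated with a minimal sketch using Python's `xml.etree.ElementTree` (a stand-in for MSXML's parser; the error type and message differ in MSXML). The same Latin-1 bytes fail to parse when no encoding is declared, because the parser falls back to UTF-8, and parse correctly once a matching declaration is added.

```python
import xml.etree.ElementTree as ET

text = "<root>caf\u00e9</root>"      # 'é' (U+00E9)
latin1 = text.encode("latin-1")       # stored as the single byte 0xE9

# No declaration: the parser assumes UTF-8, and a lone 0xE9 byte is invalid UTF-8.
try:
    ET.fromstring(latin1)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print(undeclared_ok)  # False

# The same bytes with a matching declaration parse correctly.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1
value = ET.fromstring(declared).text
print(value == "caf\u00e9")  # True
```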
IV. DOM (Document Object Model)
The DOM is the standard model MSXML uses to process XML documents: the document is loaded into memory as a tree, and the XML is manipulated as objects on that tree. The DOM provides a set of APIs and defines an object interface for each kind of node in the XML structure. Through these interfaces you can create and traverse a document and dynamically add, delete, and modify its content.
The application of DOM is described in detail in the next article.
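As a preview of those operations, here is a minimal sketch with Python's `xml.dom.minidom`, whose W3C-style method names (`createElement`, `appendChild`, `setAttribute`, `removeChild`, `getElementsByTagName`) resemble the DOM interfaces MSXML exposes. It is an illustration of the DOM model in another language, not MSXML code; the element names are taken from the example document above.

```python
from xml.dom import minidom

# Create: build <root><item type="text">Text</item></root> in memory.
doc = minidom.Document()
root = doc.createElement("root")
doc.appendChild(root)

item = doc.createElement("item")
item.setAttribute("type", "text")
item.appendChild(doc.createTextNode("Text"))
root.appendChild(item)

# Add a second element, then modify and delete nodes in the tree.
extra = doc.createElement("item")
extra.setAttribute("type", "Sub")
root.appendChild(extra)
item.firstChild.data = "Changed text"  # modify a text node in place
root.removeChild(extra)                # delete an element

# Traverse: every remaining item with its attribute and text.
for el in root.getElementsByTagName("item"):
    print(el.tagName, el.getAttribute("type"), el.firstChild.data)

print(root.toxml())  # <root><item type="text">Changed text</item></root>
```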
V. SAX (Simple API for XML)
As the name suggests, SAX is a set of APIs for working with XML. Why is it "simple"? Only in comparison with the DOM.
As described above, the DOM loads the entire XML document into memory and maintains a tree structure. When the document is complex or large, maintaining that tree inevitably hurts performance, so SAX is a good choice when efficiency matters most.
The defining feature of SAX is that it is event-driven. As SAX reads an XML file, it walks through the document and raises events, such as the start and end of each element, notifying the application so it can process them. All processing is done in a single pass, which makes SAX very efficient. In addition, for large documents SAX only needs to hold part of the document in memory at a time, which improves both memory usage and speed.
Of course, SAX suits a narrower range of tasks, and the handling code the application must write is tedious; in short, it is a good complement to the DOM rather than a replacement for it.
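The event-driven model can be sketched with Python's standard `xml.sax` module (a stand-in for MSXML's SAX2 interfaces, whose handler names differ). The hypothetical `EventLogger` handler simply records each callback; note that no tree is built, and the document is read in one pass.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Hypothetical handler that records SAX events in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        if content.strip():  # ignore whitespace between tags
            self.events.append(("text", content.strip()))


DOC = (b'<root><item type="text">Text</item>'
       b'<item type="Sub"><subitem>Sub-node</subitem></item></root>')

handler = EventLogger()
xml.sax.parseString(DOC, handler)  # single pass; callbacks fire as tags are read
for event in handler.events:
    print(event)
```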
VI. XPath
XPath is a language for finding information in an XML document. It navigates through the elements and attributes of a document using path expressions very similar to the file-system paths we use every day.
XPath has many built-in functions to help with navigation, and it selects a node or set of nodes in an XML document by means of a path expression. XPath has seven kinds of nodes: element, attribute, text, namespace, processing instruction, comment, and root nodes. For the detailed rules of path expressions, consult the relevant references.
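A few typical path expressions can be tried with Python's `xml.etree.ElementTree`, with one important caveat: ElementTree supports only a limited subset of XPath (no functions such as `count()`, no axes beyond the basic ones), unlike the full XPath 1.0 that MSXML's `selectNodes` accepts. The comments give the equivalent absolute XPath for each relative query.

```python
import xml.etree.ElementTree as ET

DOC = """<root>
  <item type="text">Text</item>
  <item type="CDATA"><![CDATA[unparsed <text>]]></item>
  <item type="Sub"><subitem>Sub-node</subitem></item>
</root>"""

root = ET.fromstring(DOC)

# All item children of the root element (like /root/item)
types = [el.get("type") for el in root.findall("item")]
print(types)  # ['text', 'CDATA', 'Sub']

# subitem elements at any depth below the root (like //subitem)
print([el.text for el in root.findall(".//subitem")])  # ['Sub-node']

# Attribute-value predicate (like /root/item[@type='Sub']/subitem)
print(root.find("item[@type='Sub']/subitem").text)  # Sub-node

# Positional predicate: XPath positions start at 1 (like /root/item[2])
print(root.find("item[2]").text)  # unparsed <text>
```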
VII. MSXML API versions
This section is a supplement. Section I described the MSXML versions, but the MSXML API has its own, more confusing version history, which is explained here. Section I refers to the versions of the MSXML DLL files, whereas API versions follow a separate scheme. The historical versions are MSXML 1.0, MSXML 1.0 SP1/SP2, MSXML 2.0, MSXML 2.6, MSXML 3.0, MSXML 4.0, and MSXML 5.0 for Microsoft Office applications. The early MSXML 1.0 and MSXML 1.0 SP1/SP2 are no longer supported; MSXML 2.0 is the most commonly used version for XML documents, and later versions add new interfaces.
These are the basic concepts; I hope they help you understand MSXML.
Reprinted: MSXML Application Summary (Concepts)