In the .NET Framework, the XmlTextReader and XmlTextWriter classes provide read and write operations on XML data. This article describes the architecture of XML readers and how they relate to the XMLDOM and SAX parsers. It also shows how to use a reader to parse and validate XML documents, how to create well-formed XML documents, and how to read and write large XML documents encoded in Base64 and BinHex. Finally, it explains how to implement a stream-based read/write parser that encapsulates the reader and writer functionality in a single class.
About three years ago, I attended a software seminar titled "No XML, No Programming Future." XML has indeed advanced step by step, and it is now built into the .NET Framework. In this article, I will explain the roles and internal characteristics of the APIs that the .NET Framework provides for processing XML documents, and then demonstrate some common tasks.
XML from MSXML to .NET
Before the .NET Framework appeared, you would use the MSXML services, a COM-based class library, to write XML-driven Windows applications. Unlike the XML classes in the .NET Framework, the MSXML library is a separate API layered onto the platform rather than an integral part of it. MSXML can certainly communicate with your application, but it cannot really integrate with the surrounding environment.
The MSXML library can be imported in Win32 code and used from the CLR, but only as an external server component. Applications based on the .NET Framework, by contrast, can use the XML classes directly alongside the other namespaces of the .NET Framework, and the resulting code is easy to read.
As a self-contained component, the MSXML parser provides some advanced features such as asynchronous parsing. The XML classes in the .NET Framework do not provide this feature directly. However, by combining the XML classes with other classes in the .NET Framework you can easily obtain the same functionality, and build further features on top of it.
The XML classes in the .NET Framework provide basic functions for parsing, querying, and transforming XML data. You will find classes that support XPath queries and XSLT transformations, and classes that read and write XML documents. In addition, the .NET Framework contains classes for other XML-related tasks, such as object serialization (the XmlSerializer and SoapFormatter classes), application configuration (the AppSettingsReader class), and data storage (the DataSet class). In this article, I discuss only the classes that implement basic XML I/O operations.
XML Parsing Models
Since XML is a markup language, a tool is needed to parse the document and interpret the information stored in it according to a defined syntax. That tool is the XML parser: a component that reads markup text and returns platform-specific objects.
All XML parsers, regardless of platform, fall into one of two categories: tree-based or event-based processors. These two categories are usually implemented as XMLDOM (the Microsoft XML Document Object Model) and SAX (Simple API for XML). An XMLDOM parser is a generic tree-based API that presents an XML document as an in-memory tree structure. A SAX parser is an event-based API that works directly on the XML data stream and reports each element as it is encountered. In general, a DOM can be populated from a SAX stream, so the two approaches are not mutually exclusive.
The SAX and XMLDOM parsers differ considerably in how they work. XMLDOM has a well-defined set of functionality that you cannot extend, and processing a large file requires a lot of memory to hold the document tree.
A SAX parser lets client applications handle parsing events through instances of platform-specific handler objects. The SAX parser controls the whole process and pushes data to the handler, which in turn accepts or ignores it. This model requires only a small amount of memory.
The .NET Framework fully supports the XMLDOM model, but it does not support the SAX model. Why? The .NET Framework offers two different parsing models: the XMLDOM parser and XML readers. There is no SAX parser, but that does not mean SAX-like functionality is missing. Everything SAX does can be implemented easily, and more effectively, with an XML reader. Unlike a SAX parser, a .NET Framework reader works under the control of the client application. The application can therefore pull only the data it actually needs and skip the rest of the XML stream. With SAX, by contrast, the parser passes all information to the application, whether it is useful or not.
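The pull model described above can be sketched as follows. This is a minimal illustration, not code from the original article: the PullModelSketch class, the FirstMagName method, and the Mag/name vocabulary are assumptions chosen for the example. The application drives the reader and stops as soon as it has the value it needs.

```csharp
using System;
using System.IO;
using System.Xml;

static class PullModelSketch
{
    // Returns the "name" attribute of the first <Mag> element, or null if
    // there is none. Element and attribute names are illustrative assumptions.
    public static string FirstMagName(TextReader input)
    {
        XmlTextReader reader = new XmlTextReader(input);
        try
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Mag")
                {
                    // The application decides to stop here; the remaining
                    // nodes are never requested from the reader.
                    return reader.GetAttribute("name");
                }
            }
            return null;
        }
        finally
        {
            reader.Close();
        }
    }

    static void Main()
    {
        string xml = "<Mags><Mag name=\"MSDN Magazine\"/><Mag name=\"MSDN Voices\"/></Mags>";
        Console.WriteLine(FirstMagName(new StringReader(xml))); // prints "MSDN Magazine"
    }
}
```

With a push-style parser, the handler would still be notified about the second Mag element; here the loop simply returns.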
A reader works on .NET Framework streams in a way that resembles a database cursor. Interestingly, the classes that implement this cursor-like parsing model also provide the underlying support for the XMLDOM parser in the .NET Framework. The XmlReader and XmlWriter abstract classes are the foundation of all the XML classes in the .NET Framework, including the XMLDOM classes, the ADO.NET-related classes, and the configuration classes. So in the .NET Framework you have two ways to process XML data: work on it directly with the XmlReader and XmlWriter classes, or use the XMLDOM model. For more information about reading documents in the .NET Framework, see the Cutting Edge column in the August 2002 issue of MSDN Magazine.
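As a rough comparison of the two approaches, the sketch below counts the elements of the same document once with a reader and once with the XMLDOM class XmlDocument. The class and method names (TwoApproachesSketch, CountWithReader, CountWithDom) are hypothetical and only serve the illustration.

```csharp
using System;
using System.IO;
using System.Xml;

static class TwoApproachesSketch
{
    // Streaming approach: visit nodes one at a time, keeping nothing in memory.
    public static int CountWithReader(string xml)
    {
        int count = 0;
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    count++;
            }
        }
        finally
        {
            reader.Close();
        }
        return count;
    }

    // Tree approach: load the whole document into memory, then query it.
    public static int CountWithDom(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.GetElementsByTagName("*").Count;
    }

    static void Main()
    {
        string xml = "<Mags><Mag/><Mag/></Mags>";
        Console.WriteLine(CountWithReader(xml)); // prints 3
        Console.WriteLine(CountWithDom(xml));    // prints 3
    }
}
```

Both calls give the same answer; the difference lies in how much of the document is held in memory while the answer is computed.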
XmlReader class
An XML reader provides a programming interface for connecting to an XML document and pulling out the data you want. Looked at more closely, a reader works much like a desktop application extracting data from a database. The database service returns a cursor object that represents the whole result set and references the start of the target data. Similarly, the client of an XML reader receives a reference to a reader instance. That instance abstracts the underlying data stream and presents the retrieved data as an XML tree. Reader classes provide a read-only, forward-only cursor, and you use their methods to move the cursor through each piece of data in the result.
From the reader's point of view, an XML document is not a file of tagged text but a serialized set of nodes. It is a special cursor model in the .NET Framework, and you will not find another API like it there.
A reader differs from an XMLDOM parser. An XML reader is read-only and has no notion of parent, child, ancestor, or sibling nodes. In the .NET Framework, reading and writing XML documents are two completely separate functions, handled by the XmlReader and XmlWriter classes respectively. To edit an XML document, you can use the XMLDOM parser or design a class that combines both functions. Let's start by looking at what a reader can do.
XmlReader is an abstract class that you can inherit from and extend. User applications typically work with one of three derived classes: XmlTextReader, XmlValidatingReader, or XmlNodeReader. All these classes share the properties listed in Figure 1 and the methods listed in Figure 2. Note that the values of some properties depend on the actual reader class and may differ from the base class, so the description of each property in Figure 1 refers to the base class. For example, the CanResolveEntity property returns true for the XmlValidatingReader class but false for the XmlTextReader class. Similarly, the actual behavior of some methods in Figure 2 may vary from class to class. For example, the methods that deal with attributes have no meaningful effect when the current node is not an element.
The XmlTextReader class provides fast, read-only, forward-only access to an XML data stream. The reader verifies that the XML document is well-formed and throws an exception if it is not. XmlTextReader checks that the DTD is well-formed, but does not use the DTD to validate the document. It loads an XML document from a file name, a URL, or a stream, and then processes the document data quickly. If you need to validate the document data, use the XmlValidatingReader class.
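A well-formedness check built on this behavior might look like the sketch below. The IsWellFormed helper and its class are hypothetical names; the sketch assumes the malformed markup surfaces as an XmlException, which is the exception type the reader raises for parse errors.

```csharp
using System;
using System.IO;
using System.Xml;

static class WellFormedSketch
{
    // Reads the whole document and reports whether the reader accepted it.
    public static bool IsWellFormed(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            while (reader.Read())
            {
                // Nothing to do: reading every node is the check itself.
            }
            return true;
        }
        catch (XmlException)
        {
            // Raised when the markup is not well-formed.
            return false;
        }
        finally
        {
            reader.Close();
        }
    }

    static void Main()
    {
        Console.WriteLine(IsWellFormed("<a><b/></a>")); // prints True
        Console.WriteLine(IsWellFormed("<a><b></a>"));  // prints False
    }
}
```

Note that this only checks well-formedness; it does not validate the document against a DTD or schema.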
You can create an XmlTextReader instance in several ways, loading the XML from a file on disk, a URL, a stream, or a text reader:
XmlTextReader reader = new XmlTextReader(file);
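The sketch below shows two of these sources side by side, a TextReader and a Stream, using in-memory data so that it runs without any files. The FirstElementName helper and the class name are hypothetical; the file-name and URL forms appear only as comments, with placeholder arguments.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class ReaderSourcesSketch
{
    // Returns the name of the first element the reader encounters, or null.
    public static string FirstElementName(XmlTextReader reader)
    {
        try
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    return reader.Name;
            }
            return null;
        }
        finally
        {
            reader.Close();
        }
    }

    static void Main()
    {
        string xml = "<?xml version=\"1.0\"?><Mags/>";

        // From a TextReader wrapping an in-memory string.
        Console.WriteLine(FirstElementName(new XmlTextReader(new StringReader(xml)))); // prints Mags

        // From a Stream of UTF-8 bytes.
        Stream stream = new MemoryStream(Encoding.UTF8.GetBytes(xml));
        Console.WriteLine(FirstElementName(new XmlTextReader(stream))); // prints Mags

        // From a file name or URL (placeholder arguments, not executed here):
        //   new XmlTextReader("mags.xml");
        //   new XmlTextReader("http://example.com/mags.xml");
    }
}
```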
Note that all public constructors of the XmlTextReader class require you to specify a data source, which can be a stream, a file, or another source. The default constructor of XmlTextReader is protected, so you cannot use it directly. As with all reader classes in the .NET Framework (such as SqlDataReader), once the reader object is connected and open, you use the Read method to access the data. At the start, call Read to move the pointer onto the first node. After that, you can use Read or other methods (such as Skip, MoveToContent, and ReadInnerXml) to move the pointer to the next node. To process the content of a whole XML document, loop on the return value of the Read method, which is a Boolean: Read returns true while there are more nodes to read, and false once the end of the document has been reached.
Figure 3 Outputting an XML Document Node Layout
// Requires: using System.IO; using System.Xml;
string GetXmlFileNodeLayout(string file)
{
    // Create an XmlTextReader that points to the target XML document
    XmlTextReader reader = new XmlTextReader(file);

    // Loop over the nodes and collect the layout text in a StringWriter
    StringWriter writer = new StringWriter();
    string tabPrefix = "";
    while (reader.Read())
    {
        // Write the start tag if the node is an element
        if (reader.NodeType == XmlNodeType.Element)
        {
            // Indent by reader.Depth tabs, then write the element name inside <>
            tabPrefix = new string('\t', reader.Depth);
            writer.WriteLine("{0}<{1}>", tabPrefix, reader.Name);
        }
        else
        {
            // Write the end tag if the node is an end element
            if (reader.NodeType == XmlNodeType.EndElement)
            {
                tabPrefix = new string('\t', reader.Depth);
                writer.WriteLine("{0}</{1}>", tabPrefix, reader.Name);
            }
        }
    }

    // Get the collected text and close the writer
    string buf = writer.ToString();
    writer.Close();

    // Close the reader and its underlying stream
    reader.Close();
    return buf;
}
Figure 3 shows a simple function that outputs the element layout of a given XML document. The function opens the XML document and then loops over all of its content. Each call to the Read method moves the reader's pointer forward by one node. Most of the time Read takes you from one element node to the next, but consecutive nodes can be of different types. Read never moves onto attribute nodes, however. The reader's MoveToContent method lets the pointer jump over the heading nodes and land on the first content node. In doing so, it skips nodes of type ProcessingInstruction, DocumentType, Comment, Whitespace, and SignificantWhitespace.
Each node's type is a value of the XmlNodeType enumeration. The code in Figure 3 uses only two of them: Element and EndElement. The output reproduces the structure of the original document, discarding attributes and node content and writing only element names. Suppose we use the following XML snippet:
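The navigation methods can be tried out with a short sketch. The Walk helper, the class name, and the Archive and Old element names are assumptions made for the example. It uses MoveToContent to jump over the XML declaration and a comment, then Read to step onto the first child, then Skip, which jumps over the current element together with its children.

```csharp
using System;
using System.IO;
using System.Xml;

static class NavigationSketch
{
    // Returns the names of the nodes reached after MoveToContent, Read,
    // and Skip, joined by commas.
    public static string Walk(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            reader.MoveToContent();      // passes the declaration and the comment
            string first = reader.Name;  // the root element

            reader.Read();               // steps onto the first child element
            string second = reader.Name;

            reader.Skip();               // jumps over that child and its subtree
            string third = reader.Name;  // the next sibling

            return first + "," + second + "," + third;
        }
        finally
        {
            reader.Close();
        }
    }

    static void Main()
    {
        string xml = "<?xml version=\"1.0\"?><!-- sample -->" +
                     "<Mags><Archive><Old/></Archive><Mag/></Mags>";
        Console.WriteLine(Walk(xml)); // prints Mags,Archive,Mag
    }
}
```

The sample string contains no whitespace between tags, so every Read lands on an element; with indented input the reader would also report whitespace nodes.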
<Mags>
    <Mag name="MSDN Magazine">
        MSDN Magazine
    </Mag>
    <Mag name="MSDN Voices">
        MSDN Voices
    </Mag>
</Mags>
The output result of the above program is as follows:
<Mags>
    <Mag>
    </Mag>
    <Mag>
    </Mag>
</Mags>
The indentation of child nodes is based on the reader's Depth property, which returns an integer representing the nesting level of the current node. All text is accumulated in a StringWriter object, a convenient stream-based class that wraps the StringBuilder class.
As mentioned above, the Read method does not visit attribute nodes automatically. To access the attributes of the current element, use a simple loop driven by the return value of the MoveToNextAttribute method. The following code visits all the attributes of the current node and concatenates each attribute name and value into a comma-separated string:
if (reader.HasAttributes)
    while (reader.MoveToNextAttribute())
        buf += reader.Name + "=\"" + reader.Value + "\",";
reader.MoveToElement();
When you finish processing the attributes, call the MoveToElement method to return the pointer to the element node that owns them. Strictly speaking, MoveToElement does not really move the pointer, because the pointer never leaves the element node while the attributes are being processed. While attributes are visited, an internal member tracks the current attribute and the reader's properties take their values from it; MoveToElement simply resets that member. For example, while the reader is positioned on an attribute, the Name property returns the attribute name; after a call to MoveToElement, it returns the name of the owning element again. If you have no further work to do on the current node, you do not need to call MoveToElement.
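A self-contained version of the attribute loop, with MoveToElement restoring the element context, could look like this sketch. The DescribeAttributes helper, the class name, and the issue attribute are hypothetical, and unlike the fragment above it places commas only between entries instead of after each one.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class AttributeSketch
{
    // Builds a comma-separated list of name="value" pairs for the element
    // the reader is currently positioned on.
    public static string DescribeAttributes(XmlTextReader reader)
    {
        StringBuilder buf = new StringBuilder();
        if (reader.HasAttributes)
        {
            while (reader.MoveToNextAttribute())
            {
                if (buf.Length > 0)
                    buf.Append(",");
                buf.Append(reader.Name).Append("=\"").Append(reader.Value).Append("\"");
            }
            // Return to the owning element so Name reports the element again.
            reader.MoveToElement();
        }
        return buf.ToString();
    }

    static void Main()
    {
        string xml = "<Mag name=\"MSDN Magazine\" issue=\"August 2002\"/>";
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        reader.MoveToContent();                         // position on <Mag>
        Console.WriteLine(DescribeAttributes(reader));  // prints name="MSDN Magazine",issue="August 2002"
        Console.WriteLine(reader.Name);                 // prints Mag
        reader.Close();
    }
}
```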