Easily Process XML Data in the .NET Framework (Part 1)

In the .NET Framework, the XmlTextReader and XmlTextWriter classes provide read and write operations on XML data. This article describes the architecture of XML readers and how they relate to the XMLDOM and SAX parsers. It also shows how to use readers to parse and validate XML documents, how to create well-formed XML documents, and how to read and write large XML documents using the Base64 and BinHex encoding functions. Finally, it explains how to implement a stream-based read/write parser that encapsulates the readers in a single class.

About three years ago, I attended a software workshop on the theme "Without XML, there is no programming future." XML has indeed advanced step by step, and it is now built into the .NET Framework. In this article, I'll explain the role and internal characteristics of the .NET Framework APIs for processing XML documents, and then demonstrate some of their common features.

XML: From MSXML to .NET

Before the .NET Framework, you were accustomed to writing XML-driven Windows applications with MSXML services, a COM-based class library. Unlike the classes in the .NET Framework, some of the MSXML library code lies deeper than an API and is embedded in the underlying operating system. MSXML can certainly communicate with your application, but it does not really integrate with the external environment.

The msxml class library can be imported into the Win32, and can be used in the CLR, but it is only available as an external server component. However, applications based on the. NET Framework can be used directly to integrate XML classes with other namespaces of the. NET Framework, and the written code is easy to read.

As a stand-alone component, the MSXML parser provides some advanced features, such as asynchronous parsing. The XML classes in the .NET Framework do not provide this feature themselves, but by combining them with other classes of the framework you can easily obtain the same functionality and build more on top of it.

The XML classes in the .net framework provide basic functionality for parsing, querying, and transforming XML data. In the. NET framework, you can find classes that support XPath queries and XSLT transformations, and classes that read/write XML documents. In addition, the. NET Framework also contains other classes that handle XML, such as serialization of objects (XmlSerializer and the SoapFormatter Class), application configuration (Appsettingsreader classes), Data store (DataSet Class). In this article, I only discuss classes that implement basic XML I/O operations.

XML Parsing Models

Since XML is a markup language, there has to be a tool that analyzes and understands the information stored in a document according to a certain syntax. This tool is the XML parser, a component that reads the markup text and returns platform-specific objects.

All XML parsers, regardless of the platform they belong to, fall into one of two categories: tree-based or event-based processors. These two categories are usually implemented by XMLDOM (the Microsoft XML Document Object Model) and SAX (Simple API for XML). The XMLDOM parser is a generic, tree-based API that renders an XML document as an in-memory tree structure. The SAX parser is an event-based API that works on the XML data as a stream and processes each element as it is encountered. Typically, a DOM can be loaded from a SAX stream, so the two kinds of processing are not mutually exclusive.

Overall, the SAX parser is the opposite of the XMLDOM parser, and their parsing models are very different. XMLDOM has a well-defined set of functionality that you cannot extend. When it works on a large document, it takes up a lot of memory, because the whole document is held in memory.

The SAX parser uses client applications to handle profiling events through an instance of an existing, specified platform object. The SAX analyzer controls the entire process and "launches" the data into the handler, which in turn accepts or rejects processing data. The advantage of this pattern is that you need very little memory space.

The .net framework fully supports the XMLDOM mode, but it does not support sax mode. Why, then? Because the. NET framework supports two different profiling modes: The XMLDOM parser and the XML reader. It obviously does not support the SAX parser, but that does not mean that it does not provide functionality similar to the SAX parser. All of the functionality of Sax through the XML reader is easy to implement and more efficient to use. Unlike the SAX parser, the reader for the. NET Framework operates entirely under the client application. In this way, the application itself can "roll out" the data that is really needed, and then jump out of the XML data stream. The SAX parsing model handles all the information that is useful and useless to the application.

Readers are based on the .NET Framework stream model and work like a database cursor. Interestingly, the classes that implement this cursor-like parsing model also provide the low-level support for the XMLDOM parser in the .NET Framework. The two abstract classes XmlReader and XmlWriter are the foundation of all the XML classes in the .NET Framework, including the XMLDOM classes, the ADO.NET driver classes, and the configuration classes. So in the .NET Framework you have two options for working with XML data: process it directly with the XmlReader and XmlWriter classes, or use the XMLDOM model. More information about reading documents in the .NET Framework is available in the Cutting Edge column of the August 2002 issue of MSDN Magazine.
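The relationship between readers and the DOM can be seen in a short sketch: XmlDocument.Load accepts an XmlReader, so the tree is populated by pulling nodes from a reader. The class name, method name, and the "item" element are assumptions chosen for the example.

```csharp
using System;
using System.IO;
using System.Xml;

static class DomFromReaderSketch
{
    // Builds a DOM tree from a reader and counts the <item> elements in it.
    public static int CountItems(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        XmlDocument doc = new XmlDocument();
        doc.Load(reader);      // the DOM is filled by reading from the reader
        reader.Close();
        return doc.GetElementsByTagName("item").Count;
    }

    static void Main()
    {
        Console.WriteLine(CountItems("<list><item/><item/><item/></list>"));
    }
}
```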

The XmlReader Class

The xml reader supports a programming interface that interfaces with XML documents and "launches" the data you want. If you go deeper into the reader, you'll find that the reader works like the way our desktop application pulls data out of the database. The database service returns a cursor object that contains all the query result sets and returns a reference to the start address of the destination dataset. The client of the XML reader receives a reference to the reader instance. The instance extracts the underlying data stream and renders the extracted data as an XML tree. The reader class provides a read-only, forward-only cursor, and you can scroll through each of the data in the result set by scrolling the cursor in the method provided by the reader class.

From the reader's point of view, an XML document is not a file of tagged text but a serialized collection of nodes. This is a special cursor model in the .NET Framework, and you will not find a similar API elsewhere in the framework.

Readers differ from the XMLDOM parser in several ways. An XML reader is forward-only, has no notion of parent, child, ancestor, or sibling nodes, and is read-only. In the .NET Framework, reading and writing XML documents are two completely separate functions, handled by the XmlReader and XmlWriter classes respectively. To edit an XML document, you can either use the XMLDOM parser or design a class of your own that combines the two. Let's start by looking at the reader's programming features.

XmlReader is an abstract class that you can inherit from and extend. User programs are generally based on one of the following three classes: XmlTextReader, XmlValidatingReader, or XmlNodeReader. All of these classes share the properties listed in Figure 1 and the methods listed in Figure 2. Note that the value of some properties depends on the actual reader class and may differ from the base class. The description of each property in Figure 1 therefore refers to the base class. For example, the CanResolveEntity property returns true in the XmlValidatingReader class, but may return false in other reader classes. Similarly, the actual behavior of some of the methods in Figure 2 may differ from class to class. For example, all the methods that deal with attributes do nothing if the current node is not an element node.

The xmltextreader class quickly accesses the XML data stream in a forward-only, read-only manner. The reader first verifies that the XML document is well-formed and throws an exception if it is not. XmlTextReader checks that the DTD is well-formed, but does not use a DTD to validate the document. XmlTextReader the file name of an XML document, or its URL, or loads an XML document from a file stream, and then quickly processes XML document data. If you need to validate the document's data, you can use the XmlValidatingReader class.

You can create instances of the XmlTextReader class in several ways: from a file on disk, from a URL, from a stream, or from XML text held in memory:

    XmlTextReader reader = new XmlTextReader(file);
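The different data sources can be sketched as follows. The class name, the helper method, and the sample document are assumptions for illustration, and the file used in the third case is a temporary file created by the example itself.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class ReaderSourcesSketch
{
    // Returns the name of the first element the reader meets, then closes it.
    public static string FirstElementName(XmlTextReader reader)
    {
        try
        {
            while (reader.Read())
                if (reader.NodeType == XmlNodeType.Element)
                    return reader.Name;
            return null;
        }
        finally
        {
            reader.Close();
        }
    }

    static void Main()
    {
        string xml = "<config><item/></config>";

        // 1. From in-memory text, through a TextReader
        Console.WriteLine(FirstElementName(new XmlTextReader(new StringReader(xml))));

        // 2. From a stream
        Stream stream = new MemoryStream(Encoding.UTF8.GetBytes(xml));
        Console.WriteLine(FirstElementName(new XmlTextReader(stream)));

        // 3. From a file name (the same constructor also accepts a URL)
        string path = Path.GetTempFileName();
        File.WriteAllText(path, xml);
        Console.WriteLine(FirstElementName(new XmlTextReader(path)));
        File.Delete(path);
    }
}
```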

Note that all the public constructors of the XmlTextReader class require you to specify a data source, which can be a stream, a file, or another source. The default constructor of XmlTextReader is protected, so it cannot be used directly. As with all the reader classes in the .NET Framework (such as the SqlDataReader class), once the reader object is connected and open, you use the Read method to access the data. Initially, the only way to move the pointer to the first node is the Read method. After that, you can move to the next node with Read or with other methods (such as Skip, MoveToContent, and ReadInnerXml). To process the entire contents of an XML document, you loop on the return value of the Read method: Read returns a Boolean value that is false when the end of the document has been reached and true otherwise.

Figure 3 Outputting an XML Document Node Layout

    // Requires: using System.IO; using System.Xml;
    string GetXmlFileNodeLayout(string file)
    {
        // Create an XmlTextReader that points to the target XML document
        XmlTextReader reader = new XmlTextReader(file);

        // Loop over the nodes and write their layout to a StringWriter instance
        StringWriter writer = new StringWriter();
        string tabPrefix = "";

        while (reader.Read())
        {
            // Write the start tag if the node type is Element
            if (reader.NodeType == XmlNodeType.Element)
            {
                // Indent with one tab per level of reader.Depth,
                // then write the element name between < and >
                tabPrefix = new string('\t', reader.Depth);
                writer.WriteLine("{0}<{1}>", tabPrefix, reader.Name);
            }
            else
            {
                // Write the end tag if the node type is EndElement
                if (reader.NodeType == XmlNodeType.EndElement)
                {
                    tabPrefix = new string('\t', reader.Depth);
                    writer.WriteLine("{0}</{1}>", tabPrefix, reader.Name);
                }
            }
        }

        // Collect the accumulated text
        string buf = writer.ToString();
        writer.Close();

        // Close the reader and its underlying stream
        reader.Close();

        return buf;
    }


Figure 3 shows a simple function that outputs the element nodes of a given XML document. The function first opens the XML document and then loops through all of its content. Each time you call the Read method, the reader's pointer moves forward by one node. In most cases you can work through element nodes with Read alone, but be aware that moving from one node to the next may take you between nodes of different types. The Read method never stops on attribute nodes. The reader's MoveToContent method lets the pointer jump from the heading nodes directly to the first content node. In doing so it skips over nodes of type ProcessingInstruction, DocumentType, Comment, Whitespace, and SignificantWhitespace.
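A minimal sketch of MoveToContent follows. The class name, method name, and sample prolog are assumptions for illustration. The method returns the type of the node it stops on, so the example reports both the node type and the node name.

```csharp
using System;
using System.IO;
using System.Xml;

static class MoveToContentSketch
{
    // Skips the prolog and returns "<node type>:<node name>" for the first content node.
    public static string FirstContentNode(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        XmlNodeType type = reader.MoveToContent();
        string result = type + ":" + reader.Name;
        reader.Close();
        return result;
    }

    static void Main()
    {
        string xml = "<?xml version=\"1.0\"?>\n" +
                     "<!-- a comment in the prolog -->\n" +
                     "<?render fast?>\n" +
                     "<catalog><entry/></catalog>";
        Console.WriteLine(FirstContentNode(xml));
    }
}
```

With a plain Read loop, the same document would first yield the declaration, the comment, the processing instruction, and the whitespace between them before reaching the root element.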

The type of each node is one of the values of the XmlNodeType enumeration. The code shown in Figure 3 uses only two of these types: Element and EndElement. The output reproduces the structure of the original document, ignoring the attributes and text content of the XML elements and writing only the element names. Let's say we use the following XML fragment:

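[XML fragment: the markup was lost in extraction. The surviving text shows an outer element containing two child elements whose text contents are "MSDN Magazine" and "MSDN Voices".]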
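The program produces the following output:

[Program output: six lines, lost in extraction. Judging from the fragment above, they are the start and end tags of the outer element and of its two child elements.]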

The indentation of the child nodes is based on the reader's Depth property, which returns an integer that represents the nesting level of the current node. All the text is accumulated in a StringWriter object, a very handy stream-based class that wraps the StringBuilder class.

As mentioned earlier, the Read method does not visit attribute nodes automatically. To access the attribute nodes of the current element, you must traverse them with a simple loop controlled by the return value of the MoveToNextAttribute method. The following code visits all the attributes of the current node and concatenates each attribute's name and value into a comma-separated string:

    if (reader.HasAttributes)
        while (reader.MoveToNextAttribute())
            buf += reader.Name + "=\"" + reader.Value + "\", ";
    reader.MoveToElement();

When you finish processing the attributes, call the MoveToElement method to return the pointer to the element node that owns them. Strictly speaking, MoveToElement does not really move the pointer, because the pointer never leaves the element node while its attributes are being processed. MoveToElement simply resets the internal member from which the property values are taken. For example, the Name property returns the name of the attribute being read, and then, once you call MoveToElement, the name of the element that owns it. If you have no further work to do on the node, you do not have to call MoveToElement.
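The effect of MoveToElement on the Name property can be seen in a small self-contained sketch. The class name, method name, and sample element are assumptions for illustration. The helper collects the attributes of the first element and then appends the value of Name after MoveToElement has been called.

```csharp
using System;
using System.IO;
using System.Xml;

static class AttributeLoopSketch
{
    // Lists the attributes of the first element, then the element's own name.
    public static string DescribeFirstElement(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        string buf = "";

        reader.MoveToContent();                 // position on the first element
        if (reader.HasAttributes)
        {
            while (reader.MoveToNextAttribute())
                buf += reader.Name + "=\"" + reader.Value + "\", ";   // Name is the attribute name here
            reader.MoveToElement();             // back to the owning element
        }
        buf += "element=" + reader.Name;        // Name is the element name again

        reader.Close();
        return buf;
    }

    static void Main()
    {
        // Prints: id="7", lang="en", element=book
        Console.WriteLine(DescribeFirstElement("<book id=\"7\" lang=\"en\"/>"));
    }
}
```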

Author: Chyich (translated)/aspcool


