Use C# to Read XML Documents (Stream Model)

Last Update:2018-12-05 Source: Internet

Author: User


In the System.Xml namespace, XmlReader and XmlWriter are abstract classes for reading and writing XML documents. Both use a stream model.
The XmlReader class reads XML documents. It provides fast, non-cached, forward-only, read-only access to XML data.

1. XmlReader has three subclasses:
1) XmlTextReader: the fastest XmlReader implementation. It checks that the XML is well-formed but does not validate it. This reader cannot expand general entities (a DTD concept) and does not support default attributes.
The reader throws an XmlException when an XML parse error occurs.

2) XmlValidatingReader: an XmlReader implementation that can validate against a DTD or schema. This reader can expand general entities and supports default attributes.

3) XmlNodeReader: an XmlReader implementation that reads XML data from an XmlNode.

Note: XmlTextReader is the class generally used to read XML documents.
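
To make the difference between the subclasses concrete, here is a minimal sketch that reads the same XML once with XmlTextReader (from text) and once with XmlNodeReader (from an in-memory XmlDocument). The sample XML, the class name ReaderKinds, and the helper ElementNames are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class ReaderKinds
{
    // Hypothetical helper: collects the names of all Element nodes a reader visits.
    public static List<string> ElementNames(XmlReader reader)
    {
        var names = new List<string>();
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
                names.Add(reader.Name);
        }
        reader.Close();
        return names;
    }

    // Reads the XML text directly with XmlTextReader.
    public static List<string> FromText(string xml)
    {
        return ElementNames(new XmlTextReader(new StringReader(xml)));
    }

    // Loads the XML into an XmlDocument first, then walks it with XmlNodeReader.
    public static List<string> FromNode(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return ElementNames(new XmlNodeReader(doc));
    }

    public static void Main()
    {
        const string xml = "<bookstore><book><price>32</price></book></bookstore>"; // made-up sample
        Console.WriteLine(string.Join(",", FromText(xml)));
        Console.WriteLine(string.Join(",", FromNode(xml)));
    }
}
```

Both readers expose the same XmlReader interface, so the same loop works for either source.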

========================================================================

2. XmlTextReader class
1) XmlTextReader treats an XML document as a serialized sequence of nodes, that is, a node stream.

2) Each of the following counts as a node: an element's start tag (node type Element), an element's attributes (Attribute), an element's text content (Text), white space or line breaks between tags (SignificantWhitespace or Whitespace), and an element's end tag (EndElement).

3) Read() is the main instance method of the XmlTextReader class. Each call reads the next node in the stream.
However, Read() never stops on Attribute nodes. If the current node is an Element node, the next node read is a Text node (if the element has text content) or a Whitespace node (if there is white space or a line break between the current tag and the next one; without it, there is no Whitespace node).
That is, Read() moves between nodes such as Element, Text, Whitespace, and EndElement, and skips Attribute nodes.

4) If the current node is an Element node, use the following methods to read its attribute nodes:
MoveToAttribute, MoveToFirstAttribute, and MoveToNextAttribute
If the current node is an Attribute node and you want to return to the element that owns it, use the MoveToElement method.

5) Depending on the type of the current node, some properties of the XmlReader object are not meaningful.
For example, the AttributeCount property of an XmlTextReader object is meaningful only for nodes of the Element, DocumentType, and XmlDeclaration types.

6) Handling of SignificantWhitespace/Whitespace nodes:
The WhitespaceHandling property of the XmlTextReader object specifies how white space is handled (All, Significant, or None).
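
As a rough illustration of the WhitespaceHandling setting, the sketch below counts the white space nodes that the reader reports for the same input under two settings. The sample XML, the class name WhitespaceDemo, and the helper CountWhitespaceNodes are assumptions made for this example.

```csharp
using System;
using System.IO;
using System.Xml;

static class WhitespaceDemo
{
    // Hypothetical helper: counts Whitespace and SignificantWhitespace nodes
    // reported by an XmlTextReader under the given WhitespaceHandling setting.
    public static int CountWhitespaceNodes(string xml, WhitespaceHandling handling)
    {
        var reader = new XmlTextReader(new StringReader(xml));
        reader.WhitespaceHandling = handling;
        int count = 0;
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Whitespace ||
                reader.NodeType == XmlNodeType.SignificantWhitespace)
            {
                count++;
            }
        }
        reader.Close();
        return count;
    }

    public static void Main()
    {
        // Made-up sample with a line break after <a>, </b>, and </c>.
        const string xml = "<a>\n<b>1</b>\n<c>2</c>\n</a>";
        Console.WriteLine("All:  " + CountWhitespaceNodes(xml, WhitespaceHandling.All));
        Console.WriteLine("None: " + CountWhitespaceNodes(xml, WhitespaceHandling.None));
    }
}
```

With WhitespaceHandling.None the reader should report no white space nodes at all, which is convenient when only elements and text matter.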

3. Node types
==========================================================================================
Node Type               Description                                Example XML
==========================================================================================
Attribute               An attribute                               id='000000'
------------------------------------------------------------------------------------------
CDATA                   A CDATA section                            <![CDATA[my escaped text]]>
------------------------------------------------------------------------------------------
Comment                 A comment                                  <!-- My comment -->
------------------------------------------------------------------------------------------
Document                The root document object of the document tree; provides access to
                        the entire XML document
------------------------------------------------------------------------------------------
DocumentFragment        A document fragment
------------------------------------------------------------------------------------------
DocumentType            The document type declaration              <!DOCTYPE ...>
------------------------------------------------------------------------------------------
Element                 An element (start tag)                     <item>
------------------------------------------------------------------------------------------
EndElement              An end element tag                         </item>
------------------------------------------------------------------------------------------
EndEntity               Returned when XmlReader reaches the end of an entity replacement
                        as a result of a call to ResolveEntity
------------------------------------------------------------------------------------------
Entity                  An entity declaration                      <!ENTITY ...>
------------------------------------------------------------------------------------------
EntityReference         A reference to an entity                   &num;
------------------------------------------------------------------------------------------
None                    Returned by XmlReader if the Read method has not been called
------------------------------------------------------------------------------------------
Notation                A notation in the document type            <!NOTATION ...>
                        declaration
------------------------------------------------------------------------------------------
ProcessingInstruction   A processing instruction                   <?pi test?>
------------------------------------------------------------------------------------------
SignificantWhitespace   White space between markup in a mixed content model, or white
                        space within the xml:space="preserve" scope
------------------------------------------------------------------------------------------
Text                    The text content of a node. It can appear as a child of Attribute,
                        DocumentFragment, Element, and EntityReference nodes.
------------------------------------------------------------------------------------------
Whitespace              White space between markup
------------------------------------------------------------------------------------------
XmlDeclaration          The XML declaration                        <?xml version='1.0'?>
==========================================================================================

4. XmlTextReader application example: read an XML document, tmp.xml

Content of tmp.xml:
-----------------------------------------------------------------------
<?xml version="1.0" encoding="GB2312"?>
<bookstore>
<book name="Cultural hardships">
<author nation="China" age="Contemporary">Yu Qiuyu &amp;</author>
<price>32</price>
<press>China Publishing House</press>
</book>
</bookstore>
(The document ends with a blank line.)
------------------------------------------------------------------------

C# code:
------------------------------------------------------------------------
using System;
using System.Xml;

XmlTextReader xr4 = new XmlTextReader("tmp.xml"); // open the document; nodes are read as a stream, not loaded into memory all at once
xr4.WhitespaceHandling = WhitespaceHandling.All; // report all white space nodes
while (xr4.Read())
{
    Console.Write(("Type:" + xr4.NodeType).PadRight(20)); // print the node type
    Console.Write(("Name:" + xr4.Name).PadRight(18)); // print the node name
    Console.WriteLine("Value:" + xr4.Value); // print the node value

    if (xr4.HasAttributes) // true only if the current node has attributes
    {
        int tmp = xr4.AttributeCount;
        for (int i = 0; i < tmp; i++) // print each attribute of the current node
        {
            xr4.MoveToAttribute(i); // move to the attribute at index i
            Console.WriteLine(("Type:" + xr4.NodeType).PadRight(20)
                + ("Name:" + xr4.Name).PadRight(16)
                + ("Value:" + xr4.Value.PadRight(10))
                + ("AttributeCount:" + xr4.AttributeCount));
            xr4.MoveToElement(); // return to the node that owns the attribute
        }
    }
}
xr4.Close(); // close the reader
Console.ReadLine();
------------------------------------------------------------------------

Output:
----------------------------------------------------------------------------------
Type: XmlDeclaration Name: xml Value: version="1.0" encoding="GB2312"
Type: Attribute Name: version Value: 1.0 AttributeCount: 2
Type: Attribute Name: encoding Value: GB2312 AttributeCount: 2
Type: Whitespace Name: Value:

Type: Element Name: bookstore Value: // This node has no attributes; AttributeCount is 0.
Type: Whitespace Name: Value: // This node has no attributes; AttributeCount is 0.

Type: Element Name: book Value:
Type: Attribute Name: name Value: Cultural hardships AttributeCount: 1
Type: Whitespace Name: Value:

Type: Element Name: author Value:
Type: Attribute Name: nation Value: China AttributeCount: 2
Type: Attribute Name: age Value: Contemporary AttributeCount: 2
Type: Text Name: Value: Yu Qiuyu &
Type: EndElement Name: author Value:
Type: Whitespace Name: Value:

Type: Element Name: price Value:
Type: Text Name: Value: 32
Type: EndElement Name: price Value:
Type: Whitespace Name: Value:

Type: Element Name: press Value:
Type: Text Name: Value: China Publishing House
Type: EndElement Name: press Value:
Type: Whitespace Name: Value:

Type: EndElement Name: book Value: // the end tag of the book element, </book>
Type: Whitespace Name: Value:

Type: EndElement Name: bookstore Value: // the end tag </bookstore>
Type: Whitespace Name: Value: // the blank line at the end of the document
----------------------------------------------------------------------------------

5. Common methods of the XmlTextReader class
1) Moving between element nodes and attribute nodes
If the current node is an element node and the element has attributes:
· Use the MoveToAttribute() method to move to a specific attribute node. This method takes the attribute name or index.
· Use the MoveToFirstAttribute() method to move to the first attribute; it returns true if the attribute exists.
· If the current node is an element node, MoveToNextAttribute() is equivalent to MoveToFirstAttribute().
If the reader is already on an attribute node and a next attribute exists, calling this method moves to the next attribute node and returns true.
Otherwise, the reader position remains unchanged and false is returned.

If the reader is positioned on an attribute, MoveToElement() moves it back to the element that owns the attribute.
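
The attribute navigation described above can be sketched as follows. The class name AttributeWalk, the helper ReadAttributes, and the one-element sample are hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class AttributeWalk
{
    // Hypothetical helper: lists "name=value" for every attribute of the root
    // element, then records where the reader ends up after MoveToElement().
    public static List<string> ReadAttributes(string xml)
    {
        var result = new List<string>();
        var reader = new XmlTextReader(new StringReader(xml));
        reader.MoveToContent(); // position the reader on the root element

        if (reader.MoveToFirstAttribute())
        {
            do
            {
                result.Add(reader.Name + "=" + reader.Value);
            }
            while (reader.MoveToNextAttribute()); // false once no attribute is left

            reader.MoveToElement(); // go back to the element that owns the attributes
        }

        result.Add("now on " + reader.NodeType + " " + reader.Name);
        reader.Close();
        return result;
    }

    public static void Main()
    {
        // Made-up sample based on the author element used earlier.
        const string xml = "<author nation=\"China\" age=\"Contemporary\">Yu Qiuyu</author>";
        foreach (string line in ReadAttributes(xml))
            Console.WriteLine(line);
    }
}
```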

2) Skipping content: two methods
· One method is to call MoveToContent to move directly to a content node.
MoveToContent() checks whether the current node is a content node.
A content node is any non-white-space Text, CDATA, Element, EndElement, EntityReference, or EndEntity node. If the current node is not one of these types, the reader skips it
and keeps skipping until it finds the next content node or reaches the end of the file.
If the current node is an attribute node, this method moves the reader back to the element that owns the attribute.

Example:
-----------------------------------------------------------------------------
if (reader.MoveToContent() == XmlNodeType.Element && reader.Name == "price")
{
    _price = reader.ReadString(); // read the text content of the price element
}
-----------------------------------------------------------------------------

· The other method is to call Skip directly. Skip jumps over all child nodes of the current node and moves to the next sibling node.
If the current node type is XmlNodeType.Element, calling Skip moves the reader to the next node at the same level.
If the current node is an attribute node, calling Skip moves the reader to the next sibling of the element that owns the attribute.
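
A small sketch of Skip, assuming a made-up document in which element a has two children and a sibling b. The class name SkipDemo and the helper NodeAfterSkip are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

static class SkipDemo
{
    // Hypothetical helper: reads until it finds the named element, calls Skip()
    // on it, and reports the node the reader is positioned on afterwards.
    public static string NodeAfterSkip(string xml, string elementToSkip)
    {
        var reader = new XmlTextReader(new StringReader(xml));
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == elementToSkip)
            {
                reader.Skip(); // jumps over the element and all of its children
                string position = reader.NodeType + ":" + reader.Name;
                reader.Close();
                return position;
            }
        }
        reader.Close();
        return null; // the element was not found
    }

    public static void Main()
    {
        const string xml = "<root><a><a1/><a2/></a><b/></root>"; // made-up sample
        Console.WriteLine(NodeAfterSkip(xml, "a"));
    }
}
```

In this sample the children a1 and a2 are never visited; after Skip the reader should be on the sibling element b.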

3) Read methods
· Read() method: returns true if the next node was read successfully, or false if there are no more nodes.
A newly created reader holds no information. Read() must be called to read the first node.

· ReadStartElement() method:
Checks that the current node is an element (Element node) and advances the reader to the next node.

· ReadEndElement() method:
Checks that the current node is an end tag (EndElement node) and advances the reader to the next node.

· ReadAttributeValue() method:
Parses the value of the current attribute node into one or more Text, EntityReference, or EndEntity nodes.
Returns true if there is a node to return. Returns false if the reader is not positioned on an attribute node at the initial call, or if the entire attribute value has already been read.
An empty attribute (such as misc="") returns true, with a single node whose value is String.Empty.
Typically, once the reader is on an attribute node, ReadAttributeValue is called in a loop to break the attribute value down.

Example: read the XML document <book genre='novel' misc='sale-item &h; 000000'></book>
---------------------------------------------------------------------
.......
reader.MoveToAttribute("misc"); // move to the misc attribute
while (reader.ReadAttributeValue()) // the misc value contains an entity reference, so it is split into several nodes
{ // the loop ends once the whole attribute value has been read
    if (reader.NodeType == XmlNodeType.EntityReference) // an entity reference node
        Console.WriteLine("{0} {1}", reader.NodeType, reader.Name);
    else
        Console.WriteLine("{0} {1}", reader.NodeType, reader.Value);
}
--------------------------------------------------------------------
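
The ReadStartElement and ReadEndElement methods described in this part can be combined as in the following sketch. The two-level sample document, the class name StartEndDemo, and the helper ReadPrice are assumptions made for this example.

```csharp
using System;
using System.IO;
using System.Xml;

static class StartEndDemo
{
    // Hypothetical helper: walks <book><price>...</price></book> with
    // ReadStartElement/ReadEndElement and returns the text of the price element.
    public static string ReadPrice(string xml)
    {
        var reader = new XmlTextReader(new StringReader(xml));
        reader.MoveToContent();            // position on <book>
        reader.ReadStartElement("book");   // verify <book>, advance to <price>
        reader.ReadStartElement("price");  // verify <price>, advance to its text
        string price = reader.ReadString();
        reader.ReadEndElement();           // verify </price>, advance to </book>
        reader.ReadEndElement();           // verify </book>, advance past it
        reader.Close();
        return price;
    }

    public static void Main()
    {
        Console.WriteLine(ReadPrice("<book><price>32</price></book>")); // made-up sample
    }
}
```

If the document does not have the expected shape, these calls throw an XmlException instead of silently continuing, which makes them useful for reading documents with a known structure.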

4) Reading large content as a stream
The ReadChars, ReadBinHex, and ReadBase64 methods are used to read large streams. ReadChars reads the text (US-ASCII) as is, ReadBase64 decodes Base64-encoded text, and ReadBinHex decodes BinHex-encoded data.
ReadChars, ReadBinHex, and ReadBase64 can be used only on elements. They do not work on other node types.
All three methods return everything between the start and end tags of an element, including any markup, just like reading a stream.

· ReadChars method: public int ReadChars(char[] buffer, int index, int count);
Reads the content of an element into a character buffer. By calling this method repeatedly, you can read a very large stream of text embedded in the current element.
This method is designed only for element nodes. On other node types, ReadChars returns 0.
This method returns the actual character content of the element, that is, everything between the element's start and end tags, including markup.
ReadChars ignores XML markup that is not well-formed.
When ReadChars reaches the end of the character stream, it returns 0 and positions the reader after the end tag.

· ReadBase64 method: public int ReadBase64(byte[] array, int offset, int len);
Like ReadChars, this method can be called repeatedly to read large embedded text streams.
It decodes Base64 content and writes the decoded binary bytes (for example, an inline Base64-encoded GIF image) to the buffer.

· ReadBinHex method: public int ReadBinHex(byte[] array, int offset, int len);
Like ReadChars, this method can be called repeatedly to read large embedded text streams.
It decodes BinHex content and writes the decoded binary bytes (for example, an inline BinHex-encoded GIF image) to the buffer.
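
As a minimal sketch of the repeated-call pattern, the code below reads the text of one element through a deliberately tiny buffer with ReadChars. The sample element, the buffer size, the class name ReadCharsDemo, and the helper ReadAllChars are assumptions for this illustration; a real buffer would be much larger.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class ReadCharsDemo
{
    // Hypothetical helper: reads the content of the root element in chunks of
    // bufferSize characters until ReadChars returns 0.
    public static string ReadAllChars(string xml, int bufferSize)
    {
        var reader = new XmlTextReader(new StringReader(xml));
        reader.MoveToContent(); // ReadChars must start on an element node

        var buffer = new char[bufferSize];
        var content = new StringBuilder();
        int charsRead;
        while ((charsRead = reader.ReadChars(buffer, 0, buffer.Length)) > 0)
        {
            content.Append(buffer, 0, charsRead); // keep only the characters actually read
        }
        reader.Close();
        return content.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ReadAllChars("<msg>Hello, stream</msg>", 4)); // made-up sample
    }
}
```

ReadBase64 and ReadBinHex follow the same loop shape, with a byte[] buffer in place of the char[] buffer.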

5) Reading character content
· ReadElementString method: reads a simple, text-only element.
When ReadElementString is called, the reader moves to the next content node and reads its simple text content. If the node is not a text-only element, an XmlException is thrown. After reading, the reader moves on to the next node.

· ReadString method: reads the content of an element or text node as a string.
If the reader is not positioned on an element or text node, or there is no more text content in the current context, an empty string is returned.

· ReadInnerXml method: reads all the content of a node (including child elements and text content) as a string.
If the reader is on a start tag, this method returns all content between the start tag and the corresponding end tag.
If the reader is on an attribute node, this method returns the attribute value.
If the current node is neither an element nor an attribute, an empty string is returned.

· ReadOuterXml method: similar to ReadInnerXml, but it also returns the start tag and end tag.
If the reader is on a start tag, this method returns the string <start tag>...content...</end tag>.
If the reader is on an attribute node, this method returns the string attribute name="attribute value".
If the current node is neither an element nor an attribute, an empty string is returned.
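
The difference between these methods is easiest to see side by side. The sketch below applies ReadInnerXml, ReadOuterXml, and ReadElementString to the same made-up document; the class name ContentDemo and its helpers are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

static class ContentDemo
{
    // Made-up sample without attributes or white space.
    public const string Sample =
        "<book><title>Cultural hardships</title><price>32</price></book>";

    static XmlTextReader OpenOnRoot(string xml)
    {
        var reader = new XmlTextReader(new StringReader(xml));
        reader.MoveToContent(); // position on the root element's start tag
        return reader;
    }

    // Everything between <book> and </book>.
    public static string Inner(string xml)
    {
        XmlTextReader reader = OpenOnRoot(xml);
        string result = reader.ReadInnerXml();
        reader.Close();
        return result;
    }

    // Same as Inner, but including the <book> start and end tags.
    public static string Outer(string xml)
    {
        XmlTextReader reader = OpenOnRoot(xml);
        string result = reader.ReadOuterXml();
        reader.Close();
        return result;
    }

    // Text of the first child element, read with ReadElementString.
    public static string FirstChildText(string xml)
    {
        XmlTextReader reader = OpenOnRoot(xml);
        reader.ReadStartElement();                  // step from <book> to <title>
        string result = reader.ReadElementString(); // text-only element
        reader.Close();
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(Inner(Sample));
        Console.WriteLine(Outer(Sample));
        Console.WriteLine(FirstChildText(Sample));
    }
}
```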

========================================================================

6. XmlValidatingReader class
1) XmlValidatingReader is a reader that provides DTD, XDR, and XSD validation. It can validate data, resolve general entities, and supports default attributes. This class also inherits from XmlReader.

2) The XmlValidatingReader class is largely similar to XmlTextReader, but it adds properties such as ValidationType, Schemas, SchemaType, and XmlResolver.
· The ValidationType property specifies the type of validation. Possible values are Auto, DTD, Schema, XDR, and None.
· The Schemas property is used when validation needs multiple XDR or XSD schemas. It is an XmlSchemaCollection.
· The SchemaType property returns the schema type of the current node: an XSD built-in type or a user-defined type (simpleType/complexType).
· The XmlResolver property is used to resolve external entities (for example, external entities defined in a DTD).

3) Recommended steps for validating with the XmlValidatingReader class:
· Create an XmlTextReader object tr and pass it to the XmlValidatingReader constructor to create an object trv.
· Set the ValidationType property of trv (default value: Auto).
· Define an event handler and attach it to the ValidationEventHandler event:
trv.ValidationEventHandler += new ValidationEventHandler(this.ValidationEvent);
When a validation error occurs, the ValidationEventHandler event of trv is raised and the handler must deal with it.
· Use trv in the same way as an XmlTextReader object.

4) ValidationEventHandler events can occur only while the Read, ReadInnerXml, ReadOuterXml, or Skip method is being called and the ValidationType property of trv is not ValidationType.None.
If no ValidationEventHandler handler is provided, validation warnings (severity Warning) do not raise an exception and reading continues. On the first validation error (severity Error), trv throws an XmlException, and the reader cannot be restarted afterwards.
An XmlSchemaException may also be thrown if a validation error occurs during schema or DTD validation.
If an element reports a validation error, the rest of that element's content model is not validated, but its children are. The reader reports only the first error for a given element.
In the ValidationEventHandler handler, the Severity property of the ValidationEventArgs object e can be used to check how the XML instance document fared against the schema. Severity distinguishes validation errors from validation warnings. A validation error (Severity equals XmlSeverityType.Error) indicates a fatal error; a validation warning (Severity equals XmlSeverityType.Warning) indicates that no schema information is available for validation.
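
The validation steps in 3) can be sketched roughly as follows, using DTD validation against an internal DTD subset. The DTD, the sample documents, the class name DtdValidationDemo, and the helper Validate are all made up for this illustration, and the variable names tr and trv follow the text above. Note that XmlValidatingReader is marked obsolete in current .NET versions, so the sketch suppresses that compiler warning.

```csharp
#pragma warning disable 618 // XmlValidatingReader is marked obsolete in current .NET versions
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class DtdValidationDemo
{
    // Made-up DTD: a book element must contain exactly one title element.
    public const string Dtd =
        "<!DOCTYPE book [<!ELEMENT book (title)><!ELEMENT title (#PCDATA)>]>";

    // Hypothetical helper: reads the whole document with DTD validation enabled
    // and returns one message per validation event.
    public static List<string> Validate(string xml)
    {
        var messages = new List<string>();

        var tr = new XmlTextReader(new StringReader(xml));
        var trv = new XmlValidatingReader(tr);
        trv.ValidationType = ValidationType.DTD;
        trv.ValidationEventHandler +=
            (sender, e) => messages.Add(e.Severity + ": " + e.Message);

        while (trv.Read())
        {
            // Validation happens as a side effect of reading.
        }
        trv.Close();
        return messages;
    }

    public static void Main()
    {
        string valid = Dtd + "<book><title>A</title></book>";
        string invalid = Dtd + "<book><price>1</price></book>";

        Console.WriteLine("valid document events:   " + Validate(valid).Count);
        foreach (string message in Validate(invalid))
            Console.WriteLine(message);
    }
}
```

Because a handler is attached, validation errors are collected as messages instead of stopping the read with an exception.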

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
