Combining XmlReader and XElement to Read Large XML Documents

Last Update: 2016-05-18 Source: Internet

Author: User


Introduction

The .NET Framework provides many class libraries and APIs for working with XML data. Since .NET Framework 3.5, however, LINQ to XML has generally been the preferred choice.

Both XElement.Load and XElement.Parse load the entire XML document into memory, which is impractical when the file is very large.

For large XML files, a better approach is to read only a small part of the file at a time and work through the whole document incrementally. This is exactly what the XmlReader class does.
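As a point of comparison, here is a minimal sketch of plain XmlReader streaming: it walks the document node by node and counts the elements with a given name, holding only the current node in memory. The names StreamingCount and CountElements are hypothetical, and the sketch reads from a TextReader rather than a file URI only so that it can run against an in-memory string.

```csharp
using System.IO;
using System.Xml;

static class StreamingCount
{
    // Counts elements named `name` by visiting one node at a time.
    // The reader is forward-only; no tree of the document is ever built.
    public static int CountElements(TextReader input, string name)
    {
        var settings = new XmlReaderSettings { IgnoreComments = true, IgnoreWhitespace = true };
        int count = 0;
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            // Read() advances to the next node and returns false at end of input.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == name)
                {
                    count++;
                }
            }
        }
        return count;
    }
}
```

The trade-off is visible even here: the loop is cheap, but all the work has to be expressed in terms of node types and names rather than LINQ queries over elements.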

XmlReader is efficient, but much less convenient to work with than LINQ to XML. Ideally we want the best of both: the efficiency of XmlReader together with the convenience of LINQ to XML.

Approach

XNode provides a static ReadFrom method (also callable as XElement.ReadFrom) that accepts an XmlReader and builds a node from the reader's current position: XNode.ReadFrom Method (XmlReader).

The MSDN documentation for that method already includes an example that combines the two, aptly titled "Perform Streaming Transform of Large XML Documents":

using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

static IEnumerable<XElement> StreamXElements(string uri, string matchname)
{
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.IgnoreComments = true;
    settings.IgnoreWhitespace = true;
    using (XmlReader reader = XmlReader.Create(uri, settings))
    {
        reader.MoveToContent();
        // Read() advances the reader by one node on every iteration.
        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    if (reader.Name == matchname)
                    {
                        // Build an XElement from the subtree at the current position.
                        XElement el = XElement.ReadFrom(reader) as XElement;
                        if (el != null)
                        {
                            yield return el;
                        }
                    }
                    break;
            }
        }
    }
}

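The code above calls XmlReader.Read repeatedly. Whenever it reaches a node of type XmlNodeType.Element with the requested name, it calls XElement.ReadFrom(reader) to build an XElement from that subtree. The key part is the yield return at the end: elements are produced lazily, one at a time, so only the current element needs to be held in memory.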

So far, so good.

However, testing revealed a serious bug: every time an XElement is read, the one after it is skipped.

For example, after node 470002048 was read, node 470002049 was skipped.
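The skipping is easy to reproduce with an in-memory document. This sketch keeps the loop shape of the first StreamXElements (a while (reader.Read()) loop around ReadFrom) but reads from a string and collects element values so the result is visible; SkipBugDemo and BuggyValues are hypothetical names. For three <item> elements it returns only the first and the third.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class SkipBugDemo
{
    // Same structure as the first StreamXElements, kept deliberately buggy.
    public static List<string> BuggyValues(string xml, string matchName)
    {
        var settings = new XmlReaderSettings { IgnoreComments = true, IgnoreWhitespace = true };
        var values = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            reader.MoveToContent();
            // This Read() also runs right after ReadFrom has already advanced the reader.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == matchName)
                {
                    if (XNode.ReadFrom(reader) is XElement el)
                    {
                        values.Add(el.Value);
                    }
                }
            }
        }
        return values;
    }
}
```

With <root><item>1</item><item>2</item><item>3</item></root> the result is "1" and "3": the second <item> is lost, which is the every-other pattern described above. Because IgnoreWhitespace is set, the same happens when the elements are separated by line breaks.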

This is the classic XmlReader "read too far" problem: the reader is advanced one more time than intended. A typical read loop looks like this:

initial read;
while ("we're not at the end") {
    do stuff;
    read;
}

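Back to the code above: when XElement.ReadFrom(reader) builds an XElement, it reads through the element's end tag and leaves the reader on the following node, so a read has effectively already happened inside it. The while condition then calls reader.Read() again, which moves off the start of the next element before the loop body can match it, so that element is skipped.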
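To see where ReadFrom leaves the reader, the following sketch positions a reader on the first child of <root>, calls XNode.ReadFrom, and reports the name of the node the reader is on afterwards. ReadFromDemo and ReadFirstChild are hypothetical names used only for illustration.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class ReadFromDemo
{
    // Returns the element built by ReadFrom and the name of the node
    // the reader is positioned on immediately after the call.
    public static (XElement element, string nextNodeName) ReadFirstChild(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent(); // now on the root element
            reader.Read();          // now on the first child element
            var element = (XElement)XNode.ReadFrom(reader);
            // ReadFrom consumed the child, including its end tag,
            // and has already moved on to the next node.
            return (element, reader.Name);
        }
    }
}
```

For <root><a>1</a><b>2</b></root> this returns the element <a> and the name "b": the reader is already on <b>, so one more Read() in a loop condition would step past it.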

Once the cause is known, the fix is simple: loop on reader.EOF instead of reader.Read(), and call Read() only when the current node is not consumed by ReadFrom. The corrected code is:

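using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

static IEnumerable<XElement> StreamXElements(string uri, string matchname)
{
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.IgnoreComments = true;
    settings.IgnoreWhitespace = true;
    using (XmlReader reader = XmlReader.Create(uri, settings))
    {
        reader.MoveToContent();
        // Loop on EOF instead of Read(), so no node is skipped.
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element
                && reader.Name == matchname)
            {
                // ReadFrom consumes the whole element and leaves the reader
                // on the following node, so no Read() is needed here.
                XElement el = XElement.ReadFrom(reader) as XElement;
                if (el != null)
                {
                    yield return el;
                }
            }
            else
            {
                // Not a matching element: advance one node manually.
                reader.Read();
            }
        }
    }
}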
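A minimal sketch of how the corrected method might be consumed with LINQ to XML. It repeats the fixed loop, adapted to take a TextReader instead of a URI so it can run on an in-memory string; the StreamingQuery class, the ExpensiveItemNames helper, and the item element and price attribute are all hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

static class StreamingQuery
{
    // The corrected EOF-based loop, taking a TextReader (hypothetical adaptation).
    public static IEnumerable<XElement> StreamXElements(TextReader input, string matchName)
    {
        var settings = new XmlReaderSettings { IgnoreComments = true, IgnoreWhitespace = true };
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == matchName)
                {
                    // ReadFrom advances the reader itself; no extra Read().
                    if (XNode.ReadFrom(reader) is XElement el)
                    {
                        yield return el;
                    }
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    // LINQ to XML over the stream: text of <item> elements whose
    // "price" attribute exceeds minPrice (missing attributes are ignored).
    public static List<string> ExpensiveItemNames(TextReader input, decimal minPrice)
    {
        return StreamXElements(input, "item")
            .Where(e => (decimal?)e.Attribute("price") > minPrice)
            .Select(e => (string)e)
            .ToList();
    }
}
```

Because StreamXElements yields lazily, Where and Select pull one <item> at a time from the reader; only the strings collected by ToList are kept, so memory use depends on the number of matches retained rather than on the size of the file.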

Summary

Combining XmlReader and XElement is already covered in the relevant MSDN articles, but working through the problem myself was still instructive. See the following articles:

http://stackoverflow.com/questions/2299632/why-does-xmlreader-skip-every-other-element-if-there-is-no-whitespace-separator

https://msdn.microsoft.com/en-us/library/mt693229.aspx

http://stackoverflow.com/questions/2441673/reading-xml-with-xmlreader-in-c-sharp

https://blogs.msdn.microsoft.com/xmlteam/2007/03/24/streaming-with-linq-to-xml-part-2/

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
