Combining XmlReader and XElement to read large XML documents

Source: Internet
Author: User

Introduction

The .NET Framework offers many class libraries and APIs for working with XML data. Since .NET Framework 3.5, however, LINQ to XML has generally been the preferred choice.

Both XElement.Load and XElement.Parse load the entire XML document into memory, which is unsuitable when the file is very large.

For large XML files, the better approach is to read only part of the file at a time and work through the whole document incrementally. This is exactly what the XmlReader class does.
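As a rough sketch of what forward-only reading looks like with XmlReader on its own, the following counts elements with a given name without building any tree. The class name, the method name, and the choice of a TextReader as input are illustrative assumptions, not part of the original article.

```csharp
using System.IO;
using System.Xml;

public static class ForwardOnlyDemo
{
    // Counts elements called `name` by walking the document node by node.
    // No XElement or XmlDocument tree is built along the way.
    public static int CountElements(TextReader input, string name)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(input))
        {
            // Read() advances one node at a time and returns false at the end.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == name)
                {
                    count++;
                }
            }
        }
        return count;
    }
}
```

This is efficient, but anything beyond counting quickly becomes tedious to write against the raw reader, which is the inconvenience the article goes on to address.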

XmlReader is efficient, but it is less convenient to work with than LINQ to XML. Ideally we want the advantages of both: the efficiency of XmlReader and the convenience of LINQ to XML.

Ideas

The XElement class exposes a static ReadFrom method (defined on its base class XNode) that accepts an XmlReader parameter: XNode.ReadFrom(XmlReader)
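A minimal sketch of XNode.ReadFrom in isolation may help here. It assumes a tiny inline document and a hypothetical helper named ReadFirstItem; the point is only to show that ReadFrom both builds the element and moves the reader on to the node that follows it.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class ReadFromDemo
{
    // Returns the id of the element built by ReadFrom, and the id of the
    // element the reader is positioned on once ReadFrom has returned.
    public static (string Materialized, string PositionedOn) ReadFirstItem()
    {
        const string xml = "<root><item id=\"1\"/><item id=\"2\"/></root>";
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();   // positioned on <root>
            reader.Read();            // positioned on the first <item>

            var el = (XElement)XNode.ReadFrom(reader);

            // The reader has already advanced to the second <item>.
            return (el.Attribute("id").Value, reader.GetAttribute("id"));
        }
    }
}
```

Calling ReadFromDemo.ReadFirstItem() gives "1" for the materialized element and "2" for the reader position, so no extra Read call is needed to reach the next sibling.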

MSDN already documents a way to combine the two, under a fitting name: performing a streaming transform of large XML documents. The method looks like this:

static IEnumerable<XElement> StreamXElements(string uri, string matchname)
{
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.IgnoreComments = true;
    settings.IgnoreWhitespace = true;
    using (XmlReader reader = XmlReader.Create(uri, settings))
    {
        reader.MoveToContent();
        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    if (reader.Name == matchname)
                    {
                        XElement el = XElement.ReadFrom(reader) as XElement;
                        if (el != null)
                        {
                            yield return el;
                        }
                    }
                    break;
            }
        }
    }
}

The code above keeps calling Read on the XmlReader. Whenever it encounters a node of type XmlNodeType.Element with the requested name, it calls XElement.ReadFrom(reader) to build an XElement. The key part is the final yield return, which hands elements to the caller one at a time instead of collecting them all in memory.

So far, so good.

During testing, however, we found a serious bug in this method: each time it reads an XElement, it skips the next one.

For example, after node 470002048 was read, node 470002049 was skipped.

This is the XmlReader "read too far" problem: one read too many is performed. The usual reading loop has the following shape:

initial read;
(while "we're not at the end")
{
    do stuff;
    read;
}

Back in the code above, XElement.ReadFrom(reader) has already advanced the reader past the element it constructs, so the reader is sitting on the next node. The while condition then calls reader.Read() again, and that next XElement is never seen.
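The skip can be reproduced on a small inline document. The sketch below keeps the same loop shape as the first StreamXElements version but, as an assumption made for illustration, reads from a string, matches a hard-coded "item" element, and returns only each element's id attribute.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class SkipDemo
{
    // Buggy on purpose: Read() in the while condition runs after ReadFrom
    // has already moved the reader forward.
    public static IEnumerable<string> BuggyIds(string xml, bool ignoreWhitespace)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = ignoreWhitespace };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            reader.MoveToContent();
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "item")
                {
                    // Leaves the reader on the node after this element.
                    var el = (XElement)XNode.ReadFrom(reader);
                    yield return (string)el.Attribute("id");
                }
            }
        }
    }
}
```

For four items with ids 1 to 4 and no whitespace between them, this yields only 1 and 3. If the items are separated by line breaks and ignoreWhitespace is false, the node swallowed by the extra Read() is a whitespace node, so all four ids come back; this is why the bug can go unnoticed on indented files.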

Once the cause is known, the fix is simple: use reader.EOF as the loop condition and call Read only when ReadFrom was not called. The code is as follows:

// Requires: using System.Collections.Generic; using System.Xml; using System.Xml.Linq;
static IEnumerable<XElement> StreamXElements(string uri, string matchname)
{
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.IgnoreComments = true;
    settings.IgnoreWhitespace = true;
    using (XmlReader reader = XmlReader.Create(uri, settings))
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element
                && reader.Name == matchname)
            {
                // ReadFrom already moves the reader to the next node,
                // so no Read() call follows on this branch.
                XElement el = XElement.ReadFrom(reader) as XElement;
                if (el != null)
                {
                    yield return el;
                }
            }
            else
            {
                // Not a matching element: advance manually.
                reader.Read();
            }
        }
    }
}

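To show how the streamed elements can then be consumed with LINQ, here is a minimal sketch. It repeats the fixed loop so that the block is self-contained, but takes a TextReader instead of a URI so it can run against an inline string. The orders/order/price document shape and the TotalPrice helper are hypothetical and exist only for this example.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class StreamingDemo
{
    // Fixed loop, adapted (as an assumption) to read from a TextReader.
    public static IEnumerable<XElement> StreamXElements(TextReader input, string matchName)
    {
        var settings = new XmlReaderSettings { IgnoreComments = true, IgnoreWhitespace = true };
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == matchName)
                {
                    // ReadFrom advances the reader, so nothing else is read here.
                    if (XNode.ReadFrom(reader) is XElement el)
                    {
                        yield return el;
                    }
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    // Hypothetical query: sum the "price" attribute of every <order> element.
    // Elements are pulled one at a time as Sum enumerates the sequence.
    public static decimal TotalPrice(TextReader input)
    {
        return StreamXElements(input, "order")
            .Sum(o => (decimal)o.Attribute("price"));
    }
}
```

Because the method is an iterator, the query pulls one XElement at a time, and each element becomes eligible for garbage collection once the query has moved past it.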
Summary

MSDN already covers combining XmlReader and XElement, but working through the problem ourselves was still instructive. References:

http://stackoverflow.com/questions/2299632/why-does-xmlreader-skip-every-other-element-if-there-is-no-whitespace-separator

https://msdn.microsoft.com/en-us/library/mt693229.aspx

http://stackoverflow.com/questions/2441673/reading-xml-with-xmlreader-in-c-sharp

https://blogs.msdn.microsoft.com/xmlteam/2007/03/24/streaming-with-linq-to-xml-part-2/
