Reading large XML documents by combining XmlReader and XElement

Last Update:2016-05-18 Source: Internet

Author: User

Brief introduction

The .NET Framework contains many class libraries and APIs for manipulating XML data, but since .NET Framework 3.5 the usual choice is LINQ to XML.

When LINQ to XML processes XML data, both the XElement.Load method and the XElement.Parse method load the entire XML document into memory, so LINQ to XML is a poor fit for very large XML files.

The best way to handle a large XML file is to read only one part at a time and work through the whole file gradually. That is exactly what the XmlReader class does.
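To make the forward-only model concrete, here is a minimal sketch that counts elements with XmlReader while only the current node is held in memory. The class name ForwardOnlyDemo, the method CountElements, and the sample XML are made up for illustration and are not part of the original article.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ForwardOnlyDemo
{
    // Counts elements named elementName. The reader moves forward one node
    // at a time, so the document is never loaded as a whole.
    public static int CountElements(TextReader input, string elementName)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    count++;
                }
            }
        }
        return count;
    }

    public static void Main()
    {
        // Illustrative sample data only.
        string xml = "<root><item>1</item><item>2</item><other/><item>3</item></root>";
        Console.WriteLine(CountElements(new StringReader(xml), "item")); // prints 3
    }
}
```

For a real file, the same loop would run over XmlReader.Create(path) instead of a StringReader.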

XmlReader is efficient, but it is not as convenient to work with as LINQ to XML. Ideally we want both: the efficiency of XmlReader and the convenience of LINQ to XML.

Ideas

XElement inherits a static ReadFrom method from XNode that accepts an XmlReader parameter: XNode.ReadFrom method (XmlReader)

The MSDN page for that method already shows how to combine the two, under an apt title: performing a streaming transform of large XML documents.

static IEnumerable<XElement> StreamXElements(string uri, string matchName)
{
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.IgnoreComments = true;
    settings.IgnoreWhitespace = true;
    using (XmlReader reader = XmlReader.Create(uri, settings))
    {
        reader.MoveToContent();
        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    if (reader.Name == matchName)
                    {
                        XElement el = XElement.ReadFrom(reader) as XElement;
                        if (el != null)
                        {
                            yield return el;
                        }
                    }
                    break;
            }
        }
    }
}

The code above keeps reading forward with XmlReader. Whenever it reaches a node of type XmlNodeType.Element with the matching name, it builds an XElement with XElement.ReadFrom(reader). The most important part is the final yield return, which hands elements to the caller one at a time.

So far so good.

However, testing reveals a serious bug in this method: every time it reads one XElement, it skips the next one.

In the test XML, after the first node, 470002048, is read, the next node, 470002049, is skipped.

The cause is that XmlReader has accidentally been advanced too far: one Read too many is performed. The intended pattern is:

initial read;
while ("we're not at the end")
{
    do stuff;
    read;
}

Back to our code: when XElement.ReadFrom(reader) builds an XElement, it already advances the reader to the node after that element. The while condition then calls reader.Read() again, which moves past that node, so the next XElement is never seen.
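The following sketch, under illustrative assumptions, shows where the reader lands after ReadFrom. The class name ReadFromPositionDemo, the method NodeAfterFirstItem, and the sample XML with id attributes are hypothetical and do not come from the original article.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class ReadFromPositionDemo
{
    // Reads the first <item> with XNode.ReadFrom, then reports the node the
    // reader is positioned on afterwards, as "NodeType:id".
    public static string NodeAfterFirstItem(string xml)
    {
        XmlReaderSettings settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            reader.ReadToFollowing("item"); // position on the first <item>
            XNode.ReadFrom(reader);         // consumes it and advances the reader
            return reader.NodeType + ":" + reader.GetAttribute("id");
        }
    }

    public static void Main()
    {
        // Illustrative sample data only.
        string xml = "<root><item id=\"1\"/><item id=\"2\"/><item id=\"3\"/></root>";
        Console.WriteLine(NodeAfterFirstItem(xml)); // prints Element:2
    }
}
```

Because the reader is already on the second item, a further reader.Read() before checking the node type would step over it.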

Once the cause is known, the fix is easy: use reader.EOF as the loop condition and call reader.Read() only when the current node was not consumed by ReadFrom. The corrected code is as follows:

// Requires: System.Collections.Generic, System.Xml, System.Xml.Linq
static IEnumerable<XElement> StreamXElements(string uri, string matchName)
{
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.IgnoreComments = true;
    settings.IgnoreWhitespace = true;
    using (XmlReader reader = XmlReader.Create(uri, settings))
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element
                && reader.Name == matchName)
            {
                // ReadFrom leaves the reader on the node after this element,
                // so no extra Read() is needed on this branch.
                XElement el = XElement.ReadFrom(reader) as XElement;
                if (el != null)
                {
                    yield return el;
                }
            }
            else
            {
                reader.Read();
            }
        }
    }
}
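As a usage sketch, the corrected method can feed an ordinary LINQ query. The block below repeats StreamXElements so that it compiles on its own. The class name StreamingUsage, the helper ExpensiveIds, and the sample file with id and price attributes are assumptions made for illustration, not part of the original article.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class StreamingUsage
{
    // Same logic as the corrected method in the article.
    public static IEnumerable<XElement> StreamXElements(string uri, string matchName)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreComments = true;
        settings.IgnoreWhitespace = true;
        using (XmlReader reader = XmlReader.Create(uri, settings))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == matchName)
                {
                    XElement el = XElement.ReadFrom(reader) as XElement;
                    if (el != null)
                    {
                        yield return el;
                    }
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    // Hypothetical helper: ids of <item> elements whose price attribute
    // exceeds minPrice. Each element is materialized one at a time.
    public static List<string> ExpensiveIds(string path, decimal minPrice)
    {
        return StreamXElements(path, "item")
            .Where(el => (decimal)el.Attribute("price") > minPrice)
            .Select(el => (string)el.Attribute("id"))
            .ToList();
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // Illustrative sample data only.
            File.WriteAllText(path,
                "<items><item id=\"a\" price=\"5\"/><item id=\"b\" price=\"20\"/><item id=\"c\" price=\"30\"/></items>");
            Console.WriteLine(string.Join(",", ExpensiveIds(path, 10m))); // prints b,c
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Because StreamXElements is an iterator, the file is opened only when enumeration starts and is closed when enumeration finishes or the enumerator is disposed.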

Summary

MSDN already has an article describing how to combine XmlReader and XElement, but working through the problem myself was still very instructive. The reference articles are as follows:

http://stackoverflow.com/questions/2299632/why-does-xmlreader-skip-every-other-element-if-there-is-no-whitespace-separator

https://msdn.microsoft.com/en-us/library/mt693229.aspx

http://stackoverflow.com/questions/2441673/reading-xml-with-xmlreader-in-c-sharp

https://blogs.msdn.microsoft.com/xmlteam/2007/03/24/streaming-with-linq-to-xml-part-2/

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
