Reading large XML documents by combining XmlReader and XElement

Source: Internet
Author: User

Introduction

The .NET Framework has many class libraries and APIs for manipulating XML data, but since .NET Framework 3.5 the usual preference is LINQ to XML.

However, whether you use the XElement.Load method or the XElement.Parse method, LINQ to XML loads the entire XML document into memory, so it is a poor fit for very large XML files.

For a large XML file, the better approach is to read only one part at a time and work through the file gradually, which is what the XmlReader class does.

XmlReader is efficient, but it is not as convenient to work with as LINQ to XML. Ideally we want both: the efficiency of XmlReader and the convenience of LINQ to XML.
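
To make the trade-off concrete, here is a minimal sketch that fetches the same value both ways. The class name ReaderVsXElement and the sample document are hypothetical and only serve as illustration; the XElement version parses the whole document into memory, while the XmlReader version walks it node by node.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class ReaderVsXElement
{
    // Hypothetical sample document, small enough to read at a glance.
    public const string Xml = "<book><title>XML</title><price>10</price></book>";

    // Forward-only read: the caller tracks node types and names by hand.
    public static string TitleWithXmlReader()
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(Xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "title")
                {
                    return reader.ReadElementContentAsString();
                }
            }
        }
        return null;
    }

    // LINQ to XML: one line, but the whole document is materialized first.
    public static string TitleWithXElement()
    {
        return (string)XElement.Parse(Xml).Element("title");
    }
}
```

Both methods return the title text; the difference is how much of the document is held in memory and how much bookkeeping the caller has to do.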

Approach

The XElement class exposes a static ReadFrom method, inherited from XNode, that accepts an XmlReader parameter: XNode.ReadFrom(XmlReader)

The MSDN documentation linked from that method already shows how to combine the two, under a fitting title: performing a streaming transform of large XML documents.

static IEnumerable<XElement> StreamXElements(string uri, string matchName)
{
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.IgnoreComments = true;
    settings.IgnoreWhitespace = true;
    using (XmlReader reader = XmlReader.Create(uri, settings))
    {
        reader.MoveToContent();
        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    if (reader.Name == matchName)
                    {
                        // Build an XElement from the reader's current position.
                        XElement el = XElement.ReadFrom(reader) as XElement;
                        if (el != null)
                        {
                            yield return el;
                        }
                    }
                    break;
            }
        }
    }
}

The code above keeps reading forward with XmlReader. When it reaches a node of type XmlNodeType.Element whose name matches, it calls XElement.ReadFrom(reader) to build an XElement. The most important part is the final yield return, which hands the elements to the caller one at a time.

So far so good.

However, testing shows that this method has a serious bug: every time it reads one XElement, it skips the next one.

In the test XML, after the first node (470002048) is read, the next node (470002049) is skipped.

The cause is that XmlReader has been advanced too far: it is effectively read one time too many. The loop behaves like this:

initial read;
while ("we're not at the end")
{
    do stuff;
    read;
}

Back to our code: when XElement.ReadFrom(reader) builds an XElement, it consumes the whole element and leaves the reader positioned on the node that follows it. Because whitespace is ignored, that node is the start of the next matching element. The while condition then calls reader.Read() again, which moves past that start tag, so the next XElement is never returned.
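
The skipping can be reproduced with a few lines. The sketch below is an illustration only: the class name SkipDemo and the four-item sample document are hypothetical, and it reads from an in-memory string instead of a URI. It keeps the same loop shape as the code above, with Read() in the while condition.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class SkipDemo
{
    // Hypothetical sample: four sibling elements with no whitespace between them.
    public const string Xml =
        "<root><item>1</item><item>2</item><item>3</item><item>4</item></root>";

    // Same loop shape as the first version: Read() in the while condition.
    public static List<string> ReadWithExtraRead()
    {
        var values = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(Xml)))
        {
            reader.MoveToContent();
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "item")
                {
                    // ReadFrom consumes the element and leaves the reader on the
                    // start tag of the next <item>; the Read() in the loop
                    // condition then steps past that start tag.
                    XElement el = (XElement)XNode.ReadFrom(reader);
                    values.Add(el.Value);
                }
            }
        }
        return values;
    }
}
```

Calling SkipDemo.ReadWithExtraRead() on this sample yields only the values 1 and 3: items 2 and 4 are passed over by the extra Read().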

Once the cause is known, the fix is straightforward: use reader.EOF as the loop condition and call Read() only when the current node was not consumed by ReadFrom. The corrected code is as follows:

static IEnumerable<XElement> StreamXElements(string uri, string matchName)
{
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.IgnoreComments = true;
    settings.IgnoreWhitespace = true;
    using (XmlReader reader = XmlReader.Create(uri, settings))
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element
                && reader.Name == matchName)
            {
                // ReadFrom already advances the reader past this element,
                // so do not call Read() on this branch.
                XElement el = XElement.ReadFrom(reader) as XElement;
                if (el != null)
                {
                    yield return el;
                }
            }
            else
            {
                reader.Read();
            }
        }
    }
}
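
With the corrected loop, the streamed elements can be queried with ordinary LINQ operators. The following is a minimal sketch under stated assumptions: the class StreamingQueryDemo, the helper StreamElements (the same loop, but reading from a TextReader so that no file is needed), and the order/amount document shape are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class StreamingQueryDemo
{
    // Same loop as the corrected StreamXElements, but over a TextReader
    // so the sketch does not need a file on disk.
    public static IEnumerable<XElement> StreamElements(TextReader input, string matchName)
    {
        var settings = new XmlReaderSettings { IgnoreComments = true, IgnoreWhitespace = true };
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == matchName)
                {
                    // ReadFrom advances past the element, so no Read() here.
                    if (XNode.ReadFrom(reader) is XElement el)
                    {
                        yield return el;
                    }
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    // Hypothetical query: ids of orders whose amount exceeds a threshold.
    public static List<string> LargeOrderIds(string xml, decimal minAmount)
    {
        return StreamElements(new StringReader(xml), "order")
            .Where(o => (decimal)o.Element("amount") > minAmount)
            .Select(o => (string)o.Attribute("id"))
            .ToList();
    }
}
```

Only one order element is materialized at a time, so memory use stays roughly proportional to the size of a single element and not to the size of the whole document, while the query itself still reads like regular LINQ to XML.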

Summary

MSDN already has an article describing how to combine XmlReader and XElement, but working through the problem independently was still rewarding. The reference articles are as follows:

http://stackoverflow.com/questions/2299632/why-does-xmlreader-skip-every-other-element-if-there-is-no-whitespace-separator

https://msdn.microsoft.com/en-us/library/mt693229.aspx

http://stackoverflow.com/questions/2441673/reading-xml-with-xmlreader-in-c-sharp

https://blogs.msdn.microsoft.com/xmlteam/2007/03/24/streaming-with-linq-to-xml-part-2/
