Brief introduction
There are many class libraries and APIs in the .NET Framework for manipulating XML data, but since .NET Framework 3.5 our preferred choice has generally been LINQ to XML.
When LINQ to XML works with XML data, both the XElement.Load method and the XElement.Parse method load the entire XML document into memory, so LINQ to XML is a poor fit for very large XML files.
The best approach for a large XML file is to read only one part at a time, so that the whole file is read gradually. This is exactly what the XmlReader class does.
XmlReader is efficient, but it is not as easy to work with as LINQ to XML, so we want the best of both: the efficiency of XmlReader and the convenience of LINQ to XML.
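To make the trade-off concrete, here is a minimal sketch that performs the same small task both ways: collecting the id attribute of every order element. The element and attribute names, and the OrderIdsDemo class itself, are hypothetical and chosen only for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

static class OrderIdsDemo
{
    // Streaming: only the current node is held in memory,
    // but the caller has to inspect node types and names by hand.
    public static List<string> WithXmlReader(string xml)
    {
        var ids = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "order")
                {
                    // GetAttribute does not move the reader.
                    ids.Add(reader.GetAttribute("id"));
                }
            }
        }
        return ids;
    }

    // LINQ to XML: one short query, but the whole document is parsed into memory first.
    public static List<string> WithLinqToXml(string xml)
    {
        return XElement.Parse(xml)
            .Elements("order")
            .Select(o => (string)o.Attribute("id"))
            .ToList();
    }
}
```

Both methods return the same list for a small document; the difference only matters once the document no longer fits comfortably in memory.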
Ideas
The XElement class inherits a static method, ReadFrom, that accepts an XmlReader parameter: the XNode.ReadFrom(XmlReader) method.
MSDN already documents a way to combine the two, under a fitting title: performing a streaming transform of large XML documents.
    using System.Collections.Generic;
    using System.Xml;
    using System.Xml.Linq;

    static IEnumerable<XElement> StreamXElements(string uri, string matchName)
    {
        // Skip comments and insignificant whitespace while reading.
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreComments = true;
        settings.IgnoreWhitespace = true;
        using (XmlReader reader = XmlReader.Create(uri, settings))
        {
            reader.MoveToContent();
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        if (reader.Name == matchName)
                        {
                            // Materialize only the current element as an XElement.
                            XElement el = XElement.ReadFrom(reader) as XElement;
                            if (el != null)
                            {
                                yield return el;
                            }
                        }
                        break;
                }
            }
        }
    }
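The code above uses XmlReader to keep reading forward. Whenever it meets a node of type XmlNodeType.Element whose name matches, it builds an XElement with XElement.ReadFrom(reader). The most important part is the final yield return, which hands the elements to the caller one at a time instead of collecting them all first.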
So far so good.
However, testing reveals a serious bug in this method: each time it reads an XElement, it skips the XElement that follows.
In the test XML, for example, after the first node (470002048) is read, the next node (470002049) is skipped.
This is the XmlReader problem of accidentally reading too far, that is, calling Read one time more than needed. The intended reading pattern can be understood like this:
    initial read;
    while ("we're not at the end")
    {
        do stuff;
        read;
    }
Back to our code above: after XElement.ReadFrom(reader) builds the XElement, the reader has already advanced past that element to the next node. The while condition then calls reader.Read() again, so the reader steps over the start of the next XElement and that element is never returned.
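The skipping can be reproduced in memory with a few lines. This is a minimal sketch, assuming a hypothetical sample document with four adjacent item elements and no whitespace between them; the ReadTooFarDemo class and its member names are made up for the demonstration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class ReadTooFarDemo
{
    // Hypothetical sample: four adjacent <item> elements, no whitespace in between.
    public const string SampleXml =
        "<root><item>1</item><item>2</item><item>3</item><item>4</item></root>";

    // Same loop shape as the first version: Read() is called in the while condition.
    public static List<string> ReadWithExtraRead(string xml)
    {
        var values = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "item")
                {
                    // ReadFrom consumes the whole element and leaves the reader
                    // on the node that follows </item>, here the next <item>.
                    // The Read() in the loop condition then moves past that start tag.
                    var el = (XElement)XNode.ReadFrom(reader);
                    values.Add(el.Value);
                }
            }
        }
        return values;
    }
}
```

Calling ReadTooFarDemo.ReadWithExtraRead(ReadTooFarDemo.SampleXml) returns only "1" and "3": every second item is lost, which matches the behavior seen in the test.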
Once the cause is known, the fix is easy: use reader.EOF as the loop condition, and call Read only when the current node was not consumed by ReadFrom. The corrected code is as follows:
    using System.Collections.Generic;
    using System.Xml;
    using System.Xml.Linq;

    static IEnumerable<XElement> StreamXElements(string uri, string matchName)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreComments = true;
        settings.IgnoreWhitespace = true;
        using (XmlReader reader = XmlReader.Create(uri, settings))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == matchName)
                {
                    // ReadFrom already advances the reader past this element,
                    // so no extra Read() is needed on this branch.
                    XElement el = XElement.ReadFrom(reader) as XElement;
                    if (el != null)
                    {
                        yield return el;
                    }
                }
                else
                {
                    // Any other node: move forward by exactly one node.
                    reader.Read();
                }
            }
        }
    }
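With the corrected loop, the stream can be queried like any other LINQ sequence. The following is a sketch only: it repeats the corrected loop in a variant that accepts a TextReader so it can run on an in-memory string, and the order and amount names, the 100 threshold, and the StreamingQueryDemo class are all hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

static class StreamingQueryDemo
{
    // Variant of the corrected loop that takes a TextReader instead of a URI
    // (an assumption made so the sketch can be run without a file).
    public static IEnumerable<XElement> StreamXElements(TextReader input, string matchName)
    {
        var settings = new XmlReaderSettings { IgnoreComments = true, IgnoreWhitespace = true };
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == matchName)
                {
                    // ReadFrom advances the reader, so Read() is not called here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    // Streams <order> elements one at a time and keeps the ids of the large ones.
    public static List<int> LargeOrderIds(string xml)
    {
        return StreamXElements(new StringReader(xml), "order")
            .Where(o => (decimal)o.Element("amount") > 100m)
            .Select(o => (int)o.Attribute("id"))
            .ToList();
    }
}
```

Only one order element is materialized at a time, so memory use depends on the size of a single element rather than on the size of the whole document.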
Summary
MSDN already has an article on combining XmlReader and XElement, but working through the problem myself was still very rewarding. The reference articles are as follows:
http://stackoverflow.com/questions/2299632/why-does-xmlreader-skip-every-other-element-if-there-is-no-whitespace-separator
https://msdn.microsoft.com/en-us/library/mt693229.aspx
http://stackoverflow.com/questions/2441673/reading-xml-with-xmlreader-in-c-sharp
https://blogs.msdn.microsoft.com/xmlteam/2007/03/24/streaming-with-linq-to-xml-part-2/
Read large XML documents in combination with XmlReader and XElement