Performance Comparison of XML data reading methods (II)

Source: Internet
Author: User
In the previous installment we summarized the general ways of reading XML, but in practice we rarely need all of the data in an XML source. So this time I tried reading only part of the data, for example selecting items by the first letter of the title.

For the three random-access methods, only the query condition needs to change:


XmlDocument:
    var nodeList = doc.DocumentElement.SelectNodes("item[substring(title,1,1)='M'][position() mod 10 = 0]");

XPathNavigator:
    var nodeList = nav.Select("/channel/item[substring(title,1,1)='M'][position() mod 10 = 0]");

Xml Linq:
    var nodelist = from node in xd.XPathSelectElements("/channel/item[substring(title,1,1)='M'][position() mod 10 = 0]")

With XPath, only a single line of code has to change. XPath is easier to master than SQL. Refer to the W3Schools syntax introduction and MSDN's "LINQ to XML for XPath Users", and you will grasp the essentials within a quarter of an hour.
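One detail of the query above is worth spelling out: in XPath 1.0, chained predicates are applied in order, so position() in the second predicate counts only the items that already passed the first one. The sketch below illustrates this on a small made-up document and compares it with a plain LINQ to XML filter. The names XPathFilterSketch, BuildSample, SelectWithXPath and SelectWithLinq are hypothetical, and the step size is a parameter only so that a short sample suffices.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathFilterSketch
{
    // Builds a small hypothetical <channel> document, one <item> per title.
    public static XDocument BuildSample(IEnumerable<string> titles)
    {
        return new XDocument(
            new XElement("channel",
                titles.Select(t => new XElement("item", new XElement("title", t)))));
    }

    // Same XPath shape as in the article, with the step size as a parameter.
    public static List<string> SelectWithXPath(XDocument xd, int step)
    {
        string query = "/channel/item[substring(title,1,1)='M'][position() mod "
                       + step + " = 0]";
        return xd.XPathSelectElements(query)
                 .Select(e => (string)e.Element("title"))
                 .ToList();
    }

    // The same filter without XPath: keep titles starting with 'M',
    // then keep every step-th element of that filtered sequence.
    public static List<string> SelectWithLinq(XDocument xd, int step)
    {
        return xd.Root.Elements("item")
                 .Where(e => ((string)e.Element("title") ?? "")
                             .StartsWith("M", StringComparison.Ordinal))
                 .Where((e, index) => (index + 1) % step == 0)
                 .Select(e => (string)e.Element("title"))
                 .ToList();
    }
}
```

For titles M1, A1, M2, B, M3, M4 and a step of 2, both methods should pick M2 and M4: the non-M items do not count toward the position.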

However, the XmlReader method is not that easy. It, too, has to read the titles starting with M and take one item out of every ten. After half a day I still had not come up with an elegant implementation, so:


// Requires System.Collections.Generic and System.Xml.
// xmlStream and the Channel class are defined elsewhere in the test program.
static List<Channel> testXmlReader2()
{
    var lstChannel = new List<Channel>();
    var reader = XmlReader.Create(xmlStream);
    int n = 0;               // number of items seen so far whose title starts with 'M'
    Channel channel = null;
Search:
    while (reader.Read())
    {
        if (reader.Name == "item" && reader.NodeType == XmlNodeType.Element)
        {
            channel = null;
            while (reader.Read())
            {
                if (reader.Name == "item")
                {
                    // End of the current item: store it once, if it passed the filter.
                    if (channel != null) lstChannel.Add(channel);
                    break;
                }
                if (reader.NodeType != XmlNodeType.Element) continue;
                switch (reader.Name)
                {
                    // title is expected to be the first child of item,
                    // because the other cases rely on channel being set.
                    case "title":
                        var title = reader.ReadString();
                        if (title.Length == 0 || title[0] != 'M') goto Search;
                        n++;
                        if (n % 10 != 0) goto Search;
                        channel = new Channel();
                        channel.Title = title;
                        break;
                    case "link":
                        channel.Link = reader.ReadString();
                        break;
                    case "description":
                        channel.Description = reader.ReadString();
                        break;
                    case "content":
                        channel.Content = reader.ReadString();
                        break;
                    case "pubDate":
                        channel.PubDate = reader.ReadString();
                        break;
                    case "author":
                        channel.Author = reader.ReadString();
                        break;
                    case "category":
                        channel.Category = reader.ReadString();
                        break;
                    default:
                        break;
                }
            }
        }
    }
    return lstChannel;
}

We can see that the code structure has changed significantly. Just to filter by a condition, we had to add the local variable n, move the object initialization and the assignment statements, and were even forced to jump with the goto statement, forgotten for many years (VB handles this better). The business logic seeps into the implementation details. In Lao Zhao's words, there is a burst of syntactic noise.

XmlTextReader's implementation proxy class, XmlTextReaderImpl (internal, so it cannot be used directly), is a huge class with tens of thousands of lines of code that wraps a large number of operations working directly on XML characters. Because the reader operates so close to the underlying layer, it is hard to find a good way to tidy up the code at a higher level. If the filtering condition, that is, the business logic, becomes even slightly more complex, the code turns unrecognizable, and you can imagine what that does to readability and maintainability.
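One way to keep a forward-only reader while moving the filter out of the parsing loop is to materialize each item as a small XElement with XNode.ReadFrom and then filter the resulting sequence with LINQ. This is only a sketch and not the method measured in this article: StreamingFilterSketch, StreamItems and ReadFiltered are hypothetical names, the Channel class here is a stand-in with only two of the properties used above, and its performance was not measured.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Stand-in for the article's Channel class (assumed shape, reduced to two fields).
public class Channel
{
    public string Title { get; set; }
    public string Link { get; set; }
}

public static class StreamingFilterSketch
{
    // Yields each <item> element as a standalone XElement, one at a time.
    public static IEnumerable<XElement> StreamItems(XmlReader reader)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "item")
            {
                // ReadFrom consumes the whole subtree and leaves the reader
                // on the node that follows it, so no extra Read() here.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }

    // Keeps items whose title starts with firstLetter, then every step-th of those.
    public static List<Channel> ReadFiltered(XmlReader reader, char firstLetter, int step)
    {
        return StreamItems(reader)
            .Where(item =>
            {
                string title = (string)item.Element("title") ?? "";
                return title.Length > 0 && title[0] == firstLetter;
            })
            .Where((item, index) => (index + 1) % step == 0)
            .Select(item => new Channel
            {
                Title = (string)item.Element("title"),
                Link = (string)item.Element("link")
            })
            .ToList();
    }
}
```

Called as StreamingFilterSketch.ReadFiltered(reader, 'M', 10), this expresses the same condition as the XPath queries. The trade-off is one temporary XElement per item, so it is likely slower than the hand-written loop, but the filter can change without touching the reading code.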

Now let's compare the time performance:

XmlDocument       26 ms
XPathNavigator    26 ms
XmlTextReader     20 ms
Xml Linq          28 ms

The four methods are now much closer together. The time consumed by Document and Navigator has dropped sharply. The Reader method has not dropped much, because it still has to read from beginning to end; only the cost of object creation went down, by about 3 ms. Strangely, the Linq method has hardly changed and now comes last.

You can test different query conditions. Each of the four methods turns out to have its own performance floor, which depends on the size of the XML source. For example, the first two methods are bounded by the execution time of the XmlDocument.Load method; on my local machine, loading the XML takes 23 ms. The Linq time is not completely fixed either: if the result set is small, the execution time drops by 1 to 2 ms.
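The article does not show how these timings were taken. A minimal sketch of one way to collect comparable numbers, assuming System.Diagnostics.Stopwatch, one untimed warm-up call and an average over several runs, could look like this. TimingSketch and AverageMilliseconds are hypothetical names.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Calls the action once without timing it (to absorb JIT compilation),
    // then returns the average elapsed milliseconds over the given number of runs.
    public static double AverageMilliseconds(Action action, int runs)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (runs <= 0) throw new ArgumentOutOfRangeException(nameof(runs));

        action(); // warm-up run, not timed

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
        {
            action();
        }
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalMilliseconds / runs;
    }
}
```

A call such as TimingSketch.AverageMilliseconds(() => testXmlReader2(), 100) would time the reader version, provided the action rewinds or reopens the XML stream on every call, since a stream that has already been read to the end yields nothing on later runs.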

With the Document and Navigator methods, performance degrades noticeably as the data volume grows. The reason is easy to guess: they create many objects that are never used. Look at the memory usage of each method. With all data loaded and no filtering, the Document mode occupies about MB of memory, while the Navigator mode needs only about MB, which also explains why the Document mode degrades more visibly. The Reader mode, even when loading all the data, takes only about 1 MB of memory, excluding the overhead of program startup, which is less than half of what the first two use. In terms of memory, the Linq method performs remarkably well, using less than 500 KB more than the Reader method.

Further analysis leads to a further conclusion: unless there is a special need, use XmlTextReader with caution. It copes poorly with change and is error-prone. The Linq method is strongly recommended. Although its time performance is slightly lower than that of the Navigator method in some cases, its outstanding memory usage makes it the first choice. And I believe LINQ to XML will become even more powerful in the future.
