Development experience in improving the XML processing performance of .NET applications


In this post I share my practical experience from the past two weeks of optimizing the performance of a .NET application. Because of its business characteristics, overall architecture, and the peripheral systems it depends on, the application's performance bottleneck is mainly XML processing: parsing and querying large XML data (more than 50 MB), downloading XML data from a peripheral system, and handling concurrent requests in a B/S (browser/server) architecture that must respond quickly. Through this work we gained a deeper understanding of the XML processing facilities in the .NET Framework and verified and implemented several approaches. The final result is quite satisfactory. We also met the goal of reducing server CPU utilization: it dropped from more than 50% (on a dual-core machine, which means one core was effectively saturated) to about 10%, and for some services to about 5%.

 

First, here are the XML processing methods that I know of. In .NET Framework 3.5, if no third-party parser (such as VTD-XML) is used, there are four main types:

 

Document Tree Processing Method Based on XML Dom

The main methods and properties used include Load, LoadXml, OuterXml, InnerXml, and SelectNodes (XPath query).

 

C# code

    using System.Xml;

    // Load the whole document into a DOM tree.
    XmlDocument doc = new XmlDocument();
    doc.Load("data.xml");

    // All child elements of the root element.
    XmlNodeList children = doc.DocumentElement.SelectNodes("*");

    // XPath query over the whole tree.
    XmlNodeList allAuthors = doc.SelectNodes("//authors/author");
    foreach (XmlNode node in allAuthors)
    {
        // Handle node.
    }

 

Quasi-SAX Processing Method Based on XML Dom

The main methods and properties used include LoadXml, CreateNavigator, Select (XPath query), and MoveNext. I call this the "quasi-SAX" method because, although the iteration is forward-only, read-only, and driven by the matched nodes, it still relies on the XML DOM: the document tree must first be built by calling Load or LoadXml.

 

C# code

    using System.Xml;
    using System.Xml.XPath;

    // The DOM tree still has to be built first.
    XmlDocument dom = new XmlDocument();
    dom.Load("data.xml");

    XPathNavigator nav = dom.CreateNavigator();
    nav.MoveToRoot();

    // Iterate forward over the nodes matched by the XPath query.
    string xpath = @"//Rows[OneColumn = 'SomeValue']";
    XPathNodeIterator ite = nav.Select(xpath, null);
    while (ite.MoveNext())
    {
        // Handle ite.Current.Value.
    }

 

XmlTextReader-based SAX Processing

The main methods and properties used include Read, LocalName, and ReadString. This method follows a SAX-like processing model: forward-only, triggered by tags, and without a DOM tree.

 

C# code

    using System.Xml;

    // Forward-only reader; no DOM tree is built.
    using (XmlTextReader xmlReader = new XmlTextReader("data.xml"))
    {
        while (xmlReader.Read())
        {
            // Skip end tags, text, and whitespace nodes.
            if (xmlReader.NodeType != XmlNodeType.Element)
                continue;

            switch (xmlReader.LocalName)
            {
                case "OneColumn":
                    // Handle xmlReader.ReadString().
                    break;
                case "OtherColumn":
                    // Handle "OtherColumn" element.
                    break;
                default:
                    break;
            }
        }
    }

 

It should be noted that when the XmlTextReader pull model is used, an XML file is rarely the data source for the XmlTextReader object; more often the reader is constructed from a stream. It follows that this stream-based, forward-only reading makes it easy to process the loading, parsing, and handling of XML data in parallel, which can improve overall performance. This is discussed later.

 

Processing Method Based on Linq to XML

This is a new feature of .NET Framework 3.5 and differs from the three methods above. It uses XDocument as the underlying data structure and queries its content directly with LINQ query operators. Conditions, ordering, grouping, and so on can be attached to a query, and deferred execution avoids unnecessary work and improves performance. The main methods and properties used include Load, Descendants, Element(...).Value, select, where, and orderby.

 

C# code

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Web.UI.WebControls;
    using System.Xml.Linq;

    // pageIndex, DropDownList1, DropDownList2, and Repeater1 are assumed to be
    // members of the containing ASP.NET page.
    private void InitData()
    {
        XDocument xdom = XDocument.Load("data.xml");

        // Query rows whose "OneColumn" value equals "SomeValue".
        var data = from d in xdom.Descendants("Rows")
                   where d.Element("OneColumn").Value == "SomeValue"
                   orderby d.Element("OneColumn").Value
                   select d;

        BuildColumnDropDownList(data, "OneColumn", this.DropDownList1);
        BuildColumnDropDownList(data, "TwoColumn", this.DropDownList2);

        var subData = data.Skip(this.pageIndex * 20).Take(20);  // Paging.

        this.Repeater1.DataSource = subData;
        this.Repeater1.DataBind();
    }

    private void BuildColumnDropDownList(IOrderedEnumerable<XElement> data, string columnName, DropDownList dropDownList)
    {
        if (data == null)
            throw new ArgumentNullException("data");

        if (string.IsNullOrEmpty(columnName))
            throw new ArgumentNullException("columnName");

        if (dropDownList == null)
            throw new ArgumentNullException("dropDownList");

        // Distinct, sorted values of the given column.
        var colData = from irx in data.Descendants(columnName)
                      group irx by irx.Value into g
                      orderby g.Key
                      select g.Key;

        dropDownList.DataSource = colData.ToArray();
        dropDownList.DataBind();
    }

 

Each of the four methods above has its own advantages and disadvantages, and none is the fastest in every case. The choice depends on the specific characteristics of the XML processing and on the design. The key factors that affect XML processing performance are listed below.

 

  1. The size of the XML data to be processed. 1 MB can serve as the dividing line.
  2. If an XML DOM tree is built, whether the XML source changes on every request. In other words, whether the DOM tree must be rebuilt through Load each time XML data is processed, or whether it can be cached. In a typical query-type workload the queried source stays the same while it is queried again on every request, so the DOM tree can be loaded once and cached.
  3. How XML data is held inside the program: as a DOM tree (XmlDocument) or as an XML string (String). Either works, but the key is to be consistent. Do not let some modules use XmlDocument while others use strings, for a simple reason: InnerXml, OuterXml, and LoadXml are not a free lunch, and every (paired) call is an unnecessary waste of resources.
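
Point 3 can be illustrated with a minimal sketch. The class and method names (XmlHandoff, ViaString, CountRows) and the sample XML are hypothetical and are not taken from the application described here. The sketch only shows where the extra serialize-and-parse work comes from when one module hands XML to another as a string.

```csharp
using System;
using System.Xml;

// Hypothetical illustration; names and sample data are assumptions.
public static class XmlHandoff
{
    // Wasteful hand-off: OuterXml serializes the whole tree to a string
    // and LoadXml parses that string into a second, identical tree.
    public static XmlDocument ViaString(XmlDocument source)
    {
        string xml = source.OuterXml;
        XmlDocument copy = new XmlDocument();
        copy.LoadXml(xml);
        return copy;
    }

    // Cheaper hand-off: the receiving module accepts the node itself,
    // so no serialization or re-parsing takes place.
    public static int CountRows(XmlNode root)
    {
        return root.SelectNodes("//Rows").Count;
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<Data><Rows/><Rows/></Data>");

        // Queries the existing tree directly.
        Console.WriteLine(CountRows(doc));

        // Same answer, but only after an extra serialize and parse.
        Console.WriteLine(CountRows(ViaString(doc)));
    }
}
```

Both calls return the same count; the difference is only the cost of the round trip, which grows with the size of the document.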

 

Based on the key points listed above, let us compare the four methods one by one.

 

The first factor is the XML data volume. Testing shows that the data size directly affects the cost of parsing and building the DOM tree (mainly CPU usage, memory usage, and response time), that is, the cost of the Load and LoadXml methods. Except for the third method, XmlTextReader-based SAX processing, all three other methods have an explicit load step. The verification results show that an XML data size of 1 MB can roughly be taken as the dividing line. If the data to be processed is larger than 1 MB, choose the XmlTextReader-based SAX processing method.
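
As a rough sketch of this size-based choice, the hypothetical helper below counts elements with LINQ to XML when the input is small and with a forward-only reader otherwise. The names (SizeBasedCounter, CountElements) and the threshold parameter are assumptions made for illustration. XmlReader.Create is used here in place of the XmlTextReader constructor; both give a forward-only pull reader. The sample XML uses no namespaces.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper; the 1 MB default mirrors the dividing line in the text.
public static class SizeBasedCounter
{
    public const long OneMegabyte = 1024 * 1024;

    // Picks a strategy from the stream length when the length is known.
    public static int CountElements(Stream xml, string name, long threshold)
    {
        bool small = xml.CanSeek && xml.Length <= threshold;
        return small ? CountWithLinq(xml, name) : CountWithReader(xml, name);
    }

    // Small input: build the tree and query it.
    public static int CountWithLinq(Stream xml, string name)
    {
        using (XmlReader reader = XmlReader.Create(xml))
        {
            return XDocument.Load(reader).Descendants(name).Count();
        }
    }

    // Large (or unknown-size) input: stream through it without building a tree.
    public static int CountWithReader(Stream xml, string name)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(xml))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == name)
                    count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        byte[] bytes = Encoding.UTF8.GetBytes("<Data><Rows/><Rows/></Data>");

        // Below the threshold: LINQ to XML path.
        Console.WriteLine(CountElements(new MemoryStream(bytes), "Rows", OneMegabyte));

        // Threshold of 0 forces the streaming path for the same data.
        Console.WriteLine(CountElements(new MemoryStream(bytes), "Rows", 0));
    }
}
```

The threshold is a parameter rather than a constant because the right dividing line depends on the workload and should be measured.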

 

However, verification also showed that although XmlTextReader skips building the DOM tree, it has no tree structure to work with either, so traversing XML with XmlTextReader performs worse than with XmlDocument. Therefore, if a workload can build the DOM tree once and cache it for later use (for example, in query scenarios), I recommend a method that builds the DOM tree, namely the second one, "quasi-SAX processing based on XML DOM", even though the first load is expensive.
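
A minimal sketch of the "build once, query many times" idea follows. The class name CachedRowQuery, its methods, and the sample XML are hypothetical. The sketch is single-threaded and does not address concurrent access, which a real server application would need to consider.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

// Hypothetical wrapper: pays the parsing cost once and reuses the tree.
public sealed class CachedRowQuery
{
    private readonly XPathNavigator root;

    public CachedRowQuery(string xml)
    {
        XmlDocument dom = new XmlDocument();
        dom.LoadXml(xml);               // one-off cost of building the DOM tree
        root = dom.CreateNavigator();   // kept for all later queries
    }

    // Counts <Rows> elements whose <OneColumn> child has the given value.
    public int CountRows(string value)
    {
        int count = 0;
        XPathNodeIterator ite = root.Select("//Rows");
        while (ite.MoveNext())
        {
            XPathNavigator column = ite.Current.SelectSingleNode("OneColumn");
            if (column != null && column.Value == value)
                count++;
        }
        return count;
    }

    public static void Main()
    {
        CachedRowQuery query = new CachedRowQuery(
            "<Data>" +
            "<Rows><OneColumn>SomeValue</OneColumn></Rows>" +
            "<Rows><OneColumn>Other</OneColumn></Rows>" +
            "</Data>");

        // Both queries run against the same cached tree; nothing is re-parsed.
        Console.WriteLine(query.CountRows("SomeValue"));
        Console.WriteLine(query.CountRows("Other"));
    }
}
```

The value is compared in code rather than spliced into the XPath string, which avoids quoting problems when the value itself contains a quote character.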

 

For XML data smaller than 5 MB, the best choice turned out to be LINQ to XML. Thanks to deferred execution, constructing a LINQ query is very fast, and the performance of the actual parsing, querying, and filtering is also satisfactory.
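
The deferred behavior can be observed with a small sketch. The class name DeferredExecutionDemo, the Run method, and the sample XML are hypothetical; the counter exists only to show when the filter actually runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical demo of deferred execution in a LINQ to XML query.
public static class DeferredExecutionDemo
{
    // Returns { predicate calls before enumeration,
    //           predicate calls after enumeration,
    //           number of matching rows }.
    public static int[] Run(string xml)
    {
        XDocument xdom = XDocument.Parse(xml);
        int predicateCalls = 0;

        // Building the query does not evaluate the filter.
        IEnumerable<XElement> query = xdom.Descendants("Rows").Where(row =>
        {
            predicateCalls++;
            return (string)row.Element("OneColumn") == "SomeValue";
        });

        int before = predicateCalls;

        // The filter runs only when the query is enumerated.
        List<XElement> rows = query.ToList();
        int after = predicateCalls;

        return new[] { before, after, rows.Count };
    }

    public static void Main()
    {
        int[] result = Run(
            "<Data>" +
            "<Rows><OneColumn>SomeValue</OneColumn></Rows>" +
            "<Rows><OneColumn>Other</OneColumn></Rows>" +
            "</Data>");

        Console.WriteLine(result[0]); // 0: nothing evaluated yet
        Console.WriteLine(result[1]); // 2: one call per <Rows> element
        Console.WriteLine(result[2]); // 1: one matching row
    }
}
```

One consequence is that enumerating the same query several times repeats the work each time, so a result that is reused may be worth materializing once with ToList or ToArray.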

 

On parallel processing of large XML data: with the XmlTextReader-based SAX processing method, if the XML input arrives as a stream, loading can run in parallel with parsing. It arguably should be designed this way, because it is hard to justify loading 30 or 40 MB of XML data fully into memory before starting the while (reader.Read()) loop of XmlTextReader, as mentioned above. In my application, this large XML data is downloaded from the peripheral system through an HTTP GET request. In this scenario we can process in parallel, that is, parse while downloading.

 

C# code

    using System.IO;
    using System.Net;
    using System.Xml;

    // Download-then-parse alternative (buffers the whole payload first):
    // using (WebClient client = new WebClient())
    // {
    //     string content = client.DownloadString(@"http://server.foo/get_xml?id=123");
    //     XmlTextReader reader = new XmlTextReader(new StringReader(content));
    // }

    // Parse while downloading: hand the response stream directly to the reader.
    WebRequest request = WebRequest.Create(@"http://server.foo/get_xml?id=123");
    using (WebResponse response = request.GetResponse())
    using (Stream stream = response.GetResponseStream())
    using (XmlTextReader reader = new XmlTextReader(stream))
    {
        while (reader.Read())
        {
            // Handle the current node.
        }
    }

 

Of course, whether such parallelism is feasible and how to design it depends on the specific business scenario. Where it can be applied, it is worth doing.

 

Finally, SQL Server provides XQuery for processing XML data, which allows results to be returned directly from XML data stored in the relational database. I tested this earlier, however, and the performance was not satisfactory: SQL Server memory usage increased greatly and the response time was unreliable. Advice from anyone with experience here would be welcome.

One thing I forgot to mention: try not to use the "document tree processing method based on XML DOM" to process XML. Prefer the other three methods, especially when the XML data is large.
