Development Experience in Improving the XML Processing Performance of .NET Applications

Last Update: 2018-12-05 Source: Internet

Author: User

Tags: server, memory


Over the past two weeks I have been optimizing the performance of a .NET application, and I would like to share that practical experience. Because of the business characteristics, the overall architecture, and the peripheral systems involved, the application's performance bottleneck lies mainly in XML processing: parsing and querying large XML documents (more than 50 MB), downloading XML data from a peripheral system, and handling concurrent requests quickly in a B/S (browser/server) architecture. Through this work we studied the XML processing facilities of the .NET Framework in more depth, verified several approaches, and put them into production. The final result is very satisfactory. We met the goal of reducing server CPU utilization: from more than 50% (on a dual-core machine, meaning one core was effectively saturated) to about 10%, and for some services to about 5%.

First, here are the XML processing methods I know of. In .NET Framework 3.5, without a third-party parser (such as VTD-XML), there are four main approaches:

Document Tree Processing Method Based on XML Dom

The main methods and properties used are Load, LoadXml, OuterXml, InnerXml, and SelectNodes (XPath queries).

C# code

using System.Xml;

XmlDocument doc = new XmlDocument();
doc.Load("data.xml");
XmlNodeList children = doc.DocumentElement.SelectNodes("*");
XmlNodeList allAuthors = doc.SelectNodes("//authors/author");
foreach (XmlNode node in allAuthors)
{
    // Handle node.
}

Quasi-SAX Processing Method Based on XML Dom

The main methods and properties used are Load or LoadXml, CreateNavigator, Select (XPath queries), and MoveNext. I call this a "quasi-SAX" method because, although iterating over the results is forward-only, read-only, and node by node, it still depends on the XML DOM: the document tree must be built first (by calling Load or LoadXml).

C# code

using System.Xml;
using System.Xml.XPath;

XmlDocument dom = new XmlDocument();
dom.Load("data.xml");
XPathNavigator nav = dom.CreateNavigator();
nav.MoveToRoot();
string xpath = @"//Rows[OneColumn = 'SomeValue']";
XPathNodeIterator ite = nav.Select(xpath);
while (ite.MoveNext())
{
    // Handle ite.Current.Value.
}

XmlTextReader-based SAX Processing

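The main methods and properties used are Read, LocalName, and ReadString. This approach matches the SAX processing model: it is forward-only, processes the document tag by tag, and builds no DOM tree. Strictly speaking, XmlTextReader is a pull parser (the caller requests each node) rather than a push-based SAX parser, but it has the same streaming characteristics.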

C# code

using System.Xml;

using (XmlTextReader xmlReader = new XmlTextReader("data.xml"))
{
    while (xmlReader.Read())
    {
        // LocalName is also set on end tags, so react to start tags only.
        if (xmlReader.NodeType != XmlNodeType.Element)
            continue;

        switch (xmlReader.LocalName)
        {
            case "OneColumn":
                // Handle xmlReader.ReadString().
                break;
            case "OtherColumn":
                // Handle the "OtherColumn" element.
                break;
            default:
                break;
        }
    }
}

Note that with the XmlTextReader pull model, the reader is rarely constructed from a file path; it is more often constructed from a stream. Combining a stream with forward-only reading makes it easy to overlap the loading, parsing, and processing of XML data, which can improve overall performance. I will return to this later.
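To illustrate constructing the reader from a stream rather than a file path, here is a minimal sketch. It uses XmlReader.Create, which Microsoft recommends over new XmlTextReader since .NET 2.0, on an arbitrary Stream, so the same loop works for a FileStream, a MemoryStream, or a network response stream. The class and method names (StreamXmlScan, CountElements) are hypothetical.

```csharp
using System.IO;
using System.Xml;

public static class StreamXmlScan
{
    // Counts start tags with the given local name, reading the stream forward-only.
    // The caller owns the stream: CloseInput = false leaves it open afterwards.
    public static int CountElements(Stream input, string localName)
    {
        var settings = new XmlReaderSettings
        {
            CloseInput = false,
            IgnoreWhitespace = true,
            IgnoreComments = true
        };

        int count = 0;
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == localName)
                    count++;
            }
        }
        return count;
    }
}
```

Because the method only depends on Stream, the caller decides where the bytes come from; nothing forces the whole document into memory first.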

Processing Method Based on Linq to XML

This is a new feature of .NET Framework 3.5 and differs from the three methods above. It uses XDocument as the underlying data structure and queries its content directly with the standard LINQ query operators, the collection extension methods of the framework. You can chain conditions such as ordering and grouping, and deferred execution avoids unnecessary work and improves performance. The main methods and properties used are Load, Descendants, Element().Value, Select, Where, and OrderBy.

C# code

using System;
using System.Linq;
using System.Web.UI.WebControls;
using System.Xml.Linq;

// Members of an ASP.NET code-behind page class that declares
// DropDownList1, DropDownList2, Repeater1 and a pageIndex field.
private void InitData()
{
    XDocument xdom = XDocument.Load("data.xml");

    // Query "Rows" elements whose "OneColumn" value equals "SomeValue".
    // The query is deferred: it runs each time it is enumerated.
    var data = from d in xdom.Descendants("Rows")
               where (string)d.Element("OneColumn") == "SomeValue"
               orderby (string)d.Element("OneColumn")
               select d;

    BuildColumnDropDownList(data, "OneColumn", this.DropDownList1);
    BuildColumnDropDownList(data, "TwoColumn", this.DropDownList2);

    var subData = data.Skip(this.pageIndex * 20).Take(20); // Paging.

    this.Repeater1.DataSource = subData;
    this.Repeater1.DataBind();
}

private void BuildColumnDropDownList(IOrderedEnumerable<XElement> data, string columnName, DropDownList dropDownList)
{
    if (data == null)
        throw new ArgumentNullException("data");
    if (string.IsNullOrEmpty(columnName))
        throw new ArgumentNullException("columnName");
    if (dropDownList == null)
        throw new ArgumentNullException("dropDownList");

    // Distinct values of the column, sorted, for the drop-down list.
    var colData = from irx in data.Descendants(columnName)
                  group irx by irx.Value into g
                  orderby g.Key
                  select g.Key;

    dropDownList.DataSource = colData.ToArray();
    dropDownList.DataBind();
}
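Deferred execution, mentioned above, can be seen in a small self-contained sketch: defining the query does no work, and each enumeration re-runs it against the current tree, while ToList() takes a snapshot. The class name DeferredQueryDemo is hypothetical, and XDocument.Parse is used instead of Load only so that no file is needed.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DeferredQueryDemo
{
    // Returns { count before add, count after add, snapshot count, live count after second add }.
    public static int[] CountBeforeAndAfterAdd()
    {
        XDocument xdom = XDocument.Parse(
            "<Data>" +
            "<Rows><OneColumn>SomeValue</OneColumn></Rows>" +
            "<Rows><OneColumn>Other</OneColumn></Rows>" +
            "</Data>");

        // Defining the query does not execute it.
        IEnumerable<XElement> query =
            from d in xdom.Descendants("Rows")
            where (string)d.Element("OneColumn") == "SomeValue"
            select d;

        int before = query.Count(); // first execution

        xdom.Root.Add(new XElement("Rows", new XElement("OneColumn", "SomeValue")));
        int after = query.Count();  // re-executes and sees the new element

        // Materializing takes a snapshot; later changes are not reflected in it.
        List<XElement> snapshot = query.ToList();
        xdom.Root.Add(new XElement("Rows", new XElement("OneColumn", "SomeValue")));

        return new[] { before, after, snapshot.Count, query.Count() };
    }
}
```

The same property means that the page code enumerates `data` once for each drop-down list and once more for paging. When the tree is large, materializing the filtered rows once can be worth the extra memory; that trade-off should be measured rather than assumed.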

Each of the four methods has its own strengths and weaknesses, and none of them is always the fastest. The right choice depends on the characteristics of the XML processing and on the overall design. The key factors that affect XML processing performance are:

The size of the XML data to be processed. 1 MB is a useful dividing line.
If a DOM tree is built, whether the XML source changes every time, that is, whether the DOM tree must be rebuilt with Load every time the data is processed or whether it can be cached. In a typical query-oriented business, the source data stays the same while the queries are repeated, so the DOM tree can be loaded once and cached.
How XML data is held inside the program: as a DOM tree (XmlDocument) or as an XML string (string). Either works, but the key is to be consistent. Do not let some modules use XmlDocument while others use strings, for a simple reason: InnerXml, OuterXml, and LoadXml are not free, and every round trip between the two forms (serializing with OuterXml, then re-parsing with LoadXml) wastes resources.
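The cost of mixing representations can be sketched with two hypothetical module boundaries (the class and method names are illustrative, not from the original application). Both run the same XPath query; the string-based one forces a full LoadXml parse on every call, while the document-based one reuses the tree that is already in memory.

```csharp
using System.Xml;

public static class ModuleBoundaries
{
    // String boundary: every call re-parses the whole document.
    public static int CountRowsFromString(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);           // full parse on every call
        return doc.SelectNodes("//Rows").Count;
    }

    // Document boundary: the already-built tree is queried directly.
    public static int CountRowsFromDocument(XmlDocument doc)
    {
        return doc.SelectNodes("//Rows").Count;
    }
}
```

A caller that already holds an XmlDocument and passes doc.OuterXml to the first method pays for one serialization and one parse per call, which is exactly the round trip described above.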

With these factors in mind, let me compare the four methods one by one.

First, data volume. Our tests showed that data size directly affects the performance of parsing and building the DOM tree (mainly CPU usage, memory usage, and response time), that is, the performance of the Load and LoadXml methods. Apart from the third method, XmlTextReader-based SAX processing, all three other methods include an explicit Load step. The results suggest that 1 MB works as a dividing line: if the data to be processed is larger than 1 MB, choose the XmlTextReader-based SAX method.
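The 1 MB rule can be expressed as a small dispatch helper. This is a sketch under assumptions: the threshold constant comes from the measurements described in this article, not from any framework setting; the names are hypothetical; and counting elements stands in for real business processing. Above the threshold it streams with XmlReader, below it it loads an XmlDocument.

```csharp
using System.IO;
using System.Xml;

public static class SizeBasedXmlProcessing
{
    // Dividing line taken from this article's measurements (an assumption; tune per workload).
    public const long DomThresholdBytes = 1024 * 1024;

    // Counts elements named localName, choosing the parsing strategy by file size.
    public static int CountElements(string path, string localName)
    {
        long size = new FileInfo(path).Length;
        return size > DomThresholdBytes
            ? CountWithReader(path, localName)
            : CountWithDom(path, localName);
    }

    // Large files: forward-only streaming, no DOM tree is built.
    public static int CountWithReader(string path, string localName)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(path))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == localName)
                    count++;
            }
        }
        return count;
    }

    // Small files: build the DOM tree and query it.
    // GetElementsByTagName matches the qualified name, which equals the
    // local name for elements without a namespace prefix.
    public static int CountWithDom(string path, string localName)
    {
        var doc = new XmlDocument();
        doc.Load(path);
        return doc.GetElementsByTagName(localName).Count;
    }
}
```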

However, our tests also showed the other side. XmlTextReader skips building the DOM tree, but precisely because it keeps no tree, every traversal has to read the document again from the beginning, so repeatedly traversing XML with XmlTextReader performs worse than querying an XmlDocument. Therefore, if the business can build the DOM tree once and cache it for later use (for example, in a query scenario), we recommend a method that builds the DOM tree, namely the second one, "Quasi-SAX Processing Method Based on XML Dom", even though the first load is expensive.
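For the query scenario, "load once, query many times" with the second method might look like the minimal sketch below: the document is parsed on first use and kept in memory, and each query creates its own XPathNavigator over the cached tree. The class name, the loader delegate, and the lock are assumptions for illustration. Instance members of XmlDocument are not guaranteed to be thread-safe, so this sketch serializes queries with a lock; invalidating the cache when the source changes is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

public sealed class CachedXmlQuery
{
    private readonly Func<XmlDocument> _loader;
    private readonly object _sync = new object();
    private XmlDocument _doc;      // built once, then reused

    public CachedXmlQuery(Func<XmlDocument> loader)
    {
        if (loader == null)
            throw new ArgumentNullException("loader");
        _loader = loader;
    }

    // Number of times the expensive load actually ran.
    public int LoadCount { get; private set; }

    // Runs an XPath query against the cached tree and returns the string values of the matches.
    public List<string> Query(string xpath)
    {
        lock (_sync)
        {
            if (_doc == null)
            {
                _doc = _loader();  // expensive first load
                LoadCount++;
            }

            var results = new List<string>();
            XPathNavigator nav = _doc.CreateNavigator();
            XPathNodeIterator ite = nav.Select(xpath);
            while (ite.MoveNext())
            {
                results.Add(ite.Current.Value);
            }
            return results;
        }
    }
}
```

In a web application, one instance of such a class would typically be held in a static field or in the application cache so that all requests share the same tree.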

For XML data smaller than 5 MB, we found LINQ to XML to be the best choice. Thanks to deferred execution, building a LINQ query is very fast, and the actual parsing, querying, and filtering performance is also satisfactory.

As for parallel processing of large XML data: with the XmlTextReader-based SAX method, if the input XML arrives as a stream, loading and parsing can run in parallel, and it is worth designing for this. It is hard to justify loading 30 or 40 MB of XML entirely into memory before starting the while (reader.Read()) loop shown above. In my application, the large XML data is downloaded from a peripheral system with an HTTP GET request, so the two steps can overlap: parse while downloading.

C# code

using System.IO;
using System.Net;
using System.Xml;

// Non-streaming alternative: download the whole document first, then parse it.
// using (WebClient client = new WebClient())
// {
//     string content = client.DownloadString(@"http://server.foo/get_xml?id=123");
//     XmlTextReader reader = new XmlTextReader(new StringReader(content));
// }

// Streaming: parse while the response is still being downloaded.
WebRequest request = WebRequest.Create(@"http://server.foo/get_xml?id=123");
using (WebResponse response = request.GetResponse())
using (Stream stream = response.GetResponseStream())
using (XmlTextReader reader = new XmlTextReader(stream))
{
    while (reader.Read())
    {
        // Handle the current node.
    }
}
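Whether the reader really starts working before the download finishes can be checked without a network. The sketch below wraps a MemoryStream in a hypothetical ChunkedStream that hands out at most a few bytes per Read call, roughly the way a network stream delivers data as it arrives. It records how many bytes had been consumed when the first matching element was reported. The chunk size, class names, and element names are illustrative assumptions.

```csharp
using System;
using System.IO;
using System.Xml;

// Read-only stream that returns at most chunkSize bytes per Read and counts bytes delivered.
public sealed class ChunkedStream : Stream
{
    private readonly Stream _inner;
    private readonly int _chunkSize;

    public ChunkedStream(Stream inner, int chunkSize)
    {
        _inner = inner;
        _chunkSize = chunkSize;
    }

    public long BytesDelivered { get; private set; }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int n = _inner.Read(buffer, offset, Math.Min(count, _chunkSize));
        BytesDelivered += n;
        return n;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}

public static class IncrementalParse
{
    // Returns the number of bytes read from the source when the first <localName>
    // start tag was reported; matches receives the total number of such elements.
    public static long Scan(Stream source, string localName, int chunkSize, out int matches)
    {
        var chunked = new ChunkedStream(source, chunkSize);
        long bytesAtFirst = -1;
        matches = 0;
        using (XmlReader reader = XmlReader.Create(chunked))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == localName)
                {
                    if (bytesAtFirst < 0)
                        bytesAtFirst = chunked.BytesDelivered;
                    matches++;
                }
            }
        }
        return bytesAtFirst;
    }
}
```

For a document of a few hundred kilobytes, the first element is reported after only a small fraction of the bytes has been read, which is what makes "parse while downloading" possible with the response stream.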

Of course, whether this kind of parallelism is feasible, and how to design it, depends on the specific business scenario. Where it is possible, it is well worth doing.

Finally, SQL Server provides XQuery for processing XML data, which lets results be returned directly from XML stored in the relational database. However, when I tested it earlier, the performance was unsatisfactory: SQL Server memory usage grew substantially and response times were unstable. If anyone has experience with this, I would welcome advice.

One more thing I almost forgot to mention: avoid the "Document Tree Processing Method Based on XML Dom" where possible and prefer the other three methods, especially when the XML data is large.

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
