Performance Analysis of DataTable Data Retrieval

Source: Internet
Author: User

The .NET platform offers many data storage and retrieval solutions: ADO.NET Entity Framework, ASP.NET Dynamic Data, XML, NHibernate, and LINQ to SQL. Even so, DataTable is still needed as a data storage structure in some situations, for example when the platform is restricted to .NET Framework 2.0 or earlier, or when legacy or third-party data interfaces return DataTable. DataTable is also easy to use, so some data access interfaces adopt it directly. When you retrieve data from a DataTable, a few points have a serious effect on retrieval efficiency.

In my recent work, I needed to merge a large number of DataTables. The data interface returns its results as DataSet and DataTable objects; whether that design is reasonable is a question I will set aside for now. When several requests are sent together, the server returns one DataSet containing a result DataTable for each request, and each result DataTable has a column that acts as a key column. We need to merge the columns of these DataTables into a single DataTable by that key and then display it in the UI.

At first I looped over the rows and used the Select method of DataTable to do the merge, and found it very slow. I have summarized what I learned about querying a DataTable and share it here.

1. Scenario

To simplify the problem, suppose there are two DataTables, table A and table B. Table A stores the highest stock price and table B stores the lowest stock price. Their fields and sample data are:

Table A                 Table B
SecurityCode  High      SecurityCode  Low
000001.sz     20        000001.sz     18.5
000002.sz     26        000002.sz     56

Now we need to merge the two tables into one table with three columns: SecurityCode, High, and Low. My earlier approach was to create a DataTable C containing these three fields, copy the SecurityCode field into it, then traverse the other two tables, use the Select method to find the row with the corresponding SecurityCode, and copy the field into C. This turned out to be very slow, and the bottleneck was the Select method, so it needed optimizing.

2. DataTable Query Efficiency

DataTable provides two interfaces for querying data: DataTable.Select and DataTable.Rows.Find.

DataTable.Select takes a filter expression and returns the matching rows as a DataRow[] array. It has to traverse the entire table, test each row against the conditions, and return every match. That makes Select a poor fit for the DataTable merge described above, where we only need to find a single matching record.
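As a minimal sketch of the Select call described above, the following uses the sample data from the scenario. The class and method names (SelectSketch, SelectByCode) are hypothetical and chosen only for illustration; they are not from the original test code.

```csharp
using System;
using System.Data;

public static class SelectSketch
{
    // Returns every row whose SecurityCode equals the given code.
    public static DataRow[] SelectByCode(DataTable table, string code)
    {
        // The filter syntax requires single quotes inside a value to be doubled.
        string filter = "SecurityCode = '" + code.Replace("'", "''") + "'";
        return table.Select(filter);
    }

    public static void Main()
    {
        DataTable dtA = new DataTable("A");
        dtA.Columns.Add("SecurityCode", typeof(string));
        dtA.Columns.Add("High", typeof(double));
        dtA.Rows.Add("000001.sz", 20.0);
        dtA.Rows.Add("000002.sz", 26.0);

        // Select always returns an array, even when at most one row can match.
        DataRow[] matches = SelectByCode(dtA, "000002.sz");
        foreach (DataRow row in matches)
        {
            Console.WriteLine(row["SecurityCode"] + " " + row["High"]);
        }
    }
}
```

Because the result is an array of all matches, the caller still has to pick the first element when only one record is wanted.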

DataTable.Rows.Find looks up the unique record whose primary key matches the given value. Once a primary key is specified, the search uses a binary tree, which is highly efficient. To create a primary key, specify the primary key column of the DataTable as follows:

dtA.PrimaryKey = new DataColumn[] { dtA.Columns["SecurityCode"] };
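To put the line above in context, here is a small sketch of a lookup with Rows.Find once the primary key is set. The names FindSketch and FindByCode are hypothetical; the sample rows come from the scenario section.

```csharp
using System;
using System.Data;

public static class FindSketch
{
    // Looks up a single row by primary key; returns null when no row matches.
    public static DataRow FindByCode(DataTable table, string code)
    {
        // Find throws MissingPrimaryKeyException if the table has no primary key.
        return table.Rows.Find(code);
    }

    public static void Main()
    {
        DataTable dtA = new DataTable("A");
        dtA.Columns.Add("SecurityCode", typeof(string));
        dtA.Columns.Add("High", typeof(double));
        dtA.Rows.Add("000001.sz", 20.0);
        dtA.Rows.Add("000002.sz", 26.0);

        // Same assignment as in the article: SecurityCode becomes the primary key.
        dtA.PrimaryKey = new DataColumn[] { dtA.Columns["SecurityCode"] };

        DataRow row = FindByCode(dtA, "000001.sz");
        Console.WriteLine(row == null ? "not found" : row["High"].ToString());
    }
}
```

Unlike Select, Find returns a single DataRow (or null), so no array handling is needed for a unique lookup.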

Of course, creating a primary key takes time, and the cost differs depending on whether the key is created before or after the data is filled in. When the data volume is large, the cost of creating the primary key has to be taken into account. The following table shows the time required to create the primary key before filling in the data, to create it after filling in the data, and to build a dictionary:

Array size  Pre-index creation time  Post-index creation time  Dictionary creation time
10          0                        0                         0
50          0                        0                         0
100         1                        0                         0
500         6                        1                         0
1000        15                       2                         0
5000        107                      16                        2
10000       261                      42                        5
50000       1727                     271                       31
100000      3525                     544                       47
500000      20209                    2895                      240
1000000     43382                    5919                      517

(Figure omitted: chart of the creation times listed above.)

You can obtain the following information:

  1. Creating the primary key before filling in the data takes more time. Once the primary key exists, the index has to be updated every time a row is added, so a primary key is not a good fit for fields that change frequently. On my laptop (Windows 7 32-bit, T6600 2.0 GHz CPU, 2 GB RAM), creating an index on a DataTable with 1 million records takes about 5 seconds, so index creation time must be considered when the data volume is large.
  2. Building a dictionary with the same content as the DataTable takes much less time than building the DataTable and then creating a primary key, and the dictionary's build time remains small as the number of records grows.
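The three setups compared above can be sketched roughly as follows. This is not the author's test harness: the class KeyTimingSketch, its method names, and the generated row values are assumptions made for illustration, and the timings here include the fill itself, so absolute numbers will differ from the measurements reported above.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Diagnostics;

public static class KeyTimingSketch
{
    private static DataTable NewTable()
    {
        DataTable t = new DataTable();
        t.Columns.Add("SecurityCode", typeof(string));
        t.Columns.Add("High", typeof(double));
        return t;
    }

    private static void Fill(DataTable t, int count)
    {
        // Generated codes such as "000042.sz" are unique for every i.
        for (int i = 0; i < count; i++)
        {
            t.Rows.Add(i.ToString("D6") + ".sz", (double)i);
        }
    }

    // Primary key first, then fill: the index is updated on every insert.
    public static DataTable KeyBeforeFill(int count)
    {
        DataTable t = NewTable();
        t.PrimaryKey = new DataColumn[] { t.Columns["SecurityCode"] };
        Fill(t, count);
        return t;
    }

    // Fill first, then primary key: the index is built once over all rows.
    public static DataTable KeyAfterFill(int count)
    {
        DataTable t = NewTable();
        Fill(t, count);
        t.PrimaryKey = new DataColumn[] { t.Columns["SecurityCode"] };
        return t;
    }

    // Dictionary keyed by SecurityCode over an already filled table.
    public static Dictionary<string, DataRow> BuildDictionary(DataTable t)
    {
        Dictionary<string, DataRow> map = new Dictionary<string, DataRow>(t.Rows.Count);
        foreach (DataRow row in t.Rows)
        {
            map[(string)row["SecurityCode"]] = row;
        }
        return map;
    }

    public static void Main()
    {
        int count = 100000;

        Stopwatch sw = Stopwatch.StartNew();
        KeyBeforeFill(count);
        Console.WriteLine("key before fill: " + sw.ElapsedMilliseconds + " ms");

        sw = Stopwatch.StartNew();
        DataTable filled = KeyAfterFill(count);
        Console.WriteLine("key after fill:  " + sw.ElapsedMilliseconds + " ms");

        sw = Stopwatch.StartNew();
        BuildDictionary(filled);
        Console.WriteLine("dictionary:      " + sw.ElapsedMilliseconds + " ms");
    }
}
```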

With the structures created, we test DataTable retrieval efficiency in several cases. For DataTables of different sizes, with and without a primary key, we measure the query speed of DataTable.Select and DataTable.Rows.Find, alongside a dictionary lookup and a LINQ query. Because elapsed time cannot be shown meaningfully when the DataTable is small, results are reported in Stopwatch ticks. For each data size, each method is executed 10 times and the average is taken. The results are as follows:

Array size  Dictionary create  Dictionary search  Table Select  Indexed table Select  Table Rows.Find  LINQ
10          13                 3                  40            25                    8                16
50          27                 2                  69            37                    8                27
100         51                 3                  112           38                    9                39
500         210                3                  589           51                    11               155
1000        461                4                  1175          60                    14               328
5000        2264               14                 8412          85                    17               1540
10000       6235               7                  16806         99                    20               3354
50000       23768              8                  150133        138                   26               15824
100000      49133              7                  259794        147                   26               31525
500000      252103             51                 1547935       181                   30               158317
1000000     494647             9                  2736616       209                   30               315716

(Figure omitted: chart of the query times listed above.)

You can see:

  1. Without a primary key, Select on a DataTable is inefficient. Once a primary key is created, a Select that filters only on the primary key column is much faster. The gap is especially evident at large data volumes; even at a table size of 1,000 the difference is nearly 20 times.
  2. Querying a DataTable with LINQ is faster than the DataTable.Select method on a table without a primary key, but still slower than the DataTable.Rows.Find method.
  3. For a unique lookup on the primary key, use DataTable.Rows.Find. When the DataTable has a primary key and the query involves only the primary key, Find is about 3 to 6 times faster than Select. This may be because Select has to parse and evaluate the filter string: it accepts multi-condition queries and fairly complex expressions, so processing and parsing take some time. In general, Select is a full search that returns all records in the set matching the conditions, whereas Find searches only the primary key fields. If no primary key is set, calling Find throws an exception (MissingPrimaryKeyException).
  4. Using a Dictionary instead of the DataTable for retrieval gives the fastest lookups, and lookup time is almost unaffected by size. However, converting a DataTable to the corresponding Dictionary takes time when the data volume is large. If lookups are frequent, for example when merging multiple DataTables by key, storing the rows in a Dictionary<string, DataRow> and using the hash-based ContainsKey method to look up keys greatly improves efficiency. If the key column of the DataTable contains duplicate values, so that a primary key cannot be created, you can use a Dictionary<string, List<DataRow>> instead, which avoids the query performance loss that comes from having no primary key.
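For the duplicate-key case in the last point, one possible shape of the Dictionary<string, List<DataRow>> index is sketched below. The names RowIndexSketch and BuildIndex are hypothetical, and the sketch assumes the key column can be converted to a string.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class RowIndexSketch
{
    // Groups rows by the string value of keyColumn; rows with the same key share one list.
    public static Dictionary<string, List<DataRow>> BuildIndex(DataTable table, string keyColumn)
    {
        Dictionary<string, List<DataRow>> index = new Dictionary<string, List<DataRow>>();
        foreach (DataRow row in table.Rows)
        {
            // Convert.ToString yields an empty string for DBNull, so the key is never null.
            string key = Convert.ToString(row[keyColumn]);
            List<DataRow> bucket;
            if (!index.TryGetValue(key, out bucket))
            {
                bucket = new List<DataRow>();
                index.Add(key, bucket);
            }
            bucket.Add(row);
        }
        return index;
    }

    public static void Main()
    {
        DataTable t = new DataTable();
        t.Columns.Add("SecurityCode", typeof(string));
        t.Columns.Add("High", typeof(double));
        t.Rows.Add("000001.sz", 20.0);
        t.Rows.Add("000001.sz", 21.0);   // duplicate key: a primary key could not be created
        t.Rows.Add("000002.sz", 26.0);

        Dictionary<string, List<DataRow>> index = BuildIndex(t, "SecurityCode");
        Console.WriteLine(index["000001.sz"].Count);
    }
}
```

TryGetValue is used here instead of ContainsKey followed by an indexer read so that each row costs a single hash lookup; either form matches the approach described above.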

 

3. Implementation Effect

Based on the analysis above, I replaced the Select method in my actual work. A Dictionary<string, DataRow>, called C here, holds all rows of the merged target DataTable: the key is SecurityCode, and each DataRow holds the three columns SecurityCode, High, and Low. To merge, traverse all rows of table A and check whether each row's SecurityCode is already in C; if it is, retrieve the row and assign the value directly. Then traverse table B in the same way. Overall, this made the DataTable merge at least 10 times faster.
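A rough sketch of this dictionary-based merge, using the two sample tables from the scenario, might look like the following. The names DictionaryMerge, Merge, and CopyColumn are hypothetical. The sketch also assumes that a code missing from C should get a new row and that SecurityCode is never null; the article does not spell out either point.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class DictionaryMerge
{
    // Merges table A (SecurityCode, High) and table B (SecurityCode, Low) into one table.
    public static DataTable Merge(DataTable tableA, DataTable tableB)
    {
        DataTable tableC = new DataTable("C");
        tableC.Columns.Add("SecurityCode", typeof(string));
        tableC.Columns.Add("High", typeof(double));
        tableC.Columns.Add("Low", typeof(double));

        // Maps each SecurityCode to its row in the merged table.
        Dictionary<string, DataRow> rowsByCode = new Dictionary<string, DataRow>();
        CopyColumn(tableA, "High", tableC, rowsByCode);
        CopyColumn(tableB, "Low", tableC, rowsByCode);
        return tableC;
    }

    private static void CopyColumn(DataTable source, string column,
                                   DataTable target, Dictionary<string, DataRow> rowsByCode)
    {
        foreach (DataRow sourceRow in source.Rows)
        {
            string code = (string)sourceRow["SecurityCode"];
            DataRow targetRow;
            if (!rowsByCode.TryGetValue(code, out targetRow))
            {
                // Assumption: a code seen for the first time gets a new merged row.
                targetRow = target.NewRow();
                targetRow["SecurityCode"] = code;
                target.Rows.Add(targetRow);
                rowsByCode.Add(code, targetRow);
            }
            targetRow[column] = sourceRow[column];
        }
    }

    public static void Main()
    {
        DataTable a = new DataTable("A");
        a.Columns.Add("SecurityCode", typeof(string));
        a.Columns.Add("High", typeof(double));
        a.Rows.Add("000001.sz", 20.0);
        a.Rows.Add("000002.sz", 26.0);

        DataTable b = new DataTable("B");
        b.Columns.Add("SecurityCode", typeof(string));
        b.Columns.Add("Low", typeof(double));
        b.Rows.Add("000001.sz", 18.5);
        b.Rows.Add("000002.sz", 56.0);

        foreach (DataRow row in Merge(a, b).Rows)
        {
            Console.WriteLine(row["SecurityCode"] + " " + row["High"] + " " + row["Low"]);
        }
    }
}
```

Each source row costs one hash lookup, so the merge no longer scans a table for every row.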

4. Conclusion

This article briefly introduced two methods for retrieving data from a DataTable: DataTable.Select and DataTable.Rows.Find. Before testing their execution efficiency, we showed how to set a primary key on a DataTable and compared the time spent setting the primary key before and after filling in the data. The results show that setting the primary key after the data is filled in is much more efficient than setting it before. We then compared the performance of DataTable.Select when filtering only on the primary key field: with a primary key, DataTable.Select is much faster than without one. Under the same conditions, if you only need to find a single record, DataTable.Rows.Find is faster than DataTable.Select. When you query a DataTable frequently, avoid calling DataTable.Select in a loop and instead convert the DataTable to an equivalent Dictionary structure. This also solves the problem that duplicate key values make it impossible to create a primary key, and because Dictionary uses a hash table for lookups, it greatly improves query efficiency.

The test cases and code for this article were offered as a download with the original post. I hope they help you improve efficiency when searching a DataTable.
