Performance analysis of DataTable data retrieval [GO]

Source: Internet
Author: User

The original link author is very well written, I learned a lot of things, here is just reprint!

We know that in. NET platform has a lot of data storage, retrieval solution-ado.net Entity framework,asp.net Dynamic data,xml, nhibernate,linq to SQL and so on, but for some reasons, such as platform limitations, For example, it must be based on. NET Framework2.0 and the following platforms; The legacy or third-party data interface uses a DataTable, and so on, still needs to use a DataTable as the data storage structure. On the other hand, the DataTable is relatively easy to use, some data access interfaces may directly adopt the DataTable structure. There are some areas to be aware of when using DataTable for data retrieval, which can seriously affect the retrieval efficiency of data.

I have recently worked on a large number of DataTable splicing. The data of the interface is provided in the form of a dataset and then a DataTable, and the service side returns a DataSet containing the results of each request in a DataTable that contains a column equivalent to " Keyword column. Now you need to follow this keyword to merge the columns from these DataTable into a DataTable and show it to the interface.

At first, I used the DataTable Select method to iterate through the stitching implementation, found very slow, so summed up the query of the DataTable and other operations of some experience, and share with you.

a scene

To simplify the problem, there are two DataTable, named Table A, table B, and the fields are

Table A, storing the highest price information of the stock, table B stores the stock lowest price information

Securitycode High Securitycode Low

000001.SZ 000001.SZ 18.5

000002.SZ 000002.SZ 56

Now it's time to put the two tables together in a table with three columns of fields, Securitycode high, and the previous method is to create a DataTable table C containing the three fields, copy the Security field, and then traverse the other two tables. Use the Select method to find the corresponding Securitycode, and then copy to the corresponding field in C. The discovery efficiency is slow, the problem appears on the Select method, and needs to be optimized.

two query efficiency for DataTable

The DataTable provides two query data interfaces, DataTable.Select and DataTable.Rows.Find methods.

The Select method of a DataTable passes through a series of conditions and then returns a datarow[] type of data, he needs to traverse the entire table, then match the criteria one by another, and then return all matching values. Obviously, there is a problem with the Select method of previous DataTable stitching, because we only need to find a record on the match.

DataTable.Rows find finds the only record on the first match. Based on the designation of the primary key, it is efficient to find a binary tree. To create a primary key, you need to specify the PrimaryKey field for the DataTable as follows:

Dta.primarykey = new datacolumn[] {dta.columns["Securitycode"]};

Of course, creating a primary key increases time consumption, which is also created after the data is populated and after the data is populated. In the case of large amounts of data, the consumption of creating a primary key needs to be taken into account. The following figure shows the time required to create a primary key before populating the data, and then creating the primary key, as well as creating the dictionary. can see:

ArraySize

Preindex Creation Time

Postindex Creation Time

Dictionary Creation Time

10

0

0

0

50

0

0

0

100

1

0

0

500

6

1

0

1000

15

2

0

5000

107

16

2

10000

261

42

5

50000

1727

271

31

100000

3525

544

47

500000

20209

2895

240

1000000

43382

5919

517

The drawing is as follows:

From can get:

    1. Creating a primary key before populating the data, and then populating the data, consumes more time than the primary key is created after the data is populated. This is because, after creating the primary key, adding data to it will result in the need to rebuild the index, which is the same as in the database, where it is not appropriate to create a primary key on a field that is frequently changed. On my Notebook (Win7 32bit,cpu T6600 2.0ghz,ram 2GB), it takes approximately 5 seconds to create an index for a DataTable of 1 million records, so you need to consider the creation time of the index in the case of large data volumes.
    2. Creating a DataTable and then creating a primary key is much less time needed to create a dictionary than it is to create the same dictionary structure as the DataTable, and it will almost never grow larger with the size of the record strip.

After the creation is complete, the following is a test of the retrieval efficiency of the DataTable in several scenarios. To this end, in the establishment of the primary key and no index, the test of datatable.select at different scales, DataTable.Rows.Find query speed, because the DataTable is relatively small, time is not very good display, So the unit used for the test is the number of ticks in stopwatch. Each method is executed 10 times in different data sizes, and then averaged, with the following results:

ArraySize Dictionary Create Dictionary Search Table Select Indexed Table Select Table Rows Find Linq
10 13 3 40 25 8 16
50 27 2 69 37 8 27
100 51 3 112 38 9 39
500 210 3 589 51 11 155
1000 461 4 1175 60 14 328
5000 2264 14 8412 85 17 1540
10000 6235 7 16806 99 20 3354
50000 23768 8 150133 138 26 15824
100000 49133 7 259794 147 26 31525
500000 252103 51 1547935 181 30 158317
1000000 494647 9 2736616 209 30 315716

The drawing is as follows:

can see:

    1. It is inefficient to perform a select operation on a DataTable without creating a primary key. After the primary key is established, only select operations are performed on the column where the primary key is located, which is much faster, especially when the data is large, which is nearly 20 times times larger when the collection size is 1000.
    2. The query efficiency of LINQ to DataTable is higher than the Datatale.select method, but it is still less efficient than the DataTable.Rows.Find method.
    3. When a primary key is uniquely located, the The Find method is 3-6 times faster than the Select method, possibly because the Select method needs to parse and judge the filter string inside, using the DataTable.Rows.Find operation, which establishes the primary key in the DataTable and operates only on the primary key. Because the Select method can accept queries for multiple conditions and with some more complex expressions, processing and parsing can take some time. And under general Conditions Select is a full search, that is, finding the entire collection to find all the records that meet the criteria. The Find method retrieves only the primary key field, and if no primary key is set, the call to the Find method causes an error.
    4. Using dictionary instead of a DataTable structure to retrieve the fastest speed, and almost unaffected by the size, but in the case of large data volumes, converting a DataTable to the corresponding dictionary structure may take time, if the operation is frequent, For example, in the case of multiple DataTable concatenation based on keywords, the target DataTable is stored in a dictionary<string,datarow> way, The ability to use ContainsKey's hash-based approach to searching for keywords can greatly improve efficiency. And in the case that the DataTable column has duplicate fields and cannot establish a primary key, the dictionary<string,list<datarow>> can be used to solve the DataTable cannot create the primary key. This can lead to a problem in finding performance degradation.

three implementation effect

Based on the above analysis, in practice, the Select method is replaced, creating a structure C with all the rows of the target merged DataTable object of type dictionary<string,datarow>. Where the keyword is securitycode,datarow is a row containing three columns of securitycode,high,low data. At the time of merging, traverse all the rows of table a directly, and then determine whether the Securitycode in the row is included in C, and if so, the value is directly assigned. Then traverse table B. The entire process increases the efficiency of the DataTable merge by at least 10 times times.

Four conclusion

This paper briefly introduces two methods of retrieving data in DataTable, DataTable.Select and DataTable.Rows.Find methods. Before testing the execution efficiency of a method, it describes how to set a primary key for a DataTable and compares the time it takes to set the primary key before the data is populated and after the data is populated, and the results indicate that setting the primary key is much more efficient than setting the primary key before the data is populated. after you set the primary key, compare the performance of the DataTable.Select method when there are no primary keys, and the results show that when you retrieve only the primary key, the primary key is set to use The DataTable.Select method will be much faster to retrieve than without a primary key. under the same conditions, if you only need to find a record, using DataTable.Rows.Find is much faster than DataTable.Select. to avoid calling the datatable.select method in the loop body when you need to frequently manipulate DataTable queries, use the DataTable Conversion to an equivalent dictionary structure can effectively solve the problem that the primary key cannot be created because of duplicate key values , and dicitonary The use of a hash table to find a way to greatly improve query efficiency.

Performance analysis of DataTable data retrieval [GO]

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.