Performance Analysis of DataTable Data Retrieval

Last Update: 2018-12-07 Source: Internet

Author: User


The .NET platform offers many data storage and retrieval solutions: ADO.NET Entity Framework, ASP.NET Dynamic Data, XML, NHibernate, LINQ to SQL, and others. Yet there are still reasons to use DataTable as the data storage structure, for example platform restrictions (the code must run on .NET Framework 2.0 or earlier) or legacy and third-party data interfaces that already return DataTable. DataTable is also easy to use, so some data access interfaces adopt it directly. When you use a DataTable for data retrieval, a few important points can seriously affect efficiency.

In my recent work, I needed to merge a large number of DataTables. The interface supplies its data as DataSet and DataTable objects; whether that design is reasonable is a question for another time. When several requests are issued together, the server returns a DataSet containing one result DataTable per request, and each result table has a column that acts as a key. We need to merge the columns of these result tables into a single DataTable by that key and then display it in the UI.

At first I looped over the rows and used DataTable.Select to do the merge, and found it very slow. I have summarized what I learned about querying DataTable and share it here.

1. Scenario

To simplify the problem, assume there are two DataTables, table A and table B. Table A stores the highest stock price, and table B stores the lowest stock price. Their fields and sample rows are:

Table A		Table B
SecurityCode	High	SecurityCode	Low
000001.SZ	20	000001.SZ	18.5
000002.SZ	26	000002.SZ	56

Now we need to merge the two tables into one table with three columns: SecurityCode, High, and Low. My first approach was to create a DataTable C containing these three columns, copy the SecurityCode column into it, then traverse the other two tables, use the Select method to find the row in C with the matching SecurityCode, and copy the value into it. This turned out to be very slow. The bottleneck was the Select method, so it needed optimizing.
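The slow approach described above can be sketched roughly as follows. This is a minimal reconstruction, not the original project code: the class and method names (NaiveMerge, MergeWithSelect, CopyColumn) are hypothetical, and the column types are assumed to be string and double.

```csharp
using System;
using System.Data;

public static class NaiveMerge
{
    // Builds table C (SecurityCode, High, Low) from A and B using Select per row.
    public static DataTable MergeWithSelect(DataTable dtA, DataTable dtB)
    {
        DataTable dtC = new DataTable("C");
        dtC.Columns.Add("SecurityCode", typeof(string));
        dtC.Columns.Add("High", typeof(double));
        dtC.Columns.Add("Low", typeof(double));

        // Copy the key column first; High and Low start out as DBNull.
        foreach (DataRow rowA in dtA.Rows)
        {
            dtC.Rows.Add(rowA["SecurityCode"], DBNull.Value, DBNull.Value);
        }

        CopyColumn(dtA, dtC, "High");
        CopyColumn(dtB, dtC, "Low");
        return dtC;
    }

    private static void CopyColumn(DataTable source, DataTable target, string column)
    {
        foreach (DataRow row in source.Rows)
        {
            string code = (string)row["SecurityCode"];
            // Escape single quotes so the filter expression stays valid.
            string filter = "SecurityCode = '" + code.Replace("'", "''") + "'";

            // One Select call per source row: this is the expensive step.
            DataRow[] matches = target.Select(filter);
            if (matches.Length > 0)
            {
                matches[0][column] = row[column];
            }
        }
    }
}
```

Because Select runs once for every row of A and of B, the total work grows roughly with the product of the table sizes when C has no index.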

2. DataTable Query Efficiency

DataTable provides two interfaces for querying data: DataTable.Select and DataTable.Rows.Find.

DataTable.Select takes a filter expression and returns the matching rows as a DataRow[]. Without an index it has to traverse the entire table, test each row against the conditions, and return every match. That is clearly wasteful for the merge described above, where we only need to find one matching record.

DataTable.Rows.Find returns the single row whose primary key matches. Once a primary key is specified, the lookup uses a binary tree index, which is highly efficient. To create a primary key, specify the primary key column of the DataTable as follows:

dtA.PrimaryKey = new DataColumn[] { dtA.Columns["SecurityCode"] };
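As a small illustration of a lookup once the key is set, the sketch below builds table A from the sample rows and looks up one code with Rows.Find. The class and method names (FindExample, BuildTableA, LookupHigh) are hypothetical, and the column types are assumed.

```csharp
using System;
using System.Data;

public static class FindExample
{
    public static DataTable BuildTableA()
    {
        DataTable dtA = new DataTable("A");
        dtA.Columns.Add("SecurityCode", typeof(string));
        dtA.Columns.Add("High", typeof(double));
        dtA.Rows.Add("000001.SZ", 20.0);
        dtA.Rows.Add("000002.SZ", 26.0);

        // Set the primary key after the rows have been added.
        dtA.PrimaryKey = new DataColumn[] { dtA.Columns["SecurityCode"] };
        return dtA;
    }

    // Returns the High value for a code, or null when the code is absent.
    public static double? LookupHigh(DataTable dtA, string code)
    {
        // Find returns null when no row has this primary key value.
        DataRow row = dtA.Rows.Find(code);
        return row == null ? (double?)null : (double)row["High"];
    }
}
```

Calling Rows.Find on a table that has no primary key throws a MissingPrimaryKeyException, so the key must be set before the first lookup.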

Of course, creating a primary key has a cost of its own, and it differs depending on whether the key is created before or after the data is filled. When the data volume is large, this cost needs to be taken into account. The following table shows the time required to create the primary key before filling the data, to create it after filling the data, and to build an equivalent dictionary (the times appear to be in milliseconds):

Rows	Primary key created before fill	Primary key created after fill	Dictionary creation
10	0	0	0
50	0	0	0
100	1	0	0
500	6	1	0
1000	15	2	0
5000	107	16	2
10000	261	42	5
50000	1727	271	31
100000	3525	544	47
500000	20209	2895	240
1000000	43382	5919	517

[Figure: chart of the creation times in the table above]

The following can be observed:

Creating the primary key before filling the data takes more time. This is because the index has to be updated as each row is added once the primary key exists. For the same reason, a primary key is not a good fit for fields that change frequently. On my laptop (Windows 7 32-bit, T6600 2.0 GHz CPU, 2 GB RAM), creating an index on a DataTable with 1 million records takes about 5 seconds, so the index creation time must be considered when the data volume is large.
Building a dictionary with the same structure as the DataTable takes far less time than creating the DataTable and then creating its primary key (roughly a tenth in the table above), and it stays small even as the number of records grows.
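A comparison of this kind could be timed along the following lines. This is only a sketch under stated assumptions: the original test code is not shown in the article, so the names (PrimaryKeyTiming, TimeBuild), the generated key format, and the choice to time the whole fill-plus-key step in both orders are mine.

```csharp
using System;
using System.Data;
using System.Diagnostics;

public static class PrimaryKeyTiming
{
    // Times "fill + set primary key" in either order; returns elapsed milliseconds.
    public static long TimeBuild(int rowCount, bool keyBeforeFill)
    {
        DataTable dt = new DataTable();
        dt.Columns.Add("SecurityCode", typeof(string));
        dt.Columns.Add("High", typeof(double));

        Stopwatch sw = Stopwatch.StartNew();
        if (keyBeforeFill)
        {
            // Key first: the index is maintained on every row insert.
            dt.PrimaryKey = new DataColumn[] { dt.Columns["SecurityCode"] };
        }
        for (int i = 0; i < rowCount; i++)
        {
            // Synthetic unique codes such as "S0000042" (an assumption for the test).
            dt.Rows.Add("S" + i.ToString("D7"), (double)i);
        }
        if (!keyBeforeFill)
        {
            // Key last: the index is built once over the filled table.
            dt.PrimaryKey = new DataColumn[] { dt.Columns["SecurityCode"] };
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

For example, comparing PrimaryKeyTiming.TimeBuild(100000, true) with PrimaryKeyTiming.TimeBuild(100000, false) gives the two orderings for the same row count. Absolute numbers depend on the machine and runtime.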

With the tables created, we test retrieval efficiency in several cases. We measured the query speed of DataTable.Select and DataTable.Rows.Find on DataTables of different sizes, with and without a primary key. Because the elapsed time is too small to display meaningfully when the table is small, the results are given in Stopwatch ticks. For each data size, each method was executed 10 times and the average was taken. The results are as follows:

Arraysize	Dictionary create	Dictionary search	Table select	Indexed table select	Table rows find	LINQ
10	13	3	40	25	8	16
50	27	2	69	37	8	27
100	51	3	112	38	9	39
500	210	3	589	51	11	155
1000	461	4	1175	60	14	328
5000	2264	14	8412	85	17	1540
10000	6235	7	16806	99	20	3354
50000	23768	8	150133	138	26	15824
100000	49133	7	259794	147	26	31525
500000	252103	51	1547935	181	30	158317
1000000	494647	9	2736616	209	30	315716
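The measurement described above (average Stopwatch ticks over 10 runs) might look something like this. The helper names (LookupTiming, AverageTicks, BuildTable, Report) and the synthetic key format are assumptions for illustration; the LINQ and dictionary columns are left out to keep the sketch short.

```csharp
using System;
using System.Data;
using System.Diagnostics;

public static class LookupTiming
{
    // Runs the action the given number of times and returns the average elapsed ticks.
    public static long AverageTicks(Action action, int runs)
    {
        long total = 0;
        for (int i = 0; i < runs; i++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            total += sw.ElapsedTicks;
        }
        return total / runs;
    }

    public static DataTable BuildTable(int rowCount, bool withPrimaryKey)
    {
        DataTable dt = new DataTable();
        dt.Columns.Add("SecurityCode", typeof(string));
        dt.Columns.Add("High", typeof(double));
        for (int i = 0; i < rowCount; i++)
        {
            dt.Rows.Add("S" + i.ToString("D7"), (double)i);
        }
        if (withPrimaryKey)
        {
            dt.PrimaryKey = new DataColumn[] { dt.Columns["SecurityCode"] };
        }
        return dt;
    }

    // Prints average ticks for Select (with and without a key) and for Rows.Find.
    public static void Report(int rowCount)
    {
        DataTable plain = BuildTable(rowCount, false);
        DataTable keyed = BuildTable(rowCount, true);
        string code = "S" + (rowCount / 2).ToString("D7");
        string filter = "SecurityCode = '" + code + "'";

        Console.WriteLine("Table select:         " + AverageTicks(() => plain.Select(filter), 10));
        Console.WriteLine("Indexed table select: " + AverageTicks(() => keyed.Select(filter), 10));
        Console.WriteLine("Table rows find:      " + AverageTicks(() => keyed.Rows.Find(code), 10));
    }
}
```

The first call of each method includes one-time warm-up costs, so a more careful harness would discard an initial run before averaging.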

[Figure: chart of the query times in the table above]

You can see:

If no primary key is created, the Select operation on the DataTable is inefficient. After a primary key is created, Select calls that filter only on the primary key column become much faster. The gap is especially evident when the data volume is large: at 1000 records the difference is already nearly 20 times.
Querying the DataTable with LINQ is faster than DataTable.Select without a primary key, but still slower than DataTable.Rows.Find.
For a unique lookup on the primary key, use DataTable.Rows.Find. When the DataTable has a primary key and the query touches only that key, Find is roughly 3 to 7 times faster than Select in the table above. This is probably because Select has to parse and evaluate the filter string: it accepts multi-condition queries and fairly complex expressions, so parsing and processing take time. In general, Select is a full search that returns every record in the set that meets the conditions, whereas Find looks up primary key values only. If no primary key is set, calling Find throws an exception.
Using a Dictionary instead of the DataTable for retrieval gives the fastest lookups, and the speed is almost unaffected by size. However, when the data volume is large, converting a DataTable into the corresponding Dictionary takes time. If lookups are frequent, for example when merging several DataTables by key, you can store the rows in a Dictionary<string, DataRow> keyed by that value and use its hash-based ContainsKey lookup, which greatly improves efficiency. If the key column contains duplicate values, so that a primary key cannot be created and query performance suffers, you can use a Dictionary<string, List<DataRow>> instead.
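The duplicate-key case from the last point can be sketched as below. The helper name (RowIndex.Build) is hypothetical; it groups rows by the string form of a key column so that repeated keys end up in the same list.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class RowIndex
{
    // Groups the rows of a table by one column; works even when key values repeat.
    public static Dictionary<string, List<DataRow>> Build(DataTable table, string keyColumn)
    {
        Dictionary<string, List<DataRow>> index = new Dictionary<string, List<DataRow>>();
        foreach (DataRow row in table.Rows)
        {
            string key = Convert.ToString(row[keyColumn]);
            List<DataRow> bucket;
            if (!index.TryGetValue(key, out bucket))
            {
                // First row seen for this key: start a new list.
                bucket = new List<DataRow>();
                index[key] = bucket;
            }
            bucket.Add(row);
        }
        return index;
    }
}
```

Building the index is a single pass over the table; after that, each lookup is a hash lookup (TryGetValue or ContainsKey) instead of a scan.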

3. Implementation Effect

Based on the above analysis, I replaced the Select method in the actual project with a Dictionary<string, DataRow> named C that holds every row of the merged target table. The key is the SecurityCode, and each DataRow contains three columns: SecurityCode, High, and Low. To merge, traverse all rows of table A and check whether each row's SecurityCode is already in C; if it is, retrieve the row and assign the value directly. Then do the same for table B. Overall, this made the merge at least 10 times faster.
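A minimal sketch of this dictionary-based merge is shown below. It is an illustration rather than the project's actual code: the names (DictionaryMerge, Merge, CopyColumn) are hypothetical, the column types are assumed, and a row is added to the target the first time its code is seen, which is one possible way to populate C.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class DictionaryMerge
{
    // Merges A (SecurityCode, High) and B (SecurityCode, Low) into one table.
    public static DataTable Merge(DataTable dtA, DataTable dtB)
    {
        DataTable dtC = new DataTable("C");
        dtC.Columns.Add("SecurityCode", typeof(string));
        dtC.Columns.Add("High", typeof(double));
        dtC.Columns.Add("Low", typeof(double));

        // Maps each SecurityCode to its row in the merged table.
        Dictionary<string, DataRow> rowsByCode = new Dictionary<string, DataRow>();
        CopyColumn(dtA, dtC, rowsByCode, "High");
        CopyColumn(dtB, dtC, rowsByCode, "Low");
        return dtC;
    }

    private static void CopyColumn(DataTable source, DataTable target,
        Dictionary<string, DataRow> rowsByCode, string column)
    {
        foreach (DataRow row in source.Rows)
        {
            string code = (string)row["SecurityCode"];
            DataRow merged;
            if (!rowsByCode.TryGetValue(code, out merged))
            {
                // First time this code appears: create its row in the target.
                merged = target.NewRow();
                merged["SecurityCode"] = code;
                target.Rows.Add(merged);
                rowsByCode.Add(code, merged);
            }
            // Hash lookup above replaces the per-row Select call.
            merged[column] = row[column];
        }
    }
}
```

Each source row costs one hash lookup, so the merge is a single pass over A and a single pass over B.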

4. Conclusion

This article briefly introduced two methods for retrieving data from a DataTable: DataTable.Select and DataTable.Rows.Find. Before testing their speed, we showed how to set a primary key on a DataTable and compared the cost of setting it before and after the data is filled. The results show that setting the primary key after the data is filled is much more efficient than setting it before. With the primary key in place, we then measured DataTable.Select when filtering only on the primary key column: it is much faster than Select without a primary key. Under the same conditions, if you only need to find a single record, DataTable.Rows.Find is faster than DataTable.Select. When you query a DataTable frequently, avoid calling DataTable.Select in a loop; instead, convert the DataTable to an equivalent Dictionary structure. This also works around the case where duplicate key values make it impossible to create a primary key, and the Dictionary's hash-based lookup can greatly improve query efficiency.

The test cases and code for this article were offered as a download with the original post. I hope this helps you improve the efficiency of your DataTable lookups.

This article is an English version of an article originally published in Chinese on aliyun.com.
