[Original] DataTable performance problems

Source: Internet
Author: User

Recently I have been working on a data analysis tool that needs to compute over and filter large amounts of data.

Given the powerful functionality and flexibility that DataTable provides, most of the data processing is done in DataTable objects.

However, recent tests revealed performance problems, so we began looking into how to improve performance.

My preliminary guess was this: with the current design, there is not much room to optimize the program itself unless the existing mode of operation is changed.

The first step is reading the raw data from the CSV file. We had always read it as a stream, and I suspected this was a performance bottleneck. The alternative is ODBC: with the Microsoft Text Driver, CSV data can be handled like a database table and queried with SQL statements. However, after comparing the two approaches, the conclusion was that, because our business logic groups and merges the data many times, the SQL approach is less efficient than stream reading. It is also less flexible, since every operation is limited to what SQL statements can express. <This should be written up in more detail; to be added later.>
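As a rough illustration of the stream-reading approach described above, here is a minimal sketch. It is not the author's code: the helper name ReadCsvRows and the sample data are made up for illustration, it is written as a C# 9+ top-level program, and the naive Split(',') assumes fields never contain quoted commas.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not from the article): reads CSV text line by line
// and returns each non-blank line as an array of fields.
// Assumption: fields contain no quoted commas, so Split(',') is enough.
static List<string[]> ReadCsvRows(TextReader input)
{
    var result = new List<string[]>();
    string? line;
    while ((line = input.ReadLine()) != null)
    {
        if (line.Length == 0) continue; // skip blank lines
        result.Add(line.Split(','));
    }
    return result;
}

// Made-up sample data standing in for the real CSV file.
// For a file on disk, a StreamReader over the file path would be passed instead.
List<string[]> rows;
using (var reader = new StringReader("a,1.5,2\nb,3,4.25\n"))
{
    rows = ReadCsvRows(reader);
}
Console.WriteLine($"Read {rows.Count} rows, first row has {rows[0].Length} fields");
```

Because each row arrives as a plain string array, grouping and merging can then be done in ordinary code rather than being expressed as SQL, which is the flexibility the paragraph above refers to.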

So that suspicion was ruled out. What, then, is the real cause of the performance problem?

Next, I went through the code line by line and finally located the spot that affects performance. It is a statement inside a two-level loop:

    for (int j = 0; j < ...; j++) {
        for (int i = 1; i < ...; i++) {
            dtall[intcsvrowcounttemp, i] += Convert.ToDecimal(aryline[i]);
            // DataTable vs. two-dimensional array: each time an element is
            // looked up by row number and column number, the cost differs
            // only slightly, but that difference is amplified over many loop
            // iterations. This is probably related to the internal structure
            // of the DataTable versus the two-dimensional array; I did not
            // investigate further.
        }
    }

The problem lies in the statement inside the loop. Because the values of i and j are very large, for example i = 50000 and j = 200, this statement is executed about 10,000,000 times (50000 × 200), so even the slightest per-access difference in performance is greatly magnified. So I tried replacing the DataTable with an array:

    decall[intcsvrowcounttemp, i] += Convert.ToDecimal(aryline[i]);

Performance immediately improved about tenfold. Processing that previously took 20-30 seconds now completes in 2-3 seconds.
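To make the comparison concrete, here is a small timing sketch of the same accumulation done once through DataTable cells and once through a decimal[,] array. It rests on several assumptions that are not from the article: the helper names, the single target row, the sample sizes (much smaller than the article's), and the use of the invariant culture for parsing. It is written as a C# 9+ top-level program, and the measured ratio will vary by machine and runtime, so it should not be expected to reproduce the tenfold figure exactly.

```csharp
using System;
using System.Data;
using System.Diagnostics;
using System.Globalization;

// Hypothetical helper: adds every field (from index 1 on) into row 0 of a
// DataTable, lineCount times. Each cell access goes through the row and
// column lookup and stores the decimal as a boxed object.
static decimal AccumulateInDataTable(string[] fields, int lineCount)
{
    var table = new DataTable();
    for (int c = 0; c < fields.Length; c++)
        table.Columns.Add("c" + c, typeof(decimal)).DefaultValue = 0m;
    table.Rows.Add(table.NewRow());

    for (int j = 0; j < lineCount; j++)
        for (int i = 1; i < fields.Length; i++)
            table.Rows[0][i] = (decimal)table.Rows[0][i]
                + Convert.ToDecimal(fields[i], CultureInfo.InvariantCulture);
    return (decimal)table.Rows[0][1];
}

// Hypothetical helper: the same accumulation into a two-dimensional array.
static decimal AccumulateInArray(string[] fields, int lineCount)
{
    var totals = new decimal[1, fields.Length];
    for (int j = 0; j < lineCount; j++)
        for (int i = 1; i < fields.Length; i++)
            totals[0, i] += Convert.ToDecimal(fields[i], CultureInfo.InvariantCulture);
    return totals[0, 1];
}

// Made-up input: one parsed CSV line with 50 fields, processed 2000 times.
var aryLine = new string[50];
for (int k = 0; k < aryLine.Length; k++) aryLine[k] = "1.5";
const int LineCount = 2000;

var sw = Stopwatch.StartNew();
decimal viaTable = AccumulateInDataTable(aryLine, LineCount);
sw.Stop();
Console.WriteLine($"DataTable: {sw.ElapsedMilliseconds} ms, total {viaTable}");

sw.Restart();
decimal viaArray = AccumulateInArray(aryLine, LineCount);
sw.Stop();
Console.WriteLine($"Array:     {sw.ElapsedMilliseconds} ms, total {viaArray}");
```

Both helpers parse the same strings, so any difference in elapsed time comes from the element access itself, which is the effect the article attributes the slowdown to.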

Conclusion: functionality and performance always pull against each other. A feature-rich structure like DataTable inevitably has lower performance, while a plain array is the opposite: fewer features, but faster.
I expect most of us use DataTable quite a lot in everyday work, and I hope this gives you a better understanding of it. In fact, every data structure has the use it is best suited to, and we should choose deliberately according to our needs.
