Our data analysis tool needs to compute and filter large amounts of data. Given the powerful, flexible API that DataTable provides, most of the data processing is done in DataTables. Recent tests, however, revealed performance problems, so we started looking into how to improve them.
A preliminary guess first: within the current design, optimizing the program in place would not buy much; real gains would require changing the existing processing model.
Step one is reading the raw data from the CSV file. We had always used stream reading, and suspected it was the bottleneck. An alternative is the ODBC/OLE DB Microsoft text driver, which lets you treat a CSV file like a database table and fetch data with SQL statements. After comparing the two approaches, though, we concluded that stream reading wins: our business logic groups and merges the data many times, and the SQL route turned out to be slower than the stream, less flexible, and constrained by what SQL statements can express. <to be written up in detail in a later note>
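For reference, this is roughly what the rejected text-driver approach looks like. The folder, file name, and method names below are placeholders of mine, not from the original project; note also that the Jet provider only loads in a 32-bit process (the ACE provider is the 64-bit alternative).

```csharp
using System;
using System.Data;
using System.Data.OleDb;

static class CsvViaSql
{
    // Builds a Jet text-driver connection string for a folder of CSV files.
    // HDR=Yes means the first row of each file is treated as a header.
    public static string ConnectionStringFor(string folder)
    {
        return "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + folder +
               ";Extended Properties=\"text;HDR=Yes;FMT=Delimited\"";
    }

    // Reads one CSV file from the folder into a DataTable with a SQL statement.
    public static DataTable Read(string folder, string fileName)
    {
        using (var conn = new OleDbConnection(ConnectionStringFor(folder)))
        using (var adapter = new OleDbDataAdapter(
                   "SELECT * FROM [" + fileName + "]", conn))
        {
            var dt = new DataTable();
            adapter.Fill(dt);   // the whole file arrives as a DataTable
            return dt;
        }
    }

    static void Main()
    {
        // Hypothetical file; any CSV in the folder can be queried like a table.
        DataTable dt = Read(@"C:\data", "sales.csv");
        Console.WriteLine(dt.Rows.Count + " rows read");
    }
}
```

Each grouping or merging step becomes another `SELECT ... GROUP BY` round trip in this model, which is exactly why it lost to stream reading for our workload.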
So that guess was ruled out. What was really hurting performance?
Next, we went through the code line by line and finally pinned down the spot that hurts performance: a DataTable cell access inside a two-level loop:
    for (int j = 0; j < rowCount; j++)   // the loop bounds were garbled in the source;
    {                                    // rowCount/colCount stand in for them
        for (int i = 1; i < colCount; i++)
        {
            // Every DataTable cell access goes through the Rows collection and boxes
            // the decimal as object, so "+=" becomes read, convert, add, write back:
            dtAll.Rows[intCsvRowCountTemp][i] =
                Convert.ToDecimal(dtAll.Rows[intCsvRowCountTemp][i]) + Convert.ToDecimal(aryLine[i]);
        }
    }

For a single access by row number and column number, a DataTable is only slightly slower than a two-dimensional array, but that slight difference is amplified over millions of loop iterations. It is probably related to the internal structure of DataTable versus the flat layout of a 2D array; we did not investigate further. The problem is the statement inside the loop. Because i and j are large (i up to 50,000, j up to 200), that one statement executes about 10,000,000 times, so the slightest per-access cost is magnified enormously. So we tried replacing the DataTable with an array:
    decAll[intCsvRowCountTemp, i] += Convert.ToDecimal(aryLine[i]);

Performance immediately improved about tenfold: processing that used to take 20-30 seconds now finishes in 2-3 seconds.
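The gap is easy to reproduce with a minimal benchmark. This sketch is mine, not the original program: both routines accumulate 1 into every cell and return a checksum, so the two paths can be compared for speed and verified to produce the same result.

```csharp
using System;
using System.Data;
using System.Diagnostics;

static class Benchmark
{
    // Accumulate a value into every cell of a DataTable, the way the original
    // loop did, and return the grand total as a checksum.
    public static decimal SumWithDataTable(int rows, int cols)
    {
        var dt = new DataTable();
        for (int c = 0; c < cols; c++) dt.Columns.Add("c" + c, typeof(decimal));
        for (int r = 0; r < rows; r++)
        {
            var row = dt.NewRow();
            for (int c = 0; c < cols; c++) row[c] = 0m;
            dt.Rows.Add(row);
        }
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                // every access boxes/unboxes the decimal through object
                dt.Rows[r][c] = (decimal)dt.Rows[r][c] + 1m;
        decimal total = 0m;
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                total += (decimal)dt.Rows[r][c];
        return total;
    }

    // The same accumulation on a plain decimal[,]: no row lookup, no boxing.
    public static decimal SumWithArray(int rows, int cols)
    {
        var dec = new decimal[rows, cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                dec[r, c] += 1m;
        decimal total = 0m;
        foreach (decimal d in dec) total += d;
        return total;
    }

    static void Main()
    {
        const int rows = 50000, cols = 200;  // the sizes mentioned in the post
        var sw = Stopwatch.StartNew();
        SumWithDataTable(rows, cols);
        Console.WriteLine("DataTable:  " + sw.ElapsedMilliseconds + " ms");
        sw.Restart();
        SumWithArray(rows, cols);
        Console.WriteLine("decimal[,]: " + sw.ElapsedMilliseconds + " ms");
    }
}
```

In informal runs the array version is roughly an order of magnitude faster, which is consistent with the 20-30 s to 2-3 s improvement described above.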
Conclusion: functionality and performance always pull against each other. The DataTable's powerful features necessarily cost performance; the array's bare-bones structure is the reverse: fast, but minimal.
Ordinarily I found DataTables so convenient that I never thought much about this; I hope this post gives you a better sense of the trade-off. In fact, every data structure has its best use case, and we should choose deliberately based on the need at hand.