The previous article, "Python Excel operations", showed that working with Excel from Python is very convenient. However, Python and its libraries are an additional dependency. If the same problems can be solved from within Excel itself, that is naturally easier to use.
1. A Python-style hash table in VBA
The crux of the problem is that the Vlookup formula is too slow, so the key is to find a more efficient algorithm or data structure for locating the data. Vlookup requires the lookup column to be sorted (at least in approximate-match mode), so internally it presumably runs a binary search over that column. If the algorithm cannot be improved, the alternative is to switch to a different data structure. After some searching, I found that VBA provides the Scripting.Dictionary structure, and some articles say that it is implemented internally as a hash table, which is exactly what I want.
With this, the formula Vlookup(lookup_value, table_array, col_index_num, range_lookup) can be converted into the following dictionary lookup:
- Build a dictionary from table_array: use the first column of table_array as the key and column col_index_num of table_array as the value, and insert each pair with dictionary.Add key, value;
- To look up a value, simply retrieve dictionary.Item(lookup_value);
If you only need a single Vlookup, there is no point in building a dictionary first. But when many cells (say, tens of thousands) use the same Vlookup formula, it pays off: the dictionary is built only once and can then be queried many times, each lookup costing O(1).
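The two steps above can be sketched roughly as follows. This is only an illustration, not the macro from this article: BuildLookup and LookupDemo are hypothetical names, and the sheet name "Sheet1", the range A2:C100 and the key "some key" are assumptions made up for the example.

```vba
' Hypothetical helper: builds a dictionary from the first column of
' table_array (keys) and column col_index_num (values).
Function BuildLookup(table_array As Range, col_index_num As Long) As Object
    Dim dict As Object
    Dim r As Long
    Dim key As Variant

    Set dict = CreateObject("Scripting.Dictionary")
    ' Compare string keys case-insensitively, as Vlookup does.
    ' CompareMode must be set while the dictionary is still empty.
    dict.CompareMode = vbTextCompare

    For r = 1 To table_array.Rows.Count
        key = table_array.Cells(r, 1).Value
        ' Keep the first occurrence of a duplicate key
        If Not dict.Exists(key) Then
            dict.Add key, table_array.Cells(r, col_index_num).Value
        End If
    Next r

    Set BuildLookup = dict
End Function

Sub LookupDemo()
    Dim dict As Object

    ' Assumed sheet name and range, for illustration only
    Set dict = BuildLookup(Worksheets("Sheet1").Range("A2:C100"), 3)

    If dict.Exists("some key") Then
        Debug.Print dict.Item("some key")
    Else
        Debug.Print "not found"   ' an exact-match Vlookup would give #N/A here
    End If
End Sub
```

Checking Exists before calling Item matters here: reading Item for a missing key does not raise an error in Scripting.Dictionary, it silently adds that key with an empty value.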
2. Vlookup is slow, but the algorithm is not the main problem
Dictionary lookup is indeed faster than binary search, but the advantage is not that pronounced. In practice, I found that a VBA macro using dictionary lookup was not much faster than Vlookup: while it ran, Excel still hung for several hours. By rights, such a simple program should not be this slow. So what is the problem?
After some exploration, I discovered the root cause of the problem:
- When VBA writes to cells in a worksheet, Excel automatically recalculates the existing formulas in the workbook, which is very time-consuming;
- When worksheet content changes, the display is automatically refreshed, which is also costly;
So, somewhat unexpectedly, the key to improving the performance of a VBA script is to turn off automatic formula calculation and screen refresh while it runs. Both are easy to do in VBA, but because Vlookup is itself a formula, I could not figure out how to avoid these two costs when calling Vlookup directly.
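A minimal sketch of switching both settings off and restoring them afterwards, even if the macro fails halfway, might look like this. The Sub name RunWithUpdatesSuspended, the sheet name "Sheet1" and the loop writing to column A are assumptions for illustration; Application.Calculation and Application.ScreenUpdating are the standard Excel properties.

```vba
Sub RunWithUpdatesSuspended()
    Dim prevCalc As XlCalculation
    Dim prevScreen As Boolean
    Dim i As Long

    ' Remember the user's current settings so they can be restored
    prevCalc = Application.Calculation
    prevScreen = Application.ScreenUpdating

    On Error GoTo Cleanup
    Application.Calculation = xlCalculationManual
    Application.ScreenUpdating = False

    ' Bulk cell writes (assumed sheet name, for illustration only)
    For i = 1 To 10000
        Worksheets("Sheet1").Cells(i, 1).Value = i
    Next i

Cleanup:
    ' Runs on both the normal path and the error path
    Application.Calculation = prevCalc
    Application.ScreenUpdating = prevScreen
    If Err.Number <> 0 Then
        MsgBox "Error " & Err.Number & ": " & Err.Description
    End If
End Sub
```

Restoring the previous values, rather than forcing xlCalculationAutomatic, leaves a workbook that was already in manual calculation mode as the user had it.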
3. Sample VBA code
After the optimizations above, the task that originally took Vlookup N hours now finishes in 7 seconds.
Below is the sample code I wrote. I am not familiar with the VBA language, so the code may fall well short of best practice, but it should convey the idea. Interested readers can use it as a reference.
    Sub GenerateLevel1SubCenter()
        ' Macro: generate the level-1 sub-center in the machine table

        ' Turn off automatic calculation and screen refresh
        Application.Calculation = xlCalculationManual
        Application.ScreenUpdating = False
        t0 = Timer

        ' Dictionary
        Set map_dict = CreateObject("Scripting.Dictionary")

        ' Open the center mapping table
        Set map_sheet = Worksheets("center mapping table")
        map_nrows = map_sheet.Range("A300").End(xlUp).Row
        Set my_rows = map_sheet.Range("A2:B" & map_nrows).Rows

        ' Traverse the center mapping table, get the level-1 sub-center
        ' of each center, and insert it into the dictionary
        For Each my_row In my_rows
            center = my_row.Cells(1, 1).Value
            city = my_row.Cells(1, 2).Value
            If Not map_dict.Exists(center) Then
                map_dict.Add center, city
            End If
        Next my_row

        ' Open the machine table
        Set dispatch_sheet = Worksheets("machine table")
        ' Last used row in column A
        dispatch_nrows = dispatch_sheet.Cells(dispatch_sheet.Rows.Count, "A").End(xlUp).Row
        Set my_rows = dispatch_sheet.Range("A1:B" & dispatch_nrows).Rows

        ' Traverse the machine table, look up the level-1 sub-center
        ' in the dictionary, and write it back into the machine table
        For Each o_row In my_rows
            center = o_row.Cells(1, 2).Value
            o_row.Cells(1, 2).Value = map_dict.Item(center)
        Next o_row

        MsgBox "Generated the level-1 sub-center in the machine table. A total of " & dispatch_nrows & " records were processed. Total time: " & (Timer - t0) & " seconds."

        ' Destroy the dictionary
        Set map_dict = Nothing

        ' Re-enable automatic calculation and screen refresh
        Application.Calculation = xlCalculationAutomatic
        Application.ScreenUpdating = True
    End Sub
One last note: I switched to dictionary lookup first and only afterwards found the root cause of the performance problem, so I never compared the actual performance difference between Vlookup and dictionary lookup. If the difference is tolerable, calling the Vlookup formula directly from VBA may be a simpler implementation.
4. Test cases
How can the code above be tested?
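The center mapping table (the dictionary table) has the columns ID and name; the machine table has the columns nickname and ID. Test data and the result after matching:

      Center mapping table       Machine table
      (dictionary table)         (result after matching)
Row   ID     name                nickname   ID
1     A1     B1                  B1         A1
2     A2     B2                  B2         A2
3     A3     B3                  B3         A3
4     A4     B4                  B4         A4
5     A5     B5                  B5         A5
6     A6     B6                  B6         A6
7     A7     B7                  B7         A7
8     A8     B8                  B8         A8
9     A9     B9                  B9         A9
10    A10    B10                 B10        A10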