A while back, some readers argued that using Hashtable to look up strings is inefficient and proposed switching to Dictionary, on the grounds that Hashtable stores keys and values as object and therefore triggers boxing and unboxing. I have always been skeptical of this argument, because as I understand it, boxing and unboxing occur only when value types pass through object; casts between reference types involve neither. Moreover, the generic Dictionary is itself a hash table underneath, so how much faster could it really be than Hashtable? Today I decided to test four data structures, to serve as a reference for system performance optimization.
The four data structures are Hashtable, Dictionary, SortedDictionary, and SortedList. Internally they are, respectively, a hash table, a hash table, a binary search tree, and binary search over a sorted array. The test design is as follows: 100,000 records per group, split into three groups by string length (16, 128, and 1024 characters), with insert and single-value lookup tests run on both sorted and unsorted input. All times are in milliseconds. The results are below.
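The original harness is .NET code that is not reproduced in the post. As a rough sketch of the same experiment in Java (HashMap standing in for the hash-based Hashtable/Dictionary, TreeMap for the tree-based SortedDictionary), with key length and seed chosen arbitrarily by me:

```java
import java.util.*;

// Rough Java analogue of the test: insert N random fixed-length strings
// into a map, then look each one up, timing the whole pass.
public class MapBench {
    public static String randomString(Random rnd, int len) {
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) sb.append((char) ('a' + rnd.nextInt(26)));
        return sb.toString();
    }

    public static long timeInsertAndLookup(Map<String, Integer> map, List<String> keys) {
        long start = System.nanoTime();
        for (int i = 0; i < keys.size(); i++) map.put(keys.get(i), i);
        for (String k : keys) map.get(k);
        return (System.nanoTime() - start) / 1_000_000; // milliseconds
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) keys.add(randomString(rnd, 16));

        System.out.println("HashMap: " + timeInsertAndLookup(new HashMap<>(), keys) + " ms");
        System.out.println("TreeMap: " + timeInsertAndLookup(new TreeMap<>(), keys) + " ms");
    }
}
```

Absolute numbers will differ from the tables below, but the relative gap between the hash-based and tree-based maps should show the same shape.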
Insert test results (each figure is the total time for 100,000 inserts):
| Test condition | Hashtable | Dictionary | SortedDictionary | SortedList |
| --- | --- | --- | --- | --- |
| String length 16, unsorted | 14 | 21 | — | 8009 |
| String length 16, sorted | 25 | 35 | 990 | 671 |
| String length 128, unsorted | 52 | — | 868 | 8415 |
| String length 128, sorted | 67 | — | 1053 | 666 |
| String length 1024, unsorted | 262 | — | 1269 | 8159 |
| String length 1024, sorted | 158 | 277 | 1036 | 684 |

(Cells marked — were lost from the source; the surviving figures are placed by matching the pattern of the complete rows.)
Query test results (each figure is the total time for 100,000 lookups):
| Test condition | Hashtable | Dictionary | SortedDictionary | SortedList |
| --- | --- | --- | --- | --- |
| String length 16, unsorted | 13 | 15 | 366 | — |
| String length 16, sorted | 25 | 29 | 349 | 315 |
| String length 128, unsorted | 40 | — | 492 | 438 |
| String length 128, sorted | 54 | — | 408 | 371 |
| String length 1024, unsorted | 202 | — | 934 | 894 |
| String length 1024, sorted | 219 | — | 801 | 757 |

(Cells marked — were lost from the source; the surviving figures are placed by matching the pattern of the complete rows.)
From the test results, Hashtable is the most efficient for both insertion and lookup, followed by Dictionary. This is basically consistent with my expectations.
For hash-based insertion, time goes mainly to handling hash collisions, but that cost is limited: resolving a collision just means choosing another free bucket, with no extra memory allocation. The key point is that in my test design the initial capacity of each data structure is set to 100,000, so no resizing happens during the run. Dictionary, being a layer of encapsulation over the same hashing scheme as Hashtable, is slightly slower.
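The point about setting the initial capacity matters for any hash map. A minimal Java sketch of the idea (the capacity arithmetic below follows Java's default 0.75 load factor; the .NET constructors take the element count directly):

```java
import java.util.HashMap;
import java.util.Map;

// Sizing the hash map for the expected record count up front avoids
// repeated resize-and-rehash passes while inserting 100,000 entries.
public class Presize {
    public static Map<Integer, Integer> fill(Map<Integer, Integer> map, int n) {
        for (int i = 0; i < n; i++) map.put(i, i * i);
        return map;
    }

    public static void main(String[] args) {
        int n = 100_000;
        // Default load factor is 0.75, so request n / 0.75 buckets up front.
        Map<Integer, Integer> presized = fill(new HashMap<>((int) (n / 0.75f) + 1), n);
        Map<Integer, Integer> grown = fill(new HashMap<>(), n); // resizes repeatedly
        System.out.println(presized.size() + " " + grown.size());
    }
}
```

Both maps end up identical; the presized one simply skips every intermediate rehash.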
For binary search tree insertion, time goes to maintaining the tree's nodes and to lookup, because each insert must first determine whether the key already exists (duplicate records are not inserted).
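Java's TreeMap (a red-black tree, the same kind of structure as .NET's SortedDictionary) shows this duplicate check directly: a put on an existing key descends the tree, finds the node, and replaces its value rather than adding a node.

```java
import java.util.TreeMap;

// Every put first descends the tree to locate the key's position; if the
// key is already present, the value is replaced and no new node is created.
public class TreeDup {
    public static void main(String[] args) {
        TreeMap<String, Integer> tree = new TreeMap<>();
        tree.put("alpha", 1);
        Integer old = tree.put("alpha", 2); // duplicate key: replaces, returns 1
        System.out.println(tree.size() + " " + old + " " + tree.get("alpha"));
        // → 1 1 2
    }
}
```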
For binary search (SortedList), the underlying storage is a sorted array. With sorted input, insertion is slightly faster than the binary search tree; with random input, elements must constantly be shifted to make room, producing large block memory copies. That case is therefore extremely inefficient, and it degrades sharply as the element count grows, since each insert moves on average half the array.
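A small Java sketch of that insertion pattern, using an ArrayList as the sorted array (my own helper, not SortedList's actual code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Why random-order insertion into a sorted array is expensive: each insert
// binary-searches for the slot, then shifts every later element one
// position to the right (a block memory move inside ArrayList.add).
public class SortedArrayInsert {
    public static void insertSorted(List<Integer> list, int value) {
        int pos = Collections.binarySearch(list, value);
        if (pos < 0) pos = -pos - 1;  // convert "not found" to the insertion point
        list.add(pos, value);         // shifts list.size() - pos elements
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int v : new int[] {5, 1, 4, 2, 3}) insertSorted(list, v);
        System.out.println(list); // [1, 2, 3, 4, 5]
    }
}
```

With sorted input the insertion point is always the end, so nothing shifts; with random input the shifting dominates, which is exactly the gap between the sorted and unsorted SortedList rows above.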
The time cost of hash lookup lies mainly in collision handling; computing the hash function itself is usually very fast. One thing worth noting is that .NET strings are special here: identical string literals are interned and share the same address, so comparing two such strings can succeed on a reference comparison alone, without examining every character, which greatly speeds up equality checks.
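Java has the same interning fast path, which makes the point easy to demonstrate (a sketch of the mechanism, not of .NET's internals):

```java
// Interned strings share one instance, so an equality check can succeed on
// the reference comparison alone without scanning characters; equals() only
// falls back to a per-character scan for distinct instances.
public class InternDemo {
    public static void main(String[] args) {
        String a = "performance-test-key";
        String b = new String("performance-test-key"); // forces a distinct instance
        System.out.println(a == b);          // false: different references
        System.out.println(a == b.intern()); // true: intern returns the shared copy
        System.out.println(a.equals(b));     // true: falls back to comparing chars
    }
}
```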
The other two structures must compare strings for ordering, not merely for equality, which brings considerable extra overhead.
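The difference is visible in Java's String API: an ordered container needs a three-way comparison that scans characters until the first difference to decide direction, while an equality check only has to answer yes or no.

```java
// Ordered containers cannot stop at "not equal": compareTo must scan to the
// first differing character and report which string sorts earlier, and a
// tree lookup performs several such comparisons per key.
public class CompareCost {
    public static void main(String[] args) {
        String a = "aaaaaaaaaaaaaaab";
        String b = "aaaaaaaaaaaaaaac";
        System.out.println(a.equals(b));        // false
        System.out.println(a.compareTo(b) < 0); // true: 'b' < 'c' at index 15
    }
}
```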