I used to think the only difference between these two collection classes was whether they support generics. Over the past few days, while reading Introduction to Algorithms, I took the opportunity to study how the two classes are implemented internally.
In terms of data structure, both Hashtable and Dictionary are hash tables: each hashes the key of a key-value pair to a slot in the table. Where they differ is in how they handle collisions. A hash function may hash different keys to the same slot; this is called a collision. To still be able to insert the data, we need a way to resolve it.
Chaining
With chaining, all elements that hash to the same slot are placed in a linked list. The slot holds a pointer to the head of the list, or NIL if no element has hashed to it. For a hash table that stores n elements in m slots, we define the load factor a as n/m, that is, the average number of elements stored in a chain.
Insertion, deletion, and search under chaining are essentially the basic linked-list operations, so I will not go into detail here.
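As a rough illustration of chaining, here is a minimal sketch. The class name ChainedTable and its members are hypothetical teaching names, not part of .NET, and the string keys and int values are assumptions made only to keep the example short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical teaching class, not part of .NET: a hash table with m slots,
// where each slot holds a linked list of the entries hashed to it.
public class ChainedTable
{
    private readonly LinkedList<KeyValuePair<string, int>>[] slots;
    private int count;

    public ChainedTable(int m)
    {
        slots = new LinkedList<KeyValuePair<string, int>>[m];   // every slot starts as null (NIL)
    }

    // Load factor a = n / m: the average chain length.
    public double LoadFactor => (double)count / slots.Length;

    private int SlotOf(string key) => (key.GetHashCode() & 0x7fffffff) % slots.Length;

    public void Put(string key, int value)
    {
        int i = SlotOf(key);
        if (slots[i] == null) slots[i] = new LinkedList<KeyValuePair<string, int>>();
        for (var node = slots[i].First; node != null; node = node.Next)
        {
            if (node.Value.Key == key)
            {
                node.Value = new KeyValuePair<string, int>(key, value);   // overwrite an existing key
                return;
            }
        }
        slots[i].AddFirst(new KeyValuePair<string, int>(key, value));
        count++;
    }

    public bool TryGet(string key, out int value)
    {
        var chain = slots[SlotOf(key)];
        if (chain != null)
        {
            foreach (var pair in chain)
            {
                if (pair.Key == key) { value = pair.Value; return true; }
            }
        }
        value = 0;
        return false;
    }

    public bool Remove(string key)
    {
        var chain = slots[SlotOf(key)];
        if (chain == null) return false;
        for (var node = chain.First; node != null; node = node.Next)
        {
            if (node.Value.Key == key)
            {
                chain.Remove(node);   // plain linked-list unlink
                count--;
                return true;
            }
        }
        return false;
    }
}
```

Because the chains live outside the slot array, the load factor of such a table can exceed 1: five keys in four slots gives 1.25.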
Open addressing
In open addressing, all elements are stored in the hash table itself, unlike chaining, where the data lives in external linked lists. Because every element occupies a slot, the number of slots m must be greater than or equal to n, that is, the load factor must be less than or equal to 1.
To insert an element under open addressing, we pass the key and a probe number (counting up from 0) to the hash function, which returns the slot to examine. Insertion first examines slot hash(key, 0). If it is occupied, we increment the probe number and examine the next slot in the sequence, continuing until an empty slot is found or the table turns out to be full. Search follows the same probe sequence as insertion. If we reach an empty slot while searching for a key, the search ends unsuccessfully, because if the key were present it would have been stored in that slot.
Deletion is special in open addressing. Simply setting the deleted slot to null causes a problem. Suppose that while inserting key k we found slot i occupied and therefore stored k in a later slot of its probe sequence. If the key in slot i is later deleted by setting slot i to null, a search for k will stop at slot i and never find it. A flag that marks the slot as deleted solves this problem; how Hashtable implements it is discussed below.
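The insert, search, and delete rules above can be sketched as follows. This is only an illustration under stated assumptions: the class OpenAddressTable is hypothetical, keys are non-negative ints that are not inserted twice, and the probe function is plain linear probing, chosen for brevity rather than because any framework class uses it.

```csharp
using System;

// Hypothetical teaching class: open addressing over non-negative int keys.
public class OpenAddressTable
{
    private enum State { Empty, Occupied, Deleted }

    private readonly int[] keys;
    private readonly State[] states;

    public OpenAddressTable(int m)
    {
        keys = new int[m];
        states = new State[m];   // all slots start as Empty
    }

    // Probe function h(key, i). Linear probing is an assumption made for brevity.
    private int Probe(int key, int i) => (key % keys.Length + i) % keys.Length;

    // Assumes the key is not already present.
    public bool Insert(int key)
    {
        for (int i = 0; i < keys.Length; i++)
        {
            int slot = Probe(key, i);
            if (states[slot] != State.Occupied)   // Empty and Deleted slots can both be reused
            {
                keys[slot] = key;
                states[slot] = State.Occupied;
                return true;
            }
        }
        return false;   // every slot was probed: the table is full
    }

    public bool Contains(int key) => Find(key) >= 0;

    private int Find(int key)
    {
        for (int i = 0; i < keys.Length; i++)
        {
            int slot = Probe(key, i);
            if (states[slot] == State.Empty) return -1;   // a truly empty slot ends the search
            if (states[slot] == State.Occupied && keys[slot] == key) return slot;
            // Deleted slots are skipped, so keys stored further along stay reachable.
        }
        return -1;
    }

    public bool Remove(int key)
    {
        int slot = Find(key);
        if (slot < 0) return false;
        states[slot] = State.Deleted;   // mark the slot instead of resetting it to Empty
        return true;
    }
}
```

With 7 slots, keys 3 and 10 both start at slot 3, so 10 lands in slot 4. After removing 3, a search for 10 still succeeds because slot 3 is marked Deleted rather than Empty.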
Double hashing
Open addressing can use several probing methods. Here we discuss only double hashing, because it is one of the best methods and is the one Hashtable uses.
Double hashing uses a hash function of the form $h(k, i) = (h_1(k) + i \cdot h_2(k)) \bmod m$, where $h_2$ is the auxiliary hash function. The first probe goes to slot $h_1(k)$; each later probe adds the offset $h_2(k)$ to the previous position, modulo m. Note that for the probe sequence to cover the whole table, $h_2(k)$ must be relatively prime to the table size m. We will see below how the Hashtable class satisfies this condition.
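To make the relatively-prime condition concrete, here is a small sketch of a double hashing probe sequence. The class DoubleHashing and the two hash functions (k mod m for the first probe, 1 + k mod (m - 1) for the step) are illustrative assumptions, not the functions Hashtable uses.

```csharp
using System;
using System.Collections.Generic;

public static class DoubleHashing
{
    // h(k, i) = (h1(k) + i * h2(k)) mod m, with the illustrative choices
    // h1(k) = k mod m and h2(k) = 1 + (k mod (m - 1)). Assumes k >= 0 and m >= 2.
    public static int Probe(int k, int i, int m)
    {
        long h1 = k % m;
        long h2 = 1 + (k % (m - 1));
        return (int)((h1 + i * h2) % m);
    }

    // Counts how many distinct slots the first m probes reach.
    public static int DistinctSlots(int k, int m)
    {
        var seen = new HashSet<int>();
        for (int i = 0; i < m; i++) seen.Add(Probe(k, i, m));
        return seen.Count;
    }
}
```

For k = 3 and m = 12 the step is 4, which shares a factor with 12, so DistinctSlots(3, 12) reaches only 3 slots. With the prime size m = 11, every step from 1 to 10 is relatively prime to m, and all 11 slots are reached.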
Having covered chaining and open addressing, let's turn to Hashtable and Dictionary.
The Hashtable class resolves collisions with open addressing. Let's look at one of its constructors.
    this.loadFactor = 0.72f * loadFactor;
    double num = ((float) capacity) / this.loadFactor;
    if (num > 2147483647.0)
    {
        throw new ArgumentException(Environment.GetResourceString("Arg_HTCapacityOverflow"));
    }
    int num2 = (num > 11.0) ? HashHelpers.GetPrime((int) num) : 11;
    this.buckets = new bucket[num2];
    this.loadsize = (int) (this.loadFactor * num2);
    this.isWriterInProgress = false;
The constructor multiplies the load factor passed in by 0.72, a value Microsoft regards as ideal. As mentioned above, double hashing requires the value of the auxiliary hash function to be relatively prime to the table size m. It is enough to make m a prime number and keep the auxiliary value smaller than m; the two are then always relatively prime. HashHelpers.GetPrime returns a prime number no smaller than num, so num2 is always prime, and the slot array is then created with that size.
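The sizing arithmetic can be tried out with the sketch below. HashHelpers.GetPrime is internal to the framework, so NextPrime here is a hypothetical stand-in that simply returns the smallest prime not below its argument; the real helper may well return a different, larger prime. The class name HashtableSizing is also an assumption.

```csharp
using System;

public static class HashtableSizing
{
    // Hypothetical stand-in for the framework's internal HashHelpers.GetPrime:
    // returns the smallest prime that is >= min.
    public static int NextPrime(int min)
    {
        for (int n = Math.Max(min, 2); ; n++)
        {
            if (IsPrime(n)) return n;
        }
    }

    private static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; (long)d * d <= n; d++)
        {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Mirrors the arithmetic of the constructor excerpt: scale the load factor
    // by 0.72, divide the capacity by it, and round up to a prime (minimum 11).
    public static int BucketCount(int capacity, float loadFactor)
    {
        float effective = 0.72f * loadFactor;
        double raw = ((float)capacity) / effective;
        return (raw > 11.0) ? NextPrime((int)raw) : 11;
    }
}
```

Under these assumptions, a requested capacity of 100 with load factor 1.0 gives 100 / 0.72, truncated to 138, and the stand-in rounds that up to the prime 139. Small capacities all fall back to 11 buckets.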
In the double hashing formula, (this.GetHash(key) & 0x7fffffff) plays the role of the primary hash value, and 1 + (((seed >> 5) + 1) % (hashsize - 1)) plays the role of the auxiliary hash value. The latter always lies between 1 and hashsize - 1, so it is smaller than the prime table size.
The hash_coll field of a slot stores the hash code of the key, and its highest bit records whether a collision has occurred at that slot: the highest bit of a slot that was involved in a collision is set to 1. During a search, if the highest bit is 1, the search continues with the next probe. Note the while condition in the Contains method:
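The two expressions can be checked in isolation with a sketch like the one below. It is reconstructed from the expressions quoted in the text, not copied from the framework; the class name DoubleHashSeed and the rawHash parameter (standing in for the result of GetHash(key)) are assumptions.

```csharp
using System;

public static class DoubleHashSeed
{
    // Returns the non-negative hash code. seed is the starting value of the
    // probe sequence and incr is the step added on every later probe.
    // rawHash stands in for GetHash(key); hashsize is assumed to be >= 2.
    public static uint InitHash(int rawHash, int hashsize, out uint seed, out uint incr)
    {
        uint hashcode = (uint)rawHash & 0x7fffffff;   // clear the sign bit
        seed = hashcode;
        incr = (uint)(1 + (((seed >> 5) + 1) % ((uint)hashsize - 1)));   // always in [1, hashsize - 1]
        return hashcode;
    }
}
```

Since incr never reaches hashsize and hashsize is prime, the step is always relatively prime to the table size, which is the condition double hashing needs.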
    do
    {
        bucket = buckets[index];
        if (bucket.key == null)
        {
            return false;
        }
        if (((bucket.hash_coll & 0x7fffffff) == num3) && this.KeyEquals(bucket.key, key))
        {
            return true;
        }
        index = (int) ((index + num2) % ((ulong) buckets.Length));
    }
    while ((bucket.hash_coll < 0) && (++num4 < buckets.Length));
By the way, when I read this method I thought the search could skip any slot whose bucket.key == this.buckets, because the Remove method sets bucket.key = this.buckets for a removed slot when bucket.hash_coll < 0. Whether that extra check would actually be more efficient I will not discuss here. Readers who enjoy a puzzle are welcome to post their answers.
The Add method checks the element count. When it reaches the threshold (loadsize), the Hashtable must be expanded. The new capacity is a prime number greater than twice the current capacity, and the existing elements are then rehashed, which amounts to reinserting them into the new slot array. When reading the code I was still unsure about the purpose of the index variable in the Insert method. If you know, please leave a comment.
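The way one int can carry both a hash code and a collision flag can be sketched as follows. The class CollisionBit and its method names are hypothetical; only the masks 0x7fffffff and the sign-bit test come from the code shown above.

```csharp
using System;

public static class CollisionBit
{
    private const int Flag = unchecked((int)0x80000000);   // the highest (sign) bit

    // Packs a 31-bit hash code together with the collision flag.
    public static int Pack(int hashCode, bool collided)
    {
        int low = hashCode & 0x7fffffff;
        return collided ? (low | Flag) : low;
    }

    // Recovers the stored 31-bit hash code, as in (bucket.hash_coll & 0x7fffffff).
    public static int HashOf(int hashColl) => hashColl & 0x7fffffff;

    // With the sign bit set the int is negative, hence the "hash_coll < 0" test.
    public static bool HasCollided(int hashColl) => hashColl < 0;
}
```

This is why the loop condition can test bucket.hash_coll < 0: a negative value means some key once probed past this slot, so the search must keep going.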
The generic class Dictionary<TKey, TValue> resolves collisions with chaining. Each bucket stores the index of an Entry, and an Entry plays the role of a linked-list node: it stores the index of the next element that collided into the same bucket. The difference from a textbook linked list is that the entries here live in an array.
    public struct Entry<TKey, TValue>
    {
        public int hashCode;
        public int next;
        public TKey key;
        public TValue value;
    }
The Add operation of Dictionary computes the hash value of the key, finds the corresponding bucket from that hash value, saves the key and value in an Entry, and points the bucket at that Entry. A lookup finds the bucket from the hash value in the same way and then follows the bucket into the Entry array.
To reuse the Entry of a removed element, Dictionary keeps a freeList field. When an element is removed, the index of its Entry is assigned to freeList; during Add, if a freed Entry is available (the framework code tests freeCount > 0), the data is inserted into the Entry that freeList points to.
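The bucket array, the Entry array, and the free list fit together roughly as in the sketch below. It is a simplified illustration, not the framework source: ArrayChainedMap is a hypothetical class with int keys and values, a fixed capacity, and no resizing.

```csharp
using System;

// Hypothetical fixed-capacity teaching class (int keys and values, no resizing).
public class ArrayChainedMap
{
    private struct Entry
    {
        public int hashCode;
        public int next;    // index of the next entry in the same chain, or -1
        public int key;
        public int value;
    }

    private readonly int[] buckets;     // buckets[b] = index of the first entry in chain b, or -1
    private readonly Entry[] entries;
    private int count;                  // number of entry slots handed out so far
    private int freeList = -1;          // head of the chain of removed entries
    private int freeCount;

    public ArrayChainedMap(int capacity)
    {
        buckets = new int[capacity];
        for (int i = 0; i < capacity; i++) buckets[i] = -1;
        entries = new Entry[capacity];
    }

    public bool Add(int key, int value)
    {
        int hash = key.GetHashCode() & 0x7fffffff;
        int b = hash % buckets.Length;
        for (int i = buckets[b]; i >= 0; i = entries[i].next)
        {
            if (entries[i].hashCode == hash && entries[i].key == key) return false;   // duplicate key
        }
        int index;
        if (freeCount > 0)
        {
            index = freeList;                 // reuse a removed entry first
            freeList = entries[index].next;
            freeCount--;
        }
        else
        {
            if (count == entries.Length) return false;   // full; a real dictionary would resize
            index = count++;
        }
        entries[index].hashCode = hash;
        entries[index].next = buckets[b];     // the new entry becomes the head of the chain
        entries[index].key = key;
        entries[index].value = value;
        buckets[b] = index;
        return true;
    }

    public bool TryGetValue(int key, out int value)
    {
        int hash = key.GetHashCode() & 0x7fffffff;
        for (int i = buckets[hash % buckets.Length]; i >= 0; i = entries[i].next)
        {
            if (entries[i].hashCode == hash && entries[i].key == key)
            {
                value = entries[i].value;
                return true;
            }
        }
        value = 0;
        return false;
    }

    public bool Remove(int key)
    {
        int hash = key.GetHashCode() & 0x7fffffff;
        int b = hash % buckets.Length;
        int last = -1;
        for (int i = buckets[b]; i >= 0; last = i, i = entries[i].next)
        {
            if (entries[i].hashCode == hash && entries[i].key == key)
            {
                if (last < 0) buckets[b] = entries[i].next;   // unlink the head of the chain
                else entries[last].next = entries[i].next;    // unlink from the middle
                entries[i].hashCode = -1;
                entries[i].next = freeList;                   // push the slot onto the free list
                freeList = i;
                freeCount++;
                return true;
            }
        }
        return false;
    }
}
```

Keeping the chain links as array indices rather than object references means the whole structure is two arrays, and a removed slot is recycled by the next Add instead of being left as a hole.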