Hash Table Collision Avoidance Strategy
1. Collision resolution by chaining (closed addressing)
Chaining is one workable way to resolve collisions. Each slot of the hash table array stores a singly linked list that holds all key-value pairs whose keys hash to that slot. A newly inserted key-value pair is appended to the end of the corresponding list, and the lookup algorithm scans that list to find the matching key. Initially every slot holds a null pointer; the list stored in a slot is created the first time a key is inserted into that slot.
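The scheme described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the class name `ChainedHashTable` and its method names are made up for this example, Python's built-in `hash()` stands in for the hash function, and a Python list plays the role of the singly linked list in each slot.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining (illustrative sketch)."""

    def __init__(self, num_slots=8):
        # Every slot starts out empty (None); a chain is created on first insert.
        self.slots = [None] * num_slots

    def _index(self, key):
        # Assumption: the built-in hash() is used as the hash function.
        return hash(key) % len(self.slots)

    def put(self, key, value):
        i = self._index(key)
        if self.slots[i] is None:
            self.slots[i] = []              # first insert into this slot
        chain = self.slots[i]
        for pos, (k, _) in enumerate(chain):
            if k == key:                    # key already present: overwrite
                chain[pos] = (key, value)
                return
        chain.append((key, value))          # new pairs go to the end of the chain

    def get(self, key, default=None):
        chain = self.slots[self._index(key)]
        if chain is None:
            return default
        for k, v in chain:                  # scan the chain for the key
            if k == key:
                return v
        return default

    def remove(self, key):
        chain = self.slots[self._index(key)]
        if chain is None:
            return False
        for pos, (k, _) in enumerate(chain):
            if k == key:
                del chain[pos]              # removal only touches this chain
                return True
        return False
```

With 8 slots, the integer keys 1 and 9 land in the same slot, so both end up in one chain and are still found independently.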
Schematic diagram of the linked list method:
Performance Analysis:
Assuming the hash function distributes hash values uniformly and the hash table supports dynamic resizing, the amortized complexity of insertion, deletion, and lookup is constant. The actual time these operations take depends linearly on the table's fill rate (load factor).
It is worth noting that a hash table that uses chaining still performs well even when it holds far more entries than it has slots. Suppose a hash table with 1000 slots stores 100,000 entries (load factor: 100,000/1000 = 100). Compared with storing all the data in one singly linked list, the table requires slightly more memory (the slot array), but insertion, lookup, and the other basic operations are about 1000 times faster on average. Keep in mind, however, that from a purely computational point of view both the singly linked list and a fixed-size hash table have O(n) time complexity.
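The arithmetic behind the "1000 times" figure can be spelled out as below. This is only a back-of-the-envelope sketch: it assumes uniformly distributed hash values and compares an unsuccessful search, which has to scan a whole list or a whole chain. The variable names are made up for this example.

```python
num_slots = 1000
num_entries = 100_000

# Average chain length equals the load factor under uniform hashing.
load_factor = num_entries / num_slots        # 100.0

# Unsuccessful search: scan everything that could hold the key.
single_list_scan = num_entries               # one list with all entries
chained_table_scan = load_factor             # one chain of average length

speedup = single_list_scan / chained_table_scan
print(load_factor, speedup)
```

The speedup equals the number of slots, which is why both structures remain O(n) when the table size is fixed: the chain length still grows linearly with the number of entries.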
2. Open Addressing Strategy
Chaining is a good way to resolve collisions, but it spends extra storage on the lists. If the stored items are small (integers, for example) or there is little data per entry, the space spent on the lists is almost as large as the space occupied by the data itself. With an open addressing strategy, all key-value pairs are stored in the hash table's own storage, and no additional data structure is needed.
Collision Avoidance:
Consider the insert operation. Given a key, if the slot that the key hashes to is already occupied, we have to find an empty slot in which to store the key and its value. The general approach is to search onward from the slot where the key should have been stored, following a probe sequence, until an empty slot is found. Here are a few common probing schemes:
Linear probing: the distance between two consecutive probe positions is fixed.
Quadratic probing: the distance between two consecutive probe positions grows by a fixed amount at each step, so the distance between the slot where the data is actually stored and the slot where it should have been stored depends quadratically on the number of probes.
Double hashing: the distance between consecutive probe positions is computed by a second, separate hash function.
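The three probing schemes can be compared by looking at the slot order each one generates. The generators below are a sketch under some assumptions: the function names are made up for this example, the quadratic variant uses gaps of 1, 2, 3, ... (triangular offsets), the table size is a power of two, and the second hash function in `double_hash_probe` is an arbitrary illustrative choice that forces an odd step so every slot is visited.

```python
def linear_probe(home, table_size, step=1):
    """Fixed distance between probes: home, home+step, home+2*step, ..."""
    for i in range(table_size):
        yield (home + i * step) % table_size


def quadratic_probe(home, table_size):
    """Gap grows by one each step: offsets 0, 1, 3, 6, 10, ... from home."""
    for i in range(table_size):
        yield (home + i * (i + 1) // 2) % table_size


def double_hash_probe(key, table_size):
    """Step size comes from a second hash of the key (hypothetical choice)."""
    home = hash(key) % table_size
    # Assumed second hash: odd step, so it is coprime with a power-of-two size.
    step = ((hash(key) // table_size) % table_size) | 1
    for i in range(table_size):
        yield (home + i * step) % table_size


print(list(linear_probe(6, 8)))
print(list(quadratic_probe(0, 8)))
print(list(double_hash_probe(42, 8)))
```

An insert would walk one of these sequences and stop at the first empty slot; a lookup walks the same sequence for the same key.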
Open addressing places additional requirements on the hash function itself: besides distributing hash values uniformly, it should also avoid clustering, that is, hash values concentrating in slots that are consecutive in the probe order.
Linear probing:
Delete operation:
There are some details to consider when deleting a key from a hash table that uses open addressing. Consider the following picture:
If the data in the slot for "Sandra Miler" is simply erased, the structure of the whole hash table is broken: a lookup for "Andrew Wilson" will stop at the emptied slot and fail to return a value, even though "Andrew Wilson" does exist in the hash table.
We can implement deletion with the following strategy: instead of emptying the bucket, we write a "DELETED" marker into it.
The lookup algorithm then works as before; only the insertion algorithm needs to handle DELETED markers.
It is important to note that this approach (adding DELETED markers) clogs the hash table with DELETED entries as keys are removed, so chaining is the better choice for a hash table that must support deletion.
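The deletion strategy above can be sketched with linear probing as follows. This is an illustration under assumptions, not a complete implementation: the class name `LinearProbingTable`, the sentinel names `EMPTY` and `DELETED`, and the use of Python's built-in `hash()` are choices made for this example, and the table never resizes.

```python
EMPTY = object()     # slot has never held a key
DELETED = object()   # marker left behind when a key is removed


class LinearProbingTable:
    def __init__(self, num_slots=8):
        self.keys = [EMPTY] * num_slots
        self.values = [None] * num_slots

    def _probe(self, key):
        home = hash(key) % len(self.keys)
        for i in range(len(self.keys)):
            yield (home + i) % len(self.keys)

    def get(self, key, default=None):
        for i in self._probe(key):
            if self.keys[i] is EMPTY:        # never used: key cannot be further on
                return default
            if self.keys[i] is not DELETED and self.keys[i] == key:
                return self.values[i]        # DELETED slots are simply skipped
        return default

    def put(self, key, value):
        first_free = None
        for i in self._probe(key):
            slot = self.keys[i]
            if slot is EMPTY:
                if first_free is None:
                    first_free = i
                break                        # key is not in the table
            if slot is DELETED:
                if first_free is None:
                    first_free = i           # reusable, but keep looking for key
            elif slot == key:
                self.values[i] = value       # key already present: overwrite
                return
        if first_free is None:
            raise RuntimeError("table is full")
        self.keys[first_free] = key
        self.values[first_free] = value

    def remove(self, key):
        for i in self._probe(key):
            if self.keys[i] is EMPTY:
                return False
            if self.keys[i] is not DELETED and self.keys[i] == key:
                self.keys[i] = DELETED       # mark instead of emptying the slot
                self.values[i] = None
                return True
        return False
```

With 8 slots, the integer keys 1, 9, and 17 all hash to slot 1 and occupy slots 1, 2, and 3. Removing 9 leaves a DELETED marker in slot 2, so a lookup for 17 still walks past it, and a later insert of a colliding key can reuse that slot.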
Performance Analysis:
A hash table that uses open addressing places higher demands on the choice of hash function. Assuming the hash function distributes hash values uniformly and the table supports dynamic resizing, the amortized complexity of insertion, deletion, and lookup is constant.
A hash table that uses open addressing is also very sensitive to the load factor. If the load factor exceeds the threshold of 0.7, the speed of the whole table degrades rapidly. The length of the probe sequence (the distance from the target slot to the slot where the data is actually stored) is proportional to (loadFactor)/(1 - loadFactor). In the extreme case, as loadFactor approaches 1, the probe sequence length approaches infinity. In practice this means there are no free slots left for a new key, so an open addressing hash table needs to support dynamic resizing to maintain its performance.
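To get a feel for how quickly the ratio grows, it can be evaluated for a few load factors. The helper below is a small sketch; the name `probe_length_factor` is made up for this example, and the value is only the proportionality factor from the text, not a measured probe count.

```python
def probe_length_factor(load_factor):
    """Return loadFactor / (1 - loadFactor) for 0 <= loadFactor < 1."""
    if not 0 <= load_factor < 1:
        raise ValueError("load factor must be in [0, 1)")
    return load_factor / (1 - load_factor)


for lf in (0.5, 0.7, 0.9, 0.99):
    print(f"load factor {lf:.2f}: factor {probe_length_factor(lf):.1f}")
```

The factor is 1 at a load factor of 0.5, roughly 2.3 at 0.7, and about 9 at 0.9, which is consistent with the advice to resize before the table fills much beyond 0.7.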
Open addressing vs. chaining:
| | Chaining | Open addressing |
| --- | --- | --- |
| Collision resolution | Using an external data structure | Using the hash table itself |
| Memory waste | Pointer-size overhead per entry (storing list heads in the table) | No overhead |
| Performance dependence on the table's load factor | Directly proportional | Proportional to (loadFactor)/(1 - loadFactor) |
| Allows storing more items than the hash table size | Yes | No. Moreover, it's recommended to keep the table's load factor below 0.7 |
| Hash function requirements | Uniform distribution | Uniform distribution, should avoid clustering |
| Handling removals | Removals are OK | Removals clog the hash table with "DELETED" entries |
| Implementation | Simple | A correct implementation of an open addressing based hash table is quite tricky |
References:
http://www.algolist.net/Data_structures/Hash_table/Open_addressing
http://www.algolist.net/Data_structures/Hash_table/Chaining