Although collisions are undesirable, they can still occur. When the range of possible key values is much larger than the length of the hash table, and it is not known in advance which keys will appear, collisions are inevitable. In addition, when the number of actual key values exceeds the length of the hash table and the table is already full of records, inserting a new record causes not only a collision but also an overflow. Handling collisions and handling overflows are therefore two important issues in hashing.
1. Open addressing method
Open addressing resolves a collision by using some probing technique to form a probe sequence in the hash table. The sequence is followed cell by cell until either the given key is found or an open address (that is, an empty cell) is reached. On insertion, the new node is stored in the open address that was reached. On lookup, reaching an open address means that the key being sought is not in the table, that is, the lookup fails.
Note:
① When a hash table is built with open addressing, all cells in the table (more strictly, the keys stored in the cells) must be set to empty before the table is built.
② How an empty cell is represented depends on the specific application.
Depending on how the probe sequence is formed, open addressing can be divided into linear probing, linear compensation probing, random probing, and so on.
(1) Linear probing method
The basic idea of this method is:
The hash table T[0..m-1] is treated as a circular array. If the initial probe address is d (that is, h(key) = d), the longest probe sequence is:
d, d+1, d+2, ..., m-1, 0, 1, ..., d-1
That is, probing starts at address d: first probe T[d], then T[d+1], ..., up to T[m-1], then wrap around to T[0], T[1], ..., until T[d-1] has been probed.
The probing process terminates in three cases:
(1) If the cell currently being probed is empty, the lookup fails (on insertion, the key is written into this cell);
(2) If the cell currently being probed contains the key, the lookup succeeds, but an insertion fails;
(3) If T[d-1] has been probed without finding either an empty cell or the key, both lookup and insertion fail (the table is full at this point).
In the general form of open addressing, h_i = (h(key) + d_i) % m, the probe sequence of linear probing is:
h_i = (h(key) + i) % m, 0 ≤ i ≤ m-1 // that is, d_i = i
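The probe sequence and the three termination cases can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the hash function h(key) = key mod m, the EMPTY marker, and the function names are assumptions made for the example.

```python
EMPTY = None  # assumed representation of an empty cell


def h(key, m):
    """Hypothetical hash function: key mod table length."""
    return key % m


def probe_sequence(key, m):
    """Yield h_i = (h(key) + i) % m for i = 0 .. m-1."""
    d = h(key, m)
    for i in range(m):
        yield (d + i) % m


def search(table, key):
    """Return the index of key, or -1 if the lookup fails."""
    for j in probe_sequence(key, len(table)):
        if table[j] is EMPTY:      # case (1): open address reached
            return -1
        if table[j] == key:        # case (2): key found
            return j
    return -1                      # case (3): every cell probed, table full


def insert(table, key):
    """Store key and return its index; -1 if already present or table full."""
    for j in probe_sequence(key, len(table)):
        if table[j] is EMPTY:      # case (1): write the key here
            table[j] = key
            return j
        if table[j] == key:        # case (2): insertion fails
            return -1
    return -1                      # case (3): table full


table = [EMPTY] * 7
for k in (10, 17, 24, 3):          # all four keys hash to address 3
    insert(table, k)
print(table)                       # [None, None, None, 10, 17, 24, 3]
```

All four keys are synonyms, so they end up in the adjacent cells 3 to 6, one step further along the probe sequence each time.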
Handling collisions with linear probing is conceptually clear and simple to implement, but it has the following disadvantages:
① Overflow has to be handled by separate code. A common approach is to set up an overflow table dedicated to records that do not fit in the hash table. The simplest structure for this overflow table is a sequential list, which is searched sequentially.
② Deletion is difficult in a hash table built this way. To delete a record from the hash table HT, one would expect to set its cell to empty, but this is not allowed; the cell can only be marked with a deleted tag, otherwise later lookups are affected.
③ Linear probing is prone to clustering (also called accumulation), in which the records stored in the hash table occupy runs of adjacent cells. When collisions are resolved by linear probing, the longer a run of consecutive occupied hash addresses (that is, the more hash addresses of different keys lie next to each other), the more likely a new record is to collide when it joins the table. A long run of occupied addresses therefore grows faster than a short one, which means that once a cluster appears (accompanied by collisions), it causes further clustering.
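The deletion problem in ② can be illustrated with a small sketch. The DELETED marker, the hash function key mod m, and the function names below are assumptions for the example; the point is only that a deleted cell must remain distinguishable from an empty one.

```python
EMPTY = None
DELETED = object()  # assumed deleted tag, distinct from every key


def search(table, key):
    """Linear-probing lookup that walks past deleted cells."""
    m = len(table)
    d = key % m                    # hypothetical hash function
    for i in range(m):
        j = (d + i) % m
        if table[j] is EMPTY:
            return -1              # open address: key is not in the table
        if table[j] is not DELETED and table[j] == key:
            return j
    return -1


def insert(table, key):
    """Place key in the first empty or deleted cell on its probe path.

    Assumes the key is not already in the table.
    """
    m = len(table)
    d = key % m
    for i in range(m):
        j = (d + i) % m
        if table[j] is EMPTY or table[j] is DELETED:
            table[j] = key
            return j
    return -1                      # table full


def delete(table, key):
    """Mark the cell holding key as deleted instead of emptying it."""
    j = search(table, key)
    if j >= 0:
        table[j] = DELETED
    return j


table = [EMPTY] * 7
for k in (10, 17, 24):             # all hash to address 3
    insert(table, k)
delete(table, 17)                  # cell 4 now carries the deleted tag
found = search(table, 24)          # 5: still reachable past the tag

# The same table with cell 4 emptied instead of tagged:
broken = [EMPTY, EMPTY, EMPTY, 10, EMPTY, 24, EMPTY]
lost = search(broken, 24)          # -1: the probe stops at empty cell 4
```

In this sketch a later insertion may reuse a tagged cell, which is one common way to keep deleted cells from accumulating.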
(2) Linear compensation probing method
The basic idea of linear compensation probing is:
Change the step of the linear probe from 1 to Q, that is, replace the update j = (j + 1) % m of linear probing with j = (j + Q) % m, where Q and m are coprime so that every cell in the hash table can be probed.
[Example] The symbol table used by the assembler on the PDP-11 minicomputer resolves collisions with this method, using table length m = 1321 and Q = 25.
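Why Q and m must be coprime can be checked directly. The sketch below (function and variable names are illustrative) generates the probe sequence for a given step and counts how many distinct cells it reaches, first with the values from the example and then with a step that shares a factor with the table length.

```python
from math import gcd


def probe_sequence(d, m, q):
    """Yield the m addresses d, d+q, d+2q, ... taken modulo m."""
    j = d
    for _ in range(m):
        yield j
        j = (j + q) % m


m, q = 1321, 25
coprime = gcd(m, q) == 1                       # True
covered = len(set(probe_sequence(0, m, q)))    # 1321: every cell is reached

# A step that shares a factor with m reaches only part of the table.
partial = sorted(set(probe_sequence(0, 12, 4)))  # [0, 4, 8]
```

With m = 12 and a step of 4, only three of the twelve cells are ever probed, so an insertion could fail even though the table still has empty cells.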
(3) Random probing
The basic idea of random probing is:
Change the step of linear probing from a constant to a random number, that is, j = (j + RN) % m, where RN is a random number. In an actual program, a random number generator should be used to produce a random sequence, which serves as the successive probing steps. Different keys then have different probe sequences, which avoids or reduces clustering (accumulation). For the same reason as in linear probing, in both linear compensation probing and random probing a deleted record must be marked as deleted rather than emptied.
2. Zipper method
(1) How the zipper method resolves collisions
The zipper method (commonly known as separate chaining) resolves collisions by linking all keys that are synonyms of one another into the same singly linked list. If the chosen hash table length is m, the hash table can be defined as an array of m head pointers T[0..m-1]. Every node with hash address i is inserted into the singly linked list whose head pointer is T[i]. Each component of T should initially be a null pointer. In the zipper method the load factor α can be greater than 1, but in general α ≤ 1 is used.
[Example] With m = 5, H(k) = k mod 5, and keys inserted in the order 5, 21, 17, 9, 15, 36, 41, 24, the hash table built by the zipper method (external chaining) is shown in the following figure:
(2) Advantages of the zipper method
Compared with open addressing, the zipper method has the following advantages:
① The zipper method handles collisions simply and has no clustering (accumulation), that is, non-synonyms never collide, so the average search length is short;
② Because the node space of each linked list is allocated dynamically, the zipper method is better suited to cases where the table length cannot be determined before the table is built;
③ To reduce collisions, open addressing requires a small load factor α, so it wastes a lot of space when the nodes are large. The zipper method can take α ≥ 1, and when the nodes are large the extra pointer field is negligible, so space is saved;
④ In a hash table built with the zipper method, deleting a node is easy to implement: simply remove the corresponding node from its linked list. In a hash table built with open addressing, a node cannot be deleted by simply emptying its cell, since that would cut off the lookup path of the synonym nodes that were placed in the table after it. This is because in every open addressing method an empty address cell (that is, an open address) is the condition for a failed lookup. Deletion from a hash table that resolves collisions by open addressing can therefore only mark the node as deleted; it cannot actually remove the node.
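The chained structure and the simple deletion described in ④ can be sketched as follows, using the example keys above (m = 5, h(k) = k mod 5). The Node class, the function names, and the choice to insert each new node at the head of its chain are assumptions of this sketch, so the order within each chain may differ from the original figure.

```python
class Node:
    def __init__(self, key, next=None):
        self.key = key
        self.next = next


def insert(T, key):
    """Insert key at the head of its chain (an assumed policy)."""
    i = key % len(T)                 # h(k) = k mod m
    T[i] = Node(key, T[i])


def search(T, key):
    """Return the node holding key, or None if the lookup fails."""
    node = T[key % len(T)]
    while node is not None and node.key != key:
        node = node.next
    return node


def delete(T, key):
    """Unlink the node holding key; return True if it was found."""
    i = key % len(T)
    prev, node = None, T[i]
    while node is not None and node.key != key:
        prev, node = node, node.next
    if node is None:
        return False
    if prev is None:
        T[i] = node.next             # the node was the head of its chain
    else:
        prev.next = node.next
    return True


def chain(head):
    """Collect the keys of one chain, front to back."""
    keys = []
    while head is not None:
        keys.append(head.key)
        head = head.next
    return keys


T = [None] * 5                       # every head pointer starts out null
for k in (5, 21, 17, 9, 15, 36, 41, 24):
    insert(T, k)
before = [chain(head) for head in T]
# [[15, 5], [41, 36, 21], [17], [], [24, 9]]

delete(T, 36)                        # only the chain at address 1 changes
after = [chain(head) for head in T]
# [[15, 5], [41, 21], [17], [], [24, 9]]
```

Eight keys in five chains give a load factor of 8/5 = 1.6, an instance of α > 1. After 36 is removed, 21 is still found by the same walk along its chain, with no deleted tag needed.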
(3) Disadvantages of the zipper method
The disadvantage of the zipper method is that the pointers require additional space, so when the nodes are small, open addressing is more space-efficient, and if the space saved on pointers