Mathematical Analysis of Hash Tables (Final Part): Resolving Collisions by Chaining and Its Mathematical Analysis
The implementation of the hash table is not described in detail here. It is an array in which each slot stores the head pointer of a linked list holding the elements that collide in that slot.
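The structure just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation the article has in mind: the names `Node`, `ChainedHashTable`, `_h`, and `load_factor` are hypothetical, and Python's built-in `hash` reduced modulo m is assumed as a stand-in for the hash function h.

```python
class Node:
    """One element of a chain: a key, its value, and a link to the next node."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    """Array of m slots; each slot holds the head of a singly linked list."""

    def __init__(self, m):
        self.m = m                # number of slots
        self.n = 0                # number of stored elements
        self.slots = [None] * m   # head pointers, initially empty chains

    def _h(self, key):
        # Assumption: built-in hash() modulo m plays the role of h(k).
        return hash(key) % self.m

    def insert(self, key, value):
        # New elements go at the head of the chain, so insertion is O(1).
        # No check is made for an already stored equal key.
        j = self._h(key)
        self.slots[j] = Node(key, value, self.slots[j])
        self.n += 1

    def search(self, key):
        # Walk the chain in slot h(key); return the value, or None if absent.
        node = self.slots[self._h(key)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return None

    def load_factor(self):
        # n / m, the average number of elements per chain.
        return self.n / self.m


table = ChainedHashTable(m=4)
for key, value in [(1, "one"), (5, "five"), (9, "nine")]:
    table.insert(key, value)   # in CPython these three keys all land in slot 1
print(table.search(5), table.search(2), table.load_factor())
```

Inserting at the head keeps insertion at constant cost, and it is also the convention that the analysis of successful searches relies on.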
Next we go straight to the analysis of hashing with chaining:
Given a hash table T with m slots that stores n elements, the load factor α of T is defined as n/m, that is, the average number of elements stored in a chain.
The worst-case behavior of hashing with chaining is poor: all n keys hash to the same slot, producing a single list of length n. The worst-case search time is then O(n) plus the time to compute the hash function, which is no better than keeping all the elements in one linked list. Clearly, hash tables are not used for their worst-case performance.
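To make the worst case concrete, the following sketch counts key comparisons for one search under two hash functions: a degenerate one that sends every key to slot 0, and `k % m`, which spreads these particular keys evenly. The function name `comparisons_for_search` is hypothetical, and Python lists with insertion at index 0 are used here as a stand-in for linked lists with insertion at the head.

```python
def comparisons_for_search(keys, target, h, m):
    """Build the chains with hash function h, then count the key
    comparisons needed to find target in its chain."""
    slots = [[] for _ in range(m)]
    for k in keys:
        slots[h(k)].insert(0, k)   # new keys go at the head of the chain
    count = 0
    for k in slots[h(target)]:
        count += 1
        if k == target:
            break
    return count


keys = list(range(1000))
m = 100

# Degenerate hash: every key lands in slot 0, giving one chain of length n.
worst = comparisons_for_search(keys, 0, lambda k: 0, m)

# k % m spreads these keys evenly, giving chains of length n / m = 10.
spread = comparisons_for_search(keys, 0, lambda k: k % m, m)

print(worst, spread)   # 1000 10
```

Key 0 is inserted first, so it ends up at the tail of its chain in both runs; the search walks the whole chain, which has n elements in the first case and n/m in the second.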
The average-case performance of hashing depends on how evenly the chosen hash function h distributes the keys among the m slots. Here we assume that any given element is equally likely to hash to any of the m slots, independently of where any other element has hashed. This assumption is called simple uniform hashing.
For j = 0, 1, ..., m-1, let n_j denote the length of the list T[j] (the linked list attached to slot j), so that

$$ n = n_0 + n_1 + \cdots + n_{m-1}, $$

and the expected value of n_j is E[n_j] = α = n/m.
Assume that the hash value h(k) (the array index) can be computed in O(1) time, so that the time to search for an element with key k depends linearly on the length n_{h(k)} of the list T[h(k)]. Setting aside the O(1) time needed to compute the hash function and to access slot h(k), let us look at the expected number of elements the search examines, that is, the number of elements in the list T[h(k)] whose keys are compared with k. There are two cases to consider. In the first, the search is unsuccessful: no element in the table has key k. In the second, the search succeeds in finding an element with key k.
Theorem 11.1 In a hash table in which collisions are resolved by chaining, an unsuccessful search takes expected time O(1 + α) under the assumption of simple uniform hashing.
Proof: Under the assumption of simple uniform hashing, any key k not yet stored in the table is equally likely to hash to any of the m slots. The expected time of an unsuccessful search for k is therefore the expected time to walk to the end of the list T[h(k)], whose expected length is E[n_{h(k)}] = α. So an unsuccessful search examines α elements on average, and the total time required (including the time to compute h(k)) is O(1 + α).
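The claim that an unsuccessful search examines α elements on average can be checked with a small simulation. The sketch below is not part of the proof: it models simple uniform hashing by drawing each key's slot uniformly at random, and the function name `unsuccessful_search_cost`, the parameter values, and the fixed seed are all assumptions made for illustration.

```python
import random


def unsuccessful_search_cost(n, m, trials, seed=0):
    """Average number of chain elements examined when searching for a key
    that is not in the table, over many randomly built tables."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Simple uniform hashing: each of the n keys picks a slot uniformly.
        lengths = [0] * m
        for _ in range(n):
            lengths[rng.randrange(m)] += 1
        # The absent key also hashes to a uniform slot, and the search
        # has to walk that entire chain before giving up.
        total += lengths[rng.randrange(m)]
    return total / trials


n, m = 400, 100
alpha = n / m
average = unsuccessful_search_cost(n, m, trials=1000)
print(f"alpha = {alpha}, measured average = {average:.3f}")
```

With these parameters the measured average should land close to α = 4; the exact figure depends on the seed.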
For a successful search the situation is slightly different, because the lists are not all equally likely to be searched. Instead, the probability that a list is searched is proportional to the number of elements it contains. Nevertheless, the expected search time is still O(1 + α).
Theorem 11.2 In a hash table in which collisions are resolved by chaining, a successful search takes O(1 + α) time on average under the assumption of simple uniform hashing.
Proof: Assume that the key being searched for is equally likely to be any of the n keys stored in the table. In a successful search for an element x, the number of elements examined is 1 more than the number of elements that appear before x in the list containing x. All the elements that appear before x in this list were inserted after x, because new elements are inserted at the head of the list. To find the expected number of elements examined, take 1 plus the expected number of elements added to x's list after x was inserted, and average this over the n elements x in the table. Let x_i denote the i-th element inserted into the table, for i = 1, 2, ..., n, and let k_i = key[x_i]. For keys k_i and k_j, define the indicator random variable X_ij = I{h(k_i) = h(k_j)}. Under the assumption of simple uniform hashing, Pr{h(k_i) = h(k_j)} = 1/m, and so by Theorem 5.1 (the expectation of an indicator random variable equals the probability of its event), E[X_ij] = 1/m. Therefore, the expected number of elements examined in a successful search is

$$ E\left[\frac{1}{n}\sum_{i=1}^{n}\left(1+\sum_{j=i+1}^{n}X_{ij}\right)\right]. $$
Notes:
1. From discrete mathematics, computing the average-case complexity of an algorithm can be recast as computing the expected value of a random variable. Let the sample space of an experiment be the set of possible inputs a_j (j = 1, 2, ..., n), and let the random variable X assign to a_j the number of operations the algorithm uses when a_j is the input. Based on what we know about the input, each possible input a_j is assigned a probability p(a_j). The average-case complexity of the algorithm is then

$$ E[X]=\sum_{j=1}^{n}p(a_j)\,X(a_j). $$

2. The first sentence of the proof says that each possible input has probability 1/n. That is, the probability that x is the i-th element in the table is 1/n.
3. The indicator random variable X_ij = I{h(k_i) = h(k_j)} equals 1 when k_i and k_j hash to the same slot, that is, when x_i and x_j are in the same linked list, and 0 otherwise. Each such pair therefore adds one to the count of examined elements.
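The expected-value view of average-case cost in note 1 can be worked through on a small case. The sketch below uses a linear search over n positions as a stand-in example (it is not the hash table itself): input a_j means "the target sits at position j", its cost X(a_j) is j comparisons, and every input has probability 1/n as in note 2. The helper name `expected_cost` is hypothetical.

```python
from fractions import Fraction


def expected_cost(costs, probabilities):
    """E[X] = sum over all inputs a_j of p(a_j) * X(a_j)."""
    assert sum(probabilities) == 1   # the probabilities must sum to one
    return sum(p * x for p, x in zip(probabilities, costs))


n = 10
costs = list(range(1, n + 1))          # X(a_j) = j comparisons
probabilities = [Fraction(1, n)] * n   # each input equally likely
average = expected_cost(costs, probabilities)
print(average)   # 11/2, that is (n + 1) / 2
```

Exact fractions are used so that the result matches the closed form (n + 1)/2 without rounding.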
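In summary, for x = x_i, the number of examined elements that were inserted into the table after x is:

$$ \sum_{j=i+1}^{n}X_{ij} $$

Averaged over the n elements, the number of elements the algorithm examines is:

$$ \frac{1}{n}\sum_{i=1}^{n}\left(1+\sum_{j=i+1}^{n}X_{ij}\right) $$

Finally, taking the expectation of this expression, using linearity of expectation and E[X_ij] = 1/m, gives the expected number of elements examined:

$$
\begin{aligned}
E\left[\frac{1}{n}\sum_{i=1}^{n}\left(1+\sum_{j=i+1}^{n}X_{ij}\right)\right]
&= \frac{1}{n}\sum_{i=1}^{n}\left(1+\sum_{j=i+1}^{n}E[X_{ij}]\right) \\
&= \frac{1}{n}\sum_{i=1}^{n}\left(1+\sum_{j=i+1}^{n}\frac{1}{m}\right) \\
&= 1+\frac{1}{nm}\sum_{i=1}^{n}(n-i) \\
&= 1+\frac{1}{nm}\left(n^{2}-\frac{n(n+1)}{2}\right) \\
&= 1+\frac{n-1}{2m} \\
&= 1+\frac{\alpha}{2}-\frac{\alpha}{2n}.
\end{aligned}
$$

Therefore, the total time required for a successful search (including the time to compute the hash function) is O(2 + α/2 - α/(2n)) = O(1 + α).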
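As a sanity check on this result, the expected number of examined elements, 1 + α/2 - α/(2n) (the bound above without the extra 1 for computing the hash function), can be compared with a simulation. This is a sketch under stated assumptions: simple uniform hashing is modeled by uniformly random slots, insertion is at the head of each chain, and the function name `successful_search_cost`, the parameters, and the seed are chosen only for illustration.

```python
import random


def successful_search_cost(n, m, trials, seed=0):
    """Average number of elements examined in a successful search, averaged
    over the n stored elements and over many randomly built tables."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # slots[i] is the slot of the (i+1)-th inserted element.
        slots = [rng.randrange(m) for _ in range(n)]
        # Finding x_i examines x_i itself plus every element inserted later
        # into the same slot, since those sit in front of it in the chain.
        later_in_slot = [0] * m
        examined = 0
        for s in reversed(slots):
            examined += 1 + later_in_slot[s]
            later_in_slot[s] += 1
        total += examined / n
    return total / trials


n, m = 400, 100
alpha = n / m
predicted = 1 + alpha / 2 - alpha / (2 * n)
measured = successful_search_cost(n, m, trials=500)
print(f"predicted = {predicted:.4f}, measured = {measured:.4f}")
```

Scanning the insertion order backwards lets one pass count, for each element, how many later elements share its slot, which is exactly the inner sum of the indicator variables X_ij.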