Part of the content of this study note comes from the Zhejiang University Data Structures course on NetEase Cloud Classroom. Thank you!
1. Hash table
Several known search methods:
Sequential search: O(N)
Binary search (static search): O(log N)
Binary search tree: O(h), where h is the height of the tree (dynamic search: lookups are mixed with insertions and deletions)
Balanced binary tree: O(log N)
The essence of searching is determining the location of a known object. There are two approaches: 1. arrange the objects in order (total order or partial order); 2. compute the object's location directly (hashing).
The two basic tasks of hash-based search are:
1. Compute the location: construct a hash function that determines where a key is stored.
2. Resolve collisions: apply a strategy for the case where several keys map to the same location.
The time complexity is nearly constant, O(1): as long as the hash function is well constructed, the expected lookup time is essentially independent of the problem size.
The basic idea of hashing:
1. Take the key as the independent variable and use a deterministic function h (the hash function) to compute h(key), which serves as the storage address of the data object.
2. Different keys may be mapped to the same hash address, i.e. h(key_i) = h(key_j) with key_i ≠ key_j. This is called a collision, and it requires some collision resolution strategy.
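The idea of computing an address from the key, and the collisions that follow, can be illustrated with a minimal Python sketch. The hash function h(key) = key mod 11 and the key list below are illustrative assumptions, not part of the course material quoted here.

```python
def h(key, table_size=11):
    """Hypothetical hash function: map an integer key to a slot index."""
    return key % table_size

# Illustrative keys (an assumption made for this sketch).
keys = [47, 7, 29, 11, 9, 84, 54, 20, 30]

# Group the keys by their hash address.
slots = {}
for key in keys:
    slots.setdefault(h(key), []).append(key)

# Any address shared by more than one key is a collision.
collisions = {addr: ks for addr, ks in slots.items() if len(ks) > 1}
print(collisions)  # {7: [7, 29, 84], 9: [9, 20]}
```

Here 7, 29 and 84 all map to address 7, so a collision resolution strategy is needed before all of them can be stored.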
2. How to construct the hash function
A good hash function should be:
1. Simple to compute, so that keys are converted to addresses quickly.
2. Uniform, distributing keys evenly over the address space to reduce collisions.
When the key is a number:
a. Direct addressing: use a linear function of the key as the hash, i.e. h(key) = a*key + b.
b. Division (remainder) method (commonly used): h(key) = key mod p, where p is usually no larger than the table size and is generally chosen to be a prime.
c. Digit analysis: analyze how each digit position of the keys varies and use the more random positions as the hash address; for example, some digits of a phone number or ID number are more random than others.
d. Folding: split the key into several parts with the same number of digits, then add the parts together.
e. Mid-square: square the key and take the middle few digits.
When the key is a string:
a. ASCII sum: h(key) = (Σ key[i]) mod TableSize.
b. First-three-characters shift method: h(key) = (key[0]*27*27 + key[1]*27 + key[2]) mod TableSize.
c. Shift method:
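Several of the constructions above can be sketched in Python as follows. This is only an illustration under assumptions: the function names are hypothetical, the folding and mid-square variants work on decimal digits and split from the left, and characters are converted with ord(), none of which is specified by the notes.

```python
def hash_division(key, p):
    """Division method: h(key) = key mod p."""
    return key % p

def hash_fold(key, width=3):
    """Folding: split the decimal digits into chunks of `width` digits
    (from the left), add the chunks, keep the last `width` digits."""
    digits = str(key)
    total = sum(int(digits[i:i + width]) for i in range(0, len(digits), width))
    return total % (10 ** width)

def hash_mid_square(key, width=3):
    """Mid-square: square the key and take `width` digits from the middle."""
    squared = str(key * key)
    start = (len(squared) - width) // 2
    return int(squared[start:start + width])

def hash_ascii_sum(key, table_size):
    """ASCII sum: add the character codes, then reduce mod the table size."""
    return sum(ord(c) for c in key) % table_size

def hash_first3(key, table_size):
    """First-three-characters shift method (key must have >= 3 characters)."""
    return (ord(key[0]) * 27 * 27 + ord(key[1]) * 27 + ord(key[2])) % table_size

print(hash_division(47, 11))         # 3
print(hash_fold(56793542))           # 567 + 935 + 42 = 1544 -> 544
print(hash_mid_square(1234))         # 1234^2 = 1522756 -> 227
print(hash_ascii_sum("abc", 11), hash_ascii_sum("cba", 11))  # 8 8
```

The last line shows a weakness of the ASCII sum: any two strings made of the same characters collide, which is one reason position-sensitive schemes such as the shift methods are used.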
3. Collision handling in hash tables
Two approaches: 1. open addressing: move the colliding data to another location; 2. separate chaining: organize the objects that collide at the same location together.
a. Open addressing: once a collision occurs (the address already holds another element), a rule is followed to find another empty address. On the i-th collision, the next address probed is offset by d_i, i.e. h_i(key) = (h(key) + d_i) mod TableSize. The choice of d_i determines the scheme.
Linear probing: d_i = i, i.e. the following addresses are probed cyclically with increments 1, 2, ..., TableSize-1. Example: h(key) = key mod 11.
When searching for a value, if the slot at the computed hash address holds a different key, you cannot yet conclude that the key is absent; keep probing according to the collision resolution strategy. Only when an empty slot is reached can you conclude that the key does not exist.
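Linear probing insertion and search, as described above, might look like the following minimal sketch. It assumes integer keys, h(key) = key mod len(table), and None as the marker for an empty slot; the function names are hypothetical.

```python
def linear_insert(table, key):
    """Insert key with linear probing (d_i = i); return the slot used."""
    size = len(table)
    home = key % size
    for i in range(size):
        slot = (home + i) % size
        if table[slot] is None or table[slot] == key:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")

def linear_search(table, key):
    """Return the slot holding key, or -1 if the key is absent."""
    size = len(table)
    home = key % size
    for i in range(size):
        slot = (home + i) % size
        if table[slot] is None:   # empty slot reached: key cannot be present
            return -1
        if table[slot] == key:
            return slot
    return -1                     # every slot probed without finding the key

table = [None] * 11
for k in [47, 7, 29, 11, 9, 84, 54, 20, 30]:   # illustrative keys
    linear_insert(table, k)
print(table)  # [11, 54, 20, 47, 30, None, None, 7, 29, 9, 84]
```

Note how the search for a missing key such as 18 has to walk from slot 7 all the way around to the empty slot 5 before giving up, which is the clustering effect of linear probing.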
Quadratic probing: d_i = ±i^2, i.e. the next addresses are probed cyclically with increments +1^2, -1^2, +2^2, -2^2, ..., +q^2, -q^2, where q ≤ TableSize/2. Example: h(key) = key mod 11.
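The probe order of quadratic probing can be made concrete with a short Python sketch. It assumes h(key) = key mod len(table) and None for empty slots; the names are hypothetical.

```python
def quadratic_probe_sequence(home, table_size):
    """Yield home, home+1^2, home-1^2, home+2^2, home-2^2, ... (mod size),
    with q running up to table_size // 2."""
    yield home
    for q in range(1, table_size // 2 + 1):
        yield (home + q * q) % table_size
        yield (home - q * q) % table_size  # Python's % is never negative here

def quadratic_insert(table, key):
    """Insert key using the probe sequence above; return the slot used."""
    for slot in quadratic_probe_sequence(key % len(table), len(table)):
        if table[slot] is None or table[slot] == key:
            table[slot] = key
            return slot
    raise RuntimeError("no free slot found along the probe sequence")

print(list(quadratic_probe_sequence(7, 11)))
# [7, 8, 6, 0, 3, 5, 9, 1, 2, 10, 4]
```

For this table size the sequence happens to visit every slot exactly once. That is not guaranteed for arbitrary table sizes, so quadratic probing can fail to find a free slot even when one exists.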
Double hashing: d_i = i*h2(key), where h2(key) is a second hash function; h2(key) = p - (key mod p) works well.
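A minimal Python sketch of the double hashing probe address follows. The notes do not say how p is chosen; taking p as a prime smaller than the table size (p = 7 for a table of size 11) is an assumption made here, as is the function name.

```python
def double_hash_probe(key, i, table_size, p):
    """Address of the i-th probe: (h(key) + i * h2(key)) mod table_size,
    with h(key) = key mod table_size and h2(key) = p - (key mod p)."""
    h1 = key % table_size
    h2 = p - (key % p)        # always in 1..p, so the step is never zero
    return (h1 + i * h2) % table_size

# 84 and 29 share the home address 7 but follow different probe paths.
print([double_hash_probe(84, i, 11, 7) for i in range(3)])  # [7, 3, 10]
print([double_hash_probe(29, i, 11, 7) for i in range(3)])  # [7, 2, 8]
```

Because the step size depends on the key, keys that collide at the same home address usually diverge on the next probe, which reduces clustering compared with linear probing.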
Rehashing: when the table holds too many elements (the load factor is too large), search efficiency drops; a practical load factor is generally 0.5 to 0.85. When the load factor becomes too large, the solution is to enlarge the hash table to about double its size and re-insert the existing elements, a process called rehashing.
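Rehashing can be sketched as follows in Python. The class name, the 0.75 threshold, the use of linear probing, and the new size 2n + 1 are all assumptions chosen for illustration; the notes only state that the table is roughly doubled.

```python
class ProbingTable:
    """Linear-probing table of integer keys that rehashes when too full."""

    def __init__(self, size=11, max_load=0.75):
        self.slots = [None] * size
        self.count = 0
        self.max_load = max_load   # must stay below 1 so a free slot exists

    def load_factor(self):
        return self.count / len(self.slots)

    @staticmethod
    def _place(slots, key):
        """Put key into slots with linear probing; return True if it is new."""
        size = len(slots)
        slot = key % size
        while slots[slot] is not None and slots[slot] != key:
            slot = (slot + 1) % size
        is_new = slots[slot] is None
        slots[slot] = key
        return is_new

    def insert(self, key):
        if self._place(self.slots, key):
            self.count += 1
        if self.load_factor() > self.max_load:
            self._rehash()

    def _rehash(self):
        """Allocate a table about twice as large and re-insert every key,
        since each address depends on the table size."""
        new_slots = [None] * (2 * len(self.slots) + 1)
        for key in self.slots:
            if key is not None:
                self._place(new_slots, key)
        self.slots = new_slots

t = ProbingTable()
for k in [47, 7, 29, 11, 9, 84, 54, 20, 30]:
    t.insert(k)
print(len(t.slots), t.count)  # 23 9
```

The ninth insertion pushes the load factor to 9/11, above the threshold, so the table grows from 11 to 23 slots and every key is placed again using the new size.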
b. Separate chaining: all keys that collide at a given location are stored in the same singly linked list.
4. Performance analysis of hash tables
The number of key comparisons depends on how many collisions occur. Three factors determine how many collisions occur: whether the hash function is uniform, the collision handling method, and the load factor.
Summary: with a suitable hash function h(key), the expected search cost of hashing is O(1), almost independent of the size of the key space. Hashing is also well suited to problems where comparing keys directly is computationally expensive. Hashing relies on a small load factor, so it trades space for time. Because keys are stored in effectively random order, hashing does not support sequential traversal of keys, and it is not suitable for range queries or for finding the maximum or minimum.
Open addressing: Pros: the hash table is a plain array, so storage efficiency is high and access is random. Cons: clustering occurs.
Separate chaining: the hash table combines sequential storage with linked storage, and the linked-list part has relatively low storage and search efficiency. Pros: deleting a key leaves no garbage behind. Cons: a load factor that is too small wastes space, a load factor that is too large costs more time, and uneven chain lengths cause a serious drop in time efficiency.
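The separate chaining method described above can be sketched in Python as follows. For brevity each chain is a Python list rather than a hand-written singly linked list, which is a substitution made for this sketch; the class name and h(key) = key mod table size are also assumptions.

```python
class ChainedTable:
    """Hash table of integer keys using separate chaining."""

    def __init__(self, size=11):
        # One chain per address; a Python list stands in for a linked list.
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        return self.buckets[key % len(self.buckets)]

    def insert(self, key):
        bucket = self._bucket(key)
        if key not in bucket:
            bucket.append(key)

    def contains(self, key):
        return key in self._bucket(key)

    def delete(self, key):
        """Remove key from its chain; nothing is left behind in the table."""
        bucket = self._bucket(key)
        if key in bucket:
            bucket.remove(key)
            return True
        return False

t = ChainedTable()
for k in [47, 7, 29, 11, 9, 84, 54, 20, 30]:
    t.insert(k)
print(t.buckets[7])  # [7, 29, 84]
```

Deletion simply unlinks the key from its chain, whereas an open-addressing table typically has to leave a marker in the slot so that later searches keep probing past it.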
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.