1. Hash table
A hash table (also known as a hash list) is a data structure that is accessed directly by key. It maps each key to a location in a table, which speeds up lookup. The mapping function is called the hash function, and the array that holds the records is called the hash table.
For each element, the hash function takes the element's key k as its argument and computes h(k). That value is used as the address of a cell in a contiguous block of storage, and the element is stored in that cell.
A hash table stores key-value pairs. It locates an element by computing the hash code of its key and going straight to the corresponding position, so the lookup time does not depend on the number of elements: ignoring collisions, a hash table lookup is O(1) on average.
2. Hash table construction methods
2.1 Direct addressing method
Take the key itself, or the value of a linear function of the key, as the hash address, i.e.
H(key) = key or H(key) = a*key + b (a and b are integer constants)
This hash function is also called the identity function. If the address H(key) is already occupied, move on to the next position until an empty one is found, and store the element there.
This method is only appropriate when the size of the address set equals the size of the key set.
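As a minimal sketch of direct addressing, assume the keys are ages in the range 0 to 99, so the table needs exactly 100 cells. The function name `direct_address` and the sample records are hypothetical, chosen only for illustration.

```python
def direct_address(key, a=1, b=0):
    """H(key) = a*key + b: a linear function of the key is the address."""
    return a * key + b


# Hypothetical data: (age, head count). Ages 0..99 map to cells 0..99.
table = [None] * 100
for age, count in [(25, 310), (31, 275), (60, 120)]:
    table[direct_address(age)] = (age, count)

# Lookup goes straight to the computed cell, with no searching.
print(table[direct_address(31)])  # (31, 275)
```

Because every possible key has its own cell, no two distinct keys can share an address here, which is why the method needs the address set to be as large as the key set.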
2.2 Digit analysis method
Consider a set of data such as the birth dates of a group of employees. The first few digits (the year) are often the same for many people, so using them would make collisions very likely. The digits for the month and day, however, vary widely, so constructing the hash address from those trailing digits reduces the chance of a collision significantly.
The digit analysis method therefore looks for patterns in the digits of the keys and builds the hash address from the digits that are least likely to collide.
This method is suitable when the frequency of each digit in each position of the keys can be estimated beforehand.
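The birth-date example can be sketched as follows, assuming the dates are stored as "YYYYMMDD" strings. The helper names `digit_spread` and `digit_analysis_hash`, the sample dates, and the final reduction modulo the table length m are all assumptions made for illustration.

```python
def digit_spread(keys):
    """Count the distinct digits seen at each position of equal-length keys."""
    return [len(set(column)) for column in zip(*keys)]


def digit_analysis_hash(birth_date, m):
    """Build the address from the month and day digits only ("MMDD")."""
    varying_digits = birth_date[4:]
    return int(varying_digits) % m  # the modulo step is an added assumption


dates = ["19900315", "19901122", "19910704"]
print(digit_spread(dates))                    # [1, 1, 1, 2, 2, 3, 3, 3]
print(digit_analysis_hash("19900315", 1000))  # 315
```

The spread list shows that the leading positions barely vary while the trailing ones differ for every record, which is the reason for hashing on the trailing digits.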
2.3 Mid-square method
Take the middle digits of the square of the key as the storage address (hash address). Squaring the key "widens the difference" between keys, and the middle digits of the square are influenced by every digit of the key.
This method is suitable when certain digits occur with high frequency in every position of the key.
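A small sketch of the mid-square idea, assuming decimal integer keys and a three-digit address. The function name `mid_square_hash` and the choice of which digits count as "the middle" when the lengths do not divide evenly are assumptions.

```python
def mid_square_hash(key, digits=3):
    """Square the key and keep `digits` digits from the middle of the result."""
    squared = str(key * key)
    start = max((len(squared) - digits) // 2, 0)
    return int(squared[start:start + digits])


print(mid_square_hash(1234))  # 1234**2 = 1522756  -> 227
print(mid_square_hash(4321))  # 4321**2 = 18671041 -> 671
```

The two keys use the same digits in a different order, yet their addresses differ, because the middle of the square depends on all digits of the key.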
2.4 Folding method
Divide the key into several sections and take their sum (discarding any carry) as the hash address. There are two ways to add the sections. Shift folding: align the low-order digits of the sections and add them. Boundary folding: fold the key back and forth along the section boundaries, so that every other section is reversed, then align the sections and add them.
This method is suitable when the key has a large number of digits.
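Both kinds of folding can be sketched as below, assuming decimal integer keys that are cut into sections of `width` digits starting from the left. The function names are hypothetical, and dropping the carry by reducing modulo 10**width is one common convention, taken here as an assumption.

```python
def split_sections(key, width):
    """Cut the decimal digits of the key into sections of `width` digits."""
    digits = str(key)
    return [digits[i:i + width] for i in range(0, len(digits), width)]


def shift_fold(key, width):
    """Shift folding: add the sections with their low-order digits aligned."""
    total = sum(int(section) for section in split_sections(key, width))
    return total % 10 ** width  # drop the carry


def boundary_fold(key, width):
    """Boundary folding: reverse every other section before adding."""
    sections = split_sections(key, width)
    total = sum(int(s[::-1]) if i % 2 else int(s)
                for i, s in enumerate(sections))
    return total % 10 ** width  # drop the carry


print(shift_fold(123456789, 3))     # 123 + 456 + 789 = 1368 -> 368
print(boundary_fold(123456789, 3))  # 123 + 654 + 789 = 1566 -> 566
```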
2.5 Random number method
Set the hash function to H(key) = random(key), where random is a pseudo-random function.
This method is suitable for constructing hash functions for keys of unequal length.
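One way to read H(key) = random(key) is to seed a pseudo-random generator with the key and take its first output. The sketch below does that with Python's standard `random` module; the function name `random_hash`, the string keys, and the table length m are assumptions for illustration.

```python
import random


def random_hash(key, m):
    """Seed a generator with the key so the same key always gets the same address."""
    rng = random.Random(key)  # str keys of any length are accepted as seeds
    return rng.randrange(m)   # address in the range 0..m-1


for name in ["li", "zhang wei", "alexander"]:
    print(name, random_hash(name, 16))
```

Seeding with the key is what makes the function repeatable: a hash function must return the same address every time it sees the same key.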
2.6 Division remainder method
The hash address is the remainder obtained when the key is divided by a number p that is not larger than the hash table length m.
The hash function is H(key) = key MOD p (p ≤ m), where m is the table length and p is usually chosen to be a prime number not larger than m.
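The following sketch picks p as the largest prime not exceeding the table length m and then applies H(key) = key MOD p. The helper names and the sample table length of 12 are assumptions for illustration.

```python
def is_prime(n):
    """Trial division; adequate for table-sized numbers."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def largest_prime_at_most(m):
    """Return the largest prime p with p <= m."""
    for p in range(m, 1, -1):
        if is_prime(p):
            return p
    raise ValueError("no prime <= m")


def division_hash(key, p):
    """H(key) = key MOD p."""
    return key % p


p = largest_prime_at_most(12)
print(p)                     # 11
print(division_hash(25, p))  # 3
print(division_hash(78, p))  # 1
```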
3. Hash table collision resolution methods
There are four main methods for handling hash collisions: open addressing, re-hashing, the chain address method (zipper method), and a public overflow area.
A well-designed hash function reduces collisions, but it generally cannot avoid them altogether, so resolving collisions is the other key issue in hashing.
In practice, "handling a collision" means finding another hash address for the key that caused the collision.
3.1 Open addressing method
When a collision occurs, search for the next empty hash address. As long as the hash table is large enough, an empty address can always be found and the record stored there.
3.1.1 Linear probing
When a collision occurs, the following cells in the table are examined one after another until an empty cell is found or the whole table has been searched.
Formula:
fi(key) = (f(key) + di) MOD m (di = 1, 2, 3, ..., m-1)
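A minimal sketch of insertion with linear probing, assuming integer keys, the base hash f(key) = key MOD m, and `None` as the marker for an empty cell. The function name and the sample keys are hypothetical.

```python
def linear_probe_insert(table, key):
    """Try the home address, then di = 1, 2, ..., m-1, wrapping around."""
    m = len(table)
    home = key % m  # assumed base hash f(key)
    for d in range(m):  # d = 0 is the home address itself
        slot = (home + d) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")


table = [None] * 7
for key in (10, 17, 24, 3):  # all four keys have home address 3
    linear_probe_insert(table, key)
print(table)  # [None, None, None, 10, 17, 24, 3]
```

The four colliding keys end up in one contiguous run of cells, which is the clustering behavior that linear probing is known for.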
3.1.2 Quadratic probing method
When a collision occurs, the probe jumps to the left and right of the home position in the table, looking for an empty cell in both directions.
Formula:
fi(key) = (f(key) + di) MOD m (di = 1², -1², 2², -2², ..., q², -q², q ≤ m/2)
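The probe sequence of squares with alternating signs can be sketched as follows, again assuming integer keys, f(key) = key MOD m, and `None` for an empty cell. The function names are hypothetical.

```python
def quadratic_offsets(m):
    """Yield 0, then 1, -1, 4, -4, ..., q*q, -q*q for q up to m // 2."""
    yield 0  # the home address itself
    for q in range(1, m // 2 + 1):
        yield q * q
        yield -(q * q)


def quadratic_probe_insert(table, key):
    m = len(table)
    home = key % m  # assumed base hash f(key)
    for d in quadratic_offsets(m):
        slot = (home + d) % m  # Python's % keeps the result in 0..m-1
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("no empty cell found along the probe sequence")


table = [None] * 7
for key in (10, 17, 24, 3):  # all four keys have home address 3
    quadratic_probe_insert(table, key)
print(table)  # [3, None, 24, 10, 17, None, None]
```

Compared with linear probing on the same keys, the records are spread on both sides of the home address instead of forming one contiguous run.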
3.1.3 Random probing method
When a collision occurs, the displacement di is computed by a random function; this is called the random probing method.
Formula:
fi(key) = (f(key) + di) MOD m (di is a pseudo-random sequence)
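For random probing to be usable, a lookup must replay the same displacements that the insertion used, so the sequence has to be pseudo-random with a fixed seed. The sketch below assumes integer keys, f(key) = key MOD m, and a shuffled permutation of 1..m-1 as the displacement sequence. The permutation is a design choice made here so that every cell is eventually visited; an arbitrary random sequence would not give that guarantee. All names are hypothetical.

```python
import random


def random_offsets(m, seed=0):
    """A fixed pseudo-random ordering of the displacements 1..m-1."""
    offsets = list(range(1, m))
    random.Random(seed).shuffle(offsets)
    return offsets


def random_probe_insert(table, key, seed=0):
    m = len(table)
    home = key % m  # assumed base hash f(key)
    for d in [0] + random_offsets(m, seed):
        slot = (home + d) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")


def random_probe_find(table, key, seed=0):
    """Replay the same sequence; an empty cell means the key is absent."""
    m = len(table)
    home = key % m
    for d in [0] + random_offsets(m, seed):
        slot = (home + d) % m
        if table[slot] is None:
            return None
        if table[slot] == key:
            return slot
    return None


table = [None] * 7
for key in (10, 17, 24):
    random_probe_insert(table, key)
print(random_probe_find(table, 24) is not None)  # True
```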
Linear probing is prone to "secondary clustering": while resolving collisions among synonyms (keys with the same hash address), it causes further collisions between non-synonyms.
The advantage of linear probing is that, as long as the hash table is not full, an address that does not collide can always be found, whereas quadratic probing and pseudo-random probing do not necessarily guarantee this.
3.2 Chain address method
All records with the same hash address are linked into the same linked list. The nodes of each list are allocated dynamically, so this method is better suited to cases where the table length cannot be determined before the table is built.
Handling collisions is simple and there is no clustering, that is, non-synonyms never collide, so the average search length is short.
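A minimal sketch of the chain address method. For brevity each chain is a Python list rather than a hand-built linked list, which is a substitution made here; the class name `ChainedHashTable` and its methods are hypothetical.

```python
class ChainedHashTable:
    def __init__(self, m):
        # One chain per address; chains grow dynamically as records arrive.
        self.buckets = [[] for _ in range(m)]

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing record
                return
        bucket.append((key, value))  # synonyms share the same chain

    def get(self, key, default=None):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default


table = ChainedHashTable(7)
table.put(10, "a")
table.put(17, "b")  # 10 and 17 are synonyms: both map to address 3
table.put(4, "c")
print(table.buckets[3])  # [(10, 'a'), (17, 'b')]
print(table.get(17))     # b
```

Only keys with the same address ever share a chain, so a collision between synonyms never pushes an unrelated key out of its own cell.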
3.3 Re-hash method
This method prepares several different hash functions in advance:
Hi = RHi(key), i = 1, 2, 3, …, n.
When the hash address H1 = RH1(key) collides, compute H2 = RH2(key), and so on, until no collision occurs. This method is less prone to clustering, but it increases computation time.
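The re-hash method can be sketched as trying a list of hash functions in order. The three functions below (remainder, quotient, digit sum) are hypothetical choices for RH1, RH2, and RH3, picked only to make the example concrete; integer keys and `None` for an empty cell are also assumed.

```python
def rh1(key):
    return key % 7


def rh2(key):
    return key // 7


def rh3(key):
    return sum(int(ch) for ch in str(key))  # sum of decimal digits


def rehash_insert(table, key, hash_functions):
    """Try RH1, RH2, ... in turn until one yields an empty cell."""
    for rh in hash_functions:
        slot = rh(key) % len(table)
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("every hash function collided")


table = [None] * 7
for key in (10, 17, 24):
    rehash_insert(table, key, [rh1, rh2, rh3])
print(table)  # [None, None, 17, 10, None, None, 24]
```

Key 24 collides under both RH1 (address 3) and RH2 (address 3) before RH3 places it at address 6, which shows the extra computation the method can cost.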
3.4 Establishing a public overflow area
The basic idea of this method is to divide the hash table into two parts, a basic table and an overflow table. Any element that collides with an entry in the basic table is placed in the overflow table. (Note: in this method the elements are stored across two separate tables.)
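A minimal sketch of the basic table plus overflow table arrangement, assuming integer keys, f(key) = key MOD m for the basic table, and a plain list searched sequentially as the overflow table. The class and method names are hypothetical.

```python
class OverflowHashTable:
    def __init__(self, m):
        self.base = [None] * m  # basic table, addressed by the hash
        self.overflow = []      # shared overflow table for all collisions

    def insert(self, key):
        slot = key % len(self.base)
        if self.base[slot] is None:
            self.base[slot] = key
        elif self.base[slot] != key and key not in self.overflow:
            self.overflow.append(key)  # collided with the basic table

    def contains(self, key):
        slot = key % len(self.base)
        if self.base[slot] == key:
            return True
        return key in self.overflow  # sequential scan of the overflow table


table = OverflowHashTable(7)
for key in (10, 17, 24, 5):
    table.insert(key)
print(table.base)      # [None, None, None, 10, None, 5, None]
print(table.overflow)  # [17, 24]
```

Lookups that hit the basic table cost one probe, while colliding keys pay for a scan of the overflow table, so this layout works best when collisions are rare.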