First, let's take a look at the definition of a hash table:
Hash table: uses a hash function to map keys to specific positions in the table.
Ideally, if element E has key k and the hash function is f, then the position of E in the hash table is f(k).
This kind of conversion is a compression mapping; that is, the space of hash values is usually much smaller than the input space.
From the definition, we can see that different inputs may be hashed to the same output, while the input cannot be uniquely determined from the hash value.
In short, a hash function is a function that compresses a message of any length into a fixed-length digest.
Next, let's look at the characteristics of hash tables through their two main problems.
The two main problems of a hash table: choosing a hash function, and handling collisions.
First, let's talk about common hash functions.
If the keyword range is too large (for example, city phone numbers of the form XXXX-xxxxxxx allow roughly 10^11 possible numbers),
many of those numbers may never be used, which means a huge waste of space.
So we need a mapping function to compress the data.
The most common mapping function is the division method:
f(k) = k % D, where k is the keyword, D is the size of the hash table, and % is the modulo (remainder) operation.
Each of the positions 0 through D-1 in the hash table is called a bucket.
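As a sketch, the division method is a one-liner; the function name and the Python framing here are my own, not from the article.

```python
# Division method: a minimal sketch (the function name is an assumption).
def hash_div(k: int, D: int) -> int:
    """Map key k to a bucket index in the range 0..D-1."""
    return k % D

# With D = 11, the divisor used in the examples that follow:
print(hash_div(80, 11))  # 3
print(hash_div(40, 11))  # 7
print(hash_div(65, 11))  # 10
```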
Let's look at a concrete hash table:
a hash table HT with three elements and D = 11.
Because 80 % 11 = 3, 80 is in bucket 3.
Because 40 % 11 = 7, 40 is in bucket 7.
Because 65 % 11 = 10, 65 is in bucket 10.
Suppose we want to insert an element: 24.
Because 24 % 11 = 2, 24 goes into bucket 2.
OK, the insert succeeds. It looks easier than you might think.
But what if we want to insert another element: 58?
Because 58 % 11 = 3, 58 should go into bucket 3.
But bucket 3 is clearly already occupied.
So where does 58 go?
The simplest solution is to store it in the next available bucket in the table.
With that, the insert succeeds, and the second main problem of hash tables is answered: how to handle collisions.
If the collision occurs in the last bucket, the search for a free slot continues from bucket 0.
In other words, when looking for the next bucket, the table is treated as a ring.
This method is called linear probing (linear open addressing).
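The insertion procedure described above can be sketched in a few lines of Python; the list-of-None representation and the names are assumptions.

```python
D = 11
table = [None] * D   # None marks an empty bucket

def insert(table, k):
    """Insert key k with linear probing, treating the table as a ring."""
    D = len(table)
    start = k % D
    i = start
    while table[i] is not None:
        i = (i + 1) % D                  # wrap past the last bucket to bucket 0
        if i == start:
            raise OverflowError("hash table is full")
    table[i] = k

for k in (80, 40, 65, 24, 58):
    insert(table, k)

# 58 hashes to bucket 3, which 80 already occupies,
# so it lands in the next free bucket, bucket 4.
print(table.index(58))  # 4
```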
This resolves insertion collisions, but one might suspect that this solution hides a danger.
Here is a simple example.
Suppose 80 is deleted and we then search for element 58. The search starts at bucket 3,
but 58 is not actually in bucket 3, and the now-empty bucket 3 makes the search stop and wrongly report failure.
The search continues until one of the following three conditions is met:
1) a bucket containing keyword k is found, i.e., the search succeeds;
2) an empty bucket is reached;
3) the search returns to bucket f(k) after traversing the whole table.
In the last two cases, the search fails.
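A search with the three stopping rules above might look like this (a sketch; the table literal reproduces the state after inserting 80, 40, 65, 24, and 58):

```python
def search(table, k):
    """Probe for key k; return its bucket index, or None on failure."""
    D = len(table)
    start = k % D
    i = start
    while True:
        if table[i] == k:
            return i            # 1) bucket with keyword k found: success
        if table[i] is None:
            return None         # 2) empty bucket reached: failure
        i = (i + 1) % D
        if i == start:
            return None         # 3) back at bucket f(k): failure

ht = [None, None, 24, 80, 58, None, None, 40, None, None, 65]
print(search(ht, 58))  # 4: probes buckets 3, then 4
print(search(ht, 36))  # None: 36 % 11 = 3; probing stops at empty bucket 5
```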
However, once a delete operation has taken place, these rules no longer work reliably.
Therefore, we can no longer search the hash table the way we search an ordinary array.
One solution is, after each deletion, to examine the buckets following the deleted element one by one and move elements back where necessary, until an empty bucket is reached or we return to the bucket of the deleted element.
Another solution is to add a tag to each bucket: neverUsed.
At initialization, neverUsed = true; once an element is inserted into a bucket, that bucket's neverUsed is set to false (and it stays false even after the element is later deleted).
The termination conditions for a search then become:
1) a bucket containing keyword k is found, i.e., the search succeeds;
2) a bucket whose neverUsed value is true is reached, i.e., the search fails.
A deletion simply empties the bucket without touching neverUsed, so deletions no longer affect searches.
Over time, however, all neverUsed values may become false. When that happens, just reorganize the table: re-insert the remaining elements into a fresh, empty hash table.
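The neverUsed scheme could be sketched as follows; the class layout and names are my own assumptions, not an established API.

```python
class Table:
    """Linear-probing table where neverUsed survives deletions (a sketch)."""
    def __init__(self, D):
        self.slots = [None] * D          # None means "currently empty"
        self.never_used = [True] * D     # set to False on first insert, forever

    def insert(self, k):
        D = len(self.slots)
        i = k % D
        while self.slots[i] is not None:
            i = (i + 1) % D
        self.slots[i] = k
        self.never_used[i] = False

    def search(self, k):
        D = len(self.slots)
        i = k % D
        for _ in range(D):               # at most one full pass around the ring
            if self.never_used[i]:
                return None              # never-used bucket reached: failure
            if self.slots[i] == k:
                return i                 # success
            i = (i + 1) % D
        return None                      # scanned the whole table: failure

    def delete(self, k):
        i = self.search(k)
        if i is not None:
            self.slots[i] = None         # never_used[i] deliberately stays False

t = Table(11)
for k in (80, 40, 65, 24, 58):
    t.insert(k)
t.delete(80)
print(t.search(58))  # 4: the deletion of 80 no longer breaks the search
```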
The following describes how to compute the average search length (ASL) of a hash table for unsuccessful searches.
The procedure: first build the table, then count the comparisons needed at each position for a search to fail, sum those counts, and divide by the number of possible hash values.
Let's first look at this question: how many comparisons does an unsuccessful search in the hash table take?
In the example below, given an existing hash table, the count for each position is the number of buckets probed before the search succeeds or can be declared a failure.
Average search length for successful searches: ASL = (1 + 3 + 1 + 2 + 2 + 1 + 1 + 9 + 1 + 1) / 10 = 2.2
Average search length for unsuccessful searches: ASL = (9 + 8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 + 1 + 2 + 1 + 10) / 13 ≈ 4.54
Note:
The number of comparisons for an unsuccessful search starting at position n is the distance from position n to the first empty bucket (counting the empty bucket itself):
the minimum number of probes needed before you can be sure the value is not present.
(1) Hash(x) = 0: at least 9 probes are needed to reach an empty bucket and confirm the failure.
(2) Hash(x) = 1: at least 8 probes.
(3) Hash(x) = 2: at least 7 probes.
(4) Hash(x) = 3: at least 6 probes.
(5) Hash(x) = 4: at least 5 probes.
(6) Hash(x) = 5: at least 4 probes.
(7) Hash(x) = 6: at least 3 probes.
(8) Hash(x) = 7: at least 2 probes.
(9) Hash(x) = 8: at least 1 probe.
(10) Hash(x) = 9: at least 1 probe.
(11) Hash(x) = 10: at least 2 probes.
(12) Hash(x) = 11: at least 1 probe.
(13) Hash(x) = 12: at least 10 probes (the table is scanned cyclically, wrapping around to the start).
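The two averages can be checked mechanically; the lists below simply restate the per-position probe counts from the walkthrough (a quick sanity check, not part of the original derivation).

```python
# Successful probe counts for the ten stored keys, and the thirteen
# unsuccessful counts for hash values 0 through 12, as listed above.
success = [1, 3, 1, 2, 2, 1, 1, 9, 1, 1]
failure = [9, 8, 7, 6, 5, 4, 3, 2, 1, 1, 2, 1, 10]

print(sum(success) / len(success))            # 2.2
print(round(sum(failure) / len(failure), 2))  # 4.54
```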
Computing the average unsuccessful search length of a hash table of length M
is equivalent to computing the expected number of comparisons when the (n+1)-th record is inserted into the table.
For example, with mod 11 hashing, the hash value of the (n+1)-th record can be anywhere from 0 to 10.
If it is 0, the record goes to position 0; if that position is already occupied, the collision is resolved by moving backward until an empty position is found.
The number of positions probed is exactly the unsuccessful search length for hash value 0.
Do the same for hash values 1 through 10, then divide the sum by 11 (note: divide by 11, the number of hash values, not by the table length M).
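That procedure can be sketched generically. The occupancy pattern below is one layout consistent with the thirteen counts listed earlier (buckets 8, 9, and 11 empty in a 13-slot table); it is an assumed reconstruction, since the article does not show the table itself.

```python
def asl_failure(occupied, d):
    """Average unsuccessful-search length: for each hash value 0..d-1,
    probe cyclically until an empty bucket; the final empty probe counts."""
    m = len(occupied)
    total = 0
    for h in range(d):
        i, probes = h, 1
        while occupied[i]:
            i = (i + 1) % m
            probes += 1
        total += probes
    return total / d

# Buckets 0-7, 10, and 12 occupied; 8, 9, and 11 empty (an assumption).
occ = [True] * 8 + [False, False, True, False, True]
print(round(asl_failure(occ, 13), 2))  # 4.54
```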