MIT Introduction to Algorithms, Lecture 7: Hash Tables

Source: Internet
Author: User

Functionally, the purpose of a hash table is to reduce the time complexity of a search to O(1). Consider a sequence of length n: searching it directly takes Θ(n) time, and sorting it first and then searching is faster, but neither approach is the fastest possible.
A hash table, also called a hash map, uses a hash function h to decide where to store the content associated with a key: applying h to the key maps it to one of the m slots of a table. The simplest example is a phone book, where the key is a person's name and the content is the phone number and other personal information. The biggest benefit of such a table shows up at lookup time. To find "Zhang San", we feed the name "Zhang San" to h and get back a storage location, say position 45 of the phone book, where we can immediately read Zhang San's phone number.

The first topic:
Everything stored in a computer is ultimately a number, so we will look at building a hash table for numeric keys. First, consider what a good hash function h requires:
1. It should distribute keys as uniformly as possible, with no clustering; that is, each value h(k) should be equally likely to land in any of the m slots. If we know the type of input in advance we can design a better hash function, but every fixed hash function has some particular input that is bad for it, on which all the computed hash values point to the same slot.
(Definition of a uniform hash function: if x and y are two different keys and the hash table has m slots, then P{h(x) = h(y)} = 1/m.)
2. The hash function itself must not be too complex or take too long to compute.

Common ways to construct a hash function:
1. Direct hashing, h(k) = k. There are no collisions, but the table needs one slot for every possible key, so the space cost makes it impractical.
2. Division hashing, h(k) = k mod m. The choice of m matters a great deal. It should not be a power of 2 or 10, because then h(k) depends only on the low-order digits of k and the rest of the key is discarded by the mod. It also should not be too small. A suitable prime is a good choice. However, computers naturally work with powers of 2 and people with powers of 10, a good prime is not always convenient, and division is relatively slow, so this hash function is not very efficient.
3. Multiplication hashing. Assume all keys are integers, m = 2^r, and the computer word length is w bits. Then define h(k) = (a*k mod 2^w) rsh (w-r), where rsh means "shift right" and a is an odd integer with 2^(w-1) < a < 2^w. The advantage of this hash function is that the resulting h(k) depends on every bit of k. Because a is odd, a and 2^w are coprime. Picture a roulette wheel whose circumference is a power of 2: a is not a multiple of the circumference, and k is the number of times we advance by a around the wheel, so the final position h(k) can fall anywhere on the wheel.
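
The division and multiplication methods above can be sketched in a few lines of Python. This is only an illustration: the function names, the sample keys, and the constants w = 8, r = 3, a = 177 are assumptions chosen for the example, not values from the lecture.

```python
def division_hash(k: int, m: int) -> int:
    """Division method: h(k) = k mod m."""
    return k % m


def multiplication_hash(k: int, a: int, w: int, r: int) -> int:
    """Multiplication method: h(k) = (a*k mod 2^w) rsh (w - r), with m = 2^r slots."""
    return ((a * k) & ((1 << w) - 1)) >> (w - r)


keys = [8, 16, 24, 32, 40, 48]  # hypothetical sample: multiples of 8

# m = 8 is a power of 2, so only the low 3 bits of each key are used.
print([division_hash(k, 8) for k in keys])   # [0, 0, 0, 0, 0, 0]

# The prime m = 7 spreads the same keys over different slots.
print([division_hash(k, 7) for k in keys])   # [1, 2, 3, 4, 5, 6]

# a = 177 is odd and satisfies 2^7 < a < 2^8; every result is in range(2^3).
print([multiplication_hash(k, 177, 8, 3) for k in keys])
```

The first two lines of output show why a power of 2 is a poor table size for the division method: every key in the sample collides, while a prime modulus separates them.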

The second topic:
As mentioned above, no matter how the hash function is designed, collisions are hard to avoid. How do we resolve them? There are two main approaches:
1. Chaining. Each slot holds a linked list, and every key that collides in a slot is appended to that slot's list. This costs extra memory beyond the table itself, and in the worst case all keys map to the same slot, so the hash table degenerates into a single linked list and a lookup becomes a list search.
2. Open addressing. Without using any storage outside the table, keep "probing" the table until an empty slot is found, and put the content there. Wikipedia describes it this way: "Open addressing, or closed hashing, is a method of collision resolution in hash tables. With this method a hash collision is resolved by probing."

Analysis of the first method: chaining

In the worst case every h(k) points to the same slot, so the hash table is really one linked list and searching for a value takes Θ(n) time. In the best case no collision occurs and a search takes Θ(1) time. Define α = n/m as the load factor of the hash table. A successful search takes Θ(1 + α/2) time on average, where the 1 is the time to compute h and α/2 is the average time spent walking the list. So if n = O(m), α is a constant and the expected search time in this hash table is Θ(1). An unsuccessful search, which is the worst case on average because it must walk an entire list, takes Θ(1 + α) time.
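
A minimal sketch of a chained hash table may make the analysis concrete. It assumes integer keys and the division hash h(k) = k mod m, and it uses Python lists in place of linked lists; the class and method names are hypothetical.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining (one chain per slot)."""

    def __init__(self, m: int = 7):
        self.m = m                              # number of slots
        self.n = 0                              # number of stored keys
        self.slots = [[] for _ in range(m)]     # a Python list stands in for each linked list

    def _h(self, key: int) -> int:
        return key % self.m                     # division method, integer keys assumed

    def insert(self, key: int, value) -> None:
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                        # key already present: overwrite its value
                chain[i] = (key, value)
                return
        chain.append((key, value))
        self.n += 1

    def search(self, key: int):
        for k, v in self.slots[self._h(key)]:   # walk the chain in the key's slot
            if k == key:
                return v
        return None                             # unsuccessful search: whole chain was scanned

    def load_factor(self) -> float:
        return self.n / self.m                  # alpha = n / m


table = ChainedHashTable(m=7)
for k in (10, 17, 24, 3):                       # all four keys hash to slot 3
    table.insert(k, f"value-{k}")

print(table.search(24))      # value-24
print(table.search(31))      # None (31 also hashes to slot 3, but is not stored)
print(len(table.slots[3]))   # 4
```

The sample keys are picked so that they all collide, which reproduces the worst case described above: the single chain in slot 3 has to be scanned like an ordinary list.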

Analysis of the second method: open addressing (closed hashing)
This method "probes" the hash table to find the next empty slot and stores the value there. A lookup follows the same probe sequence, step by step, until it finds the target key.
The probing methods are:
1. Linear probing
2. Nonlinear probing (for example, quadratic probing)
3. Double hashing
4. Pseudo-random probing
Each of these methods has limitations and may cause primary or secondary clustering.
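
The following sketch compares two of the probing methods listed above, linear probing and double hashing, on keys that all share the same initial slot. It assumes integer keys and a prime table size, and it does not support deletion; the class, the function names, and the choice of secondary hash h2(k) = 1 + k mod (m-1) are illustrative assumptions.

```python
class OpenAddressTable:
    """Open-addressing hash table with a pluggable probe sequence."""

    def __init__(self, m: int, probe):
        self.m = m
        self.slots = [None] * m     # None marks an empty slot
        self.probe = probe          # probe(key, i, m) -> slot index for attempt i

    def insert(self, key: int) -> int:
        """Store key and return the number of probes that were needed."""
        for i in range(self.m):
            j = self.probe(key, i, self.m)
            if self.slots[j] is None or self.slots[j] == key:
                self.slots[j] = key
                return i + 1
        raise OverflowError("hash table is full")

    def search(self, key: int) -> bool:
        for i in range(self.m):
            j = self.probe(key, i, self.m)
            if self.slots[j] is None:   # an empty slot ends the probe sequence
                return False
            if self.slots[j] == key:
                return True
        return False


def linear_probe(key: int, i: int, m: int) -> int:
    return (key % m + i) % m


def double_hash_probe(key: int, i: int, m: int) -> int:
    h1 = key % m
    h2 = 1 + key % (m - 1)          # never 0; coprime to m when m is prime
    return (h1 + i * h2) % m


keys = (0, 11, 22, 33)              # with m = 11 all four keys start at slot 0

linear = OpenAddressTable(11, linear_probe)
print([linear.insert(k) for k in keys])   # [1, 2, 3, 4]

double = OpenAddressTable(11, double_hash_probe)
print([double.insert(k) for k in keys])   # [1, 2, 2, 2]
```

With linear probing each colliding key has to walk past all the earlier ones, which is the primary clustering mentioned above. Double hashing gives each key its own step size, so the keys separate after one extra probe in this example.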
To analyze the efficiency of open addressing, we first state a theorem: for an open-addressing hash table with load factor α = n/m < 1, and assuming uniform hashing (every probe sequence is equally likely), the expected number of probes in an unsuccessful search is at most 1/(1-α).
Thus, if α = 50% the expected number of probes is 2, and if α = 90% it rises sharply to 10, so the size of α is critical under this strategy (the birthday problem comes to mind, for the same reason). In practice, some hash tables that use this strategy force α to stay below 75%; if that value is exceeded, the table is automatically expanded.
The expected number of probes, 1/(1-α), is derived as follows:
1. A search always needs at least 1 probe.
2. With probability n/m the first probe hits an occupied slot, and a second probe is needed.
3. With probability (n-1)/(m-1) the second probe also hits an occupied slot.
......
Observe that (n-i)/(m-i) < α for i = 1, 2, 3, ..., n, so every factor can be replaced by α. The expected number of probes is then at most 1 + α + α^2 + α^3 + ... = 1/(1-α).
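
As a check on the derivation, the nested expectation 1 + n/m * (1 + (n-1)/(m-1) * (1 + ...)) built up in the steps above can be evaluated exactly and compared with the bound 1/(1-α). This is a sketch under the uniform hashing assumption; the function name and the sample table sizes are hypothetical.

```python
from fractions import Fraction


def expected_probes(n: int, m: int) -> Fraction:
    """Expected probes for an unsuccessful search with n keys in m slots.

    Evaluates 1 + n/m * (1 + (n-1)/(m-1) * (1 + ...)) from the inside out,
    assuming every probe sequence is equally likely (uniform hashing).
    """
    if not 0 <= n < m:
        raise ValueError("need 0 <= n < m")
    total = Fraction(1)
    for i in range(n - 1, -1, -1):          # innermost factor first
        total = 1 + Fraction(n - i, m - i) * total
    return total


for n, m in [(50, 100), (75, 100), (90, 100)]:
    alpha = n / m
    exact = float(expected_probes(n, m))
    bound = 1 / (1 - alpha)
    print(f"alpha={alpha:.2f}  exact={exact:.3f}  bound={bound:.3f}")
```

The exact value stays below the bound for every load factor, and both grow quickly as α approaches 1, which is why implementations cap the load factor.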

