On Algorithms and Data Structures (11): Hash Tables

Last Update:2017-02-27 Source: Internet

Author: User

Tags hash time limit

In the previous articles in this series, we introduced sequential search on unordered linked lists, binary search on ordered arrays, balanced search trees, and red-black trees. The following figure shows their average-case and worst-case time complexities:

As the figure shows, a red-black tree performs insertion, search, and deletion in O(lg N) time on average.

Is there an even more efficient data structure? There is: the hash table, which this article introduces.

What is a hash table

A hash table stores data as key-value pairs (key-indexed storage): given the key we want to look up, we can find its corresponding value.

The idea behind hashing is simple. If all keys are small integers, a plain array is enough: use the key as the index and store the corresponding value at that position, so the value for any key can be accessed immediately. Hashing extends this idea from such simple keys to keys of more complex types.
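To make the integer-key case concrete, here is a minimal C# sketch of such a direct-address table. It assumes the keys are non-negative integers smaller than a capacity fixed in advance; the class name DirectAddressTable and its methods are hypothetical, chosen only for illustration.

```csharp
// Minimal direct-address table: an integer key in [0, capacity) is used
// directly as the array index, so Put and TryGet are single array accesses.
// DirectAddressTable is a hypothetical name used for illustration.
public class DirectAddressTable<TValue>
{
    private readonly TValue[] values;
    private readonly bool[] present;   // records which slots hold a value

    public DirectAddressTable(int capacity)
    {
        values = new TValue[capacity];
        present = new bool[capacity];
    }

    // Store a value under an integer key; the key itself is the index.
    public void Put(int key, TValue value)
    {
        values[key] = value;
        present[key] = true;
    }

    // Return true and the stored value if the key has been stored.
    public bool TryGet(int key, out TValue value)
    {
        value = values[key];
        return present[key];
    }
}
```

For example, after Put(3, "three"), TryGet(3, out v) returns true with v equal to "three", while TryGet(4, out _) returns false. The catch is that the array must be as large as the whole key range, which is exactly the cost hashing avoids.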

A hash lookup consists of two steps:

1. Use a hash function to convert the search key into an array index. Ideally, different keys would map to different indices, but in practice we must handle the case where several keys hash to the same index.
2. Resolve these hash collisions. There are many ways to do so; later in this article we introduce separate chaining (the "zipper" method) and linear probing.
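The two steps can be sketched in C# as follows. This is a minimal sketch under stated assumptions: step 2 uses separate chaining (one of the two methods named above), keeping a list of key-value pairs in each array slot, and step 1 relies on .NET's EqualityComparer<TKey>.Default for the raw hash code. ChainedHashTable and its members are hypothetical names.

```csharp
using System.Collections.Generic;

// Minimal separate-chaining hash table (hypothetical name ChainedHashTable).
public class ChainedHashTable<TKey, TValue>
{
    private readonly List<KeyValuePair<TKey, TValue>>[] buckets;

    public ChainedHashTable(int m)
    {
        buckets = new List<KeyValuePair<TKey, TValue>>[m];
        for (int i = 0; i < m; i++)
            buckets[i] = new List<KeyValuePair<TKey, TValue>>();
    }

    // Step 1: hash the key to an index in [0, M-1].
    // Clearing the sign bit keeps the remainder non-negative.
    private int Index(TKey key) =>
        (EqualityComparer<TKey>.Default.GetHashCode(key) & 0x7fffffff) % buckets.Length;

    public void Put(TKey key, TValue value)
    {
        var bucket = buckets[Index(key)];
        for (int i = 0; i < bucket.Count; i++)
        {
            if (EqualityComparer<TKey>.Default.Equals(bucket[i].Key, key))
            {
                bucket[i] = new KeyValuePair<TKey, TValue>(key, value); // overwrite
                return;
            }
        }
        bucket.Add(new KeyValuePair<TKey, TValue>(key, value));
    }

    // Step 2: keys that collided share a bucket, so scan it comparing keys.
    public bool TryGet(TKey key, out TValue value)
    {
        foreach (var pair in buckets[Index(key)])
        {
            if (EqualityComparer<TKey>.Default.Equals(pair.Key, key))
            {
                value = pair.Value;
                return true;
            }
        }
        value = default;
        return false;
    }
}
```

Because .NET Core randomizes string hash codes per process, the bucket a given string lands in can change between runs; lookups still work because Put and TryGet always go through the same Index.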

A hash table is a classic example of a time-space trade-off. Without a memory limit, we could use the key directly as an array index, and every search would take O(1) time. Without a time limit, we could store the keys in an unordered array and search it sequentially, which needs very little memory. A hash table uses a moderate amount of both time and space to strike a balance between these two extremes, and we can shift that balance by adjusting the hash function.

Hash function

The first step in a hash lookup is to map each key to an array index; the function that performs this mapping is the hash function. Given an array of size M, we need a hash function that converts any key into an index in the range 0 to M-1. A good hash function is cheap to compute and distributes the keys evenly across that range. For example, the last three digits of a mobile phone number make a better key than the first three, because the first three digits repeat very often. Similarly, the birth-date digits of a national ID number make a better key than its leading digits.

In practice, keys are not always numbers: they may be strings or combinations of several values, so we need to write our own hash functions for them.
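For a key made of several values, one common approach (a sketch, not the only option) is to combine the fields Horner-style, reducing modulo M after each step so the intermediate values stay small. The DateKey struct below is a hypothetical composite key with day, month, and year fields; the radix R = 31 and the assumption that all fields are non-negative are illustrative choices.

```csharp
// Hypothetical composite key: a date made of three non-negative integer fields.
public readonly struct DateKey
{
    public readonly int Day, Month, Year;

    public DateKey(int day, int month, int year)
    {
        Day = day;
        Month = month;
        Year = year;
    }

    // Combine the fields Horner-style: multiply the running hash by the
    // radix r, add the next field, and reduce modulo m at every step.
    // The result is an index in [0, m-1]; r = 31 is an assumed radix.
    public int Hash(int m, int r = 31)
    {
        int h = Day % m;
        h = (h * r + Month) % m;
        h = (h * r + Year) % m;
        return h;
    }
}
```

For example, new DateKey(27, 2, 2017).Hash(97) first reduces 27 · 31 + 2 = 839 to 63, then 63 · 31 + 2017 = 3970 to 90.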

1. Positive integers

The most common way to hash a positive integer is modular hashing (the division method): choose an array size M, usually a prime, and for any positive integer k compute the remainder of k divided by M. A prime M helps spread the keys evenly; if M were, say, a power of 10, only the last few digits of k would influence the index.
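The following C# sketch illustrates why M is usually prime. It assumes non-negative keys, and ModularHashing with its methods is a hypothetical helper that counts how many distinct indices a set of keys occupies under k % M.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for experimenting with modular hashing.
public static class ModularHashing
{
    // Division (modular) hashing for a non-negative key: index = k mod M.
    public static int Hash(int k, int m) => k % m;

    // Number of distinct array indices the given keys map to for table size m.
    public static int DistinctIndices(IEnumerable<int> keys, int m) =>
        keys.Select(k => Hash(k, m)).Distinct().Count();
}
```

With the 100 keys 0, 10, 20, ..., 990, M = 100 uses only 10 distinct indices (every key ends in 0), while the prime M = 97 uses 97 of them. A table size that shares a factor with the keys wastes slots in the same way.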

2. Strings

When the key is a string, we can likewise treat it as a large integer and apply the remainder method, folding every character of the string into the hash, for example:

public int GetHashCode(string str)
{
    char[] s = str.ToCharArray();
    int hash = 0;
    for (int i = 0; i < s.Length; i++)
    {
        // Horner's method: multiply the running hash by 31, then add the next character
        hash = s[i] + (31 * hash);
    }
    return hash;
}

The code above computes the hash of a string with Horner's method. For a string s of length L, it evaluates (with int arithmetic, long strings wrap around modulo 2^32):

$$h = s[0]\cdot 31^{L-1} + \cdots + s[L-3]\cdot 31^{2} + s[L-2]\cdot 31^{1} + s[L-1]\cdot 31^{0}$$

For example, to get the hash value of "call": the Unicode code points of c, a, and l are 99, 97, and 108, so the hash of "call" is

$$3045982 = 99\cdot 31^{3} + 97\cdot 31^{2} + 108\cdot 31^{1} + 108\cdot 31^{0} = 108 + 31\cdot(108 + 31\cdot(97 + 31\cdot 99))$$
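Here is a self-contained C# version of the same computation that can be checked against the "call" example. It also sketches two common ways, both assumptions rather than part of the original code, to turn the hash into an index in 0 to M-1: clearing the sign bit before taking the remainder, or reducing modulo M at every step so the value never overflows. StringHashing is a hypothetical class name.

```csharp
// Hypothetical helper collecting the string-hashing variants discussed here.
public static class StringHashing
{
    // Horner's method with multiplier 31; unchecked makes the int
    // wrap-around on long strings explicit.
    public static int HornerHash(string s)
    {
        int hash = 0;
        foreach (char c in s)
            hash = unchecked(c + 31 * hash);
        return hash;
    }

    // Reduce a possibly negative hash to an index in [0, m-1]:
    // clear the sign bit, then take the remainder.
    public static int ToIndex(int hash, int m) => (hash & 0x7fffffff) % m;

    // Alternative: keep the running value below m at every step so it never
    // overflows (assumes 31 * (m - 1) + char.MaxValue fits in an int).
    public static int ModularHornerHash(string s, int m)
    {
        int hash = 0;
        foreach (char c in s)
            hash = (31 * hash + c) % m;
        return hash;
    }
}
```

StringHashing.HornerHash("call") returns 3045982, and both ToIndex(3045982, 97) and ModularHornerHash("call", 97) give 85. Note that .NET's built-in string.GetHashCode() uses a different algorithm (randomized per process on .NET Core), so it will not reproduce these numbers.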

Hashing every character can be time-consuming for long strings. To save time, we can sample characters at a fixed interval instead. For example, the version below steps through the string in strides of max(1, length / 8), so it reads at most 15 characters however long the string is:

public int GetHashCode(string str)
{
    char[] s = str.ToCharArray();
    int hash = 0;
    // Read only every skip-th character instead of all of them
    int skip = Math.Max(1, s.Length / 8);
    for (int i = 0; i < s.Length; i += skip)
    {
        hash = s[i] + (31 * hash);
    }
    return hash;
}

However, different strings can produce the same hash value; this is the hash collision mentioned earlier. Consider, for example, the following four strings:

If we hash only the sampled characters, as in the code above, all four strings get the same hash value. So let us now look at how to deal with hash collisions.
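The same effect is easy to reproduce with any two strings that differ only in characters the sampling skips. In the minimal C# sketch below (SampledHashing is a hypothetical class name), a string of 16 a's and the string "ab" repeated 8 times are both 16 characters long, so the stride is 16 / 8 = 2 and only the even positions are read. Those positions all hold 'a' in both strings, so the sampled hashes collide, while the full Horner hash tells the strings apart.

```csharp
// Hypothetical helper comparing sampled and full string hashing.
public static class SampledHashing
{
    // Hash built from every skip-th character only, mirroring the sampled
    // GetHashCode above (skip = max(1, length / 8)).
    public static int SampledHash(string s)
    {
        int hash = 0;
        int skip = System.Math.Max(1, s.Length / 8);
        for (int i = 0; i < s.Length; i += skip)
            hash = unchecked(s[i] + 31 * hash);
        return hash;
    }

    // Full Horner hash over every character, for comparison.
    public static int FullHash(string s)
    {
        int hash = 0;
        foreach (char c in s)
            hash = unchecked(c + 31 * hash);
        return hash;
    }
}
```

Sampling trades hash quality for speed: every character the hash ignores is a place where two keys can differ without the table noticing.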

Avoiding hash collisions

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
