On Algorithms and Data Structures (11): Hash Tables

Source: Internet
Author: User

In the previous articles in this series, we introduced sequential search on unordered linked lists, binary search on ordered arrays, balanced search trees, and red-black trees. The following figure shows their average-case and worst-case time complexity:

As the figure shows, a red-black tree performs insertion, search, and deletion in O(lg N) time on average.

Is there a more efficient data structure? The answer is the hash table, which this article introduces next.

What is a hash table

A hash table is a data structure that stores key-value pairs (it is key-indexed). Given the key we are looking for, we can find its corresponding value directly.
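The sketch below shows key-value lookup using C#'s built-in Dictionary<TKey, TValue> from System.Collections.Generic, which is implemented as a hash table. The class PhoneBookDemo, its method names, and the sample entries are hypothetical and are chosen only for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical demo: names are the keys, phone extensions are the values
static class PhoneBookDemo
{
    public static Dictionary<string, int> Build()
    {
        var book = new Dictionary<string, int>();
        book["alice"] = 1001;   // insert (or overwrite) the value stored under "alice"
        book["bob"] = 1002;
        return book;
    }

    // Looks up a key; returns -1 when the key is not present
    public static int Lookup(Dictionary<string, int> book, string name)
    {
        return book.TryGetValue(name, out int ext) ? ext : -1;
    }
}
```

TryGetValue avoids the exception that the indexer throws for a missing key.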

The idea behind hashing is simple. If all the keys are small non-negative integers, a plain array is enough: use the key as the index and store the corresponding value at that index, so the value of any key can be accessed immediately. Hashing extends this idea to keys of more complex types.
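Here is a minimal sketch of this direct-addressing idea. It assumes the keys are integers in the range 0 to maxKey, and DirectAddressTable is a hypothetical name, not a library type. Each key is used directly as the array index, so storing and retrieving a value takes constant time.

```csharp
// Direct addressing: assumes keys are integers in 0..maxKey (a known, small bound)
class DirectAddressTable<TValue>
{
    private readonly TValue[] values;
    private readonly bool[] present;   // records which keys have been stored

    public DirectAddressTable(int maxKey)
    {
        values = new TValue[maxKey + 1];
        present = new bool[maxKey + 1];
    }

    // The key itself is the array index, so Put and TryGet are O(1)
    public void Put(int key, TValue value)
    {
        values[key] = value;
        present[key] = true;
    }

    public bool TryGet(int key, out TValue value)
    {
        value = values[key];
        return present[key];
    }
}
```

This only works when the key range is small and known in advance. Hashing removes that restriction.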

A search using hashing takes two steps:

1. Use a hash function to convert the search key into an array index. Ideally, different keys map to different indices, but in practice we must handle cases where multiple keys hash to the same index.
2. Resolve these hash collisions. There are many ways to do this; this article introduces two of them later: separate chaining (the "zipper" method) and linear probing.
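The two steps can be sketched for int keys using separate chaining (the "zipper" method). In this scheme, every array slot holds a linked list of the keys that hash to it. This is an illustrative sketch, not necessarily the implementation this article develops. The class ChainedHashTable and its members are hypothetical, and the modular hash assumes M is a prime.

```csharp
using System.Collections.Generic;

// Hypothetical separate-chaining table for int keys
class ChainedHashTable<TValue>
{
    private readonly int m;   // number of buckets (assumed prime)
    private readonly LinkedList<KeyValuePair<int, TValue>>[] buckets;

    public ChainedHashTable(int m)
    {
        this.m = m;
        buckets = new LinkedList<KeyValuePair<int, TValue>>[m];
        for (int i = 0; i < m; i++)
            buckets[i] = new LinkedList<KeyValuePair<int, TValue>>();
    }

    // Step 1: hash the key; clearing the sign bit keeps the index in 0..m-1
    public int Hash(int key) => (key & 0x7fffffff) % m;

    public void Put(int key, TValue value)
    {
        var chain = buckets[Hash(key)];
        for (var node = chain.First; node != null; node = node.Next)
        {
            if (node.Value.Key == key)
            {
                node.Value = new KeyValuePair<int, TValue>(key, value);   // update an existing key
                return;
            }
        }
        // Step 2: a colliding key is simply added to the same chain
        chain.AddFirst(new KeyValuePair<int, TValue>(key, value));
    }

    public bool TryGet(int key, out TValue value)
    {
        // Search only the chain for this key's bucket, comparing keys one by one
        foreach (var pair in buckets[Hash(key)])
        {
            if (pair.Key == key) { value = pair.Value; return true; }
        }
        value = default;
        return false;
    }
}
```

With M = 13, the keys 11 and 24 both hash to index 11. The collision is resolved by keeping both pairs in the same chain and comparing keys during the search.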

A hash table is a classic example of a trade-off between time and space. If there were no memory limit, we could use the key directly as an index into a (possibly huge) array, and every search would take O(1) time. If there were no time limit, we could use an unordered array with sequential search, which needs very little memory. A hash table strikes a balance between these two extremes by using a moderate amount of both time and space. We can shift that balance simply by adjusting the parameters of the hash function.

Hash function

The first step in a hash lookup is to map the key to an array index. The function that performs this mapping is the hash function. If we have an array of size M (indices 0 to M-1), we need a hash function that converts any key into an index in the range 0 to M-1. A hash function should be easy to compute and should distribute the keys evenly. As a simple example, the last three digits of a mobile phone number make a better key than the first three, because the first three digits repeat very often. Likewise, the digits of an ID number that encode the year of birth make a better key than its leading digits.
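One common way to get an index in the range 0 to M-1 for an arbitrary key is to take the key's built-in hash code and reduce it modulo M. Here is a minimal sketch; the helper IndexFor is hypothetical. Because .NET hash codes can be negative, the sign bit is cleared before the remainder is taken.

```csharp
using System.Collections.Generic;

static class HashIndex
{
    // Hypothetical helper: maps any key to an index in 0..m-1
    public static int IndexFor<TKey>(TKey key, int m)
    {
        // Built-in hash code (0 for null); it may be negative
        int h = EqualityComparer<TKey>.Default.GetHashCode(key);
        // Clear the sign bit so that % never yields a negative index
        return (h & 0x7fffffff) % m;
    }
}
```

In .NET Core and later, string hash codes are randomized per process, so the index of a given string can differ between runs. An int key's hash code is the value itself, so IndexFor(200, 97) is 6.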

In practice, keys are not always numbers. They may be strings or combinations of several values, so we often need to implement our own hash functions.
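The sketch below shows a key made of several values: a hypothetical DateKey class that combines its day, month, and year fields into one hash code. It uses the same multiply-by-31 pattern as the string hash discussed below. In C#, a type used as a hash table key should override GetHashCode and Equals consistently, because equal keys must produce equal hash codes.

```csharp
using System;

// Hypothetical composite key: a calendar date
sealed class DateKey : IEquatable<DateKey>
{
    public int Day { get; }
    public int Month { get; }
    public int Year { get; }

    public DateKey(int day, int month, int year)
    {
        Day = day; Month = month; Year = year;
    }

    // Combine all fields; in C#'s default unchecked context, overflow wraps around
    public override int GetHashCode()
    {
        int hash = 17;
        hash = 31 * hash + Day;
        hash = 31 * hash + Month;
        hash = 31 * hash + Year;
        return hash;
    }

    // Equals compares exactly the fields that GetHashCode uses
    public bool Equals(DateKey other) =>
        other != null && Day == other.Day && Month == other.Month && Year == other.Year;

    public override bool Equals(object obj) => Equals(obj as DateKey);
}
```

Each field is multiplied by a different power of 31, so swapping day and month (5 November vs 11 May) produces a different hash code.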

1. Positive integers

The most common way to hash positive integers is modular hashing (the division method): choose an array of size M and, for any positive integer k, compute the remainder of k divided by M, that is, k mod M. M is generally a prime. Otherwise, keys that share a common factor with M can crowd into only a few indices.
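The sketch below shows why M is usually prime. It counts how many distinct indices k % M produces for the patterned keys 8, 16, ..., 128. The class ModularHashingDemo and the choice of keys are hypothetical.

```csharp
using System.Collections.Generic;

static class ModularHashingDemo
{
    // Modular hashing; k is assumed to be positive
    public static int Hash(int k, int m) => k % m;

    // Number of different slots the keys occupy in a table of size m
    public static int DistinctIndices(IEnumerable<int> keys, int m)
    {
        var used = new HashSet<int>();
        foreach (int k in keys)
            used.Add(Hash(k, m));
        return used.Count;
    }

    // The first count positive multiples of 8
    public static List<int> MultiplesOf8(int count)
    {
        var keys = new List<int>();
        for (int i = 1; i <= count; i++)
            keys.Add(8 * i);
        return keys;
    }
}
```

With M = 16, all sixteen keys land in just 2 slots (0 and 8), because every key shares the factor 8 with M. With the prime M = 17, they occupy 16 different slots.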

2. Strings

When a string is used as a key, we can treat it as a large integer and again use modular hashing. The hash is computed from every character that makes up the string, for example:

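public int GetHashCode(string str)
{
    char[] s = str.ToCharArray();
    int hash = 0;
    for (int i = 0; i < s.Length; i++)
    {
        // Horner's method: multiply the running hash by 31, then add the next character.
        // In C#'s default unchecked context, overflow simply wraps around.
        hash = s[i] + (31 * hash);
    }
    return hash;
}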

The code above uses Horner's method to compute the hash of a string. It evaluates the polynomial:

$$h = s[0]\cdot 31^{L-1} + \cdots + s[L-3]\cdot 31^{2} + s[L-2]\cdot 31^{1} + s[L-1]\cdot 31^{0}$$

where L is the length of the string s.

For example, consider the hash of "call". The Unicode value of c is 99, of a is 97, and of l is 108, so the hash of "call" is

$$3045982 = 99\cdot 31^{3} + 97\cdot 31^{2} + 108\cdot 31^{1} + 108\cdot 31^{0} = 108 + 31\cdot(108 + 31\cdot(97 + 31\cdot 99))$$
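The worked example can be checked in code. This sketch evaluates the hash of a string in two ways: with Horner's method, and by summing the expanded polynomial term by term. For "call", both give 3045982. The class HornerDemo is hypothetical.

```csharp
static class HornerDemo
{
    // Horner's method: hash = s[i] + 31 * hash, processing characters left to right
    public static int Horner(string s)
    {
        int hash = 0;
        foreach (char c in s)
            hash = c + 31 * hash;
        return hash;
    }

    // Direct evaluation of s[0]*31^(L-1) + ... + s[L-1]*31^0 (overflows for long strings)
    public static long Expanded(string s)
    {
        long sum = 0;
        int len = s.Length;
        for (int i = 0; i < len; i++)
        {
            long power = 1;
            for (int j = 0; j < len - 1 - i; j++)
                power *= 31;
            sum += s[i] * power;
        }
        return sum;
    }
}
```

Horner's method needs only one multiplication and one addition per character. That is why the hash code loops this way instead of computing powers of 31.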

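Hashing every character can be time-consuming for long strings. To save time, you can sample characters at fixed intervals instead. For example, the following version steps through the string max(1, length/8) characters at a time, so only about 8 or 9 characters contribute to the hash of a long string: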

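public int GetHashCode(string str)
{
    char[] s = str.ToCharArray();
    int hash = 0;
    // Step size chosen so that only about 8 evenly spaced characters are hashed
    // (Math is in the System namespace)
    int skip = Math.Max(1, s.Length / 8);
    for (int i = 0; i < s.Length; i += skip)
    {
        hash = s[i] + (31 * hash);
    }
    return hash;
}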
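The drawback of sampling is that skipped characters never affect the hash. This sketch restates the sampling hash as a hypothetical static method, SampledHash.Hash, using the multiplier 31 from Horner's method, so it can be tried on two different strings.

```csharp
using System;

static class SampledHash
{
    // Hashes only every skip-th character of the string
    public static int Hash(string str)
    {
        int hash = 0;
        int skip = Math.Max(1, str.Length / 8);
        for (int i = 0; i < str.Length; i += skip)
            hash = str[i] + 31 * hash;
        return hash;
    }
}
```

For a 16-character string, skip is 2, so only the even positions are read. The strings "aaaaaaaaaaaaaaaa" and "abababababababab" agree at every even position and therefore collide.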

However, different strings can produce the same hash value. This is the hash collision mentioned earlier. Consider, for example, the following four strings:

If we hash only every eighth character, all four strings get the same hash value. Next, we look at how to deal with such hash collisions:

Avoiding hash collisions
