Hash Tables: Introduction to Algorithms (13)

Source: Internet
Author: User

1. Introduction

Many applications need a dynamic set that supports at least the dictionary operations insert, search, and delete. A hash table is an effective data structure for implementing these operations.

2. Direct-address table

Before introducing hash tables, we first look at direct-address tables.

Direct addressing is a simple and effective technique when the universe U of keys (the range of possible key values) is reasonably small. Suppose an application needs a dynamic set in which each element has a key drawn from the universe U = {0, 1, ..., m-1}, where m is not too large. Also assume that no two elements have the same key.

To represent the dynamic set, we use an array, called a direct-address table and denoted T[0..m-1], in which each position (slot) corresponds to a key in the universe U. Slot k points to the element in the set with key k; if the set contains no element with key k, then T[k] = NIL.

The dictionary operations are very simple to implement: search returns T[k], insert stores x in T[x.key], and delete sets T[x.key] to NIL.

Each of these operations takes O(1) time.
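A minimal Python sketch of a direct-address table, under the assumptions stated in this section (distinct integer keys from {0, 1, ..., m-1}). The names `Element` and `DirectAddressTable` are hypothetical and chosen only for illustration; `None` plays the role of NIL.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical element type: a key plus some satellite data."""
    key: int
    value: str


class DirectAddressTable:
    """Direct-address table over the key universe {0, 1, ..., m-1}."""

    def __init__(self, m):
        # One slot per possible key; None stands for NIL.
        self.slots = [None] * m

    def search(self, k):
        # Return the element with key k, or None if there is none.
        return self.slots[k]

    def insert(self, x):
        # Slot x.key now points to x.
        self.slots[x.key] = x

    def delete(self, x):
        # Clear the slot that held x.
        self.slots[x.key] = None
```

Each method performs a single array access, which is where the O(1) bound comes from.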

In some applications, we can save space by storing the object itself directly in the slot of the table, rather than storing a pointer to the object.

3. Hash table

(1) Disadvantages of direct addressing

Direct addressing has obvious drawbacks. If the universe U is large, the table T needs a very large block of memory, and the allocation may well fail. If the universe is large but the keys actually stored are sparse, this representation wastes most of the space it allocates.

(2) Hash function

To overcome the drawbacks of direct addressing while keeping its fast dictionary operations, we can use a hash function

h: U → {0, 1, 2, ..., m-1}

to compute the slot for a key k. Put simply, the hash function h maps the large universe of keys onto the much smaller range of slot indices. We say that an element with key k hashes to slot h(k), and that h(k) is the hash value of key k.


A problem arises: two keys may map to the same slot, which we call a collision. Because |U| > m, collisions cannot be avoided entirely, no matter how carefully h is chosen.
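The pigeonhole argument can be made concrete with a small Python sketch. The helper `find_collision` and the sample function `h` are hypothetical and used only for illustration: hashing m + 1 distinct keys into m slots must send at least two of them to the same slot.

```python
def find_collision(h, keys):
    """Return the first pair of distinct keys that h maps to the same slot, or None."""
    seen = {}  # slot -> first key that hashed there
    for k in keys:
        slot = h(k)
        if slot in seen:
            return seen[slot], k
        seen[slot] = k
    return None


m = 7


def h(k):
    # Illustrative hash function only; any function into m slots behaves the same way.
    return k % m


# m + 1 distinct keys cannot all land in different slots.
collision = find_collision(h, range(m + 1))
```

Here `collision` is the pair `(0, 7)`, since both keys hash to slot 0.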

So we now face two problems: how to resolve collisions, and how to choose a function h that minimizes them.

(3) Resolving collisions by chaining

Let's solve the first problem first.

The solution is to chain together all elements that hash to the same slot in a linked list. The slot holds a pointer to the head of that list, or NIL if no element hashes there.

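With chaining, the dictionary operations work as follows: insert places x at the head of the list T[h(x.key)], search scans the list T[h(k)] for an element with key k, and delete removes x from the list T[h(x.key)].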
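A minimal Python sketch of a chained hash table, assuming integer keys and doubly linked chains. The names `Node` and `ChainedHashTable`, and the default hash function k mod m, are assumptions made for illustration, not part of the original text.

```python
class Node:
    """Hypothetical element type: key, satellite data, and chain links."""

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None


class ChainedHashTable:
    def __init__(self, m, h=None):
        self.m = m
        # Assumption: use k mod m unless the caller supplies another hash function.
        self.h = h if h is not None else (lambda k: k % m)
        self.heads = [None] * m  # heads[j] is the first node of chain j, or None

    def insert(self, x):
        # Place x at the head of chain T[h(x.key)].
        j = self.h(x.key)
        x.prev = None
        x.next = self.heads[j]
        if x.next is not None:
            x.next.prev = x
        self.heads[j] = x

    def search(self, k):
        # Scan chain T[h(k)] for a node with key k; return None if absent.
        node = self.heads[self.h(k)]
        while node is not None and node.key != k:
            node = node.next
        return node

    def delete(self, x):
        # Unlink node x from its chain; no scan is needed with double links.
        if x.prev is not None:
            x.prev.next = x.next
        else:
            self.heads[self.h(x.key)] = x.next
        if x.next is not None:
            x.next.prev = x.prev
        x.prev = x.next = None
```

Note that `delete` takes the node itself rather than a key, which is what allows it to avoid a search.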

Let's analyze the performance of each operation.

Insertion clearly takes O(1) time, since the new element is simply placed at the head of its list.

Next, deletion, which takes as long as removing an element from a linked list. If the list T[h(k)] is doubly linked, deletion takes O(1) time. If it is singly linked, we must first find the element's predecessor, so deletion has asymptotically the same running time as search.

Here we focus on the running time of search:

First, we assume that any given element is equally likely to hash into any of the m slots, independently of where any other element has hashed. We call this assumption simple uniform hashing.

Suppose a hash table T with m slots stores n elements. Then each slot holds n/m elements on average; we call α = n/m the load factor of T. For j = 0, 1, ..., m-1, denote the list in slot j by T[j] and its length by n_j, so that

n = n_0 + n_1 + ... + n_{m-1},

and E[n_j] = α = n/m.
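As a small numeric illustration, the Python sketch below computes the chain lengths and the load factor for one fixed set of keys. The helper `chain_lengths`, the sample keys, and the hash function k mod m are hypothetical. The average over all slots is always exactly n/m; the statement that each individual chain has expected length α is the probabilistic claim, and it depends on simple uniform hashing.

```python
def chain_lengths(keys, m, h):
    """Return [n_0, ..., n_{m-1}], the number of keys that hash to each slot."""
    lengths = [0] * m
    for k in keys:
        lengths[h(k)] += 1
    return lengths


m = 8
keys = [3, 11, 19, 4, 12, 5, 22, 7, 15, 31, 40, 58]  # illustrative keys only
lengths = chain_lengths(keys, m, lambda k: k % m)

n = len(keys)
alpha = n / m  # load factor: 12 / 8 = 1.5
```

For these keys `lengths` is `[1, 0, 1, 3, 2, 1, 1, 3]`, which sums to n = 12, so the mean chain length equals the load factor 1.5 even though individual chains range from 0 to 3.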

We now consider two cases: unsuccessful search and successful search.

① Unsuccessful search

In an unsuccessful search for a key k, we must traverse every element of the list T[j], where j = h(k). The expected length of T[j] is α, so the traversal takes expected time O(α). Adding the O(1) time needed to compute h(k) and index into T[j], the total expected time is Θ(1 + α).

② Successful search

In a successful search, where the traversal of T[j] stops depends on the position of the element in the list, so we analyze the average case, assuming the element being searched for is equally likely to be any of the n elements stored in the table.

Let x_i be the i-th element inserted into the table T (numbering the n elements 1 to n in order of insertion), and let k_i = x_i.key, for i = 1, 2, ..., n. Define the indicator random variable X_ij = I{h(k_i) = h(k_j)}, i.e., X_ij = 1 if h(k_i) = h(k_j) and X_ij = 0 otherwise.

Under the assumption of simple uniform hashing, for i ≠ j,

Pr{h(k_i) = h(k_j)} = 1/m,

and therefore E[X_ij] = 1/m.

The expected number of elements examined in a successful search, averaged over the n elements in the table, is 1 + α/2 - α/(2n).
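The derivation can be sketched as follows, assuming (as in the standard textbook analysis) that new elements are inserted at the head of their list. Under that assumption, the elements examined when searching for x_i are x_i itself plus every element x_j inserted after x_i (j > i) that hashes to the same slot. Averaging over the n elements and using E[X_ij] = 1/m:

```latex
\begin{align*}
E\left[\frac{1}{n}\sum_{i=1}^{n}\left(1 + \sum_{j=i+1}^{n} X_{ij}\right)\right]
  &= \frac{1}{n}\sum_{i=1}^{n}\left(1 + \sum_{j=i+1}^{n} E[X_{ij}]\right) \\
  &= \frac{1}{n}\sum_{i=1}^{n}\left(1 + \sum_{j=i+1}^{n} \frac{1}{m}\right) \\
  &= 1 + \frac{1}{nm}\sum_{i=1}^{n}(n - i) \\
  &= 1 + \frac{1}{nm}\left(n^{2} - \frac{n(n+1)}{2}\right) \\
  &= 1 + \frac{n-1}{2m} \\
  &= 1 + \frac{\alpha}{2} - \frac{\alpha}{2n}.
\end{align*}
```

The first step uses linearity of expectation, and the last step substitutes α = n/m.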

Therefore, the total time required for a successful search, including the time to compute the hash function, is Θ(2 + α/2 - α/(2n)) = Θ(1 + α).

Combining the above: if the number of slots is at least proportional to the number of elements, so that n = O(m) and hence α = O(1), then all dictionary operations take O(1) time on average.

4. Hash function

Now let's solve the second problem: how to construct a good hash function.

A good hash function should (approximately) satisfy the assumption of simple uniform hashing: each key is equally likely to hash to any of the m slots, independently of where any other key has hashed. Unfortunately, we generally cannot verify that this condition holds.

In practice, heuristic techniques can often be used to construct a hash function that performs well. Information about the distribution of keys can be useful in the design process. A good approach derives the hash value in a way that is independent of any patterns that might exist in the data.

Here are two basic ways to construct a hash function:

(1) Division method

The division method is simple: divide the key k by m and take the remainder, which maps k into one of the m slots. That is, the hash function is:

h(k) = k mod m.

This method is fast because it requires only a single division. When choosing m, however, certain values should be avoided. For example, m should not be a power of 2: if m = 2^p, then h(k) is just the p lowest-order bits of k. Unless we know that all patterns of the low-order p bits are equally likely, it is better to make the hash function depend on all the bits of the key. A prime not too close to an exact power of 2 is often a good choice for m.
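The effect of choosing m = 2^p can be seen in a short Python sketch. The function name `hash_division`, the sample keys, and the values m = 256 and m = 701 are assumptions chosen for illustration. The three keys share the same low-order 8 bits (0x34) and differ only in their high-order bits.

```python
def hash_division(k, m):
    """Division method: h(k) = k mod m."""
    return k % m


# Illustrative keys that agree in their 8 lowest-order bits.
keys = [0x1234, 0x5634, 0x9A34]

# m = 2^8: only the low 8 bits matter, so all three keys collide.
power_of_two_slots = [hash_division(k, 2 ** 8) for k in keys]

# m = 701, a prime not close to a power of 2: the high-order bits also matter.
prime_slots = [hash_division(k, 701) for k in keys]
```

With m = 256 every key lands in slot 52 (0x34), while with m = 701 the slots are 454, 337, and 220. This is a single hand-picked example, not evidence that any particular prime is best for a given key distribution.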

(2) Multiplication method

The method has two steps. First, multiply the key k by a constant A (0 < A < 1) and extract the fractional part of kA. Second, multiply this value by m and take the floor. The hash function is:

h(k) = ⌊m (kA mod 1)⌋,

where "kA mod 1" denotes the fractional part of kA, that is, kA - ⌊kA⌋.

One advantage of the multiplication method is that the value of m is not critical; it is typically chosen as a power of 2. Although the method works with any value of A, Knuth suggests that A ≈ (√5 - 1)/2 = 0.6180339887... is likely to work reasonably well.
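A minimal Python sketch of the multiplication method, using floating-point arithmetic for readability. The function name `hash_multiplication` and the sample key are assumptions for illustration. Floating point is itself a simplification: for very large keys the fractional part loses precision, so practical implementations usually work with fixed-width integer arithmetic instead.

```python
import math

# Knuth's suggested constant, (sqrt(5) - 1) / 2, about 0.6180339887.
A = (math.sqrt(5) - 1) / 2


def hash_multiplication(k, m, a=A):
    """Multiplication method: h(k) = floor(m * (k*a mod 1))."""
    frac = (k * a) % 1.0  # fractional part of k*a, in [0, 1)
    return math.floor(m * frac)


# Example with m = 2^14 slots and an illustrative key.
slot = hash_multiplication(123456, 2 ** 14)
```

For k = 123456 and m = 2^14 = 16384, kA is about 76300.0041151, its fractional part is about 0.0041151, and m times that is about 67.42, so `slot` is 67.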
