Introduction to Algorithms, Chapter 11: Hash Tables

Source: Internet
Author: User

1 Introduction

Many applications need a dynamic set that supports only the three dictionary operations: insert, search, and delete. For example, a compiler for a programming language maintains a symbol table in which the key of each element is an arbitrary string corresponding to an identifier in the language. A hash table is an effective data structure for implementing a dictionary.
A hash table generalizes an ordinary array. An array supports direct addressing, so any of its elements can be accessed in O(1) time. In the worst case, searching a hash table takes as long as searching a linked list, O(n), but in practice hashing performs very well: under reasonable assumptions, the expected time to search for an element in a hash table is O(1).

2 Direct-Address Tables

Direct addressing is a simple and effective technique when the universe U of keys is reasonably small.
To represent a dynamic set, define an array (a direct-address table) T[0...m-1] in which each position, or slot, corresponds to one key in the universe U. Slot k points to the element of the set whose key is k; if the set contains no element with key k, then T[k] is NULL.

The dictionary operations can be implemented with the following pseudocode:

// Search
DirectAddressSearch(T, k)
    return T[k];

// Insert
DirectAddressInsert(T, x)
    T[key[x]] = x;

// Delete
DirectAddressDelete(T, x)
    T[key[x]] = NULL;
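
The three operations map almost one-to-one onto a Python list. The following is only a minimal sketch: the class name `DirectAddressTable` and its method names are hypothetical, keys are assumed to be integers in the range 0 to m-1, and `None` stands in for NULL.

```python
class DirectAddressTable:
    """Direct-address table over the key universe {0, 1, ..., m-1}."""

    def __init__(self, m):
        # One slot per possible key; None plays the role of NULL.
        self.slots = [None] * m

    def search(self, key):
        # The key is used directly as the array index.
        return self.slots[key]

    def insert(self, key, value):
        self.slots[key] = value

    def delete(self, key):
        self.slots[key] = None


table = DirectAddressTable(8)
table.insert(3, "x")
print(table.search(3))  # x
table.delete(3)
print(table.search(3))  # None
```

Note that the list is allocated with m slots up front, however few keys are actually stored.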

Each operation on a direct-address table takes only O(1) time, but the table must reserve one slot for every key in the universe, so its space cost is high. Direct addressing is therefore not a good fit for applications where the universe U is large and memory is limited.

3 Hash Tables

With direct addressing, an element with key k is stored in slot k. With the hash tables discussed here, the element is instead stored in slot h(k), where h is a hash function that computes the slot from the key k.

With this scheme, the memory required drops from the size of the universe, |U|, to the number of slots in the table, m, which saves a great deal of memory.
The catch is that two or more keys may hash to the same slot. This is called a collision. How often collisions occur depends heavily on the choice of hash function (described in the next section). A well-designed, "random-looking" hash function reduces the number of collisions, but since they cannot be avoided entirely, we also need a way to resolve the collisions that do occur. The following sections describe several hash functions and collision resolution strategies in detail.

4 Hash Functions

The hash function h(k) computes the slot to which key k is mapped. For a table with m slots, a good hash function should approximately satisfy the assumption of simple uniform hashing: each key is equally likely to hash to any of the m slots, independently of where any other key has hashed. In practice, heuristic techniques are often used to construct good hash functions. A good approach derives the hash value in a way that is independent of any patterns that might exist in the data, as the division method described next does.
Direct addressing, introduced at the beginning, can also be viewed as a hashing method with h(k) = k. Three other hashing methods are listed below.

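4.1 The Division Method

The division method maps a key k to one of the m slots by taking the remainder of k divided by m. That is, the hash function is:
h(k) = k mod m
The key to applying the division method is the choice of m. A power of 2 is usually a poor choice, because h(k) then depends only on the low-order bits of k; a prime not too close to an exact power of 2 is often a good choice.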
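
As a small illustration of why the choice of m matters, the sketch below hashes the same keys with two table sizes. The function name `division_hash` and the sample keys are chosen only for this example and assume non-negative integer keys.

```python
def division_hash(k, m):
    """Division method: h(k) = k mod m."""
    return k % m


keys = (8, 16, 24, 32)

# With m = 8 (a power of 2) only the low 3 bits of each key matter,
# so all of these keys collide in slot 0.
print([division_hash(k, 8) for k in keys])  # [0, 0, 0, 0]

# With m = 7 (a prime) the same keys are spread over different slots.
print([division_hash(k, 7) for k in keys])  # [1, 2, 3, 4]
```

This is just one hand-picked key set; a prime m does not rule out collisions, it only makes the hash depend on all the bits of the key.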

4.2 The Multiplication Method

The multiplication method for constructing a hash function works in two steps:
First, multiply the key k by a constant A (0 < A < 1) and extract the fractional part of kA.
Second, multiply this value by m and take the floor of the result.
The hash function is:
h(k) = ⌊m (kA mod 1)⌋
Here "kA mod 1" denotes the fractional part of kA, that is, kA - ⌊kA⌋.
One advantage of the multiplication method is that the value of m is not critical; it is typically chosen to be a power of 2.
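
The two steps can be sketched directly in floating-point arithmetic. The default constant A = (√5 - 1)/2 ≈ 0.618 is an assumption of this example rather than something stated above (it is the value Knuth suggests), and the function name `multiplication_hash` is hypothetical. Floating-point rounding makes this suitable only as an illustration for moderately sized integer keys.

```python
import math


def multiplication_hash(k, m, A=(math.sqrt(5) - 1) / 2):
    """Multiplication method: h(k) = floor(m * (k*A mod 1))."""
    frac = (k * A) % 1.0          # step 1: fractional part of k*A
    return math.floor(m * frac)   # step 2: scale by m and take the floor


# m is a power of 2 here, as the text suggests.
print(multiplication_hash(123456, 2**14))  # 67
```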

4.3 Universal Hashing

None of the hashing methods discussed above can avoid the worst case, in which all keys hash to the same slot and the average retrieval time becomes O(n). In fact, any fixed hash function is vulnerable to this worst case. The only effective remedy is to choose the hash function at random, in a way that is independent of the keys that are actually going to be stored. This approach is called universal hashing, and it gives good average-case performance no matter which keys are stored.
For a detailed discussion of universal hashing, see Introduction to Algorithms, pp. 139-141.
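
One way to picture "choosing the hash function at random" is the sketch below. It assumes a specific, commonly used family that the text above does not spell out: h(k) = ((a·k + b) mod p) mod m, where p is a prime larger than every key, a is drawn from 1..p-1, and b from 0..p-1. The name `make_universal_hash`, the default prime 2^31 - 1, and the `rng` parameter are all choices made for this example.

```python
import random


def make_universal_hash(m, p=2_147_483_647, rng=random):
    """Pick h(k) = ((a*k + b) mod p) mod m at random.

    Assumes integer keys in the range 0..p-1; p = 2**31 - 1 is prime.
    """
    a = rng.randrange(1, p)   # a in 1..p-1
    b = rng.randrange(0, p)   # b in 0..p-1

    def h(k):
        return ((a * k + b) % p) % m

    return h


# Each call selects a new function, independently of the keys stored later.
h = make_universal_hash(m=10)
print([h(k) for k in (1, 2, 3, 4)])  # four slot numbers in 0..9; they vary from run to run
```

Because a and b are drawn when the table is created, no fixed set of keys is bad for every choice of function.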

5 Collision Resolution Strategies

5.1 Chaining

Chaining is one of the simplest collision resolution techniques: all elements that hash to the same slot are placed in a linked list. Slot j contains a pointer to the head of the list of all stored elements that hash to j; if no element hashes to j, the pointer is NIL.

The dictionary operations on a hash table T are easy to implement when collisions are resolved by chaining.


As the discussion above suggests, when collisions are resolved by chaining, insertion can always be done in O(1) time, and the search time is proportional to the length of the list in the element's slot. Deleting an element x, given a pointer to its node, also takes O(1) time if the lists are doubly linked. If the lists are singly linked, we must first find the predecessor of the target node, so deletion has the same complexity as search.
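
The operations described above can be sketched with doubly linked chains as follows. This is an illustrative sketch, not a reference implementation: the names `Node` and `ChainedHashTable` are hypothetical, the slot is computed with Python's built-in `hash` reduced modulo m (any of the hash functions from section 4 could be substituted), and `insert` does not check for duplicate keys.

```python
class Node:
    """One element in a doubly linked chain."""

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None


class ChainedHashTable:
    def __init__(self, m=8):
        self.m = m
        self.heads = [None] * m   # heads[j] is the first node of chain j

    def _slot(self, key):
        return hash(key) % self.m

    def insert(self, key, value):
        """Insert at the head of the chain: O(1)."""
        node = Node(key, value)
        j = self._slot(key)
        node.next = self.heads[j]
        if self.heads[j] is not None:
            self.heads[j].prev = node
        self.heads[j] = node
        return node

    def search(self, key):
        """Walk the chain: time proportional to its length."""
        node = self.heads[self._slot(key)]
        while node is not None and node.key != key:
            node = node.next
        return node

    def delete(self, node):
        """Unlink a node given a pointer to it: O(1)."""
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.heads[self._slot(node.key)] = node.next
        if node.next is not None:
            node.next.prev = node.prev
        node.prev = node.next = None


table = ChainedHashTable(m=4)
table.insert(1, "a")
node = table.insert(5, "b")    # keys 1 and 5 collide in slot 1 when m = 4
print(table.search(5).value)   # b
table.delete(node)
print(table.search(5))         # None
print(table.search(1).value)   # a
```

Deletion takes the node itself rather than a key, which is what makes the O(1) bound possible; deleting by key would first require a search.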

5.2 Open Addressing

I Linear Probing

II Quadratic Probing

III Double Hashing

5.3 Perfect Hashing
