Review notes on hash tables in data structures

Source: Internet
Author: User

Besides the various tree-based tables, hashing can also be used to represent and implement dynamic lookup tables. Hashing is both a storage method and a search method. The search method is called hash lookup, and a storage structure built by hashing is called a hash table. The core of the hashing technique is the hash function.

A hash function maps a key value to a storage location in a hash table. For a given dynamic lookup table T, suppose an "ideal" hash function h and a corresponding hash table L have been chosen. Then for each data element x in T, the function value h(x.key) is the location of x in L. On insertion (or when the table is built), x is placed at that location, and a later search for x looks at that same location.

The storage location that the hash function determines for a data element is called its hash address. The basic idea of hashing is therefore to organize storage and perform lookups through the correspondence between the key value x.key and the hash address h(x.key).
In the ideal case the hash function is one-to-one: each key value corresponds to one hash address, and different key values correspond to different hash addresses. In practice this rarely happens; in most cases "synonyms" and "collisions" (conflicts) are unavoidable.
Given a hash function h and key values k1 and k2, if k1 ≠ k2 and h(k1) = h(k2), then k1 and k2 are called synonyms (with respect to h). If two data elements x1 and x2 of a dynamic lookup table are stored in the same hash table and their key values are synonyms, the situation is called a collision.
Naturally, we want as few synonyms as possible so as to reduce collisions. Since collisions cannot be avoided entirely, we must also decide how to handle them when they occur. Two questions therefore have to be answered when using hashing:
① How to construct (select) a "uniform" hash function?
② What method should be used to resolve collisions?
The organization of the hash table itself must also be considered. These topics are discussed in turn below.
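
The definitions of synonym and collision above can be illustrated with a small sketch. The hash function used here (remainder modulo 7) and the helper name are_synonyms are assumptions chosen only for illustration; they do not come from the original notes.

```python
# Hypothetical hash function for illustration only: remainder modulo 7.
def h(key: int) -> int:
    return key % 7


def are_synonyms(k1: int, k2: int) -> bool:
    """Two distinct keys are synonyms (with respect to h) if h maps them to the same address."""
    return k1 != k2 and h(k1) == h(k2)


print(h(10), h(17))          # both keys map to the same hash address
print(are_synonyms(10, 17))  # distinct keys with equal addresses
print(are_synonyms(10, 11))  # distinct keys with different addresses
```

If both 10 and 17 are stored in the same table under this h, they compete for the same address, which is exactly the collision the text describes.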

I. Construction methods for hash functions
Here are a few common construction methods. The hash functions they produce are simple to compute and "uniform" (that is, they yield few synonyms). In what follows, hash addresses are assumed to be natural numbers, and so are key values (a key value can always be converted to a natural number).

1. Digit analysis method
The digit analysis method, also known as the digit selection method, applies when all possible key values are known in advance and the key values have more digits than the hash address. In that case, analyze the key values and pick several digit positions whose values are distributed fairly evenly to form the hash address.
Suppose some of the known key values are as follows:
0 0 1 3 1 9 4 2 1
0 0 1 6 1 8 3 0 9
0 0 1 7 3 9 4 3 4
0 0 1 6 4 1 5 1 6
0 0 1 8 1 6 3 7 8
0 0 1 1 4 3 3 9 5
0 0 1 2 4 2 3 6 3
0 0 1 9 1 5 4 0 9
......
Reading from the left, the first three digit positions are clearly not evenly distributed, and positions 5 and 7 also repeat heavily, so these five positions should be discarded. The remaining positions 4, 6, 8, and 9 are distributed fairly evenly, so all or some of them can serve as the hash address. How many to use depends on the capacity of the dynamic lookup table (the maximum number of data elements) and on the form of the hash table.
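
The analysis above can be reproduced with a short sketch. It uses the eight sample key values listed in the text, stored as strings so that leading zeros are kept. The function names and the choice to use all four of positions 4, 6, 8, and 9 are assumptions for illustration; the text leaves open how many positions to keep.

```python
# The eight sample key values from the text, kept as strings to preserve leading zeros.
KEYS = [
    "001319421", "001618309", "001739434", "001641516",
    "001816378", "001143395", "001242363", "001915409",
]


def distinct_per_position(keys):
    """Count how many different digits occur at each position, left to right."""
    return [len(set(column)) for column in zip(*keys)]


def digit_hash(key, positions=(4, 6, 8, 9)):
    """Concatenate the digits at the given 1-based positions into a hash address."""
    return int("".join(key[p - 1] for p in positions))


print(distinct_per_position(KEYS))    # few distinct digits means an uneven position
print([digit_hash(k) for k in KEYS])  # addresses built from positions 4, 6, 8, 9
```

Positions with only a handful of distinct digits (1 to 3, 5, and 7 here) carry little information, which is why they are dropped.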

2. Division (remainder) method
The division method is simple and effective, and it does not require knowing all key values in advance. Choose a suitable positive integer p and use the remainder of the key value divided by p as the hash address; that is, the hash function h is

h(key) = key % p

The crux of this method is the choice of p. If p is even, the hash function always maps odd key values to odd addresses and even key values to even addresses, which increases the chance of collisions. If p is a power of the radix in which the key values are written, the hash address is simply the last few digits of the key value, which is also undesirable. Typically, p is taken to be the largest prime number less than or equal to the hash table capacity.
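
A minimal sketch of the division method follows, assuming a table capacity of 100 purely for illustration. The helper names (is_prime, largest_prime_at_most, division_hash) are hypothetical, and the prime is chosen as the largest one not exceeding the capacity, which is the usual rule for this method.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for table-sized numbers."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def largest_prime_at_most(capacity: int) -> int:
    """Return the largest prime p with p <= capacity."""
    for p in range(capacity, 1, -1):
        if is_prime(p):
            return p
    raise ValueError("capacity must be at least 2")


def division_hash(key: int, p: int) -> int:
    """h(key) = key % p"""
    return key % p


p = largest_prime_at_most(100)  # assumed capacity of 100
print(p, division_hash(210485, p))

# With an even p (here 10), the parity of the key carries over to the address.
print([division_hash(k, 10) % 2 == k % 2 for k in (11, 24, 37, 58)])
```

The last line shows the weakness of an even p: every address has the same parity as its key, so a key set that is mostly odd (or mostly even) uses only half of the table.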

3. Mid-square method
The middle digits of the square of a number depend on every digit of that number. The mid-square method exploits this by taking the middle digits of the square of the key value as the hash address. It is simple to compute and does not require knowing the distribution of key values in advance. Because squaring also amplifies the differences between key values, the resulting hash addresses are fairly uniform.
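
As a rough sketch of the mid-square idea, the function below squares the key and keeps the middle decimal digits. The function name, the default width of three digits, and the rule for centering the window are assumptions; the text does not fix how many digits to take.

```python
def mid_square_hash(key: int, width: int = 3) -> int:
    """Square the key and return the middle `width` decimal digits of the square."""
    squared = str(key * key)
    if len(squared) <= width:
        return int(squared)
    start = (len(squared) - width) // 2  # left edge of the centered window
    return int(squared[start:start + width])


print(mid_square_hash(1234))  # 1234 * 1234 = 1522756, middle three digits are 227
print(mid_square_hash(1235))  # 1235 * 1235 = 1525225, middle three digits are 252
```

Two keys that differ only in their last digit already produce different middle digits, which reflects the claim that the middle of the square depends on every digit of the key.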

4. Radix conversion method
Treat the key value as a number written in a different radix, convert it back to the original radix, and then select several digits of the result as the hash address. For example, the decimal key value 210485 is first treated as a base-13 number and converted to decimal:
$(210485)_{13} = 2 \times 13^5 + 1 \times 13^4 + 0 \times 13^3 + 4 \times 13^2 + 8 \times 13 + 5 = (771932)_{10}$
Several of its digits are then selected as the hash address. The two radices are usually required to be coprime, with the new radix larger than the original one.
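
The conversion can be checked with a short sketch. Reading the digits of 210485 in base 13, with 5 as the last digit, gives 771932. The function names and the rule of keeping the three lowest decimal digits are assumptions; the text only says that several digits are selected.

```python
def radix_conversion(key: int, new_radix: int = 13) -> int:
    """Read the decimal digits of `key` as a number written in `new_radix`."""
    value = 0
    for digit in str(key):
        value = value * new_radix + int(digit)  # Horner's rule
    return value


def radix_hash(key: int, new_radix: int = 13, width: int = 3) -> int:
    """Keep the lowest `width` decimal digits of the converted value (one possible selection)."""
    return radix_conversion(key, new_radix) % 10 ** width


print(radix_conversion(210485))  # 771932
print(radix_hash(210485))        # 932
```

The new radix must exceed every digit of the key, which holds for any radix above 9 when the keys are decimal.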

5. Random number method
Choose a (pseudo-)random function, here called random, and use its value at the key value as the hash address:
h(key) = random(key)
This method works well when the key values have different numbers of digits.
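
One possible reading of random(key) is a pseudo-random generator seeded with the key, so that the same key always yields the same address. The sketch below makes that assumption; the function name random_hash and the table_size parameter are hypothetical, and the concrete addresses depend on the generator used.

```python
import random


def random_hash(key: int, table_size: int) -> int:
    """Seed a generator with the key and draw one address in range(table_size)."""
    rng = random.Random(key)  # same key, same seed, same address
    return rng.randrange(table_size)


for key in (7, 1234, 987654321):  # keys with different numbers of digits
    print(key, random_hash(key, 101))
```

Seeding with the key is what makes the function usable for lookup: an unseeded random draw would place an element somewhere that a later search could not find again.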

II. Implementing a dynamic lookup table on an open hash table
The second major question in hashing is how to resolve collisions. The method depends on how the hash table itself is organized. By organization, hash tables usually come in two kinds: open hash tables and closed hash tables. Once the hash function and the form of the hash table are chosen, the collision resolution method follows, and a concrete implementation of the dynamic lookup table can be given. This section discusses the implementation on an open hash table.
An open hash table is organized as follows. Let the chosen hash function be h, with range (the set of hash addresses) 0..n-1. Set up an "address vector" of pointers hp[n], where each pointer hp[i] points to a singly linked list holding all data elements whose hash address is i, that is, all elements whose keys are synonyms with hash address i. Each such list is called a synonym sublist. The storage structure made up of the address vector and the synonym sublists its pointers refer to is called an open hash table.

In an open hash table, all data elements whose key values are synonyms live in the same synonym sublist, and the entries of the address vector are the head pointers of those sublists. This is how an open hash table resolves collisions; the method is also known as chaining (the "zipper method"). The hash function, the hash table, and the collision handling method are three closely related parts that together determine a specific hash storage structure for a dynamic lookup table. On this basis, the basic operations of the dynamic lookup table can be implemented.
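
The structure described above can be sketched as follows. The address vector is named hp as in the text; the class and method names, the integer keys, and the hash function key % n are assumptions made for illustration, not the book's own implementation.

```python
class Node:
    """One node of a synonym sublist (singly linked)."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class OpenHashTable:
    def __init__(self, n=7):
        self.n = n
        self.hp = [None] * n  # address vector: head pointers of the synonym sublists

    def _h(self, key):
        return key % self.n  # assumed hash function

    def search(self, key):
        node = self.hp[self._h(key)]
        while node is not None:  # compare only within one synonym sublist
            if node.key == key:
                return node.value
            node = node.next
        return None

    def insert(self, key, value):
        i = self._h(key)
        node = self.hp[i]
        while node is not None:
            if node.key == key:  # key already present: update in place
                node.value = value
                return
            node = node.next
        self.hp[i] = Node(key, value, self.hp[i])  # link new node at the head

    def delete(self, key):
        i = self._h(key)
        prev, node = None, self.hp[i]
        while node is not None:
            if node.key == key:
                if prev is None:
                    self.hp[i] = node.next
                else:
                    prev.next = node.next
                return True
            prev, node = node, node.next
        return False


table = OpenHashTable(7)
for k in (10, 17, 24):  # all three keys are synonyms: each has hash address 3
    table.insert(k, "item-%d" % k)
print(table.search(17))
print(table.delete(17), table.search(17), table.search(24))
```

Deleting 17 simply unlinks one node, and the other synonyms in the same sublist remain reachable.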


III. Comparison of open and closed hash tables
The difference between an open hash table and a closed hash table resembles the difference between a singly linked list and a sequential list. An open hash table stores synonyms in linked lists, does not suffer from clustering (accumulation), and makes the basic operations of a dynamic lookup table, namely search, insert, and delete, easy to implement. The extra pointer fields, however, increase storage overhead. A closed hash table needs no extra pointer fields, so its storage efficiency is higher, but it is prone to clustering and some basic operations are harder to implement. Because an empty location is the condition that ends an unsuccessful search, a deletion cannot simply empty the slot; doing so would cut off the search path along the subsequent hash address sequence. Deletion in a closed hash table can therefore only mark the node as deleted. The marked nodes are physically removed only later, when the table is reorganized as a whole.
The original motivation of hashing is to complete a lookup without any key comparisons. Because of synonyms (and clustering), this ideal is not fully achieved. In an open hash table, the given value must still be compared with the key value of each node in the synonym sublist; in a closed hash table, it must be compared with the key values of the nodes along the subsequent hash address sequence. The average search length of an open hash table is shorter because it does not produce clustering.
Finally, because the length of each synonym sublist in an open hash table is dynamic, the capacity of the table need not be fixed in advance (hence the name "open" hash table), whereas a closed hash table requires the capacity to be estimated beforehand. An open hash table is therefore better suited to situations where the capacity is hard to estimate in advance.
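
The deletion-by-marking scheme for a closed hash table can be sketched as below. This is a minimal illustration that assumes linear probing, integer keys, and the hash function key % capacity; the class name and the EMPTY and DELETED sentinels are hypothetical.

```python
EMPTY = object()    # slot never used: ends an unsuccessful search
DELETED = object()  # deletion mark: search must continue past it


class ClosedHashTable:
    def __init__(self, capacity=7):
        self.capacity = capacity  # must be estimated in advance
        self.slots = [EMPTY] * capacity

    def _probe(self, key):
        """Yield the hash address sequence for key (linear probing)."""
        start = key % self.capacity
        for step in range(self.capacity):
            yield (start + step) % self.capacity

    def search(self, key):
        """Return the slot index holding key, or None if it is absent."""
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                return None
            if slot is not DELETED and slot[0] == key:
                return i
        return None

    def insert(self, key, value):
        first_free = None
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                if first_free is None:
                    first_free = i
                break
            if slot is DELETED:
                if first_free is None:
                    first_free = i  # a marked slot may be reused
            elif slot[0] == key:
                self.slots[i] = (key, value)
                return i
        if first_free is None:
            raise OverflowError("hash table is full")
        self.slots[first_free] = (key, value)
        return first_free

    def delete(self, key):
        i = self.search(key)
        if i is None:
            return False
        self.slots[i] = DELETED  # mark only; do not reset to EMPTY
        return True


t = ClosedHashTable(7)
print([t.insert(k, str(k)) for k in (10, 17, 24)])  # synonyms occupy slots 3, 4, 5
t.delete(17)
print(t.search(24))  # still found, because slot 4 is marked rather than emptied
```

Had delete reset slot 4 to EMPTY, the search for 24 would stop at slot 4 and wrongly report the key as absent, which is the truncated search path the text warns about.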

It should be added that this book presents linear probing and quadratic probing as collision resolution methods for closed hash tables, whereas some textbooks call these two methods "open addressing". Take care not to confuse that "open" (meaning every hash address is open to all data elements) with the "open" in this book's "open hash table" (meaning the storage space of the hash table is open-ended).
