Hash Table Summary

Last Update: 2015-03-12  Source: Internet

Author: User


A hash table, also known as a hash list, is a special data structure. Its key difference from arrays, linked lists, binary search trees and similar structures is that it can locate a record directly, instead of comparing the search key against the keys of the stored records one by one. This follows from how a hash table is designed: a mapping function ties each record's storage location to its key, which makes lookups very fast.

Design idea of the hash table

Consider a general linear list, such as a linked list, used to store the following contact information:

Zhang San 13980593357
Li Si 15828662334
Wang Wu 13409821234
Zhang Shuai 13890583472

You might design a struct that holds a name and a mobile number, then store the four contacts in a linked list. To find the record "Li Si 15828662334", or just to get Li Si's mobile number, you would traverse the list from the head node and compare each node's name with "Li Si" until the search succeeds or fails. This takes O(n) time. Even a balanced binary search tree still needs O(log n) comparisons. If the key "Li Si" could be mapped directly to the record's storage location in the table, the intermediate key comparisons would disappear and the cost would drop to O(1). A hash table achieves exactly this.
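To make the O(n) cost concrete, here is a minimal Python sketch of that head-to-tail scan. It assumes a plain Python list stands in for the linked list, and the name `linear_find` is hypothetical. It returns the phone number together with the number of name comparisons made, which grows with the record's position in the list.

```python
# A plain list of (name, phone) pairs stands in for the linked list.
contacts = [
    ("Zhang San", "13980593357"),
    ("Li Si", "15828662334"),
    ("Wang Wu", "13409821234"),
    ("Zhang Shuai", "13890583472"),
]

def linear_find(records, name):
    """Scan from the head, comparing names one by one.

    Returns (phone or None, number of name comparisons made).
    """
    comparisons = 0
    for record_name, phone in records:
        comparisons += 1
        if record_name == name:
            return phone, comparisons
    return None, comparisons

phone, cost = linear_find(contacts, "Li Si")
print(phone, cost)  # 15828662334 2
```

A failed search always compares against every node, so looking up a name that is not stored costs 4 comparisons here.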

A hash table uses a mapping function f: key -> address to map each key to the location where its record is stored in the table. To find a record, you compute its location directly from the key and this mapping. The mapping is usually called the hash function, and the location it computes from a key is called the hash address (note that this is a position within the table, not a physical memory address). For example, if the contacts are stored in a hash table, the hash address of "Li Si" can be computed directly from the name and the hash function. The following sections discuss several key issues in hash table design.

1. Design of the hash function

The design of the hash function directly affects how efficiently the hash table operates. Consider the following example.

Suppose the contacts above are stored using this hash function: the sum of the ASCII codes of the pinyin initials of the characters in the name.

Then address(Zhang San) = ascii(Z) + ascii(S) = 90 + 83 = 173;

address(Li Si) = ascii(L) + ascii(S) = 76 + 83 = 159;

address(Wang Wu) = ascii(W) + ascii(W) = 87 + 87 = 174;

address(Zhang Shuai) = ascii(Z) + ascii(S) = 90 + 83 = 173.
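This example hash function can be sketched in Python as follows, assuming each name is written as space-separated pinyin syllables; `initials_hash` is a hypothetical name. Grouping the names by address makes the conflict visible.

```python
from collections import defaultdict

def initials_hash(name):
    """Sum of the ASCII codes of the first letter of each pinyin syllable."""
    return sum(ord(syllable[0].upper()) for syllable in name.split())

names = ["Zhang San", "Li Si", "Wang Wu", "Zhang Shuai"]

# Group names by hash address; an address holding more than one name is a conflict.
by_address = defaultdict(list)
for name in names:
    by_address[initials_hash(name)].append(name)

conflicts = {addr: group for addr, group in by_address.items() if len(group) > 1}
print(conflicts)  # {173: ['Zhang San', 'Zhang Shuai']}
```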

If only these four contacts need to be stored, this hash function is poorly designed. First, it wastes a lot of space: if the contacts are stored in a char array with 12 bytes per entry, at least 174 × 12 bytes must be allocated, yet only 4 of the 174 slots are used, a space utilization of about 2.3%. Second, the function gives Zhang San and Zhang Shuai the same address, 173. Two keys mapping to the same address is called a conflict (collision), and a function that already produces a conflict when storing only 4 records in 174 slots is clearly unreasonable. When constructing a hash function, you should therefore take the distribution of the keys into account so that the hash addresses are spread randomly and evenly across the whole address space. Common ways to construct a hash function include:

1) Direct addressing method

Use the key itself, or a linear function of the key, as the hash address: address(key) = a*key + b. For example, if student numbers start at 2000 and go up to 4000, address(key) = key - 2000 can serve as the hash function.
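A minimal Python sketch of direct addressing, using the student-number example. The function name `direct_address` and the table layout are hypothetical; `a` and `b` default to the values that give key - 2000.

```python
def direct_address(key, a=1, b=-2000):
    """Direct addressing: address(key) = a*key + b (defaults give key - 2000)."""
    return a * key + b

# Student numbers 2000..4000 fit exactly into slots 0..2000, one slot per number.
table = [None] * 2001
table[direct_address(2345)] = "student 2345"
print(direct_address(2000), direct_address(4000))  # 0 2000
```

No two keys can share a slot, but the table has to span the whole key range, so the method only pays off when the keys are dense.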

2) Mid-square method

Square the key and take the middle digits of the result as the hash address. For the key sequence {421, 423, 436}, the squares are {177241, 178929, 190096}, and taking the middle two digits gives the hash addresses {72, 89, 00}.
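A minimal Python sketch of the mid-square method; `mid_square` is a hypothetical name. When the requested digits cannot sit exactly in the centre, this sketch leans one position to the left, which is our assumption rather than a rule from the text.

```python
def mid_square(key, digits=2):
    """Mid-square hash: square the key and keep `digits` digits from the middle."""
    squared = str(key * key)
    start = (len(squared) - digits) // 2  # leans left when the middle is not centred
    return int(squared[start:start + digits])

print([mid_square(k) for k in (421, 423, 436)])  # [72, 89, 0]
```

The address written as "00" in the text is simply address 0 once it is converted to an integer.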

3) Folding method

Split the key into several parts, then combine the parts in a specific way to obtain the hash address. For example, given the book number (ISBN) 8903-241-23, you can split its digits into two-digit groups and add them: address(key) = 89 + 03 + 24 + 12 + 3 = 131.
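A minimal Python sketch of this shift-folding variant, assuming the pieces are cut from the left in groups of two digits and separators such as '-' are ignored; `fold_shift` is a hypothetical name.

```python
def fold_shift(digits, width=2):
    """Shift folding: cut the digit string into `width`-digit pieces from the
    left and add the pieces up. Non-digit characters (e.g. '-') are ignored."""
    clean = "".join(ch for ch in digits if ch.isdigit())
    pieces = [clean[i:i + width] for i in range(0, len(clean), width)]
    return sum(int(p) for p in pieces)

print(fold_shift("8903-241-23"))  # 131
```

In a real table the sum would usually be reduced further, for example modulo the table length, so that it falls inside the address range.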

4) Division-remainder method

If the maximum length of the hash table is m, choose the largest prime p not greater than m and take the remainder of the key modulo p: address(key) = key % p.

The choice of p is critical: a good p minimizes conflicts, which is why p is usually taken to be the largest prime not greater than m.
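A minimal Python sketch of the division-remainder method. Both helper names are hypothetical, and finding the prime by trial division is just one simple choice that works for modest table sizes.

```python
import math

def largest_prime_at_most(m):
    """Largest prime p <= m, found by trial division (assumes m >= 2)."""
    for p in range(m, 1, -1):
        if all(p % d != 0 for d in range(2, math.isqrt(p) + 1)):
            return p
    raise ValueError("m must be at least 2")

def division_hash(key, m):
    """Division-remainder hash: address(key) = key % p, p = largest prime <= m."""
    return key % largest_prime_at_most(m)

print(largest_prime_at_most(14))  # 13
print(division_hash(23, 14))      # 10
```

A real table would compute p once, when it is created, rather than on every call.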

2. Determination of the hash table size

The size of the hash table also matters. If the table is much larger than the number of records eventually stored, a lot of space is wasted; if it is too small, conflicts become frequent. In practice, the size is usually chosen from the number of records to be stored and the distribution of the keys. When the final number of records is not known in advance, the table's capacity has to be maintained dynamically, and resizing may require recomputing the hash address of every stored record.
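Resizing means rehashing, because a key's address depends on the table size. A minimal Python sketch, assuming separate chaining with address(key) = key % size; `rehash` is a hypothetical helper, and when to trigger it (for instance, once the load factor passes some threshold) is left open.

```python
def rehash(old_buckets, new_size):
    """Move every key of a separate-chaining table into `new_size` fresh buckets.

    The address key % size depends on the table size, so keys cannot simply be
    copied to the same index; each one is hashed again.
    """
    new_buckets = [[] for _ in range(new_size)]
    for bucket in old_buckets:
        for key in bucket:
            new_buckets[key % new_size].append(key)
    return new_buckets

small = [[] for _ in range(5)]
for key in (3, 8, 13):
    small[key % 5].append(key)  # all three collide at address 3

grown = rehash(small, 11)
print(small[3], grown[3], grown[8], grown[2])  # [3, 8, 13] [3] [8] [13]
```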

3. Resolution of conflicts

A conflict occurred in the example above, so a way of resolving it is needed; otherwise the records cannot be stored correctly. There are two common approaches:

1) Open addressing method

When a key conflicts with another key, a probing technique generates a probe sequence in the hash table, and the sequence is followed until an empty slot is found, where the key is inserted. The most common technique is linear probing. For example, take the keys {12, 13, 25, 23, 38, 34, 6, 84, 91}, a table length of 14, and the hash function address(key) = key % 11. The keys 12, 13 and 25 go directly into addresses 1, 2 and 3. When 23 is inserted, its address 1 is already occupied, so probing continues onward from address 1 (the probe step can be chosen to suit the situation; here it is 1) until address 4 is found empty, and 23 is inserted there.
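A minimal Python sketch of linear-probing insertion for this example. The name `linear_probe_insert` is hypothetical, and the text does not say what happens at the end of the table, so wrapping around the full table length of 14 is our assumption.

```python
def linear_probe_insert(table, key, mod=11):
    """Insert `key` with open addressing and linear probing (step 1).

    The home address is key % mod; on a conflict we move one slot at a time,
    wrapping around the whole table, until an empty slot (None) is found.
    Returns the slot index used.
    """
    size = len(table)
    home = key % mod
    for step in range(size):
        slot = (home + step) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")

table = [None] * 14
for k in [12, 13, 25, 23, 38, 34, 6, 84, 91]:
    linear_probe_insert(table, k)
print(table)
# [None, 12, 13, 25, 23, 38, 34, 6, 84, 91, None, None, None, None]
```

Notice how 34 (home address 1) ends up at address 6 and 91 (home address 3) at address 9: keys pile up into one long occupied run, which is why lookups slow down as the table fills.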

2) Chain address method (separate chaining)

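This approach combines an array with linked lists: records with the same hash address are stored in one linear list, and the array index of each list's head is the hash address. For the keys in the example above, with each new key appended to the tail of its list, the chain address method gives:

address 1: 12 -> 23 -> 34
address 2: 13
address 3: 25 -> 91
address 5: 38
address 6: 6
address 7: 84

All other addresses hold empty lists.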
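A minimal Python sketch of separate chaining for the same keys, assuming 11 buckets (one per address that key % 11 can produce) and Python lists in place of linked lists; `chain_insert` and `chain_find` are hypothetical names.

```python
def chain_insert(buckets, key, mod=11):
    """Append key to the list at its hash address key % mod."""
    buckets[key % mod].append(key)

def chain_find(buckets, key, mod=11):
    """Only the one list at the key's hash address needs to be searched."""
    return key in buckets[key % mod]

buckets = [[] for _ in range(11)]  # one list per address 0..10
for k in [12, 13, 25, 23, 38, 34, 6, 84, 91]:
    chain_insert(buckets, k)

for address, chain in enumerate(buckets):
    if chain:
        print(address, " -> ".join(str(k) for k in chain))
# 1 12 -> 23 -> 34
# 2 13
# 3 25 -> 91
# 5 38
# 6 6
# 7 84
```

Unlike linear probing, a conflict here never spills into a neighbouring address; it only lengthens one list.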

A good hash function reduces conflicts but cannot eliminate them entirely, so a conflict-resolution strategy must be chosen to suit the actual situation.

4. Average lookup length of a hash table

The average lookup length of a hash table has two parts: the average lookup length of a successful search and the average lookup length of an unsuccessful search.

Average lookup length of a successful search = (sum, over all elements in the table, of the number of comparisons needed to find each one) / (number of elements in the table).

The average lookup length of an unsuccessful search is the average number of comparisons needed to establish that a key is not in the table. One way to think of it: a key being inserted could hash to any position, so for each position count the comparisons made before reaching an empty slot (including the comparison with the empty slot itself), add up these counts, and divide by the length of the table.

Here's an example:

Take the keys {23, 12, 14, 2, 3, 5}, a table length of 14, the hash function key % 11, and linear probing for conflicts. The keys are stored in the table as follows:

Address        0   1   2   3   4   5   6   7   8   9   10  11  12  13
Keyword            23  12  14  2   3   5
Comparisons        1   2   1   3   3   2

So the average lookup length of a successful search is (1 + 2 + 1 + 3 + 3 + 2) / 6 = 12/6 = 2;

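The average lookup length of an unsuccessful search is (1 + 7 + 6 + 5 + 4 + 3 + 2 + 1 + 1 + 1 + 1 + 1 + 1 + 1) / 14 = 35/14 = 2.5. (Many textbooks count only the 11 addresses that key % 11 can actually produce, which gives (1 + 7 + 6 + 5 + 4 + 3 + 2 + 1 + 1 + 1 + 1) / 11 = 32/11.)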
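The arithmetic can be checked with a minimal Python sketch, assuming linear probing with step 1 that wraps around the 14-slot table; `build_table` and `failed_search_cost` are hypothetical names. As in the text, an unsuccessful search counts the final comparison with the empty slot, and the sum runs over all 14 positions.

```python
def build_table(keys, size=14, mod=11):
    """Insert keys with linear probing; also record how many comparisons a
    later successful search for each key will need."""
    table = [None] * size
    cost = {}
    for key in keys:
        home = key % mod
        for probes in range(1, size + 1):
            slot = (home + probes - 1) % size
            if table[slot] is None:
                break
        else:
            raise OverflowError("hash table is full")
        table[slot] = key
        cost[key] = probes
    return table, cost

def failed_search_cost(table, start):
    """Comparisons for an unsuccessful search that starts at `start`,
    counting the final comparison with the empty slot."""
    size = len(table)
    for probes in range(1, size + 1):
        if table[(start + probes - 1) % size] is None:
            return probes
    return size  # no empty slot: every position was compared

keys = [23, 12, 14, 2, 3, 5]
table, cost = build_table(keys)

asl_success = sum(cost.values()) / len(keys)
fail = [failed_search_cost(table, s) for s in range(len(table))]
asl_fail = sum(fail) / len(table)

print(asl_success, sum(fail), asl_fail)  # 2.0 35 2.5
```

Replacing `len(table)` with 11 and summing only `fail[:11]` gives the 32/11 variant instead.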

This leads to the concept of the load factor: α = (number of records in the table) / (length of the hash table). A small load factor means the table has many empty slots, so conflicts are unlikely; a large load factor makes conflicts more likely and lookups slower. The average lookup length of a hash table therefore depends on its load factor. A load factor of about 0.5 is a common rule of thumb that balances wasted space against lookup cost, so 0.5 is often used as the empirical value.
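For linear probing in particular, the classical approximations (due to Knuth, assuming uniformly distributed hash values and a large table) make this dependence on α explicit. Here S is the expected number of probes for a successful search and U for an unsuccessful one; they are estimates, not exact counts for any particular table:

```latex
\begin{aligned}
S(\alpha) &\approx \tfrac{1}{2}\left(1 + \frac{1}{1-\alpha}\right) \\
U(\alpha) &\approx \tfrac{1}{2}\left(1 + \frac{1}{(1-\alpha)^{2}}\right) \\
\alpha = 0.5 &:\quad S \approx 1.5,\quad U \approx 2.5 \\
\alpha = 0.9 &:\quad S \approx 5.5,\quad U \approx 50.5
\end{aligned}
```

The steep growth of U as α approaches 1 is one reason a load factor around 0.5 is popular for open addressing.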

5. Advantages and disadvantages of hash tables

The advantages of a hash table are clear: lookups take constant time on average, and inserting and deleting data is also easy. It has drawbacks too: it does not keep records in sorted order, it generally needs more space than storing the same records in a linear list, and the keys of the records must be unique.

Code implementation:
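As one possible implementation, here is a minimal Python sketch that combines the division-remainder method with the chain address method. The class name `ChainedHashTable`, the growth threshold `MAX_LOAD = 0.75` and the growth rule (2n + 1 buckets) are hypothetical choices, not taken from the text; keys are assumed to be integers, as in the examples.

```python
class ChainedHashTable:
    """Hash table sketch: address(key) = key % bucket_count, separate chaining."""

    MAX_LOAD = 0.75  # hypothetical: grow once records / buckets exceeds this

    def __init__(self, capacity=11):
        self._buckets = [[] for _ in range(capacity)]
        self._count = 0

    def _address(self, key):
        return key % len(self._buckets)

    def put(self, key, value):
        bucket = self._buckets[self._address(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:  # keys are unique: overwrite the old value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self._count += 1
        if self.load_factor() > self.MAX_LOAD:
            # 2n + 1 keeps the size odd, but it is not always prime (47 -> 95).
            self._resize(2 * len(self._buckets) + 1)

    def get(self, key, default=None):
        for k, v in self._buckets[self._address(key)]:
            if k == key:
                return v
        return default

    def remove(self, key):
        bucket = self._buckets[self._address(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self._count -= 1
                return True
        return False

    def load_factor(self):
        return self._count / len(self._buckets)

    def __len__(self):
        return self._count

    def _resize(self, new_capacity):
        # Every address depends on the bucket count, so all keys are rehashed.
        old = self._buckets
        self._buckets = [[] for _ in range(new_capacity)]
        for bucket in old:
            for key, value in bucket:
                self._buckets[self._address(key)].append((key, value))


table = ChainedHashTable()
for k in [12, 13, 25, 23, 38, 34, 6, 84, 91]:
    table.put(k, "record %d" % k)

print(len(table), table.get(23), table.get(99))  # 9 record 23 None
table.remove(23)
print(table.get(23), round(table.load_factor(), 3))  # None 0.348
```

The ninth insertion pushes the load factor to 9/11, above the threshold, so the table grows from 11 to 23 buckets and rehashes every key.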



This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
