Summary of hash tables

Source: Internet
Author: User

The following is my summary of hash tables. Most of the conceptual material comes from Yan Weimin's Data Structure book, with some of my own experience added. The main purpose of this article is to make it easy for me to review in the future, so it reads a bit like an outline.

1. Definition: Based on a chosen hash function H(key) and a method for handling conflicts, a set of keywords is mapped onto a finite, contiguous address set (range), and the image of each keyword in that address set is used as the storage location of its record in the table. Such a table is called a hash table. The mapping process is called hashing, and the resulting storage location is called the hash address.
Objective: To establish a definite correspondence between a record's storage location and its keyword, so that searches are faster.
Case: The symbol table in compiler principles, and state lookup in a state machine.
Implementation: There are several possible implementations, depending on the actual situation. Storage and search can be handled separately. A single data structure is generally either fast to insert into and slow to search, or slow to insert into and fast to search. Combining data structures can give satisfactory insertion and search speeds, but usually at the cost of more memory. In addition, the lookup speed of a hash table is not guaranteed: it also depends on how the hash function is constructed and how conflicts are handled.
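To make the definition concrete, here is a minimal sketch in Python. The table length of 7, the hash function H(key) = key mod 7, and the sample records are assumptions chosen for illustration so that no two keys share an address; conflict handling is deliberately left out.

```python
# Minimal sketch: each record is stored at the address H(key), so a search
# computes the address directly instead of scanning the table.
# TABLE_SIZE, H, and the sample records are illustrative assumptions.
TABLE_SIZE = 7


def H(key):
    """Hash function: map an integer key to an address in 0..TABLE_SIZE-1."""
    return key % TABLE_SIZE


table = [None] * TABLE_SIZE

# These keys were picked so that no two of them share an address.
records = [(15, "alpha"), (23, "beta"), (40, "gamma")]
for key, value in records:
    table[H(key)] = (key, value)


def search(key):
    """Return the value stored for key, or None if its slot does not hold it."""
    slot = table[H(key)]
    if slot is not None and slot[0] == key:
        return slot[1]
    return None
```

Here search(23) looks only at address 2 and finds the record there, while search(16) computes the same address, sees a different key, and returns None.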
2. Construction methods for the hash function:
(1) Direct addressing method
(2) Digit analysis method
(3) Mid-square method
(4) Folding method
(5) Division-remainder method (most commonly used)
(6) Random number method
These methods can be used in combination. Note that some hash tables are not generated dynamically by the program but are built by hand; these are mostly seen in applications with fixed data. Hand-built hash tables are mostly constructed by direct addressing. For example, the extended function lookup in the CINT script engine uses this method.
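A rough sketch of items (1), (3), (4), and (5) from the list above, for non-negative integer keys. The function names, default parameters, and the choice of decimal digits are my own assumptions for illustration, not a fixed definition of each method.

```python
# Sketches of four construction methods for non-negative integer keys.
# All names and default parameters are illustrative assumptions.

def direct_address(key, a=1, b=0):
    """Direct addressing (1): H(key) = a * key + b, a linear function of the key."""
    return a * key + b


def mid_square(key, digits=2):
    """Mid-square (3): square the key and take `digits` digits from the middle."""
    squared = str(key * key)
    start = max((len(squared) - digits) // 2, 0)
    return int(squared[start:start + digits])


def folding(key, part_len=3):
    """Folding (4), shift variant: split the decimal digits into parts of
    part_len digits, add the parts, and drop any carry beyond part_len digits."""
    text = str(key)
    parts = [int(text[i:i + part_len]) for i in range(0, len(text), part_len)]
    return sum(parts) % (10 ** part_len)


def division_remainder(key, p):
    """Division-remainder (5): H(key) = key mod p, with p no larger than the
    table length. p is usually chosen to be a prime."""
    return key % p
```

For example, mid_square(1234) squares the key to 1522756 and returns the middle digits 22, and folding(123456789) adds 123 + 456 + 789 = 1368 and keeps 368. Combining methods is just function composition: division_remainder(folding(123456789), 13) gives 4.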
3. Factors to consider when choosing a hash function:
(1) Time required to compute the hash function
(2) Length of the keywords
(3) Size of the hash table
(4) Distribution of the keywords
(5) Search frequency of the records
4. Conflict handling methods
Conflicts can generally only be reduced and are difficult to avoid entirely, so they have to be handled. The usual methods are:
(1) Open addressing method
(2) Rehashing method
(3) Link address method (chaining)
(4) Establishing a common overflow area
In general, the probability of a conflict is related to the fill factor. Methods (3) and (4) resolve a conflict in a single step, while (1) and (2) may take several attempts. Some variants of (1) are essentially probing methods, so when there are many conflicts, (1) and (2) become inefficient. Method (4) is a rather passive approach that suits scenarios with few conflicts; if there are many conflicts, its search efficiency drops significantly. In short, the choice should be based on the actual application.
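The difference between methods (1) and (3) can be sketched as follows. This is a minimal illustration, not a complete implementation: the class names, the table length of 7, the hash function key % size, and the use of linear probing as the open addressing variant are all assumptions, and deletion is not handled.

```python
# Two conflict-handling sketches: open addressing with linear probing (1)
# and the link address method, also known as separate chaining (3).
# Class names, table length, and the hash function key % size are assumptions.

class LinearProbingTable:
    """Open addressing: on a conflict, probe (h + i) % size for i = 1, 2, ..."""

    def __init__(self, size=7):
        self.size = size
        self.slots = [None] * size  # each slot holds (key, value) or None

    def insert(self, key, value):
        """Store the record and return the address it ended up at."""
        h = key % self.size
        for i in range(self.size):
            addr = (h + i) % self.size
            if self.slots[addr] is None or self.slots[addr][0] == key:
                self.slots[addr] = (key, value)
                return addr
        raise OverflowError("table is full")

    def search(self, key):
        h = key % self.size
        for i in range(self.size):
            slot = self.slots[(h + i) % self.size]
            if slot is None:  # an empty slot ends the probe sequence
                return None
            if slot[0] == key:
                return slot[1]
        return None


class ChainedTable:
    """Link address method: each address holds a list of colliding records."""

    def __init__(self, size=7):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def insert(self, key, value):
        bucket = self.buckets[key % self.size]
        for i, (k, _) in enumerate(bucket):
            if k == key:  # key already present: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def search(self, key):
        for k, v in self.buckets[key % self.size]:
            if k == key:
                return v
        return None
```

With a table length of 7, the keys 10, 17, and 24 all hash to address 3. Linear probing places them at addresses 3, 4, and 5, so finding 24 takes three probes, whereas chaining keeps all three in the list at address 3 and handles each conflict in one step.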
5. Several Concepts
(1) Uniform hash function: A hash function is called uniform if it maps any keyword in the keyword set to any address in the address set with equal probability.
(2) Fill factor: α = (number of records filled in the table) / (length of the hash table). Controlling the fill factor reduces conflicts.
(3) Conflict: K1 != K2, but H(K1) = H(K2)
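A small sketch of concepts (2) and (3): it computes the fill factor and counts conflicts for the same keys at two table lengths. The sample keys, the table lengths, and the hash function key % m are assumptions chosen for illustration.

```python
# Fill factor and conflict counting for distinct integer keys.
# The sample keys, table lengths, and hash function key % m are assumptions.

def fill_factor(record_count, table_length):
    """Fill factor = (number of records in the table) / (length of the table)."""
    return record_count / table_length


def count_conflicts(keys, m):
    """Count keys whose address key % m is already taken by an earlier key."""
    used = set()
    conflicts = 0
    for key in keys:
        addr = key % m
        if addr in used:
            conflicts += 1
        else:
            used.add(addr)
    return conflicts


keys = [3, 10, 17, 25, 32, 41]
```

For this particular sample, the six keys produce three conflicts at table length 7 (fill factor about 0.86) and none at table length 13 (fill factor about 0.46). This is one illustration rather than a guarantee: a lower fill factor only makes conflicts less likely.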

 

 

I will add an example when I have time.
