Hashing algorithm: The principle of fast table lookup

Source: Internet
Author: User

In practice, we often need to look up data by a given value: finding a person's number in a telephone directory, locating a book in a library by its ISBN, or finding a place name on a map from its coordinates.

Definition of a dictionary

We have all used dictionaries, such as an English-Chinese dictionary or an idiom dictionary. A book catalogue or a telephone book can also be regarded as a dictionary in the broader sense. In computer science, a dictionary is also treated as a data structure.

We define a dictionary as a set of "key-value pairs". Depending on the problem, the key and the value take on different meanings. For example, in an English-Chinese dictionary, the English word is the key and the Chinese explanation of that word is the value.

The most basic operations on a dictionary are find, insert, and delete, which respectively retrieve data from the dictionary, add data to it, and remove data from it. In actual storage, each "key-value pair" is stored in a record and identified by its key. The correspondence between a key and the location of its "key-value pair" is represented by a two-tuple: (key, location of the key-value pair).
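The three basic operations can be tried out with Java's standard `java.util.Map` interface. This is only a sketch: the class name `DictionaryOpsDemo` and the telephone-book entries are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class DictionaryOpsDemo {
    // Builds a small telephone book (hypothetical sample data)
    // using the insert and delete operations.
    static Map<String, String> buildSample() {
        Map<String, String> phoneBook = new HashMap<>();
        phoneBook.put("Alice", "555-0100");   // insert
        phoneBook.put("Bob", "555-0101");     // insert
        phoneBook.put("Carol", "555-0102");   // insert
        phoneBook.remove("Carol");            // delete
        return phoneBook;
    }

    public static void main(String[] args) {
        Map<String, String> phoneBook = buildSample();
        // find: get() returns the value for a key, or null if the key is absent
        System.out.println(phoneBook.get("Alice"));
        System.out.println(phoneBook.get("Carol"));
    }
}
```

Here the key is the person's name and the value is the telephone number; looking up a deleted key returns `null`.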

The simplest way to look up a key-value pair is to store the pairs in an array and traverse the array on every search; the pair is found when the traversal reaches an entry whose key equals the key being searched for.

This simple approach cannot meet practical requirements, because a search may have to examine every entry. So people invented a far more efficient way to organize and retrieve dictionary data: the hash table.
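A minimal sketch of this array traversal follows. The `Entry` record type, the `find` helper, and the sample data are assumptions made for the example.

```java
class LinearSearchDemo {
    // One key-value pair stored as a record.
    static final class Entry {
        final String key;
        final String value;

        Entry(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Walks the array from the start and compares each stored key with the
    // searched key. Returns the matching value, or null if no key matches.
    static String find(Entry[] entries, String key) {
        for (Entry e : entries) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Entry[] phoneBook = {
            new Entry("Alice", "555-0100"),
            new Entry("Bob", "555-0101"),
            new Entry("Carol", "555-0102"),
        };
        System.out.println(find(phoneBook, "Carol")); // found after three comparisons
        System.out.println(find(phoneBook, "Dave"));  // not found: every entry was compared
    }
}
```

In the worst case every entry is compared, so the cost of one search grows in proportion to the number of stored pairs.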

Hash table and Hash method

Hashing method: establish a functional relationship hash() between a key and the storage location of its key-value pair, so that each key corresponds to exactly one storage location in the structure:

Storage location = hash(key)

When searching, first apply the hash function to the key and treat the result as the storage location of the "key-value pair". Then fetch the "key-value pair" at that location in the structure and compare keys; if the keys are equal, the search succeeds.

When storing a "key-value pair", the same hash function is used to calculate the storage location, and the pair is stored there. This approach is called the hashing method, and the conversion function hash() it uses is called a hash function.

The table constructed this way is called a hash table.

The hash function establishes a mapping from the set of keys to the set of hash table addresses. With the hash function, we can determine the address of a "key-value pair" in the hash table from its key alone. Because this method does not require many key comparisons, searching is very fast, and many systems use it to organize and retrieve data.
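The store and search steps can be sketched with a fixed-size array. The class name `SimpleHashTable`, the table size of 73, and the remainder-based hash() are assumptions chosen for illustration. The sketch deliberately does no conflict handling: if a different key already occupies the computed location, `put` simply reports failure.

```java
class SimpleHashTable {
    private static final int SIZE = 73;              // assumed number of storage locations
    private final int[] keys = new int[SIZE];
    private final String[] values = new String[SIZE];
    private final boolean[] used = new boolean[SIZE];

    // Storage location = hash(key). floorMod keeps the result in 0..SIZE-1,
    // even for negative keys.
    static int hash(int key) {
        return Math.floorMod(key, SIZE);
    }

    // Stores the pair at its hashed location. Returns false if a different
    // key already occupies that location (no conflict handling here).
    boolean put(int key, String value) {
        int slot = hash(key);
        if (used[slot] && keys[slot] != key) {
            return false;
        }
        keys[slot] = key;
        values[slot] = value;
        used[slot] = true;
        return true;
    }

    // Hashes the key, fetches the pair at that location, and compares keys.
    // The search succeeds only if the stored key equals the searched key.
    String get(int key) {
        int slot = hash(key);
        return (used[slot] && keys[slot] == key) ? values[slot] : null;
    }
}
```

A lookup costs one hash computation and one key comparison, regardless of how many pairs the table holds.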

Conflict and conflict resolution

The range of possible keys is usually much larger than the set of hash table addresses, so the hash function may map different keys to the same address. This is called a conflict (also known as a collision). For example, suppose there is a set of "key-value pairs" whose keys are 12361, 7251, 3309, and 30976, and the hash function used is:

public static int hash(int key) {
    return key % 73 + 13420;
}

This gives hash(12361) = hash(7251) = hash(3309) = hash(30976) = 13444; that is, the hash function maps different keys to the same address. Keys that produce the same hash result are called synonyms.

If a "key-value pair" runs into a conflict when it is added to the hash table, another place must be found to store it. Too many conflicts reduce the efficiency of insertion and search, so we want a function that is not prone to conflicts, that is, a hash function whose addresses are distributed as uniformly as possible.
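The result can be checked by running the function on the four keys. Each key leaves a remainder of 24 when divided by 73, and 24 + 13420 = 13444. The wrapper class `CollisionDemo` is added only to make the snippet runnable.

```java
class CollisionDemo {
    // The hash function from the text.
    public static int hash(int key) {
        return key % 73 + 13420;
    }

    public static void main(String[] args) {
        int[] keys = {12361, 7251, 3309, 30976};
        for (int key : keys) {
            // Every line prints the same address, 13444.
            System.out.println("hash(" + key + ") = " + hash(key));
        }
    }
}
```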

Commonly used hash functions include the direct addressing method, the digit analysis method, the division-remainder method, the multiplication method, the mid-square method, and the folding method. Choose the appropriate method according to the characteristics of the keys in the actual application.

Although a suitable hash function reduces the probability of conflict, conflicts remain unavoidable. The most common way to handle them is the "bucket" algorithm: if the hash table has M addresses, it is organized as M "buckets", with bucket numbers corresponding one-to-one to hash addresses, and each bucket stores the keys that are synonyms of one another. That is, if two different keys yield the same hash address, they are placed in the same bucket and searched sequentially within that bucket.
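Three of these methods are easy to sketch for integer keys. The versions below are one common textbook form of each, not the only one; the function names, the choice of the middle four decimal digits, and the three-digit folding groups are assumptions made for the example.

```java
class HashFunctionSketches {
    // Division-remainder method: the address is the remainder of key divided by m.
    // floorMod keeps the result in 0..m-1 even for negative keys.
    static int divisionRemainder(int key, int m) {
        return Math.floorMod(key, m);
    }

    // Mid-square method: square the key and take digits from the middle of the
    // square. This variant drops the last two decimal digits and keeps the next four.
    static int midSquare(int key) {
        long square = (long) key * key;   // long avoids int overflow
        return (int) ((square / 100) % 10000);
    }

    // Folding method: split the decimal digits into groups of three (from the
    // right), add the groups, and reduce the sum to an address in 0..m-1.
    static int folding(int key, int m) {
        long k = Math.abs((long) key);    // widen first so abs cannot overflow
        long sum = 0;
        while (k > 0) {
            sum += k % 1000;
            k /= 1000;
        }
        return (int) (sum % m);
    }

    public static void main(String[] args) {
        System.out.println(divisionRemainder(12361, 73));  // 12361 = 73 * 169 + 24
        System.out.println(midSquare(1234));               // 1234^2 = 1522756, middle digits 5227
        System.out.println(folding(123456789, 1000));      // 789 + 456 + 123 = 1368, 1368 % 1000 = 368
    }
}
```

Which method spreads addresses most evenly depends on how the keys themselves are distributed, which is why the choice is made per application.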
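A minimal sketch of the bucket approach, assuming integer keys, a remainder-based hash function, and one linked list per bucket. The class name `BucketHashTable` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class BucketHashTable {
    // One key-value pair stored inside a bucket.
    private static final class Entry {
        final int key;
        String value;

        Entry(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    private final int m;                                   // number of buckets (M)
    private final List<LinkedList<Entry>> buckets = new ArrayList<>();

    BucketHashTable(int m) {
        this.m = m;
        for (int i = 0; i < m; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Bucket number = hash address, in 0..m-1.
    private int hash(int key) {
        return Math.floorMod(key, m);
    }

    // Adds the pair to its bucket, or updates the value if the key is already there.
    void insert(int key, String value) {
        LinkedList<Entry> bucket = buckets.get(hash(key));
        for (Entry e : bucket) {
            if (e.key == key) {
                e.value = value;
                return;
            }
        }
        bucket.add(new Entry(key, value));
    }

    // Searches sequentially inside the one bucket the key hashes to.
    String find(int key) {
        for (Entry e : buckets.get(hash(key))) {
            if (e.key == key) {
                return e.value;
            }
        }
        return null;
    }

    // Removes the pair with this key; returns true if something was removed.
    boolean delete(int key) {
        return buckets.get(hash(key)).removeIf(e -> e.key == key);
    }
}
```

With 73 buckets, the synonym keys 12361, 7251, 3309, and 30976 all land in bucket 24, yet each can still be found because the search compares keys within that bucket.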
