"Python algorithm" hash storage, hash table, hash list principle

Source: Internet
Author: User

Definition of hash table:

The basic idea of hash storage is to treat the key of a data element as the independent variable of a function (the hashing function, or hash function), compute the corresponding function value (the hash address), and store the data element in the storage unit at that address.

To look up an element, apply the same function to the key being searched for to compute its hash address, then go directly to that storage unit to fetch the data element.
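As a rough illustration of the idea above, the sketch below uses the hash address directly as a list index. The table size, the function names, and the choice of key % TABLE_SIZE as the hash function are assumptions made for this example, and collisions are deliberately ignored here.

```python
# Minimal sketch: the hash address of a key is used directly as a list index.
TABLE_SIZE = 11  # assumed table size for this illustration

def hash_address(key: int) -> int:
    """Map an integer key to a slot index (assumed hash function)."""
    return key % TABLE_SIZE

table = [None] * TABLE_SIZE

def store(key: int, value) -> None:
    """Deposit the element at its hash address (collisions are ignored)."""
    table[hash_address(key)] = (key, value)

def fetch(key: int):
    """Recompute the address with the same function and read that slot."""
    entry = table[hash_address(key)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None

store(25, "alice")
print(hash_address(25))  # 3, because 25 % 11 == 3
print(fetch(25))         # alice
```

Storing and fetching both cost one hash computation plus one index operation, which is where the O(1) average lookup time comes from.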

Applications of hash tables:

A hash table is an effective data structure for implementing dictionary operations.

In the worst case, finding an element in a hash table takes as long as finding it in a list: O(n).

In practice, however, hash lookups perform very well. Under some reasonable assumptions, the average time to find an element in a hash table is O(1).

Steps to build a hash table:

1) Step 1: Take the key of the data element and compute its hash function value (address). If the storage unit at that address is not yet occupied, store the element there; otherwise go to Step 2 to resolve the conflict.

2) Step 2: Compute the next storage address for the key according to the chosen conflict handling method. If that address is also occupied, repeat Step 2 until a free storage address is found.
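The two steps above can be sketched as a small build routine. The names build_table and step_to_neighbor are hypothetical, key % m is an assumed hash function, and the conflict handling method is passed in as a parameter because the steps do not fix one.

```python
from typing import Callable, List, Optional

def build_table(keys: List[int], m: int,
                next_address: Callable[[int, int], int]) -> List[Optional[int]]:
    """Build a table of m slots, resolving conflicts with next_address."""
    table: List[Optional[int]] = [None] * m
    for key in keys:
        addr = key % m                    # Step 1: compute the hash address
        attempts = 0
        while table[addr] is not None:    # Step 2: slot occupied, try the next one
            attempts += 1
            if attempts >= m:
                raise OverflowError("no free slot found")
            addr = next_address(addr, m)
        table[addr] = key
    return table

def step_to_neighbor(addr: int, m: int) -> int:
    """Assumed conflict handler: move to the adjacent slot, wrapping around."""
    return (addr + 1) % m

print(build_table([10, 17, 3], 7, step_to_neighbor))
# [None, None, None, 10, 17, 3, None]
```

All three keys hash to address 3, so the second and third keys are pushed to addresses 4 and 5 by repeated applications of Step 2.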

Common hash functions:

There are many ways to construct a hash function. The general principle is to map the set of keys onto the set of addresses as evenly as possible, minimizing the probability of collisions.

1. Division-remainder method

H(key) = key % p (p ≤ m)

The remainder of the key divided by p is used as the hash address. It is best to choose p as the largest prime less than or equal to m (the number of hash addresses, i.e. the table length).

Hash table length    8   16   32   64   128   256   512
Largest prime        7   13   31   61   127   251   509
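The following sketch shows one way to pick p and apply H(key) = key % p. The helper names are hypothetical, and trial division is used only because the table lengths here are small.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small table lengths."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def largest_prime_at_most(m: int) -> int:
    """Return the largest prime p with p <= m."""
    for p in range(m, 1, -1):
        if is_prime(p):
            return p
    raise ValueError("m must be at least 2")

def division_hash(key: int, m: int) -> int:
    """H(key) = key % p, with p the largest prime not exceeding m."""
    return key % largest_prime_at_most(m)

print(largest_prime_at_most(16))  # 13
print(division_hash(100, 16))     # 9, because 100 % 13 == 9
```

In a real table p would be computed once when the table is created rather than on every call.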

2. Direct address method

H(key) = a * key + b, where a and b are constants.
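A small sketch of the direct address method follows. The scenario (counting records per year for the years 2000 to 2009, with a = 1 and b = -2000) is a made-up example chosen so that the keys map one-to-one onto slots 0 to 9.

```python
def direct_address(key: int, a: int = 1, b: int = 0) -> int:
    """H(key) = a * key + b, a linear function of the key."""
    return a * key + b

# Hypothetical use: years 2000..2009 map to slots 0..9 with a = 1, b = -2000.
counts = [0] * 10
for year in [2003, 2007, 2003]:
    counts[direct_address(year, 1, -2000)] += 1

print(counts)  # [0, 0, 0, 2, 0, 0, 0, 1, 0, 0]
```

Because the mapping is one-to-one, no two different keys collide, but the table must be as large as the whole key range.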

3. Digit analysis method

For example, take a group of keys key1 = 112233, key2 = 112633, key3 = 119033.

Analyzing these keys shows that only the two middle digits (the third and fourth) vary from key to key, while the other digits stay the same. We can therefore take those two digits as the hash values: 22 for key1, 26 for key2, and 90 for key3.
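The example above can be reproduced with a short sketch. The function name and the default digit positions are assumptions that fit these three 6-digit keys; in practice the positions would be chosen by analyzing the actual key set.

```python
def digit_analysis_hash(key: int, start: int = 2, end: int = 4) -> int:
    """Keep only the digits at positions start..end-1 (0-based) of the key."""
    return int(str(key)[start:end])

for key in (112233, 112633, 119033):
    print(key, "->", digit_analysis_hash(key))
# 112233 -> 22
# 112633 -> 26
# 119033 -> 90
```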

4. Mid-square method

Omitted here; as the name suggests, the key is squared and the middle digits of the result are taken as the hash address.
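Since this method is not spelled out in the text, here is a minimal sketch of the usual reading: square the key and keep a few digits from the middle of the result. The function name, the number of digits kept, and the way the middle is located are all assumptions for illustration.

```python
def mid_square_hash(key: int, digits: int = 3) -> int:
    """Square the key and return `digits` digits from the middle of the square."""
    squared = str(key * key)
    start = max((len(squared) - digits) // 2, 0)
    return int(squared[start:start + digits])

print(1234 * 1234)            # 1522756
print(mid_square_hash(1234))  # 227, the middle three digits of 1522756
```

The middle digits of the square depend on every digit of the key, which tends to spread similar keys across different addresses.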

5. Folding method

For example, suppose key = 135790 and a 2-digit hash value is required. Split the key into 2-digit groups and add them: 13 + 57 + 90 = 160. Dropping the high-order digit "1" leaves a hash value of 60.

The purpose of all these methods is the same: to derive each hash address from its key while scattering the addresses as widely as possible.
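The folding example (135790 giving 60) can be sketched as follows. The function name and the width parameter are hypothetical; dropping the high-order digit is implemented as a remainder modulo 10 ** width.

```python
def folding_hash(key: int, width: int = 2) -> int:
    """Split the key into groups of `width` digits, add them, keep the low digits."""
    digits = str(key)
    parts = [int(digits[i:i + width]) for i in range(0, len(digits), width)]
    return sum(parts) % (10 ** width)

print(folding_hash(135790))  # 60, since 13 + 57 + 90 = 160 and 160 % 100 == 60
```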

Conflict handling methods:

An important factor affecting the efficiency of hash lookup is the hash function itself. When two different data elements have the same hash value, a conflict (collision) occurs. To reduce the likelihood of conflicts, the hash function should spread the data as evenly as possible across the entries of the hash table.

There are two ways to resolve conflicts:

  (1) Open addressing method

If two data elements have the same hash value, another table entry is chosen in the hash table for the element inserted later.

When looking up a key, if the matching data element is not found in the first table entry the key hashes to, the program keeps probing until it either finds a matching element or encounters an empty table entry.

①. Linear probing method

When resolving a conflict, this method probes the following addresses one by one until an empty address is found, and the element is inserted there. If the entire table has been searched without finding a free address, an overflow occurs.

Hi = (H(key) + di) % m (i = 1, 2, ..., k; k ≤ m-1)

The address increment is di = 1, 2, 3, ..., m-1, where i is the probe number.
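Linear probing, including the lookup rule of stopping at an empty entry, can be sketched as below. The function names are hypothetical and H(key) = key % m is an assumed hash function; the loop variable d plays the role of the increment di, with d = 0 standing for the home address.

```python
from typing import List, Optional

def linear_probe_insert(table: List[Optional[int]], key: int) -> int:
    """Insert key at the first free address (H(key) + d) % m, d = 0, 1, ..., m-1."""
    m = len(table)
    home = key % m
    for d in range(m):
        addr = (home + d) % m
        if table[addr] is None:
            table[addr] = key
            return addr
    raise OverflowError("hash table is full")

def linear_probe_search(table: List[Optional[int]], key: int) -> Optional[int]:
    """Follow the same probe sequence; stop at a match or at an empty entry."""
    m = len(table)
    home = key % m
    for d in range(m):
        addr = (home + d) % m
        if table[addr] is None:
            return None
        if table[addr] == key:
            return addr
    return None

table = [None] * 5
for k in (12, 7, 2):
    linear_probe_insert(table, k)

print(table)                           # [None, None, 12, 7, 2]
print(linear_probe_search(table, 2))   # 4
print(linear_probe_search(table, 17))  # None
```

This sketch does not support deletion; removing an element would leave an empty entry that cuts short later searches, so real implementations usually mark deleted slots specially.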

②. Quadratic probing method

The address increment sequence is: di = 1², -1², 2², -2², ..., q², -q² (q ≤ m/2)
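A minimal sketch of quadratic probing with the increment sequence +1, -1, +4, -4, ... follows. The function names and H(key) = key % m are assumptions for this example; note that, unlike linear probing, this sequence is not guaranteed to visit every slot of the table.

```python
from typing import Iterator, List, Optional

def quadratic_offsets(m: int) -> Iterator[int]:
    """Yield the increments 1, -1, 4, -4, ..., q*q, -q*q for q up to m // 2."""
    for q in range(1, m // 2 + 1):
        yield q * q
        yield -(q * q)

def quadratic_probe_insert(table: List[Optional[int]], key: int) -> int:
    """Insert key at its home address or at the first free probed address."""
    m = len(table)
    home = key % m
    if table[home] is None:
        table[home] = key
        return home
    for d in quadratic_offsets(m):
        addr = (home + d) % m  # Python's % gives a non-negative result for m > 0
        if table[addr] is None:
            table[addr] = key
            return addr
    raise OverflowError("no free slot found along the probe sequence")

table = [None] * 7
for k in (10, 17, 24, 31):
    quadratic_probe_insert(table, k)

print(table)  # [31, None, 24, 10, 17, None, None]
```

All four keys hash to address 3; the later ones land at 3 + 1, 3 - 1, and (3 + 4) % 7 = 0, so colliding keys are spread to both sides of the home address instead of forming one long run.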

③. Double hashing method

Hi = (H(key) + i * RH(key)) % m (i = 1, 2, ..., m-1)

H(key) and RH(key) are two hash functions, and m is the hash table length.

The first hash function computes the hash address of the key. If that address conflicts, the second function determines the step factor, and the probe function then steps through the resulting sequence to find a free hash address.

With a = H(key) and b = RH(key): H1 = (a + b) % m, H2 = (a + 2b) % m, ..., Hm-1 = (a + (m-1) * b) % m
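The probe sequence above can be sketched as follows. The text does not specify the second hash function, so RH(key) = 1 + key % (m - 1) is an assumption made here; it never returns 0, and with a prime table length m every step value visits all slots. The function name is hypothetical.

```python
from typing import List, Optional

def double_hash_insert(table: List[Optional[int]], key: int) -> int:
    """Probe (a + i * b) % m for i = 0, 1, ..., m-1 and insert at the first free slot."""
    m = len(table)          # assumed to be a prime number, at least 3
    a = key % m             # H(key): the home address
    b = 1 + key % (m - 1)   # RH(key): assumed step function, never 0
    for i in range(m):
        addr = (a + i * b) % m
        if table[addr] is None:
            table[addr] = key
            return addr
    raise OverflowError("hash table is full")

table = [None] * 7
for k in (10, 17, 24):
    double_hash_insert(table, k)

print(table)  # [None, None, 17, 10, 24, None, None]
```

All three keys share the home address 3, but 17 steps by 6 and 24 steps by 1, so keys that collide at the first address follow different probe paths.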

  (2) Chaining method

Data elements with the same hash value are stored in the same linked list. When looking up a key, the linked list it hashes to must be searched linearly.
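A minimal sketch of the chaining method is shown below. The class name and its methods are hypothetical, Python lists stand in for the linked lists, and the built-in hash() is used as the hash function.

```python
class ChainedHashTable:
    """Hash table in which each slot holds a chain of (key, value) pairs."""

    def __init__(self, m: int = 8):
        self.buckets = [[] for _ in range(m)]

    def _bucket(self, key):
        """Return the chain that the key hashes to."""
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value) -> None:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))       # new key: append to the chain

    def get(self, key, default=None):
        for k, v in self._bucket(key):    # linear search within the chain
            if k == key:
                return v
        return default

t = ChainedHashTable(4)
t.put(1, "a")
t.put(5, "b")   # 1 and 5 both land in chain 1 when m = 4
print(t.get(1), t.get(5), t.get(9))  # a b None
```

Lookup cost is proportional to the length of one chain, so performance stays close to O(1) as long as the hash function keeps the chains short.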

    

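The Python dictionary type dict resolves conflicts with the open addressing method. Its probing is often described as quadratic probing, although CPython actually derives each next address from a recurrence that also mixes in the higher bits of the hash value.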
