Hash Functions and Linear Probing (Linear Detection and Rehashing)

Last Update:2018-12-05 Source: Internet

Author: User


As we know, the URLs of many web pages are extremely long. Does a server really need to set aside that much space for every page just to store its address? A wasteful design like that, which could bring a server down, is unlikely to be adopted by the Internet giants; instead, they use hash functions to store these addresses.

In an ordinary linear table or tree, the position of a record in the structure is arbitrary; it has no fixed relationship to the record's key. Searching such a structure therefore means comparing the target against a series of keys, and the efficiency of the search depends on the number of comparisons made. Ideally, you could find the desired record directly, without any comparisons. To do that, you must establish a fixed correspondence between a record's storage location and its key, so that each key maps to a unique storage location in the structure.

The position of each element in a hash table is determined by a hash function. Taking the key of a data element as the independent variable, a fixed functional relationship H (called the hash function) computes a value that is used as the element's storage address: ADDR = H(key).

Hash collisions and how to handle them. In a hash table, different keys may map to the same storage location; that is, k1 ≠ k2 but H(k1) = H(k2). This is called a collision (or conflict). A uniform hash function can reduce collisions but cannot eliminate them. Once a collision occurs it must be resolved; that is, you must find another available address for the record.
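The relationship ADDR = H(key), and what a collision looks like, can be sketched as follows. This is only an illustration: the division hash H(key) = key mod m and the function names hash_addr and collides are assumptions, not something the article specifies.

```c
#include <assert.h>

/* Division-method hash (an assumed example of H):
 * ADDR = H(key) = key mod m, an address in [0, m-1]. */
unsigned int hash_addr(unsigned int key, unsigned int m)
{
    assert(m > 0);   /* a table must have at least one slot */
    return key % m;
}

/* Returns 1 when two different keys map to the same address
 * in a table of length m (a collision), 0 otherwise. */
int collides(unsigned int k1, unsigned int k2, unsigned int m)
{
    return k1 != k2 && hash_addr(k1, m) == hash_addr(k2, m);
}
```

With a table length of 7, for example, keys 10 and 17 both map to address 3, so collides(10, 17, 7) returns 1, while collides(10, 11, 7) returns 0.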

To handle such conflicts, we can use the following methods:

1. Separate Chaining (the "Zipper" Method)

Instead of a static sequential storage structure, each table slot holds a dynamically allocated linked list, and every key that hashes to the same address is added to that slot's list. Collisions still occur, but a colliding record always has somewhere to go, so no probing for a free slot is needed. The disadvantage is that the linked lists add pointer overhead and increase programming complexity.
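A minimal sketch of chaining, assuming unsigned integer keys, a hypothetical bucket count of 8, and a simple key mod 8 hash (all names here are illustrative, not from the article):

```c
#include <assert.h>
#include <stdlib.h>

#define CHAIN_BUCKETS 8u   /* hypothetical bucket count */

/* One node of a bucket's linked list. */
typedef struct ChainNode {
    unsigned int key;
    struct ChainNode *next;
} ChainNode;

typedef struct {
    ChainNode *buckets[CHAIN_BUCKETS];
} ChainTable;

/* Assumed hash function: key mod bucket count. */
static unsigned int chain_hash(unsigned int key)
{
    return key % CHAIN_BUCKETS;
}

/* Set every bucket to the empty list. */
void chain_init(ChainTable *t)
{
    assert(t != NULL);
    for (unsigned int i = 0; i < CHAIN_BUCKETS; i++)
        t->buckets[i] = NULL;
}

/* Returns 1 if key is stored in the table, 0 otherwise. */
int chain_contains(const ChainTable *t, unsigned int key)
{
    assert(t != NULL);
    for (const ChainNode *n = t->buckets[chain_hash(key)]; n != NULL; n = n->next)
        if (n->key == key)
            return 1;
    return 0;
}

/* Inserts key at the head of its bucket's list.
 * Returns 1 on success (or if already present), 0 if allocation fails. */
int chain_insert(ChainTable *t, unsigned int key)
{
    assert(t != NULL);
    if (chain_contains(t, key))
        return 1;
    ChainNode *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    unsigned int b = chain_hash(key);
    n->key = key;
    n->next = t->buckets[b];
    t->buckets[b] = n;
    return 1;
}

/* Releases every node and leaves the table empty. */
void chain_free(ChainTable *t)
{
    assert(t != NULL);
    for (unsigned int i = 0; i < CHAIN_BUCKETS; i++) {
        ChainNode *n = t->buckets[i];
        while (n != NULL) {
            ChainNode *next = n->next;
            free(n);
            n = next;
        }
        t->buckets[i] = NULL;
    }
}
```

Keys 3 and 11 both hash to bucket 3 here; the second insert simply lengthens that bucket's list rather than searching for another slot.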

2. Multi-Hash Method

Two or more different hash functions are prepared; when one produces a collision, the next one is tried. The probability of a collision is still not zero, but better-designed or more numerous functions make it very small (barring very bad luck, a collision on every function is almost impossible).
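The idea can be sketched like this, assuming non-negative integer keys, a hypothetical table length of 11, and three made-up hash functions mh_h1, mh_h2, and mh_h3 (none of these come from the article):

```c
#include <assert.h>
#include <stddef.h>

#define MH_LEN   11u    /* hypothetical table length */
#define MH_EMPTY (-1)   /* marks an unused slot; keys are assumed non-negative */

/* Three assumed hash functions, tried in order. */
static unsigned int mh_h1(int key) { return (unsigned int)key % MH_LEN; }
static unsigned int mh_h2(int key) { return ((unsigned int)key / MH_LEN) % MH_LEN; }
static unsigned int mh_h3(int key) { return ((unsigned int)key * 7u + 3u) % MH_LEN; }

typedef unsigned int (*mh_func)(int);
static const mh_func mh_funcs[] = { mh_h1, mh_h2, mh_h3 };

/* Mark every slot as unused. */
void mh_init(int table[MH_LEN])
{
    for (unsigned int i = 0; i < MH_LEN; i++)
        table[i] = MH_EMPTY;
}

/* Tries each hash function in turn until one yields a free slot
 * (or a slot already holding key). Returns the slot index used,
 * or -1 if every function collided. */
int mh_insert(int table[MH_LEN], int key)
{
    assert(key >= 0);
    for (size_t i = 0; i < sizeof mh_funcs / sizeof mh_funcs[0]; i++) {
        unsigned int addr = mh_funcs[i](key);
        if (table[addr] == MH_EMPTY || table[addr] == key) {
            table[addr] = key;
            return (int)addr;
        }
    }
    return -1;
}
```

After inserting 5 (slot 5), key 16 collides under mh_h1 but lands in slot 1 under mh_h2. Key 60 happens to map to slot 5 under all three functions, so mh_insert returns -1 for it, which shows that the probability of a collision is reduced but not removed.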

3. Open Addressing Method

The open addressing method uses the formula: H_i = (H(key) + d_i) mod m, i = 1, 2, ..., k (k <= m-1)

Here m is the length of the hash table and d_i is the increment sequence used when a collision occurs. If d_i takes the values 1, 2, 3, ..., m-1, the method is called linear probing (linear detection and rehashing): after each collision, the probe moves one position further along the table.

If d_i takes the values 1, -1, 4, -4, 9, -9, 16, -16, ..., k*k, -k*k (k <= m/2), it is called quadratic probing (secondary detection and rehashing).

If d_i is a pseudo-random sequence, it is called pseudo-random probing (pseudo-random detection and rehashing).
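The probing formula H_i = (H(key) + d_i) mod m can be sketched with the increment sequence passed in as a function. The table length 11, the hash H(key) = key mod m, and all names below are assumptions made for illustration:

```c
#include <assert.h>

#define OA_LEN   11     /* hypothetical table length m */
#define OA_EMPTY (-1)   /* marks an unused slot; keys are assumed non-negative */

/* d_i for linear probing: 1, 2, 3, ..., m-1 */
int oa_linear(int i)
{
    return i;
}

/* d_i for quadratic probing: 1, -1, 4, -4, 9, -9, ... */
int oa_quadratic(int i)
{
    int k = (i + 1) / 2;
    return (i % 2 == 1) ? k * k : -(k * k);
}

/* Modulo that always yields a value in [0, m-1], even for negative x. */
static int oa_mod(int x, int m)
{
    int r = x % m;
    return r < 0 ? r + m : r;
}

/* Mark every slot as unused. */
void oa_init(int table[OA_LEN])
{
    for (int i = 0; i < OA_LEN; i++)
        table[i] = OA_EMPTY;
}

/* Inserts key using the increment sequence d. Returns the slot index used,
 * or -1 if no free slot was found within m-1 probes. */
int oa_insert(int table[OA_LEN], int key, int (*d)(int))
{
    assert(key >= 0);
    int home = oa_mod(key, OA_LEN);               /* H(key) */
    for (int i = 0; i <= OA_LEN - 1; i++) {
        int di = (i == 0) ? 0 : d(i);             /* i = 0 tries the home slot */
        int addr = oa_mod(home + di, OA_LEN);     /* H_i */
        if (table[addr] == OA_EMPTY || table[addr] == key) {
            table[addr] = key;
            return addr;
        }
    }
    return -1;
}
```

Keys 5, 16, and 27 all have home slot 5. With oa_linear they end up in slots 5, 6, and 7; with oa_quadratic on a fresh table they end up in slots 5, 6, and 4.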

4. Public Overflow Area

Assume that the hash function takes values in the range [0, m-1]. Set up a vector HashTable[0..m-1] as the primary table, and a separate vector OverTable[0..v] as the overflow area used to store all records that collide.
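A minimal sketch of this layout, assuming non-negative integer keys, a hypothetical primary table of 7 slots, an overflow area of 4 slots, and the hash key mod 7 (the struct and function names are illustrative only):

```c
#include <assert.h>

#define BASE_LEN   7      /* hypothetical m: primary table is [0..m-1] */
#define OVER_LEN   4      /* hypothetical v + 1: overflow area is [0..v] */
#define SLOT_EMPTY (-1)   /* marks an unused slot; keys are assumed non-negative */

typedef struct {
    int hashtable[BASE_LEN];   /* primary table */
    int overtable[OVER_LEN];   /* shared overflow area for colliding records */
    int over_count;            /* number of overflow slots in use */
} OverflowTable;

/* Mark every primary slot as unused and empty the overflow area. */
void ov_init(OverflowTable *t)
{
    for (int i = 0; i < BASE_LEN; i++)
        t->hashtable[i] = SLOT_EMPTY;
    t->over_count = 0;
}

/* Stores key in its primary slot if that slot is free; otherwise appends
 * it to the overflow area. Returns 1 on success, 0 if the overflow area is full. */
int ov_insert(OverflowTable *t, int key)
{
    assert(key >= 0);
    int addr = key % BASE_LEN;
    if (t->hashtable[addr] == SLOT_EMPTY || t->hashtable[addr] == key) {
        t->hashtable[addr] = key;
        return 1;
    }
    for (int i = 0; i < t->over_count; i++)
        if (t->overtable[i] == key)
            return 1;                       /* already stored */
    if (t->over_count == OVER_LEN)
        return 0;
    t->overtable[t->over_count++] = key;
    return 1;
}

/* Checks the primary slot first, then scans the overflow area. */
int ov_contains(const OverflowTable *t, int key)
{
    assert(key >= 0);
    if (t->hashtable[key % BASE_LEN] == key)
        return 1;
    for (int i = 0; i < t->over_count; i++)
        if (t->overtable[i] == key)
            return 1;
    return 0;
}
```

Inserting 3, 10, and 17 (all of which hash to slot 3) leaves 3 in the primary table and 10 and 17 in the overflow area. Lookups of colliding keys cost a linear scan of the overflow area, so this approach suits tables where collisions are rare.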

The following is an example of a simple hash function. In this example, we use the P. J. Weinberger hash algorithm:

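// P. J. Weinberger Hash
unsigned int PJWHash(const char *str)
{
    // Width of unsigned int in bits (32 on most platforms).
    unsigned int BitsInUnsignedInt = (unsigned int)(sizeof(unsigned int) * 8);
    unsigned int ThreeQuarters     = (unsigned int)((BitsInUnsignedInt * 3) / 4);
    unsigned int OneEighth         = (unsigned int)(BitsInUnsignedInt / 8);
    // Mask that selects the top OneEighth bits of the hash.
    unsigned int HighBits          = (unsigned int)(0xFFFFFFFF) << (BitsInUnsignedInt - OneEighth);
    unsigned int hash              = 0;
    unsigned int test              = 0;

    while (*str)
    {
        // Shift the hash left and add the next character.
        hash = (hash << OneEighth) + (*str++);

        // If any of the high bits are set, fold them back into the
        // low bits and then clear them.
        if ((test = hash & HighBits) != 0)
        {
            hash = ((hash ^ (test >> ThreeQuarters)) & (~HighBits));
        }
    }

    // Clear the sign bit so the result is non-negative as a signed int.
    return (hash & 0x7FFFFFFF);
}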

I expect that after reading the program above, many people will ask: what kind of junk algorithm is this? It looks baffling. Indeed, as hash functions go, this one is not very efficient. Effective hashed storage can often be achieved without such complicated address computation. So, to build a hash function that truly suits your own file storage, you cannot rely on a one-size-fits-all design.

That is nearly all for this article. You may ask why linear probing has barely been discussed. What I want to say is this: to those who have studied hash functions but know only linear probing, and who use the concept to show off how sophisticated they are, you may as well go home and get some sleep. A really well-designed hash function does not depend on linear probing to solve its problems. This kind of question turns up, at most, in tedious proficiency tests.

By the way, I have little respect for a certain "penguin" company that set a linear probing question in its campus recruitment this year. Well, that is about it for this article. I don't want to stray too far from the topic, so let's leave it here!

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
