Hash Functions, Linear Probing, and Rehashing

Source: Internet
Author: User

As we all know, the URLs of many web pages are extremely long. Do you really need to allocate that much space to store each one? We believe the Internet giants would not adopt such a foolish approach, which could bring a server down. Instead, they use hash functions to store these addresses.

In an ordinary linear list or tree, the position of a record within the structure is arbitrary; it has no fixed relationship to the record's key. Searching such a structure therefore requires a series of key comparisons, and the search efficiency depends on how many comparisons are made. Ideally, the desired record would be found directly, with no comparisons at all. To achieve that, there must be a definite correspondence between a record's storage location and its key, so that each key maps to a unique storage location in the structure.

The position of each element in a hash table is determined by a hash function. Taking the key K of a data element as the independent variable, a fixed functional relationship (called the hash function) computes a value that serves as the element's storage address: ADDR = H(key).
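As a minimal sketch of ADDR = H(key), assume integer keys and a table of m slots, and take H to be the remainder of division by m. The names hash_address and TABLE_SIZE are hypothetical, chosen only for illustration; the article does not prescribe a particular H.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 11u  /* assumed table length m; a small prime for illustration */

/* Division-method hash: maps a key to a slot index in [0, m-1]. */
static size_t hash_address(unsigned int key, size_t m)
{
    assert(m > 0);  /* an empty table has no valid address */
    return (size_t)(key % m);
}
```

With these assumptions, hash_address(25, TABLE_SIZE) yields address 3, so the record with key 25 would be stored in slot 3 without any comparisons.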

Handling hash collisions. In a hash table, different keys can map to the same storage location: k1 != k2, but H(k1) = H(k2). This is called a collision (conflict). A uniform hash function reduces collisions but cannot eliminate them. Once a collision occurs, it must be resolved; that is, another available address must be found for the record.
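To make the condition concrete, here is a small sketch that tests whether two keys collide. It assumes integer keys and a division hash; the helper names h and is_collision are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed hash for the illustration: remainder of division by m. */
static size_t h(unsigned int key, size_t m)
{
    assert(m > 0);
    return (size_t)(key % m);
}

/* True when two different keys are mapped to the same address. */
static bool is_collision(unsigned int k1, unsigned int k2, size_t m)
{
    return k1 != k2 && h(k1, m) == h(k2, m);
}
```

With m = 11, the keys 14 and 25 both map to address 3, so is_collision(14, 25, 11) is true even though the keys differ.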

To handle such conflicts, we can use the following methods:

1. Chaining (Zipper) Method

Each slot heads a dynamically allocated linked list instead of a static sequential storage structure, and all records whose keys hash to the same address are linked into that slot's list. A collision therefore never forces a record to look for another slot, so collisions are always resolved, however many occur. The disadvantage is that the linked-list design is more cumbersome and increases programming complexity.
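The following is a rough sketch of the chaining idea, not a production implementation. It assumes unsigned integer keys, a fixed table of 7 buckets, and a division hash; all type and function names (ChainTable, chain_insert, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define CHAIN_BUCKETS 7u  /* assumed table length m */

typedef struct ChainNode {
    unsigned int key;
    struct ChainNode *next;
} ChainNode;

typedef struct {
    ChainNode *bucket[CHAIN_BUCKETS];  /* each slot heads a linked list */
} ChainTable;

static void chain_init(ChainTable *t)
{
    assert(t != NULL);
    for (size_t i = 0; i < CHAIN_BUCKETS; i++)
        t->bucket[i] = NULL;
}

/* Walks the list of the key's bucket; true if the key is stored. */
static bool chain_contains(const ChainTable *t, unsigned int key)
{
    const ChainNode *n = t->bucket[key % CHAIN_BUCKETS];
    for (; n != NULL; n = n->next)
        if (n->key == key)
            return true;
    return false;
}

/* Links the key at the head of its bucket's list.
   Returns false only if memory allocation fails. */
static bool chain_insert(ChainTable *t, unsigned int key)
{
    if (chain_contains(t, key))
        return true;  /* already present, nothing to do */
    ChainNode *n = malloc(sizeof *n);
    if (n == NULL)
        return false;
    size_t i = key % CHAIN_BUCKETS;
    n->key = key;
    n->next = t->bucket[i];
    t->bucket[i] = n;
    return true;
}

/* Releases every node in every bucket. */
static void chain_free(ChainTable *t)
{
    for (size_t i = 0; i < CHAIN_BUCKETS; i++) {
        ChainNode *n = t->bucket[i];
        while (n != NULL) {
            ChainNode *next = n->next;
            free(n);
            n = next;
        }
        t->bucket[i] = NULL;
    }
}
```

In this sketch the keys 3, 10, and 17 all hash to bucket 3, and all three are simply linked into that bucket's list.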

2. Multi-Hash Method

Two or more hash functions can be designed: when one produces a collision, the next is tried. This reduces collisions, but the possibility of a collision remains. Better-designed functions, or more of them, push the probability down further (unless your luck is truly terrible, a collision across all of them is almost impossible).
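A possible sketch of the multi-hash idea follows. The three hash functions are arbitrary choices made up for this illustration, as are the names mh_insert, MH_SIZE, and MH_EMPTY; key 0 is assumed to mark an empty slot.

```c
#include <assert.h>
#include <stddef.h>

#define MH_SIZE 11u   /* assumed table length */
#define MH_EMPTY 0u   /* assumed sentinel: key 0 marks an empty slot */

/* Three illustrative hash functions, tried in this order. */
static size_t mh_hash1(unsigned int key) { return key % MH_SIZE; }
static size_t mh_hash2(unsigned int key) { return (key / MH_SIZE) % MH_SIZE; }
static size_t mh_hash3(unsigned int key) { return (key * 7u + 3u) % MH_SIZE; }

typedef size_t (*MhHashFn)(unsigned int);
static const MhHashFn mh_funcs[] = { mh_hash1, mh_hash2, mh_hash3 };
#define MH_FUNC_COUNT (sizeof mh_funcs / sizeof mh_funcs[0])

/* Tries each hash function in turn and stores the key in the first
   free slot. Returns the slot index, or -1 if every candidate slot
   is already occupied by another key. */
static int mh_insert(unsigned int table[MH_SIZE], unsigned int key)
{
    assert(key != MH_EMPTY);
    for (size_t i = 0; i < MH_FUNC_COUNT; i++) {
        size_t slot = mh_funcs[i](key);
        if (table[slot] == MH_EMPTY || table[slot] == key) {
            table[slot] = key;
            return (int)slot;
        }
    }
    return -1;
}
```

Starting from an empty table, inserting 14 uses slot 3 and inserting 25 falls through to the second function and uses slot 2. Inserting 36 then fails, because all three functions point at occupied slots. That is the residual collision probability the text mentions, and a larger table or better functions would make it rarer.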

3. Open Address Method

The open address method uses the following formula:

H_i = (H(key) + d_i) mod m,  i = 1, 2, ..., k  (k <= m - 1)

Here m is the length of the hash table, and d_i is the increment sequence applied when a collision occurs.

If d_i takes the values 1, 2, 3, ..., m - 1, the method is called linear probing and rehashing: after each collision, the probe moves one position further along the table.

If d_i takes the values 1^2, -1^2, 2^2, -2^2, 3^2, -3^2, ..., k^2, -k^2 (k <= m/2), that is, 1, -1, 4, -4, 9, -9, ..., the method is called quadratic probing and rehashing.

If d_i is a pseudo-random sequence, the method is called pseudo-random probing and rehashing.
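The formula H_i = (H(key) + d_i) mod m can be sketched as follows for the linear and quadratic increment sequences. This assumes non-negative integer keys, H(key) = key mod m, and -1 as the marker for an unused slot; the names oa_insert, oa_increment, and OaMode are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define OA_EMPTY (-1)  /* assumed sentinel for an unused slot; keys must be >= 0 */

typedef enum { OA_LINEAR, OA_QUADRATIC } OaMode;

/* i-th increment d_i for i >= 1.
   Linear:    1, 2, 3, ...
   Quadratic: 1, -1, 4, -4, 9, -9, ... */
static long oa_increment(OaMode mode, size_t i)
{
    if (mode == OA_LINEAR)
        return (long)i;
    long k = (long)((i + 1) / 2);
    return (i % 2 == 1) ? k * k : -(k * k);
}

/* Stores key using H(key) = key mod m and H_i = (H(key) + d_i) mod m.
   Returns the slot used, or -1 if no free slot was found. */
static int oa_insert(int table[], size_t m, int key, OaMode mode)
{
    assert(m > 0 && key >= 0);
    long len = (long)m;
    long home = (long)((size_t)key % m);
    /* i == 0 tries the home address; i >= 1 applies the increment d_i. */
    for (size_t i = 0; i < m; i++) {
        long d = (i == 0) ? 0 : oa_increment(mode, i);
        long slot = ((home + d) % len + len) % len;  /* wrap into [0, m-1] */
        if (table[slot] == OA_EMPTY || table[slot] == key) {
            table[slot] = key;
            return (int)slot;
        }
    }
    return -1;
}
```

With m = 11, the keys 14, 25, and 36 all have home address 3. Under linear probing they land in slots 3, 4, and 5; under quadratic probing they land in slots 3, 4, and 2, because the third key tries 3, then 3 + 1, then 3 - 1.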

4. Overflow Area Method

Assume that the range of the hash function is [0, m-1]. Set up the vector HashTable[0..m-1] as the base table, and a separate vector OverTable[0..v] as the storage space used to hold conflicting records.
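A minimal sketch of the overflow area idea is shown below, assuming non-negative integer keys, a division hash, and small fixed capacities. The struct layout and the names OvTable, ov_insert, and ov_contains are hypothetical; only the split into a base table and an overflow table comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OV_M 7u        /* assumed base table length m */
#define OV_V 4u        /* assumed capacity of the overflow table */
#define OV_EMPTY (-1)  /* assumed sentinel for an unused slot; keys must be >= 0 */

typedef struct {
    int base[OV_M];     /* base table: one record per hash address */
    int over[OV_V];     /* overflow table: colliding records, in arrival order */
    size_t over_count;  /* number of records currently in the overflow table */
} OvTable;

static void ov_init(OvTable *t)
{
    for (size_t i = 0; i < OV_M; i++)
        t->base[i] = OV_EMPTY;
    for (size_t i = 0; i < OV_V; i++)
        t->over[i] = OV_EMPTY;
    t->over_count = 0;
}

/* Checks the key's base slot first, then scans the overflow table. */
static bool ov_contains(const OvTable *t, int key)
{
    assert(key >= 0);
    if (t->base[(size_t)key % OV_M] == key)
        return true;
    for (size_t i = 0; i < t->over_count; i++)
        if (t->over[i] == key)
            return true;
    return false;
}

/* Stores the key in its base slot, or in the overflow table if that
   slot is taken. Returns false when the overflow table is full. */
static bool ov_insert(OvTable *t, int key)
{
    assert(key >= 0);
    size_t slot = (size_t)key % OV_M;
    if (t->base[slot] == OV_EMPTY) {
        t->base[slot] = key;
        return true;
    }
    if (ov_contains(t, key))
        return true;  /* already stored */
    if (t->over_count == OV_V)
        return false;
    t->over[t->over_count++] = key;
    return true;
}
```

Here the key 3 occupies base slot 3, while the later keys 10 and 17, which hash to the same address, go to the overflow table regardless of their hash address. The trade-off in this sketch is that lookups for colliding keys degrade to a sequential scan of the overflow table.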


The following is an example of a simple hash function. In this example, we use the P. J. Weinberger hash algorithm:

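// P. J. Weinberger Hash
unsigned int PJWHash(const char *str)
{
    /* All sizes are derived from the width of unsigned int. */
    const unsigned int BitsInUnsignedInt = (unsigned int)(sizeof(unsigned int) * 8);
    const unsigned int ThreeQuarters     = (unsigned int)((BitsInUnsignedInt * 3) / 4);
    const unsigned int OneEighth         = (unsigned int)(BitsInUnsignedInt / 8);
    /* Mask that selects the top OneEighth bits of the hash. */
    const unsigned int HighBits          = (unsigned int)(0xFFFFFFFF) << (BitsInUnsignedInt - OneEighth);
    unsigned int hash = 0;
    unsigned int test = 0;

    while (*str)
    {
        /* Shift the running hash left and add the next character. */
        hash = (hash << OneEighth) + (*str++);
        /* If any of the high bits are set, fold them back in and clear them. */
        if ((test = hash & HighBits) != 0)
        {
            hash = ((hash ^ (test >> ThreeQuarters)) & (~HighBits));
        }
    }

    return (hash & 0x7FFFFFFF);
}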

I believe that after reading the program above, many people will ask: what is this garbage algorithm? It looks baffling. Indeed, among hash functions this one is not especially efficient. Effective hashed storage does not require complicated address computation. So, to build a hash function that truly suits your own storage needs, you have to design it for the case at hand rather than apply one recipe everywhere.

That is about it for this article. You may ask why linear probing and rehashing was not covered in more depth. What I want to say is this: to those who have studied hash functions but know only linear probing and rehashing, and who trot out the concept to show how sophisticated they are, go home, wash up, and go to sleep. A truly clever hash scheme is not something that linear probing and rehashing alone can deliver. At most, this kind of thing shows up in all sorts of boring proficiency tests.

By the way, I look down on a certain penguin that set a linear probing question in its campus recruiting this year. Well, that is about it for this article. I do not want to ramble on, so let's leave it here!



