Hash functions: resolving conflicts with the single hash algorithm

Source: Internet
Author: User

1. The problem

The problem left open by the simple hash function algorithm

There are 10 non-negative integers and no more than 20 storage units to hold them. How should the 10 numbers be stored so that looking up any one of them examines as few storage units as possible?

Equivalently: there are 10 numbered balls and 20 boxes numbered {0, 1, 2, ..., 19}, and each box holds at most one ball. How should the balls be placed so that the box holding any given ball can be found by opening as few boxes as possible?

2. Analysis

The simple hash function algorithm showed that the search time can be brought down to a constant as long as the conflict (collision) problem can be resolved.

Idea: when a ball's box is already taken, put the ball in an empty box instead.

The idea is simple and looks reasonable. The key question is: how do we know which empty box corresponds to which ball number?

The method below is the one described in Chapter 5 of Elementary Number Theory and Its Applications. The book's description is abbreviated in places and does not explain how to search, so I have filled in those steps here and written Python code that is easy to understand and apply.

3. Resolving conflicts with a single hash function

3.1 Approach

Let m be the number of boxes and n the total number of balls.

When a ball's box k is already occupied, check whether the next box, k+1, is empty. If it is, put the ball there. If not, move on to box k+2, and so on up to box m-1. If no empty box has been found by then, wrap around and search the boxes in front of k, {0, 1, ..., k-1}.

Suppose the balls are {0, 1, 30} and again m = 10. Balls 0 and 1 go into boxes 0 and 1. For ball 30, f(30) = 30 mod 10 = 0, but box 0 is occupied, which is a conflict. The next box, box 1, is also occupied. The box after that, box 2, is empty, so ball 30 goes there.

If the next ball is 40, the same process finds boxes 0, 1 and 2 occupied, so the ball goes into box 3.

If a ball that really is numbered 3 then arrives, box 3 is already occupied, so it can only go into box 4.

In the end, {0, 1, 30, 40, 3} are placed as follows:

Box number   0   1   2   3   4   5   6   7   8   9
Ball number  0   1   30  40  3

Note: the final positions depend on the order in which the balls arrive. If the balls arrive in the order {0, 1, 3, 30, 40}, ball 3 ends up in box 3.

∵ m >= n

∴ for any ball x, an empty box can always be found
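
The placement rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not the article's own listing: the function name place_balls and its arguments are made up for this example.

```python
def place_balls(balls, m):
    """Place each ball in box (ball % m), moving to the next box on a conflict.

    Hypothetical helper for illustration; assumes len(balls) <= m.
    """
    if len(balls) > m:
        raise ValueError("more balls than boxes")
    boxes = [None] * m
    for k in balls:
        for j in range(m):
            box = (k + j) % m  # wraps from box m-1 back to box 0
            if boxes[box] is None:
                boxes[box] = k
                break
    return boxes


print(place_balls([0, 1, 30, 40, 3], 10))
# [0, 1, 30, 40, 3, None, None, None, None, None]
print(place_balls([0, 1, 3, 30, 40], 10))
# [0, 1, 30, 3, 40, None, None, None, None, None]
```

The two calls use the same balls in a different order, and ball 3 lands in box 4 in the first case and box 3 in the second, as the note says.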

3.2 Mathematical Expressions

Let h_0(k) ≡ k (mod m), where k is the ball number and m is the number of boxes. Here "≡" denotes congruence, not equality: h_0(k) is the remainder of k divided by m.

h_j(k) ≡ h_0(k) + j (mod m), 0 <= j < m, where h_j(k) is the box number tried for the ball after j conflicts have occurred

∴ h_{j+1}(k) ≡ h_0(k) + (j + 1) ≡ h_j(k) + 1 (mod m)

That is, when there is a conflict at position h_j(k), check whether the next box is empty.

∵ when h_j(k) = m-1, h_j(k) + 1 = m ≡ 0 (mod m), by the rules of modular arithmetic

∴ the next position is 0, which means returning to box 0 and continuing the search for an empty box from there
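
As a quick check of these formulas, the sequence of boxes tried for one ball can be listed directly. This is a small sketch; the name probe_sequence is hypothetical and not part of the original code.

```python
def probe_sequence(k, m):
    """Return [h_0(k), h_1(k), ..., h_{m-1}(k)] with h_j(k) = (k + j) % m."""
    return [(k + j) % m for j in range(m)]


print(probe_sequence(30, 10))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(probe_sequence(9, 10))   # [9, 0, 1, 2, 3, 4, 5, 6, 7, 8]
```

The second call starts at box m-1 = 9 and wraps around to box 0 on the next step.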

3.3 Finding the box that holds ball k

The method is the same as for placing the ball: first look in box h_0(k) ≡ k (mod m). If the ball there is k, we are done.

If not, the ball may be in the next box, so follow the formula in 3.2 step by step through h_1(k), h_2(k), ... until the corresponding box is found.
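
A lookup along these lines might look like the sketch below. The function name find_ball is hypothetical. Stopping at the first empty box is an extra assumption that the text does not state; it only holds if balls are never removed from the boxes.

```python
def find_ball(boxes, k):
    """Return (box number, boxes opened) for ball k, or None if it is absent.

    Assumption: balls are never removed, so an empty box ends the search.
    """
    m = len(boxes)
    for j in range(m):
        box = (k + j) % m
        if boxes[box] == k:
            return box, j + 1
        if boxes[box] is None:
            return None
    return None


# Layout from section 3.1 (m = 10, balls placed in the order 0, 1, 30, 40, 3)
boxes = [0, 1, 30, 40, 3, None, None, None, None, None]
print(find_ball(boxes, 3))   # (4, 2): box 3 holds 40, box 4 holds 3
print(find_ball(boxes, 40))  # (3, 4)
print(find_ball(boxes, 50))  # None
```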

3.4 Worst-case complexity

Suppose 9 balls already occupy the first 9 boxes {0, 1, 2, ..., 8}, and the last ball satisfies k_9 ≡ k_0 (mod m), so it also hashes to box 0. Starting from box 0 we have to step forward one box at a time, all the way to box 9, before finding a box without a conflict. In the worst case, then, 10 boxes are opened: the worst-case complexity is n, where n is the number of balls.

So after all that effort, this looks no better than method 2.2 in the simple hash function algorithm.

However, method 2.2 in the simple hash function algorithm clearly requires the balls to be kept in order, and whenever a new ball is added they have to be arranged again.
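
The worst case described above can be reproduced with a short sketch. The helper name probes_to_place is made up for this illustration, and ball number 10 is chosen here only because it is congruent to 0 mod 10.

```python
def probes_to_place(boxes, k):
    """Place ball k by stepping to the next box on conflict; return boxes opened."""
    m = len(boxes)
    for j in range(m):
        box = (k + j) % m
        if boxes[box] is None:
            boxes[box] = k
            return j + 1
    raise ValueError("all boxes are occupied")


m = 10
boxes = [None] * m
for k in range(9):              # balls 0..8 fill boxes 0..8 without conflict
    probes_to_place(boxes, k)
worst = probes_to_place(boxes, 10)  # 10 % 10 == 0, so the search starts at box 0
print(worst)  # 10
```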

3.5 Python Code and test results

    # mod = m, h(k) = k % m, h_j(k) = (h(k) + j) % m
    m = 19  # number of boxes; 19 is the value used in the test


    def SingleHash(numList):
        """Place every key, stepping to the next box on conflict; None on overflow."""
        if len(numList) > m:
            print("num list len is:", len(numList), "is bigger than mod:", m)
            return None
        hashList = [None] * m
        for k in numList:
            for j in range(m):
                hj_k = (k + j) % m
                if hashList[hj_k] is None:
                    hashList[hj_k] = k
                    break
            else:
                # all m boxes were tried and none was empty
                print("num list is too big, hash list is overflow")
                return None
        return hashList


    def CheckSingleHash(numList, hashList):
        """Look up every key, printing each conflict and the number of tries."""
        if hashList is None:
            return
        for k in numList:
            for j in range(m):
                hj_k = (k + j) % m
                # check whether the key equals the one in the hash list; if not, check the next box
                if hashList[hj_k] == k:
                    if j > 0:
                        print(k, "find", j + 1, "times")
                    break
                else:
                    print(k, "conflict with", hashList[hj_k])
            else:
                print(k, "is not found ...")
                return False
        return True

The test uses m = 19.

The test sequence is numList = [0, 1, 2, 7, 9, 15, 19, 20, 77, 38]. It deliberately includes several conflicting numbers in order to exercise the conflict handling. To cut down on useless output, numbers that are found without a conflict print nothing.

In the results, 38 collides several times and takes 7 tries to find.
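
The number of tries per key can be recomputed with a compact, self-contained sketch. The names build_table and probe_count are hypothetical stand-ins that follow the same rule as the listing above but return counts instead of printing.

```python
def build_table(keys, m):
    """Place each key at (key + j) % m for the smallest j whose slot is empty."""
    table = [None] * m
    for k in keys:
        for j in range(m):
            slot = (k + j) % m
            if table[slot] is None:
                table[slot] = k
                break
    return table


def probe_count(table, k):
    """Number of slots examined before key k is found, or None if absent."""
    m = len(table)
    for j in range(m):
        if table[(k + j) % m] == k:
            return j + 1
    return None


keys = [0, 1, 2, 7, 9, 15, 19, 20, 77, 38]
table = build_table(keys, 19)
counts = {k: probe_count(table, k) for k in keys}
print({k: c for k, c in counts.items() if c > 1})
# {19: 4, 20: 4, 77: 5, 38: 7}
```

Only the four keys that hash to boxes 0 or 1 after those boxes are taken need more than one try, and 38 needs the most.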

4. Conflict probability of the hash algorithms

Whether we use the hash algorithm from the simple hash function algorithm or the single hash algorithm, a ball's box is found in a single try when there is no conflict. So the lower an algorithm's conflict probability, the closer its average time complexity gets to a constant.

4.1 Conflict probability of the simple hashing algorithm

In the hashing algorithm of the simple hash function algorithm, a ball is bound to conflict whenever a ball with the same value of k mod m (k % m) is already in the box. Suppose ball k is the first one placed in its box. Then every ball numbered f(x) = k + i*m (i ∈ {0, 1, 2, ...}) conflicts with it, so the probability of conflict is very high.

Let the largest ball number be s. Then about s/m ball numbers satisfy f(x) = k + i*m (when s is very large, the remainder of the division can be ignored).

First, in the simple hash function algorithm, the probability that a ball satisfies f(x) = k + i*m:

Suppose there are n balls in total and exactly 2 of them satisfy f(x). For the probability of the 2nd ball: when s is very large the 1 can be ignored, which gives about (n-1)/m

So the probability that exactly 2 balls satisfy f(x) is (n^2 - 1)/m^2

Similarly, the probability that 3 balls satisfy f(x) is n(n-1)(n-2)/m^3

As n approaches m, the probability of a collision gets closer and closer to 100%, so be sure to make m > n, by as much as possible.

Back to the setting of our problem, n = 10 and m = 20: the probability of a collision in which exactly 2 balls satisfy f(x) is about 25%

When exactly 3 balls satisfy f(x), the probability of a collision is 9%

The total conflict probability is therefore more than 34%, which is very high
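
The percentages above follow from plugging n = 10 and m = 20 into the article's two estimates. The sketch below only evaluates those formulas as given; the variable names are made up for illustration.

```python
n, m = 10, 20

# The article's estimates for the simple hashing algorithm
p_two = (n**2 - 1) / m**2                # exactly 2 balls satisfy f(x)
p_three = n * (n - 1) * (n - 2) / m**3   # 3 balls satisfy f(x)

print(f"{p_two:.4f} {p_three:.4f} {p_two + p_three:.4f}")
# 0.2475 0.0900 0.3375
```

Rounded, these are the 25%, 9% and 34% figures quoted in the text.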

4.2 Collision probability of the single hash algorithm

In the single hash algorithm, suppose the first conflict occurs at h_j(k). For a second conflict to occur, a ball must already be in box h_j(k) + 1, so the probability of 2 or more collisions is reduced. Of course, in that case 2 boxes have to be examined.

The 3rd ball must land on f(x) or f(x) + 1 to conflict. Start from the probability in 4.1 that exactly 2 balls satisfy f(x), (n^2 - 1)/m^2

Probability that the 3rd ball lands on f(x) = (n-2)/m

Probability that the 3rd ball lands on f(x) + 1 = (n-2)/m

Probability of conflict = 2n(n-1)(n-2)/m^3

Collisions among more than 3 balls are not calculated. Compare this with the simple hashing algorithm's probability that exactly 2 balls conflict:

When n = 10 and m = 20, the conflict probability of the single hash algorithm is 18%
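
The 18% figure is the article's formula 2n(n-1)(n-2)/m^3 evaluated at n = 10, m = 20. A one-line check, with a variable name chosen here for illustration:

```python
n, m = 10, 20

# The article's estimate for the single hash algorithm
p_single = 2 * n * (n - 1) * (n - 2) / m**3

print(f"{p_single:.2f}")  # 0.18
```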

So in this case the single hash algorithm is superior to the simple hashing algorithm.
