Source: Internet
Author: User

One. Basic concepts of hash functions and hash conflicts

1. Hash function: Hashing is also called the scatter-storage method or the key-to-address calculation method, among other names; the corresponding table is called a hash table.

Basic idea: First establish a correspondence f between the key k of an element and the position p where the element is stored, so that p = f(k). The function f is called the hash function.

When the hash table is created, the element with key k is stored directly in the cell whose address is f(k). Later, to look up the element with key k, the same hash function is used to compute its storage location p = f(k).

2. Hash conflict: When the key set is large, elements with different keys may be mapped to the same address of the hash table, i.e. k1 != k2 but f(k1) = f(k2). This phenomenon is called a hash conflict (collision). In practice conflicts are unavoidable; they can only be reduced by improving the hash function.

Two. The construction of hash functions

A hash function should follow two principles: the function itself is cheap to compute, and the addresses it produces are evenly distributed (that is, for an arbitrary key k, f(k) is equally likely to be any address in the table).

1. Digit analysis method: If the key set is known in advance and each key has more digits than a hash table address, select from the key the digit positions whose values are most evenly distributed and combine them to form the hash address.

For example, suppose there are 80 records whose keys are 8-digit decimal integers d1d2d3...d7d8, and the hash table length is 100, so the address space is 00~99.

Suppose analysis shows that d4 and d7 are evenly distributed across the keys; then the hash function is h(key) = h(d1d2d3...d7d8) = d4d7. For example, h(81346532) = 43 and h(81301367) = 06.

Conversely, suppose analysis shows that d1 and d8 are distributed extremely unevenly, with d1 always equal to 5 and d8 always equal to 2. If the hash function were h(key) = h(d1d2d3...d7d8) = d1d8, the address of every key would be 52, which is obviously undesirable.
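The digit selection described above can be sketched in Python as follows. This is only an illustration: the function name digit_analysis_hash and its parameters are made up for this sketch, and it assumes fixed-width decimal keys with 1-based digit positions counted from the left.

```python
def digit_analysis_hash(key, positions=(4, 7), width=8):
    """Form a hash address from selected digit positions of a fixed-width key.

    positions are 1-based and counted from the left, so (4, 7) picks d4 and d7.
    """
    digits = str(key).zfill(width)  # pad so that d1 is always the first character
    if len(digits) != width:
        raise ValueError("key has more digits than expected")
    return int("".join(digits[p - 1] for p in positions))


print(digit_analysis_hash(81346532))                       # 43
print(f"{digit_analysis_hash(81301367):02d}")              # 06
# A poor choice of positions: if d1 is always 5 and d8 is always 2,
# every key lands on address 52.
print(digit_analysis_hash(51301362, positions=(1, 8)))     # 52
```

The third call uses a made-up key whose first digit is 5 and last digit is 2, only to show how an uneven pair of positions collapses every key onto one address.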

2. Mid-square method: When it cannot be determined which digits of the key are evenly distributed, first square the key, then take as many of the middle digits of the square as needed to form the hash address.

The reason is that the middle digits of the square depend on every digit of the key, so different keys produce different hash addresses with higher probability.

For example, use each letter's ordinal position in the alphabet as its internal code: K is 11, E is 05, Y is 25, A is 01, and B is 02.

The internal code of the keyword "Keya" is therefore 11052501, and the internal codes of the keywords "Kyab", "Akey", and "Bkey" are obtained in the same way.

After the internal code is squared, the 7th to 9th digits of the square are taken as the hash address of the keyword, as shown in Figure 8.23.

Keyword   Internal code   Square of internal code   H(k)
Keya      11052501        122157778355001           778
Kyab      11250102        126564795010404           795
Akey      01110525        001233265775625           265
Bkey      02110525        004454315775625           315
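The table can be reproduced with a short Python sketch. The function names are illustrative, and padding the square to 15 digits is an assumption inferred from the leading zeros shown in the table rather than something the text states.

```python
def internal_code(word):
    """Concatenate each letter's alphabet position as two digits (A=01 ... Z=26)."""
    return "".join(f"{ord(ch) - ord('A') + 1:02d}" for ch in word.upper())


def mid_square_hash(word, first=7, last=9, width=15):
    """Square the internal code and keep digits first..last (1-based) of the square.

    width=15 is an assumption: the square is left-padded with zeros so that
    digit positions line up for every key, as in the table.
    """
    square = int(internal_code(word)) ** 2
    digits = str(square).zfill(width)
    return int(digits[first - 1:last])


for word in ("Keya", "Kyab", "Akey", "Bkey"):
    print(word, internal_code(word), mid_square_hash(word))
# Keya 11052501 778
# Kyab 11250102 795
# Akey 01110525 265
# Bkey 02110525 315
```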

3. Segment superposition method: Split the key into parts that each have as many digits as a hash table address (the last part may be shorter), add the parts together, and discard any carry out of the highest digit; the result is the hash address of the key.

There are two concrete variants: shift superposition and folding superposition. Shift superposition aligns the parts at their lowest digit and adds them. Folding superposition folds the key back and forth along the segment boundaries (odd-numbered segments in forward order, even-numbered segments reversed) and then adds the segments.

For example, with key = 12360324711202065 and a hash table length of 1000, the key is split into 3-digit segments. Here the lowest two digits, 65, are dropped. Shift superposition and folding superposition then give the hash addresses 105 and 907, respectively.

(a) Shift superposition      (b) Folding superposition

      1 2 3                        1 2 3
      6 0 3                        3 0 6
      2 4 7                        2 4 7
      1 1 2                        2 1 1
  +   0 2 0                    +   0 2 0
  ---------                    ---------
    1 1 0 5                        9 0 7
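Both variants can be sketched in Python. The function names are illustrative, and dropping an incomplete final segment is an assumption chosen to match the worked example, where the trailing 65 is not added.

```python
def split_segments(key, seg_len):
    """Split the decimal digits of key into seg_len-digit strings, left to right.

    Assumption: an incomplete final segment is dropped, as in the worked example.
    """
    digits = str(key)
    usable = len(digits) - len(digits) % seg_len
    return [digits[i:i + seg_len] for i in range(0, usable, seg_len)]


def shift_superposition(key, seg_len=3):
    """Add the segments as they are and discard any carry beyond seg_len digits."""
    total = sum(int(seg) for seg in split_segments(key, seg_len))
    return total % 10 ** seg_len


def folding_superposition(key, seg_len=3):
    """Reverse every second segment before adding, then discard the carry."""
    total = 0
    for index, seg in enumerate(split_segments(key, seg_len)):
        total += int(seg[::-1]) if index % 2 == 1 else int(seg)
    return total % 10 ** seg_len


key = 12360324711202065
print(split_segments(key, 3))        # ['123', '603', '247', '112', '020']
print(shift_superposition(key))      # 105
print(folding_superposition(key))    # 907
```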

4. Division-remainder method: Let the hash table length be m and let p be the largest prime number less than or equal to m. The hash function is

H(k) = k % p, where % is the remainder operation modulo p.

For example, suppose the elements to be hashed are (18, 75, 60, 43, 54, 90, 46) and the table length is m = 10, so p = 7. Then

H(18) = 18 % 7 = 4    H(75) = 75 % 7 = 5    H(60) = 60 % 7 = 4

H(43) = 43 % 7 = 1    H(54) = 54 % 7 = 5    H(90) = 90 % 7 = 6

H(46) = 46 % 7 = 4

There are many conflicts in this case. To reduce them, larger values of m and p can be used, such as m = p = 13, which gives:

H(18) = 18 % 13 = 5    H(75) = 75 % 13 = 10    H(60) = 60 % 13 = 8

H(43) = 43 % 13 = 4    H(54) = 54 % 13 = 2    H(90) = 90 % 13 = 12

H(46) = 46 % 13 = 7
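The two runs above can be checked with a small Python sketch. The helper names are illustrative; the prime search is a simple trial division that is adequate for table sizes like these.

```python
def largest_prime_at_most(m):
    """Return the largest prime p with p <= m (simple trial division)."""
    def is_prime(n):
        return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

    for candidate in range(m, 1, -1):
        if is_prime(candidate):
            return candidate
    raise ValueError("there is no prime less than or equal to m")


def division_hash(key, p):
    """H(k) = k % p"""
    return key % p


keys = [18, 75, 60, 43, 54, 90, 46]
for m in (10, 13):
    p = largest_prime_at_most(m)
    addresses = [division_hash(k, p) for k in keys]
    print(f"m={m} p={p} addresses={addresses} distinct={len(set(addresses))}")
# m=10 p=7 addresses=[4, 5, 4, 1, 5, 6, 4] distinct=4
# m=13 p=13 addresses=[5, 10, 8, 4, 2, 12, 7] distinct=7
```

With p = 7 the seven keys share only four addresses, while with p = 13 every key gets its own address.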

Three. Ways to handle conflicts

1. Open addressing (re-hashing): The basic idea is that when the hash address p = H(key) of a key conflicts, a second hash address p1 is generated from p; if p1 still conflicts, another hash address p2 is generated from p, and so on, until a hash address pi that does not conflict is found, and the element is stored there.

This method has a general form of re-hashing function:

Hi = (H(key) + di) % m,  i = 1, 2, ..., n

where H(key) is the hash function, m is the table length, and di is called the increment sequence. Different choices of increment sequence give different re-hashing methods.

1. Linear probing re-hashing: di = 1, 2, 3, ..., m-1

When a conflict occurs, the following cells of the table are examined one after another until an empty cell is found or the whole table has been searched.

2. Quadratic probing re-hashing: di = 1^2, -1^2, 2^2, -2^2, ..., k^2, -k^2 (k <= m/2)

When a conflict occurs, probing jumps alternately to the right and left of the original position, which is more flexible.

3. Pseudo-random probing re-hashing: di = a sequence of pseudo-random numbers.

An implementation needs a pseudo-random number generator (such as i = (i + p) % m) that is given a random number as its starting point.

4. Example: Suppose the hash table length is m = 11 and the hash function is H(key) = key % 11, so that H(47) = 3, H(26) = 4, and H(60) = 5. Suppose the next key is 69; then H(69) = 3, which conflicts with 47.

A) With linear probing re-hashing, the next hash address is H1 = (3 + 1) % 11 = 4, which still conflicts; the next is H2 = (3 + 2) % 11 = 5, which also conflicts; the next is H3 = (3 + 3) % 11 = 6, which no longer conflicts, so 69 is stored in cell 6.

Cell:   0   1   2   3   4   5   6   7   8   9   10
Key:                47  26  60  69

B) With quadratic probing re-hashing, the next hash address is H1 = (3 + 1^2) % 11 = 4, which still conflicts; the next is H2 = (3 - 1^2) % 11 = 2, which no longer conflicts, so 69 is stored in cell 2.

Cell:   0   1   2   3   4   5   6   7   8   9   10
Key:            69  47  26  60

C) With pseudo-random probing re-hashing and the pseudo-random sequence 2, 5, 9, ..., the next hash address is H1 = (3 + 2) % 11 = 5, which still conflicts; the next is H2 = (3 + 5) % 11 = 8, which no longer conflicts, so 69 is stored in cell 8.

Cell:   0   1   2   3   4   5   6   7   8   9   10
Key:                47  26  60          69
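The three probing strategies differ only in their increment sequence, which a small Python sketch can make explicit. The function names are illustrative, and the pseudo-random case simply reuses the sequence 2, 5, 9 given in the example instead of a real generator.

```python
def linear_increments(m):
    """d_i = 1, 2, ..., m-1"""
    return range(1, m)


def quadratic_increments(m):
    """d_i = 1^2, -1^2, 2^2, -2^2, ..., k^2, -k^2 with k <= m/2"""
    for k in range(1, m // 2 + 1):
        yield k * k
        yield -(k * k)


def insert(table, key, increments):
    """Store key using H_i = (H(key) + d_i) % m and return the cell that was used."""
    m = len(table)
    home = key % m                     # H(key) = key % m, as in the example
    for d in [0, *increments]:         # d = 0 tries the home address first
        addr = (home + d) % m          # Python's % keeps negative offsets in range
        if table[addr] is None:
            table[addr] = key
            return addr
    raise RuntimeError("no free cell found along this probe sequence")


def example_table():
    """Table of length 11 already holding 47, 26 and 60 in cells 3, 4 and 5."""
    table = [None] * 11
    for key in (47, 26, 60):
        insert(table, key, linear_increments(11))
    return table


print(insert(example_table(), 69, linear_increments(11)))     # 6
print(insert(example_table(), 69, quadratic_increments(11)))  # 2
print(insert(example_table(), 69, [2, 5, 9]))                 # 8
```

Keeping the increment sequence as a separate argument mirrors the general form Hi = (H(key) + di) % m: the insertion loop stays the same and only di changes.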

2. Rehashing method: This method constructs several different hash functions in advance:

Hi = RHi(key),  i = 1, 2, ..., k

When the hash address H1 = RH1(key) conflicts, H2 = RH2(key) is computed, and so on, until no conflict occurs. This method is less prone to clustering, but it increases computation time.

3. Chaining method (the zipper method; the conflict handling used by HashMap):

Basic idea: All elements whose hash address is i are linked into a singly linked list called a synonym chain, and the head pointer of that list is stored in cell i of the hash table. Lookup, insertion, and deletion therefore take place mainly within a synonym chain. The chaining method suits situations where insertions and deletions are frequent.

For example, given the key list (32, 40, 36, 53, 16, 46, 71, 27, 42, 24, 49, 64), a hash table length of 13, and the hash function H(key) = key % 13, the result of handling conflicts with the chaining method is shown in Figure 8.27:

Location   Entry
0
1          --> 40 --> 27 --> 53
2
3          --> 16 --> 42
4
5
6          --> 32 --> 71
7          --> 46
8
9
10         --> 36 --> 49
11         --> 24
12         --> 64
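The chains in this example can be rebuilt with a short Python sketch. It uses Python lists in place of linked lists and appends each new key to the tail of its chain; both are simplifying assumptions, so the order of keys inside a chain may differ from the figure even though each chain holds the same keys.

```python
from collections import Counter


def build_chains(keys, m):
    """Group keys into m synonym chains by H(key) = key % m."""
    table = [[] for _ in range(m)]
    for key in keys:
        table[key % m].append(key)   # assumption: new keys go to the tail of the chain
    return table


def position_counts(table):
    """Count how many keys sit at the 1st, 2nd, 3rd, ... position of a chain."""
    return Counter(pos for chain in table for pos, _ in enumerate(chain, start=1))


keys = [32, 40, 36, 53, 16, 46, 71, 27, 42, 24, 49, 64]
table = build_chains(keys, 13)
for slot, chain in enumerate(table):
    print(slot, " --> ".join(str(k) for k in chain))

counts = position_counts(table)
print(dict(counts))                                   # {1: 7, 2: 4, 3: 1}
print(sum(pos * n for pos, n in counts.items()))      # 18 comparisons to find every key once
```

The position counts are the terms 1*7, 2*4 and 3*1 that appear in the average search length calculation for this example.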


The average search length in this example, taken over the 12 keys, is ASL = (1*7 + 2*4 + 3*1) / 12 = 1.5.

4. Public overflow area: The basic idea of this method is to divide the hash table into a basic table and an overflow table; every element that conflicts with an element in the basic table is placed in the overflow table.
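A minimal Python sketch of a public overflow area might look like the following. The class name, the hash function key % m, and the sequential search of the overflow table are all assumptions made for illustration; the text above only states that conflicting elements go into the overflow table.

```python
class OverflowHashTable:
    """Basic table plus one shared overflow list (illustrative sketch)."""

    def __init__(self, m):
        self.base = [None] * m     # basic table, one key per cell
        self.overflow = []         # shared overflow table for all conflicting keys

    def insert(self, key):
        slot = key % len(self.base)            # assumed hash function: key % m
        if self.base[slot] is None:
            self.base[slot] = key
        elif self.base[slot] != key and key not in self.overflow:
            self.overflow.append(key)          # conflict: store in the overflow table

    def contains(self, key):
        slot = key % len(self.base)
        # Check the basic table first, then scan the overflow table sequentially.
        return self.base[slot] == key or key in self.overflow


table = OverflowHashTable(11)
for key in (47, 26, 60, 69):
    table.insert(key)
print(table.base[3], table.overflow)           # 47 [69]
print(table.contains(69), table.contains(58))  # True False
```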