[Data structure] Hash table

Source: Internet
Author: User
There is nothing original here. This content is mainly based on the "Software Designer Tutorial".
I want to highlight one data structure: the hash table. It is designed for fast access and is a typical "space for time" trade-off. It can be understood as a linear table whose elements are not packed closely together but may have gaps between them. For example, to store 70 elements we might allocate room for 100. The ratio 70/100 = 0.7 is called the load factor. The gaps are the price paid for fast access.
A fixed function H, the hash function, assigns a storage location to each element. H should spread its results as randomly and evenly as possible, so that an element can be found without a linear scan of the whole table. This randomness inevitably leads to conflicts (collisions). A conflict occurs when H maps two elements to the same address, and two such elements are called "synonyms". This is similar to 70 people walking into a restaurant with 100 chairs: two of them may well pick the same chair.
The result of the hash function is the address of a storage unit, and each storage unit is called a "bucket". If a hash table has m buckets, the range of the hash function should be [0, m-1].
Resolving conflicts is a complex problem. How often conflicts occur mainly depends on: (1) The hash function. The values of a good hash function should be distributed as evenly as possible. (2) The conflict handling method. (3) The load factor. A larger load factor is not necessarily better, because conflicts become more frequent, while a load factor that is too small wastes a lot of space. The choice of load factor is also linked to the hash function.
Methods for resolving conflicts:
(1) Linear probing: after a conflict, probe forward one position at a time until the nearest empty position is found. The disadvantage is clustering: keys that are not synonyms may also end up in each other's probe sequences, which slows down access.
(2) Double hashing: after a conflict at position d, use a second hash function to generate a number c that is coprime with the table size m, then probe (d + n * c) % m for n = 1, 2, ... in turn, so that the probe sequence jumps around the table.
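The difference between the two probing rules can be sketched in a few lines of C. The function names linear_probe and double_hash_probe are made up for this illustration and do not come from the source. The assert calls only guard against meaningless arguments.

```c
#include <assert.h>

/* n-th probe position under linear probing: the step is always 1. */
int linear_probe(int d, int n, int m)
{
    assert(m > 0 && d >= 0 && n >= 0);
    return (d + n) % m;
}

/* n-th probe position under double hashing: the step c comes from a
   second hash function and should be coprime with the table size m. */
int double_hash_probe(int d, int c, int n, int m)
{
    assert(m > 0 && d >= 0 && c > 0 && n >= 0);
    return (d + n * c) % m;
}
```

For example, with m = 47, d = 4 and c = 22, linear probing visits 4, 5, 6, 7 for n = 0..3, while double hashing visits 4, 26, 1, 23. Keys that collide at the same position but have different steps quickly go separate ways, which is what reduces clustering.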

In the following example, a hash table is built for the 32 keywords of the C language. For simplicity, each bucket holds one element and the load factor is about 0.7. Since 32 / 0.7 ≈ 45.7, the table size is rounded up to 47, which is prime. Conflicts are resolved by double hashing. The modulus of the step function H2 is 43, which is coprime with 47. Because 47 is prime, every step value from 1 to 43 is coprime with the table size.
The hash function H1 splits the key into segments of three letters, treats each letter as one byte, folds the segments together by adding them up, and takes the remainder modulo the table size. H2 does the same with segments of two letters.
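To make the folding arithmetic concrete, here is a small hand-worked sketch for two keys. The helper names pack_segment, h1_auto and h2_if are hypothetical, and the numbers in the comments assume ASCII character codes.

```c
#include <assert.h>
#include <string.h>

/* Pack up to 3 letters into one number, one byte per letter. */
long pack_segment(const char *seg)
{
    long d = 0;
    size_t i, len = strlen(seg);
    assert(len <= 3);
    for (i = 0; i < len; i++)
        d = (d << 8) + (unsigned char)seg[i];
    return d;
}

/* H1 for "auto", written out by hand: 3-letter segments "aut" and "o". */
int h1_auto(void)
{
    long k = pack_segment("aut") + pack_segment("o"); /* 6387060 + 111 */
    return (int)(k % 47);                             /* 6387171 % 47 */
}

/* H2 for "if", written out by hand: a single 2-letter segment "if". */
int h2_if(void)
{
    long k = pack_segment("if");                      /* 26982 */
    return (int)(k % 43 + 1);                         /* 21 + 1 */
}
```

Under these assumptions h1_auto() gives 12, which agrees with the position listed for auto in the results further down, and h2_if() gives a step of 22, which agrees with if moving from position 4 to position 26.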

Code
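#define P1 47   /* table size, modulus of H1 */
#define P2 43   /* modulus of H2 */

/* Compute a hash code: fold the key in segments of iSection bytes,
   sum the segments, then reduce modulo iBase and add offset. */
int GetHashCode(char *key, int iSection, int iBase, int offset)
{
    long k = 0, d;
    int c;
    while (*key)
    {
        /* Pack up to iSection letters into d, one byte per letter. */
        for (d = 0, c = 0; *key != '\0' && c < iSection; c++)
        {
            d = (d << 8) + (*key++);
        }
        k += d;
    }
    return (k % iBase + offset);
}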

/* Double hash functions */
int H1(char *key)
{
    return GetHashCode(key, 3, P1, 0);   /* first probe position: 0..46 */
}
int H2(char *key)
{
    return GetHashCode(key, 2, P2, 1);   /* probe step: 1..43 */
}

The 32 keywords. Note that volatile is perhaps one of the less commonly seen keywords. It indicates that a variable may change at any time, and it prevents the compiler from optimizing away accesses to it.
Code
/* N is the table size (47 in this example) and LEN is the bucket width;
   their definitions are not shown in the source. */
char tbl[N][LEN];
char *kWord[] =
{
    "auto", "break", "case", "char", "const",
    "continue", "default", "do", "double", "else",
    "enum", "extern", "float", "for", "goto",
    "if", "int", "long", "register", "return",
    "short", "signed", "sizeof", "static", "struct",
    "switch", "typedef", "union", "unsigned", "void",
    "volatile", "while"
};

Code for storing the keywords in the hash table. The search code is similar. The count array records the number of conflicts at each position.
Code
for (i = 0; i < sizeof(kWord) / sizeof(kWord[0]); i++)
{
    pos = H1(kWord[i]);
    c = H2(kWord[i]);
    /* Probe until an empty bucket, or the key itself, is found. */
    while (tbl[pos][0] != '\0' && strcmp(tbl[pos], kWord[i]))
    {
        count[pos]++;
        pos = (pos + c) % N;
        printf(", %d", pos);
    }
    strcpy(tbl[pos], kWord[i]);
}
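
The source says the search code is similar to the storing code but does not show it. The following is a minimal sketch of what such a lookup might look like, not the author's version. The names TABLE_SIZE, STEP_BASE, KEY_LEN, fold_hash, table_insert and table_search are assumptions made for this illustration, and the bucket width of 16 is a guess. fold_hash repeats the folding scheme of GetHashCode so that the block stands on its own.

```c
#include <assert.h>
#include <string.h>

#define TABLE_SIZE 47   /* plays the role of P1 / N in the text */
#define STEP_BASE  43   /* plays the role of P2 in the text */
#define KEY_LEN    16   /* assumed bucket width, not given in the source */

static char table[TABLE_SIZE][KEY_LEN];   /* zero-initialized: all empty */

/* Same folding scheme as GetHashCode in the text. */
static int fold_hash(const char *key, int section, int base, int offset)
{
    long k = 0, d;
    int c;
    while (*key) {
        for (d = 0, c = 0; *key != '\0' && c < section; c++)
            d = (d << 8) + (unsigned char)*key++;
        k += d;
    }
    return (int)(k % base + offset);
}

/* Store key; returns its bucket index, or -1 if it cannot be stored. */
int table_insert(const char *key)
{
    int pos, step, n;
    assert(key != NULL);
    if (strlen(key) >= KEY_LEN)
        return -1;
    pos = fold_hash(key, 3, TABLE_SIZE, 0);
    step = fold_hash(key, 2, STEP_BASE, 1);
    for (n = 0; n < TABLE_SIZE; n++) {
        if (table[pos][0] == '\0') {
            strcpy(table[pos], key);
            return pos;
        }
        if (strcmp(table[pos], key) == 0)
            return pos;                   /* already stored */
        pos = (pos + step) % TABLE_SIZE;
    }
    return -1;                            /* table is full */
}

/* Look up key; returns its bucket index, or -1 if it is not stored. */
int table_search(const char *key)
{
    int pos, step, n;
    assert(key != NULL);
    pos = fold_hash(key, 3, TABLE_SIZE, 0);
    step = fold_hash(key, 2, STEP_BASE, 1);
    for (n = 0; n < TABLE_SIZE; n++) {
        if (table[pos][0] == '\0')
            return -1;                    /* empty bucket ends the probe */
        if (strcmp(table[pos], key) == 0)
            return pos;
        pos = (pos + step) % TABLE_SIZE;
    }
    return -1;
}
```

The search follows exactly the same probe sequence as the insertion and stops at the first empty bucket, because a stored key can never lie beyond an empty bucket on its own probe sequence (as long as nothing is ever deleted). Storing "for" and then "if" puts them at positions 4 and 26, and searching for either returns the same index.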

------------------------------------------------------------
During storage, the probe sequence of each element is as follows (the numbers are array indices):
auto: 12
break: 11
case: 40
char: 41
const: 33
continue: 45
default: 37
do: 2
double: 15
else: 25
enum: 30
extern: 34
float: 23
for: 4
goto: 41, 37, 33, 29
if: 4, 26
int: 39
long: 23, 13
register: 28
return: 16
short: 13, 25, 37, 2, 14
signed: 25, 0
sizeof: 0, 43
static: 18
struct: 38
switch: 15, 18, 21
typedef: 26, 7
union: 6
unsigned: 11, 38, 18, 45, 25, 5
void: 7, 31
volatile: 33, 17
while: 32

As can be seen, the longest probe sequence visits 6 positions (for unsigned).
-----------------------------------------------------
The distribution of the elements in the hash table is as follows:
0: signed 2: do 4: for 5: unsigned 6: union
7: typedef 11: break 12: auto 13: long 14: short
15: double 16: return 17: volatile 18: static 21: switch
23: float 25: else 26: if 28: register 29: goto
30: enum 31: void 32: while 33: const 34: extern
37: default 38: struct 39: int 40: case 41: char
43: sizeof 45: continue

The number of conflicts at each position is as follows:
count[0] = 1 count[2] = 1 count[4] = 1 count[7] = 1 count[11] = 1
count[13] = 1 count[15] = 1 count[18] = 2 count[23] = 1 count[25] = 3
count[26] = 1 count[33] = 2 count[37] = 2 count[38] = 1 count[41] = 1
count[45] = 1
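
These counts can be turned into a single figure: the average number of buckets examined per stored key. The helper average_probes below is a hypothetical name for this sketch. It assumes that every key examines one bucket plus one more for each conflict recorded while it was being stored.

```c
#include <assert.h>

/* Average number of buckets examined per stored key: each key examines
   one bucket, plus one more for every conflict recorded in count[]. */
double average_probes(const int *count, int buckets, int keys)
{
    int conflicts = 0, i;
    assert(count != 0 && buckets > 0 && keys > 0);
    for (i = 0; i < buckets; i++)
        conflicts += count[i];
    return (double)(keys + conflicts) / keys;
}
```

For the table above, the counts add up to 21 conflicts for 32 keys, so the average is (32 + 21) / 32, roughly 1.66 buckets per key.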
