A hash table stores a series of unordered records in an ordered way, mainly to solve the poor lookup efficiency of linked-list structures.
We often use the Hashtable structure: when storing key-value pairs, a Hashtable gives much better lookup performance than an ArrayList.
1) Why does a hash give high performance for key-value lookups?
If you have studied data structures, you know that in linear lists and trees the relative position of a record within the structure is arbitrary: there is no fixed relationship between a record and its keyword. Finding a record therefore requires a series of keyword comparisons; this is comparison-based searching. In .NET, collection structures such as Array, ArrayList, and List store data this way.
For example, suppose we have data for a class of students, including name, gender, age, student number and so on. The data is:
Name | Gender | Age | Student Number
Tom | Male | 15 | 1
John Doe | Female | 14 | 2
Harry | Male | 14 | 3
If we look up by name, suppose the lookup function is FindByName(string name);
1) Find "Tom"
One comparison is enough: the match succeeds in the first row.
2) Find "Harry"
Match against the first row, fail;
match against the second row, fail;
match against the third row, success.
The two cases above are the best case and the worst case respectively, so the average number of comparisons is (1+3)/2 = 2, i.e. (total number of records + 1) / 2.
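The best and worst cases above can be checked with a small sketch. Python is used here purely for illustration; the `find_by_name` helper and the record layout are assumptions, not part of any .NET API, and only the name and student number fields are kept.

```python
# Hypothetical in-memory copy of the student table (name and number only).
students = [
    {"name": "Tom", "number": 1},
    {"name": "John Doe", "number": 2},
    {"name": "Harry", "number": 3},
]


def find_by_name(records, name):
    """Scan the records in order; return (record, comparisons made)."""
    comparisons = 0
    for record in records:
        comparisons += 1
        if record["name"] == name:
            return record, comparisons
    return None, comparisons


_, best = find_by_name(students, "Tom")     # first row: 1 comparison
_, worst = find_by_name(students, "Harry")  # last row: 3 comparisons
average = (best + worst) / 2                # (1 + 3) / 2 = 2.0
```

The number of comparisons grows with the position of the record, which is exactly the cost a hash table tries to avoid.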
Although some optimized algorithms (binary search on sorted data, for example) can improve lookup efficiency, the complexity still stays around log2(n).
How can we search faster? The effect we want is to locate the record's position in one step, with a time complexity of O(1), the fastest possible. Suppose we assign a sequence number to each record, store the records in order of that number, and know the rule used to number them. To look up a record, we first compute its number from the rule and then use that number to go straight to the record in the linear sequence of records.
Note that the description above contains two concepts. The rule that assigns numbers to students is called a hash function in data structures; the structure that holds the students in the order given by the rule is called a hash table.
Take the students above as an example again, and assume the student number is the rule. The teacher has a rule table and seats the students according to it. To find John Doe, the teacher first applies the rule: John Doe's number is 2, so he is in seat 2. The teacher walks straight over: "John Doe, ha, there you are."
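The seating example can be sketched as follows. This is a minimal illustration, assuming that the rule is simply "student number n sits in seat n"; the names `seat_of` and `seats` are made up for the example.

```python
def seat_of(student_number):
    """The rule (hash function): student number n sits in seat n."""
    return student_number


# The seating plan (hash table). Seat 0 is left unused so that
# seat numbers match student numbers 1..3.
seats = [None] * 4
for name, number in [("Tom", 1), ("John Doe", 2), ("Harry", 3)]:
    seats[seat_of(number)] = name

# Lookup: apply the rule, then go straight to the seat. No scanning.
found = seats[seat_of(2)]
```

Here `found` is "John Doe", reached with a single index operation regardless of how many students there are.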
Look at the general flow:
From the figure above, you can see that a hash table can be described as two collections, one holding the position numbers and one holding the records, plus a set of rules that describes the relationship between records and numbers. How is this rule usually made?
a) Direct addressing method:
In the previous article comparing GetHashCode() performance, I mentioned that the GetHashCode() function of integer data returns the integer itself. This is in fact the direct addressing method. For example, there is a set of data from 0 to 100 that represents people's ages,
so the hash table built with the direct addressing method looks like this:
Address | 0 | 1 | 2 | 3 | 4 | 5 | .....
Age | 0 years old | 1 year old | 2 years old | 3 years old | 4 years old | 5 years old | .....
This way of addressing is simple and convenient. It is applicable when the original data can be expressed as numbers or has a distinct sequential relationship.
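A minimal sketch of direct addressing for the age data, assuming ages run from 0 to 100 and that each slot simply counts how many people have that age (the counting is an illustrative choice, not something the method requires):

```python
def direct_address(age):
    """Direct addressing: the key itself is the address."""
    return age


table = [0] * 101  # one slot per possible age, addresses 0..100

for age in [15, 14, 14, 0, 100]:  # hypothetical sample ages
    table[direct_address(age)] += 1

people_aged_14 = table[direct_address(14)]
```

No two different ages can ever share a slot, but the table needs one slot for every possible key, whether or not it is used.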
b) Digit analysis method:
There is a set of data that describes the dates of birth of some people:
Year | Month | Day
75 | 10 | 1
75 | 12 | 10
75 | 02 | 14
On analysis, the leading digits (the year and the start of the month) are basically the same, which gives a very high probability of collision, while the last three digits differ considerably, so the last three digits are used as the hash address.
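As a rough sketch of this idea, assume each date is written as a six-digit string YYMMDD (this encoding is an assumption; the text does not specify one) and the last three digits are kept:

```python
def digit_analysis_hash(year, month, day):
    """Keep only the last three digits of the YYMMDD form of the date."""
    key = f"{year:02d}{month:02d}{day:02d}"  # e.g. 75, 10, 1 -> "751001"
    return int(key[-3:])


addresses = [
    digit_analysis_hash(75, 10, 1),   # "751001" -> 1
    digit_analysis_hash(75, 12, 10),  # "751210" -> 210
    digit_analysis_hash(75, 2, 14),   # "750214" -> 214
]
```

The three sample dates land on three different addresses, whereas a hash built from the leading digits would map all of them to nearly the same value.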
c) Mid-square method
Square the keyword and take the middle digits of the result as the hash address.
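A small sketch of the mid-square idea. How many middle digits to keep, and how to centre them when the square has an odd or even number of digits, are choices made here for illustration only:

```python
def mid_square_hash(key, digits=3):
    """Square the key and return its middle `digits` digits as an integer."""
    squared = str(key * key)
    # Start index that roughly centres a window of `digits` characters.
    start = max(0, (len(squared) - digits) // 2)
    return int(squared[start:start + digits])


address = mid_square_hash(1234)  # 1234 * 1234 = 1522756 -> middle "227"
```

Because the middle digits of a square depend on every digit of the key, keys that differ only slightly tend to get different addresses.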
d) Folding Method:
Divide the keyword into parts with the same number of digits (one part may be shorter), then add the parts together and discard the carry; the result is the hash address. For example, take the data 20-1445-4547-3, split into four-digit parts:
  5473
+ 4454
+  201
= 10128
Discard the carry 1 and take 0128 as the hash address.
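The folding calculation above can be reproduced with a short sketch. The function name and the decision to cut the four-digit parts from the right-hand end are assumptions chosen to match the worked sum:

```python
def folding_hash(key, part_len=4):
    """Split the digits of `key` into parts, add them, and drop the carry."""
    digits = "".join(ch for ch in str(key) if ch.isdigit())
    total = 0
    # Cut parts from the right so that the short leftover part is the leading one.
    for end in range(len(digits), 0, -part_len):
        start = max(0, end - part_len)
        total += int(digits[start:end])
    # Keeping only the lowest `part_len` digits discards the carry.
    return total % (10 ** part_len)


address = folding_hash("20-1445-4547-3")  # 5473 + 4454 + 201 = 10128 -> 128
formatted = f"{address:04d}"              # "0128"
```

The integer result is 128; formatting it with four digits gives the address 0128 from the example.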
e) Division remainder method
Divide the keyword by a number p that is no larger than the hash table length m, and take the remainder as the hash address: H(key) = key MOD p (p <= m).
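A minimal sketch of H(key) = key MOD p. The table length m = 16 and the divisor p = 13 are hypothetical values; choosing a prime for p is a common convention rather than something stated in the text:

```python
def mod_hash(key, p):
    """Division remainder method: H(key) = key MOD p."""
    return key % p


m = 16  # hypothetical hash table length
p = 13  # hypothetical divisor, p <= m

table = [None] * m
for key in [100, 27, 5]:
    table[mod_hash(key, p)] = key  # 100 -> 9, 27 -> 1, 5 -> 5

# Two different keys can share an address: 14 MOD 13 is also 1.
collides_with_27 = mod_hash(14, p) == mod_hash(27, p)
```

The last line shows that this rule can send different keys to the same address, so a real table also needs a way to handle collisions.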
f) Random number method
Select a random function and take the random function value of the keyword as its hash address, i.e. H(key) = random(key), where random is a random function. This method is usually used when the keywords have unequal lengths.
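One way to read H(key) = random(key) is as a pseudo-random generator seeded with the key, so that the same key always produces the same address. The sketch below assumes that interpretation and uses Python's standard `random.Random`; the table length of 100 is hypothetical:

```python
import random


def random_hash(key, m):
    """Seed a private generator with the key and draw an address in [0, m)."""
    rng = random.Random(key)  # same seed -> same sequence -> same address
    return rng.randrange(m)


m = 100  # hypothetical hash table length
first = random_hash("Tom", m)
second = random_hash("Tom", m)           # repeatable for the same key
other = random_hash("a much longer key", m)  # key length does not matter
```

A hash must be repeatable, which is why the generator is seeded from the key instead of from the clock.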
In summary, a hash function is a rule that, through some kind of conversion, spreads keywords reasonably evenly across a sequential structure of a specified size. The more dispersed the keywords, the lower the time complexity of later lookups, and the higher the space complexity.
2) What do we pay for using a hash?
Hashing is a typical space-for-time algorithm. Take an array of length 100: to find something in it, we only need to traverse it and match the corresponding record. In terms of space complexity, if the array stores byte-type data, it occupies 100 bytes. Now suppose we use a hash algorithm. As said before, a hash must have a rule that constrains the relationship between a key and its storage location, so a fixed-length hash table is needed. The data is still a 100-byte array; assuming we need another 100 bytes to record the relationship between keys and positions, the total space is 200 bytes.
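The 100-byte versus 200-byte comparison can be made concrete with a toy sketch. It assumes 100 distinct one-byte records and a second 100-byte table that maps each key to its position; both the data and the mapping rule are made up for illustration:

```python
# 100 one-byte records, stored in an arbitrary (here: reversed) order.
data = bytes(reversed(range(100)))

# Extra 100-byte table recording where each key lives: index[key] = position.
index = bytearray(100)
for position, value in enumerate(data):
    index[value] = position

total_bytes = len(data) + len(index)  # 100 + 100 = 200

# Lookup of key 42 without scanning: one step through the index.
position_of_42 = index[42]
value_found = data[position_of_42]
```

The lookup no longer scans the array, and the price is the second table: twice the memory for the same records.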