The HashMap data structure is a hash table.
HashMap
1) Array: addresses are contiguous, so lookup is fast, but it takes up a lot of memory
2) Linked list: addresses are not contiguous, which saves space; lookup is slower than in an array, but insertion and deletion are faster
A hash table combines the advantages of both data structures.
The array provides the address space: the hash function maps each key to an address, and that address is an index into the array.
The linked list resolves conflicts: different keys may be mapped to the same address by the hash function, so the entries that share an address are chained together, with the most recently inserted entry placed at the head of the list.
Hash functions:
(1) Direct addressing method
The key itself, or the value of a linear function of the key, is taken as the hash address, i.e.:
H(key) = key or H(key) = a * key + b
where a and b are constants.
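As a minimal sketch of direct addressing in C, the function below evaluates H(key) = a * key + b. The name direct_hash and the way a and b are passed as parameters are illustrative assumptions, not part of the original text.

```c
#include <stddef.h>

/* Direct addressing: the hash address is a linear function of the key.
 * direct_hash is a hypothetical name; a and b are the constants from
 * H(key) = a * key + b. With a = 1 and b = 0 the key is its own address. */
static size_t direct_hash(unsigned key, unsigned a, unsigned b)
{
    return (size_t)a * key + b;
}
```

Because every key gets its own address, this method never produces a conflict, but the table has to be as large as the range of possible addresses.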
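(2) Digit analysis method
When the set of keys is known in advance, the digit positions whose values are most evenly distributed across the keys are selected to form the hash address.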
(3) Mid-square method
The middle digits of the squared key are taken as the hash address.
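One possible way to write the mid-square method in C is sketched below, working on decimal digits. The name mid_square_hash and the ndigits parameter are assumptions made for this illustration; the sketch assumes 1 <= ndigits <= 9 so the result fits in an unsigned int.

```c
/* Mid-square method on decimal digits (illustrative sketch).
 * Squares the key and keeps ndigits digits from the middle of the square.
 * Assumes 1 <= ndigits <= 9. */
static unsigned mid_square_hash(unsigned key, unsigned ndigits)
{
    unsigned long long sq = (unsigned long long)key * key;

    /* Count the decimal digits of the square. */
    unsigned len = 1;
    for (unsigned long long t = sq; t >= 10; t /= 10)
        len++;

    /* Nothing to trim if the square is already short enough. */
    if (len <= ndigits)
        return (unsigned)sq;

    /* Drop half of the surplus digits from the low end ... */
    unsigned drop = (len - ndigits) / 2;
    for (unsigned i = 0; i < drop; i++)
        sq /= 10;

    /* ... and keep only the next ndigits digits. */
    unsigned long long mod = 1;
    for (unsigned i = 0; i < ndigits; i++)
        mod *= 10;
    return (unsigned)(sq % mod);
}
```

For example, 1234 squared is 1522756, and keeping three middle digits gives 227.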
(4) Folding method
Divide the key into parts with the same number of digits (the last part may be shorter), then take the sum of those parts, discarding any carry, as the hash address.
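The folding method can be sketched in C roughly as follows. The name fold_hash and the ndigits parameter are hypothetical; this version splits the decimal key into groups starting from the low-order end, so it is the leading group that may be shorter, and it assumes 1 <= ndigits <= 9.

```c
/* Folding method on decimal digits (illustrative sketch).
 * Splits the key into groups of ndigits digits, adds the groups,
 * and discards any carry beyond ndigits digits.
 * Assumes 1 <= ndigits <= 9. */
static unsigned fold_hash(unsigned long long key, unsigned ndigits)
{
    unsigned long long mod = 1;
    for (unsigned i = 0; i < ndigits; i++)
        mod *= 10;

    unsigned long long sum = 0;
    while (key > 0) {
        sum += key % mod;   /* add the lowest remaining group */
        key /= mod;         /* move on to the next group */
    }
    return (unsigned)(sum % mod);   /* discard the carry */
}
```

With key 123456789 and three-digit groups, the parts are 789, 456 and 123; their sum is 1368, and discarding the carry leaves 368.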
(5) Division remainder method
The remainder obtained by dividing the key by a number p that is no larger than the hash table length m is taken as the hash address, i.e.:
H(key) = key MOD p, p ≤ m
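A small C sketch of the division remainder method follows. One detail worth showing is that the C % operator can return a negative result for a negative key, so the sketch shifts such results back into the range [0, p-1]. The name mod_hash is an assumption, and p is assumed to be positive.

```c
/* Division remainder method: H(key) = key MOD p (illustrative sketch).
 * In C, key % p is negative when key is negative, so the result is
 * adjusted to stay a valid array index. Assumes p > 0. */
static int mod_hash(int key, int p)
{
    int r = key % p;
    return r < 0 ? r + p : r;
}
```

In practice p is often chosen to be a prime no larger than m, since that tends to spread keys with regular patterns more evenly.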
(6) Random number method
Choose a random function and take its value for the key as the hash address, i.e.:
H(key) = random(key)
where random is a pseudo-random function.
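As a rough illustration of the random number method, the C sketch below uses one step of a linear congruential generator seeded with the key as the random function. The name random_hash, the choice of generator, and its constants are assumptions standing in for whatever pseudo-random function is actually used; m is assumed to be greater than 0.

```c
#include <stdint.h>

/* Random number method (illustrative sketch): the key seeds a
 * pseudo-random step, and the result is reduced to the table length m.
 * The same key always yields the same address. Assumes m > 0. */
static uint32_t random_hash(uint32_t key, uint32_t m)
{
    /* One linear congruential step; the constants are a commonly used
     * 32-bit LCG parameter pair, chosen here only for illustration. */
    uint32_t x = key * 1664525u + 1013904223u;

    /* Use the higher bits, which are less regular than the low bits. */
    return (x >> 16) % m;
}
```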
Handling collisions
Different keys may produce the same hash address, that is, key1 ≠ key2 but H(key1) = H(key2); this is called a collision (conflict). Keys with the same hash value are called synonyms under that hash function.
In general, a hash function is a compressing mapping: the set of possible keys is larger than the set of addresses, so collisions are unavoidable. When you build a hash table you therefore need not only a good hash function but also a method for handling collisions.
Common methods of handling collisions are:
(1) Open addressing method
Hi = (H(key) + di) MOD m, i = 1, 2, ..., k (k ≤ m-1)
where H(key) is the hash function, m is the hash table length, and di is an increment sequence, which can be chosen in one of three ways:
1) di = 1, 2, 3, ..., m-1, called linear probing;
2) di = 1^2, -1^2, 2^2, -2^2, 3^2, ..., ±k^2 (k ≤ m/2), called quadratic probing;
3) di = a pseudo-random number sequence, called pseudo-random probing.
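The open addressing formula with linear probing (di = i) could look like this in C. The names TABLE_LEN, EMPTY, hash_addr, table_init, probe_insert and probe_find are all hypothetical, and the sketch assumes non-negative integer keys so that -1 can mark an unused slot.

```c
#define TABLE_LEN 11    /* hash table length m; illustrative value */
#define EMPTY     (-1)  /* marks an unused slot; keys must be >= 0 */

/* H(key): division remainder method. */
static int hash_addr(int key)
{
    return key % TABLE_LEN;
}

/* Mark every slot as unused. */
static void table_init(int table[TABLE_LEN])
{
    for (int i = 0; i < TABLE_LEN; i++)
        table[i] = EMPTY;
}

/* Insert with linear probing: try (H(key) + i) MOD m for i = 0, 1, ...
 * Returns the slot used, or -1 if the table is full. */
static int probe_insert(int table[TABLE_LEN], int key)
{
    for (int i = 0; i < TABLE_LEN; i++) {
        int h = (hash_addr(key) + i) % TABLE_LEN;
        if (table[h] == EMPTY || table[h] == key) {
            table[h] = key;
            return h;
        }
    }
    return -1;
}

/* Follow the same probe sequence; an empty slot means the key is absent.
 * Returns the slot holding the key, or -1 if it is not in the table. */
static int probe_find(const int table[TABLE_LEN], int key)
{
    for (int i = 0; i < TABLE_LEN; i++) {
        int h = (hash_addr(key) + i) % TABLE_LEN;
        if (table[h] == EMPTY)
            return -1;
        if (table[h] == key)
            return h;
    }
    return -1;
}
```

Inserting 3, 14 and 25 in that order places them in slots 3, 4 and 5, because all three keys hash to address 3. Switching to quadratic probing would only change how the offset added to hash_addr(key) is computed. Deletion is not handled here; with open addressing it needs a separate "deleted" marker so that later lookups do not stop early.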
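(2) Rehashing method
Hi = RHi(key), i = 1, 2, ..., k
where the RHi are different hash functions: when a collision occurs, the next hash function is tried, until an address without a collision is found.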
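A minimal C sketch of the rehashing idea is shown below: a list of hash functions is tried in order until one of them yields a free slot. The three functions rh1, rh2 and rh3 and the names TABLE_LEN, EMPTY, table_clear and rehash_insert are made up for this illustration, and the keys are assumed to be small non-negative integers.

```c
#include <stddef.h>

#define TABLE_LEN 11    /* hash table length; illustrative value */
#define EMPTY     (-1)  /* marks an unused slot; keys must be >= 0 */

/* Three different hash functions RH1..RH3, chosen only as examples. */
static int rh1(int key) { return key % TABLE_LEN; }
static int rh2(int key) { return (key / TABLE_LEN) % TABLE_LEN; }
static int rh3(int key) { return (key * 7 + 3) % TABLE_LEN; }

/* Mark every slot as unused. */
static void table_clear(int table[TABLE_LEN])
{
    for (int i = 0; i < TABLE_LEN; i++)
        table[i] = EMPTY;
}

/* Try each hash function in turn until a free slot is found.
 * Returns the slot used, or -1 if every function collides. */
static int rehash_insert(int table[TABLE_LEN], int key)
{
    int (*rh[])(int) = { rh1, rh2, rh3 };

    for (size_t i = 0; i < sizeof rh / sizeof rh[0]; i++) {
        int h = rh[i](key);
        if (table[h] == EMPTY || table[h] == key) {
            table[h] = key;
            return h;
        }
    }
    return -1;
}
```

With only a fixed number of hash functions an insert can fail even when the table still has free slots, which is one reason this method costs more computation and is usually combined with a fallback.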
(3) Chaining method
All synonyms are stored in the same linked list. Assuming the hash function produces addresses in the interval [0, m-1], set up a vector of pointers, void *vec[m], with every component initialized to a null pointer. Every data element whose hash address is i is inserted into the linked list whose head pointer is vec[i]. The insertion position can be the head or the tail of the list, or somewhere in the middle so that the synonyms in a list stay sorted by key.
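(4) Public overflow area
In addition to the base table, a separate overflow table is set up, and every element that collides with an occupied slot in the base table is stored in that shared overflow table.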
In practice, the division remainder method is commonly used as the hash function, together with the chaining method for handling collisions:
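#include <stddef.h>

#define LEN 16  /* number of buckets; assumed value, the original does not define LEN */

struct Hash_node {
    int count;
    struct Hash_node *next;
};

/* Division remainder method; assumes num is non-negative. */
static int hash(int num)
{
    return num % LEN;
}

/* Chaining: insert the node new at the head of the list in bucket elem.
 * This also covers an empty bucket, where vec[elem] is NULL. */
static void collision(struct Hash_node *vec[], int elem, struct Hash_node *new)
{
    new->next = vec[elem];
    vec[elem] = new;
}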
hash is the hash function, and the collision function handles collisions by inserting the new node at the head of the list for its bucket.
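To show how a chained table is used for both insertion and lookup, here is a self-contained C sketch. It does not reuse the struct above: the struct entry type with separate key and value fields, and the names NBUCKETS, bucket_of, chain_get and chain_put, are assumptions introduced for this example.

```c
#include <stdlib.h>

#define NBUCKETS 8  /* number of buckets; illustrative value */

/* Hypothetical node type: one key/value pair per list node. */
struct entry {
    int key;
    int value;
    struct entry *next;
};

/* Division remainder method; the cast keeps negative keys in range. */
static unsigned bucket_of(int key)
{
    return (unsigned)key % NBUCKETS;
}

/* Walk the bucket's list; return the matching entry or NULL. */
static struct entry *chain_get(struct entry *vec[NBUCKETS], int key)
{
    for (struct entry *e = vec[bucket_of(key)]; e != NULL; e = e->next)
        if (e->key == key)
            return e;
    return NULL;
}

/* Update an existing key, or insert a new entry at the head of its list.
 * Returns 0 on success, -1 if memory allocation fails. */
static int chain_put(struct entry *vec[NBUCKETS], int key, int value)
{
    struct entry *e = chain_get(vec, key);
    if (e != NULL) {
        e->value = value;
        return 0;
    }

    e = malloc(sizeof *e);
    if (e == NULL)
        return -1;

    unsigned b = bucket_of(key);
    e->key = key;
    e->value = value;
    e->next = vec[b];   /* the newest entry becomes the list head */
    vec[b] = e;
    return 0;
}
```

The bucket array must start out with every pointer set to NULL, for example struct entry *vec[NBUCKETS] = {0};. Keys 1 and 9 both land in bucket 1, so after inserting 1 and then 9 the list in that bucket holds 9 first, followed by 1. Freeing the nodes is left out to keep the sketch short.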