Objective
A hash table is a data structure that stores key-value pairs: the value holds the data we actually need, and the key is used to find that value. Ideally, a single hash computation is enough to locate a value. In practice, however, we usually do not want to spend a lot of extra space chasing the last bit of lookup speed (lowering the collision rate means enlarging the table), so we look for a balance between space and time. That balance is tuned by adjusting the load factor (also called the fill factor).
Load factor = number of records in the table / length of the hash table. The smaller the load factor, the more empty slots the table has and the less likely a collision is; the larger the load factor, the more likely collisions become and the longer lookups take. (The default load factor of HashMap in the JDK is 0.75.)
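The ratio above is simple enough to sketch in a few lines of Java. The class and method names below (LoadFactorDemo, loadFactor, shouldResize) are made up for illustration and are not part of the JDK; the resize rule is only a simplified assumption about how a table might use a maximum load factor, not HashMap's actual code.

```java
// Hypothetical helper for illustration only; not a JDK class.
final class LoadFactorDemo {
    // Load factor = number of stored entries / number of slots in the table.
    static double loadFactor(int entries, int capacity) {
        return (double) entries / capacity;
    }

    // Assumed policy: enlarge the table once the load factor exceeds the maximum.
    static boolean shouldResize(int entries, int capacity, double maxLoadFactor) {
        return loadFactor(entries, capacity) > maxLoadFactor;
    }

    public static void main(String[] args) {
        System.out.println(loadFactor(12, 16));          // 0.75
        System.out.println(shouldResize(12, 16, 0.75));  // false
        System.out.println(shouldResize(13, 16, 0.75));  // true
    }
}
```

With 16 slots and a maximum load factor of 0.75, the table in this sketch tolerates 12 entries and asks to grow at the 13th.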
Hash function
When a string is used as a key, we can treat it as a large integer and apply the division (remainder) method. To do this, we process the characters that make up the string one at a time and fold each into a running hash, for example:
public int getHashCode(String str) {
    char[] s = str.toCharArray();
    int hash = 0;
    for (int i = 0; i < s.length; i++) {
        // Multiply the running hash by 31, then add the next character
        hash = s[i] + (31 * hash);
    }
    return hash;
}
The code above uses Horner's method to compute the string hash. For a string s of length L, it evaluates the formula:
$$h = s[0] \cdot 31^{L-1} + \dots + s[L-3] \cdot 31^{2} + s[L-2] \cdot 31^{1} + s[L-1] \cdot 31^{0}$$
For example, to compute the hash of the string "call": the Unicode value of c is 99, of a is 97, and of l is 108, so the hash of "call" is $3045982 = 99 \cdot 31^{3} + 97 \cdot 31^{2} + 108 \cdot 31^{1} + 108 \cdot 31^{0} = 108 + 31 \cdot (108 + 31 \cdot (97 + 31 \cdot 99))$.
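The worked example can be checked with a short sketch that evaluates the hash both ways: once with Horner's method and once by summing the polynomial terms directly. The class HornerHashDemo and its method names are hypothetical; the final line compares against Java's built-in String.hashCode(), which is documented to use the same base-31 polynomial.

```java
// Hypothetical demo class; only String.hashCode() comes from the JDK.
final class HornerHashDemo {
    // Horner's method: one multiplication and one addition per character.
    static int hornerHash(String str) {
        int hash = 0;
        for (int i = 0; i < str.length(); i++) {
            hash = str.charAt(i) + 31 * hash;
        }
        return hash;
    }

    // Direct evaluation of s[0]*31^(L-1) + ... + s[L-1]*31^0.
    static int polynomialHash(String str) {
        int hash = 0;
        int power = 1; // starts at 31^0 and grows as we walk from the last character to the first
        for (int i = str.length() - 1; i >= 0; i--) {
            hash += str.charAt(i) * power;
            power *= 31;
        }
        return hash;
    }

    public static void main(String[] args) {
        System.out.println(hornerHash("call"));      // 3045982
        System.out.println(polynomialHash("call"));  // 3045982
        System.out.println("call".hashCode());       // 3045982
    }
}
```

For longer strings the int arithmetic overflows and wraps around, but both methods wrap in the same way, so they still agree.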
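For a computer, multiplication has traditionally been a fairly expensive operation compared with shifts and subtraction. 31 * hash is equivalent to (hash << 5) - hash, and these bit operations are cheaper than a multiplication. The parentheses are required: in Java and other C-family languages, subtraction binds more tightly than the shift operator.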
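As a quick sanity check of that identity, the sketch below compares the shift-and-subtract form with a plain multiplication for a handful of values, including ones that overflow. ShiftMultiplyDemo and times31 are illustrative names, not library code.

```java
// Hypothetical demo class for the 31 * h identity.
final class ShiftMultiplyDemo {
    // 31 * h rewritten as 32 * h - h. Without the parentheses,
    // h << 5 - h would be parsed as h << (5 - h).
    static int times31(int h) {
        return (h << 5) - h;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 99, 3045982, -7, Integer.MAX_VALUE};
        for (int h : samples) {
            // Prints true for every sample: both forms wrap around identically on overflow.
            System.out.println(h + ": " + (times31(h) == 31 * h));
        }
    }
}
```

In practice a modern compiler or JIT can usually perform this rewrite itself, so writing 31 * hash directly is generally fine for readability.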
Resolving hash collisions
There are many ways to handle collisions, but separate chaining (sometimes called the "zipper method", typically built on linked lists) is the one most commonly used in engineering practice.
For example, John Smith and Sandra Dee both map to slot 152 after hashing, which creates a collision. It is resolved by chaining the colliding key-value entries together. There are many ways to chain them: the simplest is a singly linked list, and more elaborate schemes use a balanced binary tree. Any data structure that supports efficient search will do.
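A minimal sketch of separate chaining, assuming String keys and values and a fixed number of buckets, might look like the following. ChainedHashTable is a hypothetical teaching class and not how the JDK's HashMap is implemented. It uses a linked list per bucket and the remainder method to pick the bucket, and it omits resizing and removal.

```java
import java.util.LinkedList;

// Hypothetical teaching class; not the JDK's HashMap implementation.
final class ChainedHashTable {
    private static final class Entry {
        final String key;
        String value;
        Entry(String key, String value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    // Remainder method: clear the sign bit, then reduce modulo the table length.
    int indexFor(String key) {
        return (key.hashCode() & 0x7fffffff) % buckets.length;
    }

    // Overwrites the value if the key is already in the chain, otherwise appends.
    void put(String key, String value) {
        LinkedList<Entry> chain = buckets[indexFor(key)];
        for (Entry e : chain) {
            if (e.key.equals(key)) { e.value = value; return; }
        }
        chain.add(new Entry(key, value));
    }

    // Walks the chain for the key's bucket; returns null if the key is absent.
    String get(String key) {
        for (Entry e : buckets[indexFor(key)]) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    public static void main(String[] args) {
        ChainedHashTable table = new ChainedHashTable(16);
        // "Aa" and "BB" have the same hashCode (2112), so they share a bucket.
        table.put("Aa", "first");
        table.put("BB", "second");
        System.out.println(table.indexFor("Aa") == table.indexFor("BB")); // true
        System.out.println(table.get("Aa")); // first
        System.out.println(table.get("BB")); // second
    }
}
```

Both colliding keys remain retrievable because a lookup compares the full key while walking the chain, not just the bucket index.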