To find out whether an element is in an array or a linked list, we can only compare it with the stored elements one by one, from front to back. Whether the container is an array or a linked list, lookup performance is poor. A binary search tree keeps its data sorted so that a lookup can proceed like a binary search, reducing the time complexity from O(n) to O(log n) as long as the tree stays balanced.
The root of the problem is that we cannot determine where an element is stored directly from the element itself.
Is there a way to eliminate these comparisons? The hash table is a solution to this lookup problem.
What are a hash table and a hash function
In plain terms, a hash table is a data structure that retrieves an element by its key: the key is mapped to a position in the table, and that mapping is done by the hash function.
Different use cases call for different key types: the key may be an int, a string, or any other object. These objects cannot be used directly as a storage address, so the job of the hash function is to convert them to an int in a reasonable way, which then determines where the data is stored. The hash function must guarantee that the same key always produces the same result.
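The conversion described above can be sketched in Java roughly as follows. This is only an illustration: the class name HashIndexDemo, the method indexFor, and the capacity of 16 are assumptions made for the example, and a real implementation such as HashMap computes its index differently.

```java
// A minimal sketch of mapping keys of different types to an array index.
// HashIndexDemo and indexFor are hypothetical names used only for illustration.
class HashIndexDemo {
    // Map any non-null key to a slot in [0, capacity).
    // Math.floorMod keeps the result non-negative even when hashCode() is negative.
    static int indexFor(Object key, int capacity) {
        return Math.floorMod(key.hashCode(), capacity);
    }

    public static void main(String[] args) {
        int capacity = 16; // assumed table size
        System.out.println(indexFor(42, capacity));      // Integer key: 42 mod 16 = 10
        System.out.println(indexFor("apple", capacity)); // String key
        // The same key always maps to the same slot.
        System.out.println(indexFor("apple", capacity) == indexFor("apple", capacity)); // true
    }
}
```

Whatever the key type, the result is an int that can serve as an array index, and repeated calls with the same key give the same index.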
The process is like looking up a word in a dictionary by pinyin. To find a character, we do not read from the first page to the last, which would take a long time; instead we first look up its pronunciation in the phonetic index to find the page number and then turn directly to that page. Of course, because many Chinese characters share the same pronunciation, we may still need to compare a few candidates one by one, but this is far less work than scanning the whole dictionary.
A hash table works the same way: the hash function locates an element directly from its key. However, just as many Chinese characters share one pronunciation, many different keys may be mapped by the hash function to the same position. This is the so-called hash collision.
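A collision is easy to reproduce in Java. The sketch below relies on the fact that the strings "Aa" and "BB" have the same String.hashCode() value; the class name CollisionDemo, the helper indexFor, and the capacity of 16 are assumptions made for the example.

```java
// A minimal sketch of a hash collision: two different keys, one slot.
// CollisionDemo and indexFor are hypothetical names used only for illustration.
class CollisionDemo {
    static int indexFor(Object key, int capacity) {
        return Math.floorMod(key.hashCode(), capacity);
    }

    public static void main(String[] args) {
        // 'A'*31 + 'a' = 65*31 + 97 = 2112 and 'B'*31 + 'B' = 66*31 + 66 = 2112
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112
        System.out.println("Aa".equals("BB")); // false: the keys are different
        System.out.println(indexFor("Aa", 16) == indexFor("BB", 16)); // true: same slot
    }
}
```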
Ways to resolve hash collisions
A common approach is to combine an array with linked lists. When a hash collision occurs, the entries that map to the same array position are chained together in a linked list at that position.
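The array-plus-linked-list structure might be sketched like this. It is a simplified illustration under several assumptions: the class ChainedHashTable and its members are hypothetical names, the capacity is fixed, and null keys are not supported. It is not the HashMap source code.

```java
// A minimal sketch of a hash table that resolves collisions with linked lists.
// ChainedHashTable is a hypothetical class for illustration, not HashMap itself.
class ChainedHashTable<K, V> {
    // One node of the linked list stored in an array slot.
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node<K, V>[] buckets;
    private int size;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int capacity) {
        buckets = (Node<K, V>[]) new Node[capacity];
    }

    private int indexFor(K key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    // Insert a new entry, or overwrite the value if the key already exists.
    void put(K key, V value) {
        int i = indexFor(key);
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                n.value = value;
                return;
            }
        }
        // Not found: link the new node at the head of this slot's list.
        buckets[i] = new Node<>(key, value, buckets[i]);
        size++;
    }

    // Jump to the slot by hash, then compare only the keys chained there.
    V get(K key) {
        for (Node<K, V> n = buckets[indexFor(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        ChainedHashTable<String, Integer> table = new ChainedHashTable<>(16);
        table.put("Aa", 1); // "Aa" and "BB" have the same hashCode(),
        table.put("BB", 2); // so both end up in the same slot's list
        System.out.println(table.get("Aa")); // 1
        System.out.println(table.get("BB")); // 2
    }
}
```

Both colliding keys remain retrievable because get compares keys with equals along the chain, just as a reader compares the few characters that share one pronunciation.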
In JDK 1.7 and earlier, HashMap used this same storage structure; starting with JDK 1.8, red-black trees were added as an optimization for positions whose linked lists grow long.
Advantages and disadvantages of hash tables
A hash table is an optimized approach to storage; the elements themselves are still stored in other data structures, such as arrays and linked lists.
A well-designed hash table combines the advantages of arrays and linked lists and performs well for both insertion and lookup.
A poorly designed hash table is likely to suffer many hash collisions, which makes the linked lists too long and the hash table behave more like a linked list. In addition, when the data volume is very large, the array must be expanded to keep the linked lists from growing too long; this involves copying the array, which has a serious impact on performance.
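The effect of hash quality and array size on list length can be illustrated with a small simulation. The sketch below only counts how many keys would land in each array slot; the class name ChainLengthDemo, the method longestChain, and the key counts are assumptions made for the example.

```java
// A minimal sketch that measures the longest linked list a set of hash codes
// would produce. ChainLengthDemo and longestChain are hypothetical names.
class ChainLengthDemo {
    // Length of the longest list if keys with these hash codes are placed
    // into an array of the given capacity.
    static int longestChain(int[] hashCodes, int capacity) {
        int[] counts = new int[capacity];
        int longest = 0;
        for (int h : hashCodes) {
            int i = Math.floorMod(h, capacity);
            counts[i]++;
            longest = Math.max(longest, counts[i]);
        }
        return longest;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] constant = new int[n]; // a bad hash function: every key hashes to 7
        int[] spread = new int[n];   // a well-spread hash function: 0, 1, 2, ...
        for (int k = 0; k < n; k++) {
            constant[k] = 7;
            spread[k] = k;
        }
        System.out.println(longestChain(constant, 16)); // 1000: effectively one linked list
        System.out.println(longestChain(spread, 16));   // 63: short lists in every slot
        System.out.println(longestChain(spread, 1024)); // 1: a larger array removes the chains
    }
}
```

The last line shows why the array is expanded as the data grows, at the cost of moving every entry into the new array.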
It is therefore necessary to estimate the likely data volume and key distribution in advance in order to take full advantage of a hash table.
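With java.util.HashMap, one way to act on such an estimate is to pass an initial capacity to the constructor so that the table does not have to be expanded while it fills. The sketch below assumes an expected entry count of 10,000 and the default load factor of 0.75; the class PresizeDemo and the helper capacityFor are hypothetical names for the example.

```java
import java.util.HashMap;
import java.util.Map;

// A minimal sketch of pre-sizing a HashMap from an estimated entry count.
// PresizeDemo and capacityFor are hypothetical names used only for illustration.
class PresizeDemo {
    // Smallest capacity that can hold expectedEntries without the number of
    // entries exceeding capacity * loadFactor.
    static int capacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    public static void main(String[] args) {
        int expected = 10_000;   // assumed estimate of the data volume
        float loadFactor = 0.75f; // HashMap's documented default load factor
        Map<Integer, String> map = new HashMap<>(capacityFor(expected, loadFactor), loadFactor);
        for (int i = 0; i < expected; i++) {
            map.put(i, "value-" + i);
        }
        System.out.println(map.size()); // 10000
    }
}
```

If the estimate is too low the map still works correctly; it simply pays the expansion cost that pre-sizing was meant to avoid.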
Source code: Java Collection source code: Hash Table (ii)