Java: How HashMap Works, and a Simple Self-Implemented HashMap
Data structures commonly store data in arrays or linked lists. An array occupies one contiguous block of memory, so addressing (indexing) is easy but insertion and deletion are hard. A linked list is scattered across memory, so addressing is hard but insertion and deletion are easy.
Combining the advantages of both, we can design a data structure, the hash table, that makes addressing, insertion, and deletion all convenient. In Java, the main hash table implementation is HashMap, which is arguably one of the most frequently used classes in Java development.
Under the hood, HashMap is an array of linked lists. In the JDK 6 source the field is declared as:
transient Entry[] table;
Here, table is the array of linked lists. Because every record is a key-value pair, HashMap defines an inner class Entry with the fields key and value (plus a cached hash), so that each slot of a one-dimensional array can hold both. Each Entry is also a node of a linked list, so it has one more field, Entry next, which points to the next node in the same bucket.
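A minimal sketch of this layout, assuming a simplified Entry with the same four fields as the JDK class (key, value, hash, next). EntryDemo and chainLength are hypothetical names used only for illustration:

```java
// Simplified illustration of an array whose slots hold singly linked lists.
class EntryDemo {
    // One node of a bucket's list: a key-value pair plus a link to the next node.
    static class Entry<K, V> {
        final K key;
        V value;
        final int hash;      // cached hash of the key
        Entry<K, V> next;    // next node in the same bucket, or null

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Counts the nodes in one bucket by following the next links.
    static int chainLength(Entry<?, ?> head) {
        int n = 0;
        for (Entry<?, ?> e = head; e != null; e = e.next) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        @SuppressWarnings("unchecked")
        Entry<String, Integer>[] table = (Entry<String, Integer>[]) new Entry[16];
        // Two entries stored in the same slot (index 3 is chosen arbitrarily here).
        Entry<String, Integer> second = new Entry<>(3, "b", 2, null);
        table[3] = new Entry<>(3, "a", 1, second);
        System.out.println(chainLength(table[3])); // prints 2
    }
}
```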
On put:
Compute the hash of the key, use it to find the bucket's linked list at table[indexFor(hash, table.length)], and traverse that list. If an entry's key equals the new key (by == or equals), its value is replaced; if none matches, a new entry is added to the list (the JDK 6/7 code shown here inserts it at the head of the list; JDK 8 appends it to the tail).
int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
    Object k;
    // if the key already exists in the linked list, replace the old value with the new one
    if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
        V oldValue = e.value;
        e.value = value;
        e.recordAccess(this);
        return oldValue;
    }
}
addEntry(hash, key, value, i); // key not found: add a new Entry to bucket i
return null;
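A self-contained sketch of the put steps, assuming a fixed 16-slot table with no resizing. SimplePutMap and Node are hypothetical names; this sketch appends new keys at the tail (as JDK 8 does) and uses Objects.equals so that a null key is handled without a separate branch:

```java
import java.util.Objects;

class SimplePutMap<K, V> {
    // One bucket node: cached hash, key, value, and link to the next node.
    static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value) {
            this.hash = hash;
            this.key = key;
            this.value = value;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16]; // fixed size: no resizing here
    private int size;

    int size() {
        return size;
    }

    // Returns the previous value mapped to key, or null if the key was not present.
    V put(K key, V value) {
        int hash = Objects.hashCode(key);            // a null key hashes to 0
        int i = (hash & 0x7fffffff) % table.length;  // non-negative bucket index
        Node<K, V> last = null;
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                V old = e.value;                     // key already present: replace the value
                e.value = value;
                return old;
            }
            last = e;
        }
        Node<K, V> node = new Node<>(hash, key, value);
        if (last == null) {
            table[i] = node;                         // empty bucket: node becomes the head
        } else {
            last.next = node;                        // otherwise append at the tail
        }
        size++;
        return null;
    }

    public static void main(String[] args) {
        SimplePutMap<String, Integer> m = new SimplePutMap<>();
        System.out.println(m.put("a", 1)); // null
        System.out.println(m.put("a", 2)); // 1
        System.out.println(m.size());      // 1
    }
}
```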
get locates the bucket's linked list in the same way, then traverses it looking for a matching key:
int hash = hash(key.hashCode());
for (Entry<K,V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
    Object k;
    if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
        return e.value;
}
return null;
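A matching sketch of get(), assuming a non-negative index rule of (hash & 0x7fffffff) % length. SimpleGet and its Node are hypothetical. One deliberate difference from the JDK snippet: Objects.equals is used so that a null key cannot throw NullPointerException (the JDK 6/7 source routes null keys to a separate branch before this loop):

```java
import java.util.Objects;

class SimpleGet {
    static final class Node {
        final int hash;
        final Object key;
        final Object value;
        final Node next;

        Node(Object key, Object value, Node next) {
            this.hash = Objects.hashCode(key);
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Same non-negative index rule the put side would use.
    static int indexFor(int hash, int length) {
        return (hash & 0x7fffffff) % length;
    }

    // Returns the value for key, or null if no node in the bucket matches.
    static Object get(Node[] table, Object key) {
        int hash = Objects.hashCode(key);
        for (Node e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
            // compare the cached hash first: a cheap filter before equals()
            if (e.hash == hash && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Node[] table = new Node[16];
        // "Aa" and "BB" have the same hashCode (2112), so they share one bucket.
        int i = indexFor("Aa".hashCode(), table.length);
        table[i] = new Node("Aa", 1, new Node("BB", 2, null));
        System.out.println(get(table, "BB")); // 2
        System.out.println(get(table, "CC")); // null
    }
}
```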
HashMap performance optimization:
HashMap achieves good performance by reducing hash collisions (different keys that end up in the same bucket). The more collisions there are, the longer the linked lists that must be searched during a lookup.
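A quick illustration of a collision: the strings "Aa" and "BB" are different keys with the same String.hashCode() value (2112), so any function applied to that value afterward maps them to the same bucket. CollisionDemo and bucket are hypothetical names; bucket uses plain masking without any bit spreading:

```java
class CollisionDemo {
    // Bucket index for a power-of-two table length (plain masking, no spreading).
    static int bucket(Object key, int length) {
        return key.hashCode() & (length - 1);
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode());                       // 2112
        System.out.println("BB".hashCode());                       // 2112
        System.out.println(bucket("Aa", 16) == bucket("BB", 16));  // true
    }
}
```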
1. Reducing collisions with a better hash function:
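This supplemental hash function reduces collisions. indexFor keeps only the low bits of the hash (the table length is always a power of two), so keys whose hashCode() values differ only in their high bits would otherwise all land in the same bucket; the shifts and XORs fold the high bits into the low ones. (I don't fully understand why these particular shift amounts were chosen; see http://zhangshixi.iteye.com/blog/672697 for a detailed explanation.)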
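static int hash(int h) {
    // spread the high bits of h into the low bits
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

static int indexFor(int h, int length) {
    // length is always a power of two, so this equals h mod length
    return h & (length - 1);
}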
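To see the effect, this sketch copies hash and indexFor and compares two hash codes whose low 16 bits are all zero (0x10000 and 0x20000, chosen only for illustration). Without spreading they share bucket 0 of a 16-slot table; after hash() they land in buckets 1 and 2. HashSpread is a hypothetical class name:

```java
class HashSpread {
    // Supplemental hash from the text: XOR-shifts fold high bits into the low bits.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // For a power-of-two length, masking keeps the low bits, i.e. h mod length.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000, b = 0x20000;  // differ only in high bits
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16));             // 0 0
        System.out.println(indexFor(hash(a), 16) + " " + indexFor(hash(b), 16)); // 1 2
    }
}
```

The masking trick is also why the table length must stay a power of two: only then does h & (length - 1) give the same result as a non-negative modulo.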
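In my own implementation I used a much simpler, barely adequate way to compute the index. Clearing the sign bit keeps the result non-negative even when hashCode() returns Integer.MIN_VALUE, a case where Math.abs would still return a negative number:
(o == null ? 0 : o.hashCode() & 0x7fffffff) % length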
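To make the sign problem concrete, this sketch compares the Math.abs(hashCode) % length form with two safe alternatives: masking the sign bit, and Math.floorMod (Java 8+). SimpleIndex and its methods are hypothetical names. The key Integer.MIN_VALUE is used because Integer.hashCode() returns the value itself, and a table length of 10 is chosen because a power-of-two length would hide the problem (MIN_VALUE % 16 happens to be 0):

```java
class SimpleIndex {
    // Math.abs(Integer.MIN_VALUE) overflows and stays negative, so this can return < 0.
    static int naiveIndex(Object o, int length) {
        return Math.abs(o == null ? 0 : o.hashCode()) % length;
    }

    // Clearing the sign bit always yields a value in [0, length).
    static int maskedIndex(Object o, int length) {
        return ((o == null ? 0 : o.hashCode()) & 0x7fffffff) % length;
    }

    // Math.floorMod returns a value in [0, length) for a positive length.
    static int floorModIndex(Object o, int length) {
        return Math.floorMod(o == null ? 0 : o.hashCode(), length);
    }

    public static void main(String[] args) {
        Integer key = Integer.MIN_VALUE;
        System.out.println(naiveIndex(key, 10));    // -8: an invalid array index
        System.out.println(maskedIndex(key, 10));   // 0
        System.out.println(floorModIndex(key, 10)); // 2
    }
}
```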
2. Automatic resizing
As more elements are added to a HashMap, the probability of collisions increases, because the length of the array is fixed. The array therefore has to be resized.
When the number of elements in the HashMap exceeds the array length × loadFactor (0.75 by default), the array is expanded: a new, larger table is created and the entries of the original table are moved into it.
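A worked example of this rule, assuming the JDK default initial capacity of 16 and load factor 0.75: the threshold is 16 × 0.75 = 12, and it doubles together with the capacity. ThresholdDemo is a hypothetical name, and the exact moment the resize fires differs slightly between JDK versions:

```java
class ThresholdDemo {
    // Number of entries a table of this capacity holds before it is expanded.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        float loadFactor = 0.75f;
        for (int capacity = 16; capacity <= 128; capacity *= 2) {
            System.out.println(capacity + " -> " + threshold(capacity, loadFactor));
        }
        // prints 16 -> 12, 32 -> 24, 64 -> 48, 128 -> 96 (one per line)
    }
}
```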
During expansion, every entry is visited, its bucket index is recalculated for the new table length, and it is placed into the new table.
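A minimal sketch of the expansion step, assuming power-of-two table lengths and nodes that cache their hash so only the index has to be recomputed. ResizeSketch, resize, and count are hypothetical names. Each node is pushed onto the head of its new bucket, which is simple but reverses the order within a bucket; order does not matter for lookups:

```java
import java.util.Objects;

class ResizeSketch {
    static final class Node {
        final int hash;
        final Object key;
        final Object value;
        Node next;

        Node(Object key, Object value, Node next) {
            this.hash = Objects.hashCode(key);
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Table lengths are powers of two, so masking gives a non-negative index.
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    // Moves every node into a table twice as large, recomputing each index.
    static Node[] resize(Node[] oldTable) {
        Node[] newTable = new Node[oldTable.length * 2];
        for (Node head : oldTable) {
            Node e = head;
            while (e != null) {
                Node next = e.next;                   // save the rest of the old chain
                int i = indexFor(e.hash, newTable.length);
                e.next = newTable[i];                 // push onto the head of the new bucket
                newTable[i] = e;
                e = next;
            }
        }
        return newTable;
    }

    // Total number of nodes across all buckets.
    static int count(Node[] table) {
        int n = 0;
        for (Node head : table) {
            for (Node e = head; e != null; e = e.next) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        Node[] table = new Node[4];
        for (int k = 0; k < 8; k++) {                 // 8 entries in 4 buckets: 2 per chain
            int i = indexFor(Integer.hashCode(k), table.length);
            table[i] = new Node(k, "v" + k, table[i]);
        }
        Node[] bigger = resize(table);
        System.out.println(bigger.length + " " + count(bigger)); // 8 8
    }
}
```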
The expanded array is normally twice the size of the original one. Resizing is expensive, so if you can estimate how many elements the HashMap will hold, setting an appropriate initial capacity up front avoids repeated resizing and can noticeably improve performance.
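A small sketch of presizing, assuming the default load factor of 0.75: requesting at least expected / 0.75 buckets keeps the expected number of entries under the threshold. capacityFor is a hypothetical helper; HashMap itself rounds the requested capacity up to the next power of two:

```java
import java.util.HashMap;
import java.util.Map;

class Presize {
    // Smallest capacity whose 0.75 threshold is not below `expected`.
    static int capacityFor(int expected) {
        return (int) Math.ceil(expected / 0.75);
    }

    public static void main(String[] args) {
        int expected = 1000;
        Map<Integer, String> map = new HashMap<>(capacityFor(expected)); // requests 1334
        for (int i = 0; i < expected; i++) {
            map.put(i, "v" + i);
        }
        System.out.println(map.size()); // 1000
    }
}
```

On JDK 19 and later, HashMap.newHashMap(int numMappings) performs this calculation for you.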
I have published my source code on GitHub; you are welcome to download it.
http://pan.baidu.com/s/1dFj2405
https://github.com/xcr1234/my-java
I also ran some performance tests on my own implementation; the results were barely acceptable.
There must be many shortcomings in this blog post and code. Please point them out! Or fork my code and provide valuable suggestions. Thank you!