A deep understanding of the implementation principles of HashMap in Java
HashMap extends the abstract class AbstractMap, which in turn implements the Map interface.
The Map interface in Java lets us use an object as a key, that is, we can use one object as the key to look up another object.
Before discussing how HashMap is implemented, we first write a SimpleMap class of our own, which extends AbstractMap. The implementation is as follows:
import java.util.*;

public class SimpleMap<K, V> extends AbstractMap<K, V> {
    // keys stores all the keys
    private List<K> keys = new ArrayList<K>();
    // values stores all the values
    private List<V> values = new ArrayList<V>();

    /**
     * Returns all key-value pairs in the Map.
     */
    @Override
    public Set<Map.Entry<K, V>> entrySet() {
        Set<Map.Entry<K, V>> set = new SimpleSet<Map.Entry<K, V>>();
        // keys and values always have the same size
        Iterator<K> keyIterator = keys.iterator();
        Iterator<V> valueIterator = values.iterator();
        while (keyIterator.hasNext() && valueIterator.hasNext()) {
            K key = keyIterator.next();
            V value = valueIterator.next();
            SimpleEntry<K, V> entry = new SimpleEntry<K, V>(key, value);
            set.add(entry);
        }
        return set;
    }

    @Override
    public V put(K key, V value) {
        V oldValue = null;
        int index = this.keys.indexOf(key);
        if (index >= 0) {
            // The key already exists in keys: update the value stored for it
            oldValue = this.values.get(index);
            this.values.set(index, value);
        } else {
            // The key is not in keys: add key and value as a new key-value pair
            this.keys.add(key);
            this.values.add(value);
        }
        return oldValue;
    }

    @Override
    public V get(Object key) {
        V value = null;
        int index = this.keys.indexOf(key);
        if (index >= 0) {
            value = this.values.get(index);
        }
        return value;
    }

    @Override
    public V remove(Object key) {
        V oldValue = null;
        int index = this.keys.indexOf(key);
        if (index >= 0) {
            oldValue = this.values.get(index);
            this.keys.remove(index);
            this.values.remove(index);
        }
        return oldValue;
    }

    @Override
    public void clear() {
        this.keys.clear();
        this.values.clear();
    }

    @Override
    public Set<K> keySet() {
        Set<K> set = new SimpleSet<K>();
        Iterator<K> keyIterator = this.keys.iterator();
        while (keyIterator.hasNext()) {
            set.add(keyIterator.next());
        }
        return set;
    }

    @Override
    public int size() {
        return this.keys.size();
    }

    @Override
    public boolean containsValue(Object value) {
        return this.values.contains(value);
    }

    @Override
    public boolean containsKey(Object key) {
        return this.keys.contains(key);
    }

    @Override
    public Collection<V> values() {
        // Return the field directly; calling this.values() here would recurse forever
        return this.values;
    }
}
When a subclass extends AbstractMap, the only method it is required to implement is entrySet, which is abstract; a modifiable map must also override put, because AbstractMap's put throws UnsupportedOperationException by default. The entrySet method returns a Set of all key-value pairs in the Map, and put places a key-value pair into the Map. As you can see, the code above not only implements entrySet and put but also overrides get, remove, clear, keySet, values, and several other methods. In principle, entrySet and put are enough for lookups to work, so why override the rest? AbstractMap already implements most of the Map methods, and many of those implementations are built on entrySet. (Because our entrySet returns a copy rather than a live view, the inherited remove and clear would only modify that copy, which is one more reason to override them.) For example, the values method in the Map interface returns a Collection of all the values in the Map. Let's take a look at the implementation of the values method in AbstractMap:
public Collection<V> values() {
    if (values == null) {
        values = new AbstractCollection<V>() {
            public Iterator<V> iterator() {
                return new Iterator<V>() {
                    private Iterator<Entry<K,V>> i = entrySet().iterator();

                    public boolean hasNext() {
                        return i.hasNext();
                    }

                    public V next() {
                        return i.next().getValue();
                    }

                    public void remove() {
                        i.remove();
                    }
                };
            }

            public int size() {
                return AbstractMap.this.size();
            }

            public boolean isEmpty() {
                return AbstractMap.this.isEmpty();
            }

            public void clear() {
                AbstractMap.this.clear();
            }

            public boolean contains(Object v) {
                return AbstractMap.this.containsValue(v);
            }
        };
    }
    return values;
}
As you can see, that is a lot of code. The basic idea is to obtain a Set containing all key-value pairs through entrySet and then extract each value by iterating over it. In our SimpleMap, building that Set on every call has a cost, so we override values and return our own this.values field directly. In general, the purpose of overriding most of these methods is to make them faster and simpler.
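To see how much AbstractMap derives from entrySet alone, here is a minimal sketch. MinimalMap is a hypothetical class that is not part of the article's code; it overrides only entrySet and put, and its entrySet returns a live view over a list (an assumption that differs from SimpleMap, which returns a copy). It borrows the JDK's AbstractMap.SimpleEntry for its entries.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical minimal map: only entrySet and put are written by hand.
class MinimalMap<K, V> extends AbstractMap<K, V> {
    // All key-value pairs, in insertion order.
    private final List<Map.Entry<K, V>> entries = new ArrayList<>();

    @Override
    public Set<Map.Entry<K, V>> entrySet() {
        // A live view over the list, so the inherited remove and clear reach the real data.
        return new AbstractSet<Map.Entry<K, V>>() {
            @Override
            public Iterator<Map.Entry<K, V>> iterator() {
                return entries.iterator();
            }

            @Override
            public int size() {
                return entries.size();
            }
        };
    }

    @Override
    public V put(K key, V value) {
        for (Map.Entry<K, V> e : entries) {
            if (Objects.equals(e.getKey(), key)) {
                return e.setValue(value); // existing key: replace the value, return the old one
            }
        }
        entries.add(new AbstractMap.SimpleEntry<>(key, value));
        return null;
    }

    public static void main(String[] args) {
        MinimalMap<String, String> map = new MinimalMap<>();
        map.put("iSpring", "27");
        map.put("iSpring", "28");
        System.out.println(map.get("iSpring"));         // inherited get, scans entrySet
        System.out.println(map.containsKey("iSpring")); // inherited containsKey
        System.out.println(map.values());               // inherited values
        map.remove("iSpring");                          // inherited remove, uses the view's iterator
        System.out.println(map.size());                 // inherited size
    }
}
```

Every inherited method here walks the entry set from the start, which is exactly the cost the overrides in SimpleMap try to avoid.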
When overriding entrySet, we need to return a Set containing all the key-value pairs of the current Map. First, a key-value pair is itself a type: every key-value pair must implement the Map.Entry interface. Second, because entrySet must return a Set, we need a Set implementation, and we do not use Java's existing Set types (such as HashSet and TreeSet) here. There are two reasons: 1. HashSet in Java is itself implemented on top of HashMap, and since the purpose of this post is to study HashMap, we avoid that class. 2. Implementing a Set in Java is not much trouble, and extending AbstractSet ourselves deepens our understanding of Set.
The following is our own key-value pair class SimpleEntry, which implements Map.Entry. The code is as follows:
import java.util.Map;

// A key-value pair stored in the Map. Key-value pairs must implement the Map.Entry interface.
public class SimpleEntry<K, V> implements Map.Entry<K, V> {
    private K key = null;   // key
    private V value = null; // value

    public SimpleEntry(K k, V v) {
        this.key = k;
        this.value = v;
    }

    @Override
    public K getKey() {
        return this.key;
    }

    @Override
    public V getValue() {
        return this.value;
    }

    @Override
    public V setValue(V v) {
        V oldValue = this.value;
        this.value = v;
        return oldValue;
    }
}
The following is our own set class SimpleSet, which extends the abstract class AbstractSet. The code is as follows:
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Iterator;

public class SimpleSet<E> extends AbstractSet<E> {
    private ArrayList<E> list = new ArrayList<E>();

    @Override
    public Iterator<E> iterator() {
        return this.list.iterator();
    }

    @Override
    public int size() {
        return this.list.size();
    }

    @Override
    public boolean contains(Object o) {
        return this.list.contains(o);
    }

    @Override
    public boolean add(E e) {
        boolean isChanged = false;
        // A set holds no duplicates: only add the element if it is not already present
        if (!this.list.contains(e)) {
            this.list.add(e);
            isChanged = true;
        }
        return isChanged;
    }

    @Override
    public boolean remove(Object o) {
        return this.list.remove(o);
    }

    @Override
    public void clear() {
        this.list.clear();
    }
}
We now test the SimpleMap class we wrote. The test has two parts: one checks that SimpleMap behaves correctly, and the other measures its performance. The test code is as follows:
import java.util.HashMap;
import java.util.Map;

public class Test {
    public static void main(String[] args) {
        // Test the correctness of SimpleMap
        SimpleMap<String, String> map = new SimpleMap<String, String>();
        map.put("iSpring", "27");
        System.out.println(map);
        System.out.println(map.get("iSpring"));
        System.out.println("-----------------------------");
        map.put("iSpring", "28");
        System.out.println(map);
        System.out.println(map.get("iSpring"));
        System.out.println("-----------------------------");
        map.remove("iSpring");
        System.out.println(map);
        System.out.println(map.get("iSpring"));
        System.out.println("-----------------------------");
        // Test the performance
        testPerformance(map);
    }

    public static void testPerformance(Map<String, String> map) {
        map.clear();
        // Fill the map with 10000 key-value pairs
        for (int i = 0; i < 10000; i++) {
            String key = "key" + i;
            String value = "value" + i;
            map.put(key, value);
        }
        // Time how long it takes to look up every key once
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            String key = "key" + i;
            map.get(key);
        }
        long endTime = System.currentTimeMillis();
        long time = endTime - startTime;
        System.out.println("Traversal time: " + time + " ms");
    }
}
Output:
{iSpring=27}
27
-----------------------------
{iSpring=28}
28
-----------------------------
{}
null
-----------------------------
Traversal time: 956 ms
The output shows that the basic implementation of SimpleMap is correct. We inserted 10000 key-value pairs into the Map and then measured the time needed to look up each of those 10000 keys with get; the result was 956 milliseconds.
A number means little without a comparison, so we measured how long HashMap takes to read the same 10000 key-value pairs. The test method is the same, except that we pass in a HashMap instance. The test code is as follows:
// Create a HashMap instance
HashMap<String, String> map = new HashMap<String, String>();
// Test the performance
testPerformance(map);
The test result is as follows:
Traversal time: 32 ms
The comparison is startling: HashMap is faster than our SimpleMap by far more than a small margin (32 milliseconds versus 956 milliseconds in this run). Why does our SimpleMap perform so poorly, and why is HashMap so fast? We will look at each in turn. First, why SimpleMap is slow. Our SimpleMap uses ArrayList to store keys and values, and ArrayList is essentially backed by an array. Both get and put in SimpleMap start with the same lookup; put, for example, is implemented as follows:
@Override
public V put(K key, V value) {
    V oldValue = null;
    int index = this.keys.indexOf(key);
    if (index >= 0) {
        // The key already exists in keys: update the value stored for it
        oldValue = this.values.get(index);
        this.values.set(index, value);
    } else {
        // The key is not in keys: add key and value as a new key-value pair
        this.keys.add(key);
        this.values.add(value);
    }
    return oldValue;
}
The main cost is this.keys.indexOf(key), which searches the ArrayList for the index of the specified element. In essence, it scans the array from the beginning, one element at a time, until it finds a match or reaches the end.
During this scan, equals is called for every element visited, so it is called many times, and that is what makes SimpleMap inefficient. For example, suppose we put every vehicle in the country into SimpleMap, appending each one to the end of the ArrayList in turn. The license plate number is the key and the vehicle is the value, so SimpleMap holds two very long ArrayLists, one for keys and one for values. If we then want to find the car with the license plate "Lu E.DE829", scanning an ArrayList of every vehicle in the country is far too slow.
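The cost of that linear scan can be made visible by counting equals calls. The sketch below is only an illustration: EqualsCounter, Plate, and callsToFind are hypothetical names introduced here, and the static counter is a test device rather than something the article's SimpleMap contains.

```java
import java.util.ArrayList;
import java.util.List;

class EqualsCounter {
    // Counts every invocation of Plate.equals (illustration only, not thread-safe).
    static int equalsCalls = 0;

    // Hypothetical key type whose equals method increments the counter.
    static final class Plate {
        final String num;

        Plate(String num) {
            this.num = num;
        }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Plate && ((Plate) o).num.equals(num);
        }

        @Override
        public int hashCode() {
            return num.hashCode();
        }
    }

    // Builds a list of n keys and returns how many equals calls one indexOf lookup needs.
    static int callsToFind(int n, int target) {
        List<Plate> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            keys.add(new Plate("key" + i));
        }
        equalsCalls = 0;
        keys.indexOf(new Plate("key" + target));
        return equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println("first key: " + callsToFind(10000, 0) + " equals call(s)");
        System.out.println("last key:  " + callsToFind(10000, 9999) + " equals call(s)");
    }
}
```

Looking up the last of 10000 keys compares against every key before it, so on average a lookup touches about half the list, and 10000 lookups add up to tens of millions of comparisons.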
Why is HashMap so efficient? HashMap is smarter. If we look at the source code in HashMap.java, we see that HashMap classifies its elements. Take the example above of finding a vehicle by license plate number. When we put vehicles into HashMap, HashMap sorts them into groups. When a car arrives, HashMap first looks at its plate, for example "Lu E.DE829". Seeing "Lu", it knows the car is from Shandong, so it sets aside a space dedicated to Shandong cars and parks the car there. When we next add a car with the plate "Zhe.GX588", HashMap sees that it is from Zhejiang and places it in the area reserved for Zhejiang, and so on. Put plainly, imagine a set of large buckets, one per area, each of which can hold many cars.
When we look up a vehicle in HashMap by its license plate number, for example the car with the plate "Lu E.DE829", HashMap's get method sees that the plate begins with "Lu" and goes straight to the bucket marked "Lu", that is, it searches for the car only among the Shandong vehicles. There is no need to search the whole country, which greatly shrinks the search space and improves efficiency.
Let's look at the actual implementation in HashMap.java (the listings here come from an older JDK, before the Java 8 rewrite of HashMap). HashMap has a field named table, an array of Entry objects, which stores all the key-value pairs in the HashMap. Each key-value pair is an Entry object holding a key and a value. In addition, each Entry contains a next field, which is also of type Entry. The default length of table is DEFAULT_INITIAL_CAPACITY, that is, 16. When the HashMap needs more room for entries, it expands the array automatically. The following is the source code of HashMap's put method:
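The parking analogy can be sketched directly in code. This is a toy under stated assumptions, not how HashMap itself classifies keys: PlateBuckets, bucketOf, and the rule "the first character of the plate picks the bucket" are all hypothetical choices made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

class PlateBuckets {
    private static final int BUCKET_COUNT = 16;

    // One list of plates per bucket.
    private final List<List<String>> buckets = new ArrayList<>();

    PlateBuckets() {
        for (int i = 0; i < BUCKET_COUNT; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Assumed classification rule: the first character of the plate selects the bucket.
    static int bucketOf(String plate) {
        return plate.charAt(0) % BUCKET_COUNT;
    }

    void add(String plate) {
        buckets.get(bucketOf(plate)).add(plate);
    }

    // Only the one bucket the plate belongs to is searched, not every plate stored.
    boolean contains(String plate) {
        return buckets.get(bucketOf(plate)).contains(plate);
    }

    public static void main(String[] args) {
        PlateBuckets parking = new PlateBuckets();
        parking.add("Lu E.DE829");
        parking.add("Zhe.GX588");
        System.out.println("Lu bucket:  " + bucketOf("Lu E.DE829"));
        System.out.println("Zhe bucket: " + bucketOf("Zhe.GX588"));
        System.out.println(parking.contains("Lu E.DE829"));
    }
}
```

A rule this crude would put every plate from the same province in one bucket; a real hash function tries to spread keys far more evenly.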
public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i);
    return null;
}
The put method first calls the key's hashCode method, which returns an int that serves as the initial hash value. This value is like the license plate number, for example "Lu E.DE829". HashMap then passes it to its own hash method, which processes the initial hash value further to produce the final hash value, and indexFor turns that final hash value into a position in table, that is, a bucket. It is as if we hand over the license plate number and get back the bucket where the vehicle belongs, "Lu", so that HashMap places the car in the bucket marked "Lu". The hash method is a hash function: it returns a final hash value computed from the input value. Its implementation is as follows:
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
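The hash function above can be exercised on its own to see which bucket a key would land in. In the sketch below, HashDemo is a hypothetical wrapper class. The body of indexFor is not shown in this article, so the version here is an assumption: it masks the hash with length - 1, which is how older JDK sources are usually described and which only works when the table length is a power of two.

```java
class HashDemo {
    // Supplemental hash function, copied from the listing above.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Assumption: the bucket index is the low bits of the hash.
    // Requires length to be a power of two, such as the default 16.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        String[] keys = {"Lu E.DE829", "Zhe.GX588", "iSpring"};
        for (String key : keys) {
            int finalHash = hash(key.hashCode());
            System.out.println(key + " -> bucket " + indexFor(finalHash, 16));
        }
    }
}
```

Because the mask keeps only the low bits, the result always falls between 0 and length - 1, even when hashCode returns a negative number.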
As you can see, HashMap implements its hash function mainly with bitwise operators. A brief note on hash functions: there are many ways to implement one. The simplest is the remainder method, for example i % 10, where elements are placed into different blocks or buckets according to the remainder. Given the 100 numbers from 1 to 100, this puts them into 10 buckets of 10 numbers each. The hash function in HashMap looks more complicated because of the bitwise operations, but its purpose is the same as that of the simple remainder method: to classify elements so they can be placed into buckets. The method that actually puts a key-value pair into HashMap is addEntry. The code is as follows:
void addEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    if (size++ >= threshold)
        resize(2 * table.length);
}
Every key-value pair in HashMap is an Entry object, which implements Map.Entry and has a next field, so the elements in a bucket form a singly linked list: starting from the first element in the bucket, we can traverse all of the bucket's elements by following next. For example, suppose the bucket holds the following key-value pairs:
bucket --> e1 --> e2 --> e3 --> e4 --> e5 --> e6 --> e7 --> e8 --> e9 --> ...
The addEntry code first reads the first key-value pair e1 from the bucket, then places the new key-value pair e at the first position in the bucket, with e1 linked after it. Afterwards the bucket looks like this:
bucket --> e --> e1 --> e2 --> e3 --> e4 --> e5 --> e6 --> e7 --> e8 --> e9 --> ...
In this way, the new key-value pair is placed in the bucket, and the key-value pair has been put into HashMap.
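The head insertion described above takes one line once the node type is in place. The sketch below uses hypothetical names (BucketChain, Node, addFirst, keysInOrder) and String keys for brevity; it models a single bucket only, not the table or resizing.

```java
class BucketChain {
    // Hypothetical stand-in for HashMap's Entry: a key, a value, and a link to the next node.
    static final class Node {
        final String key;
        final String value;
        final Node next;

        Node(String key, String value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // First node in the bucket, or null when the bucket is empty.
    Node head;

    // Mirrors the addEntry idea: the new node points at the old head, then becomes the head.
    void addFirst(String key, String value) {
        head = new Node(key, value, head);
    }

    // Renders the chain from head to tail in the same notation as the text.
    String keysInOrder() {
        StringBuilder sb = new StringBuilder("bucket");
        for (Node n = head; n != null; n = n.next) {
            sb.append(" --> ").append(n.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        BucketChain bucket = new BucketChain();
        bucket.addFirst("e1", "old");
        bucket.addFirst("e", "new");
        System.out.println(bucket.keysInOrder()); // bucket --> e --> e1
    }
}
```

Inserting at the head means the cost of adding to a bucket does not grow with the length of its chain.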
So how can we find a key-value pair from HashMap? The principle is similar to putting key-value pairs into HashMap. The following is the source code implementation of the HashMap get method:
public V get(Object key) {
    if (key == null)
        return getForNullKey();
    int hash = hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}
In the get method, the key's hashCode method is also called first (the license plate number), and the hash function then processes that value into the final hash value, from which indexFor computes the bucket index. We then go to that bucket, the one marked "Lu" in our example, to look for the key-value pair. We take the first key-value pair in the bucket and check whether it is the one we are looking for. If so, it is returned directly; if not, we follow next along the singly linked list until we find it or reach the end of the list, in which case get returns null.
Next we write a Car class. It has a String field num, and we override equals so that two Car objects are considered the same car as long as their license plate numbers are equal. The code is as follows:
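Putting the pieces together, the put and get logic can be imitated in a small self-contained class. MiniHashMap and its Node type are hypothetical and much simpler than the real HashMap: the table is fixed at 16 buckets, there is no resizing, null keys are not supported, and the index computation (masking with table.length - 1) is an assumption, since indexFor's body is not shown in this article.

```java
class MiniHashMap<K, V> {
    // Hypothetical node type standing in for HashMap's Entry.
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        final Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Fixed-size table of bucket heads; a simplification, the real class grows its table.
    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    // Same bit mixing as the hash function quoted earlier.
    private static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Assumed index rule: low bits of the hash (table length is a power of two).
    private int indexFor(int hash) {
        return hash & (table.length - 1);
    }

    public V put(K key, V value) {
        int hash = hash(key.hashCode());
        int i = indexFor(hash);
        // Walk the chain: if the key is already present, replace its value.
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || key.equals(e.key))) {
                V oldValue = e.value;
                e.value = value;
                return oldValue;
            }
        }
        // Otherwise insert a new node at the head of the bucket.
        table[i] = new Node<>(hash, key, value, table[i]);
        return null;
    }

    public V get(Object key) {
        int hash = hash(key.hashCode());
        // Only the one bucket selected by the hash is searched.
        for (Node<K, V> e = table[indexFor(hash)]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || key.equals(e.key))) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        MiniHashMap<String, String> map = new MiniHashMap<>();
        map.put("Lu E.DE829", "car from Shandong");
        map.put("Zhe.GX588", "car from Zhejiang");
        System.out.println(map.get("Lu E.DE829"));
        System.out.println(map.get("no such plate"));
    }
}
```

The comparison `e.hash == hash` comes first because comparing two ints is cheaper than calling equals, so most non-matching nodes in a chain are skipped without an equals call.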
import java.util.HashMap;

public class Car {
    private final String num; // license plate number

    public Car(String n) {
        this.num = n;
    }

    public String getNum() {
        return this.num;
    }

    @Override
    public boolean equals(Object obj) {
        if (obj == null) {
            return false;
        }
        if (obj instanceof Car) {
            // Two cars are equal when their license plate numbers are equal
            Car car = (Car) obj;
            return this.num.equals(car.num);
        }
        return false;
    }

    public static void main(String[] args) {
        HashMap<Car, String> map = new HashMap<Car, String>();
        String num = "Lu E.DE829";
        Car car1 = new Car(num);
        Car car2 = new Car(num);
        System.out.println("Car1 hash code: " + car1.hashCode());
        System.out.println("Car2 hash code: " + car2.hashCode());
        System.out.println("Car1 equals Car2: " + car1.equals(car2));
        map.put(car1, new String("Car1"));
        map.put(car2, new String("Car2"));
        System.out.println("map.size(): " + map.size());
    }
}
We wrote some test code in the main method. We create a HashMap that uses Car as the key and String as the value. We instantiate two cars, car1 and car2, from the same string and put both into the HashMap. The output is as follows:
Car1 hash code: 404267176
Car2 hash code: 2027651571
Car1 equals Car2: true
map.size(): 2
The results show that car1 and car2 are equal. Since they are equal as keys, HashMap should keep only one of them, yet the size of the map is 2. Why? The answer lies in Car's hashCode method, or more precisely in Object's hashCode method, which Car inherits. By default, Object's hashCode returns a value tied to the identity of the individual object (it is commonly described as being derived from the object's memory address), so two distinct objects will almost always have different hash codes.
We did not override hashCode in Car, so car1.hashCode() and car2.hashCode() return different values, as the output shows. From the earlier analysis we know that two equal elements should land in the same HashMap bucket. But because the hash codes of car1 and car2 differ, HashMap treats them as different keys and will generally place them in different buckets. That is the problem: two objects that are equal according to equals are stored as separate entries whenever hashCode returns different values for them. Therefore, when writing code, we must ensure that any two objects that are equal according to equals also have equal hash values, that is, hashCode must return the same value for both. How do we solve the problem here? We only need to override hashCode. The code is as follows:
@Override
public int hashCode() {
    return this.num.hashCode();
}
Run the test code in main again. The output is as follows:
Car1 hash code: 607836628
Car2 hash code: 607836628
Car1 equals Car2: true
map.size(): 1
As mentioned above, objects that are equal according to equals must return the same hash value, and objects with the same hash value always land in the same bucket. The converse does not hold: objects with the same hash value (and therefore in the same bucket) are not necessarily equal according to equals.
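A concrete pair makes the converse easy to check. The strings "Aa" and "BB" have the same hashCode under String's documented formula (65 * 31 + 97 and 66 * 31 + 66 are both 2112), yet they are not equal. The class and method names below (SameHashDifferentKeys, storedKeys) are hypothetical and exist only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

class SameHashDifferentKeys {
    // Puts both strings into a HashMap and reports how many keys it keeps.
    static int storedKeys(String a, String b) {
        Map<String, String> map = new HashMap<>();
        map.put(a, "first");
        map.put(b, "second");
        return map.size();
    }

    public static void main(String[] args) {
        String a = "Aa";
        String b = "BB";
        System.out.println(a.hashCode() == b.hashCode()); // true: both are 2112
        System.out.println(a.equals(b));                  // false
        System.out.println(storedKeys(a, b));             // 2: same bucket, two separate entries
    }
}
```

HashMap keeps both keys because, after the hash values match, it still calls equals before deciding that two keys are the same.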
Summary:
1. HashMap uses the principle of block search to speed up lookups. The hash value returned by an object's hashCode is processed further, and elements are then stored in different blocks, or buckets, according to the result. The next time we look for the object, we compute its hash value, use it to select the bucket, and search only within that small range, which is much faster.
2. If you override equals, you must also override hashCode, so that whenever two objects are equal according to equals, their hashCode return values are equal as well.
3. If the hashCode return values of two objects are equal, the two objects are not necessarily equal according to equals.