Elaborate Java.util.HashMap

Source: Internet
Author: User
Tags: rehash

HashMap is one of the most commonly used classes in Java. Although it is very simple to use, its hash-table implementation has many details worth studying.

HashMap stores data as key-value pairs, represented in the implementation by a static inner class named Entry. Each Entry holds the key, the value, the key's hash, and a reference to the next element in the linked list that forms when a hash conflict occurs.
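A stripped-down sketch of that node makes the structure concrete (field names follow the JDK 7 source; the real Entry also implements Map.Entry and carries several helper methods):

```java
// Minimal sketch of JDK 7's HashMap.Entry node (not the real class; the
// actual Entry also implements Map.Entry and has helper methods).
public class EntrySketch {
    static class Entry<K, V> {
        final K key;        // the key
        V value;            // the mapped value
        Entry<K, V> next;   // next node in the bucket's list on a hash conflict
        int hash;           // cached hash of the key

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    public static void main(String[] args) {
        // Two entries chained into the same bucket, newest at the head
        Entry<String, Integer> older = new Entry<>(42, "a", 1, null);
        Entry<String, Integer> head = new Entry<>(42, "b", 2, older);
        System.out.println(head.key + " -> " + head.next.key); // b -> a
    }
}
```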

Under the hood, HashMap stores its elements in an array. The initial capacity defaults to 16 and must be a power of two; the maximum capacity is 1 << 30 (just over one billion). A load factor controls when the hash table expands: the default is 0.75, meaning the table grows when the element count reaches 3/4 of the capacity (though that alone is not sufficient, as explained later).

When you add an element to a HashMap, it computes the hash code of the key, determines the storage position in the array from the hash code and the array size, and, when a hash conflict occurs, stores the colliding entries as a linked list in that slot.
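This chaining behavior can be observed from the outside: even when every key returns the same hash code, forcing all entries into one bucket's list, the map still distinguishes the keys via equals. A small demo (BadKey is a made-up class for this illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Demonstrates that HashMap still works correctly when every key collides:
// all entries land in the same bucket and are chained into a list, and
// equals() is used to tell the keys apart. BadKey is a made-up demo class.
public class CollisionDemo {
    static final class BadKey {
        final String name;
        BadKey(String name) { this.name = name; }

        @Override public int hashCode() { return 1; } // force every key into one bucket
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        map.put(new BadKey("a"), 1);
        map.put(new BadKey("b"), 2);
        map.put(new BadKey("a"), 3); // replaces the old value for "a"

        System.out.println(map.size());               // 2
        System.out.println(map.get(new BadKey("a"))); // 3
    }
}
```

Lookups in such a fully colliding map degrade to a linear scan of the list, which is exactly why a well-distributed hash code matters.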

Now for a closer look at the source code (the JDK 7 implementation), starting with the constructor:

    public HashMap(int initialCapacity, float loadFactor) {
        // The initial capacity cannot be negative, otherwise an exception is thrown
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        // The initial capacity cannot exceed the maximum capacity, 1 << 30
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        // Check the legitimacy of the load factor: it must be a positive number
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        this.loadFactor = loadFactor;
        threshold = initialCapacity;
        // The init() hook is left for subclasses to extend
        init();
    }
You can see that creating a HashMap does not allocate memory for the table; allocation happens when data is actually added to the map, as the put method shows:

    public V put(K key, V value) {
        // No space was allocated at creation, so check whether the table is
        // still empty and allocate memory on first use
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        // Special handling for a null key
        if (key == null)
            return putForNullKey(value);
        // Compute the key's hash code
        int hash = hash(key);
        // Determine the element's position in the hash table (the hash bucket)
        // from the hash code and the current capacity
        int i = indexFor(hash, table.length);
        // Check whether the key already exists; if so, replace the old value
        // with the new one and return the old value
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            // Here you can see that whether a key already exists is decided
            // by its hash code and its equals() method
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        // Increase the modification count, used to implement the fail-fast mechanism
        modCount++;
        // Actually add the element at the computed index (the hash bucket)
        addEntry(hash, key, value, i);
        // Returning null indicates the key did not exist before
        return null;
    }

    void addEntry(int hash, K key, V value, int bucketIndex) {
        // Expansion requires that the size has reached the threshold AND a
        // hash conflict occurred (the target bucket already holds an element)
        if ((size >= threshold) && (null != table[bucketIndex])) {
            // The capacity expands to twice its previous size
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            // Recompute the bucket position after resizing
            bucketIndex = indexFor(hash, table.length);
        }

        // Create the Entry and store it in the hash table
        createEntry(hash, key, value, bucketIndex);
    }

    void createEntry(int hash, K key, V value, int bucketIndex) {
        // Take out the element currently stored at this bucket
        Entry<K,V> e = table[bucketIndex];
        // Put the new element at the head of the list, i.e. let the new
        // element's next reference point to the previously existing element
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        // Update the element count
        size++;
    }

As the code shows, expansion requires both of the following conditions:

    1. The number of elements has reached the threshold determined by the load factor.
    2. The put hits a hash conflict (the target bucket already contains an element).

Merely exceeding the threshold is therefore not enough to trigger expansion. For example, with an initial capacity of 16, if many hash conflicts occur before the threshold is reached but the elements added afterwards rarely collide, the number of key-value pairs may exceed 16 * 0.75 = 12, or even 16, without the table expanding. This is why the hash algorithm must distribute keys uniformly and minimize hash collisions.

That covers adding elements; here is how the table is initialized and its memory allocated:

    private void inflateTable(int toSize) {
        // Guarantee the capacity is a power of two
        int capacity = roundUpToPowerOf2(toSize);
        // Compute and cache the expansion threshold at initialization time,
        // to avoid recalculating it every time
        threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
        // Only here is the real memory allocated
        table = new Entry[capacity];
        // Initialize the hash seed
        initHashSeedAsNeeded(capacity);
    }

    /**
     * Ensures that the capacity is a power of two and does not exceed the
     * maximum capacity. For example: passing 15 yields 16, and passing 17
     * yields 32, i.e. the smallest power of two that is greater than or
     * equal to the given value.
     */
    private static int roundUpToPowerOf2(int number) {
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }
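The Integer.highestOneBit trick can be checked in isolation. Below is a standalone copy of the rounding logic, with MAXIMUM_CAPACITY inlined as 1 << 30 for the demo, exercised on a few sample values:

```java
// Standalone copy of the power-of-two rounding used by inflateTable,
// with MAXIMUM_CAPACITY inlined as 1 << 30 for this demo.
public class RoundUpDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    static int roundUpToPowerOf2(int number) {
        // (number - 1) << 1 overshoots past the target power of two, and
        // highestOneBit() then keeps only its top set bit
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOf2(15)); // 16
        System.out.println(roundUpToPowerOf2(16)); // 16
        System.out.println(roundUpToPowerOf2(17)); // 32
    }
}
```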
Special handling of a null key:

    private V putForNullKey(V value) {
        // If the null key already exists, replace the old value
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        // Increase the modification count, used to implement the fail-fast mechanism
        modCount++;
        // The null key's hash code is fixed at 0, and its bucket is fixed at index 0
        addEntry(0, null, value, 0);
        return null;
    }
And here is how the bucket position of a non-null key is determined:

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }
Here h is the key's hash code and length is the current length of the hash table. h & (length - 1) is equivalent to h % length, but the former uses a bit operation, which is faster than the modulo operation. Why can & replace %? Because length is a power of two, length - 1 has all of its low bits set to 1, so ANDing any number with it yields a result that can never reach length, exactly matching the effect of the % operation. If length were not a power of two this would not work, so the trick is quite ingenious.
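The equivalence is easy to verify for non-negative values (for a negative h, Java's % can return a negative remainder while the mask always yields a valid index, which is a further advantage of the bit operation):

```java
// Verifies that h & (length - 1) equals h % length when length is a
// power of two and h is non-negative.
public class IndexForDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // power of two, so length - 1 is binary 1111
        for (int h = 0; h < 1000; h++) {
            if (indexFor(h, length) != h % length) {
                throw new AssertionError("mismatch at h=" + h);
            }
        }
        System.out.println(indexFor(29, 16)); // 13, same as 29 % 16
    }
}
```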

Now for the core: the hash method that generates the hash code:

    final int hash(Object k) {
        int h = hashSeed;
        if (0 != h && k instanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }
        // Call the key's hashCode() method to get its hash code
        h ^= k.hashCode();
        // Perform a series of shifts and XOR operations on the hash code
        // and return the result as the final hash
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
Why this series of shifts and XOR operations? After this mixing, the 0 and 1 bits of the hash code are distributed more evenly, which reduces hash conflicts and thus improves the overall efficiency of the HashMap.
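The effect shows up with keys whose hash codes differ only in their high bits: since indexFor keeps only the low bits, such keys would all land in bucket 0 of a 16-bucket table without mixing. The sketch below copies the JDK 7 shifts into a standalone spread() method (with the hash seed fixed at 0 for the demo):

```java
import java.util.HashSet;
import java.util.Set;

// Shows why the shift/XOR mixing matters: hash codes that differ only in
// their high bits would otherwise all map to the same bucket, because
// indexFor keeps only the low bits. spread() copies the JDK 7 shifts,
// with the hash seed fixed at 0 for this demo.
public class SpreadDemo {
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        Set<Integer> raw = new HashSet<>();
        Set<Integer> mixed = new HashSet<>();
        for (int i = 0; i < 16; i++) {
            int h = i << 16;                    // differs only in the high bits
            raw.add(indexFor(h, 16));           // low 4 bits are all zero
            mixed.add(indexFor(spread(h), 16)); // high bits folded into the index
        }
        System.out.println(raw.size());   // 1: every key collided
        System.out.println(mixed.size()); // many distinct buckets after mixing
    }
}
```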

Rehash when expanding:

    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        // Re-create the underlying array
        Entry[] newTable = new Entry[newCapacity];
        // Re-hash the existing elements into the new hash buckets
        transfer(newTable, initHashSeedAsNeeded(newCapacity));
        table = newTable;
        // Update the expansion threshold
        threshold = (int) Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {
            while (null != e) {
                Entry<K,V> next = e.next;
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
    }
Since the hash table length has changed, the bucket position of every existing element must be recomputed and the element moved into its new hash bucket. This is a time-consuming operation, so when creating a HashMap, if you can estimate the amount of data in advance, set an appropriate initial capacity to avoid the performance cost of repeated expansion while adding data.
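A common sizing rule that follows from this: choose an initial capacity whose threshold (capacity * loadFactor) stays above the expected element count, e.g. expectedSize / 0.75 + 1. A small sketch (capacityFor is a hypothetical helper for this demo):

```java
import java.util.HashMap;
import java.util.Map;

// Pre-sizing a HashMap so that inserting the expected number of entries
// never triggers a resize: pick a capacity whose threshold
// (capacity * loadFactor) exceeds the expected size.
public class PresizeDemo {
    // Hypothetical helper: expectedSize / 0.75 + 1 keeps the element
    // count below the default threshold
    static int capacityFor(int expectedSize) {
        return (int) (expectedSize / 0.75f) + 1;
    }

    public static void main(String[] args) {
        int expected = 1000;
        // The constructor rounds this hint up to a power of two internally
        Map<Integer, String> map = new HashMap<>(capacityFor(expected));
        for (int i = 0; i < expected; i++) {
            map.put(i, "value-" + i);
        }
        System.out.println(map.size()); // 1000
    }
}
```

For 1000 expected entries this yields a hint of 1334, which the map rounds up to 2048 with a threshold of 1536, so no intermediate resize occurs during the inserts.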

Let's take a look at the get operation.

    public V get(Object key) {
        // A null key gets special treatment
        if (key == null)
            return getForNullKey();
        // Get the Entry corresponding to the key
        Entry<K,V> entry = getEntry(key);
        // Return the key's value if it exists, otherwise null
        return null == entry ? null : entry.getValue();
    }

    final Entry<K,V> getEntry(Object key) {
        // size == 0 means there are no elements, so return null
        if (size == 0) {
            return null;
        }
        // Get the key's hash code
        int hash = (key == null) ? 0 : hash(key);
        // Find the hash bucket corresponding to the key and iterate over its
        // linked list to return the matching entry
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            // A key matches based on its hash code and its equals() method
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        // If not present, return null
        return null;
    }
The load factor defaults to 0.75, which is a trade-off value; you can change it via the constructor, but be aware that the larger the load factor, the greater the cost of querying data. A larger load factor means more elements are packed into the table before it resizes, so hash conflicts become more likely; colliding entries are stored as a linked list, and a query must traverse that list, so lookup efficiency drops. Since both get and put perform a lookup, a lower load factor improves HashMap's efficiency. This is a strategy of trading space for time.
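The trade-off is chosen at construction time; the sketch below just shows the two directions of the dial (the map's observable behavior is identical either way, only its resize timing and memory use differ):

```java
import java.util.HashMap;
import java.util.Map;

// The load factor is picked in the constructor: a smaller value makes the
// table grow sooner (more memory, shorter bucket lists, faster lookups),
// while a larger value delays resizing (less memory, longer lists).
public class LoadFactorDemo {
    public static void main(String[] args) {
        // Resizes once 8 elements are reached (16 * 0.5), favoring lookup speed
        Map<String, Integer> sparse = new HashMap<>(16, 0.5f);
        // Resizes only at 16 elements (16 * 1.0), favoring memory
        Map<String, Integer> dense = new HashMap<>(16, 1.0f);

        for (int i = 0; i < 12; i++) {
            sparse.put("k" + i, i);
            dense.put("k" + i, i);
        }
        // Both behave the same from the outside
        System.out.println(sparse.size() + " " + dense.size()); // 12 12
    }
}
```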

Why is HashMap efficient? It ensures its efficiency through the following points:

    • A good hash algorithm makes hash conflicts unlikely
    • Array-based storage gives fast access to elements
    • The load factor lets you trade space for time


Copyright notice: This is the author's original article; do not reproduce it without the author's permission.
