Java HashMap Source Code Analysis

Source: Internet
Author: User

I. Overview of HashMap

HashMap is an unsynchronized implementation of the Map interface based on a hash table. It provides all of the optional map operations and permits null values and the null key. The class makes no guarantees about the order of the mappings; in particular, it does not guarantee that the order will remain constant over time.


II. Data structure of HashMap
In the Java programming language there are two basic structures: the array and the reference (which links objects into lists). All data structures can be built from these two, and HashMap is no exception. HashMap is in fact a "linked list hash" structure, that is, a combination of an array and linked lists.





As can be seen from the figure above, the bottom layer of HashMap is an array, and each item in the array is a linked list. When a new HashMap is created, this array is initialized.


This article analyzes the HashMap source code in JDK 7. The following is a source fragment:


public class HashMap<K,V>
    extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable
{

    /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 16;

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;


You can see that the default initial capacity of HashMap is 16 and the default load factor is 0.75.

The Entry array table holds the key-value pairs:

    /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * The table, resized as necessary. Length MUST Always be a power of two.
     */
    transient Entry<K,V>[] table;


Now look at the definition of the Entry class:

    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;

        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }
        // ... (remaining methods omitted)
    }


As you can see, Entry is the element type of the array. Each Map.Entry is a key-value pair, and it also holds a reference to the next element, which is what forms the linked list.



HashMap constructor

    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        // Find a power of 2 >= initialCapacity
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;

        this.loadFactor = loadFactor;
        threshold = (int)Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
        table = new Entry[capacity];
        useAltHashing = sun.misc.VM.isBooted() &&
                (capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
        init();
    }


III. HashMap access implementation

1. Storage

public V put(K key, V value) {
    // HashMap allows null keys and null values to be stored.
    // When key is null, call the putForNullKey method and place value in the first position of the array.
    if (key == null)
        return putForNullKey(value);
    // Recalculate the hash value based on the hashCode of the key.
    int hash = hash(key.hashCode());
    // Search for the index of the specified hash value in the corresponding table.
    int i = indexFor(hash, table.length);
    // If the Entry at index i is not null, walk the chain by following e.next.
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    // No Entry with an equal key was found at index i (the bucket may be empty).
    modCount++;
    // Add the key and value at index i.
    addEntry(hash, key, value, i);
    return null;
}

From the source code above we can see that when we put an element into the HashMap, the hash value is first recalculated from the key's hashCode, and that hash value determines the element's position (that is, its index) in the array. If other elements are already stored at that position, the elements there are kept in a linked list: the newly added entry is placed at the head of the chain, and the earliest added entry sits at the tail. If there is no element at that position, the new element is placed directly in that array slot.

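The addEntry(hash, key, value, i) method places the key-value pair at index i of the table array according to the computed hash value. The listing below matches the simpler form found in JDK 6; the JDK 7 form, which also checks whether the target bucket is occupied, is shown in section IV: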

void addEntry(int hash, K key, V value, int bucketIndex) {
    // Get the Entry at the specified bucketIndex index
    Entry<K,V> e = table[bucketIndex];
    // Place the newly created Entry at the bucketIndex index, and let the new Entry point to the original Entry
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    // If the number of key-value pairs in the Map exceeds the limit
    if (size++ >= threshold)
        // Expand the length of the table to twice the original.
        resize(2 * table.length);
}


According to the source code of the put method above, when the program tries to put a key-value pair into the HashMap, it first determines where the Entry is stored based on the hashCode() return value of the key: if the hashCode() values of the keys of two Entry objects are the same, they are stored at the same location. If the keys of the two Entry objects also compare as equal through equals, the value of the newly added Entry overwrites the value of the Entry already in the collection, but the key is not overwritten. If the keys compare as unequal through equals, the newly added Entry forms an Entry chain with the Entry already in the collection, and the newly added Entry is located at the head of that chain.
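The overwrite and head-insertion behaviour described above can be tried out on a much smaller table. The following is a minimal sketch, not the JDK source: the class TinyChainedMap and its Node type are hypothetical names, the table is fixed at 16 buckets, and the key's hashCode() is used directly without the extra hash step.

```java
// Simplified sketch of a chained hash table; hypothetical, not the JDK source.
class TinyChainedMap<K, V> {
    // One node of a bucket's singly linked list.
    static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Fixed capacity of 16 buckets (a power of two); this sketch never resizes.
    @SuppressWarnings("unchecked")
    final Node<K, V>[] table = (Node<K, V>[]) new Node[16];
    int size;

    // Returns the previous value for the key, or null if there was none.
    V put(K key, V value) {
        int hash = (key == null) ? 0 : key.hashCode();
        int i = hash & (table.length - 1);
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || (key != null && key.equals(e.key)))) {
                V old = e.value;
                e.value = value; // equal key: only the value is replaced
                return old;
            }
        }
        // New key: the new node becomes the head and points at the old head.
        table[i] = new Node<>(hash, key, value, table[i]);
        size++;
        return null;
    }

    public static void main(String[] args) {
        TinyChainedMap<Integer, String> map = new TinyChainedMap<>();
        map.put(1, "one");
        map.put(17, "seventeen"); // 17 & 15 == 1, so it shares bucket 1 with key 1
        System.out.println(map.put(1, "uno"));        // one
        System.out.println(map.table[1].key);         // 17
        System.out.println(map.table[1].next.value);  // uno
    }
}
```

Keys 1 and 17 land in the same bucket, so the later key 17 ends up at the head of the chain, while the second put for key 1 only replaces the stored value.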

The hash(Object k) method recalculates the hash once more based on the hashCode of the key. The algorithm mixes the high bits into the computation to prevent the hash conflicts that would otherwise occur when the low bits stay constant and only the high bits change.

    final int hash(Object k) {
        int h = 0;
        if (useAltHashing) {
            if (k instanceof String) {
                return sun.misc.Hashing.stringHash32((String) k);
            }
            h = hashSeed;
        }

        h ^= k.hashCode();

        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
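To see why the extra mixing matters, here is a small sketch that applies the same shift distances as the JDK 7 supplemental hash (20, 12, 7 and 4) to two raw int values that differ only in their high bits. The class name HashSpreadDemo and the method name spread are assumptions made for this illustration; the alternative String hashing and the hash seed are left out.

```java
class HashSpreadDemo {
    // Same shift distances as the JDK 7 supplemental hash, applied to a raw int.
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Bucket index for a table whose length is a power of two.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000; // differs from b only above bit 15
        int b = 0x20000;
        // Without mixing, both values fall into bucket 0 of a 16-slot table.
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16));                 // 0 0
        // After mixing, the high bits influence the low bits and the buckets differ.
        System.out.println(indexFor(spread(a), 16) + " " + indexFor(spread(b), 16)); // 1 2
    }
}
```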

We can see that to find an element in a HashMap, we need to obtain the corresponding array position from the hash value of the key. How this position is calculated is the hash algorithm. As mentioned earlier, the data structure of HashMap is a combination of an array and linked lists, so we naturally want the elements to be distributed as evenly as possible, ideally with only one element at each position. Then, once the hash algorithm has produced the position, we know immediately that the element there is the one we want, without having to traverse a linked list, which greatly improves query efficiency.


For any given object, as long as its hashCode() return value is the same, the hash code computed by the hash method is always the same. The first idea that comes to mind is to take the hash value modulo the array length, so that the elements are distributed relatively evenly. However, the modulo operation is relatively expensive, so HashMap instead calls the indexFor(int h, int length) method to calculate the index in the table array at which the object should be stored. The code of the indexFor(int h, int length) method is as follows:

    /**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        return h & (length-1);
    }


This method is very ingenious: it uses h & (table.length - 1) to obtain the storage position of the object, and the length of the underlying array of HashMap is always a power of 2 (2^n), which is HashMap's speed optimization. The following code appears in the HashMap constructor:

        // Find a power of 2 >= initialCapacity
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;

This code guarantees that, at initialization, the capacity of the HashMap is always a power of 2, that is, the length of the underlying array is always 2^n.
When length is a power of 2, the operation h & (length - 1) is equivalent to taking h modulo length, that is h % length for non-negative h, but & is more efficient than %.
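The equivalence between the mask and the modulo can be checked directly. The sketch below is illustrative only: IndexForDemo and roundUpToPowerOfTwo are hypothetical names, and Math.floorMod (available since Java 8) stands in for a modulo that never returns a negative result.

```java
class IndexForDemo {
    // Smallest power of two >= initialCapacity, using the same loop as the constructor.
    // Assumes initialCapacity <= 1 << 30, which the constructor enforces beforehand.
    static int roundUpToPowerOfTwo(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        return capacity;
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(17)); // 32

        int length = 16;
        int[] hashes = {0, 5, 16, 37, -1, Integer.MIN_VALUE};
        for (int h : hashes) {
            // For a power-of-two length the mask equals the non-negative remainder.
            System.out.println(h + ": " + indexFor(h, length) + " == " + Math.floorMod(h, length));
        }

        // With a length that is not a power of two the mask is no longer a modulo.
        System.out.println(indexFor(5, 10) + " vs " + (5 % 10)); // 1 vs 5
    }
}
```

The last line shows why the constructor insists on a power of two: with a length of 10 the mask 9 (binary 1001) can only ever produce the indexes 0, 1, 8 and 9.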

2. Retrieval

    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

The getForNullKey() method is as follows:

    private V getForNullKey() {
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null)
                return e.value;
        }
        return null;
    }


The getEntry(Object key) method is as follows:

    /**
     * Returns the entry associated with the specified key in the
     * HashMap.  Returns null if the HashMap contains no mapping
     * for the key.
     */
    final Entry<K,V> getEntry(Object key) {
        int hash = (key == null) ? 0 : hash(key);
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }


With the hash algorithm used for storage as a basis, this code is easy to understand. From the source code above we can see that when getting an element from the HashMap, the key is first checked for null. If it is null, the result of getForNullKey() is returned directly (it searches the chain at index 0 of the table array for the entry whose key is null). If the key is not null, the hash of the key is computed, the chain at the corresponding position in the array is located, and the required element is then found in that linked list through the equals method of the key.


Simply put, HashMap treats each key-value pair as a whole at the bottom layer, and this whole is an Entry object. HashMap uses an Entry[] array to hold all key-value pairs. When an Entry object needs to be stored, the hash algorithm determines its position in the array, and the equals method determines its position in the linked list at that array position. When an Entry needs to be retrieved, its position in the array is likewise found through the hash algorithm, and the Entry is then taken from the linked list at that position through the equals method.
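The lookup rules can be observed from the outside with java.util.HashMap itself. The following short sketch (the class name GetDemo and its helper sampleMap are hypothetical) shows that the null key is retrievable and that a distinct but equal key object finds the same entry, since the lookup compares hashes and then calls equals.

```java
import java.util.HashMap;
import java.util.Map;

class GetDemo {
    // Builds a small map containing the null key and one String key.
    static Map<String, Integer> sampleMap() {
        Map<String, Integer> map = new HashMap<>();
        map.put(null, 0);    // the null key is allowed
        map.put("apple", 1);
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = sampleMap();

        // A different String instance with the same content: same hashCode, equals() is true.
        String probe = new String("apple");
        System.out.println(map.get(probe));     // 1
        System.out.println(map.get(null));      // 0
        System.out.println(map.get("missing")); // null
    }
}
```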



IV. HashMap rehash (resize)

As more and more elements are added to a HashMap, the probability of hash conflicts increases, because the length of the array is fixed. To keep queries efficient, the array of the HashMap therefore has to be expanded. Array expansion also occurs in ArrayList and is a common operation, but after the HashMap array is expanded the most expensive step appears: the data in the original array must have its position recalculated and be placed into the new array. This is resize.

So when is a HashMap expanded? Look at the source code:

    void addEntry(int hash, K key, V value, int bucketIndex) {
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);
    }


The definition of threshold is as follows:
threshold = (int)Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);


When the number of elements in the HashMap reaches array size * loadFactor, the array is expanded. The default value of loadFactor is 0.75, which is a compromise value. That is, by default the array size is 16, so once the HashMap holds 16 * 0.75 = 12 elements, the next insertion into an occupied bucket expands the array to 2 * 16 = 32, doubling it, and the position of each element in the new array is then recalculated. This is a very expensive operation, so if the number of elements in the HashMap can be predicted, presetting the capacity accordingly effectively improves the performance of HashMap.
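The arithmetic behind this advice can be sketched as follows. The helper names in PresizeDemo are hypothetical; threshold mirrors the formula shown above, and initialCapacityFor is one common way (an assumption, not something taken from the JDK source) to pick a constructor argument large enough for an expected number of entries.

```java
class PresizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same rounding loop as the constructor (assumes initialCapacity <= 1 << 30).
    static int roundUpToPowerOfTwo(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        return capacity;
    }

    // Same formula as the threshold field in the constructor.
    static int threshold(int capacity, float loadFactor) {
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    // Hypothetical helper: a capacity to request so that `expected` entries fit without a resize.
    static int initialCapacityFor(int expected, float loadFactor) {
        return (int) (expected / loadFactor) + 1;
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // 12

        int expected = 1000;
        // Requesting exactly 1000 gives a 1024-slot table whose threshold is below 1000.
        int naive = roundUpToPowerOfTwo(expected);
        System.out.println(naive + " -> " + threshold(naive, 0.75f)); // 1024 -> 768

        // Dividing by the load factor first gives a table that holds all 1000 entries.
        int sized = roundUpToPowerOfTwo(initialCapacityFor(expected, 0.75f));
        System.out.println(sized + " -> " + threshold(sized, 0.75f)); // 2048 -> 1536
    }
}
```

In application code this corresponds to something like new HashMap<>(1334) instead of new HashMap<>(1000) when about 1000 entries are expected with the default load factor.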


V. Parameters affecting the performance of HashMap
HashMap provides the following constructors:

HashMap(): constructs a HashMap with an initial capacity of 16 and a load factor of 0.75.
HashMap(int initialCapacity): constructs a HashMap with an initial capacity of initialCapacity and a load factor of 0.75.
HashMap(int initialCapacity, float loadFactor): constructs a HashMap with the specified initial capacity and the specified load factor.
The underlying constructor HashMap(int initialCapacity, float loadFactor) takes two parameters, the initial capacity initialCapacity and the load factor loadFactor. Its code is as follows:

    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        // Find a power of 2 >= initialCapacity
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;

        this.loadFactor = loadFactor;
        threshold = (int)Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
        table = new Entry[capacity];
        useAltHashing = sun.misc.VM.isBooted() &&
                (capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
        init();
    }


initialCapacity: the capacity of the HashMap, that is, the length of the underlying array (rounded up to a power of 2).
loadFactor: the load factor is defined as the number of elements actually stored in the hash table (n) divided by the capacity of the hash table (m). The load factor measures how full a hash table is: the larger the load factor, the more heavily loaded the table, and vice versa. For a hash table that resolves collisions with linked lists, the average time to find an element is O(1 + a), where a is the load factor. A larger load factor therefore uses space more fully but lowers lookup efficiency, while a load factor that is too small leaves the table too sparse and wastes a lot of space.


In the HashMap implementation, the threshold field determines how many elements the HashMap may hold before it is resized:

threshold = (int)Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);

threshold is the smaller of capacity * loadFactor and MAXIMUM_CAPACITY + 1.


HashMap decides in the addEntry(int hash, K key, V value, int bucketIndex) method whether expansion is required, as follows:

    void addEntry(int hash, K key, V value, int bucketIndex) {
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);
    }


According to the defining formula of the load factor, threshold is the maximum number of elements allowed under this loadFactor and capacity; beyond it the table is resized to lower the actual load factor. The default load factor of 0.75 is a balanced trade-off between space and time efficiency. When the number of elements exceeds this threshold, the capacity of the HashMap after resize is twice the previous capacity.
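What doubling the table involves can be sketched in a few lines. This is a simplified illustration in the spirit of the resize step, not the JDK code: ResizeDemo, its Node type and sampleResized are hypothetical, and the stored hash of each node is reused to compute the index in the new table.

```java
class ResizeDemo {
    // Minimal chain node that remembers the hash it was stored under.
    static final class Node {
        final int hash;
        final String key;
        Node next;

        Node(int hash, String key, Node next) {
            this.hash = hash;
            this.key = key;
            this.next = next;
        }
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Moves every node of oldTable into a new table of twice the length.
    static Node[] resize(Node[] oldTable) {
        Node[] newTable = new Node[oldTable.length * 2];
        for (Node head : oldTable) {
            Node e = head;
            while (e != null) {
                Node next = e.next;                        // remember the rest of the old chain
                int i = indexFor(e.hash, newTable.length); // index in the larger table
                e.next = newTable[i];                      // head insertion into the new bucket
                newTable[i] = e;
                e = next;
            }
        }
        return newTable;
    }

    // Hashes 1 and 17 share bucket 1 in a 16-slot table and separate in a 32-slot table.
    static Node[] sampleResized() {
        Node[] table = new Node[16];
        table[1] = new Node(17, "b", new Node(1, "a", null));
        return resize(table);
    }

    public static void main(String[] args) {
        Node[] bigger = sampleResized();
        System.out.println(bigger.length);  // 32
        System.out.println(bigger[1].key);  // a
        System.out.println(bigger[17].key); // b
    }
}
```

Every node is visited once, which is why resizing costs time proportional to the number of entries and why avoiding repeated resizes pays off.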


VI. Fail-fast mechanism

We know that java.util.HashMap is not thread safe, so if another thread modifies the HashMap while an iterator is in use, a ConcurrentModificationException is thrown. This is the so-called fail-fast policy.
This policy is implemented in the source code through the modCount field. As the name implies, modCount is the number of modifications: every modification of the HashMap content increments this value, and when an iterator is initialized this value is assigned to the iterator's expectedModCount.

    private abstract class HashIterator<E> implements Iterator<E> {
        Entry<K,V> next;        // next entry to return
        int expectedModCount;   // For fast-fail
        int index;              // current slot
        Entry<K,V> current;     // current entry

        HashIterator() {
            expectedModCount = modCount;
            if (size > 0) { // advance to first entry
                Entry[] t = table;
                while (index < t.length && (next = t[index++]) == null)
                    ;
            }
        }


During iteration, the iterator checks whether modCount equals expectedModCount. If they differ, the map has been modified since the iterator was created, for example by another thread:

        final Entry<K,V> nextEntry() {
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            Entry<K,V> e = next;
            if (e == null)
                throw new NoSuchElementException();

            if ((next = e.next) == null) {
                Entry[] t = table;
                while (index < t.length && (next = t[index++]) == null)
                    ;
            }
            current = e;
            return e;
        }


    /**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     */
    transient int modCount;


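Note that in JDK 7 modCount is declared only as transient, which merely excludes the field from serialization. It is the volatile modifier (used for this field in earlier JDK versions) that guarantees the visibility of modifications between threads, so in JDK 7 fail-fast detection across threads is best effort and must not be relied on for correctness.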
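The modCount check does not need a second thread to fire. The sketch below (FailFastDemo and its two helper methods are hypothetical names) uses java.util.HashMap directly: adding a key inside a for-each loop trips the check on the following call to next, while removing through the iterator keeps expectedModCount in sync.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class FailFastDemo {
    static Map<String, Integer> sampleMap() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Returns true if modifying the map during iteration triggered the fail-fast check.
    static boolean modifyWhileIterating() {
        Map<String, Integer> map = sampleMap();
        try {
            for (String key : map.keySet()) {
                map.put(key + "-copy", 0); // structural modification: modCount changes
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    // Removes odd values through the iterator and returns the remaining size.
    static int removeViaIterator() {
        Map<String, Integer> map = sampleMap();
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() % 2 == 1) {
                it.remove(); // the iterator updates its own expectedModCount
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(modifyWhileIterating()); // true
        System.out.println(removeViaIterator());    // 1
    }
}
```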



VII. Summary
HashMap is based on the hashing principle: we store and retrieve objects with the put() and get() methods. When we pass a key-value pair to put(), it calls the hashCode() method of the key object to compute the hash code and then finds the bucket position in which to store the value object. When an object is fetched, the correct key-value pair is found through the equals() method of the key object, and the value object is then returned. HashMap uses linked lists to resolve collisions: when a collision occurs, the object is stored as a node of the list in that bucket. HashMap stores a key-value pair in each linked list node.

When the hashCode values of two different key objects are identical, they are stored in a linked list at the same bucket position. The equals() method of the key object is then used to find the right key-value pair.



Here are some of the classic questions about HashMap found on the Web:
Why are String, Integer and other wrapper classes suitable as keys?

String, Integer and other wrapper classes are a good fit as HashMap keys, and String is the most commonly used. String is immutable and final, and it already overrides the equals() and hashCode() methods. The other wrapper classes share these properties. Immutability is necessary because the hashCode() of a key must not change: if a key returns a different hashCode when it is put in than when it is looked up, the object you want can no longer be found in the HashMap. Immutability has other advantages as well, such as thread safety. If you can guarantee that the hashCode stays the same simply by declaring a field as final, then do so. Because the equals() and hashCode() methods are used when fetching an object, it is important that the key class overrides both methods correctly. If unequal objects return different hashCode values, the chance of collisions is smaller, which improves the performance of HashMap.
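The risk of a changing hashCode can be reproduced with a deliberately mutable key. MutableKey below is a hypothetical class written only for this illustration; the sketch assumes Java 8 or later for Integer.hashCode(int).

```java
import java.util.HashMap;
import java.util.Map;

class MutableKeyDemo {
    // Hypothetical key type whose equals() and hashCode() depend on a mutable field.
    static final class MutableKey {
        int id;

        MutableKey(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).id == id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    // Returns what get() finds after the key object was changed while inside the map.
    static String lookupAfterMutation() {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey(1);
        map.put(key, "value");
        key.id = 2;          // the entry stays filed under the hash computed for id 1
        return map.get(key); // searched under the hash for id 2
    }

    // Looking up with a fresh key in the original state fails too: the hash matches,
    // but equals() now compares against the mutated stored key.
    static String lookupWithOriginalState() {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey(1);
        map.put(key, "value");
        key.id = 2;
        return map.get(new MutableKey(1));
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterMutation());     // null
        System.out.println(lookupWithOriginalState()); // null
    }
}
```

The entry is still in the map and still counted by size(), but neither lookup can reach it any more.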

Can we use a custom object as a key?
This is an extension of the previous question. Of course you can use any object as a key, as long as it follows the rules of the equals() and hashCode() methods and does not change after it has been inserted into the map. If the custom object is immutable, it already satisfies the conditions for being a key, because it cannot be changed after it is created.
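A minimal sketch of such a key is shown below. Point and CustomKeyDemo are hypothetical names chosen for the example; the important parts are the final fields and the equals() and hashCode() methods that are computed from the same fields.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class CustomKeyDemo {
    // Hypothetical immutable key: final class, final fields, consistent equals/hashCode.
    static final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y); // built from the same fields that equals() compares
        }
    }

    // Stores one entry and looks it up with a separately constructed key.
    static String lookup(int x, int y) {
        Map<Point, String> map = new HashMap<>();
        map.put(new Point(1, 2), "treasure");
        return map.get(new Point(x, y));
    }

    public static void main(String[] args) {
        System.out.println(lookup(1, 2)); // treasure
        System.out.println(lookup(2, 1)); // null
    }
}
```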


Can we use ConcurrentHashMap instead of Hashtable?
This is another very popular interview question, because more and more people use ConcurrentHashMap. We know that Hashtable is synchronized, but ConcurrentHashMap performs better under concurrency because it locks only part of the map, according to the concurrency level. ConcurrentHashMap can of course replace Hashtable, but Hashtable provides stronger thread safety, because every operation locks the whole table. The difference between Hashtable and ConcurrentHashMap is described in this blog post: http://javarevisited.blogspot.sg/2011/04/difference-between-concurrenthashmap.html
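As a small illustration of using ConcurrentHashMap from several threads, the sketch below counts events with merge, which ConcurrentHashMap performs atomically per key. It assumes Java 8 or later (lambdas and Map.merge); the class name ConcurrentCountDemo and the key "hits" are made up for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentCountDemo {
    // Starts `threads` workers that each increment the same counter and returns the total.
    static int countWithThreads(int threads, int incrementsPerThread) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counts.merge("hits", 1, Integer::sum); // atomic read-modify-write
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join(); // wait until every worker has finished
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return counts.getOrDefault("hits", 0);
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 10000)); // 40000
    }
}
```

With a plain HashMap the same loop could lose updates or corrupt the table, because put and merge are not safe for concurrent use there.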







