Java Advanced----HashMap source analysis


Today we continue with the HashMap source code and analyze several of its commonly used methods. Before the analysis, we first need to understand the structure of HashMap. Those of you who read my earlier analyses of the ArrayList and LinkedList source should remember that ArrayList is backed by an array and LinkedList by a linked list. HashMap is a combination of the two: an array whose slots hold linked lists. It looks more complicated, but on careful analysis it is easy to understand. Let's look at a picture that I drew according to my understanding.


Next, let's look at the internal structure of an Entry:


With the two figures above, I believe you have a general idea of the structure of HashMap. Before actually looking at the code, let me introduce some basics. As the first diagram shows, some array elements are linked to further Entry instances while others hold only one. Why? Because different keys can be assigned to the same array position after the hash calculation. And why does that happen? Because of hash conflicts: when more data is stored in a smaller space, some of it is bound to have no slot of its own. That extra data is hung below the element already in the slot, forming a linked list, and this is how the hash conflict is resolved.

Now let's follow the code to see how HashMap is actually implemented.

Variables for internal use

    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // default initial size of the Entry array: 16
    static final int MAXIMUM_CAPACITY = 1 << 30; // maximum size of the Entry array
    static final float DEFAULT_LOAD_FACTOR = 0.75f; // default load factor, 0.75; its effect is explained below
    static final Entry<?,?>[] EMPTY_TABLE = {}; // empty Entry array
    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;
    transient int size; // number of elements in the HashMap
    int threshold; // threshold = load factor * current array capacity; once it is reached, the array is expanded
    final float loadFactor; // load factor
    transient int modCount; // number of modifications

A word about the load factor: we can think of it as how full the array is allowed to get.

1. The larger the load factor, the more elements are filled in and the higher the space utilization. But the chance of hash conflicts increases, the list hanging off each slot grows longer, and looking up an element becomes less efficient.

2. The smaller the load factor, the fewer elements are filled in and the fewer hash conflicts there are, so lookups are relatively efficient. But the elements in the array are too sparse: expansion starts while much of the array is still unused, so space utilization is low.

Therefore a compromise between "search efficiency" and "space utilization" is needed, so that the two are relatively balanced. 0.75 is such a point of relative balance.
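To make the trade-off concrete, here is a minimal sketch of the formula from the field comments above (threshold = load factor * current array capacity). The class and method names are made up for this illustration and are not part of HashMap.

```java
class ThresholdSketch {
    // Hypothetical helper: mirrors "threshold = load factor * capacity".
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // Default capacity (16) with three different load factors.
        System.out.println("0.75 -> " + threshold(16, 0.75f));
        System.out.println("0.50 -> " + threshold(16, 0.5f));
        System.out.println("1.00 -> " + threshold(16, 1.0f));
    }
}
```

With the defaults, the threshold is 12, so a 16-slot array is considered full at 12 elements. A load factor of 0.5 lowers that to 8 (sparser array, earlier expansion), and 1.0 raises it to 16 (denser array, longer lists).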

HashMap Source Code Analysis

Construction Method Analysis

    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        this.loadFactor = loadFactor;
        threshold = initialCapacity;
        init();
    }

    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

    public HashMap() {
        this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
    }

    public HashMap(Map<? extends K, ? extends V> m) {
        this(Math.max((int) (m.size() / DEFAULT_LOAD_FACTOR) + 1,
                      DEFAULT_INITIAL_CAPACITY), DEFAULT_LOAD_FACTOR);
        inflateTable(threshold);

        putAllForCreate(m);
    }

As we can see, there are 4 constructors. The first 3 are very simple: they just assign fields and do nothing else. Let's look at the last constructor and follow it into the inflateTable method.

    private void inflateTable(int toSize) {
        // Find a power of 2 >= toSize
        int capacity = roundUpToPowerOf2(toSize);

        threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
        table = new Entry[capacity];
        initHashSeedAsNeeded(capacity);
    }

As you can see, its job is to initialize the Entry array. Before the size of the array is decided, one more operation is applied to the requested size; let's see what it does.

    private static int roundUpToPowerOf2(int number) {
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

Maybe my level is not high enough, and I am not very comfortable with Java bit operations: looking at the code above, I could not tell what it does. No matter, let's write some test code to see.

    final int MAXIMUM_CAPACITY = 1 << 30;

    @Test
    public void test() {
        for (int i = 0; i < ...; i++) {
            System.out.println("i=" + i + "-" + (i >= MAXIMUM_CAPACITY
                    ? MAXIMUM_CAPACITY
                    : (i > 1) ? Integer.highestOneBit((i - 1) << 1) : 1));
        }
    }

The test code is simple, so let's look at the results. Did you see? This code finds the smallest power of 2 that is greater than or equal to the given number. (I am not sure I put that well; my ability to express myself needs improvement.) Once this number is found, it is used as the size of the initialized array. Why it has to be a power of 2 is explained a little later.
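For readers who want to reproduce the experiment, here is a self-contained sketch of the same check. The method body is copied from the roundUpToPowerOf2 code shown above; the class name and the sample values are my own choices for illustration.

```java
class RoundUpSketch {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same expression as the roundUpToPowerOf2 method quoted in the article.
    static int roundUpToPowerOf2(int number) {
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 2, 3, 5, 16, 17, 100}; // arbitrary sample inputs
        for (int n : samples) {
            System.out.println("n=" + n + " -> " + roundUpToPowerOf2(n));
        }
    }
}
```

For these samples the method returns 1, 1, 2, 4, 8, 16, 32 and 128: a number that is already a power of 2 is kept, and anything else is rounded up to the next one.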
That covers the constructors. They really only do things like initializing the array and assigning fields, nothing else.

Put Method Analysis

    public V put(K key, V value) {
        // On the first put, initialize the array
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        // A null key goes to table[0], the first position of the array
        if (key == null)
            return putForNullKey(value);
        // Compute the hash value from the key. I do not understand the hash algorithm itself
        // and hope someone more experienced can point me in the right direction
        int hash = hash(key);
        // From the hash value and the length of the table, determine the index of the
        // array position where the element is stored
        int i = indexFor(hash, table.length);
        // Traverse the linked list at that position; if the key already exists, overwrite its value
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        // Number of modifications + 1
        modCount++;
        // Hang the newly added data at table[i]
        addEntry(hash, key, value, i);
        return null;
    }

Following the code, let's first look at putForNullKey, the method that stores an entry whose key is null. It is essentially no different from storing any other key, except that a null key is treated as having a hash of 0, so it is always stored at table[0].

    private V putForNullKey(V value) {
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        addEntry(0, null, value, 0);
        return null;
    }

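The visible effect of this special case can be checked through the public java.util.HashMap API. The sketch below only uses put, get and size; the class name and the sample values are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

class NullKeyDemo {
    // Stores two values under the null key plus one ordinary entry.
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "first");
        map.put(null, "second"); // same (null) key, so the value is overwritten
        map.put("k", "other");
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        System.out.println("get(null) = " + map.get(null));
        System.out.println("size = " + map.size());
    }
}
```

The second put on the null key replaces the first value rather than adding an entry, which matches the loop in putForNullKey that looks for an existing entry with a null key before calling addEntry.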
Next, the hash is computed from the key. I do not understand the algorithm inside; I read a few articles and it is still not very clear to me, but for our purposes it is enough to know that its role is to compute the hash value.

    final int hash(Object k) {
        int h = hashSeed;
        if (0 != h && k instanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }

        h ^= k.hashCode();

        // A series of bit operations; I do not quite understand them, but the general
        // effect is to distribute the elements evenly across the array
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

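One way to see what the shifts achieve is to feed in hash codes that differ only in their upper bits. The sketch below is a simplified copy of the bit-mixing step only: it leaves out hashSeed and the String special case, and the class name, the helper methods and the chosen inputs are assumptions made for this illustration.

```java
import java.util.Set;
import java.util.TreeSet;

class HashSpreadSketch {
    // Only the bit-mixing part of hash(); hashSeed and the String branch are omitted.
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Collects the bucket indexes (table length 16) for eight hash codes
    // that are identical in their low 16 bits.
    static Set<Integer> indexes(boolean useSpread) {
        Set<Integer> result = new TreeSet<>();
        for (int k = 1; k <= 8; k++) {
            int h = k << 16; // differs only above bit 15
            result.add(indexFor(useSpread ? spread(h) : h, 16));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("without mixing: " + indexes(false));
        System.out.println("with mixing:    " + indexes(true));
    }
}
```

Without the mixing step all eight hash codes fall into index 0, because indexFor only looks at the low bits. With it, the upper bits are folded into the lower ones and the eight values land in eight different indexes.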
Then the index at which the element is stored is computed from the hash value and the length of the array.

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

This is where the earlier requirement that the array length be a power of 2 pays off. Let's analyze it.

First, if the array length is a power of 2 (greater than 1), the length is even, so length - 1 is odd. In binary, the last digit of an odd number is 1, so after the "&" operation the result can be either odd or even. More generally, length - 1 is then all 1 bits in binary, so h & (length - 1) simply keeps the low bits of h and every index from 0 to length - 1 can be reached.

Second, if the array length is odd, then length - 1 is even, and the last binary digit of an even number is 0. After the "&" operation the result can only be even, never odd, so no element is ever stored at an odd position and half of the space is wasted.

In summary, the array length is a power of 2 so that elements can be distributed evenly across the array, which reduces the chance of conflicts.
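The argument is easy to check by brute force. The sketch below reuses the indexFor expression from the article and collects every index that can be produced for a given length; the class name, the helper method and the range of test hashes are my own choices.

```java
import java.util.Set;
import java.util.TreeSet;

class IndexForSketch {
    // Same expression as the indexFor method quoted in the article.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Hypothetical helper: every index produced for hash values 0..999.
    static Set<Integer> reachableIndexes(int length) {
        Set<Integer> indexes = new TreeSet<>();
        for (int h = 0; h < 1000; h++) {
            indexes.add(indexFor(h, length));
        }
        return indexes;
    }

    public static void main(String[] args) {
        System.out.println("length 16: " + reachableIndexes(16));
        System.out.println("length 15: " + reachableIndexes(15));
    }
}
```

With length 16 all sixteen indexes 0 to 15 are used. With length 15 the mask is 14 (binary 1110), so only the even indexes 0, 2, ..., 14 are ever produced and the odd slots stay empty.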
Once the position is known, we check whether that slot of the array already holds a value. If it does not, a new node is added and hung at that position. If it does, the current list is traversed to see whether the key already exists: if so, the corresponding value is overwritten; if no key in the existing list matches, a new node is added.

    void addEntry(int hash, K key, V value, int bucketIndex) {
        // Determine whether the array needs to be expanded
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);
    }

    void createEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        size++;
    }

Before adding an element, the code first checks whether expansion is needed. Let's start with the case where no expansion is required: the list at that array position is taken out first, then a new Entry object is created whose next points to the list just taken out, and the new Entry is stored in the slot as the new head of the list.
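The head insertion done by createEntry can be sketched with a single bucket. Everything here (the Node class, the method names, the keys) is a stripped-down illustration and not the actual Entry class.

```java
import java.util.ArrayList;
import java.util.List;

class HeadInsertSketch {
    // Minimal stand-in for an entry: a key and a pointer to the next node.
    static final class Node {
        final String key;
        final Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    Node head; // plays the role of table[bucketIndex]

    // Same idea as createEntry: the new node points at the old head
    // and then becomes the head itself.
    void addFirst(String key) {
        Node oldHead = head;
        head = new Node(key, oldHead);
    }

    // Keys in the order a traversal from the head would see them.
    List<String> keys() {
        List<String> result = new ArrayList<>();
        for (Node n = head; n != null; n = n.next) {
            result.add(n.key);
        }
        return result;
    }

    public static void main(String[] args) {
        HeadInsertSketch bucket = new HeadInsertSketch();
        bucket.addFirst("a");
        bucket.addFirst("b");
        bucket.addFirst("c");
        System.out.println(bucket.keys());
    }
}
```

After inserting a, b and c in that order, a traversal sees c, b, a: the most recently added element is always at the head of the list.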

Now for the case where expansion is needed. When the number of existing elements is greater than or equal to the threshold and the target slot is already occupied, the array must be expanded; let's follow the resize method.

    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        Entry[] newTable = new Entry[newCapacity];
        transfer(newTable, initHashSeedAsNeeded(newCapacity));
        table = newTable;
        threshold = (int) Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {
            while (null != e) {
                Entry<K,V> next = e.next;
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
    }

The key part is the transfer method, which is also one reason HashMap does not guarantee the order of its elements. Essentially, a new Entry array is initialized, the previous Entry array is traversed, and each entry's position in the new array is recalculated from its hash. After this redistribution the positions of the original elements can change, so HashMap does not preserve the order in which elements were put in.
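A small simulation shows how positions and order change. The loop below follows the transfer method quoted above, minus the rehash flag; the Entry class is reduced to a hash and a next pointer, and the table sizes and hash values are picked for the example.

```java
class TransferSketch {
    // Reduced entry: only the stored hash and the link to the next entry.
    static final class Entry {
        final int hash;
        Entry next;

        Entry(int hash, Entry next) {
            this.hash = hash;
            this.next = next;
        }
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Same loop structure as transfer(), without the optional rehash step.
    static Entry[] transfer(Entry[] oldTable, int newCapacity) {
        Entry[] newTable = new Entry[newCapacity];
        for (Entry e : oldTable) {
            while (e != null) {
                Entry next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i]; // head insertion into the new bucket
                newTable[i] = e;
                e = next;
            }
        }
        return newTable;
    }

    // Renders a bucket's chain as text, e.g. "9->5->1".
    static String chain(Entry e) {
        StringBuilder sb = new StringBuilder();
        for (; e != null; e = e.next) {
            if (sb.length() > 0) sb.append("->");
            sb.append(e.hash);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Entry[] oldTable = new Entry[2];
        // All four hashes are odd, so they share bucket 1 of a 2-slot table.
        oldTable[1] = new Entry(1, new Entry(3, new Entry(5, new Entry(9, null))));
        System.out.println("old bucket 1: " + chain(oldTable[1]));

        Entry[] newTable = transfer(oldTable, 4);
        for (int i = 0; i < newTable.length; i++) {
            System.out.println("new bucket " + i + ": " + chain(newTable[i]));
        }
    }
}
```

The old chain 1->3->5->9 is split: the entry with hash 3 moves to bucket 3, and the three entries that stay in bucket 1 come out as 9->5->1, the reverse of their previous order, because each one is inserted at the head of the new list.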
That is the analysis of the put method. To sum up: the hash is computed from the key, the position of the element in the array is derived from it, and the element is put at the head of the list there, with a capacity check along the way.

Get Method Analysis

The get method is much simpler.

    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

    final Entry<K,V> getEntry(Object key) {
        if (size == 0) {
            return null;
        }

        int hash = (key == null) ? 0 : hash(key);
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }

In fact, the principle is to compute the position index and traverse the linked list. The rest is a few conditional checks, which are fairly easy to understand.
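To watch the traversal do its work through the public API, we can force every key into the same bucket. The Key class below deliberately returns a constant hashCode; it and the sample data are invented for this demonstration.

```java
import java.util.HashMap;
import java.util.Map;

class CollidingKeyDemo {
    // Hypothetical key type whose instances all share one hash code,
    // so lookups must fall back on equals() to tell them apart.
    static final class Key {
        final String name;

        Key(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            return 42; // every key collides on purpose
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).name.equals(name);
        }
    }

    static Map<Key, Integer> build() {
        Map<Key, Integer> map = new HashMap<>();
        map.put(new Key("a"), 1);
        map.put(new Key("b"), 2);
        map.put(new Key("c"), 3);
        return map;
    }

    public static void main(String[] args) {
        Map<Key, Integer> map = build();
        System.out.println("b -> " + map.get(new Key("b")));
        System.out.println("z -> " + map.get(new Key("z")));
    }
}
```

Even though all three keys share a bucket, get still returns the right value for "b" and null for the missing "z", which is the combined hash check and equals check from the loop in getEntry.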
The methods we use most often generally follow this same routine. Once you understand the data structure and study it together with the code, it is quite easy to understand.
