HashMap Underlying Implementation Principle

Source: Internet
Author: User
Tags: data structures

In JDK 1.6 and JDK 1.7, HashMap is implemented as an array of buckets plus linked lists, which is how it handles collisions: entries with the same bucket index are chained together in a linked list. When a bucket holds many entries, that is, many elements whose hashes collide, searching the chain sequentially by key becomes inefficient. In JDK 1.8, HashMap is implemented as an array of buckets plus linked lists plus red-black trees: when a chain's length exceeds a threshold (8), the list is converted to a red-black tree, which greatly reduces lookup time.

A brief overview of how HashMap works:

At its core there is an array in which each element is the head of a linked list (roughly speaking). When an entry (key-value pair) is added, the hash of its key is computed first to determine its position in the array; entries whose hashes map to the same index land in the same array slot. Such colliding entries are appended to the chain at that slot, forming a linked list of entries with the same bucket index, so the array ends up storing linked lists. When a list grows too long, it is converted to a red-black tree, which greatly improves search efficiency.
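The index computation described above can be sketched as follows. `indexFor` is a hypothetical helper name used here for illustration; the real JDK code inlines the expression `(n - 1) & hash`:

```java
// Sketch of how HashMap maps a key's hash to a bucket index.
// indexFor is a made-up helper; the JDK inlines (n - 1) & hash.
public class BucketIndexDemo {
    // Works only when capacity is a power of two: (capacity - 1) is then an
    // all-ones bit mask, so the & keeps just the low bits of the hash.
    static int indexFor(int hash, int capacity) {
        return (capacity - 1) & hash;
    }

    public static void main(String[] args) {
        int capacity = 16;
        System.out.println(indexFor(17, capacity)); // 1 (equivalent to 17 % 16)
        System.out.println(indexFor(33, capacity)); // 1 again: a collision, chained in one bucket
    }
}
```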

When the number of stored entries exceeds 0.75 of the current capacity, HashMap doubles the bucket array and moves the entries from the old array into the new one.
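The resize trigger can be sketched numerically. With the default capacity of 16 and load factor of 0.75, the threshold is 12; after doubling, the new threshold is 24 (`threshold` here is an illustrative helper, not HashMap's API):

```java
// Sketch of the resize trigger: threshold = capacity * loadFactor.
public class ResizeThresholdDemo {
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        int cap = 16;
        float lf = 0.75f;
        System.out.println(threshold(cap, lf)); // 12: exceeding this triggers a resize
        cap <<= 1;                              // capacity doubles on resize
        System.out.println(threshold(cap, lf)); // 24: the new threshold
    }
}
```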

The schematic diagram of HashMap's structure (image not reproduced here):

One, the data structures involved in JDK 1.8

1, the bucket array

transient Node<K,V>[] table; // the array of buckets that stores the entries

2, the array element Node, which implements the Map.Entry interface

Node is a node of a singly linked list; it implements the Map.Entry interface.

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    // Constructor: hash value, key, value, and the next node in the chain
    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    // Two nodes are equal if both key and value are equal;
    // a node always compares equal to itself.
    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>) o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

3, the red-black tree node

// Red-black tree node
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // parent node
    TreeNode<K,V> left;    // left subtree
    TreeNode<K,V> right;   // right subtree
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;           // color property

    TreeNode(int hash, K key, V val, Node<K,V> next) {
        super(hash, key, val, next);
    }

    // Returns the root of the tree containing this node
    final TreeNode<K,V> root() {
        for (TreeNode<K,V> r = this, p;;) {
            if ((p = r.parent) == null)
                return r;
            r = p;
        }
    }
}

Two, the fields in the source code

Load factor (default 0.75): why is a load factor needed? If the fill ratio is very large, space is used efficiently, but without resizing the linked lists would grow longer and longer, making lookups slow because of list length (though the red-black trees in the latest version help a lot). On resizing, each linked list of the old array is split into two sub-lists based on one bit of the hash, each hung at its respective position in the new array; this shortens each list and improves search efficiency.

HashMap is designed to trade space for time, so the fill ratio does not need to be very large. But a fill ratio that is too small wastes space. If memory is the main concern, the fill ratio can be slightly larger; if lookup performance is the priority, it can be slightly smaller.
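This trade-off is exposed through HashMap's public constructor, which takes an initial capacity and a load factor; the values 0.5f and 0.9f below are illustrative choices, not recommendations:

```java
import java.util.HashMap;
import java.util.Map;

// Tuning the space/time trade-off through HashMap's public constructor.
public class LoadFactorDemo {
    public static void main(String[] args) {
        // Lower load factor: more space used, shorter chains, faster lookups.
        Map<String, Integer> fast = new HashMap<>(16, 0.5f);
        // Higher load factor: denser table, less memory, longer chains.
        Map<String, Integer> compact = new HashMap<>(16, 0.9f);
        fast.put("a", 1);
        compact.put("a", 1);
        System.out.println(fast.get("a") + " " + compact.get("a"));
    }
}
```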

public class HashMap<K,V> extends AbstractMap<K,V>
        implements Map<K,V>, Cloneable, Serializable {
    private static final long serialVersionUID = 362498820763181265L;
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
    static final int MAXIMUM_CAPACITY = 1 << 30;        // maximum capacity
    static final float DEFAULT_LOAD_FACTOR = 0.75f;     // default fill ratio
    // When a bucket's list length reaches 8, the list may be converted to a red-black tree
    static final int TREEIFY_THRESHOLD = 8;
    static final int UNTREEIFY_THRESHOLD = 6;
    static final int MIN_TREEIFY_CAPACITY = 64;
    transient Node<K,V>[] table;          // the array that stores the elements
    transient Set<Map.Entry<K,V>> entrySet;
    transient int size;                   // the number of stored elements
    transient int modCount;               // modification count for the fail-fast mechanism
    int threshold;                        // critical value = capacity * fill ratio; exceeding it triggers a resize
    final float loadFactor;               // fill ratio (discussed later)

Three, the HashMap constructors
HashMap has 4 constructors, which differ mainly in their parameters: the initial capacity, the load factor, and a map used for initialization.

// Constructor 1: specify initial capacity and load factor
public HashMap(int initialCapacity, float loadFactor) {
    // The initial capacity must be non-negative
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    // If the specified initial capacity exceeds the maximum capacity, use the maximum
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    // The fill ratio must be positive
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity); // new resize threshold
}

// Constructor 2: specify initial capacity only
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

// Constructor 3: all other fields defaulted
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
}

// Constructor 4: initialize the elements from an existing map
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}
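Constructor 1 stores the result of tableSizeFor in threshold: it rounds the requested capacity up to a power of two. A minimal re-implementation, mirroring the JDK 8 bit-smearing version, looks like this (a sketch for illustration, not the authoritative source):

```java
// Sketch of tableSizeFor: returns the smallest power of two >= cap.
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    static int tableSizeFor(int cap) {
        int n = cap - 1;
        // Smear the highest set bit into all lower positions,
        // producing a mask of the form 0...011...1.
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16 (already a power of two)
        System.out.println(tableSizeFor(17)); // 32
    }
}
```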

Four, the HashMap put mechanism

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

/**
 * Implements Map.put and related methods.
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the table is uninitialized or its length is 0, call resize() to initialize it
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If the slot at index (n - 1) & hash is empty, insert a new node there
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else { // otherwise there is a collision; start handling it
        Node<K,V> e; K k;
        // Check whether the first node p already holds the key being inserted
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Traverse to the end of the list and append the new node
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // If the chain has reached TREEIFY_THRESHOLD (8), treeifyBin decides
                    // what to do: if the table length is below 64 it only resizes;
                    // at 64 or above it converts the chain to a red-black tree.
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // If a node with the same key is found, end the traversal
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key on the list
            V oldValue = e.value;
            // If onlyIfAbsent is true, do not replace the value of a duplicate key
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue; // return the existing value
        }
    }
    ++modCount;
    // If the new size exceeds the threshold (initially capacity * 0.75), double the table
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
/**
 * Replaces all linked nodes in the bin at the index for the given hash,
 * unless the table is too small, in which case it resizes instead.
 */
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    // If the table length is below MIN_TREEIFY_CAPACITY (64), just resize
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}

The following is the simple procedure put(key, value) follows to add a key-value pair:
1, check whether the bucket array tab[] is null or empty; if so, resize() it to the default size;
2, compute the array index i from the key's hash; if tab[i] == null, insert a new node directly, otherwise go to 3;
3, determine whether the colliding bucket is a linked list or a red-black tree (by checking the type of the first node) and handle each case accordingly.
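These steps are observable through the public API: put returns the previous value when the key already exists (the "same key" branch), and putIfAbsent corresponds to calling putVal with onlyIfAbsent = true:

```java
import java.util.HashMap;

// Observing putVal's return-value behavior through the public API.
public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        System.out.println(map.put("k", 1));         // null: no previous mapping
        System.out.println(map.put("k", 2));         // 1: old value returned, value replaced
        System.out.println(map.putIfAbsent("k", 3)); // 2: onlyIfAbsent, value kept
        System.out.println(map.get("k"));            // 2
    }
}
```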

Five, the expansion mechanism of HashMap: resize()

When a hash table is constructed without an explicit initial size, the default capacity is 16 (that is, the Node array has size 16). When the number of elements reaches the threshold (fill ratio * table length), resize() grows the HashMap to twice its size; this expansion is time-consuming.

/**
 * Initializes or doubles table size. If null, allocates in accord with
 * initial capacity target held in field threshold. Otherwise, because we
 * are using power-of-two expansion, the elements from each bin must either
 * stay at the same index, or move with a power-of-two offset in the new table.
 *
 * @return the table
 */
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) { // the old table is not empty
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Set the new capacity to twice the old capacity (newCap = 2 * oldCap)
        // and the new threshold to twice the old threshold (newThr = 2 * oldThr)
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // If the old capacity is 0, this is the first initialization of the table
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor; // new table length times load factor
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    // Start constructing the new table
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab; // assign the new table to table
    if (oldTab != null) { // the old table is not empty: move its data to the new table
        for (int j = 0; j < oldCap; ++j) { // traverse the old table
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    // The node has no chain; place it directly at e.hash & (newCap - 1)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // e heads a singly linked list: traverse it and recompute each
                    // node's position. Since the new table is twice the old capacity,
                    // split the chain into two lists by the bit e.hash & oldCap:
                    // 0 goes to the lo list, nonzero goes to the hi list.
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next; // record the next node
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) { // the lo list keeps the original index
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) { // the hi list moves to index j + oldCap
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
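The lo/hi split can be verified arithmetically: a node at old index j stays at j if hash & oldCap == 0, and moves to j + oldCap otherwise, which is exactly hash & (2 * oldCap - 1). `newIndex` below is a hypothetical helper, not HashMap's API:

```java
// Verifying resize()'s relocation rule with plain arithmetic.
public class SplitDemo {
    // Hypothetical helper: a node's index after doubling from oldCap to 2 * oldCap.
    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1); // old index
        return (hash & oldCap) == 0 ? j : j + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        // hash 5 (00101): the oldCap bit is 0 -> stays at index 5 (lo list)
        System.out.println(newIndex(5, oldCap));   // 5
        // hash 21 (10101): the oldCap bit is 1 -> moves to 5 + 16 = 21 (hi list)
        System.out.println(newIndex(21, oldCap));  // 21
        // Both match a direct recomputation against the doubled capacity
        System.out.println(21 & (2 * oldCap - 1)); // 21
    }
}
```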

Six, improvements in JDK 1.8 using red-black trees

JDK 8 optimized the HashMap source code. In JDK 7, HashMap handled "collisions" with linked-list storage, so when a bucket accumulates many colliding nodes, query time is O(n).
In JDK 8, HashMap handles "collisions" by adding red-black trees to the data structure: when a bucket holds few colliding nodes, linked-list storage is used; when it holds many (more than the threshold, 8), red-black tree storage is used, with its characteristic O(log n) query time.

Problem Analysis:

As you may know, hash collisions can have a disastrous effect on HashMap performance. If multiple hashCode() values fall into the same bucket, the values are stored in a list. In the worst case, all keys map to the same bucket and the HashMap degenerates into a linked list, with lookup time going from O(1) to O(n).

As the size of such a HashMap increases, the cost of the get() method grows, because all of the records sit in one very long list in the same bucket: looking up a record requires traversing half the list on average.

JDK 1.8 HashMap's red-black trees solve this problem:

If the number of records in one bucket grows too large (currently TREEIFY_THRESHOLD = 8), HashMap dynamically replaces the list with a balanced tree structure. The result is the better O(log n) rather than the bad O(n).

How does it work? Records whose keys collide were previously simply appended to a list that could only be searched by traversal. Beyond the threshold, HashMap upgrades the list to a binary tree, using the hash value as the branching variable: if two hashes are unequal but point to the same bucket, the larger one is inserted into the right subtree. If the hash values are equal, HashMap hopes the keys implement the Comparable interface, so that it can insert them in order. Implementing Comparable is not required of HashMap keys, but it is certainly best if they do; without it, you should not expect a performance boost under severe hash collisions.
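The benefit of Comparable keys can be sketched with a deliberately pathological key. CollidingKey is a made-up class whose hashCode always returns the same value, so every entry lands in one bucket; because it implements Comparable, JDK 8's tree bins can still order the keys and keep lookups at O(log n) rather than O(n):

```java
import java.util.HashMap;

// Worst-case collisions with a Comparable key: correctness is preserved,
// and tree bins can use compareTo to order the colliding keys.
public class TreeBinDemo {
    static final class CollidingKey implements Comparable<CollidingKey> {
        final int id;
        CollidingKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; } // every key collides
        @Override public boolean equals(Object o) {
            return o instanceof CollidingKey && ((CollidingKey) o).id == id;
        }
        @Override public int compareTo(CollidingKey other) {
            return Integer.compare(id, other.id);
        }
    }

    public static void main(String[] args) {
        HashMap<CollidingKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 100; i++)
            map.put(new CollidingKey(i), i); // all land in one bucket
        System.out.println(map.get(new CollidingKey(50))); // 50
    }
}
```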

In short: HashMap is implemented on top of buckets, and each bucket stores a linked list (before 1.8) or a red-black tree (ordered, starting in 1.8). In effect it is an array plus linked lists (or red-black trees): the hashCode locates the bucket index, and equals locates the target entry within the bucket. So if you use a mutable class (one whose state can change, not an immutable final-field class) as a key, be careful when overriding hashCode and equals: if two objects are equals, their hashCodes must be the same, while equal hashCodes do not require the objects to be equals. In general, avoid hand-rolling hashCode and equals, and prefer immutable objects such as Integer or String as keys.
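The equals/hashCode contract can be sketched with an immutable custom key. Point is a made-up class for illustration; because equal Points produce equal hash codes, a distinct but equal instance finds the stored entry (hashCode selects the bucket, equals finds the node):

```java
import java.util.HashMap;
import java.util.Objects;

// An immutable key honoring the equals/hashCode contract.
public class KeyContractDemo {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        @Override public boolean equals(Object o) {
            return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
        }
        @Override public int hashCode() { return Objects.hash(x, y); }
    }

    public static void main(String[] args) {
        HashMap<Point, String> map = new HashMap<>();
        map.put(new Point(1, 2), "found");
        // A different instance with equal state retrieves the value,
        // because both the bucket index and the equality check agree.
        System.out.println(map.get(new Point(1, 2))); // found
    }
}
```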

You can also read this article: http://www.cnblogs.com/beatIteWeNerverGiveUp/p/5709841.html
