As we know, a Map stores key-value pairs. In HashMap the basic storage unit is Node<K,V>, which implements Map.Entry<K,V>. Its fields are as follows:

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
}
From this definition we can already guess its role: each node holds a reference to the next node in its next field, so entries are presumably stored in linked lists. Going through the source confirms this: a HashMap is essentially an array of linked lists (each array slot is the head of a bucket). Its structure looks like this:
With this picture in mind, the source code should be much easier to follow.
public class HashMap<K,V> extends AbstractMap<K,V>
        implements Map<K,V>, Cloneable, Serializable {

    /** HashMap default initial capacity */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * HashMap maximum capacity. Storing 1 << 30 entries would require 2^30 references
     * to objects on the heap; on a 32-bit machine each reference takes 4 bytes, which
     * would mean nearly all memory is used just to store the map. Since the JVM's heap
     * and stack sizes are capped, it is hard to ever reach this size.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** Default load factor */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /** The bucket array; how its index is computed is explained in detail later */
    transient Node<K,V>[] table;

    /** Cached entry set; keySet() and values() make use of it as well */
    transient Set<Map.Entry<K,V>> entrySet;

    /** Number of key-value pairs stored */
    transient int size;

    /** Number of structural modifications, used to keep iterators consistent */
    transient int modCount;

    /** The next size value (capacity * load factor) at which to resize */
    int threshold;

    /**
     * The load factor for this map
     * @serial
     */
    final float loadFactor;
}

It has four constructors, as follows:
HashMap()
HashMap(int initialCapacity)
HashMap(int initialCapacity, float loadFactor)
HashMap(Map<? extends K, ? extends V> m)
If you do not pass in initialCapacity or loadFactor, the defaults are used: 1 << 4 (that is, 16) and 0.75f.
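For a quick illustration (a minimal usage sketch; the capacities, keys and values here are arbitrary and not part of the JDK source), the four constructors can be called like this:

import java.util.HashMap;
import java.util.Map;

public class HashMapConstructorsDemo {
    public static void main(String[] args) {
        // Default constructor: capacity 16, load factor 0.75
        Map<String, Integer> a = new HashMap<>();

        // Only initial capacity given; load factor defaults to 0.75
        Map<String, Integer> b = new HashMap<>(32);

        // Both initial capacity and load factor given
        Map<String, Integer> c = new HashMap<>(32, 0.5f);

        // Copy constructor: entries of another map are put into the new one
        c.put("answer", 42);
        Map<String, Integer> d = new HashMap<>(c);

        System.out.println(d); // {answer=42}
    }
}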
When an initialCapacity is passed in, the value is not used directly; instead, the smallest power of 2 greater than or equal to it is computed. JDK 1.8 improved this calculation compared with earlier versions:
// JDK 1.8
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

// JDK 1.7
while (capacity < initialCapacity)
    capacity <<= 1;
I find these two pieces of code delightful every time I look at them, but they also raise a question. When we study data structures, textbooks tell us that the size of a hash table should be the smallest prime number greater than the desired capacity, which clearly does not match what we see here. In fact, choosing a power of 2 is what makes many of the subsequent operations convenient and efficient.
After these shifts, n is one less than the smallest power of 2 greater than or equal to the given value. For example, if the input is 9, n becomes 15. In decimal that looks unremarkable, but in binary it is 1111: a run of all 1s, which lends itself very well to the bitwise operations that follow.
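As a quick sanity check, here is a small standalone sketch that simply copies the JDK 1.8 logic shown above and prints the capacity a few example inputs map to (the inputs are arbitrary):

public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same logic as JDK 1.8's tableSizeFor: smallest power of 2 >= cap
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        for (int cap : new int[] {1, 9, 16, 17, 100}) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
        // Output: 1 -> 1, 9 -> 16, 16 -> 16, 17 -> 32, 100 -> 128
    }
}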
With the size calculated and the space allocated, we can move on to how values are actually stored.
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
Setting aside the TreeNode branches for now and looking only at the operations on table, we can see that a node's index in the table is computed as hash & (n - 1), where n is the length of table[]. The hash itself is obtained by calling the following method:
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
XORing in the upper 16 bits ensures that the high bits of the hashCode also take part in the index calculation, so the stored elements are spread out as evenly as possible. It is also clear that the index an element lands on depends only on its key.
Taking an overall view of putVal: the index i is computed by ANDing the incoming hash with capacity - 1, then p = table[i] is read. If p is null, the new element is inserted directly; if p.key equals key, the existing value is updated; otherwise the new node is appended at the end of that bucket's linked list.
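Here is a small standalone sketch, independent of HashMap's internals, that re-implements the two formulas above to show how a key's hashCode is spread and mapped to a bucket index; the table length of 16 and the sample keys are just examples:

public class IndexDemo {
    // Same spreading function as HashMap.hash()
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16; // example table length (always a power of 2)
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            int h = hash(key);
            int index = (n - 1) & h; // equivalent to h mod n because n is a power of 2
            System.out.println(key + " -> hash=" + h + ", bucket=" + index);
        }
    }
}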
The newNode code is as follows:
Node<K,V> newNode(int hash, K key, V value, Node<K,V> next) {
    return new Node<>(hash, key, value, next);
}
There is one operation that interviewers tend to be very interested in: resize().
Before we go into the code, let's talk about its main operations.
Ignoring extreme cases: when the old capacity is greater than 0, the new capacity becomes twice the old one and the threshold is doubled as well; when the old capacity is 0 but the threshold is greater than 0 (an initial capacity was specified), the new capacity is set to the threshold; when both the capacity and the threshold are 0, the defaults are used. Once the new capacity is decided, the key-value pairs from the old table have to be placed into the new table. Because an entry's index is derived from its hash, the table cannot simply be copied over the way a List can; instead the index must be recomputed for every entry from its hash and the new capacity - 1. That much is what we would all expect. But reading the source reveals a subtler trick. Let's look at the code first.
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
Having read the code, let's spell out the subtlety. When the capacity doubles, say from 8 to 16, capacity - 1 goes from 0111 to 1111 in binary: exactly one extra 1 bit is added, and that bit's value equals the old capacity (1000, i.e. 8). So when a hash is ANDed with the new capacity - 1 (15), the result differs from the old index at most by that extra bit, hash & 1000. Doubling the capacity can therefore be seen as appending a second container the same size as the old one: entries whose extra bit is 0 keep their old, smaller index in the front half, and entries whose extra bit is 1 get the larger index in the back half. So we only need to check whether the bit hash & oldCapacity is set to know whether an entry stays where it was or moves to the corresponding position in the new half. Text alone is a bit hard to follow, so let's look at a picture:
When a node needs to be reassigned, it can only end up in one of two positions: if hash & oldCapacity != 0 it moves to the new (high) half, otherwise it stays in the old (low) half at the same index. This also explains why a key-value pair either stays exactly where it was or moves to the corresponding slot in the new half; it can never jump to some other slot within the old range.
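Before going back to the JDK code, here is a small standalone sketch (not JDK code; the hash values are arbitrary examples) that compares the index computed against the old and new capacity and shows that the new index is either the old index or the old index plus oldCap:

public class ResizeIndexDemo {
    public static void main(String[] args) {
        int oldCap = 8;
        int newCap = 16;
        int[] hashes = {5, 13, 21, 29}; // arbitrary example hash values
        for (int h : hashes) {
            int oldIndex = h & (oldCap - 1);
            int newIndex = h & (newCap - 1);
            boolean moves = (h & oldCap) != 0; // the single extra bit decides
            System.out.printf("hash=%d old=%d new=%d moved=%b%n",
                              h, oldIndex, newIndex, moves);
            // If moved, newIndex == oldIndex + oldCap; otherwise newIndex == oldIndex
        }
    }
}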
Then look at the specific code:
First look at if (e.next == null): when next is null there is only one key-value pair at this position, so it can be placed into the new table directly. When it is not null, to move the list efficiently each region keeps a head and a tail node: when the tail is not null the current node is appended after the tail, otherwise it becomes the head. The hi and lo regions correspond to the new and old areas in our diagram.
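To illustrate the head/tail bookkeeping on its own, here is a toy re-implementation over a minimal node class (not the JDK's Node) that splits one bucket into a lo list and a hi list while preserving order:

public class BucketSplitDemo {
    // Minimal stand-in for HashMap.Node, just enough for the demo
    static class Node {
        final int hash;
        Node next;
        Node(int hash, Node next) { this.hash = hash; this.next = next; }
    }

    public static void main(String[] args) {
        int oldCap = 8;
        // Build a bucket: hashes 5, 13, 21, 29 all map to index 5 when capacity is 8
        Node head = new Node(5, new Node(13, new Node(21, new Node(29, null))));

        Node loHead = null, loTail = null; // stays at the old index
        Node hiHead = null, hiTail = null; // moves to old index + oldCap
        for (Node e = head, next; e != null; e = next) {
            next = e.next;
            if ((e.hash & oldCap) == 0) {
                if (loTail == null) loHead = e; else loTail.next = e;
                loTail = e;
            } else {
                if (hiTail == null) hiHead = e; else hiTail.next = e;
                hiTail = e;
            }
        }
        if (loTail != null) loTail.next = null;
        if (hiTail != null) hiTail.next = null;

        print("lo", loHead); // lo: 5 21
        print("hi", hiHead); // hi: 13 29
    }

    static void print(String name, Node n) {
        StringBuilder sb = new StringBuilder(name + ":");
        for (; n != null; n = n.next) sb.append(' ').append(n.hash);
        System.out.println(sb);
    }
}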
From all of the above we can see that keeping the capacity a power of 2 on every expansion is a very elegant design, and it greatly improves efficiency.