Java Collections source code analysis: HashMap

Source: Internet
Author: User


As we know, Map stores key-value pairs. Its basic unit is Node<K, V>, which implements Map.Entry<K, V>. The Node fields are as follows:

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
}

The definition itself reveals the design: each node holds a reference, next, to the following node, so we can already guess that storage is linked-list based. Reading further makes it plain that a HashMap is in fact an array of linked lists. Its structure is as follows:

[Figure: the HashMap table, an array whose slots each head a linked list of nodes]

With this picture in mind, the source code becomes much easier to follow.

public class HashMap<K,V> extends AbstractMap<K,V>
        implements Map<K,V>, Cloneable, Serializable {

    /** The default initial capacity of HashMap. */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The maximum capacity of HashMap. A table of 1 << 30 slots means 2^30
     * references pointing into the heap; on a 32-bit machine each reference
     * is 4 bytes, so essentially all memory would be spent storing the Map.
     * Since JVM heap and stack sizes are limited, this size is rarely reached.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** The default load factor. */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /** The bucket array; how its index is computed is described later. */
    transient Node<K,V>[] table;

    /** Cached entry set, used by keySet() and values(). */
    transient Set<Map.Entry<K,V>> entrySet;

    /** The number of key-value pairs stored. */
    transient int size;

    /** Number of structural modifications to this Map, used for consistency checks. */
    transient int modCount;

    /** The next resize threshold, equal to (capacity * load factor). */
    int threshold;

    /**
     * The load factor of this Map.
     * @serial
     */
    final float loadFactor;
}
It has four constructors:

    HashMap()
    HashMap(int initialCapacity)
    HashMap(int initialCapacity, float loadFactor)
    HashMap(Map<? extends K, ? extends V> m)
If initialCapacity or loadFactor is not supplied, the defaults are 1 << 4 (that is, 16) and 0.75f.

When an initialCapacity is passed in, the value is not used directly; instead, the smallest power of two greater than or equal to it is computed. jdk 1.8 improves on the earlier version:

// jdk 1.8
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

// jdk 1.7
while (capacity < initialCapacity)
    capacity <<= 1;
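To see the rounding in action, here is a small standalone sketch (the class name is ours, not the JDK's) that reproduces the jdk 1.8 version:

```java
// Standalone reproduction of jdk 1.8's tableSizeFor; the class name is ours.
public class TableSize {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Returns the smallest power of two >= cap (clamped to MAXIMUM_CAPACITY).
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        // Smear the highest set bit of n into every lower position...
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        // ...so n has the form 0...011...1; adding 1 then yields a power of two.
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(9));   // 16
        System.out.println(tableSizeFor(16));  // 16
        System.out.println(tableSizeFor(17));  // 32
    }
}
```

Note that subtracting 1 first is what makes an exact power of two (such as 16) map to itself rather than to the next power.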
This code is strikingly elegant, but it also raises a question: the data-structure textbooks tell us a hash table's size should be a prime number no smaller than the requested minimum, which this clearly does not follow. In fact, using a power of two is what makes many of the later operations so convenient.

After this bit-smearing, n is one less than the smallest power of two greater than or equal to the input; for an input of 9, n becomes 15. That looks unremarkable in decimal, but in binary 15 is 1111, a run of all ones, and a mask of all ones is exactly what makes the operations that follow work so well.

With the capacity settled, let us look at how a key-value pair is stored:

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

Leaving the TreeNode branches aside and looking only at the table operations, we can see that a node's index in the table is computed as hash & (n - 1), where n is the table's length. The hash itself is obtained by calling the following method:
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
By XORing the high 16 bits of hashCode into the low 16 bits, the high-order bits also take part in indexing, so the stored elements are spread out as evenly as possible. It also shows that an element's index depends only on its key.
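A small sketch (class and method names are ours) makes the effect concrete: two hash codes that differ only in their high bits would collide in a 16-slot table without the spreading step, but are separated with it.

```java
// Sketch of how hashCode is "spread" before indexing; names are ours.
public class HashSpread {
    // Same transformation as HashMap.hash(Object): XOR the high 16 bits
    // into the low 16 bits so high bits influence small table indexes.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table of length n (n must be a power of two).
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        // 0x10000 and 0x20000 differ only above bit 15, so their raw low
        // bits collide in a 16-slot table:
        int a = 0x10000, b = 0x20000;
        System.out.println(indexFor(a, 16) == indexFor(b, 16)); // true: both index 0
        // After spreading, the high bits reach the index and they separate:
        System.out.println(indexFor(spread(a), 16)); // 1
        System.out.println(indexFor(spread(b), 16)); // 2
    }
}
```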

In general, when a key-value pair is put in, the index i is obtained from the hash and capacity - 1, and p = table[i] is examined: if p is null, the element is inserted directly; if p.key equals the key, the value is updated; otherwise the chain is walked and the new node is inserted at the end of the linked list.
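These branches of putVal can be observed directly through the public API: put replaces and returns the previous value, while putIfAbsent (which calls putVal with onlyIfAbsent == true) leaves an existing mapping alone.

```java
import java.util.HashMap;

// Observable consequences of putVal's branches, via the public API.
public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        System.out.println(map.put("a", 1));          // null: no previous mapping
        System.out.println(map.put("a", 2));          // 1: old value returned, value updated
        System.out.println(map.putIfAbsent("a", 3));  // 2: existing value kept
        System.out.println(map.get("a"));             // 2
    }
}
```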

The newNode code is as follows:

Node<K,V> newNode(int hash, K key, V value, Node<K,V> next) {
    return new Node<>(hash, key, value, next);
}
When we go to an interview, the method the interviewer is likely to be most interested in is resize().

Before reading the code, let's outline its main operations.

In outline: when the old capacity is greater than 0, the new capacity is double the old one, and the threshold doubles as well; when the old capacity is 0 but the threshold is greater than 0 (the constructor stashed the initial capacity there), the threshold value becomes the new capacity; when both capacity and threshold are 0, the defaults are used. Once the new capacity is fixed, the key-value pairs in the old table must be moved into the new one. Because the index is derived from the hash, the data cannot simply be copied over the way a List is; one would expect each Entry's index to be recomputed from its hash and the new capacity - 1. That is indeed what we would all write, but reading the source reveals something subtler. The code first:

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

After reading the code, we can only call it exquisite. When the capacity doubles, say 8 -> 16, the index mask n - 1 grows from 0111 to 1111: exactly one new bit is added, and that bit's value is the old capacity (1000 in binary). So when hash & (newCap - 1) is computed, the result can differ from the old index only in that single bit. Doubling the capacity effectively appends a second region the same size as the old one: entries whose new bit is 0 keep their old index, and entries whose new bit is 1 move forward by exactly oldCap. Therefore it suffices to compute hash & oldCap: if the result is 0, the node stays where it was; otherwise it goes to the corresponding slot in the new region. Text alone is hard to follow; a picture helps:

[Figure: a bucket at index j splits into index j (Old region) and index j + oldCap (New region) after doubling]

If a node must be relocated, those are the only two possible destinations: when hash & oldCap != 0 it lands in the New region, otherwise it stays in the Old region. This also shows that a key-value pair either keeps its position or moves to the corresponding position in the New region; it can never wander to a different slot within the Old region.
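A quick standalone check (the class and method names are ours) confirms that the hash & oldCap shortcut always agrees with recomputing the index against the doubled table:

```java
// Verifies that during a doubling resize, hash & oldCap decides whether an
// entry stays at its old index j or moves to j + oldCap. Names are ours.
public class ResizeIndex {
    static boolean staysInPlace(int hash, int oldCap) {
        return (hash & oldCap) == 0;
    }

    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);  // old bucket index
        return staysInPlace(hash, oldCap) ? j : j + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 8, newCap = 16;
        for (int hash = 0; hash < 1000; hash++) {
            // The shortcut must match a full recomputation against newCap - 1.
            if (newIndex(hash, oldCap) != (hash & (newCap - 1)))
                throw new AssertionError("mismatch at hash " + hash);
        }
        System.out.println("all indexes match");
    }
}
```

This works precisely because the capacity is a power of two: the new mask adds exactly one bit, and hash & oldCap isolates that bit.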

Let's look at the specific code:

First, look at if (e.next == null): when next is null, there is only one key-value pair at this slot, so it can be placed into the new table directly. Otherwise, for efficiency, each region keeps a head and a tail pointer: if tail != null, the new node is appended after tail; otherwise it becomes the head. The lo and hi lists correspond to the Old and New regions in our figure.
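The lo/hi split on a single bucket can be sketched in isolation; this is a minimal reconstruction with our own Node type, not the JDK's, keeping only the hash field:

```java
// Minimal sketch of the lo/hi split that resize performs on one bucket,
// preserving relative order. Our own Node type, not the JDK's.
public class BucketSplit {
    static class Node {
        final int hash;
        Node next;
        Node(int hash, Node next) { this.hash = hash; this.next = next; }
    }

    // Returns {loHead, hiHead}: nodes with (hash & oldCap) == 0 go to lo
    // (index j in the new table); the rest go to hi (index j + oldCap).
    static Node[] split(Node head, int oldCap) {
        Node loHead = null, loTail = null, hiHead = null, hiTail = null;
        for (Node e = head, next; e != null; e = next) {
            next = e.next;
            e.next = null;
            if ((e.hash & oldCap) == 0) {
                if (loTail == null) loHead = e; else loTail.next = e;
                loTail = e;
            } else {
                if (hiTail == null) hiHead = e; else hiTail.next = e;
                hiTail = e;
            }
        }
        return new Node[] { loHead, hiHead };
    }

    public static void main(String[] args) {
        // Bucket j = 5 of an 8-slot table: hashes 5, 13 and 21 all map to 5.
        Node chain = new Node(5, new Node(13, new Node(21, null)));
        Node[] parts = split(chain, 8);
        // lo keeps 5 and 21 (bit 3 clear); hi gets 13 (bit 3 set).
        System.out.println(parts[0].hash + " " + parts[0].next.hash); // 5 21
        System.out.println(parts[1].hash);                            // 13
    }
}
```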

From the above we can see that the capacity always grows by powers of two. It is a very beautiful design that greatly improves efficiency.
