HashMap Source Analysis (based on JDK 1.6)


Source Analysis

To analyze HashMap, first review the hash table data structure. What is a hash table? It is a data structure that is accessed directly by key values: it locates a record by mapping its key to a position in a table, which speeds up lookups. The mapping function is called a hash function, and the array that holds the records is called a hash table.
For example, if the key is k, its value is stored at location f(k). As a result, a record can be retrieved directly, without key comparisons. The mapping f is called the hash function, and a table built on this idea is a hash table.

From a typical diagram of such a table we can see:
1. The size of the hash table is 16
2. The hash function uses the division (remainder) method: the key modulo 16
3. Collisions are resolved by separate chaining (the "zipper" method)
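That scheme (16 buckets, key mod 16, chained collisions) can be sketched as a toy hash table in Java. ToyHashTable and Node are illustrative names invented for this sketch, not JDK code, and it assumes non-negative integer keys:

```java
public class ToyHashTable {
    static class Node {
        final int key;
        String value;
        Node next;
        Node(int key, String value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node[] buckets = new Node[16];

    // Division-method hash: key % 16 picks the bucket (non-negative keys assumed).
    public void put(int key, String value) {
        int i = key % 16;
        for (Node n = buckets[i]; n != null; n = n.next) {
            if (n.key == key) { n.value = value; return; }  // overwrite existing key
        }
        buckets[i] = new Node(key, value, buckets[i]);       // chain at the head
    }

    public String get(int key) {
        for (Node n = buckets[key % 16]; n != null; n = n.next) {
            if (n.key == key) return n.value;
        }
        return null;
    }

    public static void main(String[] args) {
        ToyHashTable t = new ToyHashTable();
        t.put(1, "a");
        t.put(17, "b");  // 17 % 16 == 1: collides with key 1, chained in bucket 1
        System.out.println(t.get(1) + " " + t.get(17));  // a b
    }
}
```

Keys 1 and 17 land in the same bucket and coexist on one chain, which is exactly the collision handling HashMap uses.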

With the structure of the hash table in mind, let's analyze HashMap:

    public class HashMap<K,V> extends AbstractMap<K,V>
            implements Map<K,V>, Cloneable, Serializable { //.....  }

HashMap implements the Map interface, introduced in Java 1.2, which abstracts the operations a map provides. Map also contains a nested Map.Entry interface, which encapsulates a key-value pair and treats it as a single unit, so a Map<K, V> can be viewed as a Set<Entry<K, V>>.
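For example, the standard entrySet() view (plain java.util API usage, not the internals under discussion) shows a Map<K, V> behaving as a set of Map.Entry<K, V> pairs:

```java
import java.util.HashMap;
import java.util.Map;

public class EntrySetDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<String, Integer>();
        map.put("a", 1);
        map.put("b", 2);
        // The entry-set view: the whole map as a Set of key-value pairs.
        int sum = 0;
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            sum += e.getValue();
        }
        System.out.println(map.entrySet().size() + " entries, value sum " + sum);
    }
}
```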

Now look at the Entry implemented in HashMap:

    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        final int hash;
    }

Among this class's fields, key and value are easy to understand. There is also a field next of the class's own type; does that remind you of a linked list? Indeed, HashMap resolves collisions with the zipper (chaining) method. The int field hash holds the entry's hash value. With this general picture of the structure, let's continue.

Look at some of the important attributes in HashMap:

    /**
     * The hash table, resized as necessary; its length must be a power of two.
     */
    transient Entry[] table;

    /**
     * The number of key-value pairs stored in the map.
     */
    transient int size;

    /**
     * The threshold: when the number of stored key-value pairs reaches this
     * value, the table must be expanded (equal to capacity * loadFactor).
     */
    int threshold;

    /**
     * The load factor (used to compute threshold).
     */
    final float loadFactor;

With that overview, here is a look at the constructors:

    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " + loadFactor);

        // Find a power of 2 >= initialCapacity
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;

        this.loadFactor = loadFactor;
        threshold = (int)(capacity * loadFactor);
        table = new Entry[capacity];
        init();
    }

    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
        table = new Entry[DEFAULT_INITIAL_CAPACITY];
        init();
    }

Of these three constructors, in general only two parameters matter: the initial capacity of the hash table and the load factor. From these two, threshold can be computed; when the number of key-value pairs stored in the HashMap reaches threshold, the hash table is expanded. We will verify this when analyzing the put method.
The following analyzes the put method, one of the most important methods in HashMap:
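The constructor's rounding loop and threshold arithmetic can be tried in isolation. This is a standalone sketch with invented names (CapacityDemo, roundUpToPowerOfTwo), not the JDK source:

```java
public class CapacityDemo {
    // Same loop as the constructor: smallest power of two >= initialCapacity.
    static int roundUpToPowerOfTwo(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        return capacity;
    }

    public static void main(String[] args) {
        int capacity = roundUpToPowerOfTwo(1000);   // rounded up to 1024
        int threshold = (int) (capacity * 0.75f);   // 768, with the default load factor
        System.out.println(capacity + " " + threshold);
    }
}
```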

    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key.hashCode());
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

HashMap permits a null key and null values (this differs from Hashtable). So the null-key case is handled separately, by the putForNullKey() method:

    private V putForNullKey(V value) {
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        addEntry(0, null, value, 0);
        return null;
    }

Because a null key cannot be hashed, its hash value is simply treated as 0 and the entry goes into the first bucket of the hash table. If no data is stored in that bucket, the null-key entry is added directly. If entries already exist there, the chain is scanned for one whose key is null; if found, its value is replaced and the old value is returned. How exactly is an Entry added to the hash table? Let's look at the addEntry() method:

    void addEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
        if (size++ >= threshold)
            resize(2 * table.length);
    }

First we take the current head of the chain in the target bucket, then create a new Entry whose next points to that previous head, so the newly added entry sits at the head of the chain. After each addition, size is checked against threshold; if the threshold has been reached, the table must be expanded (resize):
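Head insertion means the most recently added entry is always first on its chain. A minimal illustration, where Node and insertAndTraverse are invented stand-ins for the JDK's Entry and addEntry:

```java
public class HeadInsertDemo {
    static class Node {
        final String key;
        Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    // Insert keys the way addEntry does: each new node becomes the chain head.
    static String insertAndTraverse(String[] keys) {
        Node head = null;
        for (String k : keys)
            head = new Node(k, head);
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next)
            sb.append(n.key);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Traversal order is the reverse of insertion order.
        System.out.println(insertAndTraverse(new String[] {"a", "b", "c"}));  // cba
    }
}
```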

    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }
        Entry[] newTable = new Entry[newCapacity];
        transfer(newTable);
        table = newTable;
        threshold = (int)(newCapacity * loadFactor);
    }

If the hash table has already reached its maximum capacity (1 << 30), the method does not expand it; it just sets threshold to Integer.MAX_VALUE. Otherwise, it creates a hash table with the new capacity, copies the contents of the old table into the new one, installs the new table as the HashMap's hash table, and finally recomputes threshold.
The method that moves entries into the new table is transfer():

    void transfer(Entry[] newTable) {
        Entry[] src = table;
        int newCapacity = newTable.length;
        for (int j = 0; j < src.length; j++) {
            Entry<K,V> e = src[j];
            if (e != null) {
                src[j] = null;
                do {
                    Entry<K,V> next = e.next;
                    int i = indexFor(e.hash, newCapacity);
                    e.next = newTable[i];
                    newTable[i] = e;
                    e = next;
                } while (e != null);
            }
        }
    }

This contains two nested loops: the outer loop traverses each bucket of the table, and the inner loop traverses the linked list stored in that bucket. Because the underlying table size has changed, each stored entry's bucket position must be recalculated; the entry is then placed at the head of the list at its new position. The indexFor() method computes the bucket position:

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

For a beginner unfamiliar with bit arithmetic, this looks mysterious at first. The commonly cited fact is: when length = 2^n, hashCode & (length-1) == hashCode % length. The reason is that length-1 then consists of n one-bits, so the AND keeps exactly the low n bits of the hash code, which for a non-negative value is its remainder modulo 2^n. For now it is fine to read this expression simply as a modulo operation.
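A quick empirical check of the equivalence, plus one extra benefit of the & form (this IndexForDemo class is a sketch; only indexFor itself mirrors the JDK method):

```java
public class IndexForDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);   // valid only when length is a power of two
    }

    public static void main(String[] args) {
        int length = 16;           // 2^4, so length-1 == 0b1111
        for (int h = 0; h < 100; h++) {
            if (indexFor(h, length) != h % length)
                throw new AssertionError("mismatch at " + h);
        }
        System.out.println("ok");  // the two forms agree for non-negative h
        // Bonus: for negative h (hashCode can be negative), Java's % may yield
        // a negative number, while & always yields a valid bucket index.
        System.out.println(indexFor(-1, length));  // 15
    }
}
```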

That covers putForNullKey() and the helper methods it invokes. The general non-null-key path of put(), shown earlier, works similarly.

First the hash value of the key is computed, then the key's bucket position in the table. If there is an entry chain in that bucket, the list is traversed to see whether an entry with that key already exists; if it does, the old value is replaced with the new one, and if it does not, the addEntry() method adds a new entry to the hash table.

With the put method analyzed, the other important method, get, is analyzed below:

    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        int hash = hash(key.hashCode());
        for (Entry<K,V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                return e.value;
        }
        return null;
    }

The get method is easy to understand once put is clear. As with put, the null-key case is handled first, by the getForNullKey() method. Remember where entries with a null key are stored? Right, the first bucket of the hash table; so the method traverses the list in the first bucket and finds the entry whose key is null:

    private V getForNullKey() {
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null)
                return e.value;
        }
        return null;
    }

For a non-null key, the key's hash value is computed first, then its bucket position in the table; this is fast and efficient, whereas a plain traversal-based search would perform far worse. Once the bucket is found, if it holds only one entry, the lookup finishes immediately; this is HashMap's best case, with just one entry per bucket. Reality is rarely that ideal: when a bucket holds several entries, the chain must be traversed to match the key.

Performance Analysis

Having finished HashMap's core methods, let's analyze its performance. Reviewing the methods above, which is the most time-consuming? It must be resize, because it recreates the hash table and rehashes all the data from the old table into the new one. As you can imagine, resize consumes more and more time as the HashMap grows. And why is resize needed at all? By the time a resize becomes necessary, the table already holds many entries and the chains may have grown fairly long, which seriously hurts lookup performance; resizing brings the effective load back down.
So if we know in advance how much a HashMap will store, we can specify a suitable initialCapacity at construction and avoid many resize operations. Say we have 1000 elements: should we write new HashMap(1024), or new HashMap(1000)? Remember the constructor: it guarantees the underlying table length is a power of two, namely the smallest power of two not less than the given initialCapacity. Passing 1000 just means extra iterations of the while loop before arriving at 1024, so 1024 is slightly better. But even new HashMap(1024) is not ideal, because 0.75 * 1024 = 768 < 1000, so calls to put will still trigger a resize. To ensure 0.75 * capacity > 1000, we should use new HashMap(2048), which is the most suitable: it both avoids wasted work in the constructor and avoids resize altogether.
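The arithmetic behind this sizing advice, checked in a short sketch (SizingDemo is an invented name; 0.75 is the default load factor and the capacities are those the constructor would produce):

```java
public class SizingDemo {
    public static void main(String[] args) {
        float loadFactor = 0.75f;
        // Passing 1000 or 1024 both yield capacity 1024 (rounded up to a power of two),
        // whose threshold is below 1000, so storing 1000 entries forces a resize.
        int threshold1024 = (int) (1024 * loadFactor);  // 768 < 1000: resize happens
        // Capacity 2048 gives a threshold comfortably above 1000: no resize.
        int threshold2048 = (int) (2048 * loadFactor);  // 1536 >= 1000: no resize
        System.out.println(threshold1024 + " " + threshold2048);
    }
}
```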

