Java's HashMap underlying data structure

Source: Internet
Author: User
Tags: modulus

HashMap is a hash-table-based implementation of the Map interface and one of the collections we use most often; it stores data as key-value pairs. In HashMap, a key-value pair is always treated as a whole: the system calculates the storage location of the pair from the key's hash, so we can always save and fetch a value quickly by its key. The following is an analysis of how HashMap stores and retrieves data.

First, the definition

HashMap implements the Map interface and extends AbstractMap. The Map interface defines the rules for mapping keys to values, while the AbstractMap class provides a skeletal implementation of the Map interface to minimize the work required to implement it. In fact, AbstractMap already implements Map, so the explicit "implements Map<K,V>" on HashMap is, in the author's view, there just to make the declaration clearer.

public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable {

    /** The default initial capacity - MUST be a power of two. */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** The load factor used when none specified in constructor. */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /** An empty table instance to share when the table is not inflated. */
    static final Entry<?,?>[] EMPTY_TABLE = {};

    /** The table, resized as necessary. Length MUST always be a power of two. */
    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

    /** The number of key-value mappings contained in this map. */
    transient int size;

    /**
     * The next size value at which to resize (capacity * load factor).
     * If table == EMPTY_TABLE then this is the initial capacity at which
     * the table will be created when inflated.
     * @serial
     */
    int threshold;

    /**
     * The load factor for the hash table.
     * @serial
     */
    final float loadFactor;

    /**
     * The number of times this HashMap has been structurally modified.
     * Structural modifications are those that change the number of mappings
     * in the HashMap or otherwise modify its internal structure (e.g., rehash).
     * This field is used to make iterators on Collection-views of the HashMap
     * fail-fast. (See ConcurrentModificationException.)
     */
    transient int modCount;

    /**
     * The default threshold of map capacity above which alternative hashing
     * is used for String keys. Alternative hashing reduces the incidence of
     * collisions due to weak hash code calculation for String keys.
     * This value is overridden by defining the system property
     * {@code jdk.map.althashing.threshold}. A property value of {@code 1}
     * forces alternative hashing to be used at all times, whereas a value of
     * {@code -1} ensures that alternative hashing is never used.
     */
    static final int ALTERNATIVE_HASHING_THRESHOLD_DEFAULT = Integer.MAX_VALUE;
}

Second, the constructors

The HashMap provides three constructors:

HashMap(): Constructs an empty HashMap with the default initial capacity (16) and the default load factor (0.75).

HashMap(int initialCapacity): Constructs an empty HashMap with the specified initial capacity and the default load factor (0.75).

HashMap(int initialCapacity, float loadFactor): Constructs an empty HashMap with the specified initial capacity and load factor.

Two parameters are mentioned here: initial capacity and load factor. They are the two parameters that most affect the performance of a HashMap. The capacity is the number of buckets in the hash table, the initial capacity is simply the capacity at the time the table is created, and the load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. The larger the load factor, the fuller the table is allowed to become, and vice versa. For a hash table that resolves collisions by chaining, the average time to find an element is O(1+α). A larger load factor therefore makes fuller use of space at the cost of lookup efficiency, while a load factor that is too small leaves the table too sparse and wastes space. The system default load factor is 0.75, and in general we do not need to change it.
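As a quick, hedged illustration of how these two parameters interact (the capacities and values below are made up for the example):

import java.util.HashMap;
import java.util.Map;

public class LoadFactorDemo {
    public static void main(String[] args) {
        // Default constructor: capacity 16, load factor 0.75,
        // so the table is resized once the map grows past roughly 16 * 0.75 = 12 entries.
        Map<String, Integer> defaults = new HashMap<>();

        // Custom capacity, default load factor 0.75: resize threshold is roughly 64 * 0.75 = 48.
        Map<String, Integer> bigger = new HashMap<>(64);

        // Custom capacity and load factor: a lower load factor (0.5) trades memory for
        // fewer collisions; a higher one trades space savings for slower lookups.
        Map<String, Integer> sparse = new HashMap<>(64, 0.5f);

        defaults.put("a", 1);
        bigger.put("b", 2);
        sparse.put("c", 3);
    }
}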

HashMap is a data structure that supports fast access; to understand its performance we first need to understand its underlying data structure.

Third, data structure

We know that the two most basic structures in Java are arrays and simulated pointers (references), and almost all data structures can be built by combining the two; HashMap is no exception. HashMap is in fact a "linked-list hash": an array in which each element is the head of a linked list.

So the bottom layer of HashMap is still an array, except that each item of the array is a chain of entries. The parameter initialCapacity determines the length of that array. The following is the source code of the HashMap constructor:

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);

    this.loadFactor = loadFactor;
    threshold = initialCapacity;
    init();
}

As can be seen from the source code, creating a new HashMap sets up its table array (in this JDK version the array is actually allocated lazily, when the map is first used). The elements of the table array are Entry nodes.

static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    int hash;
}

Entry is a static inner class of HashMap. It holds the key, the value, a reference to the next node (next), and the key's hash value. The next reference is essential: it is precisely what chains Entry objects together so that each item of the table array becomes a linked list.
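For reference, the JDK 7 Entry constructor looks roughly like the sketch below; the parameter n is the existing head of the bucket's chain, so a newly created node is linked in front of it:

Entry(int h, K k, V v, Entry<K,V> n) {
    value = v;
    next = n;   // link this new entry in front of the existing chain head (may be null)
    key = k;
    hash = h;
}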

Having briefly analyzed the data structure of HashMap, we will now discuss how HashMap achieves fast access.

Fourth, storage implementation: put(key, value)

First, let's look at the source code:

public V put(K key, V value) {
    // When key is null, call putForNullKey, which stores the value in the first
    // position of the table. This is why HashMap allows null keys.
    if (key == null)
        return putForNullKey(value);
    // Compute the hash value of the key                              ------ (1)
    int hash = hash(key.hashCode());
    // Compute the position of the hash value in the table array      ------ (2)
    int i = indexFor(hash, table.length);
    // Iterate the chain starting at i to find where the key is stored
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        // If the chain already holds an entry with the same hash and an equal key,
        // overwrite its value directly and return the old value
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;        // old value replaced by new value
            e.recordAccess(this);
            return oldValue;        // return the old value
        }
    }
    // Structural modification count increases by 1
    modCount++;
    // Add the key and value at position i
    addEntry(hash, key, value, i);
    return null;
}

From the source code we can clearly see the process by which HashMap saves data. First it checks whether the key is null; if so, putForNullKey is called directly. If the key is not null, its hash value is computed, and the index into the table array is derived from that hash. If the table already has elements at that position, they are compared against the key: if an equal key exists, its original value is overwritten; otherwise the new element is inserted at the head of the chain (so the earliest saved element ends up at the tail of the chain). If the table has no element at that position, the entry is simply stored there. The process looks simple, but there is more going on underneath. A few points are worth noting:

1. First, look at the iteration. The reason for iterating is to avoid storing the same key twice: if an entry with the same hash value and an equal key is found, HashMap simply replaces the old value with the new one and leaves the key untouched. This is why a HashMap never contains two identical keys.
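A quick sketch of this replace-on-duplicate-key behavior (hypothetical key and values, for illustration only):

import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();

        Integer first = map.put("count", 1);  // no previous mapping, returns null
        Integer old = map.put("count", 2);    // same key: value replaced, returns the old value 1

        System.out.println(first);       // null
        System.out.println(old);         // 1
        System.out.println(map.size());  // 1 -- still only one entry for "count"
    }
}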

2. Now look at the places marked (1) and (2). Here lies the essence of HashMap. First is the hash method, a purely mathematical calculation that computes the hash value h:

final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }

    h ^= k.hashCode();

    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

The underlying array length of a HashMap is always a power of two (2^n); the capacity is rounded up to a power of two when the table is created (in older versions this appears in the constructor as capacity <<= 1 in a loop). When the length is a power of two, h & (length - 1) is equivalent to taking h modulo length, and the bitwise AND is much faster than a direct modulus operation. This is one of HashMap's speed optimizations. Why the length must be a power of two is explained below.

Now back to the indexFor method. It contains only one statement, h & (length - 1), which besides serving as the modulo operation described above has another very important responsibility: distributing the data evenly across the table and making full use of the available space.
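For reference, the JDK 7 indexFor method is essentially the following:

/**
 * Returns the bucket index for hash code h. Because length is always a power
 * of two, h & (length - 1) is equivalent to h % length, but much cheaper.
 */
static int indexFor(int h, int length) {
    return h & (length - 1);
}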

To see why, let us compare two table lengths, 16 (a power of two, 2^n) and 15 (not a power of two), and compute h & (length - 1) for h = 5, 6, 7.

When length = 15 (so length - 1 = 14, binary 1110), the results for 6 and 7 are the same, which means they are stored in the same table slot: a collision. 6 and 7 would then sit in one bucket and form a linked list, slowing down queries. Of course, only three numbers have been analyzed here, so let's look at all of h = 0 through 15; the sketch below reproduces the comparison.
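A minimal sketch that reproduces the comparison described here, assuming we simply print h & (length - 1) for both table lengths:

public class IndexDistributionDemo {
    public static void main(String[] args) {
        // Compare bucket indices for table lengths 15 and 16.
        System.out.println(" h | h & 14 (length 15) | h & 15 (length 16)");
        for (int h = 0; h <= 15; h++) {
            System.out.printf("%2d | %18d | %18d%n", h, h & (15 - 1), h & (16 - 1));
        }
        // With length 15, the last bit of (length - 1) is 0, so every odd index
        // (1, 3, 5, ...) is unreachable and many h values collide pairwise.
        // With length 16, h & 15 reproduces h exactly for h in 0..15: no collisions.
    }
}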

From this comparison we can see a total of 8 collisions, and also that a lot of space is wasted: the slots 1, 3, 5, 7, 9, 11, 13 and 15 never receive any data. This is because when any hash value is ANDed with 14, the last bit of the result is always 0, so the odd positions 0001, 0011, 0101, 0111, 1001, 1011, 1101 and 1111 can never hold data. The usable space shrinks, which further increases the collision probability and slows down queries. When length = 16, length - 1 = 15 is binary 1111, and the AND preserves the low bits of the hash value unchanged, so the bucket index is simply the low bits of the original hash. Therefore, when length = 2^n, different hash values collide with relatively low probability, the data is distributed more evenly across the table array, and queries are correspondingly faster.

Let's now review the put process as a whole. When we add a key-value pair to a HashMap, the system first computes the hash value of the key and then uses it to determine the position in the table. If there is no element at that position, the entry is inserted directly. Otherwise, the elements in that bucket's chain are iterated and their keys' hash values compared. If the two hash values are equal and the keys are equal (e.hash == hash && ((k = e.key) == key || key.equals(k))), the value of the existing node is overwritten with the value of the new entry. If the hash values are equal but the keys are not, the new node is inserted at the head of the linked list. The concrete implementation is in the addEntry method, shown below:

void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }
    createEntry(hash, key, value, bucketIndex);
}

There are two points to note in this method:

The first is the formation of the chain, which is a very elegant design. The system always places the new Entry object at table[bucketIndex]. If there is already an Entry at bucketIndex, the newly added Entry points to the original Entry as its next node, forming an Entry chain; if there is no Entry at bucketIndex, i.e. e == null, the new Entry's next simply points to null and no chain is formed. A sketch of createEntry follows below.
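The JDK 7 createEntry method is essentially the following; the current bucket head e (possibly null) becomes the next node of the freshly created entry:

void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];                      // current head of the chain, may be null
    table[bucketIndex] = new Entry<>(hash, key, value, e);  // the new node becomes the new head
    size++;
}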

Second, the problem of expansion.

As the number of elements in a HashMap grows, the probability of collisions grows with it, the chains become longer and longer, and this inevitably hurts the HashMap's speed. To keep the HashMap efficient, the system must resize the table at some point, namely when the number of elements reaches the table array length multiplied by the load factor. Resizing, however, is a very time-consuming operation, because every entry's position in the new table array must be recalculated and the data copied over. So if we can predict the number of elements a HashMap will hold, presetting the capacity accordingly can noticeably improve its performance, as in the sketch below.
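A minimal sketch of that advice; expectedEntries is a hypothetical figure chosen for the example. Sizing the map so that the expected number of entries stays below capacity * loadFactor means the threshold is never reached and no intermediate resize/rehash happens:

import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    public static void main(String[] args) {
        int expectedEntries = 10_000;   // hypothetical expected element count

        // Choose a capacity such that expectedEntries < capacity * 0.75.
        int initialCapacity = (int) (expectedEntries / 0.75f) + 1;

        Map<Integer, String> presized = new HashMap<>(initialCapacity);
        for (int i = 0; i < expectedEntries; i++) {
            presized.put(i, "value-" + i);   // no intermediate rehashing expected
        }

        // For comparison, a default-sized map starts at capacity 16 and has to
        // resize several times on its way up to holding 10,000 entries.
        Map<Integer, String> defaults = new HashMap<>();
        for (int i = 0; i < expectedEntries; i++) {
            defaults.put(i, "value-" + i);
        }
    }
}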

Fifth, read implementation: get(key)

Compared with storing data in a HashMap, reading it back is relatively simple: use the hash value of the key to locate the entry at the corresponding index of the table array, then return the value associated with the key.

public V get(Object key) {
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);

    return null == entry ? null : entry.getValue();
}

final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }

    int hash = (key == null) ? 0 : hash(key);
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}

The fact that a value can be retrieved so quickly by its key depends not only on HashMap's data structure but also, to a large extent, on the Entry class. As mentioned earlier, HashMap does not store the key and the value separately; it treats them as one whole, the key-value pair, and that whole is the Entry object, with the value essentially just an attachment to the key. When storing, the system determines the Entry's position in the table array from the key's hashCode; when fetching, it uses the key's hashCode again to find and extract the corresponding Entry object.
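As a final sketch, consider two distinct keys that deliberately share the same hashCode (the BadKey class below is hypothetical, for illustration only). Both land in the same bucket and form a chain, yet get still returns the right value because the chain is walked and the keys are compared with equals:

import java.util.HashMap;
import java.util.Objects;

public class CollisionDemo {

    // Hypothetical key type whose hashCode is intentionally constant,
    // forcing every instance into the same bucket.
    static final class BadKey {
        final String name;
        BadKey(String name) { this.name = name; }

        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && Objects.equals(name, ((BadKey) o).name);
        }
    }

    public static void main(String[] args) {
        HashMap<BadKey, String> map = new HashMap<>();
        map.put(new BadKey("a"), "first");
        map.put(new BadKey("b"), "second");   // same bucket as "a": chained, not overwritten

        System.out.println(map.get(new BadKey("a"))); // first
        System.out.println(map.get(new BadKey("b"))); // second
        System.out.println(map.size());               // 2
    }
}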
