Java Improvement (Part 23): HashMap Explained


Last Update:2017-01-18 Source: Internet

Author: User

Tags: array length, hash, modulus, static class


HashMap is one of the collections we use most often. It implements the Map interface on top of a hash table and stores data as key-value pairs. Inside HashMap, a key-value pair is always treated as a single unit: the system uses a hash algorithm to compute where each pair is stored, so a value can always be stored and retrieved quickly through its key. The following analyzes how HashMap stores and retrieves data.

I. Definition

HashMap implements the Map interface and extends AbstractMap. The Map interface defines the rules for mapping keys to values, while the AbstractMap class provides a skeletal implementation of Map to minimize the work needed to implement the interface. AbstractMap itself already implements Map; the author believes Map is declared again here only to make the relationship more explicit.

public class HashMap<K,V>
  extends AbstractMap<K,V>
  implements Map<K,V>, Cloneable, Serializable
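The declaration above can be checked against whatever JDK you run with a few instanceof tests. This is a minimal sketch assuming Java 7 or later; the class name HashMapHierarchyDemo is hypothetical.

```java
import java.io.Serializable;
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;

class HashMapHierarchyDemo {
    // True if a HashMap instance is a Map, an AbstractMap, Cloneable and Serializable
    static boolean checkHierarchy() {
        Object m = new HashMap<String, Integer>();
        return m instanceof Map
                && m instanceof AbstractMap
                && m instanceof Cloneable
                && m instanceof Serializable;
    }

    public static void main(String[] args) {
        System.out.println(checkHierarchy()); // true
    }
}
```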

II. Constructors

HashMap provides three constructors that create an empty map:

HashMap(): Constructs an empty HashMap with the default initial capacity (16) and the default load factor (0.75).

HashMap(int initialCapacity): Constructs an empty HashMap with the specified initial capacity and the default load factor (0.75).

HashMap(int initialCapacity, float loadFactor): Constructs an empty HashMap with the specified initial capacity and load factor.

Two parameters appear here: the initial capacity and the load factor. They are the most important parameters affecting HashMap performance. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the moment the table is created. The load factor measures how full the hash table may get before its capacity is automatically increased, that is, how much of its space is allowed to be in use: the larger the load factor, the more heavily loaded the table; the smaller, the sparser. For a hash table that resolves collisions by chaining, the average time to find an element is O(1 + α), where α = n/m is the number of stored entries divided by the number of buckets; the load factor caps how large α may grow before the table expands. A larger load factor therefore uses space more fully but lowers lookup efficiency, while a load factor that is too small leaves the data too sparse and wastes a lot of space. The default load factor is 0.75, and in general there is no need to change it.
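The arithmetic behind these two parameters fits in a few lines. This sketch assumes the formula threshold = (int) (capacity * loadFactor) used by the JDK 6-era constructor quoted later in this article; the class and method names are hypothetical.

```java
class LoadFactorMath {
    // Average number of entries per bucket: the α in O(1 + α)
    static double averageChainLength(int entries, int buckets) {
        return (double) entries / buckets;
    }

    // Number of entries the table may hold before it is expanded
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f));       // 12 with the defaults
        System.out.println(averageChainLength(12, 16)); // 0.75 just before expansion
        System.out.println(threshold(16, 0.5f));        // 8: sparser, expands sooner
        System.out.println(threshold(16, 1.0f));        // 16: denser, longer chains
    }
}
```

With the defaults, a table of 16 buckets is expanded once it holds more than 12 entries, so on average a chain holds at most about 0.75 entries.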

HashMap is a data structure built for fast access; to understand its performance, you must understand its data structure.

III. Data structure

The two most basic building blocks in Java are arrays and simulated pointers (references). Almost every data structure can be built by combining them, and HashMap is no exception. HashMap is in fact a "linked-list hash table": an array in which every slot holds a linked list.

So the underlying implementation of HashMap is an array, and each element of the array is the head of a linked list (a chain). The parameter initialCapacity determines the length of this array. Here is the source of the HashMap constructor:

  public HashMap(int initialCapacity, float loadFactor) {
    // The initial capacity cannot be negative
    if (initialCapacity < 0)
      throw new IllegalArgumentException("Illegal initial capacity: "
          + initialCapacity);
    // The initial capacity cannot exceed the maximum capacity, which for HashMap is 2^30
    if (initialCapacity > MAXIMUM_CAPACITY)
      initialCapacity = MAXIMUM_CAPACITY;
    // The load factor must be positive and not NaN
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
      throw new IllegalArgumentException("Illegal load factor: "
          + loadFactor);

    // Find the smallest power of two that is >= initialCapacity
    int capacity = 1;
    while (capacity < initialCapacity)
      capacity <<= 1;

    this.loadFactor = loadFactor;
    // Set the resize threshold: once the HashMap holds this many entries, it is expanded
    threshold = (int) (capacity * loadFactor);
    // Initialize the table array
    table = new Entry[capacity];
    init();
  }
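The rounding loop in this constructor can be tried on its own. This is a minimal sketch; roundUpToPowerOfTwo is a hypothetical helper name that copies the loop above, and it assumes the input has already been clamped to at most 2^30 as the constructor does.

```java
class CapacityRounding {
    // Smallest power of two >= initialCapacity, using the same shift loop as the constructor
    static int roundUpToPowerOfTwo(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        return capacity;
    }

    public static void main(String[] args) {
        int[] requested = {0, 1, 10, 16, 17, 100};
        for (int r : requested) {
            int cap = roundUpToPowerOfTwo(r);
            // Threshold computed with the default load factor 0.75
            System.out.println(r + " -> capacity " + cap
                    + ", threshold " + (int) (cap * 0.75f));
        }
    }
}
```

For example, asking for 10 buckets yields a table of 16, and asking for 17 yields 32.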

From the constructor source you can see that every new HashMap initializes a table array, whose elements are Entry nodes:

  static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    final int hash;

    /**
     * Creates new entry.
     */
    Entry(int h, K k, V v, Entry<K,V> n) {
      value = v;
      next = n;
      key = k;
      hash = h;
    }
    .......
  }

Entry is a static nested class of HashMap. It holds the key, the value, a reference to the next node (next), and the hash value. This class is essential, because Entry objects are what form the chains stored in the table array.
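To see how one bucket's chain is used, here is a minimal sketch with a simplified node class. Node, sampleChain and findInChain are hypothetical names, not the JDK's Entry; the three entries are assumed to share one bucket and the same hash value 7.

```java
class EntryChainDemo {
    // A simplified stand-in for HashMap.Entry: hash, key, value and a link to the next node
    static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Builds the chain a -> b -> c, all assumed to live in the same bucket
    static Node<String, Integer> sampleChain() {
        Node<String, Integer> c = new Node<>(7, "c", 3, null);
        Node<String, Integer> b = new Node<>(7, "b", 2, c);
        return new Node<>(7, "a", 1, b);
    }

    // Walks one chain and returns the value for key, or null if no node matches
    static <K, V> V findInChain(Node<K, V> head, int hash, K key) {
        for (Node<K, V> e = head; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || (key != null && key.equals(e.key))))
                return e.value;
        }
        return null;
    }

    public static void main(String[] args) {
        Node<String, Integer> head = sampleChain();
        System.out.println(findInChain(head, 7, "b")); // 2
        System.out.println(findInChain(head, 7, "z")); // null
    }
}
```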

With this brief look at HashMap's data structure, let us now see how HashMap achieves fast storage and retrieval.

IV. Storage: put(key, value)

First, let's look at the source code:

  public V put(K key, V value) {
    // When key is null, call putForNullKey, which stores the entry in the first slot of table;
    // this is why HashMap allows a null key
    if (key == null)
      return putForNullKey(value);
    // Compute the hash value of the key                          ------(1)
    int hash = hash(key.hashCode());
    // Compute the index of that hash value in the table array    ------(2)
    int i = indexFor(hash, table.length);
    // Iterate over the chain starting at table[i] to find where the key is stored
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
      Object k;
      // Check whether the chain already holds an entry with the same hash and an equal key;
      // if so, overwrite its value and return the old value
      if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
        V oldValue = e.value;
        e.value = value;      // replace the old value with the new value
        e.recordAccess(this);
        return oldValue;      // return the old value
      }
    }
    // Increment the modification count
    modCount++;
    // Add the key-value pair at index i
    addEntry(hash, key, value, i);
    return null;
  }
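The effect of this loop can be observed from the outside through the public API, because put returns the previous value for the key, or null if there was none. This is a minimal sketch assuming Java 7 or later; PutSemanticsDemo and SameHashKey are hypothetical, and SameHashKey deliberately returns a constant hashCode so that different keys collide. Newer JDKs reorganized HashMap internally (Java 8, for example, appends new nodes to the tail of a chain and can turn long chains into trees), but the observable behavior shown here is the same.

```java
import java.util.HashMap;
import java.util.Map;

class PutSemanticsDemo {
    // Every instance has hash 42, so all instances land in the same bucket
    static final class SameHashKey {
        final String name;

        SameHashKey(String name) { this.name = name; }

        @Override public int hashCode() { return 42; }

        @Override public boolean equals(Object o) {
            return o instanceof SameHashKey && ((SameHashKey) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        System.out.println(m.put("a", 1));  // null: no previous mapping
        System.out.println(m.put("a", 2));  // 1: old value returned, value overwritten
        System.out.println(m.size());       // 1: the key was not duplicated
        System.out.println(m.put(null, 0)); // null: a null key is allowed

        Map<SameHashKey, String> c = new HashMap<>();
        c.put(new SameHashKey("x"), "first");
        c.put(new SameHashKey("y"), "second"); // same hash, unequal key: both are kept
        System.out.println(c.size());                    // 2
        System.out.println(c.get(new SameHashKey("y"))); // second
    }
}
```

Equal hash values alone never cause a replacement; the key must also be equal, which is exactly the condition in the if statement of put.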

From the put source we can see the steps HashMap follows to store data. First it checks whether the key is null; if so, it calls putForNullKey directly. Otherwise it computes the hash of the key and uses it to find an index in the table array. If that slot already holds entries, HashMap compares keys along the chain: if an equal key exists, its value is overwritten; otherwise the new entry is placed at the head of the chain (so the earliest-inserted element ends up at the tail). If the slot is empty, the entry is stored there directly. The process looks simple, but several points deserve a closer look:

1. First, the iteration. The loop guards against duplicate keys: if an entry with the same hash value and an equal key is found, HashMap replaces the old value with the new value and leaves the key itself untouched. This is why a HashMap never contains two equal keys.

2. Next, look at points (1) and (2) in the code; this is the essence of HashMap. First comes the hash method, a purely mathematical computation that derives the final hash value from h (the key's hashCode):

  static int hash(int h) {
    // Mix the higher bits of h into the lower bits, because indexFor only uses the low bits
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
  }
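Why mix the bits at all? indexFor keeps only the low bits of the hash, so hashCodes that differ only in their high bits would otherwise always collide. The sketch below assumes the JDK 6-era shift amounts 20, 12, 7 and 4; the class name HashSpreadDemo is hypothetical. The two sample hashCodes 0x10000 and 0x20000 have identical low bits.

```java
class HashSpreadDemo {
    // Supplemental hash in the JDK 6-era HashMap style
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int h1 = 0x10000, h2 = 0x20000; // differ only above the low 4 bits
        // Without mixing, both map to slot 0 of a 16-slot table
        System.out.println(indexFor(h1, 16) + " " + indexFor(h2, 16));             // 0 0
        // With mixing, the high bits influence the slot: 1 and 2
        System.out.println(indexFor(hash(h1), 16) + " " + indexFor(hash(h2), 16)); // 1 2
    }
}
```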

We want the data in HashMap's table to be distributed evenly (ideally one element per slot, so that it can be found directly), neither too dense nor too sparse: too dense slows down lookups, too sparse wastes space. Once the hash value is computed, how do we map it to a slot while keeping the distribution even? The obvious answer is the modulo operation, but modulo is relatively expensive, so HashMap instead calls the indexFor method:

  static int indexFor(int h, int length) {
    return h & (length - 1);
  }

The length of HashMap's underlying array is always a power of two; the constructor guarantees this with capacity <<= 1. When length is a power of two, h & (length - 1) is equivalent to taking h modulo length (and always yields a non-negative index, even for a negative h), but it is much faster than an actual modulo operation, which is one of HashMap's speed optimizations. Why the length must be a power of two is explained below.
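The equivalence can be checked directly. This minimal sketch (Java 8 or later, for Math.floorMod; IndexForDemo is a hypothetical name) compares h & (length - 1) with floor modulo for a few hashes, including negative ones, for which Java's % operator would return a negative result.

```java
class IndexForDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // True if h & (length - 1) agrees with floor modulo for every hash tried
    static boolean matchesFloorMod(int length, int[] hashes) {
        for (int h : hashes) {
            if (indexFor(h, length) != Math.floorMod(h, length))
                return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] hashes = {0, 5, 17, 12345, -1, -17, Integer.MIN_VALUE, Integer.MAX_VALUE};
        System.out.println(matchesFloorMod(16, hashes)); // true: 16 is a power of two
        System.out.println(matchesFloorMod(15, hashes)); // false: 15 is not
        System.out.println(-17 % 16);                    // -1: plain % can be negative
    }
}
```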

Back to indexFor, which consists of the single statement h & (length - 1). Besides performing the modulo operation, it has another important job: spreading the table's data evenly so that space is fully used.

Suppose length is 16 (a power of two) or 15, and h is 5, 6 and 7. With length = 16, length - 1 = 15 (binary 1111), so the three values map to slots 5, 6 and 7. With length = 15, length - 1 = 14 (binary 1110), so they map to slots 4, 6 and 6.

When length = 15, 6 and 7 produce the same result, so they are stored in the same slot of table. That is a collision: 6 and 7 form a linked list in one slot, which slows down queries. Three numbers are not much evidence, so let us look at h = 0 through 15.

With length = 15, there are 8 collisions among these 16 values, and a lot of space is wasted: slots 1, 3, 5, 7, 9, 11 and 13 never receive a record. The reason is that ANDing with 14 (binary 1110) always clears the lowest bit, so slots whose index ends in binary 1 (0001, 0011, 0101, 0111, 1001, 1011, 1101) can never store data. Fewer usable slots further increase the chance of collisions and slow down queries. When length = 16, length - 1 = 15 is binary 1111, so h & 15 keeps the low four bits of the hash unchanged and discards the higher bits; the slot index is exactly the low bits of the hash. So when length = 2^n, different hash values are less likely to collide, the data in the table array is distributed more evenly, and queries are faster.
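The comparison for h = 0 through 15 can be reproduced with a short count. This is a minimal sketch assuming Java 7 or later; BucketSpreadDemo and distinctBuckets are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

class BucketSpreadDemo {
    // Number of distinct slots hit when hashes 0..15 are placed with h & (length - 1)
    static int distinctBuckets(int length) {
        Set<Integer> used = new HashSet<>();
        for (int h = 0; h < 16; h++)
            used.add(h & (length - 1));
        return used.size();
    }

    public static void main(String[] args) {
        for (int length : new int[] {16, 15}) {
            int distinct = distinctBuckets(length);
            // Every value that lands on an already-used slot counts as a collision
            System.out.println("length " + length + ": " + distinct
                    + " slots used, " + (16 - distinct) + " collisions");
        }
        // length 16: 16 slots used, 0 collisions
        // length 15: 8 slots used, 8 collisions
    }
}
```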

To review the put process: when we add a key-value pair to a HashMap, the system first computes the hash of the key and uses it to determine the slot in table. If that slot is empty, the entry is inserted directly. Otherwise, HashMap iterates over the slot's linked list and compares hash values. If two hash values are equal and the keys are equal (e.hash == hash && ((k = e.key) == key || key.equals(k))), the existing node's value is overwritten with the new value. If the hash values are equal but the keys are not, a new node is inserted at the head of the chain. The details are in the addEntry method:

  void addEntry(int hash, K key, V value, int bucketIndex) {
    // Get the Entry currently at bucketIndex
    Entry<K,V> e = table[bucketIndex];
    // Put the newly created Entry at bucketIndex and make it point to the original Entry
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    // If the number of elements exceeds the threshold, double the capacity
    if (size++ >= threshold)
      resize(2 * table.length);
  }
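The head-insertion step can be isolated in a small sketch. Node, addAtHead, buildChain and keysInChainOrder are hypothetical stand-ins for Entry and the assignment table[bucketIndex] = new Entry<K,V>(hash, key, value, e); only the key and the next link are modeled, and Java 7 or later is assumed.

```java
import java.util.ArrayList;
import java.util.List;

class HeadInsertionDemo {
    // Minimal node: only the key and the link to the next node matter here
    static final class Node {
        final String key;
        final Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // The new node becomes the head of the bucket and points at the old head
    static Node addAtHead(Node head, String key) {
        return new Node(key, head);
    }

    // Inserts the keys one after another into an initially empty bucket (e == null)
    static Node buildChain(String... keys) {
        Node head = null;
        for (String k : keys)
            head = addAtHead(head, k);
        return head;
    }

    static List<String> keysInChainOrder(Node head) {
        List<String> keys = new ArrayList<>();
        for (Node e = head; e != null; e = e.next)
            keys.add(e.key);
        return keys;
    }

    public static void main(String[] args) {
        Node bucket = buildChain("first", "second", "third");
        System.out.println(keysInChainOrder(bucket)); // [third, second, first]
    }
}
```

Head insertion needs no walk to the end of the chain, which keeps this step O(1); the trade-off is that the oldest entry ends up last.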

There are two points to note in the addEntry method:

First, how the chain is formed. This is a very elegant design. The system always places the new Entry object at bucketIndex. If an object already occupies bucketIndex, the new Entry points to the existing one, forming an Entry chain; if bucketIndex holds no Entry (that is, e == null), the new Entry points to null and no chain is formed.

Second, resizing.

As a HashMap holds more and more elements, collisions become more likely and the chains grow longer, which inevitably slows HashMap down. To keep it efficient, the system must expand the table at some critical point. In HashMap that threshold is table length * load factor. Resizing is expensive, however, because the position of every entry in the new table array must be recomputed and the entries copied over. So if we can predict how many elements a HashMap will hold, presetting its capacity can noticeably improve performance.
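A small simulation shows how much presizing saves. This sketch models only the JDK 6-era rule if (size++ >= threshold) resize(2 * table.length) for insertions of distinct keys; it ignores MAXIMUM_CAPACITY, and ResizeCountDemo and countResizes are hypothetical names. In this model, new HashMap(100) gets capacity 128 and threshold 96, so inserting 100 keys still triggers one resize; requesting roughly expected / loadFactor (about 134, rounded up to 256) avoids it.

```java
class ResizeCountDemo {
    // Counts the doublings triggered by n insertions of distinct keys,
    // starting from a power-of-two capacity
    static int countResizes(int capacity, float loadFactor, int n) {
        int threshold = (int) (capacity * loadFactor);
        int size = 0;
        int resizes = 0;
        for (int i = 0; i < n; i++) {
            // Mirrors addEntry: the size before incrementing is compared with the threshold
            if (size++ >= threshold) {
                capacity *= 2;
                threshold = (int) (capacity * loadFactor);
                resizes++;
            }
        }
        return resizes;
    }

    public static void main(String[] args) {
        System.out.println(countResizes(16, 0.75f, 100));  // 4: default capacity
        System.out.println(countResizes(128, 0.75f, 100)); // 1: new HashMap(100)
        System.out.println(countResizes(256, 0.75f, 100)); // 0: sized for 100 / 0.75
    }
}
```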

V. Retrieval: get(key)

Retrieval is simpler than storage. HashMap uses the hash of the key to find the Entry chain at the matching index of the table array, then returns the value that corresponds to the key.

  public V get(Object key) {
    // If the key is null, getForNullKey returns the corresponding value
    if (key == null)
      return getForNullKey();
    // Compute the hash from the key's hashCode
    int hash = hash(key.hashCode());
    // Walk the chain stored at the computed index of the table array
    for (Entry<K,V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
      Object k;
      // If this entry's key equals the lookup key, return its value
      if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
        return e.value;
    }
    return null;
  }

Fast lookup of a value by its key depends not only on HashMap's data structure but also heavily on Entry. As mentioned earlier, HashMap does not store keys and values separately; it treats each key-value pair as a whole, and that whole is the Entry object. The value is merely an attachment of the key. When storing, the system decides where an Entry goes in the table array based on the key's hashCode; when retrieving, it uses the key's hashCode in the same way to fetch the corresponding Entry object.
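Because get returns null whenever no Entry matches, it cannot by itself tell a missing key from a key that is mapped to null. This minimal sketch uses only the standard Map API (Java 7 or later assumed; GetSemanticsDemo is a hypothetical name) to show containsKey resolving that ambiguity, and to show that the null key is retrievable.

```java
import java.util.HashMap;
import java.util.Map;

class GetSemanticsDemo {
    public static void main(String[] args) {
        Map<String, String> m = new HashMap<>();
        m.put("present", "v");
        m.put("nullValued", null);
        m.put(null, "stored under the null key");

        System.out.println(m.get("present"));            // v
        System.out.println(m.get("missing"));            // null: no mapping at all
        System.out.println(m.get("nullValued"));         // null: mapped to null
        System.out.println(m.containsKey("missing"));    // false
        System.out.println(m.containsKey("nullValued")); // true
        System.out.println(m.get(null));                 // stored under the null key
    }
}
```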

Original link: http://www.cnblogs.com/chenssy/p/3521565.html


This article is an English version of an article originally published in Chinese on aliyun.com.
