Java 7 Source Code Analysis, Part 1: Implementation of Map Collections

Last Update:2014-01-06 Source: Internet

Author: User


In day-to-day work and in interviews we usually focus on the surface characteristics of each collection: a Set cannot store duplicate elements and (as with HashSet) does not maintain insertion or sorted order, a Map stores key-value pairs, and so on. If you want to become truly proficient in Java, that is far from enough; you also need to understand how the collections are implemented.

Let's look at the implementation of Map first rather than Set. The two are very similar in implementation, and many Set methods are implemented by calling methods on a Map. Why? Because a Map can be viewed as a Set of keys: the elements of a Set are unique and unordered, and the keys of a Map have the same properties. If we treat each value in the Map as an attachment to its key, the keys taken together form a Set.
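The idea that a Set can be built on top of a Map can be sketched in a few lines. The class below is a hypothetical illustration (`MapBackedSet` and `PRESENT` are names chosen here, not code from the article): every element is stored as a key, and a shared dummy object is stored as its value.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical class for illustration: a set whose elements are the keys of a map.
class MapBackedSet<E> implements Iterable<E> {
    // Dummy value stored under every key; only the keys matter.
    private static final Object PRESENT = new Object();
    private final Map<E, Object> map = new HashMap<E, Object>();

    // Returns true if the element was not already present.
    boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    boolean contains(Object o) {
        return map.containsKey(o);
    }

    // Returns true if the element was present and has been removed.
    boolean remove(Object o) {
        return map.remove(o) == PRESENT;
    }

    int size() {
        return map.size();
    }

    @Override
    public Iterator<E> iterator() {
        // Iterating the set is just iterating the map's keys.
        return map.keySet().iterator();
    }
}
```

Because uniqueness is enforced by the map's keys, the set needs no duplicate check of its own.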

Let's take a look at the class diagrams of the two frameworks:

[Figure: Set framework class diagram]

[Figure: Map framework class diagram]

Let's take a look at the Map interface. The source code is as follows:

    public interface Map<K,V> {
        // Query Operations
        int size();
        boolean isEmpty();
        boolean containsKey(Object key);
        boolean containsValue(Object value);
        V get(Object key);

        // Modification Operations
        V put(K key, V value);
        V remove(Object key);

        // Bulk Operations
        /* The behavior of this operation is undefined if the specified map
           is modified while the operation is in progress. */
        void putAll(Map<? extends K, ? extends V> m);
        void clear();

        // Views
        // Because the keys of a Map cannot repeat and have no order among them,
        // all the keys of a Map can form a Set.
        Set<K> keySet();
        Collection<V> values();
        Set<Map.Entry<K,V>> entrySet();

        interface Entry<K,V> {
            K getKey();
            V getValue();
            V setValue(V value);
            boolean equals(Object o);
            int hashCode();
        }

        // Comparison and hashing
        boolean equals(Object o);
        int hashCode();
    }

The interface has a values() method, which returns all values in the Map as a Collection; a keySet() method, which returns all keys as a Set; and an entrySet() method, which returns all key-value pairs as a Set. To represent a key-value pair, the interface defines a nested Entry interface with operations on the key and the value.
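As a rough illustration of the three views and of the Entry interface, the sketch below iterates over entrySet() and updates each value through Map.Entry.setValue(). The class name `MapViewsDemo`, the helper `addBonus`, and the sample data are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

class MapViewsDemo {
    // Adds `bonus` to every value by writing through Map.Entry.setValue().
    static void addBonus(Map<String, Double> scores, double bonus) {
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            e.setValue(e.getValue() + bonus);
        }
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new HashMap<String, Double>();
        scores.put("Mathematics", 89.0);
        scores.put("English", 78.2);

        System.out.println(scores.keySet());   // the keys, as a Set
        System.out.println(scores.values());   // the values, as a Collection
        addBonus(scores, 1.0);
        System.out.println(scores.entrySet()); // key=value pairs, as a Set
    }
}
```

The iteration order of a HashMap is not specified, so the printed order may differ between runs and JDK versions.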

    public class HashMap<K,V>
        extends AbstractMap<K,V>
        implements Map<K,V>, Cloneable, Serializable
    {
        // The default initial capacity - MUST be a power of two.
        static final int DEFAULT_INITIAL_CAPACITY = 16;
        static final int MAXIMUM_CAPACITY = 1 << 30;
        static final float DEFAULT_LOAD_FACTOR = 0.75f; // the default load factor

        // The table, resized as necessary. Length MUST always be a power of two.
        // An Entry array stores the key-value pairs, much as ArrayList uses
        // an Object[] to store its elements.
        transient Entry[] table;
        transient int size;

        // The next size value at which to resize (capacity * load factor),
        // i.e. the maximum number of mappings held before the table is resized.
        int threshold;

        // The load factor.
        final float loadFactor;
        transient int modCount;
    }

These are the important fields. loadFactor is the load factor. Increasing it reduces the memory occupied by the hash table (that is, the Entry array) but increases the time cost of lookups, and lookup is the most frequent operation. Decreasing it improves lookup performance but increases the memory occupied by the hash table. The default value of 0.75 is a compromise between the two.
threshold is the maximum number of key-value pairs the HashMap can hold at its current capacity. Once the number of stored pairs reaches threshold, the table is expanded.
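To make the relationship between capacity, load factor, and threshold concrete, here is a small sketch that applies the formula from the field comment, threshold = capacity * load factor. The class and method names (`ThresholdDemo`, `threshold`) are made up for this example.

```java
class ThresholdDemo {
    // Applies the formula from the field comment: capacity * load factor, truncated to int.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        float loadFactor = 0.75f;
        // Each doubling of the table doubles the number of entries it holds before resizing.
        for (int capacity = 16; capacity <= 128; capacity <<= 1) {
            System.out.println("capacity " + capacity
                    + " -> threshold " + threshold(capacity, loadFactor));
        }
    }
}
```

With the default load factor, a table of 16 slots is resized once it holds 12 entries, a table of 32 slots at 24 entries, and so on.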
HashMap provides several constructors:

    // The actual capacity of the HashMap is greater than or equal to initialCapacity;
    // the two are equal only when initialCapacity is a power of two.
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " + loadFactor);

        // Find a power of 2 >= initialCapacity
        int capacity = 1;
        while (capacity < initialCapacity) // smallest power of two not below initialCapacity
            capacity <<= 1;

        this.loadFactor = loadFactor;
        threshold = (int) (capacity * loadFactor); // set the resize threshold
        table = new Entry[capacity];
        init();
    }

    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        threshold = (int) (DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
        table = new Entry[DEFAULT_INITIAL_CAPACITY];
        init();
    }

    public HashMap(Map<? extends K, ? extends V> m) {
        this(Math.max((int) (m.size() / DEFAULT_LOAD_FACTOR) + 1,
                      DEFAULT_INITIAL_CAPACITY), DEFAULT_LOAD_FACTOR);
        putAllForCreate(m);
    }
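The rounding loop in the first constructor can be tried in isolation. The sketch below copies that loop into a standalone helper; the class name `CapacityDemo` and the method name `roundUpToPowerOfTwo` are chosen for this example and do not appear in the listing.

```java
class CapacityDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same steps as the constructor: clamp, then find the smallest power of two >= initialCapacity.
    static int roundUpToPowerOfTwo(int initialCapacity) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        return capacity;
    }

    public static void main(String[] args) {
        int[] requested = { 0, 1, 10, 16, 17, 1000 };
        for (int n : requested) {
            System.out.println(n + " -> " + roundUpToPowerOfTwo(n));
        }
    }
}
```

A request for 10 yields a table of 16 slots, 16 stays 16, 17 becomes 32, and 1000 becomes 1024, which is why the actual capacity usually exceeds the requested one.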

From the first constructor, we can see that the actual capacity is generally greater than the initialCapacity we specify, unless initialCapacity is exactly a power of two. Next, let's look at how HashMap is implemented. The class defines a transient Entry[] array, and Entry is implemented as follows:

    // Note: this listing appears to match the Entry class of java.util.Hashtable
    // (protected constructor, clone(), setValue() rejecting null). HashMap's own
    // Entry has the same hash, key, value and next fields and forms chains the same way.
    private static class Entry<K,V> implements Map.Entry<K,V> {
        int hash;
        K key;
        V value;
        Entry<K,V> next; // next Entry in the same bucket, or null

        protected Entry(int hash, K key, V value, Entry<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        protected Object clone() {
            return new Entry<>(hash, key, value,
                               (next == null ? null : (Entry<K,V>) next.clone()));
        }

        // Map.Entry Ops
        public K getKey() {
            return key;
        }

        public V getValue() {
            return value;
        }

        public V setValue(V value) {
            if (value == null)
                throw new NullPointerException();
            V oldValue = this.value;
            this.value = value;
            return oldValue;
        }

        public boolean equals(Object o) {
            if (!(o instanceof Map.Entry))
                return false;
            Map.Entry e = (Map.Entry) o;
            return (key == null ? e.getKey() == null : key.equals(e.getKey())) &&
                   (value == null ? e.getValue() == null : value.equals(e.getValue()));
        }

        public int hashCode() {
            return hash ^ (value == null ? 0 : value.hashCode());
        }

        public String toString() {
            return key.toString() + "=" + value.toString();
        }
    }

Now that we know the basic structure used to store a key-value pair, we can consider how the pairs are stored. As its name implies, HashMap uses a hash table. Hash tables commonly resolve collisions with either open addressing or separate chaining, and Java's HashMap uses separate chaining. Separate chaining is simply a combination of an array and linked lists: each array slot holds a linked list, and when a key is hashed, the resulting array index selects the list on which the entry is placed. Take the following snippet, which puts several key-value pairs into a HashMap, as an example:

    HashMap<String, Double> map = new HashMap<String, Double>();
    map.put("language", 80.0);
    map.put("Mathematics", 89.0);
    map.put("English", 78.2);

HashMap uses a hash algorithm to determine where each element is stored.
When the program executes map.put("language", 80.0), the system calls the hashCode() method of the key "language" to obtain its hash code. Every Java object has a hashCode() method that returns this value. The system then determines the storage location of the element from the hash code.

Let's look at the source code of the put(K key, V value) method of the HashMap class:

    // When a mapping is added to a HashMap, the hashCode() of the key determines
    // where the Entry object is stored. When two keys have the same hash, they are
    // compared with equals(): if it returns false, an Entry chain is formed;
    // if it returns true, the old value is overwritten.
    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key.hashCode());
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }
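The two exits of put() (return the old value when an equal key is found, return null when a new Entry is added) can be observed from outside using java.util.HashMap itself. The class name `PutDemo` and the helper `putTwice` are assumptions made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

class PutDemo {
    // Puts two values under the same key and returns what put() returned each time.
    static Double[] putTwice(Map<String, Double> map, String key, double v1, double v2) {
        Double first = map.put(key, v1);   // key is new: a fresh Entry is added
        Double second = map.put(key, v2);  // equal key found: the value is overwritten
        return new Double[] { first, second };
    }

    public static void main(String[] args) {
        Map<String, Double> map = new HashMap<String, Double>();
        Double[] results = putTwice(map, "English", 78.2, 90.0);
        System.out.println(results[0]);         // null: the key was new
        System.out.println(results[1]);         // 78.2: the value that was replaced
        System.out.println(map.get("English")); // 90.0
    }
}
```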

When the system stores a key-value pair in a HashMap, it does not consider the value in the Entry at all; it computes the storage location of each Entry from the key alone. This confirms the earlier observation that the value in a Map can be treated as an attachment to its key: once the system has determined where the key goes, the value is stored there with it.
put() first calls hash() to compute a hash from the return value of hashCode(), then calls indexFor() to compute the index in the table array at which the object should be stored. The source code of the two methods is as follows:

    static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

The length argument is table.length, and the array length is always a power of two. The index computed by h & (length-1) is therefore guaranteed to fall within the bounds of the table array, which is not true of an arbitrary hash function. A good hash function should also distribute the index values as evenly as possible across the array.

When put() is called to add a key-value pair to a HashMap, the return value of the key's hashCode() determines the storage location of the pair (that is, of the Entry object). When the keys of two Entry objects hash to the same value, the result of comparing the keys with equals() decides what happens: if it returns true, the existing value is overwritten; if it returns false, an Entry chain is formed, with the newly added Entry at the head of the chain. This is done by the addEntry() method, whose source code is as follows:
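The hash() and indexFor() methods can be exercised on their own to see that the mask always produces a valid index. In the sketch below the two methods are copied from the listing; the class name `HashIndexDemo` and the helper `nonNegativeMod` are additions for this example, used only to compare the mask with an ordinary non-negative remainder.

```java
class HashIndexDemo {
    // Copied from the listing: spreads the higher bits of the hash code downward.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Copied from the listing: keeps only the low bits, assuming length is a power of two.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Hypothetical helper: remainder that is never negative, for comparison with the mask.
    static int nonNegativeMod(int h, int length) {
        return ((h % length) + length) % length;
    }

    public static void main(String[] args) {
        int length = 16; // must be a power of two for the mask to work
        String[] keys = { "language", "Mathematics", "English" };
        for (String key : keys) {
            int h = hash(key.hashCode());
            System.out.println(key + " -> bucket " + indexFor(h, length)
                    + " (remainder gives " + nonNegativeMod(h, length) + ")");
        }
    }
}
```

For a power-of-two length the mask and the non-negative remainder agree, even for negative hash values, but the mask avoids a division.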

    void addEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex]; // the Entry currently at bucketIndex
        // Place the new Entry at bucketIndex and point it at the original Entry
        table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
        // If the number of key-value pairs in the map has reached the threshold
        if (size++ >= threshold)
            resize(2 * table.length); // double the length of the table
    }

The program always places the newly added Entry at index bucketIndex of the table array. If an Entry already exists at that index, the new Entry points to it, forming an Entry chain. If no Entry exists there, the variable e is null, so the new Entry points to null and no chain is formed.
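The head-insertion behaviour can be made visible with a deliberately simplified chained table. This is a sketch only, not the JDK code: `TinyChainedMap`, `Node`, and `chainAt` are hypothetical names, the table never resizes, null keys are not supported, and hashCode() is used without the supplemental hash() step.

```java
// Simplified separate-chaining table for illustration; not the JDK implementation.
class TinyChainedMap<K, V> {
    // One node of a bucket's singly linked list.
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node<K, V>[] table;
    private int size;

    @SuppressWarnings("unchecked")
    TinyChainedMap(int powerOfTwoCapacity) {
        table = (Node<K, V>[]) new Node[powerOfTwoCapacity];
    }

    private int indexFor(int hash) {
        return hash & (table.length - 1);
    }

    V put(K key, V value) {
        int hash = key.hashCode(); // null keys are not supported in this sketch
        int i = indexFor(hash);
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && key.equals(e.key)) {
                V old = e.value;   // equal key found: overwrite and return the old value
                e.value = value;
                return old;
            }
        }
        // No equal key: the new node becomes the head and points at the old head.
        table[i] = new Node<K, V>(hash, key, value, table[i]);
        size++;
        return null;
    }

    V get(Object key) {
        int hash = key.hashCode();
        for (Node<K, V> e = table[indexFor(hash)]; e != null; e = e.next) {
            if (e.hash == hash && key.equals(e.key))
                return e.value;
        }
        return null;
    }

    int size() {
        return size;
    }

    // Keys of one bucket from head to tail, to make the chain order visible.
    String chainAt(int bucket) {
        StringBuilder sb = new StringBuilder();
        for (Node<K, V> e = table[bucket]; e != null; e = e.next) {
            if (sb.length() > 0) sb.append(" -> ");
            sb.append(e.key);
        }
        return sb.toString();
    }
}
```

With a capacity of 4, the Integer keys 1, 5, and 9 all fall into bucket 1; after inserting them in that order, chainAt(1) returns "9 -> 5 -> 1", showing that the most recently added entry sits at the head.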

Next, let's take a look at how HashMap is read. The source code of the get() method is as follows:

    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        int hash = hash(key.hashCode());
        // Walk the Entry chain in the bucket. A chain with several entries must be
        // traversed sequentially, which slows the lookup. If chains grow too long,
        // hash collisions are frequent and a better hash algorithm or a larger
        // table is needed.
        for (Entry<K,V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                return e.value;
        }
        return null;
    }

HashMap performs best when each bucket holds only a single Entry. To retrieve a value by key, the program first computes the return value of the key's hashCode(), uses it to find the index of the key in the table array, and then walks the chain at that index looking for an Entry with the same hash value and an equal key.

The following describes how HashMap implements the keySet(), values(), and entrySet() methods. The source code is as follows:
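The worst case, where every key lands in the same bucket, can be provoked with a key type whose hashCode() is constant. The sketch below uses java.util.HashMap with a hypothetical `BadKey` class; lookups still return the right value because equals() distinguishes the keys, but every lookup has to search among all the colliding entries.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // Hypothetical key type whose hashCode() is deliberately constant,
    // so every instance lands in the same bucket.
    static final class BadKey {
        final String name;

        BadKey(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            return 42;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }
    }

    // Builds a map of n colliding keys "k0".."k(n-1)" mapped to their index.
    static Map<BadKey, Integer> build(int n) {
        Map<BadKey, Integer> map = new HashMap<BadKey, Integer>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey("k" + i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = build(1000);
        System.out.println(map.size());                  // 1000
        System.out.println(map.get(new BadKey("k500"))); // 500
    }
}
```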

    public Set<K> keySet() {
        Set<K> ks = keySet;
        return (ks != null ? ks : (keySet = new KeySet()));
    }

    public Collection<V> values() {
        Collection<V> vs = values;
        return (vs != null ? vs : (values = new Values()));
    }

    public Set<Map.Entry<K,V>> entrySet() {
        return entrySet0();
    }

    private Set<Map.Entry<K,V>> entrySet0() {
        Set<Map.Entry<K,V>> es = entrySet;
        return es != null ? es : (entrySet = new EntrySet());
    }
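The objects returned by these methods are views backed by the map rather than copies, which is documented behaviour of java.util.Map. A small sketch of what that means in practice follows; the class name `BackedViewDemo`, the helper `removeViaKeySet`, and the sample data are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

class BackedViewDemo {
    // Removes a key through the keySet() view and reports whether the map itself lost the mapping.
    static boolean removeViaKeySet(Map<String, Double> map, String key) {
        int before = map.size();
        boolean removed = map.keySet().remove(key);
        return removed && map.size() == before - 1 && !map.containsKey(key);
    }

    public static void main(String[] args) {
        Map<String, Double> map = new HashMap<String, Double>();
        map.put("Mathematics", 89.0);
        map.put("English", 78.2);

        System.out.println(removeViaKeySet(map, "English"));  // true
        System.out.println(map.containsKey("English"));       // false

        map.put("History", 70.0);
        System.out.println(map.keySet().contains("History")); // true: the view sees later puts
    }
}
```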

These methods return instances of the private classes KeySet, Values, and EntrySet respectively. How do these classes retrieve their contents from the HashMap? Quite a few classes and methods are involved. The approximate framework is as follows:

    private abstract class HashIterator<E> implements Iterator<E> {
        Entry<K,V> next;        // next entry to return
        int expectedModCount;   // For fast-fail
        int index;              // current slot
        Entry<K,V> current;     // current entry

        HashIterator() {
            expectedModCount = modCount;
            if (size > 0) { // advance to first entry
                Entry[] t = table;
                // move index to the first slot of the table that is not null
                while (index < t.length && (next = t[index++]) == null)
                    ;
            }
        }

        public final boolean hasNext() {
            return next != null;
        }

        final Entry<K,V> nextEntry() { // traverse the Entry chain
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            Entry<K,V> e = next;
            if (e == null)
                throw new NoSuchElementException();

            if ((next = e.next) == null) {
                Entry[] t = table;
                while (index < t.length && (next = t[index++]) == null)
                    ;
            }
            current = e;
            return e;
        }

        public void remove() {
            if (current == null)
                throw new IllegalStateException();
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            Object k = current.key;
            current = null;
            HashMap.this.removeEntryForKey(k);
            expectedModCount = modCount;
        }
    }
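The modCount check in nextEntry() and the resynchronisation in remove() can both be observed with java.util.HashMap. The sketch below contrasts removing through the map during iteration with removing through the iterator; the class name `FailFastDemo`, its helper methods, and the sample data are made up for this example.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class FailFastDemo {
    static Map<String, Double> sample() {
        Map<String, Double> map = new HashMap<String, Double>();
        map.put("language", 80.0);
        map.put("Mathematics", 89.0);
        map.put("English", 78.2);
        return map;
    }

    // Structural change through the map while iterating: the next call to next()
    // sees modCount != expectedModCount. Returns true if the exception was thrown.
    static boolean removeThroughMap(Map<String, Double> map) {
        try {
            for (String key : map.keySet()) {
                map.remove(key);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removal through the iterator keeps expectedModCount in sync with modCount.
    static void removeThroughIterator(Map<String, Double> map) {
        Iterator<String> it = map.keySet().iterator();
        while (it.hasNext()) {
            it.next();
            it.remove();
        }
    }

    public static void main(String[] args) {
        System.out.println(removeThroughMap(sample())); // true

        Map<String, Double> map = sample();
        removeThroughIterator(map);
        System.out.println(map.isEmpty());              // true
    }
}
```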

As with ArrayList, once you have obtained an Iterator over a HashMap, you must not add or remove entries through the HashMap itself; otherwise the iterator throws a ConcurrentModificationException. Let's take a look at several important variables.

