Java Collections Source Code Study Notes (4): HashMap Analysis

Source: Internet
Author: User

I read the source code of ArrayList, LinkedList, and HashMap side by side and compared them; doing so deepened my understanding of all three data structures considerably.

>> Arrays, linked lists, and hash table structures

Information can be stored in either an array or a linked list, and the two suit different scenarios.
An array is easy to address but hard to insert into and delete from; a linked list is hard to address but easy to insert into and delete from.
A hash table combines the two. Hash tables can be implemented in several ways; HashMap uses the chained address method, also known as separate chaining (the "zipper" method).
[Figure: a hash table laid out as an array whose slots each hold a linked list]

Separate chaining is an array of linked lists: in this array of length 16 (HashMap's default initial size is 16), each element stores the head node of a linked list.
An element is placed in the linked list at the array position given by the hash value of its key modulo the length of the array.
For example, in the figure, 337 % 16 = 1 and 353 % 16 = 1, so both keys are inserted at the head of the list at array position 1.
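The bucket selection described above can be sketched as follows. This is only a minimal illustration, not HashMap's actual code: the class ChainedTable and its methods are hypothetical names made up for this example, and the keys are plain non-negative ints that serve as their own hash values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of separate chaining; not the JDK implementation.
class ChainedTable {
    // One node of a bucket's singly linked list.
    static final class Node {
        final int key;
        final Node next;

        Node(int key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // 16 slots, matching the array length used in the example above.
    private final Node[] buckets = new Node[16];

    // Bucket index = hash value modulo the array length (keys assumed non-negative).
    static int indexOf(int key, int length) {
        return key % length;
    }

    // Inserts the key at the head of the list stored in its bucket.
    void add(int key) {
        int i = indexOf(key, buckets.length);
        buckets[i] = new Node(key, buckets[i]);
    }

    // Returns the keys of one bucket, most recently inserted first.
    List<Integer> keysAt(int index) {
        List<Integer> keys = new ArrayList<Integer>();
        for (Node n = buckets[index]; n != null; n = n.next) {
            keys.add(n.key);
        }
        return keys;
    }

    public static void main(String[] args) {
        ChainedTable t = new ChainedTable();
        t.add(337);
        t.add(353);
        System.out.println(t.keysAt(1)); // prints [353, 337]
    }
}
```

Both keys land in bucket 1, and the later key sits in front because each insertion goes to the head of the list.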

>> About HashMap

(1) Inheritance and implementation

HashMap extends the AbstractMap abstract class, which already provides default implementations of some Map operations.
It implements the Map interface, so the operations defined by Map apply to it; this makes clear that HashMap belongs to the Map family.
It implements the Cloneable interface, which indicates that HashMap overrides java.lang.Object#clone(); the copy HashMap makes is a shallow copy.
It implements the Serializable interface, which indicates that a HashMap object can be serialized.
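To make the shallow copy concrete, here is a small sketch using the real java.util.HashMap. The class ShallowCloneDemo and its method names are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

// Hypothetical demo class; only HashMap.clone() itself comes from the JDK.
class ShallowCloneDemo {
    // Returns true if the clone shares its value objects with the original.
    @SuppressWarnings("unchecked")
    static boolean cloneSharesValues() {
        HashMap<String, List<String>> original = new HashMap<String, List<String>>();
        original.put("langs", new ArrayList<String>());

        HashMap<String, List<String>> copy =
                (HashMap<String, List<String>>) original.clone();

        // The list is mutated through the copy, yet the change is visible through
        // the original: only the map structure was duplicated, not the values.
        copy.get("langs").add("Java");
        return original.get("langs").contains("Java");
    }

    // Returns true if adding a key to the clone leaves the original untouched.
    @SuppressWarnings("unchecked")
    static boolean cloneHasOwnTable() {
        HashMap<String, Integer> original = new HashMap<String, Integer>();
        original.put("a", 1);

        HashMap<String, Integer> copy = (HashMap<String, Integer>) original.clone();
        copy.put("b", 2);
        return !original.containsKey("b");
    }

    public static void main(String[] args) {
        System.out.println(cloneSharesValues()); // prints true
        System.out.println(cloneHasOwnTable());  // prints true
    }
}
```

The clone gets its own table, so structural changes are independent, but keys and values are the same objects in both maps.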

(2) Internal data structure

The actual data of a HashMap is stored in an array of Entry objects;
that is, the foundation of HashMap is a linear array, Entry[].

    /**
     * The internal storage array. It is resized as needed; its length must always be a power of two.
     */
    transient Entry[] table;

Take a look at the static nested class Entry:

    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;       // key of the key-value pair
        V value;           // stored value
        Entry<K,V> next;   // points to the next node in the linked list
        final int hash;    // hash value

        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }
        ......
    }

(3) Thread safety

HashMap is not synchronized, that is, it is not thread-safe. Under multithreaded use several problems can occur:
1. Concurrent puts can leave get stuck in an infinite loop, which shows up as 100% CPU utilization (during a put, the transfer method moves the linked lists from the old array to the new array in a loop, and two threads doing so at once can link a list into a cycle).
2. Concurrent puts can lose elements (addEntry calls new Entry<K,V>(hash, key, value, e); if two threads both read the same head e, both new entries get e as their next element, so when each assigns itself to the table slot one write succeeds and the other entry is lost).

For more on HashMap thread safety, see the online resources listed in the references; it is not covered in more detail here.

If a thread-safe hash table is required, consider the following options:

Use the Hashtable class; Hashtable is thread-safe.
Use java.util.concurrent.ConcurrentHashMap; ConcurrentHashMap, from the concurrent package, provides more advanced thread safety.
Or wrap the HashMap object with the Collections.synchronizedMap() method to obtain a thread-safe map, and operate on that map.
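The three options above can be tried side by side with a sketch like the following. The class ThreadSafeMaps and the helper fill are hypothetical names for this illustration; Hashtable, ConcurrentHashMap, and Collections.synchronizedMap are the real JDK types. Each worker thread writes its own range of keys, so a correct map must end up with exactly threads * perThread entries.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class comparing the thread-safe alternatives to HashMap.
class ThreadSafeMaps {
    // Starts `threads` workers that each put `perThread` distinct keys,
    // waits for all of them, and returns the final size of the map.
    static int fill(Map<Integer, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    map.put(base + k, k);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(fill(new Hashtable<Integer, Integer>(), 4, 1000));         // prints 4000
        System.out.println(fill(new ConcurrentHashMap<Integer, Integer>(), 4, 1000)); // prints 4000
        System.out.println(fill(Collections.synchronizedMap(
                new HashMap<Integer, Integer>()), 4, 1000));                          // prints 4000
    }
}
```

Passing a plain HashMap to fill is exactly the unsafe case described earlier: the result may be smaller than expected, or the call may not finish at all.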

>> Common Methods

(1) Methods defined by the Map interface

    public interface Map<K,V> {
        interface Entry<K,V> {
            // Gets the key of the entry
            K getKey();
            // Gets the value of the entry
            V getValue();
            // Sets the value of the entry
            V setValue(V value);
            boolean equals(Object o);
            int hashCode();
        }

        // Returns the number of key-value pairs
        int size();
        // Determines whether the container is empty
        boolean isEmpty();
        // Determines whether the container contains the given key
        boolean containsKey(Object key);
        // Determines whether the container contains the given value
        boolean containsValue(Object value);
        // Gets the value for the given key
        V get(Object key);
        // Adds a new key-value pair to the container
        V put(K key, V value);
        // Removes the key-value pair for the given key
        V remove(Object key);
        // Adds all key-value pairs of another Map
        void putAll(Map<? extends K, ? extends V> m);
        // Removes all key-value pairs from the container
        void clear();
        // Returns a Set of all keys in the container
        Set<K> keySet();
        // Returns a Collection of all values
        Collection<V> values();
        // Returns all key-value pairs
        Set<Map.Entry<K,V>> entrySet();
        // Methods inherited from Object
        boolean equals(Object o);
        int hashCode();
    }

(2) Constructors

HashMap uses the Entry[] array to store data.
In addition, it maintains two very important variables: initialCapacity (the initial capacity) and loadFactor (the load factor).

The initial capacity is the size of the array as first constructed. Any value can be specified,
but HashMap converts it to the smallest power of 2 that is greater than or equal to the specified value: specifying an initial capacity of 12 yields 16, and specifying 16 also yields 16.
The load factor controls how full the table array may become. The default load factor is 0.75:

    static final float DEFAULT_LOAD_FACTOR = 0.75f;

That is, once the number of entries reaches 75% of the array's capacity, the array is expanded automatically.

In addition, the maximum capacity of a HashMap is 2^30:
    static final int MAXIMUM_CAPACITY = 1 << 30;
The default initial capacity is 16:
    static final int DEFAULT_INITIAL_CAPACITY = 16;
HashMap provides four constructors. You can initialize it with the default capacity and load factor,
specify the size and load factor explicitly, or construct and initialize it from another map.

    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        threshold = (int) (DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
        table = new Entry[DEFAULT_INITIAL_CAPACITY];
        init();
    }

    public HashMap(Map<? extends K, ? extends V> m) {
        this(Math.max((int) (m.size() / DEFAULT_LOAD_FACTOR) + 1,
                      DEFAULT_INITIAL_CAPACITY), DEFAULT_LOAD_FACTOR);
        putAllForCreate(m);
    }

    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

    public HashMap(int initialCapacity, float loadFactor) {
        ...
    }
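The sizing rules described above (round the requested capacity up to a power of two, then derive the resize threshold from the load factor) can be sketched as follows. The class CapacityMath and the helpers roundUpToPowerOfTwo and thresholdFor are hypothetical names; the doubling loop is one straightforward way to perform the rounding and is shown as an assumption, not as the body of the elided constructor.

```java
// Hypothetical helper class illustrating HashMap's sizing arithmetic.
class CapacityMath {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of two that is >= requested, capped at MAXIMUM_CAPACITY.
    static int roundUpToPowerOfTwo(int requested) {
        if (requested > MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;
        }
        int capacity = 1;
        while (capacity < requested) {
            capacity <<= 1; // double until the request fits
        }
        return capacity;
    }

    // Number of entries at which the table would be expanded.
    static int thresholdFor(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(12));  // prints 16
        System.out.println(roundUpToPowerOfTwo(16));  // prints 16
        System.out.println(thresholdFor(16, 0.75f));  // prints 12
    }
}
```

With the defaults this gives a table of 16 slots that is expanded once it holds 12 entries.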

>> Resolving hash collisions

(1) What is a hash collision

In theory the input domain of a hash function is infinite while its output range is finite. A good hash function minimizes collisions but cannot avoid them entirely. Here is a typical example of a hash collision:

Take a class of students as an analogy, with the following student data:

Zhang San, Li Si, Wang Wu, Zhao Gang, Wu Lu, ...
If the addressing rule is the position in the alphabet of the first letter of each surname, the following hash table is generated:

Position  Letter  Name
0         A
1         B
2         C
...
11        L       Li Si
...
22        W       Wang Wu, Wu Lu
...
25        Z       Zhang San, Zhao Gang

Notice the rows for W and Z: the keys Wang Wu and Wu Lu were mapped to the same position, and the keys Zhang San and Zhao Gang were also mapped to the same position. When the teacher goes to that position to find Zhang San, two people are sitting in the seat: "Which of you two is Zhang San?" (Example from "How does a hash deal with conflicts?")
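The classroom example can be reproduced with a few lines of code. This is a sketch under stated assumptions: the class SurnameBuckets and its methods are hypothetical, and the names are romanized so that the surnames start with L, W, and Z as in the table above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of the "first letter of the surname" addressing rule.
class SurnameBuckets {
    // Position of the first letter in the alphabet: A -> 0, ..., Z -> 25.
    static int slotFor(String name) {
        return Character.toUpperCase(name.charAt(0)) - 'A';
    }

    // Groups the names into a 26-slot table; names sharing a slot collide.
    static List<List<String>> group(String... names) {
        List<List<String>> table = new ArrayList<List<String>>();
        for (int i = 0; i < 26; i++) {
            table.add(new ArrayList<String>());
        }
        for (String name : names) {
            table.get(slotFor(name)).add(name);
        }
        return table;
    }

    public static void main(String[] args) {
        List<List<String>> table =
                group("Zhang San", "Li Si", "Wang Wu", "Zhao Gang", "Wu Lu");
        System.out.println(table.get(11)); // prints [Li Si]
        System.out.println(table.get(22)); // prints [Wang Wu, Wu Lu]
        System.out.println(table.get(25)); // prints [Zhang San, Zhao Gang]
    }
}
```

Slots 22 and 25 each hold two names, which is exactly the collision the teacher runs into.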

(2) Ways to resolve hash collisions

Common methods are open addressing, rehashing, the chained address method (separate chaining), and building a public overflow area. HashMap uses the chained address method.
The chained address method is the array-of-linked-lists structure mentioned at the beginning:
all keys that are synonyms, that is, keys that hash to the same position, are stored in the same linked list.

>> Source Analysis

(1) Access implementation of HashMap

Access to a HashMap is primarily through the put and get operations.

When the put method executes, the index into the table array is calculated from the hash value of the key.
If the key hashes to an index that is already occupied, the newly put element is placed at the head of the Entry chain.

    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key.hashCode());
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

Implementation of the get operation:

    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        int hash = hash(key.hashCode());
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                return e.value;
        }
        return null;
    }

  

Note that HashMap supports a null key; look at this code:

    private V putForNullKey(V value) {
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        ......
    }
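From the caller's side, the behavior of put with a null key and an existing mapping looks like this. The sketch uses the real java.util.HashMap; the class NullKeyDemo and its demo method are hypothetical names for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class showing put's return value and null-key support.
class NullKeyDemo {
    // Stores two values under the null key and reports what each call returned,
    // followed by the final value and the final size of the map.
    static String demo() {
        Map<String, Integer> map = new HashMap<String, Integer>();
        Integer first = map.put(null, 1);  // no previous mapping, so put returns null
        Integer second = map.put(null, 2); // replaces 1 and returns the old value
        return first + "," + second + "," + map.get(null) + "," + map.size();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints null,1,2,1
    }
}
```

The second put overwrites the value in place and hands back the old one, so the map still holds a single entry.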
(2) Hash function

Here is the hash function used by HashMap; the source is from JDK 1.6:

    /**
     * Hash function.
     * In detail: h is first unsigned-right-shifted by 20 bits and by 12 bits,
     * the two results are XORed together and then XORed with h.
     * The new h then goes through the same kind of operation with right shifts of 7 and 4 bits.
     * Why these particular shifts were chosen is not examined here;
     * the method is meant to minimize collisions.
     */
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
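A small experiment shows why the extra mixing helps. The hash method below repeats the shifts named in the comment above (20, 12, 7, and 4). The indexFor shown here masks with length - 1, which equals the modulo when the length is a power of two; it is an assumption about what the indexFor helper does, since its body is not quoted in this article. The class name HashSpreadDemo is hypothetical.

```java
// Hypothetical demo class; hash repeats the function quoted above.
class HashSpreadDemo {
    // Supplemental hash: folds the high bits of h down into the low bits.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Assumed bucket selection: keeps only the low bits of the hash.
    // Equivalent to h % length for non-negative h when length is a power of two.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000; // the two values differ only above bit 15
        int b = 0x20000;

        // Without mixing, a 16-slot table sees only the low 4 bits: both are 0.
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16));             // prints 0 0

        // After mixing, the high bits influence the index.
        System.out.println(indexFor(hash(a), 16) + " " + indexFor(hash(b), 16)); // prints 1 2
    }
}
```

Because the table index uses only the low bits, hash codes that differ only in their high bits would always collide without this step.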
(3) The rehash process

When the number of entries in the hash table exceeds the threshold (capacity multiplied by load factor), the table must be resized.
If the capacity has already reached the maximum possible value, the method sets the threshold to Integer.MAX_VALUE and returns. Otherwise a new table array is created and the elements of the old table array are transferred to it.

    /**
     * The rehash process.
     * Rehashes the contents of this map into a new array with a
     * larger capacity. This method is called automatically when the
     * number of keys in this map reaches its threshold.
     */
    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        Entry[] newTable = new Entry[newCapacity];
        transfer(newTable);
        table = newTable;
        threshold = (int) (newCapacity * loadFactor);
    }

    /**
     * Transfers all elements of the current Entry[] table to the new table.
     */
    void transfer(Entry[] newTable) {
        Entry[] src = table;
        int newCapacity = newTable.length;
        for (int j = 0; j < src.length; j++) {
            Entry<K,V> e = src[j];
            if (e != null) {
                src[j] = null;
                do {
                    Entry<K,V> next = e.next;
                    int i = indexFor(e.hash, newCapacity);
                    e.next = newTable[i];
                    newTable[i] = e;
                    e = next;
                } while (e != null);
            }
        }
    }
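The effect of the transfer loop is easier to see on a tiny table. The following sketch mirrors the loop above with a stripped-down node type; the class TransferDemo, its Node type, and the chain helper are hypothetical, and the bucket index is computed with a bit mask on the assumption that table lengths are powers of two.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of moving chained nodes into a larger table.
class TransferDemo {
    static final class Node {
        final int hash;
        Node next;

        Node(int hash, Node next) {
            this.hash = hash;
            this.next = next;
        }
    }

    // Moves every node from src into dst, inserting at the head of each target list.
    static void transfer(Node[] src, Node[] dst) {
        for (int j = 0; j < src.length; j++) {
            Node e = src[j];
            src[j] = null;
            while (e != null) {
                Node next = e.next;               // remember the rest of the old chain
                int i = e.hash & (dst.length - 1); // bucket in the new table
                e.next = dst[i];                   // head insertion
                dst[i] = e;
                e = next;
            }
        }
    }

    // Returns the hashes stored in one bucket, from head to tail.
    static List<Integer> chain(Node[] table, int index) {
        List<Integer> hashes = new ArrayList<Integer>();
        for (Node n = table[index]; n != null; n = n.next) {
            hashes.add(n.hash);
        }
        return hashes;
    }

    public static void main(String[] args) {
        Node[] old = new Node[4];
        old[1] = new Node(1, new Node(5, new Node(9, null))); // chain 1 -> 5 -> 9

        Node[] bigger = new Node[8];
        transfer(old, bigger);

        System.out.println(chain(bigger, 1)); // prints [9, 1]
        System.out.println(chain(bigger, 5)); // prints [5]
    }
}
```

Nodes that stay in the same bucket come out in reversed order because of the head insertion. This reordering is what allows two concurrent resizes to link a chain into a cycle.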

  

References

java.util.HashMap: problems that may occur in a multithreaded environment
The infinite loop of Java HashMap
Thinking in Java HashMap source code analysis
How does a hash deal with conflicts?
