Java HashMap Implementation Principles: Source Code Analysis

Last Update:2017-01-09 Source: Internet

Author: User

Tags rehash


HashMap is a hash-table-based implementation of the Map interface. It provides all the optional map operations and permits null values and the null key. It is not synchronized, and it does not guarantee the order of its mappings. The rest of this article records how HashMap is implemented. The listings appear to come from the JDK 7 source (they use Entry and inflateTable); JDK 8 and later implement the class differently.
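As a quick illustration of the null handling described above, here is a minimal sketch using the public HashMap API. The class name NullKeyDemo and the strings are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

class NullKeyDemo {
    // Builds a map that uses both a null key and a null value.
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value for the null key"); // a single null key is allowed
        map.put("a", null);                      // null values are allowed
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        System.out.println(map.get(null));        // value for the null key
        System.out.println(map.containsKey("a")); // true
        System.out.println(map.get("a"));         // null
    }
}
```

Because get returns null both for a missing key and for a key mapped to null, containsKey is the way to tell the two cases apart.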

HashMap Internal Storage

HashMap stores all of its key-value pairs in a transient instance field named table, which is an array of Entry objects. Each slot of the array is a bucket. The array can be resized as needed, and its length must always be a power of two. The relevant declarations are:

    /**
     * An empty entry array, the default value of the table.
     */
    static final Entry<?,?>[] EMPTY_TABLE = {};

    /**
     * The table, resized as needed. Its length must always be a power of two.
     */
    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;
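The power-of-two length matters because it lets a bucket index be computed with a bit mask instead of a division. The sketch below assumes the indexFor helper called in the put and get listings is the usual one-line mask; the class name IndexForDemo and the sample hash values are only for illustration.

```java
class IndexForDemo {
    // Keeps the low bits of the hash. This only works as a bucket index
    // when length is a power of two, because length - 1 is then all ones.
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        for (int hash : new int[] {5, 21, 37, -1}) {
            System.out.println(hash + " -> bucket " + indexFor(hash, length));
        }
        // 5, 21 and 37 all land in bucket 5; -1 lands in bucket 15.
    }
}
```

For a power-of-two length the mask gives the same result as Math.floorMod(hash, length), including for negative hash values.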

Initial Capacity and Load Factor

Two parameters affect the performance of a HashMap: the initial capacity and the load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the table is created. The load factor is a measure of how full the table may become before its capacity is increased automatically. When the number of entries exceeds the product of the load factor and the current capacity, the table is rehashed (its internal data structure is rebuilt) into a new table with twice the current capacity. Both parameters can be set through the constructor. The default initial capacity is 16, the maximum capacity is 2^30, and the default load factor is 0.75.

The table is like a water bucket. By default it holds 16 units of water. Once it has been filled to 16 * 0.75 = 12 units, the next addition expands the capacity to 32 units. The 0.75 is the load factor. You can set the initial capacity and the load factor when creating the table. The maximum capacity is 2^30. If the requested initial capacity is greater than the maximum capacity, the maximum capacity is used instead. If the table is already at the maximum capacity when an expansion is requested, resize returns without growing it.
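The arithmetic in the bucket analogy can be checked with a few lines of code. This is a minimal sketch; ThresholdDemo and its threshold method are hypothetical names that mirror the capacity * load factor product described above.

```java
class ThresholdDemo {
    // Number of entries at which a table of the given capacity is due to grow.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        int capacity = 16;
        for (int step = 0; step < 4; step++) {
            System.out.println("capacity " + capacity
                    + " -> threshold " + threshold(capacity, 0.75f));
            capacity *= 2; // each expansion doubles the capacity
        }
        // Prints thresholds 12, 24, 48 and 96 for capacities 16, 32, 64 and 128.
    }
}
```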

Part of the HashMap source code defines the default initial capacity, the load factor, and other constants and fields:

    /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The maximum capacity. If a larger initial capacity is passed to a
     * constructor, this value is used as the initial capacity instead.
     * It must be a power of two and no greater than 2^30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * The default load factor, used when none is specified in the constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * An empty table, used while the table has not been initialized.
     */
    static final Entry<?,?>[] EMPTY_TABLE = {};

    /**
     * The table, which stores all key-value entries and is resized as needed.
     * Its length must always be a power of two.
     */
    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

    /**
     * The number of key-value pairs in the map. It is incremented or
     * decremented each time an entry is added or removed.
     */
    transient int size;

    /**
     * The threshold at which the table is resized (capacity * load factor).
     * It is recomputed from the new capacity after every resize.
     * @serial
     */
    // If table == EMPTY_TABLE then this is the initial capacity at which the
    // table will be created when inflated.
    int threshold;

    /**
     * The load factor. If none is specified in the constructor, the default
     * load factor is used.
     * @serial
     */
    final float loadFactor;

    /**
     * The number of times this HashMap has been structurally modified. This
     * field is used to make iterators on the collection views of the HashMap
     * fail fast.
     */
    transient int modCount;

Initial Capacity and Load Factor: Performance Tuning

The default load factor (0.75) is generally a good compromise between time and space costs. A higher load factor reduces the space overhead but increases the lookup cost (which shows up in most HashMap operations, including get and put). When setting the initial capacity, take into account the expected number of entries in the map and its load factor, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operation will ever occur.

If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large initial capacity stores them more efficiently than letting the table grow through automatic rehashing as needed.
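The sizing rule above can be turned into a small helper. The sketch below is one possible way to do it; PresizeDemo, initialCapacityFor, and filledMap are hypothetical names, and the helper simply picks an integer strictly greater than the expected entry count divided by the load factor.

```java
import java.util.HashMap;
import java.util.Map;

class PresizeDemo {
    // An integer capacity strictly greater than expectedEntries / loadFactor.
    static int initialCapacityFor(int expectedEntries, float loadFactor) {
        return (int) (expectedEntries / loadFactor) + 1;
    }

    // Creates a map sized up front for the expected number of entries.
    static Map<Integer, String> filledMap(int expectedEntries) {
        Map<Integer, String> map =
                new HashMap<>(initialCapacityFor(expectedEntries, 0.75f));
        for (int i = 0; i < expectedEntries; i++) {
            map.put(i, "Name: " + i);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(initialCapacityFor(1000, 0.75f)); // 1334
        System.out.println(filledMap(1000).size());          // 1000
    }
}
```

The map behaves the same with or without presizing; the only difference is that the presized map should not need to rebuild its table while it is being filled.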

The code for rebuilding the HashMap data structure is as follows:

    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        // If the capacity has already reached the maximum, set the threshold
        // and return without resizing.
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        // Create a new table to hold the data.
        Entry[] newTable = new Entry[newCapacity];
        // Transfer the entries from the old table to the new table.
        // This step can take a long time.
        transfer(newTable, initHashSeedAsNeeded(newCapacity));
        table = newTable;
        // Finally, set the next threshold.
        threshold = (int) Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }
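The transfer step is not shown in the listing above. The following is a simplified sketch of what such a step has to do, not the JDK's code: it walks every chain of the old table and relinks each node into the bucket that its hash selects in the new table. TransferSketch, Node, and the sample hashes are invented for this illustration, and the nodes carry only a hash, with no key or value.

```java
class TransferSketch {
    // A stripped-down chain node: only the cached hash and the next link.
    static final class Node {
        final int hash;
        Node next;

        Node(int hash, Node next) {
            this.hash = hash;
            this.next = next;
        }
    }

    // Relinks every node of oldTable into a new table with newCapacity buckets.
    static Node[] resize(Node[] oldTable, int newCapacity) {
        Node[] newTable = new Node[newCapacity];
        for (Node head : oldTable) {
            Node e = head;
            while (e != null) {
                Node next = e.next;                // remember the rest of the chain
                int i = e.hash & (newCapacity - 1); // bucket in the new table
                e.next = newTable[i];              // insert at the head of that bucket
                newTable[i] = e;
                e = next;
            }
        }
        return newTable;
    }

    static int chainLength(Node head) {
        int n = 0;
        for (Node e = head; e != null; e = e.next) {
            n++;
        }
        return n;
    }

    // A 16-bucket table whose bucket 5 holds the hashes 5, 21, 37 and 53.
    static Node[] demoTable() {
        Node[] table = new Node[16];
        table[5] = new Node(5, new Node(21, new Node(37, new Node(53, null))));
        return table;
    }

    public static void main(String[] args) {
        Node[] bigger = resize(demoTable(), 32);
        System.out.println("bucket 5:  " + chainLength(bigger[5]));  // 2 (hashes 5 and 37)
        System.out.println("bucket 21: " + chainLength(bigger[21])); // 2 (hashes 21 and 53)
    }
}
```

Doubling the table splits the four colliding entries across two buckets, which is why a resize shortens chains as well as making room.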

HashMap Constructors

HashMap has four constructors. The fourth creates a new HashMap from an existing Map and is not discussed here. Of the other three, the first two simply call the third, which takes two parameters, passing the default value for any argument that was not supplied. The code is as follows:

    /**
     * Constructs an empty <tt>HashMap</tt> with the default initial capacity
     * (16) and the default load factor (0.75).
     */
    public HashMap() {
        this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
    }

    /**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and the default load factor (0.75).
     *
     * @param  initialCapacity the initial capacity.
     * @throws IllegalArgumentException if the initial capacity is negative.
     */
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

    /**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and load factor.
     *
     * @param  initialCapacity the initial capacity
     * @param  loadFactor      the load factor
     * @throws IllegalArgumentException if the initial capacity is negative
     *         or the load factor is nonpositive
     */
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        this.loadFactor = loadFactor;
        threshold = initialCapacity;
        init();
    }

As can be seen from the above, in the constructor, if the initial capacity is greater than the maximum capacity, it is directly replaced by the maximum capacity.
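Note that the constructor only records the requested capacity in threshold; the table itself is allocated later, and its length still has to be a power of two. The sketch below shows one way a requested capacity could be validated, clamped, and rounded up. It is an assumption about the general approach, not the JDK's own code, and CapacitySketch and roundedCapacity are hypothetical names.

```java
import java.util.HashMap;

class CapacitySketch {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Hypothetical helper: reject negatives, clamp to the maximum,
    // then round up to the next power of two.
    static int roundedCapacity(int requested) {
        if (requested < 0) {
            throw new IllegalArgumentException("Illegal initial capacity: " + requested);
        }
        if (requested > MAXIMUM_CAPACITY) {
            requested = MAXIMUM_CAPACITY;
        }
        int capacity = 1;
        while (capacity < requested) {
            capacity <<= 1;
        }
        return capacity;
    }

    public static void main(String[] args) {
        for (int requested : new int[] {0, 16, 17, 1000, Integer.MAX_VALUE}) {
            System.out.println(requested + " -> " + roundedCapacity(requested));
        }
        // The real constructor rejects a negative capacity in the same way.
        try {
            new HashMap<String, String>(-1);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```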

The put Method

Next, let's look at one of the most important parts of HashMap, the put method.

    /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old value
     * is replaced.
     *
     * @param key the key with which the specified value is to be associated
     * @param value the value to be associated with the specified key
     * @return the old value associated with the key, or null if the key had
     *         no mapping (null may also indicate that the map previously
     *         associated null with the key)
     */
    public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

When a new entry is added, its hash value must be computed, and the table must be resized if it is not large enough. If the computed bucket already contains elements, the new entry is chained to them in a linked list. Because of this extra work, put is not the cheapest HashMap operation.
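To see chaining and value replacement from the outside, the sketch below uses two strings that are known to share a hash code, "Aa" and "BB" (both hash to 2112). The class name CollisionDemo is made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2); // same hashCode as "Aa" but a different key: kept separately
        map.put("Aa", 3); // existing key: the value is replaced, no new entry is added
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() + " " + "BB".hashCode()); // 2112 2112
        Map<String, Integer> map = build();
        System.out.println(map.size());    // 2
        System.out.println(map.get("Aa")); // 3
        System.out.println(map.get("BB")); // 2
    }
}
```

Equal hash codes put both keys in the same bucket, and the equals check inside put is what keeps them apart.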

The get Method

First, let's look at the source code of the get method:

    /**
     * Returns the value to which the specified key is mapped, or null if
     * this map contains no mapping for the key.
     *
     * A return value of null does not necessarily indicate that the map
     * contains no mapping for the key; it is also possible that the map
     * explicitly maps the key to null. The containsKey operation can be
     * used to distinguish these two cases.
     *
     * @see #put(Object, Object)
     */
    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

    final Entry<K,V> getEntry(Object key) {
        if (size == 0) {
            return null;
        }

        int hash = (key == null) ? 0 : hash(key);
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }

The get method is simple to implement. As the source shows, it computes the bucket position from the key's hash value and the length of the table, which in most cases leads directly to the element being sought. Even when several keys in the same bucket have to be traversed because their hash values collide, the lookup is still fast. Because hash values are relatively unique, HashMap lookups are very fast.
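The put and get paths can be condensed into a toy hash table. The sketch below is a deliberately reduced illustration, not the JDK implementation: TinyChainedMap is a hypothetical class with a fixed 16-bucket table, no resizing, and no supplemental hash function, and it treats a null key as hash 0.

```java
import java.util.Objects;

class TinyChainedMap<K, V> {
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Fixed-size table; the length is a power of two so a mask can be used.
    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    private static int hash(Object key) {
        return key == null ? 0 : key.hashCode();
    }

    private int indexFor(int hash) {
        return hash & (table.length - 1);
    }

    // Replaces the value if the key exists, otherwise adds a node at the chain head.
    V put(K key, V value) {
        int h = hash(key);
        int i = indexFor(h);
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        table[i] = new Node<>(h, key, value, table[i]);
        return null;
    }

    // Locates the bucket from the hash, then walks the chain comparing keys.
    V get(Object key) {
        int h = hash(key);
        for (Node<K, V> e = table[indexFor(h)]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TinyChainedMap<String, Integer> map = new TinyChainedMap<>();
        map.put("Aa", 1);
        map.put("BB", 2); // collides with "Aa": both hash codes are 2112
        System.out.println(map.get("Aa"));      // 1
        System.out.println(map.get("BB"));      // 2
        System.out.println(map.get("missing")); // null
    }
}
```

The cost of get is one index computation plus a walk along a single chain, which is short as long as the keys' hash codes spread well across the buckets.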

Custom Objects as HashMap Keys

    import java.util.HashMap;
    import java.util.Map;

    class User {
        // ID number
        protected int idNumber;

        public User(int id) {
            idNumber = id;
        }
    }

    public class TestUser {
        public static void main(String[] args) {
            Map<User, String> map = new HashMap<User, String>();
            for (int i = 0; i < 5; i++) {
                map.put(new User(i), "Name: " + i);
            }
            System.out.println("User3 name: " + map.get(new User(3)));
        }
    }

Output: User3 name: null

As the code above shows, when instances of the User class are used as HashMap keys, the name for User3 cannot be found. The User class implicitly inherits from the base class Object, so Object's hashCode method generates the hash value, and by default that value is based on the object's identity (conventionally described as its address), not on its fields. The first instance created by new User(3) therefore has a different hash value from the second. Overriding hashCode alone is still not enough, however: equals, which is also inherited from Object, must be overridden as well. HashMap uses equals() to decide whether the current key is the same as a key already in the table, as the get and put methods above show.

A correct equals() method must satisfy the following five conditions (see Java programming ideas (Thinking in Java), page 489): it must be reflexive, symmetric, transitive, and consistent, and for any non-null reference x, x.equals(null) must return false.

To emphasize again: the default Object.equals() only compares object addresses, so one new User(3) is not equal to another new User(3). Therefore, to use your own class as a HashMap key, you must override both hashCode() and equals().

The following code works properly:

    import java.util.HashMap;
    import java.util.Map;

    class User {
        // ID number
        protected int idNumber;

        public User(int id) {
            idNumber = id;
        }

        @Override
        public int hashCode() {
            return idNumber;
        }

        @Override
        public boolean equals(Object obj) {
            return obj instanceof User && (idNumber == ((User) obj).idNumber);
        }
    }

    public class TestUser {
        public static void main(String[] args) {
            Map<User, String> map = new HashMap<User, String>();
            for (int i = 0; i < 5; i++) {
                map.put(new User(i), "Name: " + i);
            }
            System.out.println("User3 name: " + map.get(new User(3)));
        }
    }

Output: User3 name: Name: 3

The hashCode above simply returns idNumber as the unique discriminator; you can implement your own according to your business needs. In the equals method, instanceof silently checks whether the object is null: if the left operand of instanceof is null, it returns false. If the argument to equals() is non-null and of the correct type, the comparison is based on the actual idNumber of each object. The output shows that this approach works.
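When a key is identified by more than one field, the same pattern extends naturally. The sketch below is one possible variant, assuming a hypothetical key with an ID number and a region string; UserKeyDemo and UserKey are invented names, and Objects.hash and Objects.equals from java.util are used to combine the fields.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class UserKeyDemo {
    // Hypothetical two-field key; both fields take part in equals and hashCode.
    static final class UserKey {
        private final int idNumber;
        private final String region;

        UserKey(int idNumber, String region) {
            this.idNumber = idNumber;
            this.region = region;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) {
                return true;
            }
            if (!(obj instanceof UserKey)) {
                return false; // also covers obj == null
            }
            UserKey other = (UserKey) obj;
            return idNumber == other.idNumber && Objects.equals(region, other.region);
        }

        @Override
        public int hashCode() {
            return Objects.hash(idNumber, region);
        }
    }

    // Stores a value under one key instance and looks it up with another.
    static String lookup() {
        Map<UserKey, String> map = new HashMap<>();
        map.put(new UserKey(3, "north"), "Name: 3");
        return map.get(new UserKey(3, "north"));
    }

    public static void main(String[] args) {
        System.out.println(lookup()); // Name: 3
        UserKey a = new UserKey(3, "north");
        UserKey b = new UserKey(3, "north");
        System.out.println(a.equals(b) && b.equals(a)); // true (symmetric)
        System.out.println(a.equals(null));             // false
    }
}
```

Making the fields final keeps the hash code stable while the key is stored in a map, which avoids entries that can no longer be found after a field changes.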

References:

Java programming ideas (Thinking in Java)

JDK API documentation

JDK source code

This article is an English version of an article that was originally published in Chinese on aliyun.com.
