Interpreting the HashMap source code in the Java JDK

Source: Internet
Author: User

HashMap is one of the data structures we use most often in everyday coding. It stores data as key-value pairs, and its lookups and inserts are very efficient.

In the previous summary of sorting algorithms, I learned how HashMap works and wrote a simplified version of it. Today, taking advantage of a lull in the project, I carefully read the implementation of HashMap in Java.

HashMap provides four constructors:

Java code
    public HashMap(int initialCapacity, float loadFactor)
    public HashMap(int initialCapacity)
    public HashMap()
    public HashMap(Map<? extends K, ? extends V> m)

Recently I have seen several wonderful articles:

Access Beauty--hashmap principle, source code, practice

Hash collision and Denial of service attack

These articles taught me a lot, but some parts are not detailed enough. Below I write down my own summary and understanding of them, in the hope that it helps anyone who needs it.

1. Overview

Under the hood, HashMap stores key-value pairs in an array of linked lists.

HashMap defines an inner class Entry<K, V>, which is an abstraction of a key-value pair. The Entry class contains four members: key, value, hash, and next. The meaning of key and value is clear; hash is the hash value of the key, and next is a reference to the next Entry object in the same linked list.

HashMap internally maintains an Entry<K, V>[] table. Each element of the table array is the head node of a linked list of Entry objects (it is important to understand this).
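
The structure described above can be sketched roughly as follows. This is a simplified model and not the JDK source: the names BucketSketch, Node, chainLength, and demoTable are hypothetical, and the real Entry class also implements Map.Entry.

```java
// Simplified, hypothetical model of HashMap's bucket array (not the JDK source).
class BucketSketch {
    // Mirrors the four members described above: key, value, hash, next.
    static final class Node<K, V> {
        final K key;
        V value;
        final int hash;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Counts the nodes in the chain that starts at the given head.
    static int chainLength(Node<?, ?> head) {
        int n = 0;
        for (Node<?, ?> e = head; e != null; e = e.next) {
            n++;
        }
        return n;
    }

    // Builds a 4-slot table and pushes two nodes onto slot 1.
    static Node<String, Integer>[] demoTable() {
        @SuppressWarnings("unchecked")
        Node<String, Integer>[] table = (Node<String, Integer>[]) new Node[4];
        // Each new node becomes the head and points at the previous head.
        table[1] = new Node<>(1, "a", 10, table[1]);
        table[1] = new Node<>(5, "b", 20, table[1]);
        return table;
    }

    public static void main(String[] args) {
        Node<String, Integer>[] table = demoTable();
        System.out.println(chainLength(table[1])); // length of the chain in slot 1
        System.out.println(table[1].key);          // the most recently inserted key is the head
    }
}
```

Every slot of the array is either null or the head of a chain, and walking the chain through next reaches every entry that shares that slot.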

2. Put/get method

When you add a key-value pair to a HashMap, the program computes a hash value from the key's hashCode and then takes that hash modulo table.length. Call the result index, and look at table[index]. It may be null, or it may be an Entry object. If it is null, the new pair is stored there directly. Otherwise the program evaluates key.equals(table[index].key); if that returns false, it moves on to table[index].next and keeps calling the key's equals method until a call returns true or every Entry object in the linked list has been compared. If a match is found, its value is replaced; otherwise the new pair is added to the list.

Java code
    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
        // Hash the hashCode value a second time to get the final hash value
        int hash = hash(key.hashCode());
        // Locate the index position in the array from the hash value
        int i = indexFor(hash, table.length);
        // Traverse the linked list at table[i]
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            Object k;
            // If the hash is the same and the key is identical or equals returns true, replace the old value
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        // No existing mapping was found, so insert the key-value pair into the linked list at table[i]
        addEntry(hash, key, value, i);
        return null;
    }

Once you understand the put method, the get method is easy to follow:

Java code
    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        int hash = hash(key.hashCode());
        // Compute the index from the hash value, take the head node of the
        // linked list at that index, then traverse the list
        for (Entry<K, V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                return e.value;
        }
        return null;
    }
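
The put and get behaviour described in this section can be observed through the public java.util.HashMap API. In the sketch below, CollidingKey is a hypothetical key class whose hashCode is deliberately constant, so every key lands in the same bucket and only equals tells the keys apart.

```java
import java.util.HashMap;
import java.util.Map;

class PutGetDemo {
    // Hypothetical key type: the constant hashCode forces all keys into one bucket.
    static final class CollidingKey {
        final String name;

        CollidingKey(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            return 42;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof CollidingKey && ((CollidingKey) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        Map<CollidingKey, String> map = new HashMap<>();
        // First insert for a key: there is no previous mapping, so put returns null.
        System.out.println(map.put(new CollidingKey("x"), "one"));
        // Same hash but not equal: stored as a separate entry in the same bucket.
        System.out.println(map.put(new CollidingKey("y"), "two"));
        // Equal key: the value is replaced and the old value "one" is returned.
        System.out.println(map.put(new CollidingKey("x"), "three"));
        System.out.println(map.get(new CollidingKey("x")));
        System.out.println(map.size());
    }
}
```

Because the lookup uses equals rather than object identity, a freshly constructed CollidingKey("x") finds the value stored under an earlier, equal instance.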

3. HashMap capacity and index position determination

The capacity of HashMap was not discussed earlier because it is closely tied to how the index position is calculated.

Understanding HashMap's capacity requires attention to the member variables size, loadFactor, and threshold.

size is the number of key-value pairs actually stored in the HashMap.

loadFactor is the load factor. The larger its value, the more fully the table array is used, which saves memory. However, a larger loadFactor also raises the probability of hash collisions, which makes the program less efficient. The load factor is therefore a trade-off between memory and speed; its default value is 0.75.

threshold is the size limit, computed as threshold = (int) (capacity * loadFactor). When size reaches threshold, the table array must be expanded.
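
As a small worked example of the formula, the following sketch computes the threshold for a few capacities. ThresholdDemo and its threshold method are hypothetical names used only for illustration.

```java
class ThresholdDemo {
    // Same formula as in the text: threshold = (int) (capacity * loadFactor).
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // 12
        System.out.println(threshold(32, 0.75f)); // 24
        System.out.println(threshold(16, 1.0f));  // 16
    }
}
```

With the default capacity of 16 and load factor of 0.75, the table is expanded once the map holds more than 12 entries.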

The capacity of a HashMap is table.length. Because the modulo operation is comparatively slow, HashMap keeps its capacity at a power of two (2^n) for performance reasons. With such a capacity, hash % table.length can be replaced by hash & (table.length - 1), which gives the same result for non-negative hash values and is cheaper than the modulo operation.

Java code
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
        // Find the smallest power of two that is greater than or equal to initialCapacity
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        this.loadFactor = loadFactor;
        // Compute the size limit
        threshold = (int) (capacity * loadFactor);
        // The capacity of the table is always a power of two
        table = new Entry[capacity];
        init();
    }

If you create a HashMap with the no-argument constructor, the capacity defaults to 16 and the load factor defaults to 0.75.
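
To see what the rounding loop in the constructor does to a requested capacity, here is a minimal sketch that isolates it. CapacityDemo and roundUpToPowerOfTwo are hypothetical names, and the sketch assumes the argument is at most 1 << 30, which the constructor guarantees by clamping to MAXIMUM_CAPACITY first.

```java
class CapacityDemo {
    // Same loop as in the constructor: smallest power of two >= initialCapacity.
    // Assumes initialCapacity <= (1 << 30); larger values would overflow the shift.
    static int roundUpToPowerOfTwo(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity) {
            capacity <<= 1;
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(10)); // 16
        System.out.println(roundUpToPowerOfTwo(16)); // 16
        System.out.println(roundUpToPowerOfTwo(17)); // 32
        System.out.println(roundUpToPowerOfTwo(0));  // 1
    }
}
```

A request for 17 slots therefore allocates 32, so the actual capacity can be nearly twice the value passed in.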

The indexFor function determines the index position:

Java code
    static int indexFor(int h, int length) {
        // When length is a power of two, this equals h % length for
        // non-negative h, but is cheaper to compute
        return h & (length - 1);
    }
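
The relationship between the bit mask and the modulo operator can be checked with a short sketch. IndexForDemo is a hypothetical class name; indexFor is copied from the code above. Note that the two operations differ for negative hash values: the % operator can return a negative number, while the mask always yields a valid array index.

```java
class IndexForDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // must be a power of two for the mask to work
        // Non-negative hash: the mask and % agree.
        System.out.println(indexFor(37, length));      // 5
        System.out.println(37 % length);               // 5
        // Negative hash: % is negative, the mask matches Math.floorMod instead.
        System.out.println(-7 % length);               // -7
        System.out.println(indexFor(-7, length));      // 9
        System.out.println(Math.floorMod(-7, length)); // 9
    }
}
```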

4. Rehash

As mentioned earlier, when size reaches threshold, the table array needs to be expanded. Calling put to insert a new key-value pair into the HashMap ends up in the addEntry(hash, key, value, i) method:

Java code
    void addEntry(int hash, K key, V value, int bucketIndex) {
        // Take the Entry object currently at the index
        Entry<K, V> e = table[bucketIndex];
        // Make the new entry the head node of the linked list at the index,
        // with its next field pointing to the original head node
        table[bucketIndex] = new Entry<K, V>(hash, key, value, e);
        // Once size has reached threshold, double the array. Doubling
        // ensures that the capacity of the table is always a power of two
        if (size++ >= threshold)
            resize(2 * table.length);
    }

The resize method expands the array. Once the length of the array changes, the index positions of the key-value pairs already in the HashMap must be recomputed, which is the rehash. Without the rehash, put and get would compute indexes against different table lengths, and get would fail to find key-value pairs stored by an earlier put.

Java code
    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }
        Entry[] newTable = new Entry[newCapacity];
        transfer(newTable);
        table = newTable;
        threshold = (int) (newCapacity * loadFactor);
    }

    void transfer(Entry[] newTable) {
        Entry[] src = table;
        int newCapacity = newTable.length;
        // Rehash the existing key-value pairs
        for (int j = 0; j < src.length; j++) {
            // Get the head node of the linked list at j
            Entry<K, V> e = src[j];
            // Traverse the linked list
            if (e != null) {
                src[j] = null;
                do {
                    // Recompute the index and move the entry to the head of its new list
                    Entry<K, V> next = e.next;
                    int i = indexFor(e.hash, newCapacity);
                    e.next = newTable[i];
                    newTable[i] = e;
                    e = next;
                } while (e != null);
            }
        }
    }
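
To make the effect of the rehash concrete, the sketch below shows how the index of a fixed hash value changes when the capacity doubles. RehashDemo is a hypothetical class name; indexFor is the same one-line mask shown earlier.

```java
class RehashDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int oldCapacity = 16;
        int newCapacity = 2 * oldCapacity;
        // Hashes 5 and 21 share bucket 5 in a table of length 16.
        for (int hash : new int[] {5, 21}) {
            System.out.println(hash + ": " + indexFor(hash, oldCapacity)
                    + " -> " + indexFor(hash, newCapacity));
        }
        // After doubling, 5 stays at index 5 while 21 moves to index 21 (5 + 16).
    }
}
```

Doubling adds one more bit to the mask, so each entry either keeps its old index or moves to the old index plus the old capacity. A get that still used the old length would look in the wrong bucket for the moved entries.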

As the source code shows, a rehash visits every entry already in the map, so it is expensive on a large map, and we should try to avoid it. This means estimating the number of key-value pairs that will be stored in the HashMap and, based on that number, specifying a suitable capacity and load factor when creating the HashMap object.
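
One way to apply this advice is to derive the initial capacity from the expected number of entries. The helper below is a hypothetical sketch, not a JDK method, and it is deliberately conservative: it may request somewhat more capacity than strictly necessary.

```java
import java.util.HashMap;
import java.util.Map;

class PresizeDemo {
    // Hypothetical helper: an initial capacity whose threshold is at least expectedSize.
    static int capacityFor(int expectedSize, float loadFactor) {
        return (int) (expectedSize / loadFactor) + 1;
    }

    public static void main(String[] args) {
        int expected = 1000;
        float loadFactor = 0.75f;
        // Requests 1334 slots; HashMap rounds this up to the next power of two.
        Map<Integer, String> map = new HashMap<>(capacityFor(expected, loadFactor), loadFactor);
        for (int i = 0; i < expected; i++) {
            map.put(i, "value-" + i);
        }
        System.out.println(map.size());
    }
}
```

With the default constructor, the same 1000 inserts would start from a capacity of 16 and trigger several resizes along the way.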

5. Hash collisions and degradation of HashMap

In HashMap, a hash collision shows up as different keys producing the same index. If the indexFor method returns the same value for every key, the HashMap degenerates into a single linked list and every lookup or insert becomes a linear scan, which hurts performance badly. The hash collision attack that drew so much attention a few months ago was based on this principle.

Common web frameworks store the parameters of a request in a HashMap (or Hashtable). If a client uses knowledge of the hash function the framework relies on to craft a large number of colliding keys, the HashMap degenerates into a linked list, and the server may take more than 10 minutes, or even hours, to process a single request.
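
For String keys, collisions are easy to construct, because "Aa" and "BB" have the same hashCode and so does any pair of strings built from the same number of those two blocks. The sketch below only illustrates that property; CollisionDemo and collidingKeys are hypothetical names, and this is not the code of any actual attack.

```java
import java.util.ArrayList;
import java.util.List;

class CollisionDemo {
    // Builds 2^blocks distinct strings by concatenating the blocks "Aa" and "BB".
    // All results have the same length and the same String.hashCode().
    static List<String> collidingKeys(int blocks) {
        List<String> keys = new ArrayList<>();
        keys.add("");
        for (int i = 0; i < blocks; i++) {
            List<String> next = new ArrayList<>();
            for (String prefix : keys) {
                next.add(prefix + "Aa");
                next.add(prefix + "BB");
            }
            keys = next;
        }
        return keys;
    }

    public static void main(String[] args) {
        // Prints 8 different strings followed by the same hash code each time.
        for (String key : collidingKeys(3)) {
            System.out.println(key + " " + key.hashCode());
        }
    }
}
```

Since all of these keys have identical hash codes, they end up in the same bucket regardless of the table length, and resizing the table does not separate them.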

6. Thread Safety

HashMap is not thread-safe. If a HashMap is structurally modified while it is being traversed, other than through the iterator itself, a java.util.ConcurrentModificationException is thrown:

Java code
    final Entry<K, V> nextEntry() {
        // Fail fast if the map was structurally modified since the iterator was created
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        Entry<K, V> e = next;
        if (e == null)
            throw new NoSuchElementException();
        // At the end of the current list, advance to the next non-empty bucket
        if ((next = e.next) == null) {
            Entry[] t = table;
            while (index < t.length && (next = t[index++]) == null)
                ;
        }
        current = e;
        return e;
    }

modCount is a member variable of HashMap that counts the structural modifications made to the map. expectedModCount is the value of modCount at the moment the traversal began. If modCount changes during the traversal, the two values are no longer equal and the exception is thrown. Operations such as put (when it adds a new key), remove, and clear change the value of modCount.
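
The check can be triggered from a single thread, as the following sketch shows. FailFastDemo and its methods are hypothetical names. The second method uses Iterator.remove, which keeps the iterator's expected count in step with the map and is the supported way to remove entries during a traversal.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class FailFastDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Removes through the map while iterating; returns true if the iterator fails fast.
    static boolean removeDuringForEach() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.remove(key); // structural change behind the iterator's back
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    // Removes entries with odd values through the iterator; returns the remaining size.
    static int removeWithIterator() {
        Map<String, Integer> map = sample();
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() % 2 != 0) {
                it.remove(); // keeps the iterator's expected count in sync
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(removeDuringForEach()); // true
        System.out.println(removeWithIterator());  // 1
    }
}
```

This check is a bug-detection aid and not a synchronization mechanism. When several threads share a map, use external synchronization or a concurrent map such as java.util.concurrent.ConcurrentHashMap.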
