HashMap, HashTable, TreeMap in-depth analysis and source code parsing

Source: Internet
Author: User
Tags rehash

HashMap is a commonly used implementation of the Map interface in the Java collections framework. Today we will study HashMap and, along the way, the related HashTable and TreeMap.

I. HashMap

1. A hash table based implementation of the Map interface. This implementation provides all of the optional map operations and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the mappings; in particular, it does not guarantee that the order will remain constant over time.
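The difference in null handling can be checked with a few lines of code. The following is a minimal sketch; the class name `NullKeyDemo` and the helper `acceptsNullKey` are made up for illustration and are not part of the JDK.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullKeyDemo {
    // Hypothetical helper: returns true if the map accepts a null key,
    // false if it rejects the key with a NullPointerException.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value for null key");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> hashMap = new HashMap<>();
        System.out.println("HashMap accepts null key:   " + acceptsNullKey(hashMap));
        System.out.println("HashMap.get(null):          " + hashMap.get(null));

        hashMap.put("k", null); // null values are allowed as well
        System.out.println("HashMap contains key k:     " + hashMap.containsKey("k"));

        Map<String, String> hashtable = new Hashtable<>();
        System.out.println("Hashtable accepts null key: " + acceptsNullKey(hashtable));
    }
}
```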

2. A HashMap instance has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, its internal data structures are rebuilt) so that it has approximately twice the number of buckets.
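As a worked example of this rule, the sketch below computes the resize threshold (capacity times load factor) for a few table sizes. It mirrors the threshold formula in the constructor quoted later in this article but leaves out the MAXIMUM_CAPACITY clamp; the class name `ThresholdDemo` is an illustrative assumption.

```java
class ThresholdDemo {
    // Entry count at which the table is doubled: capacity * loadFactor,
    // truncated to an int (the MAXIMUM_CAPACITY clamp is omitted here).
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        float loadFactor = 0.75f; // the default load factor
        for (int capacity = 16; capacity <= 128; capacity *= 2) {
            System.out.println("capacity " + capacity
                    + " -> threshold " + threshold(capacity, loadFactor));
        }
    }
}
```

With the defaults (capacity 16, load factor 0.75) the threshold is 12, so the table is doubled to 32 buckets once the map grows past 12 entries.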

3. Underneath, HashMap is implemented as a hash table (an array combined with linked lists). Creating a HashMap object creates the hash table. The capacity of the hash table is its number of buckets; if a capacity is specified when the object is created, it determines the initial number of buckets (rounded up to a power of two, as the constructor source code below shows).

If the initial capacity is not specified during creation, the default value is 16.

    /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 16;

4. Whether the capacity is specified at creation or the default is used, it is not equal to the number of objects the map can store: the table is built from an array plus linked lists, and the load factor also applies, so the capacity and the number of stored objects are different quantities.

5. Point 2 introduced the two factors that affect the performance of an instance, so both values should be chosen according to need when an instance is created. When plenty of memory is available and fast queries are required, set the initial capacity higher and the load factor lower: queries are fast, but space utilization is low. When memory is tight and query speed matters less, set the initial capacity lower and the load factor higher: queries are slower, but space utilization is higher. The reason is that HashMap is built on an array of linked lists; see the analysis below for details.
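The trade-off can be sketched with the two-argument constructor `HashMap(int initialCapacity, float loadFactor)`. The helper `initialCapacityFor` below is a hypothetical convenience, not a JDK method; it picks an initial capacity large enough that the expected number of entries stays at or below the threshold.

```java
import java.util.HashMap;
import java.util.Map;

class SizingDemo {
    // Hypothetical helper: smallest initial capacity for which
    // expectedEntries <= capacity * loadFactor.
    static int initialCapacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    public static void main(String[] args) {
        int expected = 1000;

        // Favor query speed: many buckets, short chains, no resize while filling.
        Map<Integer, String> roomy =
                new HashMap<>(initialCapacityFor(expected, 0.75f), 0.75f);

        // Favor memory: few buckets and a high load factor, so chains grow longer.
        Map<Integer, String> compact = new HashMap<>(16, 4.0f);

        for (int i = 0; i < expected; i++) {
            roomy.put(i, "v" + i);
            compact.put(i, "v" + i);
        }
        System.out.println("requested capacity for roomy map: "
                + initialCapacityFor(expected, 0.75f));
        System.out.println("both maps hold " + roomy.size()
                + " and " + compact.size() + " entries");
    }
}
```

Both maps behave identically from the caller's point of view; only memory use and lookup cost differ.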

6. Hash table structure:

7. The bucket position is found by taking the hash value of the key modulo the length of the buckets array. If two keys produce the same bucket index, a hash conflict occurs (they point to the same bucket); the newly added node is placed at the head of the bucket's linked list, so the earliest added node sits at the tail.

8. The buckets of a HashMap are the slots 0 to n-1 of the array. A bucket can hold only one value, namely the head node of a linked list, and every node of the list is an added key-value pair (an instance of the HashMap inner class Entry, whose attributes are described in detail below). In other words, the table is an array of Entry-typed linked lists, and the index of each array slot is the address of the corresponding bucket.

9. Points 6 and 7 described the structure of the hash table: it is laid out as an array of linked lists.

10. The hash table is composed of an array and linked lists. In an array of length 16, each element stores the head node of a linked list. By what rule are elements placed in the array? The index is generally obtained through hash(key) % len, that is, the hash value of the element's key modulo the array length. For example, in the hash table above, 12 % 16 = 12, 28 % 16 = 12, 108 % 16 = 12, and 140 % 16 = 12. Therefore 12, 28, 108, and 140 are all stored at array subscript 12.
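The arithmetic in this example can be reproduced directly. This is only a sketch of the modulo rule for non-negative hash values; `BucketIndexDemo` and `bucketIndex` are illustrative names.

```java
class BucketIndexDemo {
    // hash % len as described above; only meaningful for non-negative hashes.
    static int bucketIndex(int hash, int tableLength) {
        return hash % tableLength;
    }

    public static void main(String[] args) {
        int len = 16;
        for (int hash : new int[] {12, 28, 108, 140}) {
            System.out.println(hash + " % " + len + " = " + bucketIndex(hash, len));
        }
    }
}
```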

11. HashMap is in fact backed by a linear array, so the container that stores the data is a linear array. This may seem puzzling: how can a linear array store and retrieve data by key-value pairs? HashMap does some extra work to make this possible.

First, HashMap defines a static inner class Entry whose important attributes are key, value, and next. From the key and value attributes it is clear that Entry is the basic bean that represents one key-value pair in HashMap. The linear array mentioned above is an array of type Entry[], and the contents of the Map are stored in it.


HashMap class source code:

    public class HashMap<K,V>
        extends AbstractMap<K,V>
        implements Map<K,V>, Cloneable, Serializable
    {
        /**
         * The default initial capacity - MUST be a power of two.
         */
        static final int DEFAULT_INITIAL_CAPACITY = 16;

        /**
         * The maximum capacity, used if a higher value is implicitly specified
         * by either of the constructors with arguments.
         * MUST be a power of two <= 1<<30.
         */
        static final int MAXIMUM_CAPACITY = 1 << 30;

        /**
         * The load factor used when none specified in constructor.
         */
        static final float DEFAULT_LOAD_FACTOR = 0.75f;

        /**
         * The table, resized as necessary. Length MUST always be a power of two.
         * (This is the Entry[] array described above.)
         */
        transient Entry<K,V>[] table;

        // ...
    }

HashMap class constructor source code:

    /**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and load factor.
     * (Initializes the HashMap object from the initial capacity and load factor.)
     *
     * @param  initialCapacity the initial capacity
     * @param  loadFactor      the load factor
     * @throws IllegalArgumentException if the initial capacity is negative
     *         or the load factor is nonpositive
     */
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        // Find a power of 2 >= initialCapacity
        int capacity = 1;
        while (capacity < initialCapacity)
            // a shift is an efficient way to double the value
            capacity <<= 1;

        this.loadFactor = loadFactor;
        threshold = (int)Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
        // create the array of type Entry[]
        table = new Entry[capacity];
        useAltHashing = sun.misc.VM.isBooted() &&
                (capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
        init();
    }
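The rounding step of this constructor can be isolated into a small function to see which table size a requested capacity actually produces. The sketch below copies the loop from the constructor above; the class name `PowerOfTwoDemo` and the method name `tableCapacityFor` are assumptions made for this example.

```java
class PowerOfTwoDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same loop as in the constructor above: the smallest power of two
    // that is >= initialCapacity, capped at MAXIMUM_CAPACITY.
    static int tableCapacityFor(int initialCapacity) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        return capacity;
    }

    public static void main(String[] args) {
        for (int requested : new int[] {0, 1, 10, 16, 17, 1000}) {
            System.out.println("requested " + requested
                    + " -> table length " + tableCapacityFor(requested));
        }
    }
}
```

A request for 10 buckets therefore yields a table of length 16, and a request for 17 yields 32.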

12. HashMap -- put:

Question: if two keys get the same index through hash % Entry[].length, is there a risk that one overwrites the other?

Here HashMap uses a chained data structure. As mentioned above, the Entry class has a next attribute that points to the next Entry. For example, suppose the first key-value pair A arrives and the hash of its key gives index = 0; it is recorded as Entry[0] = A. Later, another key-value pair B arrives whose index is also 0. What now? HashMap does this: B.next = A, Entry[0] = B. If C arrives next and its index is also 0, then C.next = B, Entry[0] = C. The slot at index = 0 thus gives access to three key-value pairs A, B, and C, linked together through the next attribute, so nothing is lost. In other words, the array slot (bucket) holds the most recently inserted element. If hash % Entry[].length returns the same index and key.equals(otherKey) is also true, the value stored for that key is replaced with the new value.
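The A, B, C scenario can be replayed with a stripped-down chain. This is a simplified sketch, not the JDK implementation: `ChainDemo` and `Node` are hypothetical stand-ins for HashMap and Entry, and the bucket index is passed in by the caller so that the collision can be forced.

```java
import java.util.ArrayList;
import java.util.List;

class ChainDemo {
    // Simplified stand-in for HashMap.Entry.
    static final class Node {
        final String key;
        String value;
        Node next;

        Node(String key, String value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    final Node[] table = new Node[16];

    // Replaces the value if the key is already in the chain; otherwise
    // inserts a new node at the head of the chain. Returns the old value or null.
    String put(int index, String key, String value) {
        for (Node e = table[index]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                String old = e.value;
                e.value = value;
                return old;
            }
        }
        table[index] = new Node(key, value, table[index]); // new.next = old head
        return null;
    }

    // Keys of one bucket in chain order, head first.
    List<String> keysInBucket(int index) {
        List<String> keys = new ArrayList<>();
        for (Node e = table[index]; e != null; e = e.next) {
            keys.add(e.key);
        }
        return keys;
    }

    public static void main(String[] args) {
        ChainDemo map = new ChainDemo();
        map.put(0, "A", "1");
        map.put(0, "B", "2");
        map.put(0, "C", "3");
        System.out.println("bucket 0, head first: " + map.keysInBucket(0));
        System.out.println("old value of B: " + map.put(0, "B", "20"));
    }
}
```

The most recently inserted key sits at the head of the chain, and putting an existing key changes its value without adding a node.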

    public V put(K key, V value) {
        // a null key is always stored in the first linked list of the array,
        // that is, in bucket 0
        if (key == null)
            return putForNullKey(value);
        // obtain the hash value of the key
        int hash = hash(key.hashCode());
        // determine the bucket from the hash value of the key and the length of the table
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            // if an entry for this key already exists in the linked list,
            // replace its value with the new value
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }
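From the caller's side, the two exits of put (return oldValue and return null) look like this. The sketch uses only the public java.util.HashMap API; `PutReturnDemo` and `describePut` are illustrative names. Note that a null return is ambiguous when null values are stored, which is why the helper checks containsKey first.

```java
import java.util.HashMap;
import java.util.Map;

class PutReturnDemo {
    // Hypothetical helper: reports whether put() added a mapping or replaced one.
    static String describePut(Map<String, Integer> map, String key, Integer value) {
        boolean existed = map.containsKey(key);
        Integer previous = map.put(key, value);
        return existed ? "replaced " + previous : "new mapping";
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        System.out.println(describePut(map, "a", 1));    // key not present yet
        System.out.println(describePut(map, "a", 2));    // key present, value replaced
        map.put("n", null);                              // a stored null value
        System.out.println(describePut(map, "n", 5));    // put() returned null here too
        System.out.println("size = " + map.size());
    }
}
```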
Entry internal class: 

    static class Entry<K,V> implements Map.Entry<K,V> {
        // the key that was added
        final K key;
        // the value that was added
        V value;
        // reference to the next Entry in the same bucket
        Entry<K,V> next;
        // the hash value of the key
        int hash;

        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }

        // ...
    }

addEntry(hash, key, value, i) method:

    void addEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        // e is passed as Entry.next of the new node, i.e. the entry that was
        // added before it; the new Entry is placed at bucketIndex and points
        // to the original Entry
        table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
        // if the size reaches the threshold, expand the table and rehash
        if (size++ >= threshold)
            // expand the table to twice its current length
            resize(2 * table.length);
    }

HashMap also contains some optimizations. For example, with an Entry[] of fixed length, the probability of key hash conflicts rises as the map fills up, and the chain under a given index grows longer; would that hurt performance? To address this, HashMap has a load factor: as the size of the map grows, Entry[] is extended according to a fixed rule.

Resize: when the number of entries in the hash table reaches the threshold (capacity * load factor), the table must be enlarged. If the capacity has already reached MAXIMUM_CAPACITY, this method sets the threshold to Integer.MAX_VALUE and returns without resizing. Otherwise a new table is created and the entries of the original table are moved into it.

    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        Entry[] newTable = new Entry[newCapacity];
        boolean oldAltHashing = useAltHashing;
        useAltHashing |= sun.misc.VM.isBooted() &&
                (newCapacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
        boolean rehash = oldAltHashing ^ useAltHashing;
        // copy the data into the new table
        transfer(newTable, rehash);
        table = newTable;
        threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

The transfer method moves every entry into the new table, rebuilding the linked list of each bucket as it goes.

    /**
     * Transfers all entries from current table to newTable.
     */
    void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {
            while(null != e) {
                Entry<K,V> next = e.next;
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
    }
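To see what transfer does to a concrete bucket, the sketch below replays the earlier example (hashes 12, 28, 108, and 140 in bucket 12 of a 16-slot table) and doubles the table to 32 slots. It is a simplified model under stated assumptions: `TransferDemo` and `Node` are hypothetical, nodes carry only a hash, and the rehash branch is left out.

```java
import java.util.ArrayList;
import java.util.List;

class TransferDemo {
    // Simplified node: only the hash and the next pointer matter here.
    static final class Node {
        final int hash;
        Node next;

        Node(int hash, Node next) {
            this.hash = hash;
            this.next = next;
        }
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Builds a table by head insertion, like addEntry does.
    static Node[] build(int capacity, int... hashes) {
        Node[] table = new Node[capacity];
        for (int h : hashes) {
            int i = indexFor(h, capacity);
            table[i] = new Node(h, table[i]);
        }
        return table;
    }

    // Same loop shape as the transfer method above, without rehashing.
    static Node[] transfer(Node[] oldTable, int newCapacity) {
        Node[] newTable = new Node[newCapacity];
        for (Node e : oldTable) {
            while (e != null) {
                Node next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
        return newTable;
    }

    // Hashes of one bucket in chain order, head first.
    static List<Integer> chain(Node[] table, int index) {
        List<Integer> hashes = new ArrayList<>();
        for (Node e = table[index]; e != null; e = e.next) {
            hashes.add(e.hash);
        }
        return hashes;
    }

    public static void main(String[] args) {
        Node[] small = build(16, 12, 28, 108, 140);
        System.out.println("16 slots, bucket 12: " + chain(small, 12));
        Node[] big = transfer(small, 32);
        System.out.println("32 slots, bucket 12: " + chain(big, 12));
        System.out.println("32 slots, bucket 28: " + chain(big, 28));
    }
}
```

Two things are visible in this model: entries that shared a bucket can be split across buckets after doubling, and the relative order of the entries that stay together is reversed, because each one is again inserted at the head.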


13. HashMap -- get:

    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

getForNullKey() gets the value stored under the null key:

    /**
     * Offloaded version of get() to look up null keys.  Null keys map
     * to index 0.  This null case is split out into separate methods
     * for the sake of performance in the two most commonly used
     * operations (get and put), but incorporated with conditionals in
     * others.
     * If the key is null, the entry is looked up in the bucket at index 0.
     */
    private V getForNullKey() {
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null)
                return e.value;
        }
        return null;
    }

getEntry(key) method: returns the entry object for the key. If the HashMap contains no mapping for the key, it returns null.

    /**
     * Returns the entry associated with the specified key in the
     * HashMap.  Returns null if the HashMap contains no mapping
     * for the key.
     */
    final Entry<K,V> getEntry(Object key) {
        // get the hash value of the key
        int hash = (key == null) ? 0 : hash(key);
        // use the hash value of the key to determine the array index (bucket position)
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }
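The role of the equals check in this loop shows up when every key collides. In the sketch below, the hypothetical key type `Key` returns the same hashCode for all instances, so all entries share one bucket and only equals can tell them apart. It uses the public java.util.HashMap, whose internals differ between JDK versions, but the observable result is the same.

```java
import java.util.HashMap;
import java.util.Map;

class CollidingKeyDemo {
    // Hypothetical key type with a deliberately constant hashCode.
    static final class Key {
        final String name;

        Key(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            return 42; // every key lands in the same bucket
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).name.equals(name);
        }
    }

    static Map<Key, Integer> build() {
        Map<Key, Integer> map = new HashMap<>();
        map.put(new Key("a"), 1);
        map.put(new Key("b"), 2);
        map.put(new Key("c"), 3);
        return map;
    }

    public static void main(String[] args) {
        Map<Key, Integer> map = build();
        System.out.println("b -> " + map.get(new Key("b")));
        System.out.println("missing -> " + map.get(new Key("missing")));
    }
}
```

Lookups stay correct but degrade toward a linear scan of the bucket, which is why a well-distributed hashCode matters.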

Obtain the hash value of the key:

    final int hash(Object k) {
        int h = 0;
        if (useAltHashing) {
            if (k instanceof String) {
                return sun.misc.Hashing.stringHash32((String) k);
            }
            h = hashSeed;
        }

        h ^= k.hashCode();

        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
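The purpose of the two shift-and-XOR lines becomes clearer with hash codes that differ only in their upper bits. The sketch below copies those two lines into a hypothetical `spread` method (the alternative-hashing branch is omitted) and compares bucket indexes in a 16-slot table with and without it.

```java
class SpreadDemo {
    // The two bit-mixing lines from the hash method above.
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        // These values differ only above bit 15, so their low 4 bits are all zero.
        for (int h : new int[] {0x10000, 0x20000, 0x30000}) {
            System.out.println(Integer.toHexString(h)
                    + ": raw index " + indexFor(h, length)
                    + ", spread index " + indexFor(spread(h), length));
        }
    }
}
```

Without mixing, all three values fall into bucket 0 because indexFor only looks at the low bits; after mixing, the high bits influence the index as well.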

Determine the index in the array (the modulo hashcode % table.length):

When accessing a HashMap, you must work out which element of the Entry[] array corresponds to the current key, that is, the array subscript. It is calculated as follows:

    /**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        return h & (length-1);
    }

Because the table length is always a power of two, h & (length-1) gives the same result as h % length for non-negative h, without a division.
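The relationship between the bit mask and the modulo can be checked directly, including the case of a negative hash value, where the Java % operator returns a negative remainder but the mask still yields a valid index. `IndexForDemo` is an illustrative class name; `indexFor` is copied from the listing above.

```java
class IndexForDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // must be a power of two for the mask to work

        for (int h : new int[] {12, 28, 108, 140}) {
            System.out.println(h + ": h % length = " + (h % length)
                    + ", h & (length-1) = " + indexFor(h, length));
        }

        int negative = -20;
        System.out.println(negative + ": h % length = " + (negative % length)
                + ", h & (length-1) = " + indexFor(negative, length)
                + ", Math.floorMod = " + Math.floorMod(negative, length));
    }
}
```

For negative input the mask agrees with Math.floorMod rather than with %, so the result is always a legal array index.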

Summary:

1. HashMap is a chained array (an array that stores linked lists), which gives fast lookups: the value for a key can be found quickly;

2. Query speed is affected by the capacity and the load factor. A large capacity with a small load factor makes queries faster but wastes space; a small capacity with a large load factor saves space but makes queries slower;

3. The array index is determined by hashcode % len (where hashcode is the hash value of the key and len is the array size). If the capacity is large and the load factor is small, the probability that two keys share an index (and hence the same bucket) is small, the linked lists stay short, and queries are fast. Otherwise the probability of a shared index is higher, the linked lists grow longer, and queries are slower.

4. HashMap and its subclasses use a hash algorithm to decide where each element is stored in the collection. When a HashMap is initialized, the system creates an Entry array whose length is the capacity. Each position in this array is called a bucket and has its own index, through which the system can quickly reach the elements stored in the bucket.

5. At any time, each bucket in a HashMap stores only one element (an Entry object). Because an Entry object can hold a reference to the next Entry, a bucket may contain a single Entry that points to another Entry, forming an Entry chain.

6. The source code above shows that HashMap treats each key-value pair as a single unit, an Entry object. When deciding where to store a key-value pair, the system does not consider the value at all; the storage location of each Entry is determined solely by the hash value of the key.




II. HashTree

1. To be completed when time permits ......


