[Java data structure] Analyzing HashMap from its source code

Source: Internet
Author: User

The differences between HashMap and Hashtable:
    1. Most of Hashtable's methods are synchronized, while HashMap's are not, so HashMap is not thread-safe.
    2. Hashtable does not allow null keys or null values, while HashMap allows both.
    3. Internally, the two differ in how they hash the key and in how they map the hash value to a memory index.
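
The second difference can be checked with a small sketch. The class and helper names below (NullKeyDemo, acceptsNullKey, acceptsNullValue) are made up for illustration; only java.util.HashMap and java.util.Hashtable come from the JDK.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullKeyDemo {
    // Returns true if the map accepts a null key, false if put() throws NullPointerException.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    // Returns true if the map accepts a null value, false if put() throws NullPointerException.
    static boolean acceptsNullValue(Map<String, String> map) {
        try {
            map.put("key", null);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap null key:     " + acceptsNullKey(new HashMap<>()));
        System.out.println("HashMap null value:   " + acceptsNullValue(new HashMap<>()));
        System.out.println("Hashtable null key:   " + acceptsNullKey(new Hashtable<>()));
        System.out.println("Hashtable null value: " + acceptsNullValue(new Hashtable<>()));
    }
}
```
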
How HashMap is implemented

Simply put, HashMap applies a hash algorithm to the key, maps the resulting hash value to a memory address, and then retrieves the data stored for that key directly from that address. The underlying data structure of HashMap is an array, so the "memory address" here is simply an index into that array. For HashMap to perform well, the following must hold:

    • The hash algorithm must be efficient
    • Mapping a hash value to a memory address (array index) must be fast
    • The value must be directly retrievable from the memory address (array index)

How is the hash algorithm kept efficient? The relevant code is as follows:

    int hash = hash(key.hashCode());

    public native int hashCode();

    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

The first statement computes the hash value that HashMap uses for the key. It calls the hashCode() method of the Object class and HashMap's internal hash() function. By default, Object.hashCode() is a native method and can be assumed to have no performance problems. The hash() function consists entirely of bit operations and is therefore efficient.
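
To see what the extra bit mixing buys, here is a minimal sketch. It assumes the shift-and-XOR form of hash() found in older (JDK 6 era) HashMap sources and a 16-slot table; the class name HashSpreadDemo and the two sample hash codes are chosen only for illustration.

```java
class HashSpreadDemo {
    // Supplemental hash in the style of the older HashMap source (an assumption of this sketch).
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Index in a table whose length is a power of two.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000; // these two hash codes differ only in their high bits
        int b = 0x20000;
        // Without mixing, both land in slot 0 of a 16-slot table.
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16));
        // After mixing, the high bits influence the low bits and the slots differ.
        System.out.println(indexFor(hash(a), 16) + " " + indexFor(hash(b), 16));
    }
}
```

Because the array index uses only the low bits of the hash, folding the high bits down keeps keys whose hash codes differ only in the upper bits from piling into one slot.
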

Once the hash value of the key has been obtained, it must be mapped to a memory address:

    int i = indexFor(hash, table.length);

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

The indexFor() function obtains the array index with a single bitwise AND of the hash value and the array length minus one.
Finally, the array index returned by indexFor() is used as an array subscript to fetch the corresponding value directly. Direct memory access of this kind is very fast, which is why HashMap is considered high-performance.
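
The bitwise AND only behaves like a modulo when the array length is a power of 2. The following sketch (class name IndexForDemo is hypothetical) compares it with Math.floorMod and shows what goes wrong for a length that is not a power of 2.

```java
class IndexForDemo {
    // Same mapping as in the article: keep the low bits of the hash.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        // For a power-of-two length, the AND matches the non-negative remainder.
        for (int h : new int[] {5, 21, -3, 1000}) {
            System.out.println(h + " -> " + indexFor(h, 16) + " / " + Math.floorMod(h, 16));
        }
        // For length 10, length - 1 is binary 1001, so only slots 0, 1, 8 and 9 are reachable.
        for (int h = 0; h < 10; h++) {
            System.out.println(h + " -> " + indexFor(h, 10));
        }
    }
}
```
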

Hash collisions

As shown in Figure 3.11, suppose two elements, 1 and 2, need to be stored in a HashMap, and hashing maps both of them to the same memory address. How is this handled?
The underlying implementation of HashMap does use an array, but the elements of the array are not simple values. Each one is an object of the Entry class. The structure of HashMap is therefore best described by Figure 3.12.

As you can see, HashMap maintains an array of Entry objects, each consisting of a key, a value, a next reference, and a hash. The next field points to another Entry. Reading the source of HashMap's put() method shows that when a put() operation collides, the new Entry is still placed at the corresponding array index, replacing the Entry that was there. To ensure the old Entry is not lost, the new Entry's next field points to it. This allows multiple Entries to be stored under a single array index. So, as shown in Figure 3.12, HashMap is effectively an array of linked lists.

    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key.hashCode());
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            // If the current key already exists in the HashMap
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;   // get the old value
                e.value = value;
                e.recordAccess(this);
                return oldValue;        // return the old value
            }
        }
        modCount++;
        addEntry(hash, key, value, i);  // add the new entry at index i
        return null;
    }

The addEntry() method is implemented as follows:

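    void addEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        // Put the new element at index i and point its next field at the old element
        table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
        if (size++ >= threshold) {
            resize(2 * table.length);
        }
    }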

With this implementation, as long as hashCode() and hash() are good enough to keep collisions to a minimum, operations on a HashMap are almost equivalent to random access into an array and perform well. However, if hashCode() or hash() is poor and collisions are frequent, the HashMap effectively degenerates into a few linked lists, operations on it become linked-list traversals, and performance is poor.
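
The array-of-linked-lists idea can be sketched in a few lines. This is a toy illustration, not the JDK implementation: the class ChainedMapSketch and its chainLength() helper are hypothetical, null keys are not supported, the table never resizes, and the capacity passed in is assumed to be a power of 2.

```java
class ChainedMapSketch<K, V> {
    // One node of a bucket's singly linked list.
    static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node<K, V>[] table;

    @SuppressWarnings("unchecked")
    ChainedMapSketch(int capacity) {
        table = (Node<K, V>[]) new Node[capacity]; // capacity assumed to be a power of 2
    }

    private int indexFor(Object key) {
        return key.hashCode() & (table.length - 1);
    }

    // Replaces the value if the key exists; otherwise inserts a new node at the head of the chain.
    V put(K key, V value) {
        int i = indexFor(key);
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        table[i] = new Node<>(key, value, table[i]); // new node points at the old head
        return null;
    }

    // Walks the chain at the key's index; the cost grows with the chain length.
    V get(Object key) {
        for (Node<K, V> e = table[indexFor(key)]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    // Number of nodes stored under one array index.
    int chainLength(int index) {
        int n = 0;
        for (Node<K, V> e = table[index]; e != null; e = e.next) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        ChainedMapSketch<Integer, String> map = new ChainedMapSketch<>(16);
        map.put(1, "one");
        map.put(17, "seventeen"); // 17 & 15 == 1, so this collides with key 1
        System.out.println(map.chainLength(1));
        System.out.println(map.get(1) + " " + map.get(17));
    }
}
```

Keys 1 and 17 share index 1 in a 16-slot table, so both end up in the same chain and get() has to walk that chain to tell them apart.
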

Capacity parameters

Besides the implementation of hashCode(), the capacity parameters also affect HashMap's performance. Like ArrayList and Vector, any array-based structure inevitably has to grow when its array runs out of space. Reorganizing the array is relatively time-consuming, so understanding these parameters helps optimize HashMap performance.

HashMap provides two constructors that let you specify its initial size:

    public HashMap(int initialCapacity)
    public HashMap(int initialCapacity, float loadFactor)

Here initialCapacity specifies the initial capacity of the HashMap and loadFactor specifies its load factor. The initial capacity is the size of the internal array; HashMap uses the smallest power of 2 that is greater than or equal to initialCapacity as the array size. The load factor, also called the fill ratio, is a floating-point number, normally between 0 and 1, that determines how full the internal array may become before the HashMap expands. By default, HashMap has an initial capacity of 16 and a load factor of 0.75.

Load factor = number of elements/total size of internal array
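
The rounding of the requested capacity up to a power of 2 can be sketched as a simple shift loop. The class and method names (CapacityDemo, roundUpToPowerOfTwo) are hypothetical, and the cap at 1 << 30 is an assumption of this sketch, added so the loop cannot overflow.

```java
class CapacityDemo {
    static final int MAX_CAPACITY = 1 << 30; // assumed upper bound for this sketch

    // Smallest power of two that is >= initialCapacity (and at least 1).
    static int roundUpToPowerOfTwo(int initialCapacity) {
        if (initialCapacity > MAX_CAPACITY) {
            initialCapacity = MAX_CAPACITY;
        }
        int capacity = 1;
        while (capacity < initialCapacity) {
            capacity <<= 1;
        }
        return capacity;
    }

    public static void main(String[] args) {
        for (int requested : new int[] {1, 10, 16, 17, 1000}) {
            System.out.println(requested + " -> " + roundUpToPowerOfTwo(requested));
        }
    }
}
```
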

In practice, the load factor can also be set to a number greater than 1, but doing so guarantees many collisions: it is like trying to fit 15 items into a bag with only 10 pockets, so some pockets must hold more than one item. For this reason, it is not usually used this way.

HashMap maintains a threshold variable, always defined as the product of the current array capacity and the load factor. When the number of stored elements exceeds this threshold, the HashMap expands. As a result, the actual fill ratio of the HashMap does not exceed the load factor.
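
As a quick worked example of the threshold, the sketch below multiplies capacity by load factor and truncates to an int. The class name ThresholdDemo is hypothetical.

```java
class ThresholdDemo {
    // threshold = capacity * loadFactor, truncated to an int.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // With the defaults (capacity 16, load factor 0.75) the threshold is 12.
        System.out.println(threshold(16, 0.75f));
        // After doubling to 32 the threshold doubles as well.
        System.out.println(threshold(32, 0.75f));
    }
}
```
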

The code for HashMap expansion is as follows:

    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }
        // Create the new array
        Entry[] newTable = new Entry[newCapacity];
        // Move the entries of the old array into the new array
        transfer(newTable);
        table = newTable;
        // Reset the threshold to the product of the new capacity and the load factor
        threshold = (int)(newCapacity * loadFactor);
    }

The array migration logic is implemented mainly in the transfer() function, shown below with comments:

    void transfer(Entry[] newTable) {
        Entry[] src = table;
        int newCapacity = newTable.length;
        // Iterate through all the slots in the array
        for (int j = 0; j < src.length; j++) {
            Entry<K,V> e = src[j];
            // Migrate only when the slot holds an entry
            if (e != null) {
                src[j] = null;
                do {
                    // Data migration
                    Entry<K,V> next = e.next;
                    // Compute the entry's index in the new array and place it there,
                    // building the new linked list as we go
                    int i = indexFor(e.hash, newCapacity);
                    e.next = newTable[i];
                    newTable[i] = e;
                    e = next;
                } while (e != null);
            }
        }
    }
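
One consequence of recomputing the index with a doubled, power-of-2 capacity is that each entry either keeps its old index or moves up by exactly the old capacity, because only one extra hash bit comes into play. The sketch below checks this; the class and method names (RehashIndexDemo, staysOrShifts) are hypothetical.

```java
class RehashIndexDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // True if, after doubling, the hash maps to its old index or to old index + oldCapacity.
    static boolean staysOrShifts(int h, int oldCapacity) {
        int oldIndex = indexFor(h, oldCapacity);
        int newIndex = indexFor(h, oldCapacity * 2);
        return newIndex == oldIndex || newIndex == oldIndex + oldCapacity;
    }

    public static void main(String[] args) {
        // 5 and 21 share index 5 in a 16-slot table but are separated in a 32-slot table.
        for (int h : new int[] {5, 21, 37, 53}) {
            System.out.println(h + ": " + indexFor(h, 16) + " -> " + indexFor(h, 32));
        }
    }
}
```
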

Clearly, expanding a HashMap traverses the entire map, so this operation should be avoided where possible. Setting a reasonable initial capacity and load factor effectively reduces the number of expansions.
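
To get a feel for how much presizing helps, the following sketch counts expansions under a simplified growth model: the capacity is rounded up to a power of 2 and doubled whenever the element count has reached the threshold at insertion time, mirroring the addEntry() logic quoted in this article. The class and method names (ResizeCountDemo, countResizes) are hypothetical, and other JDK versions may use a slightly different trigger condition.

```java
class ResizeCountDemo {
    // Smallest power of two >= n (n is assumed to be small enough not to overflow).
    static int roundUp(int n) {
        int capacity = 1;
        while (capacity < n) {
            capacity <<= 1;
        }
        return capacity;
    }

    // Counts how many times the table would double while inserting distinct keys.
    static int countResizes(int initialCapacity, float loadFactor, int insertions) {
        int capacity = roundUp(initialCapacity);
        int threshold = (int) (capacity * loadFactor);
        int size = 0;
        int resizes = 0;
        for (int n = 0; n < insertions; n++) {
            if (size++ >= threshold) { // same check as in addEntry()
                capacity *= 2;
                threshold = (int) (capacity * loadFactor);
                resizes++;
            }
        }
        return resizes;
    }

    public static void main(String[] args) {
        int expected = 1000;
        // Default sizing: starts at 16 and has to double repeatedly.
        System.out.println(countResizes(16, 0.75f, expected));
        // Presized so that the threshold already exceeds the expected element count.
        int presized = (int) (expected / 0.75f) + 1;
        System.out.println(countResizes(presized, 0.75f, expected));
    }
}
```

In real code the same idea amounts to passing an initial capacity such as (int) (expected / 0.75f) + 1 to the HashMap constructor when the number of elements is known in advance.
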

Reference book: Java Program Performance Optimization
