"HashMap" in-depth principle analysis

Source: Internet
Author: User


equals and "=="

1. Primitive data types (byte, short, char, int, long, float, double, boolean)

Use "==" to compare whether the actual values are equal.


2. Composite (reference) data types

"==" compares the addresses stored in memory.

The initial behavior of equals in Object is to compare addresses in memory, but some library classes (String, Integer, Date, etc.) override it.

Therefore, for composite data types, equals compares memory addresses if it has not been overridden; where it has been overridden, it generally compares more specific values.


Note: the underlying implementation of equals

/** By default this is ==, a direct comparison of object references. */
public boolean equals(Object obj) {
    return (this == obj);
}
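
As a small illustration of the difference between "==" and equals, here is a sketch. The class names EqualityDemo and Box are made up for this example; Box deliberately does not override equals, so it inherits the reference comparison from Object.

```java
class EqualityDemo {
    // Hypothetical class that does NOT override equals,
    // so it inherits Object's reference comparison.
    static class Box {
        final int value;
        Box(int value) { this.value = value; }
    }

    public static void main(String[] args) {
        String s1 = new String("map");
        String s2 = new String("map");
        System.out.println(s1 == s2);       // false: two distinct objects in memory
        System.out.println(s1.equals(s2));  // true: String overrides equals to compare contents

        Box b1 = new Box(1);
        Box b2 = new Box(1);
        System.out.println(b1.equals(b2));  // false: inherited equals compares references
        System.out.println(b1.equals(b1));  // true: same reference
    }
}
```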

An overriding equals must satisfy several conditions:

Reflexivity: for any non-null reference value x, x.equals(x) should return true.

Symmetry: for any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.

Transitivity: for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.

Consistency: for any non-null reference values x and y, multiple calls to x.equals(y) always return true or always return false, provided that the information used in the equals comparison has not been modified.

For any non-null reference value x, x.equals(null) should return false.

When you override equals, you should also override hashCode; for details, refer to this blog post:

http://blog.csdn.net/hejingyuan6/article/details/22398151

Java defines the relationship between the equals method and the hashCode method as follows:

1. If two objects are the same, their hashCode values must be the same.

2. If two objects have the same hashCode, they are not necessarily the same (here "the same" means equal according to the equals method). If you do not follow these requirements, you will find that equal objects can appear more than once in a Set, and the efficiency of adding new elements drops sharply.

3. Two objects that are equal according to equals() must have equal hashCode() values; two objects that are unequal according to equals() do not necessarily have different hashCode() values.
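
The rules above can be sketched with two hypothetical key classes. Point overrides equals and hashCode together; BrokenPoint overrides only equals. Neither class nor the helper sizeAfterAdding comes from the original article; they are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class HashCodeContractDemo {
    // Hypothetical key class that overrides equals and hashCode together.
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;  // also false when o is null
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override public int hashCode() { return Objects.hash(x, y); }
    }

    // Hypothetical key class that overrides equals but forgets hashCode.
    static final class BrokenPoint {
        final int x, y;
        BrokenPoint(int x, int y) { this.x = x; this.y = y; }

        @Override public boolean equals(Object o) {
            if (!(o instanceof BrokenPoint)) return false;
            BrokenPoint p = (BrokenPoint) o;
            return x == p.x && y == p.y;
        }
        // hashCode is deliberately not overridden here.
    }

    // Adds both objects to a HashSet and reports how many entries it ends up with.
    static int sizeAfterAdding(Object a, Object b) {
        Set<Object> set = new HashSet<>();
        set.add(a);
        set.add(b);
        return set.size();
    }

    public static void main(String[] args) {
        // Equal objects with equal hash codes are stored once.
        System.out.println(sizeAfterAdding(new Point(1, 2), new Point(1, 2)));  // 1
        // Equal objects with (almost certainly) different identity hash codes
        // usually end up as two separate entries.
        System.out.println(sizeAfterAdding(new BrokenPoint(1, 2), new BrokenPoint(1, 2)));
    }
}
```

The second call is the situation rule 2 warns about: the set usually holds two entries even though equals considers the objects the same.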

The operations we use most often on HashMap are put(k, v) and get(k). We all know that the keys of a HashMap are unique, so how is that uniqueness guaranteed?

The first idea is to compare keys with equals. That works, but put and get become slower as the number of elements grows: the time complexity is O(n), so with 1000 elements a put may need up to 1000 comparisons. In fact, HashMap rarely calls equals, because it manages all its elements through a hash table, and hashing gives fast access to elements. When we call put, HashMap first calls the key's hashCode method to obtain the hash code, and uses the hash code to quickly find a storage position, which can be called the bucketIndex. From the hashCode contract described above we know that if the hashCode values differ, equals must be false, and if the hashCode values are the same, equals is not necessarily true.
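
How a hash code can be turned into a bucketIndex can be sketched as follows. This is a simplified assumption, not the exact library code: it takes the table length to be a power of two and masks the hash with length - 1, and it leaves out the extra bit-mixing step that the real HashMap applies to the hash code first. The names BucketIndexDemo and bucketIndex are made up for the example.

```java
class BucketIndexDemo {
    // Sketch: map a hash code to a bucket index, assuming tableLength is a power of two.
    // Masking with (tableLength - 1) keeps only the low bits, so the result is
    // always in the range [0, tableLength), even for negative hash codes.
    static int bucketIndex(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    public static void main(String[] args) {
        System.out.println(bucketIndex(Integer.valueOf(17).hashCode(), 16)); // 1
        System.out.println(bucketIndex(Integer.valueOf(33).hashCode(), 16)); // 1 (same bucket as 17)
        System.out.println(bucketIndex(-1, 16));                             // 15
    }
}
```

Note that 17 and 33 have different hash codes yet land in the same bucket, so keys can share a bucket even when their hash codes differ.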

So in theory hash codes can conflict; the technical term is a collision. When a collision occurs, the computed bucketIndex is the same, so HashMap takes the elements already stored at that bucketIndex and finally compares them with equals. equals is the method that runs when hash codes collide, which is why HashMap is said to use equals rarely.

HashMap uses hashCode and equals to determine whether a key is already present. If it is, the old value is replaced with the new value and the old value is returned; if it is not, the new key-value pair <K, V> is stored at the bucketIndex position.
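
The behavior described above can be observed directly with java.util.HashMap. The sketch below relies on the fact that the strings "Aa" and "BB" have the same hashCode (2112), so they collide, yet equals still tells them apart. The class name PutDemo is made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

class PutDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();

        // "Aa" and "BB" are different keys with colliding hash codes.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true

        System.out.println(map.put("Aa", 1)); // null: key not present yet
        System.out.println(map.put("BB", 2)); // null: same hash code, but equals says it is a new key
        System.out.println(map.put("Aa", 3)); // 1: key present, old value returned and replaced
        System.out.println(map.get("Aa"));    // 3
        System.out.println(map.size());       // 2
    }
}
```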

[Figure: flowchart of the put process]



Now we know that after the put method runs, the resulting HashMap storage structure falls into one of these three cases. Case 3 occurs least often, since a hash code collision is a low-probability event. So far, we have learned two things:

HashMap uses the hashCode of its keys to access elements quickly.

When the hashCode values of different objects collide, HashMap resolves the collision with a singly linked list: the new element is added at the head of the list and points to the original element through next. In Java, a singly linked list is implemented with object references (composition).

Two ways to resolve conflicts:

When the system decides where to store a key-value pair in the HashMap, it does not consider the value in the Entry at all; it computes and determines the storage location of each Entry from the key alone. In other words, we can treat the value as an attachment of the key: once the system has determined where the key is stored, the value is stored there with it.

Experiment:

I deliberately modified the HashMap code to construct hash conflicts: the initial size of HashMap is 16, but I put more than 16 elements into it and disabled its resize() method so that it could not expand. At this point the structure of HashMap's underlying Entry[] table array is as follows:

[Figure: structure of the Entry[] table]




Singly linked lists have appeared inside the HashMap buckets. The problem a hash table has to solve is conflicting hash values, and there are usually two methods: the linked list method and the open addressing method.

The linked list method organizes objects with the same hash value into a linked list placed in the slot corresponding to that hash value.

The open addressing method is a probing algorithm: when a slot is already occupied, it keeps looking for the next slot that can be used.

java.util.HashMap uses the linked list method, and the list is singly linked.
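
For contrast with the linked list method, the open addressing method can be sketched with linear probing. This is not how java.util.HashMap works; LinearProbingSet is a hypothetical, fixed-size set of int keys written only to illustrate "keep looking for the next free slot". It does not support removal or resizing.

```java
class LinearProbingSet {
    private final Integer[] slots;  // null means the slot is empty

    LinearProbingSet(int capacity) {
        slots = new Integer[capacity];
    }

    // Returns the slot index where the key is stored, or -1 if the table is full.
    int add(int key) {
        int start = Math.floorMod(Integer.hashCode(key), slots.length);
        for (int i = 0; i < slots.length; i++) {
            int idx = (start + i) % slots.length;   // probe the next slot, wrapping around
            if (slots[idx] == null || slots[idx] == key) {
                slots[idx] = key;
                return idx;
            }
        }
        return -1;
    }

    boolean contains(int key) {
        int start = Math.floorMod(Integer.hashCode(key), slots.length);
        for (int i = 0; i < slots.length; i++) {
            int idx = (start + i) % slots.length;
            if (slots[idx] == null) return false;   // an empty slot ends the probe sequence
            if (slots[idx] == key) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        LinearProbingSet set = new LinearProbingSet(8);
        System.out.println(set.add(1));        // 1
        System.out.println(set.add(9));        // 2: slot 1 is taken, so the next slot is used
        System.out.println(set.add(17));       // 3
        System.out.println(set.contains(9));   // true
        System.out.println(set.contains(25));  // false
    }
}
```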

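The core code for forming the singly linked list (in the form found in older JDK releases) is as follows:

void addEntry(int hash, K key, V value, int bucketIndex) {
    // Remember the current head of the bucket (null if the bucket is empty).
    Entry<K,V> e = table[bucketIndex];
    // The new entry becomes the head and points to the old head through next.
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    // Double the table once the number of entries reaches the threshold.
    if (size++ >= threshold)
        resize(2 * table.length);
}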

The code of the method above is simple, but it contains a design decision: the system always places the newly added Entry object at the bucketIndex index of the table array. If an Entry object is already at that index, the newly added Entry points to the original one (producing an Entry chain). If there is no Entry at that index, that is, the variable e in the code above is null, the newly added Entry points to null and no Entry chain is produced.


When there are no hash conflicts and therefore no linked lists, HashMap finds elements quickly: the get() method can locate the element directly. Once linked lists appear, a single bucket holds not one Entry but an Entry chain, and the system must traverse the entries in order until it finds the one it is looking for. If that Entry happens to be at the very end of the chain (it was the first one placed in the bucket), the system must loop all the way to the end to find it.
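
To make head insertion and chain traversal concrete, here is a minimal sketch of a chained hash map. ChainedMap is a hypothetical teaching class, not the JDK implementation: it assumes non-null keys, uses a fixed table of 16 buckets, and never resizes. The helper chainLength exists only so the chain can be inspected.

```java
class ChainedMap<K, V> {
    // One node of a bucket's singly linked list.
    static final class Entry<K, V> {
        final K key;
        V value;
        final Entry<K, V> next;
        Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16];

    private int indexFor(Object key) {
        return key.hashCode() & (table.length - 1);
    }

    // Replaces the value if the key exists (returning the old value);
    // otherwise inserts a new entry at the head of the bucket's chain.
    V put(K key, V value) {
        int i = indexFor(key);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        table[i] = new Entry<>(key, value, table[i]);  // new head points to the old head
        return null;
    }

    // Walks the chain in order until equals finds the key.
    V get(Object key) {
        for (Entry<K, V> e = table[indexFor(key)]; e != null; e = e.next) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    // Number of entries in the bucket that the key maps to.
    int chainLength(Object key) {
        int n = 0;
        for (Entry<K, V> e = table[indexFor(key)]; e != null; e = e.next) n++;
        return n;
    }

    public static void main(String[] args) {
        ChainedMap<String, Integer> map = new ChainedMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);  // same hash code as "Aa", so it joins the same chain
        System.out.println(map.chainLength("Aa")); // 2
        System.out.println(map.get("Aa"));         // 1 (found at the end of the chain)
    }
}
```

Because "BB" was inserted last, it sits at the head of the chain, and looking up "Aa" has to walk past it first.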

Description:

HashMap has two parameters that affect its performance:

Initial capacity and load factor. The default initial capacity is 16 and the default load factor is 0.75, which is a tradeoff between time and space costs. Capacity is the number of buckets in the hash table (the length of the Entry array), and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table may become before its capacity is increased automatically. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the capacity is doubled by a rehash operation.

A high load factor reduces the space overhead but increases the lookup cost. (The load factor is the degree to which the hash table is filled with elements. The larger the load factor, the more elements are filled in: space utilization is high, but the chance of conflict increases. Conversely, the smaller the load factor, the fewer elements are filled in: the chance of conflict is reduced, but more space is wasted.) When setting the initial capacity, take into account the number of entries the map needs to hold and its load factor, in order to minimize the number of rehash operations. No rehash occurs if the initial capacity is greater than the maximum number of entries divided by the load factor (that is, if the maximum number of entries is less than initial capacity * load factor).

As a HashMap stores more and more elements and reaches the threshold, the Entry array has to be expanded; this automatic expansion is one of the most attractive features of the Java collection classes. When HashMap expands, the capacity of the new array is twice the original. Because the capacity changes, the bucketIndex of every existing element has to be recalculated and the element stored in the new array; this is known as rehash. With the default initial capacity of 16 and load factor of 0.75, at most 16 * 0.75 = 12 elements can be stored, and putting the 13th triggers a rehash. The series of steps in a rehash noticeably affects performance, so when we need to store many elements in a HashMap, it is best to specify an appropriate initial capacity and load factor; otherwise the HashMap holds only 12 elements before its first rehash, and multiple rehash operations will occur.
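
The arithmetic above can be sketched as two small helpers. The names CapacityDemo, threshold, and initialCapacityFor are made up for this example; only the HashMap(int initialCapacity, float loadFactor) constructor is part of the real API. Keep in mind that HashMap rounds the requested capacity up to a power of two internally, so the value computed here is a lower bound on the actual table size.

```java
import java.util.HashMap;
import java.util.Map;

class CapacityDemo {
    // Number of entries a table can hold before it is resized.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Smallest capacity whose threshold covers the expected number of entries.
    static int initialCapacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f));             // 12
        System.out.println(initialCapacityFor(1000, 0.75f));  // 1334

        // Pre-size the map so that storing 1000 entries should not trigger a rehash.
        Map<Integer, String> map = new HashMap<>(initialCapacityFor(1000, 0.75f), 0.75f);
        for (int i = 0; i < 1000; i++) {
            map.put(i, "v" + i);
        }
        System.out.println(map.size());                       // 1000
    }
}
```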


Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
