The Java hashcode you don't know

Last Update:2017-12-10 Source: Internet

Author: User

Tags array length modulus


We hear about hashCode all the time, which is why this article is about it. But do you really know hashCode? Where is it used? How should it be written?

I believe this article will show you a side of hashCode you have not seen before.

The purpose of hashCode is to let you use one object to find another. For data structures that rely on hashing, such as HashSet, HashMap, LinkedHashSet, and LinkedHashMap, keys will not be handled correctly unless the key class overrides hashCode() and equals() properly.

Override the hashCode() method of Person as in the following code and see what happens:

    // Override hashCode in Person
    @Override
    public int hashCode() {
        return age;
    }

    @Test
    public void testHashCode() {
        Set<Person> people = new HashSet<Person>();
        Person person = null;
        for (int i = 0; i < 3; i++) {
            person = new Person("name-" + i, i);
            people.add(person);
        }
        // Mutate the field that hashCode depends on after the object was stored
        person.age = 100;
        System.out.println(people.contains(person));
        people.add(person);
        System.out.println(people.size());
    }

The result is not the expected true and 3 but false and 4! After person.age is changed, the hash code of the object no longer matches the one it was stored under, so the HashSet cannot find the person object. Clearly, overriding hashCode affects how HashSet stores and looks up elements.

So how does hashCode affect storage and lookup in a HashSet? What kind of impact does it have?

HashSet is implemented internally with a HashMap: every element placed in a HashSet is stored as a key of that HashMap. HashMap stores its entries in a hash table, which is an array plus linked lists plus red-black trees (the red-black trees were added in JDK 1.8).

The storage structure is shown in the following diagram:

Figure: HashMap storage structure diagram

The default length of the array is 16, and each element in the array stores the head node of a linked list. The node structure that makes up the list is as follows:

    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;
        ...
    }

Each node holds a hash, which is derived from the hashCode of the key object. If the keys were stored in no particular order, a lookup would have to compare the key against every array element with equals(). The time complexity would be O(n), and the longer the array, the lower the efficiency.

So the bottleneck is the speed of key lookup. How can a key be located at its storage position quickly?

HashMap maps the hash value of the key to an array subscript: a value computed from the key object by a hash function is used as the subscript, so the storage location can be found quickly from the key. If the hash function is well designed and each array position holds few entries, the required element can be found in O(1) without traversing a long linked list. This greatly improves lookup speed.

So how does HashMap get the array subscript from the hashCode? The process can be split into the following steps:

Step one: h = key.hashCode()
Step two: hash = h ^ (h >>> 16)
Step three: index = (length - 1) & hash

Analysis

The first step is to get the hashcode value of the key;

The second step XORs the high 16 bits of the key's hashCode into its low 16 bits (a perturbation of the hash). This way, even when the table array is fairly short, both the high and the low bits take part in computing the index, and the overhead is small.

The third step reduces the hash value modulo the array length so that the elements are distributed fairly evenly. Because the length is always a power of two, h & (length - 1) is equivalent to taking h modulo length, and replacing the modulo with a bitwise AND is faster.
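
The three steps can be sketched in a few lines of Java. This is only an illustration of the arithmetic described above, not the HashMap source itself; the class name HashIndexSketch and the method names spread and indexFor are hypothetical.

```java
public class HashIndexSketch {

    // Steps one and two: take the key's hashCode and XOR the high 16 bits
    // into the low 16 bits. A null key is assumed to hash to 0.
    public static int spread(Object key) {
        int h = (key == null) ? 0 : key.hashCode();
        return h ^ (h >>> 16);
    }

    // Step three: map the hash to an array subscript.
    // Assumes length is a power of two.
    public static int indexFor(int hash, int length) {
        return (length - 1) & hash;
    }

    public static void main(String[] args) {
        int length = 16;
        for (String key : new String[] {"name-0", "name-1", "name-2"}) {
            int hash = spread(key);
            int index = indexFor(hash, length);
            // For a power-of-two length, the AND gives the same result as floorMod.
            System.out.println(key + " -> index " + index
                    + ", floorMod " + Math.floorMod(hash, length));
        }
    }
}
```

The Integer key 65536 shows why step two matters: its low 16 bits are all zero, so without the XOR it would land in subscript 0 of a 16-slot table, while with the XOR the high bit moves it to subscript 1.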

However, the default initial capacity of the HashMap array is 16. When the array length is much smaller than the number of keys, different keys may produce the same array subscript. That is a hash conflict!

There are several ways to resolve hash conflicts, such as the open addressing method, the chain address method, and the public overflow area method.

The open addressing method looks for the next empty hash address whenever a conflict occurs. The procedure can be described by the following formula:

f_i(key) = (f(key) + d_i) mod m    (d_i = 1, 2, 3, ..., m-1)

For example, take the key set {12, 67, 56, 16, 25, 37, 22, 29, 15, 47, 48, 34}, a table of length m = 12, and f(key) = key mod 12.

The first five keys do not conflict and are stored directly, as shown in the table:

Array subscript	Key
0	12
1	25
2
3
4	16
5
6
7	67
8	56
9
10
11

When key = 37, f(37) = 1, and that position is already occupied by 25. Applying the formula gives f_1(37) = (f(37) + 1) mod 12 = 2, so 37 is stored at array subscript 2, as shown in the table:

Array subscript	Key
0	12
1	25
2	37
3
4	16
5
6
7	67
8	56
9
10
11

The next keys, 22, 29, 15, and 47, are stored without conflict. When key = 48, f(48) = 0, which conflicts with 12 at subscript 0. Probing onward, there is no free slot until f_6(48) = (f(48) + 6) mod 12 = 6, as shown in the table:

Array subscript	Key
0	12
1	25
2	37
3	15
4	16
5	29
6	48
7	67
8	56
9
10	22
11	47

So 48, which does not even share a hash address with 37, still ends up competing with it for slots while the conflict is being resolved. This is accumulation (clustering), and it greatly reduces the efficiency of both lookups and insertions.
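
The worked example can be replayed with a short Java sketch. It is a minimal illustration of open addressing with d_i = 1, 2, ..., m-1 (linear probing), assuming non-negative integer keys; the class name LinearProbingSketch and the method insertAll are hypothetical.

```java
import java.util.Arrays;

public class LinearProbingSketch {

    // Inserts each key with f(key) = key mod m and probes the next slots
    // (d = 1, 2, ..., m-1) on conflict. A null entry marks an empty slot.
    // Keys are assumed to be non-negative; a key is dropped if the table is full.
    public static Integer[] insertAll(int[] keys, int m) {
        Integer[] table = new Integer[m];
        for (int key : keys) {
            int home = key % m;
            for (int d = 0; d < m; d++) {
                int slot = (home + d) % m;
                if (table[slot] == null) {
                    table[slot] = key;
                    break;
                }
            }
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {12, 67, 56, 16, 25, 37, 22, 29, 15, 47, 48, 34};
        // Prints the table in subscript order, 0 through 11.
        System.out.println(Arrays.toString(insertAll(keys, 12)));
    }
}
```

In this run 37 lands in subscript 2 and 48 in subscript 6, as in the text. The last key, 34, has to probe all the way around the table to subscript 9, which shows how costly accumulation becomes as the table fills up.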

The chain address method resolves conflicts as follows: with hash addresses in the range [0, m-1], it sets up a one-dimensional array Array[m] of m pointers, and every record whose hash address is i is inserted into the linked list whose head pointer is Array[i].

The basic idea is to create a singly linked list for each hash value and insert the record into that list when a conflict occurs.

Figure: Chain address method

The benefits of using linked lists are:

The remove operation is efficient: only pointers need to be updated, and no elements have to be shifted.
When re-hashing, elements that were in the same slot may be scattered to different places. An array would have to shift its elements, whereas a linked list only needs its pointers updated.
However, this also brings a performance cost, because a singly linked list must be traversed.
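
A minimal Java sketch of the chain address method is shown below. It is an illustration under simple assumptions (integer keys, a fixed number of slots, no resizing), not the HashMap implementation; the class ChainingSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ChainingSketch {
    // One linked list (chain) per hash address 0 .. m-1.
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();

    public ChainingSketch(int m) {
        for (int i = 0; i < m; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    private LinkedList<Integer> bucketFor(int key) {
        return buckets.get(Math.floorMod(key, buckets.size()));
    }

    // Adds the key to the head of its chain unless it is already present.
    public void add(int key) {
        LinkedList<Integer> chain = bucketFor(key);
        if (!chain.contains(key)) {
            chain.addFirst(key);
        }
    }

    // Lookup has to traverse the chain of the key's slot.
    public boolean contains(int key) {
        return bucketFor(key).contains(key);
    }

    // Removal only unlinks a node; no other element is moved.
    public boolean remove(int key) {
        return bucketFor(key).remove(Integer.valueOf(key));
    }

    public int chainLength(int slot) {
        return buckets.get(slot).size();
    }

    public static void main(String[] args) {
        ChainingSketch table = new ChainingSketch(12);
        for (int key : new int[] {12, 67, 56, 16, 25, 37, 22, 29, 15, 47, 48, 34}) {
            table.add(key);
        }
        System.out.println("chain length at slot 1: " + table.chainLength(1));
        System.out.println("contains 37: " + table.contains(37));
    }
}
```

With the same twelve keys as before, the conflicting keys simply share a chain (25 and 37 in slot 1, 12 and 48 in slot 0, 22 and 34 in slot 10), so a conflict in one slot never pushes a key into another slot.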

The public overflow area method sets up a public overflow area in which all conflicting keys are stored separately.

For example, {37, 48, 34} are the keys that conflicted in the previous example, so they are stored in the overflow table.

Figure: Public overflow area method

On lookup, the key is first compared with the entry at its position in the base table. If they are equal, the lookup succeeds; if not, the overflow table is searched sequentially. The public overflow area method suits situations where there are few conflicts.
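
The following Java sketch illustrates the idea with a base table and a sequentially searched overflow list. It assumes integer keys and f(key) = key mod m; the class OverflowAreaSketch and its methods are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class OverflowAreaSketch {
    private final Integer[] base;                               // base table, null = empty
    private final List<Integer> overflow = new ArrayList<>();   // public overflow area

    public OverflowAreaSketch(int m) {
        base = new Integer[m];
    }

    // Stores the key in the base table if its slot is free,
    // otherwise appends it to the overflow area.
    public void add(int key) {
        int slot = Math.floorMod(key, base.length);
        if (base[slot] == null) {
            base[slot] = key;
        } else if (base[slot] != key && !overflow.contains(key)) {
            overflow.add(key);
        }
    }

    // Checks the base table first, then searches the overflow area in order.
    public boolean contains(int key) {
        Integer inBase = base[Math.floorMod(key, base.length)];
        if (inBase != null && inBase == key) {
            return true;
        }
        return overflow.contains(key);
    }

    public List<Integer> overflowKeys() {
        return overflow;
    }

    public static void main(String[] args) {
        OverflowAreaSketch table = new OverflowAreaSketch(12);
        for (int key : new int[] {12, 67, 56, 16, 25, 37, 22, 29, 15, 47, 48, 34}) {
            table.add(key);
        }
        System.out.println("overflow area: " + table.overflowKeys());
    }
}
```

For the twelve example keys the overflow area ends up holding 37, 48, and 34, in that order. Because the overflow area is scanned linearly, the approach only pays off when it stays short.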

HashMap resolves conflicts with the chain address method. The overall flow (ignoring resizing for now) is shown in the following diagram:

Figure: HashMap storage flow diagram

Now that we understand hashCode and how hash conflicts are resolved, how should we design our own hashCode() method?

The book Effective Java gives the following guidance for overriding hashCode():

Field type	Calculation
boolean	`c = (f ? 0 : 1)`
byte, char, short, int	`c = (int) f`
long	`c = (int) (f ^ (f >>> 32))`
float	`c = Float.floatToIntBits(f)`
double	`long l = Double.doubleToLongBits(f)`
	`c = (int) (l ^ (l >>> 32))`
Object	`c = f.hashCode()`
Array	`apply the above rules for each element`

The per-field hash codes are then combined with result = 37 * result + c.
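
As a rough illustration, the rules in the table could be applied to a small class like the one below. The class HashRecipeSketch and its fields are hypothetical, and the starting value 17 is an assumption (any nonzero constant would serve); only the per-field calculations and the 37 * result + c step come from the guidance above.

```java
import java.util.Objects;

public final class HashRecipeSketch {
    private final String name;     // Object field
    private final int age;         // int field
    private final long id;         // long field (hypothetical, for illustration)
    private final boolean active;  // boolean field (hypothetical, for illustration)

    public HashRecipeSketch(String name, int age, long id, boolean active) {
        this.name = name;
        this.age = age;
        this.id = id;
        this.active = active;
    }

    // equals compares exactly the fields that hashCode uses.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof HashRecipeSketch)) return false;
        HashRecipeSketch other = (HashRecipeSketch) o;
        return age == other.age
                && id == other.id
                && active == other.active
                && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        int result = 17;                                              // assumed nonzero starting value
        result = 37 * result + (name == null ? 0 : name.hashCode()); // Object
        result = 37 * result + age;                                   // int
        result = 37 * result + (int) (id ^ (id >>> 32));              // long
        result = 37 * result + (active ? 0 : 1);                      // boolean
        return result;
    }

    public static void main(String[] args) {
        HashRecipeSketch a = new HashRecipeSketch("name-1", 1, 42L, true);
        HashRecipeSketch b = new HashRecipeSketch("name-1", 1, 42L, true);
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    }
}
```

All fields are final here, so the hash code of an instance cannot change after it has been placed in a HashSet or used as a HashMap key.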

Modern IDEs can generate the hashCode method automatically from the right-click context menu. For example, the hashCode generated by IntelliJ IDEA looks like this:

    @Override
    public int hashCode() {
        ...
        return result;
    }

However, in enterprise code it is best to use a third-party library such as Apache Commons to generate the hashCode method. The advantage of a third-party library is that its code has been tried and verified many times. The following code shows how to use the Apache Commons HashCodeBuilder to build the hashCode of a custom class.

    public int hashCode() {
        HashCodeBuilder builder = new HashCodeBuilder();
        builder.append(mostSignificantMemberVariable);
        ........................
        builder.append(leastSignificantMemberVariable);
        return builder.toHashCode();
    }

As the code shows, the most significant member variable should be passed first, followed by the less significant member variables.

Summary

From the analysis above, when designing hashCode() we should keep the following in mind:

Whenever hashCode() is called on the same object, it should produce the same value.
hashCode() should be based on meaningful identifying information within the object.
A good hashCode() should produce uniformly distributed hash values.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
