What is the difference between equals() and hashCode()?

Source: Internet
Author: User

equals(): reflects the specific value of an object or variable, that is, it compares the values contained in two objects, which may be references to objects or values of a value type.

hashCode(): computes and returns the hash code of the object instance; it is also known as a hash function. The hashCode() method of the root class Object is typically computed from the ID (memory address) of the object instance, so different objects usually get different hash codes, although uniqueness is not guaranteed. Of course, when the object's class overrides hashCode(), the result can be quite different.

The hashCode method exists because, when large numbers of objects are compared, comparing hash codes is faster than calling equals. Many collections rely on hashCode, for example Hashtable.

If two objects are equal according to equals(), their hashCode() values must be equal.
If two objects have equal hashCode() values, they are not necessarily equal according to equals() (hash values can collide, although the probability is low).
So:
You can consider that, in a collection, the rule that determines whether two objects are equal is:
Step 1: if the hashCode() values are equal, go to step 2; otherwise the objects are not equal.
Step 2: check equals(); if it returns true the two objects are equal, otherwise they are not.
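
The two-step rule above can be sketched in Java roughly as follows. The class TwoStepCheck and the helper sameForCollection are hypothetical names used only for illustration; real collections such as HashMap apply the same idea internally, with more detail.

```java
class TwoStepCheck {
    // Hypothetical helper mirroring the two-step rule: compare hash codes first,
    // and only call equals() when the hash codes match.
    static boolean sameForCollection(Object a, Object b) {
        if (a.hashCode() != b.hashCode()) {
            return false;      // step 1: different hash codes, so not equal
        }
        return a.equals(b);    // step 2: same hash code, let equals() decide
    }

    public static void main(String[] args) {
        // Equal strings: same hash code and equals() is true.
        System.out.println(sameForCollection("abc", "abc"));
        // "Aa" and "BB" share a hash code but are not equal (a hash conflict).
        System.out.println(sameForCollection("Aa", "BB"));
    }
}
```

The second call shows why step 2 is needed: equal hash codes alone do not prove that two objects are equal.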

1. First, equals() and hashCode() are both inherited from the Object class.

The equals() method of Object compares the address values of two objects (that is, whether the two references point to the same object).

The hashCode() method of Object is a native method, and its implementation depends on the JVM running on the local machine.

2. The Java language places the following requirements on equals(), and they must be followed:

A. Symmetry: if x.equals(y) returns "true", then y.equals(x) should also return "true".

B. Reflexivity: x.equals(x) must return "true".

C. Transitivity: if x.equals(y) returns "true" and y.equals(z) returns "true", then x.equals(z) should also return "true".

D. Consistency: if x.equals(y) returns "true", then as long as the contents of x and y remain unchanged, x.equals(y) returns "true" no matter how many times it is called.

In any case, x.equals(null) always returns "false", and x.equals(an object of a different type from x) always returns "false".
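
As a minimal sketch of an equals() that follows these requirements, consider the small value class below. The class Point and its fields are assumptions made up for this example; they do not come from the JDK.

```java
import java.util.Objects;

// Hypothetical value class used only to illustrate the requirements above.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;               // reflexivity
        if (!(obj instanceof Point)) return false;  // null or a different type
        Point other = (Point) obj;
        return x == other.x && y == other.y;        // compare contents, not addresses
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);                  // equal points get equal hash codes
    }
}

class EqualsContractDemo {
    public static void main(String[] args) {
        Point a = new Point(1, 2), b = new Point(1, 2), c = new Point(1, 2);
        System.out.println(a.equals(a));                                // reflexivity
        System.out.println(a.equals(b) == b.equals(a));                 // symmetry
        System.out.println(a.equals(b) && b.equals(c) && a.equals(c));  // transitivity
        System.out.println(a.equals(null));                             // always false
    }
}
```

Because the comparison uses only final fields, repeated calls give the same answer, which covers the consistency requirement.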

3. If two objects are equal according to equals(), their hashCode() values must be equal.

Conversely: if the hashCode() values differ, it follows that equals() is also false;
if the hashCode() values are equal, equals() may be true or false.


Why use the hashCode method?

----------------------------------------------

A useful way to picture it, starting from java.lang.Object: imagine that every newly created object is placed into a hash table, so that the next time the object is compared or fetched it can be found in the hash table by its hash code. (The JVM does not literally do this for every object; it is what hash-based collections do with the objects stored in them.) The purpose is to make looking up objects more efficient. The process is as follows:

1. On new Object(), the object is placed in the hash table under the key given by its hashCode value. If different objects produce the same hash value, that is, their hash keys conflict, then a singly linked list is kept at that hash key and all objects with that hashCode are strung together on it.

2. To compare two objects, first use their hashCode values to locate them in the hash table. When the two objects have the same hashCode, they sit under the same key in the hash table and so must be on the linked list at that key. At that point the only way to decide whether they are equal is the equals method. When the hashCode values of two objects differ, they certainly cannot be equal.

All this theory may have left you confused; I was too after first reading it. Here is an example to illustrate the details.

A List may contain duplicates, but a Set may not. So when a Set stores data, how does it determine whether the data is already present? Does it use the equals() method or the hashCode() method?

If it used only equals(), each new element would have to be compared with every existing element: with 100 elements already stored, saving the 101st would call equals 100 times.

With hashCode(), the set uses a hash algorithm to store the data instead.

Each time an element is saved, hashCode() is called to obtain a hash code, which determines the storage position. If there is no data at that position, the element is stored directly; otherwise equals() is called against what is already there, and the element is not saved if an equal one is found. The whole store therefore rarely needs to call equals. hashCode is called once per element, but this is far more efficient than the previous approach.
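
One way to observe this is to count the calls while filling a HashSet. The sketch below assumes a made-up key class, CountingKey, whose only purpose is to count how often equals() and hashCode() are invoked; the exact counts depend on the JDK's HashSet implementation, so none are claimed here.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical key class that counts calls to equals() and hashCode().
class CountingKey {
    static int equalsCalls = 0;
    static int hashCodeCalls = 0;
    private final int id;

    CountingKey(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object obj) {
        equalsCalls++;
        return obj instanceof CountingKey && ((CountingKey) obj).id == id;
    }

    @Override
    public int hashCode() {
        hashCodeCalls++;
        return id;   // distinct ids give distinct hash codes
    }
}

class SetLookupDemo {
    // Builds a set holding n distinct keys.
    static Set<CountingKey> fill(int n) {
        Set<CountingKey> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            set.add(new CountingKey(i));
        }
        return set;
    }

    public static void main(String[] args) {
        Set<CountingKey> set = fill(100);
        set.add(new CountingKey(100));   // the 101st, distinct element
        System.out.println("hashCode calls: " + CountingKey.hashCodeCalls);
        System.out.println("equals calls:   " + CountingKey.equalsCalls);
    }
}
```

Running it should show that storing the 101st element needs nowhere near the 100 equals() calls a linear comparison would take.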

Why would you want to override the equals method?

-------------------------------------------

Because the equals method of Object by default compares the references of two objects, they are equal only if they point to the same memory, that is, the addresses are the same. If you need to decide equality by the values inside the objects, you must override the equals method.

At this point I expect many readers have doubts. I believe everyone has been tangled up for a while over the equals() method of String and "==", until learning that the equals method of String compares values while == compares addresses.

So how can equals be a comparison of addresses?

The answer is that in the JDK, String and the wrapper classes such as Integer and Double have overridden the equals() method of Object.

Let's take a look at the source code of the equals method in Object:

    public boolean equals(Object obj) {
        return (this == obj);
    }

We all know that every object has identity (a memory address) and state (data), and that "==" compares the memory addresses of two objects. So the equals() method of Object compares the memory addresses of two objects: if object1.equals(object2) is true, then object1 and object2 actually refer to the same object. Although the equals() method of Object sometimes meets our basic needs, most of the time we want to compare the contents of two objects, and then the equals() of Object will not do. That is why String, Double, Integer and similar classes override the equals() method to compare contents. I hope this clears up the confusion.
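
The difference between "==" and an overridden equals() can be seen in a few lines. This is only an illustrative sketch; the class name StringEqualityDemo is made up.

```java
class StringEqualityDemo {
    public static void main(String[] args) {
        // Two distinct String objects with the same characters.
        String s1 = new String("hello");
        String s2 = new String("hello");
        System.out.println(s1 == s2);       // false: different objects in memory
        System.out.println(s1.equals(s2));  // true: String.equals compares contents

        // Object does not override equals(), so it falls back to "==".
        Object o1 = new Object();
        Object o2 = new Object();
        System.out.println(o1.equals(o2));  // false: different objects
        System.out.println(o1.equals(o1));  // true: same object
    }
}
```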

Always override hashCode when you override equals

-------------------------------------------

The contract for hashCode in java.lang.Object:
1. During one execution of an application, as long as the information used by an object's equals method is not modified, calling hashCode on that object multiple times must consistently return the same integer.
2. If two objects are equal according to the equals(Object o) method, calling hashCode on each of the two objects must produce the same integer result.
3. If two objects are not equal according to the equals(Object o) method, calling hashCode on each of them is not required to produce different integer results. However, producing different results for unequal objects may improve the performance of hash tables.

Given the previous section, this is easy to explain. For example, String overrides equals to compare contents rather than memory addresses, so two strings being equal under equals does not mean their memory addresses are the same. By the default definition of hashCode, objects at different memory addresses may have different hashCode values if hashCode is not overridden as well. That would violate the second rule of the contract.

As another example, suppose two objects created with separate new expressions are equal in value, so equals returns true, but their hashCode values differ. Storing both in a HashSet leaves the set containing two equal objects, because the hashCode value is checked first and equals is called only afterwards.
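
The HashSet problem described above can be reproduced with two small classes. BrokenName and FixedName are hypothetical names invented for this sketch: the first overrides only equals(), the second overrides both methods consistently.

```java
import java.util.HashSet;
import java.util.Set;

// Overrides equals() but inherits hashCode() from Object: violates rule 2.
class BrokenName {
    final String value;

    BrokenName(String value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof BrokenName && ((BrokenName) o).value.equals(value);
    }
}

// Overrides both methods, so equal objects also have equal hash codes.
class FixedName {
    final String value;

    FixedName(String value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof FixedName && ((FixedName) o).value.equals(value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }
}

class HashSetContractDemo {
    public static void main(String[] args) {
        Set<BrokenName> broken = new HashSet<>();
        broken.add(new BrokenName("a"));
        broken.add(new BrokenName("a"));
        // Usually reports two elements, since the hash codes almost never match.
        System.out.println(broken.size());

        Set<FixedName> fixed = new HashSet<>();
        fixed.add(new FixedName("a"));
        fixed.add(new FixedName("a"));
        System.out.println(fixed.size());   // 1: the duplicate is detected
    }
}
```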

Introduction to using the hashCode method

------------------------------------------------

Hash table data structure basics:
1. A hash table is based on an array.
2. Disadvantage: because it is array-based, it is hard to expand once the array has been created, and performance degrades severely when the hash table is nearly full.
3. There is no easy way to traverse the data items in the table in any particular order.
If you do not need to traverse the data in order and can estimate the size of the data in advance, the hash table is unmatched in speed and ease of use.

First, why hashCode matters so much to objects:
An object's hashCode is a simple hash algorithm; compared with truly complex hash algorithms it hardly counts as a real algorithm. Yet how you implement it is not just a question of programming skill: it has a major effect on the performance of storing and retrieving your objects. Different hashCode implementations can make access to your objects differ in performance by a factor of hundreds.
Let's look at two important data structures in Java: HashMap and Hashtable. They differ in several ways, such as their inheritance hierarchy, their constraints on values (whether null is allowed), and thread safety, but their implementation principle is the same. So we use only Hashtable to illustrate:
In Java the fastest way to access data is of course an array, but for a somewhat larger container Hashtable gives faster lookups than an array. The specific reasons follow.
When storing data, Hashtable generally ANDs the object's hashCode with 0x7FFFFFFF; because an object's hashCode can be negative, this ensures a non-negative integer. The result is then taken modulo the length of the Hashtable to get the object's index in the Hashtable.
index = (o.hashCode() & 0x7FFFFFFF) % hs.length;
The object is placed directly at that index position in the Hashtable. For writing, this is the same as an array: an object is put at an index position. For a query, however, Hashtable can obtain the object directly from the index computed by the same algorithm, whereas an array has to loop and compare. So for somewhat larger amounts of data, Hashtable queries outperform arrays.
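
The index formula can be tried out on its own. In the sketch below, indexFor and the table length 11 are assumptions chosen for illustration; the example relies on Integer.hashCode() returning the int value itself.

```java
class BucketIndexDemo {
    // Mirrors the formula above: clear the sign bit, then take the remainder.
    static int indexFor(Object o, int tableLength) {
        return (o.hashCode() & 0x7FFFFFFF) % tableLength;
    }

    public static void main(String[] args) {
        Integer negative = -7;   // its hashCode() is -7
        // A plain remainder keeps the sign and would be an invalid array index.
        System.out.println(negative.hashCode() % 11);
        // Masking with 0x7FFFFFFF first keeps the index within 0..10.
        System.out.println(indexFor(negative, 11));
    }
}
```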
Since an object can be located in the Hashtable directly by its hashCode, why does Hashtable map by key? This brings up the most important issue for Hashtable performance: hash conflicts.
The common hash conflict is that different objects end up with the same index. A much rarer one: if the number of objects exceeds the range of int while hashCode is limited to the range of int, some distinct elements must share the same hashCode and therefore always share the same index. This extreme case is very rare and can be ignored for now. More commonly, different hashCode values produce the same index after the modulo, or different objects have the same hashCode and therefore the same index.
So when several objects share an index, multiple values are stored at that index position, and they can only be told apart correctly by key.
In fact, a well-designed Hashtable spreads its elements fairly evenly, because its length grows automatically so that it always exceeds the actual number of elements by a certain proportion (the load factor is generally 0.75). Most index positions therefore hold a single object, and only a few positions hold several. Each position in the Hashtable is a linked list. A position holding only one object has a list with a single first node (Entry) whose next is null. The Entry has hashCode, key and value fields that hold the hashCode, key and value (the object itself) stored at that position. If another object with the same index arrives, it becomes the next node in the list. When several objects share an index, the one matching the queried key is found in the linked list by hashCode and key.
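
The linked-list layout described here can be sketched as a tiny hash table. TinyHashTable is a made-up teaching class, not the JDK's Hashtable: it has a fixed capacity, never resizes, and does not accept null keys.

```java
// A minimal chained hash table, assuming non-null keys and a fixed capacity.
class TinyHashTable<K, V> {
    // One node of the singly linked list kept at each index position.
    private static final class Entry<K, V> {
        final int hash;
        final K key;
        V value;
        Entry<K, V> next;

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Entry<K, V>[] table;

    @SuppressWarnings("unchecked")
    TinyHashTable(int capacity) {
        table = (Entry<K, V>[]) new Entry[capacity];
    }

    private int indexFor(int hash) {
        return (hash & 0x7FFFFFFF) % table.length;
    }

    // Stores the value and returns the previous value for the key, or null.
    V put(K key, V value) {
        int hash = key.hashCode();
        int index = indexFor(hash);
        for (Entry<K, V> e = table[index]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {   // hashCode first, then key
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        // No matching key: link a new node at the head of this position's list.
        table[index] = new Entry<>(hash, key, value, table[index]);
        return null;
    }

    // Walks the list at the key's index and returns the matching value, or null.
    V get(Object key) {
        int hash = key.hashCode();
        for (Entry<K, V> e = table[indexFor(hash)]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }
}
```

For example, the strings "Aa" and "BB" have the same hashCode, so they always land at the same index; put() chains them in one list and get() tells them apart by key.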
From the above we can see what most affects the access performance of HashMap and Hashtable: first, the elements in the data structure should have hashCode values that differ as much as possible. This does not guarantee that different hashCode values produce different indexes, but identical hashCode values always produce the same index, which leads to hash conflicts.
For an object with many attributes, having every attribute take part in hashing is clearly a clumsy design. An object's hashCode() method is called automatically in many places, for example on every lookup in a hash-based collection, so if too many attributes take part in hashing, the constant time per operation increases greatly. Choosing which attributes take part in hashing is therefore very much a matter of programming skill.
In terms of implementation, a typical hashCode method looks like this:
it returns a value that combines attribute1.hashCode(), attribute2.hashCode() and so on, optionally together with super.hashCode(). Every time this method is called, the hashCode of each participating object has to be computed again, even if the object's attributes have not changed. So if you set a flag and cache the current hash code, recomputing only when an object that takes part in hashing has changed and otherwise returning the cached hashCode, performance can improve considerably.
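
A minimal sketch of the caching idea follows. The class CachedKey, its fields, and the 31-based combination are assumptions for illustration; because the fields are final, the cached value never has to be invalidated. The sketch is not meant to be thread-safe.

```java
// Hypothetical immutable key that computes its hash code once and caches it.
final class CachedKey {
    private final String name;
    private final int year;
    private int cachedHash;         // valid only when hashComputed is true
    private boolean hashComputed;   // the "flag" described above
    int computations;               // illustration only: counts real computations

    CachedKey(String name, int year) {
        this.name = name;
        this.year = year;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CachedKey)) return false;
        CachedKey other = (CachedKey) o;
        return year == other.year && name.equals(other.name);
    }

    @Override
    public int hashCode() {
        if (!hashComputed) {
            // Combine the participating attributes only on the first call.
            cachedHash = 31 * name.hashCode() + year;
            hashComputed = true;
            computations++;
        }
        return cachedHash;
    }
}
```

If the fields were mutable, every setter would also have to reset the flag so that the next hashCode() call recomputes the value.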
The default implementation converts the object's internal address to an integer and uses it as the hashCode, which in practice gives different objects different hashCode values in most cases, since different objects have different internal addresses (although this is not strictly guaranteed). But the Java language does not let the programmer obtain an object's internal address, so making the hashCode different for each object involves many techniques worth studying.
If you sample a few of the many attributes in the hope of getting an evenly distributed hashCode, you face a tension between performance and diversity. If all attributes take part in hashing, the diversity of hashCode values is much higher, but performance is sacrificed. If only a few attributes are sampled, extreme cases can produce a large number of hash conflicts. For example, if a "person" is hashed by sex rather than by name or date of birth, there are only two or a few possible hashCode values, and more than half of the entries will conflict. So, where possible, generating a dedicated sequence number to serve as the hashCode is a good choice (provided that generating the sequence is cheaper than hashing all attributes; otherwise it is better to hash all attributes directly).
For how to balance the performance and diversity of hashCode, consult a book on algorithm design. It does not need to be perfect; it is enough to reduce clustering of hash values as far as possible. The important thing to remember is that hashCode has a significant effect on program performance and should always be kept in mind during design.
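
The "person" example can be made concrete by counting how many distinct hash values each choice of attributes produces. Person, HashSpreadDemo and the sample data are all invented for this sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.function.ToIntFunction;

// Hypothetical "person" with the attributes mentioned above.
final class Person {
    final String name;
    final String sex;
    final int birthYear;

    Person(String name, String sex, int birthYear) {
        this.name = name;
        this.sex = sex;
        this.birthYear = birthYear;
    }
}

class HashSpreadDemo {
    // Two candidate hash strategies: sex only, or name plus birth year.
    static final ToIntFunction<Person> BY_SEX = p -> p.sex.hashCode();
    static final ToIntFunction<Person> BY_NAME_AND_BIRTH =
            p -> Objects.hash(p.name, p.birthYear);

    // Counts how many different hash values the strategy yields for the sample.
    static int distinctHashes(List<Person> people, ToIntFunction<Person> hash) {
        Set<Integer> seen = new HashSet<>();
        for (Person p : people) {
            seen.add(hash.applyAsInt(p));
        }
        return seen.size();
    }

    static List<Person> sample() {
        return Arrays.asList(
                new Person("Ann", "F", 1990),
                new Person("Bob", "M", 1991),
                new Person("Cid", "M", 1992),
                new Person("Dee", "F", 1993));
    }

    public static void main(String[] args) {
        System.out.println(distinctHashes(sample(), BY_SEX));
        System.out.println(distinctHashes(sample(), BY_NAME_AND_BIRTH));
    }
}
```

Hashing by sex can never yield more than a couple of distinct values however many people are stored, while name plus birth year spreads the same sample much more widely.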
Remember: to use HashMap effectively, you must override hashCode() in the class used as the key.
There are two principles for overriding hashCode():
1. You do not have to produce a unique hashCode for every distinct object; it is enough that your hashCode method lets get() retrieve what put() stored. This is the non-uniqueness principle.
2. The algorithm that generates hashCode values should spread them out as much as possible, rather than concentrating many of them in one range; this helps HashMap performance. This is the dispersion principle.
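
The requirement that get() must find what put() stored can be checked with a custom key class. Isbn is a hypothetical key type used only for this sketch, and the code string and title are placeholder data.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key class that overrides equals() and hashCode() consistently.
final class Isbn {
    private final String code;

    Isbn(String code) {
        this.code = code;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Isbn && ((Isbn) o).code.equals(code);
    }

    @Override
    public int hashCode() {
        return code.hashCode();   // equal codes give equal hash codes
    }
}

class HashMapKeyDemo {
    public static void main(String[] args) {
        Map<Isbn, String> titles = new HashMap<>();
        titles.put(new Isbn("12345"), "Some Book");
        // A different but equal key object finds the stored entry.
        System.out.println(titles.get(new Isbn("12345")));
    }
}
```

Without the hashCode() override, the second Isbn object would normally hash to a different position and get() would return null.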
With these two principles in hand, you can write programs that use HashMap well. You may have noticed that three methods provided by java.lang.Object, clone(), equals() and hashCode(), are typical in this respect: in many cases their default behavior is not suitable, since equals() and hashCode() simply derive their results from the object's address. We therefore need to override them in our own programs, and in fact the Java class library itself overrides such methods hundreds, even thousands, of times. By using polymorphism through overriding, Java's designers built the Java architecture elegantly, which also reflects Java's object-oriented character.
Java's Collection and Map facilities make your programs more flexible and more efficient to implement. I hope this article helps you make better use of HashMap.
