Using equals() and hashCode() with Java objects
 
 
 
 
Preface
 
 
 
In Java, the equals() and hashCode() methods work closely together: if you override one of them, you need to override the other. In most cases you do not have to think about these two methods at all and can simply rely on their default implementations. In some cases, however, you must implement them yourself for the program to work correctly. The most common case is adding objects to a collection. More precisely: if you want to put an object A into a collection B, use A as a key to look up an element in B, or have B support contains and remove operations on its elements, then the developer must define equals() and hashCode(). In other cases, these two methods do not need to be defined.
 
equals():

It is used to compare two objects by content. It can also be used to compare object references. What does comparing references mean? It means comparing the values of two variables. The value of a reference variable is effectively a number that identifies a particular object, so comparing two references amounts to comparing two numbers. This is the default comparison, and it is already implemented in the Object class, so you do not need to write it yourself.

Comparing object content is the real purpose of equals(). The Java language places the following requirements on equals(), and every implementation must follow them:
 
 
- Symmetry: if x.equals(y) returns true, then y.equals(x) must also return true.
- Reflexivity: x.equals(x) must return true.
- Transitivity: if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) must also return true.
- Consistency: if x.equals(y) returns true, then as long as the content of x and y does not change, x.equals(y) returns true no matter how many times you call it.
- In all cases, x.equals(null) returns false, and x.equals(an object of a different type than x) returns false.
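
The rules above can be spot-checked in code. The following is only a minimal sketch: the class name EqualsContractCheck and its helper methods are hypothetical, and passing these checks for a few sample objects does not prove that an equals() implementation is correct in general.

```java
public class EqualsContractCheck {
    // Reflexivity: x.equals(x) must be true.
    public static boolean isReflexive(Object x) {
        return x.equals(x);
    }

    // Symmetry: x.equals(y) and y.equals(x) must give the same answer.
    public static boolean isSymmetric(Object x, Object y) {
        return x.equals(y) == y.equals(x);
    }

    // Transitivity: if x equals y and y equals z, then x must equal z.
    public static boolean isTransitive(Object x, Object y, Object z) {
        if (x.equals(y) && y.equals(z)) {
            return x.equals(z);
        }
        return true; // premise not met, so there is nothing to check
    }

    // Null handling: x.equals(null) must be false.
    public static boolean rejectsNull(Object x) {
        return !x.equals(null);
    }

    public static void main(String[] args) {
        String a = new String("abc");
        String b = new String("abc");
        String c = new String("abc");
        System.out.println(isReflexive(a));        // true
        System.out.println(isSymmetric(a, b));     // true
        System.out.println(isTransitive(a, b, c)); // true
        System.out.println(rejectsNull(a));        // true
    }
}
```

String is used here only because its equals() is known to follow the contract; the same helpers could be pointed at your own classes.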
hashCode():

This method returns an integer code used for hashing. Do not confuse this code with the reference value described above: a reference is not only a code but also the means of locating the object in memory. The value returned by hashCode() is used to decide where an object is placed inside a particular kind of collection, namely HashMap, Hashtable, HashSet and the like. Together with equals(), you must implement this method yourself so that HashMap, Hashtable, HashSet and similar collections can search and locate the large numbers of objects you store in them.
 
How do these collections work? Imagine that the hash code of each element is the label on a box. Each element is placed into the box whose label matches the code returned by its hashCode(). All the boxes together make up a HashSet, HashMap or Hashtable. To find an element, we first look at its code, the integer returned by hashCode(), and use it to find the right box. Then we take the elements out of that box one by one and compare each with the object we are looking for. If the content of the two objects is equal, the search is over. This operation needs two pieces of information: the hashCode() of the object and a comparison of object content.
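
The box picture above can be sketched in a few lines of Java. This is an illustration only, not how java.util.HashSet is actually implemented: the class BucketSetSketch, its fixed number of boxes and its method names are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class BucketSetSketch {
    // Each inner list is one "box"; the number of boxes is fixed here.
    private final List<List<Object>> buckets = new ArrayList<>();

    public BucketSetSketch(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Step 1: the hash code selects the box.
    private List<Object> bucketFor(Object o) {
        int index = Math.floorMod(o.hashCode(), buckets.size());
        return buckets.get(index);
    }

    // Step 2: inside the box, equals() compares content one element at a time.
    public boolean contains(Object o) {
        for (Object candidate : bucketFor(o)) {
            if (candidate.equals(o)) {
                return true;
            }
        }
        return false;
    }

    // Adds the element unless an equal one is already in its box.
    public boolean add(Object o) {
        if (contains(o)) {
            return false;
        }
        bucketFor(o).add(o);
        return true;
    }

    public static void main(String[] args) {
        BucketSetSketch set = new BucketSetSketch(16);
        System.out.println(set.add("apple"));             // true
        System.out.println(set.add(new String("apple"))); // false: equal content
        System.out.println(set.contains("pear"));         // false
    }
}
```

Only the elements in one box are ever compared with equals(), which is why a good hashCode() keeps lookups fast.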
 
The relationship between the return values of hashCode() and equals() is as follows:

- If x.equals(y) returns true, then the hashCode() of x and y must be equal.
- If x.equals(y) returns false, then the hashCode() of x and y may be equal or different.

The reason for these two rules is simple. A HashSet has one or more boxes, and a single box can hold one or more elements (all elements of a HashSet must be unique). So an element may share its hash code with other elements, but it may only be equal to an element with the same content. Therefore both rules must hold.
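
A concrete case of the second rule: the strings "Aa" and "BB" are not equal, yet String.hashCode() returns 2112 for both. The sketch below shows that a HashSet still keeps them as two separate elements; the class name HashCollisionDemo and the helper distinctCount are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

public class HashCollisionDemo {
    // Returns how many distinct elements a HashSet holds after adding a and b.
    public static int distinctCount(String a, String b) {
        Set<String> set = new HashSet<>();
        set.add(a);
        set.add(b);
        return set.size();
    }

    public static void main(String[] args) {
        String x = "Aa";
        String y = "BB";
        System.out.println(x.hashCode());        // 2112
        System.out.println(y.hashCode());        // 2112
        System.out.println(x.equals(y));         // false
        System.out.println(distinctCount(x, y)); // 2: same box, different content
    }
}
```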
 
Note the following when implementing these two methods:
If the type you are designing will not be stored in collections, you do not need to implement these two methods. This is sound object-oriented design: do not build features that nobody needs right now, so as to avoid trouble when extending the code later.

If you want to be creative and ignore the two sets of rules above, I advise against it. I have never met a developer who could tell me that an implementation of these two methods had a good reason to violate those rules. Whenever I have run into such violations, they were treated as design errors.
 
When a type is used as the element type of a collection, it should provide its own equals() and/or hashCode(), and these must obey the two sets of rules above. equals() should first check for null and for a matching type. The type check avoids a ClassCastException; the null check avoids a NullPointerException.
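
As a minimal sketch of these checks, here is a hypothetical Point class (not part of the original discussion). It tests for the same reference, then for null and type, before casting. Objects.hash is one common way to build a hash code from the same fields that equals() compares.

```java
import java.util.Objects;

public final class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true; // same reference, so certainly equal
        }
        // The null check and the type check come before the cast.
        if (other == null || getClass() != other.getClass()) {
            return false;
        }
        Point p = (Point) other;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        // Built from the same fields that equals() compares.
        return Objects.hash(x, y);
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        System.out.println(a.equals(new Point(1, 2))); // true
        System.out.println(a.equals(null));            // false, no exception
        System.out.println(a.equals("1,2"));           // false, no exception
    }
}
```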
 
If your object holds a large amount of data, equals() and hashCode() become less efficient. If the object has fields that are not serialized, equals() can also give surprising results. Imagine an object X with a transient int field (a transient field is not written to the binary stream during serialization), where equals() and hashCode() depend on that field. Is the object the same before and after serialization? No. Before serialization the field holds valid data, but its value is not stored during serialization, so when the binary stream is converted back into an object, the states of the two objects (before and after serialization) differ. This is also worth keeping in mind.
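
The serialization pitfall can be reproduced with standard Java serialization. The sketch below is an assumption-laden illustration: the class TransientEqualsDemo, the nested Counter class and the helper methods are all hypothetical names invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientEqualsDemo {
    // Hypothetical class whose equals() and hashCode() depend on a transient field.
    static class Counter implements Serializable {
        private static final long serialVersionUID = 1L;
        transient int value; // not written to the serialized stream

        Counter(int value) {
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Counter && ((Counter) o).value == value;
        }

        @Override
        public int hashCode() {
            return value;
        }
    }

    // Serializes the object to a byte array and reads it back.
    static Counter roundTrip(Counter c) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(c);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Counter) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if an object with the given value still equals its deserialized copy.
    public static boolean equalAfterRoundTrip(int value) {
        Counter before = new Counter(value);
        return before.equals(roundTrip(before));
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new Counter(42)).value); // 0: the transient field is reset
        System.out.println(equalAfterRoundTrip(42));          // false
    }
}
```

The copy only compares equal when the original value happens to be the default 0, which is exactly the kind of silent difference described above.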
 
 
 
Implementation
 
 
 
1. First, the equals() and hashCode() methods are inherited from the Object class.
The equals() method is defined in the Object class as follows:
public boolean equals(Object obj) {
    return (this == obj);
}

Clearly this compares the addresses of the two objects (that is, whether the two references are the same).

However, classes that extend Object can override the equals() method to compare content.
 
 
 
2. Second, the hashCode() method is defined in the Object class as follows:
public native int hashCode();
The native keyword means that the method is implemented in platform-specific code rather than in Java. Of course, we can override the hashCode() method in the classes we write.
 
The implementation of hashCode() in the String class is as follows:

public int hashCode() {
    int h = hash;
    if (h == 0) {
        int off = offset;
        char val[] = value;
        int len = count;

        for (int i = 0; i < len; i++) {
            h = 31 * h + val[off++];
        }
        hash = h;
    }
    return h;
}
An explanation of this code (from the String API documentation):
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
The sum is computed using int arithmetic. Here s[i] is the i-th character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash code of the empty string is 0.)
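
To see that the loop and the formula describe the same value, the formula can be evaluated directly and compared with String.hashCode(). This is a small sketch; the class name StringHashFormula and the method polynomialHash are hypothetical.

```java
public class StringHashFormula {
    // Evaluates s[0]*31^(n-1) + ... + s[n-1] using int arithmetic
    // (with overflow), one character at a time.
    public static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // 'a' is 97 and 'b' is 98, so the result is 97*31 + 98 = 3105.
        System.out.println(polynomialHash("ab"));                  // 3105
        System.out.println(polynomialHash("ab") == "ab".hashCode()); // true
        System.out.println(polynomialHash(""));                    // 0
    }
}
```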
 
 
 
To understand the role of hashCode(), you first need to know about collections in Java.
In general, collections in Java fall into two kinds: List and Set.
Do you know the difference between them? The elements of a List are ordered and may repeat. The elements of a Set are unordered and may not repeat.
This raises a serious question: to make sure elements do not repeat, on what basis do we decide whether two elements are duplicates?
The answer is the Object.equals method. However, if every newly added element is checked against each existing element, then with many elements the number of comparisons becomes very large.
That is, if a set already holds 1000 elements, adding the 1001st element calls equals 1000 times. This clearly hurts efficiency badly.
Therefore, Java uses hash tables. A hash algorithm computes a storage address directly from the data with a fixed calculation. Describing hash algorithms in detail would take many more articles, so I will not do so here.
As a beginner, you can think of hashCode() as returning the physical address where the object is stored (although in reality it may not be an address at all).
In this way, when a set needs to add a new element, it first calls the element's hashCode() method and immediately locates the position where the element should be placed.
If that position is empty, the element is stored there without any comparison. If the position already holds elements,
the set calls equals() to compare the new element with them. If one of them is equal, the new element is not stored. If none is equal, the new element is stored as well (java.util.HashMap keeps colliding entries together in the same bucket).
This is the problem of collision resolution. Handled this way, the number of equals calls drops sharply, to almost just one or two.
Therefore, Java specifies the equals method and the hashCode method as follows:
1. If two objects are equal, their hashCode values must be the same. 2. If two objects have the same hashCode, they are not necessarily equal; they must still be compared with the equals method.
Of course you can ignore these requirements, but you will find that equal objects can then appear in a Set, and the efficiency of adding new elements drops sharply.
 
3. Here we need to understand the following points:
Two objects that are equal according to equals() must have equal hashCode() values.
If two objects are not equal according to equals(), this does not prove that their hashCode() values differ. In other words, two objects for which equals() returns false may still have the same hashCode(). (As I understand it, this is a hash collision.)
Conversely, if the hashCode() values differ, equals() must return false. If the hashCode() values are equal, equals() may return true or false. As for where this applies, I understand that it holds for Object, String and other classes. In the Object class, hashCode() is a native method whose value is, conceptually, the address of the object. The equals() method in the Object class compares the addresses of the two objects, so if equals() returns true, the addresses are the same and hashCode() is the same. In the String class, equals() compares the content of the two objects. When two strings have the same content,
the overridden hashCode() in the String class (analysed in point 2) computes the same result for both, so their hashCode() values are equal as well. In the same way, the overridden equals() and hashCode() methods in wrapper classes such as Integer and Double follow this principle. And of course, a class that does not override them inherits equals() and hashCode() from the Object class and follows the principle too.
 
4. When talking about hashCode() and equals(), we cannot avoid the usage of HashSet, HashMap and Hashtable. See the following analysis:
HashSet implements the Set interface, and Set in turn extends the Collection interface. This is the type hierarchy. So what principle does HashSet use to store and retrieve objects?
HashSet does not allow duplicate objects, and the position of each element is not fixed. How does HashSet decide whether elements are duplicates? This is the key question. After an afternoon of searching and verifying, I finally got somewhere, and I would like to share it. In the Java collections, the rules for deciding whether two objects are equal are:
1) Check whether the hashCode values of the two objects are equal.
If they are not equal, the two objects are considered not equal, and we are done.
If they are equal, go to 2).
(This step exists only to improve storage efficiency. In theory it could be omitted, but without it efficiency in practice would drop sharply, so we treat it as required here. This issue is discussed further below.)
2) Check whether the two objects are equal using equals().
If they are not equal, the two objects are considered not equal.
If they are equal, the two objects are considered equal (equals() is the decisive test).
Why are there two rules? Is the first one not enough? No, because as mentioned earlier, equals() may return false even when the hashCode() values are equal. Therefore the second rule is needed to make sure that only non-duplicate elements are added.
 
 
 
 
 
For example, the following code:

import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// The wrapper class name is chosen here only so that the example compiles.
public class StringSetTest {
    public static void main(String[] args) {
        String s1 = new String("zhaoxudong");
        String s2 = new String("zhaoxudong");
        System.out.println(s1 == s2);      // false
        System.out.println(s1.equals(s2)); // true
        System.out.println(s1.hashCode()); // s1.hashCode() equals s2.hashCode()
        System.out.println(s2.hashCode());
        Set<String> hashSet = new HashSet<>();
        hashSet.add(s1);
        hashSet.add(s2);
        // By the two rules above, HashSet considers s1 and s2 equal, so s2 is a
        // duplicate: the second add() returns false and the set keeps s1.
        Iterator<String> it = hashSet.iterator();
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
In the end, the while loop prints only one "zhaoxudong".
The output is:
false
true
-967303459
-967303459
zhaoxudong
This is because the String class already overrides the equals() and hashCode() methods. Therefore, according to rules 1) and 2) above, HashSet considers the two strings equal, and the second one counts as a duplicate that is not added again.
But look at the following program:
import java.util.*;
public class HashSetTest
{
    public static void main(String[] args)
    {
        HashSet<Student> hs = new HashSet<>();
        hs.add(new Student(1, "zhangsan"));
        hs.add(new Student(2, "lisi"));
        hs.add(new Student(3, "wangwu"));
        hs.add(new Student(1, "zhangsan"));

        Iterator<Student> it = hs.iterator();
        while (it.hasNext())
        {
            System.out.println(it.next());
        }
    }
}
class Student
{
    int num;
    String name;
    Student(int num, String name)
    {
        this.num = num;
        this.name = name;
    }
    public String toString()
    {
        return num + ":" + name;
    }
}
Output (the order may vary from run to run, because HashSet does not guarantee iteration order):
1:zhangsan
1:zhangsan
3:wangwu
2:lisi
Here is the problem. Why does HashSet accept equal elements? Does this contradict the HashSet principle? The answer is: no.
When the two new Student(1, "zhangsan") objects are compared by hashCode(), they produce different hash codes, so HashSet treats them as different objects. The equals() method also returns false for them here. So why are the hash codes different, when s1 and s2 produced the same one earlier? The reason is that our Student class does not override hashCode() and equals(), so the comparison uses the methods inherited from the Object class. Remember what hashCode() in the Object class is based on!
It is a native method whose value is derived from the identity of the object (usually described as its address, although the JVM does not have to return a real address). Two objects created with new are two distinct objects, so in practice their hashCode() values differ. By the first rule, HashSet therefore treats them as different objects and never needs the second rule. So how do we solve this problem?
The answer is: override the hashCode() and equals() methods in the Student class.
For example:
class Student
{
    int num;
    String name;
    Student(int num, String name)
    {
        this.num = num;
        this.name = name;
    }
    @Override
    public int hashCode()
    {
        return num * name.hashCode();
    }
    @Override
    public boolean equals(Object o)
    {
        // Check null and type first to avoid NullPointerException and ClassCastException.
        if (!(o instanceof Student)) return false;
        Student s = (Student) o;
        return num == s.num && name.equals(s.name);
    }
    @Override
    public String toString()
    {
        return num + ":" + name;
    }
}
With these overrides, even if new Student(1, "zhangsan") is called twice, the overridden hashCode() returns the same hash code for both objects (there is no doubt about this).
And of course, the overridden equals() also reports them as equal. So they are treated as duplicates when added to the HashSet. When we run the modified program, the result is:
1:zhangsan
3:wangwu
2:lisi
We can see that the duplicate element problem has been eliminated.
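
For comparison, here is a compact, self-contained variant of the same idea that can be run as a single file. It is only a sketch: the wrapper class StudentSetDemo and the helper distinctStudents are hypothetical, and Objects.hash and Objects.equals from java.util are used instead of the hand-written arithmetic, which is a substitution and not the approach shown above.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class StudentSetDemo {
    static final class Student {
        final int num;
        final String name;

        Student(int num, String name) {
            this.num = num;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Student)) {
                return false; // also covers o == null
            }
            Student s = (Student) o;
            // Objects.equals tolerates a null name on either side.
            return num == s.num && Objects.equals(name, s.name);
        }

        @Override
        public int hashCode() {
            // Built from the same fields that equals() compares.
            return Objects.hash(num, name);
        }

        @Override
        public String toString() {
            return num + ":" + name;
        }
    }

    // Adds the same four students as the example and returns the set size.
    public static int distinctStudents() {
        Set<Student> hs = new HashSet<>();
        hs.add(new Student(1, "zhangsan"));
        hs.add(new Student(2, "lisi"));
        hs.add(new Student(3, "wangwu"));
        hs.add(new Student(1, "zhangsan")); // duplicate, rejected
        return hs.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctStudents()); // 3
    }
}
```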
For POJO classes in Hibernate, the question of overriding equals() and hashCode() is usually described as follows:
1) The focus is on equals(). Overriding hashCode() is described as only a technical requirement (to improve efficiency).
2) Why override equals()? Because in the Java collections framework, equals() is what decides whether two objects are equal.
3) In Hibernate, a Set is often used to hold associated objects, and a Set cannot contain duplicates. Think again about how a HashSet decides whether an object is already present when an element is added. We mentioned two rules above, and according to this view it would be enough to override equals() alone. In practice, however, a HashSet checks hashCode() first, so if only equals() is overridden, equal objects with different hash codes are still stored as separate elements.
Moreover, when a HashSet holds many elements, or the overridden equals() is complicated, using only equals() for comparison would be very slow. The hashCode() method is introduced to improve efficiency, and I think it is very much necessary (which is why we use both of the rules above to decide whether the elements of a HashSet are duplicates).
For example, you could write:
public int hashCode() {
    return 1; // effectively disables hashing
}
The result is that comparing hash codes can no longer tell objects apart, because every object returns the hash code 1. Every check then has to fall through to equals() to decide whether an element is a duplicate, which greatly reduces efficiency.
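
The cost of a constant hash code can be made visible by counting calls to equals(). The sketch below is an illustration under assumptions: the class ConstantHashDemo, the nested Key class and the static counter are hypothetical, and the exact number of calls depends on the HashSet implementation, so only "some calls" versus "no calls" is compared.

```java
import java.util.HashSet;
import java.util.Set;

public class ConstantHashDemo {
    // Counts how often equals() is invoked (illustration only, not thread-safe).
    static int equalsCalls = 0;

    static final class Key {
        final int id;
        final boolean constantHash;

        Key(int id, boolean constantHash) {
            this.id = id;
            this.constantHash = constantHash;
        }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Key && ((Key) o).id == id;
        }

        @Override
        public int hashCode() {
            // Either every key lands in the same box, or each id gets its own code.
            return constantHash ? 1 : id;
        }
    }

    // Adds five distinct keys to a HashSet and reports the equals() calls made.
    public static int countEqualsCalls(boolean constantHash) {
        equalsCalls = 0;
        Set<Key> set = new HashSet<>();
        for (int i = 0; i < 5; i++) {
            set.add(new Key(i, constantHash));
        }
        return equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println("constant hash: " + countEqualsCalls(true));
        System.out.println("distinct hash: " + countEqualsCalls(false));
    }
}
```

With distinct hash codes the set never needs equals() for these five keys. With the constant hash code, every new key has to be compared with the keys already stored.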
I have one remaining question. If, as the view I found online suggests, equals() is the only method needed to decide whether elements of a HashSet are duplicates, then no hash table is involved at all, and yet the collection is called HashSet. Why?
I believe the storage operations in HashMap and Hashtable follow the same rules, so I will not go into them here. This is what I have summarized from reading books and searching online. Some of the code and wording is quoted, but the summary is my own. If you find errors or anything that is unclear, please point it out. I am also a beginner, so mistakes are inevitable, and I hope we can discuss them together.