Java Theory and Practice: Hash

Source: Internet
Author: User
Every Java object has a hashCode() and an equals() method. Many classes override the default implementations of these methods to provide deeper semantic comparability between object instances. In this installment of Java Theory and Practice, Java developer Brian Goetz describes the rules and guidelines to follow when creating Java classes so that hashCode() and equals() are defined effectively and accurately.
Although the Java language does not directly support associative arrays (arrays that can take any object as an index), the presence of the hashCode() method in the root Object class clearly anticipates the widespread use of HashMap (and its predecessor, Hashtable). Under ideal conditions, hash-based containers offer both efficient insertion and efficient retrieval. Supporting hashing directly in the object model makes hash-based containers easier to develop and use.

Defining object equality
The Object class has two methods for making inferences about an object's identity: equals() and hashCode(). In general, if you override one of them, you must override both, because there is an important relationship between the two that must be maintained. In particular, if two objects are equal according to equals(), they must have the same hashCode() value (although the reverse is not generally true).

The semantics of equals() for a given class are left to the implementer; defining what equals() means for a class is part of the design work for that class. The default implementation provided by Object is simply reference equality:


public boolean equals(Object obj) { return (this == obj); }



Under this default implementation, two references are equal only if they refer to the exact same object. Similarly, the default implementation of hashCode() provided by Object is typically derived by mapping the memory address of the object to an integer value. Because on some architectures the address space is larger than the range of int, two distinct objects may have the same hashCode(). If you override hashCode(), you can still use the System.identityHashCode() method to access this default value.
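The default behavior described above can be observed with a short sketch. The Token class is hypothetical and exists only because it overrides neither method; the identity-based hash values themselves vary from run to run.

```java
// Hypothetical class that overrides neither equals() nor hashCode().
class Token {
    final String text;

    Token(String text) { this.text = text; }
}

class IdentityDefaultsDemo {
    public static void main(String[] args) {
        Token t1 = new Token("x");
        Token t2 = new Token("x");
        Token alias = t1;

        System.out.println(t1.equals(t2));    // false: same state, different objects
        System.out.println(t1.equals(alias)); // true: both refer to the same object

        // With no override, hashCode() returns the identity-based value.
        System.out.println(t1.hashCode() == System.identityHashCode(t1)); // true

        // String overrides hashCode(), but the default value is still reachable.
        String s = new String("x");
        System.out.println(s.hashCode());               // 120, computed from the contents
        System.out.println(System.identityHashCode(s)); // identity-based; varies per run
    }
}
```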

Overriding equals(): a simple example
An identity-based implementation of equals() and hashCode() is a sensible default, but for some classes it is desirable to relax the definition of equality. For example, the Integer class defines equals() similarly to this:


public boolean equals(Object obj) {
    return (obj instanceof Integer
            && intValue() == ((Integer) obj).intValue());
}



Under this definition, two Integer objects are equal only if they contain the same integer value. Together with the fact that Integer is immutable, this makes it practical to use an Integer as a key in a HashMap. This value-based approach to equality is used by all the primitive wrapper classes in the Java class library, such as Integer, Float, Character, and Boolean, as well as by String (two String objects are equal if they contain the same sequence of characters). Because these classes are immutable and implement hashCode() and equals() sensibly, they all make good hash keys.

Why override equals() and hashCode()?
What would happen if Integer did not override equals() and hashCode()? Nothing, if we never used an Integer as a key in a HashMap or other hash-based collection. However, if we did use such an Integer object as a key in a HashMap, we could not reliably retrieve the associated value unless we used the exact same Integer instance in the get() call as in the put() call. This would require that our entire program use only a single instance of the Integer object corresponding to a particular integer value. Needless to say, this approach would be inconvenient and error prone.
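A small sketch makes the scenario concrete. IdentityInt is a hypothetical stand-in for an Integer that does not override equals() and hashCode(); it is not part of the class library.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper that, unlike Integer, inherits identity-based
// equals() and hashCode() from Object.
class IdentityInt {
    final int value;

    IdentityInt(int value) { this.value = value; }
}

class KeyLookupDemo {
    // Stores a value under one key instance, then looks it up with another.
    static String lookupWithFreshIdentityKey() {
        Map<IdentityInt, String> map = new HashMap<>();
        map.put(new IdentityInt(100000), "found");
        return map.get(new IdentityInt(100000)); // different instance, so no match
    }

    // The same experiment with the real Integer class.
    static String lookupWithFreshIntegerKey() {
        Map<Integer, String> map = new HashMap<>();
        map.put(Integer.valueOf(100000), "found");
        return map.get(Integer.valueOf(100000)); // equal by value, so it matches
    }

    public static void main(String[] args) {
        System.out.println(lookupWithFreshIdentityKey()); // null
        System.out.println(lookupWithFreshIntegerKey());  // found
    }
}
```

With the identity-based key, the only way to retrieve the value is to keep the original key instance around, which is exactly the inconvenience described above.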

The interface contract for Object requires that if two objects are equal according to equals(), they must have the same hashCode() value. Why does the root object class need hashCode() at all, when its discriminating ability is entirely subsumed by that of equals()? The hashCode() method exists purely for efficiency. The Java platform architects anticipated the importance of hash-based collection classes, such as Hashtable, HashMap, and HashSet, in typical Java applications, and comparing against many objects with equals() can be computationally expensive. Having every Java object support hashCode() allows efficient storage and retrieval using hash-based collections.

Requirements for implementing equals() and hashCode()
There are some restrictions on the behavior of equals() and hashCode(), which are enumerated in the documentation for Object. In particular, the equals() method must exhibit the following properties:

Symmetry: for two references, a and b, a.equals(b) if and only if b.equals(a)
Reflexivity: for all non-null references, a.equals(a)
Transitivity: if a.equals(b) and b.equals(c), then a.equals(c)
Consistency with hashCode(): two equal objects must have the same hashCode() value
The specification for Object also offers a vague guideline that equals() and hashCode() be consistent: their results will be the same on subsequent invocations, provided that "no information used in equals comparisons on the object is modified." This sounds rather like "the result of the calculation will not change, unless it does." This vague statement is generally interpreted to mean that equality and hash value calculations should be a deterministic function of the object's state and nothing else.
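The four properties listed above can be spot-checked for specific objects. The helper below is a hypothetical sketch, not a library facility: passing it proves nothing in general, but a failure demonstrates a definite contract violation for the three references supplied.

```java
class EqualsContractCheck {
    // Spot-checks the equals()/hashCode() contract for three non-null references.
    // Transitivity is checked only for the ordering a, b, c.
    static boolean holdsFor(Object a, Object b, Object c) {
        boolean reflexive = a.equals(a) && b.equals(b) && c.equals(c);

        boolean symmetric = a.equals(b) == b.equals(a)
                && b.equals(c) == c.equals(b)
                && a.equals(c) == c.equals(a);

        boolean transitive = !(a.equals(b) && b.equals(c)) || a.equals(c);

        // Equal objects must report equal hash codes.
        boolean hashConsistent = (!a.equals(b) || a.hashCode() == b.hashCode())
                && (!b.equals(c) || b.hashCode() == c.hashCode())
                && (!a.equals(c) || a.hashCode() == c.hashCode());

        return reflexive && symmetric && transitive && hashConsistent;
    }

    public static void main(String[] args) {
        // Three distinct String instances with the same contents.
        System.out.println(holdsFor(new String("k"), new String("k"), new String("k"))); // true
    }
}
```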

What should equality mean?
The requirements that the Object class specification places on equals() and hashCode() are fairly easy to meet. Deciding whether, and how, to override equals() requires more judgment. For simple immutable value classes, such as Integer (and in fact for nearly all immutable classes), the choice is fairly obvious: equality should be based on the equality of the underlying object state. In the case of Integer, the object's only state is the underlying integer value.

For mutable objects, the answer is not always so clear. Should equals() and hashCode() be based on the object's identity (like the default implementation) or on the object's state (like Integer and String)? There is no easy answer; it depends on the intended use of the class. For containers like List and Map, a reasonable argument could be made either way. Most classes in the Java class library, including the container classes, err on the side of providing equals() and hashCode() implementations based on the object's state.

If an object's hashCode() value can change based on its state, we must be careful when using such objects as keys in hash-based collections, and ensure that their state does not change while they are in use as hash keys. All hash-based collections assume that an object's hash value does not change while it is in use as a key in the collection. If a key's hash code changes while it is in a collection, unpredictable and confusing consequences can follow. In practice this is usually not a problem, because it is uncommon to use a mutable object like a List as a key in a HashMap.
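The following sketch shows one way the problem can surface. The specification only says the behavior is not defined, so the printed results are what OpenJDK's HashMap is expected to do here rather than a guarantee; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MutableKeyDemo {
    // Puts the key into a map, then mutates the key while it is still in use.
    static Map<List<Integer>, String> mapWithMutatedKey(List<Integer> key) {
        Map<List<Integer>, String> map = new HashMap<>();
        map.put(key, "value");
        key.add(2); // changes key.hashCode() after the entry was stored
        return map;
    }

    public static void main(String[] args) {
        List<Integer> key = new ArrayList<>(Arrays.asList(1));
        Map<List<Integer>, String> map = mapWithMutatedKey(key);

        System.out.println(map.size());                // 1: the entry is still stored
        System.out.println(map.get(key));              // null: the hash no longer matches
        System.out.println(map.get(Arrays.asList(1))); // null: the stored key is now [1, 2]
    }
}
```

The entry is stranded: it still occupies the map, but neither the mutated key nor a copy of the original key can reach it.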

An example of a simple mutable class that defines equals() and hashCode() based on state is Point. Two Point objects are equal if they refer to the same (x, y) coordinates, and the hash value of a Point is derived from the IEEE 754 bit representation of the x and y coordinate values.
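A minimal sketch of such a class follows. SimplePoint is a hypothetical illustration, not the library's Point; comparing bit patterns in equals() and the particular way the bits are folded into an int are assumptions made to keep the two methods consistent.

```java
// Hypothetical mutable point with state-based equals() and hashCode().
class SimplePoint {
    double x;
    double y;

    SimplePoint(double x, double y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof SimplePoint)) return false;
        SimplePoint other = (SimplePoint) obj;
        // Compare IEEE 754 bit patterns so equals() agrees with hashCode().
        return Double.doubleToLongBits(x) == Double.doubleToLongBits(other.x)
                && Double.doubleToLongBits(y) == Double.doubleToLongBits(other.y);
    }

    @Override
    public int hashCode() {
        long bits = Double.doubleToLongBits(x);
        bits = 31 * bits + Double.doubleToLongBits(y);
        // Fold the 64-bit value into 32 bits.
        return (int) (bits ^ (bits >>> 32));
    }
}
```

Because the fields are mutable, the caution about changing a key's state while it sits in a hash-based collection applies to this class.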

For more complex classes, the behavior of equals() and hashCode() may even be imposed by the specification of a superclass or interface. For example, the List interface requires that a List object be equal to another object if and only if the other object is also a List and the two contain the same elements (as defined by Object.equals() on the elements) in the same order. The requirements for hashCode() are even more specific: the hashCode() value of a list must conform to the following calculation:


hashCode = 1;
Iterator i = list.iterator();
while (i.hasNext()) {
    Object obj = i.next();
    hashCode = 31 * hashCode + (obj == null ? 0 : obj.hashCode());
}



Not only does the hash value depend on the contents of the list, but the specific algorithm for combining the hash values of the individual elements is specified as well. (The String class specifies a similar algorithm for computing the hash value of a String.)
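Because both algorithms are specified, they can be recomputed by hand and compared with what the library returns. The helper names below are illustrative; the String version starts from 0 and folds in each char, following the formula documented for String.hashCode().

```java
import java.util.Arrays;
import java.util.List;

class HashFormulaDemo {
    // Recomputes the hash value that the List contract prescribes.
    static int listHash(List<?> list) {
        int hashCode = 1;
        for (Object obj : list) {
            hashCode = 31 * hashCode + (obj == null ? 0 : obj.hashCode());
        }
        return hashCode;
    }

    // Same shape of computation over the characters of a String, starting from 0.
    static int stringHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        List<Object> list = Arrays.<Object>asList("a", null, 3);
        System.out.println(listHash(list) == list.hashCode());        // true
        System.out.println(stringHash("hello") == "hello".hashCode()); // true
    }
}
```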

Writing your own equals() and hashCode() methods
Overriding the default equals() method is fairly easy, but overriding an equals() method that has already been overridden can be extremely tricky to do without violating either the symmetry or the transitivity requirement. When you override equals(), you should always include some Javadoc comments on equals() to help those who may want to extend your class do so correctly.

As a simple example, consider the following class:


class A {
    final B someNonNullField;
    C someOtherField;
    int someNonStateField;
}



How would we write the equals() method for this class? The following approach is suitable for many cases:


public boolean equals(Object other) {
    // Not strictly necessary, but often a good optimization
    if (this == other)
        return true;
    if (!(other instanceof A))
        return false;
    A otherA = (A) other;
    return
        (someNonNullField.equals(otherA.someNonNullField))
        && ((someOtherField == null)
            ? otherA.someOtherField == null
            : someOtherField.equals(otherA.someOtherField));
}



Now that we have defined equals(), we must define hashCode() in a compatible manner. One compatible, but not very useful, way to define hashCode() is this:


public int hashCode() { return 0; }



This approach yields terrible performance for HashMaps with a large number of entries, because every key lands in the same bucket, but it does conform to the specification. A more sensible implementation of hashCode() for A would be:


public int hashCode() {
    int hash = 1;
    hash = hash * 31 + someNonNullField.hashCode();
    hash = hash * 31
        + (someOtherField == null ? 0 : someOtherField.hashCode());
    return hash;
}
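Putting the two methods together gives a version of class A that can be compiled and tried out. The types B and C are not specified in the text, so String and Integer stand in for them here; that substitution, the constructor, and the demo class are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Runnable sketch of class A. String and Integer are stand-ins for B and C.
class A {
    final String someNonNullField;
    Integer someOtherField;  // may be null
    int someNonStateField;   // deliberately ignored by equals() and hashCode()

    A(String someNonNullField, Integer someOtherField, int someNonStateField) {
        this.someNonNullField = someNonNullField;
        this.someOtherField = someOtherField;
        this.someNonStateField = someNonStateField;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof A)) return false;
        A otherA = (A) other;
        return someNonNullField.equals(otherA.someNonNullField)
                && (someOtherField == null
                    ? otherA.someOtherField == null
                    : someOtherField.equals(otherA.someOtherField));
    }

    @Override
    public int hashCode() {
        int hash = 1;
        hash = hash * 31 + someNonNullField.hashCode();
        hash = hash * 31 + (someOtherField == null ? 0 : someOtherField.hashCode());
        return hash;
    }
}

class StateBasedKeyDemo {
    public static void main(String[] args) {
        Set<A> set = new HashSet<>();
        set.add(new A("id", null, 1));
        // The non-state field differs, but the objects are still equal.
        System.out.println(set.contains(new A("id", null, 99))); // true
        System.out.println(set.contains(new A("id", 5, 1)));     // false
    }
}
```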



Note that both of these implementations delegate part of the computation to the equals() or hashCode() methods of the class's state fields. Depending on your class, you may also want to delegate part of the computation to the equals() or hashCode() method of the superclass. For primitive fields, the associated wrapper classes provide helper functions that can help in creating hash values, such as Float.floatToIntBits.
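As a sketch of how primitive fields can be folded into the same multiply-by-31 scheme, consider the hypothetical Reading class below. The field names are invented for illustration, and folding a 64-bit value with an XOR of its two halves is one common choice rather than a requirement.

```java
// Hypothetical value class whose state consists only of primitive fields.
class Reading {
    final int sensorId;
    final long timestamp;
    final float ratio;
    final double level;
    final boolean valid;

    Reading(int sensorId, long timestamp, float ratio, double level, boolean valid) {
        this.sensorId = sensorId;
        this.timestamp = timestamp;
        this.ratio = ratio;
        this.level = level;
        this.valid = valid;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof Reading)) return false;
        Reading o = (Reading) obj;
        // Floating-point fields are compared by bit pattern to match hashCode().
        return sensorId == o.sensorId
                && timestamp == o.timestamp
                && Float.floatToIntBits(ratio) == Float.floatToIntBits(o.ratio)
                && Double.doubleToLongBits(level) == Double.doubleToLongBits(o.level)
                && valid == o.valid;
    }

    @Override
    public int hashCode() {
        int hash = 1;
        hash = hash * 31 + sensorId;
        hash = hash * 31 + (int) (timestamp ^ (timestamp >>> 32));
        hash = hash * 31 + Float.floatToIntBits(ratio);
        long levelBits = Double.doubleToLongBits(level);
        hash = hash * 31 + (int) (levelBits ^ (levelBits >>> 32));
        hash = hash * 31 + (valid ? 1 : 0);
        return hash;
    }
}
```

One consequence of comparing bit patterns is that two readings whose ratio is NaN are considered equal, which keeps equals() reflexive for such objects.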

Writing an equals() method is not without pitfalls. In general, it is impractical to cleanly override equals() when extending an instantiable class that itself overrides equals(), and writing an equals() method that is intended to be overridden (such as in an abstract class) is different from writing an equals() method for a concrete class. For examples and further explanation, see Item 7 of Joshua Bloch's Effective Java Programming Language Guide (see References).
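The pitfall can be reproduced with two small classes. BasePoint and ColoredPoint are hypothetical names for this sketch; the subclass uses the "obvious" override, which compares the new field in addition to the inherited state.

```java
// Instantiable base class with a state-based equals().
class BasePoint {
    final int x;
    final int y;

    BasePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof BasePoint)) return false;
        BasePoint other = (BasePoint) obj;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() { return 31 * x + y; }
}

// Subclass that adds state and overrides equals() in the "obvious" way.
class ColoredPoint extends BasePoint {
    final String color;

    ColoredPoint(int x, int y, String color) {
        super(x, y);
        this.color = color;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof ColoredPoint)) return false;
        return super.equals(obj) && color.equals(((ColoredPoint) obj).color);
    }

    @Override
    public int hashCode() { return 31 * super.hashCode() + color.hashCode(); }
}

class SymmetryDemo {
    public static void main(String[] args) {
        BasePoint p = new BasePoint(1, 2);
        ColoredPoint cp = new ColoredPoint(1, 2, "red");
        System.out.println(p.equals(cp)); // true: BasePoint ignores the color
        System.out.println(cp.equals(p)); // false: p is not a ColoredPoint
    }
}
```

The two calls disagree, so the symmetry requirement is violated. The pair also breaks consistency with hashCode(), since p considers cp equal while their hash codes differ.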

Room for improvement?
Building hashing into the root object class of the Java class library was a sensible design compromise: it makes hash-based containers much simpler and more efficient to use. However, several criticisms have been made of the approach to, and implementation of, hashing and equality in the Java class library. The hash-based containers in java.util are very convenient and easy to use, but may not be suitable for applications that require very high performance. While most of these issues are unlikely ever to change, they are worth keeping in mind when designing applications that rely heavily on the efficiency of hash-based containers. The criticisms include:

Too small a hash range. Using int instead of long as the return type of hashCode() increases the probability of hash collisions.


Poor distribution of hash values. The hash values of short strings and small integers are themselves small integers, and are close to the hash values of other "nearby" objects. A better-behaved hash function would distribute hash values more evenly across the hash range.


No defined hashing operations. While some classes, such as String and List, define a hash algorithm for combining the hash values of their constituent elements into a single hash value, the language specification does not define any approved means of combining the hash values of multiple objects into a new hash value. The technique used by List, String, and the example class A discussed earlier is simple, but far from mathematically ideal. Nor do the class libraries offer convenience implementations of any hashing algorithm that would simplify the creation of more sophisticated hashCode() implementations.


Difficulty writing equals() when extending an instantiable class that already overrides equals(). The "obvious" ways to define equals() when extending such a class all fail to meet the symmetry or transitivity requirements of the equals() method. This means that when you override equals(), you must understand the structure and implementation details of the class you are extending, and may even need to expose private fields of the base class, which violates the principles of good object-oriented design.
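The first two criticisms in the list above are easy to observe directly. The sketch below uses a well-known pair of colliding strings; the helper name collide is illustrative.

```java
class HashRangeDemo {
    // True when two different strings share the same hash code.
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // Two distinct two-character strings with the same 32-bit hash value.
        System.out.println("Aa".hashCode());     // 2112
        System.out.println("BB".hashCode());     // 2112
        System.out.println(collide("Aa", "BB")); // true

        // Small integers and short strings hash to small, adjacent values.
        System.out.println(Integer.valueOf(7).hashCode()); // 7
        System.out.println("a".hashCode());                // 97
        System.out.println("b".hashCode());                // 98
    }
}
```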
Conclusion
By defining equals() and hashCode() consistently, you can improve the usability of your classes as keys in hash-based collections. There are two approaches to defining equality and hash value: identity-based, which is the default provided by Object, and state-based, which requires overriding both equals() and hashCode(). If an object's hash value can change when its state changes, make sure you do not allow its state to change while it is in use as a hash key.