Java Theory and Practice: hashing

Defining hashCode() and equals() effectively and correctly

Every Java object has a hashCode() and an equals() method. Many classes override the default implementations of these methods to provide deeper semantic comparability between object instances. In this installment of Java Theory and Practice, Java developer Brian Goetz walks through the rules and guidelines you should follow when defining hashCode() and equals() effectively and correctly in your own classes.

Although the Java language does not directly support associative arrays (arrays that can take any object as an index), the presence of the hashCode() method in the root Object class clearly anticipates the widespread use of HashMap (and its predecessor, Hashtable). Under ideal conditions, hash-based containers offer both efficient insertion and efficient retrieval, and supporting hashing directly in the object model makes hash-based containers easier to develop and use.

Defining the equality of objects

The Object class has two methods for making inferences about an object's identity: equals() and hashCode(). In general, if you override one of these methods, you must override both, because there are important relationships between them that must be maintained. In particular, if two objects are equal according to the equals() method, they must have the same hashCode() value (although the reverse is not generally true).

The semantics of equals() for a given class are left to the implementer; defining what equals() means for a given class is part of the design work for that class. The default implementation, provided by Object, is simply reference equality:

  public boolean equals(Object obj) {
    return (this == obj);
  }

Under this default implementation, two references are equal only if they refer to the exact same object. Similarly, the default implementation of hashCode() provided by Object is typically derived by mapping the memory address of the object to an integer value. Because on some architectures the address space is larger than the range of values for int, it is possible for two distinct objects to have the same hashCode(). If you override hashCode(), you can still use the System.identityHashCode() method to access this default value.
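The default behavior described above can be observed directly. The following is a minimal sketch; the classes Plain and Overriding are hypothetical and exist only for this illustration.

```java
// Hypothetical class for illustration; it overrides neither equals() nor hashCode().
class Plain {
    final int value;
    Plain(int value) { this.value = value; }
}

// Hypothetical class that overrides hashCode(); the identity hash stays reachable
// through System.identityHashCode().
class Overriding {
    @Override
    public int hashCode() { return 42; }
}

class IdentityDefaultsDemo {
    // Two distinct instances with identical state are not equal by default.
    static boolean sameStateIsEqual() {
        return new Plain(7).equals(new Plain(7));
    }

    // Two references to the same object are equal.
    static boolean sameReferenceIsEqual() {
        Plain p = new Plain(7);
        Plain alias = p;
        return p.equals(alias);
    }

    // Without an override, hashCode() returns the identity hash.
    static boolean defaultHashIsIdentityHash() {
        Plain p = new Plain(7);
        return p.hashCode() == System.identityHashCode(p);
    }

    public static void main(String[] args) {
        System.out.println(sameStateIsEqual());           // false
        System.out.println(sameReferenceIsEqual());       // true
        System.out.println(defaultHashIsIdentityHash());  // true

        Overriding o = new Overriding();
        System.out.println(o.hashCode());                 // 42
        // The identity hash is still available; its value varies from run to run.
        System.out.println(System.identityHashCode(o));
    }
}
```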

Overriding equals(): a simple example

An identity-based implementation of equals() and hashCode() is a sensible default, but for some classes it is desirable to relax the definition of equality somewhat. For example, the Integer class defines equals() similarly to this:

  public boolean equals(Object obj) {
    return (obj instanceof Integer
            && intValue() == ((Integer) obj).intValue());
  }

Under this definition, two Integer objects are equal only if they contain the same integer value. This, together with the fact that Integer is immutable, makes it practical to use an Integer as a key in a HashMap. This value-based approach to equality is used by all the primitive wrapper classes in the Java class library, such as Integer, Float, Character, and Boolean, as well as by String (two String objects are equal if they contain the same sequence of characters). Because these classes are immutable and implement hashCode() and equals() sensibly, they all make good hash keys.
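As a small illustration of value-based keys, the sketch below stores a value under one String instance and retrieves it with a different instance holding the same characters. The class and method names are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

class ValueKeyDemo {
    // new String(...) forces a distinct instance, so the two keys are different objects.
    static boolean keysAreDistinctObjects() {
        String putKey = new String("alice");
        String getKey = new String("alice");
        return putKey != getKey;
    }

    // The lookup succeeds because String defines equals() and hashCode() by value.
    static Integer lookupWithDistinctKeyInstance() {
        Map<String, Integer> ages = new HashMap<>();
        ages.put(new String("alice"), 30);
        return ages.get(new String("alice"));
    }

    // The same holds for Integer keys.
    static String lookupWithIntegerKey() {
        Map<Integer, String> names = new HashMap<>();
        names.put(Integer.valueOf(1_000_000), "one million");
        return names.get(Integer.parseInt("1000000"));
    }

    public static void main(String[] args) {
        System.out.println(keysAreDistinctObjects());        // true
        System.out.println(lookupWithDistinctKeyInstance()); // 30
        System.out.println(lookupWithIntegerKey());          // one million
    }
}
```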

Why override equals() and hashCode()?

What would happen if Integer did not override equals() and hashCode()? Nothing, as long as we never used an Integer as a key in a HashMap or other hash-based collection. However, if we did use such an Integer object as a key in a HashMap, we would not be able to reliably retrieve the associated value unless the get() call used the exact same Integer instance as the put() call. This would require ensuring that only a single instance of the Integer object corresponding to a particular integer value is used throughout the entire program. Needless to say, this approach would be inconvenient and error prone.
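The scenario can be reproduced with a wrapper that keeps the identity-based defaults. This is a sketch only; RawInt is a hypothetical class standing in for "an Integer that does not override equals() and hashCode()".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper that, unlike Integer, keeps the identity-based defaults.
class RawInt {
    final int value;
    RawInt(int value) { this.value = value; }
}

class IdentityKeyDemo {
    // A different instance with the same value is a different key.
    static String lookupWithNewInstance() {
        Map<RawInt, String> names = new HashMap<>();
        names.put(new RawInt(42), "answer");
        return names.get(new RawInt(42));
    }

    // Only the exact instance used in put() finds the entry.
    static String lookupWithSameInstance() {
        Map<RawInt, String> names = new HashMap<>();
        RawInt key = new RawInt(42);
        names.put(key, "answer");
        return names.get(key);
    }

    public static void main(String[] args) {
        System.out.println(lookupWithNewInstance());  // null
        System.out.println(lookupWithSameInstance()); // answer
    }
}
```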

The interface contract for Object requires that if two objects are equal according to equals(), they must have the same hashCode() value. Why does the root object class need hashCode() at all, when its discriminating ability is entirely subsumed by that of equals()? The hashCode() method exists purely for efficiency. The Java platform architects anticipated the importance of hash-based collection classes, such as Hashtable, HashMap, and HashSet, in typical Java applications, and comparing against many objects with equals() can be computationally expensive. Having every Java object support hashCode() allows for efficient storage and retrieval using hash-based collections.

Requirements for implementing equals() and hashCode()

There are some restrictions on the behavior of equals() and hashCode(), which are enumerated in the documentation for Object. In particular, the equals() method must exhibit the following properties:

    • Symmetry: For two references, a and b, a.equals(b) if and only if b.equals(a)
    • Reflexivity: For all non-null references, a.equals(a)
    • Transitivity: If a.equals(b) and b.equals(c), then a.equals(c)
    • Consistency with hashCode(): Two equal objects must have the same hashCode() value
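These four properties can be spot-checked over a handful of sample objects. The helper below is a rough sketch, not a library facility: its name and the sample values are invented here, it assumes every sample is non-null, and passing it shows only that no violation was found among the samples, not that a class is correct in general.

```java
import java.util.Arrays;
import java.util.List;

class EqualsContractCheck {
    // Returns true if no combination drawn from the (non-null) samples violates
    // reflexivity, symmetry, transitivity, or consistency with hashCode().
    static boolean satisfiesContract(List<?> samples) {
        for (Object a : samples) {
            if (!a.equals(a)) return false;                        // reflexivity
            for (Object b : samples) {
                if (a.equals(b) != b.equals(a)) return false;      // symmetry
                if (a.equals(b) && a.hashCode() != b.hashCode()) {
                    return false;                                  // consistency with hashCode()
                }
                for (Object c : samples) {
                    if (a.equals(b) && b.equals(c) && !a.equals(c)) {
                        return false;                              // transitivity
                    }
                }
            }
        }
        return true;
    }

    // A few value objects from the class library, including equal but distinct instances.
    static List<Object> sampleValues() {
        return Arrays.<Object>asList("a", new String("a"), "b", 1, Integer.valueOf(1), 2L);
    }

    public static void main(String[] args) {
        System.out.println(satisfiesContract(sampleValues())); // true
    }
}
```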

The specification for Object offers only a vague guideline that equals() and hashCode() be consistent, meaning that their results will be the same for subsequent invocations, "provided no information used in equals comparisons on the object is modified." This sounds rather like "the result of the calculation shouldn't change, unless it does." This vague statement is generally interpreted to mean that equality and hash value calculations should be a deterministic function of an object's state and nothing else.

What does object equality mean?

The requirements that the Object class specification places on equals() and hashCode() are fairly easy to meet. Deciding whether, and how, to override equals() requires a little more judgment. For simple immutable value classes such as Integer (and in fact for nearly all immutable classes), the choice is fairly obvious: equality should be based on the equality of the underlying object state. In the case of Integer, the object's only state is the underlying integer value.

For mutable objects, the answer is not always so clear. Should equals() and hashCode() be based on the object's identity (like the default implementation) or on the object's state (like Integer and String)? There is no easy answer; it depends on the intended use of the class. For containers such as List and Map, a reasonable argument could be made either way. Most classes in the Java class library, including the container classes, provide equals() and hashCode() implementations based on the object's state.

If an object's hashCode() value can change based on its state, then we must be careful when using such objects as keys in hash-based collections to ensure that their state does not change while they are in use as hash keys. All hash-based collections assume that an object's hash value does not change while it is in use as a key in the collection. If a key's hash code were to change while the key was in a collection, some unpredictable and confusing consequences could follow. This is usually not a problem in practice, because it is uncommon to use a mutable object such as a List as a key in a HashMap.
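To make the hazard concrete, the sketch below mutates a List while it is in use as a HashMap key. The class name is made up for this example, and the "lost entry" outcome is what java.util.HashMap is expected to produce here rather than something the specification promises.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MutableKeyDemo {
    // Returns { foundBeforeMutation, hashChanged, foundAfterMutation, entryStillStored }.
    static boolean[] run() {
        Map<List<Integer>, String> map = new HashMap<>();
        List<Integer> key = new ArrayList<>();
        key.add(1);
        map.put(key, "value");

        boolean foundBefore = map.containsKey(key);
        int hashBefore = key.hashCode();

        key.add(2); // mutates the key while it is stored in the map

        boolean hashChanged = key.hashCode() != hashBefore;
        // The map filed the entry under the old hash value, so this lookup misses.
        boolean foundAfter = map.containsKey(key);
        boolean entryStillStored = map.size() == 1;
        return new boolean[] { foundBefore, hashChanged, foundAfter, entryStillStored };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("found before mutation: " + r[0]);
        System.out.println("hash changed:          " + r[1]);
        System.out.println("found after mutation:  " + r[2]);
        System.out.println("entry still stored:    " + r[3]);
    }
}
```

The entry is still in the map, but it can no longer be found through its own key.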

An example of a simple mutable class that defines equals() and hashCode() based on its state is Point. Two Point objects are equal if they refer to the same (x, y) coordinates, and the hash value of a Point is derived from the IEEE 754 bit representation of the x and y coordinate values.
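A state-based mutable point might be sketched as follows. SimplePoint is a hypothetical class written for this illustration, not the library's Point; it compares and hashes the IEEE 754 bit patterns of its coordinates so that equals() and hashCode() stay consistent with each other.

```java
// Hypothetical mutable point; equality and hash value both depend on the current state.
class SimplePoint {
    private double x;
    private double y;

    SimplePoint(double x, double y) {
        this.x = x;
        this.y = y;
    }

    void move(double dx, double dy) {
        x += dx;
        y += dy;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof SimplePoint)) return false;
        SimplePoint p = (SimplePoint) other;
        // Compare bit patterns so equality agrees with the hash computation below.
        return Double.doubleToLongBits(x) == Double.doubleToLongBits(p.x)
            && Double.doubleToLongBits(y) == Double.doubleToLongBits(p.y);
    }

    @Override
    public int hashCode() {
        long xBits = Double.doubleToLongBits(x);
        long yBits = Double.doubleToLongBits(y);
        int hash = 1;
        hash = 31 * hash + (int) (xBits ^ (xBits >>> 32)); // fold 64 bits into 32
        hash = 31 * hash + (int) (yBits ^ (yBits >>> 32));
        return hash;
    }
}

class SimplePointDemo {
    public static void main(String[] args) {
        SimplePoint a = new SimplePoint(1.5, 2.0);
        SimplePoint b = new SimplePoint(1.5, 2.0);
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
        b.move(1.0, 0.0);
        System.out.println(a.equals(b));                  // false
    }
}
```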

For more complex classes, the behavior of equals() and hashCode() may even be imposed by the specification of a superclass or interface. For example, the List interface requires that a List object be equal to another object if and only if the other object is also a List and the two contain the same elements (as defined by Object.equals() on the elements) in the same order. The requirements for hashCode() are even more specific: the hashCode() value of a list must conform to the following calculation:

  int hashCode = 1;
  Iterator i = list.iterator();
  while (i.hasNext()) {
      Object obj = i.next();
      hashCode = 31*hashCode + (obj==null ? 0 : obj.hashCode());
  }

Not only does the hash value depend on the contents of the list, but the specific algorithm for combining the hash values of the individual elements is specified as well. (The String class specifies a similar algorithm for computing the hash value of a String.)
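Because the algorithm is part of the specification, it can be recomputed by hand and compared with what the library returns. The sketch below does this for a list and for a string; the class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.List;

class ListHashDemo {
    // The calculation required by the List specification, written as a for-each loop.
    static int manualListHash(List<?> list) {
        int hashCode = 1;
        for (Object obj : list) {
            hashCode = 31 * hashCode + (obj == null ? 0 : obj.hashCode());
        }
        return hashCode;
    }

    // The analogous calculation documented for String: s[0]*31^(n-1) + ... + s[n-1].
    static int manualStringHash(String s) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash = 31 * hash + s.charAt(i);
        }
        return hash;
    }

    public static void main(String[] args) {
        List<Object> list = Arrays.<Object>asList("a", null, 3);
        System.out.println(manualListHash(list) == list.hashCode());      // true
        System.out.println(manualStringHash("hash") == "hash".hashCode()); // true
        System.out.println(manualStringHash("ab"));                        // 3105
    }
}
```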

Writing your own equals() and hashCode() methods

Overriding the default equals() method is fairly easy, but overriding an equals() method that has already been overridden can be extremely tricky to do without violating either the symmetry or the transitivity requirement. When overriding equals(), you should always include some Javadoc comments on equals() to help those who might want to extend your class do so correctly.

As a simple example, consider the following class:

  class A {
    final B someNonNullField;
    C someOtherField;
    int someNonStateField;
  }

How would we write the equals() method for this class? This approach is suitable for many cases:

  public boolean equals(Object other) {
    // Not strictly necessary, but often a good optimization
    if (this == other)
      return true;
    if (!(other instanceof A))
      return false;
    A otherA = (A) other;
    return
      (someNonNullField.equals(otherA.someNonNullField))
        && ((someOtherField == null)
            ? otherA.someOtherField == null
            : someOtherField.equals(otherA.someOtherField));
  }

Now that we have defined equals(), we must define hashCode() in a compatible manner. One compatible, but not very useful, way to define hashCode() is this:

  public int hashCode() { return 0; }

This approach yields terrible performance for HashMaps with a large number of entries, because every key lands in the same bucket, but it does conform to the specification. A more sensible implementation of hashCode() for A would be:

  public int hashCode() {
    int hash = 1;
    hash = hash * 31 + someNonNullField.hashCode();
    hash = hash * 31
                + (someOtherField == null ? 0 : someOtherField.hashCode());
    return hash;
  }
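Putting the two methods together gives a class that can be tried in a hash-based collection. This is a sketch under stated assumptions: the article does not define the types B and C, so String and Integer stand in for them here, and the constructor and demo class are additions for the example.

```java
import java.util.HashSet;
import java.util.Set;

class A {
    final String someNonNullField; // stands in for type B; assumed never null
    Integer someOtherField;        // stands in for type C; may be null
    int someNonStateField;         // deliberately ignored by equals() and hashCode()

    A(String someNonNullField, Integer someOtherField) {
        this.someNonNullField = someNonNullField;
        this.someOtherField = someOtherField;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof A)) return false;
        A otherA = (A) other;
        return someNonNullField.equals(otherA.someNonNullField)
            && (someOtherField == null
                ? otherA.someOtherField == null
                : someOtherField.equals(otherA.someOtherField));
    }

    @Override
    public int hashCode() {
        int hash = 1;
        hash = hash * 31 + someNonNullField.hashCode();
        hash = hash * 31 + (someOtherField == null ? 0 : someOtherField.hashCode());
        return hash;
    }
}

class ADemo {
    public static void main(String[] args) {
        A first = new A("key", null);
        A second = new A("key", null);
        second.someNonStateField = 99; // differs, but is not part of the object's state

        Set<A> set = new HashSet<>();
        set.add(first);
        System.out.println(first.equals(second));                  // true
        System.out.println(first.hashCode() == second.hashCode()); // true
        System.out.println(set.contains(second));                  // true
        System.out.println(set.contains(new A("key", 1)));         // false
    }
}
```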

Note that both of these implementations delegate part of the computation to the equals() or hashCode() methods of the state fields of the class. Depending on your class, you may also want to delegate part of the computation to the equals() or hashCode() method of the superclass. For primitive fields, there are helper functions in the associated wrapper classes that can help in creating hash values, such as Float.floatToIntBits.
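One reason to go through Float.floatToIntBits for float fields, rather than comparing with ==, is that the two disagree on a few special values. The sketch below shows those cases along with one common way to fold a long into an int; the class and method names are invented for the example.

```java
class PrimitiveFieldDemo {
    // 0.0f and -0.0f compare equal with ==, but have different bit patterns.
    static boolean zerosEqualByOperator() {
        return 0.0f == -0.0f;
    }

    static boolean zerosEqualByBits() {
        return Float.floatToIntBits(0.0f) == Float.floatToIntBits(-0.0f);
    }

    // NaN is never == to itself, but floatToIntBits maps NaN to one canonical pattern.
    static boolean nanEqualByOperator() {
        float nan = Float.NaN;
        return nan == nan;
    }

    static boolean nanEqualByBits() {
        return Float.floatToIntBits(Float.NaN) == Float.floatToIntBits(Float.NaN);
    }

    // Hash contribution of a float field.
    static int hashOf(float f) {
        return Float.floatToIntBits(f);
    }

    // Hash contribution of a long field: fold the upper half into the lower half.
    static int hashOf(long l) {
        return (int) (l ^ (l >>> 32));
    }

    public static void main(String[] args) {
        System.out.println(zerosEqualByOperator()); // true
        System.out.println(zerosEqualByBits());     // false
        System.out.println(nanEqualByOperator());   // false
        System.out.println(nanEqualByBits());       // true
        System.out.println(hashOf(0.0f));           // 0
        System.out.println(hashOf(1L));             // 1
    }
}
```

Whichever comparison equals() uses for such a field, hashCode() should be derived the same way, so that equal objects always produce equal hash values.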

Writing an equals() method is not without pitfalls. In general, it is impractical to cleanly override equals() when extending an instantiable class that itself overrides equals(), and writing an equals() method that is intended to be overridden (as in an abstract class) is done differently from writing an equals() method for a concrete class. For examples and more detail on why this is so, see Effective Java Programming Language Guide, Item 7 (in Resources).
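The difficulty with extending a class that already overrides equals() shows up in even a tiny hierarchy. The following sketch uses two hypothetical classes, Base and Labeled, and the "obvious" subclass override; it illustrates the pitfall and is not a recommended design.

```java
// Hypothetical instantiable class with a value-based equals().
class Base {
    final int id;
    Base(int id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Base)) return false;
        return id == ((Base) o).id;
    }

    @Override
    public int hashCode() { return id; }
}

// Hypothetical subclass that adds state and overrides equals() the "obvious" way.
class Labeled extends Base {
    final String label;
    Labeled(int id, String label) {
        super(id);
        this.label = label;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Labeled)) return false;
        return super.equals(o) && label.equals(((Labeled) o).label);
    }

    @Override
    public int hashCode() { return 31 * super.hashCode() + label.hashCode(); }
}

class SymmetryDemo {
    public static void main(String[] args) {
        Base base = new Base(1);
        Labeled labeled = new Labeled(1, "x");

        // Base sees only the id, so it considers the two objects equal...
        System.out.println(base.equals(labeled)); // true
        // ...but Labeled rejects anything that is not a Labeled: symmetry is broken.
        System.out.println(labeled.equals(base)); // false
        // The hashCode() contract is broken too: "equal" objects, different hash values.
        System.out.println(base.hashCode() == labeled.hashCode()); // false
    }
}
```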

Room for improvement?

Building hashing into the root object class of the Java class library was a very sensible design compromise, because it makes using hash-based containers much easier and more efficient. However, several criticisms have been made of the approach to, and implementation of, hashing and equality in the Java class library. The hash-based containers in java.util are very convenient and easy to use, but they may not be suitable for applications that require very high performance. While most of these issues will never be changed, they are worth keeping in mind when you design applications that rely heavily on the efficiency of hash-based containers. They include:

  • Too small a hash range. Using int rather than long as the return type of hashCode() increases the likelihood of hash collisions.
  • Poor distribution of hash values. The hash values of short strings and small integers are themselves small integers, and are close to the hash values of other "nearby" objects. A better-behaved hash function would distribute the hash values more evenly across the hash range.
  • No defined hashing operations. Although some classes, such as String and List, define a hash algorithm for combining the hash values of their constituent elements into a single hash value, the language specification does not define any approved means of combining the hash values of multiple objects into a new hash value. The trick used by List, String, and the example class A in "Writing your own equals() and hashCode() methods" is simple, but far from mathematically ideal. Nor do the class libraries offer convenience implementations of any hashing algorithm that would simplify the creation of more sophisticated hashCode() implementations.
  • Difficulty writing equals() when extending an instantiable class that already overrides equals(). The "obvious" ways to define equals() in that situation all fail to meet the symmetry or transitivity requirements of the equals() method. This means that you must understand the structure and implementation details of the class you are extending when overriding equals(), and may even need to expose private fields of the base class, which violates principles of good object-oriented design.
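The first two criticisms are easy to observe with the standard String and Integer hash functions. The sketch below prints a few hash values; the class and method names are invented for the example.

```java
class HashDistributionDemo {
    // A small Integer hashes to its own value.
    static int hashOfInt(int i) {
        return Integer.valueOf(i).hashCode();
    }

    // A short String hashes to a small integer close to those of neighboring strings.
    static int hashOfString(String s) {
        return s.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(hashOfInt(5));       // 5
        System.out.println(hashOfInt(6));       // 6
        System.out.println(hashOfString("a"));  // 97
        System.out.println(hashOfString("b"));  // 98
        // Two different two-character strings can already collide.
        System.out.println(hashOfString("Aa")); // 2112
        System.out.println(hashOfString("BB")); // 2112
    }
}
```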

Conclusion

By defining equals() and hashCode() consistently, you can improve the usability of your classes as keys in hash-based collections. There are two approaches to defining equality and hash values for an object: identity-based, which is the default provided by Object, and state-based, which requires overriding both equals() and hashCode(). If an object's hash value can change when its state changes, be sure that you do not allow its state to change while it is being used as a hash key.

Resources
  • Read the complete set of Java Theory and Practice articles by Brian Goetz, in particular the February 2003 installment, "To mutate or not to mutate?", which discusses the hazards of using mutable objects as hash keys.
  • Items 7 and 8 of Joshua Bloch's Effective Java Programming Language Guide cover the issues surrounding equals() and hashCode() in detail.
  • Tony Sintes's JavaWorld article explains how hash-based containers work and how they use equals() and hashCode() (July 2002).
  • In these slides, Robert Uzgalis of the Department of Computer Science at the University of Auckland, New Zealand, presents some criticisms of the Java hashing model and explains the issues behind some hash functions.
  • Mark Roulo's article "How to avoid traps and correctly override methods from java.lang.Object" (JavaWorld, January 1999) offers some real example code for overriding equals() and hashCode().
  • This technical report from the Department of Computer Science at the University of Canterbury, New Zealand, describes in detail what makes an effective hash function (PDF).
  • Sreekanth Iyer, a software engineer at IBM Software Labs, explores the various meanings of Java object equality (developerWorks, September 2002).
  • A JavaWorld tip, reprinted with permission, touches on pitfalls in equality comparison.
  • Hundreds of references on Java technology can be found in the developerWorks Java technology zone.
