[Repost] 3 things you need to know about hashCode()


In Java, every object has a hashCode() method. It is easy to understand, yet it is still sometimes forgotten or misused. Keep the following three things in mind to avoid the common pitfalls.

The hash code of an object lets algorithms and data structures sort objects into compartments, just like the letter types in a printer's type case. The printer puts all the "a" types into one compartment, and when it needs an "a" it only has to look in that compartment. This simple system makes finding types much faster than searching an unsorted drawer. The same idea underlies hash-based collections such as HashMap and HashSet.
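To make the type-case analogy concrete, here is a minimal sketch of a bucketed set in Java. The class name `TypeCaseSet` and its fixed bucket count are illustrative assumptions; this is not how `java.util.HashSet` is actually implemented (the real class resizes and handles many more details). The point is only that `hashCode()` chooses the compartment and `equals()` finds the object inside it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, minimal hash set illustrating the "type case" idea:
// hashCode() picks the compartment (bucket), equals() finds the object in it.
final class TypeCaseSet<T> {
    private final List<List<T>> buckets = new ArrayList<>();

    TypeCaseSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Map a hash code onto a bucket index; floorMod keeps the index non-negative.
    private List<T> bucketFor(Object o) {
        return buckets.get(Math.floorMod(o.hashCode(), buckets.size()));
    }

    boolean add(T item) {
        List<T> bucket = bucketFor(item);
        if (bucket.contains(item)) { // List.contains() compares with equals()
            return false;
        }
        bucket.add(item);
        return true;
    }

    // Only one bucket is searched, no matter how many items the set holds.
    boolean contains(Object item) {
        return bucketFor(item).contains(item);
    }
}
```

With this layout a lookup touches a single bucket, which is why hash-based collections stay fast as long as objects spread well across the buckets.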

For your class to work correctly with hash-based collections and other algorithms that rely on hash codes, every hashCode() implementation must adhere to a simple contract.

The hashCode() contract

The contract is spelled out in the Javadoc of the hashCode() method. It can be roughly summarized as follows:

    1. Within a running process, equal objects must have the same hash code.

Note that this does not imply the following common misconceptions:

    • Unequal objects must have different hash codes: wrong!
    • Objects with the same hash code must be equal: wrong!

This contract allows different objects to share the same hash code; for example, an "A" object and a "μ" object may have the same hash value. In mathematical terms, the mapping from objects to hash codes does not have to be injective, let alone bijective. This is obvious, because the number of possible distinct objects is usually larger than the number of possible hash codes (2^32).

Edit: In an earlier version, I mistakenly wrote that the mapping to hash codes must be injective, but not necessarily bijective, which is obviously wrong. Thanks to Lucian for pointing out the mistake.
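The pigeonhole argument can be checked directly with `java.lang.Long`, whose `hashCode()` is documented as `(int) (value ^ (value >>> 32))`. The small demo class below is a hypothetical illustration: there are 2^64 `long` values but only 2^32 `int` hash codes, so unequal values must sometimes share a hash code. Here `0L` and `4294967297L` (2^32 + 1) both hash to 0.

```java
// Hypothetical demo of the pigeonhole argument using Long.hashCode(),
// which XORs the high and low 32-bit halves of the value.
final class PigeonholeDemo {
    static boolean unequalButSameHash(Object a, Object b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        Long zero = 0L;
        // High half is 1, low half is 1, so the XOR of the halves is 0.
        Long big = (1L << 32) | 1L;
        System.out.println(unequalButSameHash(zero, big)); // prints true
    }
}
```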

This contract leads directly to the first rule:

1. Whenever you override equals(), you must override hashCode() as well.
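A minimal sketch of rule 1, using a hypothetical `Point` value class: `equals()` and `hashCode()` are overridden together and built from the same fields, here with `java.util.Objects.hash`. If `hashCode()` were left out, the lookup in the demo would typically fail, because two equal points would usually get different identity-based hash codes and land in different buckets.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class: equals() and hashCode() use exactly the same
// fields, so equal points are guaranteed to have equal hash codes.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point other = (Point) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y); // same fields as equals()
    }
}

final class PointDemo {
    public static void main(String[] args) {
        Set<Point> points = new HashSet<>();
        points.add(new Point(1, 2));
        // A different but equal instance lands in the same bucket and is found.
        System.out.println(points.contains(new Point(1, 2))); // prints true
    }
}
```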

How to write a good hashCode() method is beyond the scope of this article. It is covered well in Joshua Bloch's popular book "Effective Java", which should not be missing from any Java developer's bookshelf.


To be on the safe side, let the Eclipse IDE generate equals() and hashCode() as a pair: Source > Generate hashCode() and equals()...

To protect yourself further, you can also configure Eclipse to detect classes that implement equals() but not hashCode() and report them as errors. Unfortunately, this option defaults to "Ignore". To change it, go to Preferences > Java > Compiler > Errors/Warnings and use the quick filter to search for "hashcode".

Update: As Laurent points out, EqualsVerifier is a powerful tool for verifying the contract of hashCode() and equals(). You should consider using it in your unit tests.

Hash code collisions

Whenever two different objects have the same hash code, we call this a collision. A collision is nothing critical: it just means that there is more than one object in the same compartment, so a HashMap has to check again (with equals()) to find the right object. Many collisions degrade performance, but they do not cause wrong results.

But if you mistakenly treat the hash code as a unique handle for an object, for example by using it as a key in a map, you will sometimes get the wrong object. Although collisions are rare, they are inevitable. For example, the strings "Aa" and "BB" produce the same hash code: 2112. So:

2. Never misuse a hash code as a key
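The difference between the two uses can be sketched with the "Aa"/"BB" collision from above. The class and method names are hypothetical: keying a `HashMap` by the strings themselves works despite the collision, while keying it by their hash codes silently loses an entry.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of rule 2 using two strings that share hash code 2112.
final class HashKeyPitfall {
    // Correct: the String itself is the key. The keys collide, but HashMap
    // tells them apart with equals().
    static Map<String, String> keyedByValue() {
        Map<String, String> map = new HashMap<>();
        map.put("Aa", "mail for Aa");
        map.put("BB", "mail for BB");
        return map;
    }

    // Wrong: the hash code is the key, so the entry for "BB" silently
    // replaces the entry for "Aa".
    static Map<Integer, String> keyedByHashCode() {
        Map<Integer, String> map = new HashMap<>();
        map.put("Aa".hashCode(), "mail for Aa");
        map.put("BB".hashCode(), "mail for BB");
        return map;
    }

    public static void main(String[] args) {
        System.out.println(keyedByValue().get("Aa"));               // prints mail for Aa
        System.out.println(keyedByHashCode().get("Aa".hashCode())); // prints mail for BB
    }
}
```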

You might object that, unlike the printer's type case, Java has 4,294,967,296 compartments (2^32 possible int values). With 4 billion slots, collisions seem extremely unlikely, right?

It turns out they are not that unlikely. Here is the surprising math of collisions: imagine 23 random people in a room. How likely is it that two of them share a birthday? Pretty low, since there are 365 days in a year? In fact, the odds are about 50%! And with 50 people, it is a safe bet. This phenomenon is known as the birthday paradox. Applied to hash codes, it means that with 77,163 different objects you have a 50% chance of a collision, assuming an ideal hash function that distributes objects evenly over all available slots.
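The birthday numbers can be reproduced with the standard formula for the chance that at least two of n items share one of N equally likely slots: P = 1 - prod_{i=0}^{n-1} (1 - i/N). The sketch below assumes, as the text does, an ideal hash function that spreads objects uniformly; `BirthdayParadox` is a hypothetical name. Evaluating it gives roughly 0.507 for 23 people, about 0.970 for 50 people, and about 0.500 for 77,163 objects spread over 2^32 hash codes.

```java
// Hypothetical helper: probability that at least two of n items land in the
// same slot when each of `slots` slots is equally likely (ideal hash function).
final class BirthdayParadox {
    static double collisionProbability(long n, double slots) {
        double noCollision = 1.0;
        for (long i = 0; i < n; i++) {
            // The (i+1)-th item must avoid the i slots already taken.
            noCollision *= 1.0 - i / slots;
        }
        return 1.0 - noCollision;
    }

    public static void main(String[] args) {
        System.out.printf("23 people: %.3f%n", collisionProbability(23, 365));
        System.out.printf("50 people: %.3f%n", collisionProbability(50, 365));
        System.out.printf("77,163 objects, 2^32 hash codes: %.3f%n",
                collisionProbability(77_163, Math.pow(2, 32)));
    }
}
```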

For example:

The Enron email data set contains 520,924 emails. Computing the String hash codes of the email contents, I found 50 pairs (and even 2 triples) of different emails with the same hash code. For half a million strings, this is a good result. But the message is clear: if you have many data items, collisions will occur. If you were using the hash code as a key, you would not notice your mistake immediately, but a few people would get the wrong email.
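An experiment like the Enron count can be sketched by grouping distinct strings by `hashCode()` and counting the groups that contain more than one string. The helper below is hypothetical and does not reproduce the Enron data set; a quick check can use strings built from "Aa" and "BB", which collide by construction (for example "AaAa", "AaBB", "BBAa" and "BBBB" all share one hash code).

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper for counting hash code collisions among distinct strings.
final class CollisionCounter {
    // Returns: group size -> number of hash codes shared by exactly that many
    // distinct strings (2 = pairs, 3 = triples, ...). Duplicates are ignored.
    static Map<Integer, Integer> collisionGroups(Collection<String> texts) {
        Map<Integer, Set<String>> byHash = new HashMap<>();
        for (String text : texts) {
            byHash.computeIfAbsent(text.hashCode(), h -> new HashSet<>()).add(text);
        }
        Map<Integer, Integer> groups = new TreeMap<>();
        for (Set<String> distinct : byHash.values()) {
            if (distinct.size() > 1) {
                groups.merge(distinct.size(), 1, Integer::sum);
            }
        }
        return groups;
    }
}
```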

Hash codes can change

Finally, there is one detail in the hashCode() contract that can be quite surprising: hashCode() does not guarantee the same result in different executions of an application. Let's take a look at the Javadoc:

"Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application."

In practice, this is uncommon. Some classes in the Java class library even specify the exact formula they use to compute their hash codes (String, for example), so for those classes the hash code is always the same. But although most hashCode() implementations return stable values, you must not rely on it. As this article points out, some libraries really do return different hash codes in different processes, which tends to confuse people. Google's Protocol Buffers is one example.
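For String, the Javadoc specifies the formula `s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]`, computed with `int` arithmetic. The hypothetical helper below recomputes it with Horner's rule, so it should match `String.hashCode()` on any conforming JVM. For classes whose hashCode() is not specified this way (such as the identity-based `Object.hashCode()`), no such guarantee exists.

```java
// Hypothetical helper that recomputes String.hashCode() from the documented
// formula s[0]*31^(n-1) + ... + s[n-1]; int overflow simply wraps around.
final class StringHashFormula {
    static int documentedHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // Horner's rule for the polynomial
        }
        return h;
    }
}
```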

Therefore, you should not use hash codes in distributed applications: a remote object may have a different hash code from a local object, even if the two are equal.

3. Do not use hash codes in distributed applications

In addition, be aware that the implementation of hashCode() may change from one version to another. Your code should therefore not depend on any particular hash code value. For example, you should not use hash codes to persist state: the next time you run the program, the hash code of the "same" object may be different.
The best advice is probably not to use hashCode() at all, unless you are implementing a hash-based algorithm yourself.

An alternative: SHA-1

You may know that cryptographic hash functions such as SHA-1 are sometimes used to identify objects (Git does this, for example). Is that unsafe as well? No. SHA-1 produces 160-bit digests, which makes accidental collisions almost impossible. Even with a huge number of objects, the odds of a collision in this space are far lower than the odds of a meteor striking the computer running your program. (This applies to accidental collisions; SHA-1 is no longer considered resistant to deliberately constructed ones.) This article gives a good overview of collision probabilities.
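A minimal sketch of deriving a stable, content-based identifier with SHA-1 through the standard `java.security.MessageDigest` API. `ContentId` and `sha1Hex` are hypothetical names. The digest is computed over the UTF-8 bytes of the string only, so it is not the same as a Git object ID, which Git computes over a header (such as `blob <size>` followed by a NUL byte) plus the content. Unlike hashCode(), the result is the same in every process and on every machine.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: a stable SHA-1 identifier for some text content.
final class ContentId {
    static String sha1Hex(String content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            byte[] hash = digest.digest(content.getBytes(StandardCharsets.UTF_8));
            // Render the 20-byte (160-bit) digest as 40 lowercase hex characters.
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Java platforms are required to support SHA-1, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```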

