Guava 10-Hash

Source: Internet
Author: User
Tags google guava

Overview

Java's built-in hash code concept is constrained to 32 bits and does not separate hash algorithms from the data they act on, so alternative hash algorithms cannot easily be swapped in. In addition, hash codes implemented with Java's built-in approach tend to be of poor quality, in part because they ultimately depend on the low-quality hashCode implementations already in the JDK classes.

Object.hashCode implementations tend to be very fast, but they have weak collision prevention and no expectation of bit dispersion. This leaves them perfectly suitable for use in hash tables, because extra collisions cause only a slight performance hit, while poor bit dispersion is easily corrected by a secondary hash function (which all reasonable hash table implementations in Java use). For the many uses of hash functions beyond simple hash tables, however, Object.hashCode almost always falls short, hence the com.google.common.hash package.
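
To make the weak collision resistance concrete, here is a minimal sketch that uses only the JDK. The class name HashCodeCollisionDemo and the helper collide are made up for illustration; the strings are a well-known pair of distinct values that share a String.hashCode() value.

```java
class HashCodeCollisionDemo {
    // Returns true when two distinct strings share the same String.hashCode() value.
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // "Aa" and "BB" are different strings with identical hash codes.
        System.out.println(collide("Aa", "BB"));
        // Concatenating colliding blocks yields further collisions.
        System.out.println(collide("AaAa", "BBBB"));
    }
}
```

In a hash table such a collision only costs an extra equals() comparison, but in uses such as fingerprinting or Bloom filters it would be a real defect.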

Composition of the hash package

The package's Javadoc lists many different classes, but it does not make obvious how they fit together. Before introducing the classes in the hash package, consider the following code example:

    HashFunction hf = Hashing.md5();
    HashCode hc = hf.newHasher()
        .putLong(id)
        .putString(name, Charsets.UTF_8)
        .putObject(person, personFunnel)
        .hash();

HashFunction

HashFunction is a pure, stateless function that maps an arbitrary block of data to a fixed number of bits. Equal inputs always yield equal outputs, and unequal inputs yield unequal outputs as often as possible.

Hasher

A HashFunction can be asked for a stateful Hasher, which provides fluent syntax for adding data to the hash computation and then retrieving the hash value. A Hasher accepts any primitive input, byte arrays, slices of byte arrays, character sequences, character sequences in a given charset, and so on, or any object for which a Funnel is provided.

Hasher implements the PrimitiveSink interface, which specifies a fluent API for an object that accepts a stream of primitive values.

Funnel

A Funnel describes how to decompose a particular object type into primitive field values and write them to a PrimitiveSink. For example, given a class such as:

    class Person {
        final int id;
        final String firstName;
        final String lastName;
        final int birthYear;
    }

The corresponding funnel implementations may be:

    Funnel<Person> personFunnel = new Funnel<Person>() {
        @Override
        public void funnel(Person person, PrimitiveSink into) {
            // Feed each field of the person into the sink in a fixed order.
            into
                .putInt(person.id)
                .putString(person.firstName, Charsets.UTF_8)
                .putString(person.lastName, Charsets.UTF_8)
                .putInt(person.birthYear);
        }
    };

Note: putString("abc", Charsets.UTF_8).putString("def", Charsets.UTF_8) is fully equivalent to putString("ab", Charsets.UTF_8).putString("cdef", Charsets.UTF_8), because both produce the same byte sequence. This can cause unintended hash collisions. Adding separators of some kind helps eliminate them.
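
The effect described in the note can be sketched without Guava by looking at the raw bytes a sink would receive. In the sketch below, FieldEncodingDemo, plain, and lengthPrefixed are hypothetical names, and a length prefix is only one possible form of separator, chosen here as an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class FieldEncodingDemo {
    // Writes the strings back to back, so field boundaries are lost in the byte stream.
    static byte[] plain(String... fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String field : fields) {
            byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
        }
        return out.toByteArray();
    }

    // Writes each string's byte length before its bytes, so boundaries become part of the input.
    static byte[] lengthPrefixed(String... fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String field : fields) {
            byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
            out.write(ByteBuffer.allocate(4).putInt(bytes.length).array(), 0, 4);
            out.write(bytes, 0, bytes.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Same bytes either way: any hash of them must collide.
        System.out.println(Arrays.equals(plain("abc", "def"), plain("ab", "cdef")));
        // With length prefixes the two inputs differ.
        System.out.println(Arrays.equals(lengthPrefixed("abc", "def"), lengthPrefixed("ab", "cdef")));
    }
}
```

In a Funnel, the same idea would amount to writing each string's length (or a separator value that cannot occur in the data) before the string itself.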

HashCode

Once a Hasher has been given all of its input, its hash() method retrieves a HashCode. (The behavior of calling hash() more than once on the same Hasher is unspecified.) HashCode supports equality testing, provides the asInt(), asLong(), and asBytes() methods, and additionally offers writeBytesTo(array, offset, maxLength), which writes the first maxLength bytes of the hash value into the byte array.

Bloom filter [BloomFilter]

Bloom filters are an elegant application of hashing, and one that cannot be implemented simply on top of Object.hashCode(). In short, a Bloom filter is a probabilistic data structure that lets you test whether an object is definitely not in the filter, or was probably added to it. The Wikipedia page on Bloom filters is fairly comprehensive, and a tutorial on GitHub is also recommended.
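
To illustrate the data structure itself, here is a minimal sketch of a Bloom filter for strings built only on the JDK. It is not Guava's implementation: the class TinyBloomFilter, its constructor parameters, and the choice of deriving bit positions from a SHA-256 digest by double hashing are all assumptions made for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.BitSet;

class TinyBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    TinyBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Derives numHashes bit positions from two 32-bit pieces of a SHA-256 digest.
    private int[] positions(String value) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-256")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is a required JDK algorithm
        }
        ByteBuffer buffer = ByteBuffer.wrap(digest);
        int h1 = buffer.getInt();
        int h2 = buffer.getInt();
        int[] result = new int[numHashes];
        for (int i = 0; i < numHashes; i++) {
            result[i] = Math.floorMod(h1 + i * h2, numBits);
        }
        return result;
    }

    // Sets every bit position derived from the value.
    void put(String value) {
        for (int position : positions(value)) {
            bits.set(position);
        }
    }

    // False means "definitely never added"; true means "possibly added".
    boolean mightContain(String value) {
        for (int position : positions(value)) {
            if (!bits.get(position)) {
                return false;
            }
        }
        return true;
    }
}
```

A value that was added always tests positive, while a value that was never added may occasionally test positive when other values happen to have set all of its bits. This is why the filter needs several well-dispersed hash values per element, which a single 32-bit Object.hashCode() does not supply.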

Guava's hash package has a built-in Bloom filter implementation, which only requires that you provide a Funnel for your type. You can obtain a BloomFilter<T> with create(Funnel funnel, int expectedInsertions, double falsePositiveProbability); if the false positive probability is omitted, it defaults to 3%. BloomFilter<T> offers the methods boolean mightContain(T) and void put(T), which are self-explanatory.
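
The two numeric arguments determine how large the filter has to be. As a rough sketch, the textbook Bloom filter formulas are m = -n ln(p) / (ln 2)^2 for the number of bits and k = (m / n) ln 2 for the number of hash functions, where n is the expected number of insertions and p the target false positive probability. The class and method names below are hypothetical, and the article does not state that Guava sizes its filters exactly this way.

```java
class BloomFilterSizing {
    // m = -n * ln(p) / (ln 2)^2, rounded up to a whole number of bits.
    static long optimalNumBits(long expectedInsertions, double falsePositiveProbability) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(
                -expectedInsertions * Math.log(falsePositiveProbability) / (ln2 * ln2));
    }

    // k = (m / n) * ln 2, rounded to the nearest integer and never less than 1.
    static int optimalNumHashes(long expectedInsertions, long numBits) {
        return Math.max(1,
                (int) Math.round((double) numBits / expectedInsertions * Math.log(2)));
    }

    public static void main(String[] args) {
        long bits = optimalNumBits(1000, 0.03);
        System.out.println(bits + " bits, " + optimalNumHashes(1000, bits) + " hash functions");
    }
}
```

Lowering the false positive probability increases the number of bits per element, so the 3% default is a trade-off between memory and accuracy.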

A typical use looks like this, where expectedInsertions is the expected number of elements:

    BloomFilter<Person> friends = BloomFilter.create(personFunnel, expectedInsertions, 0.01);
    for (Person friend : friendsList) {
        friends.put(friend);
    }

    // much later
    if (friends.mightContain(dude)) {
        // If dude is not actually a friend, the probability of reaching this point is 1%.
        // Here we might kick off some asynchronous loading while doing a more precise check.
    }

Hashing class

The Hashing class provides a number of stock hash functions, as well as utility methods for operating on HashCode objects.

Provided hash functions

    md5()       murmur3_128()    murmur3_32()             sha1()
    sha256()    sha512()         goodFastHash(int bits)

HashCode operations

Method | Description
HashCode combineOrdered(Iterable<HashCode>) | Combines hash codes in an ordered fashion: if two hash codes obtained from this method are equal, it is likely that each was computed from the same hash codes in the same order.
HashCode combineUnordered(Iterable<HashCode>) | Combines hash codes in an unordered fashion: if two hash codes obtained from this method are equal, it is likely that each was computed from the same hash codes in some order.
int consistentHash(HashCode, int buckets) | Returns a consistent hash of the hash code for the given number of buckets. It minimizes the need for remapping as the number of buckets grows. See consistent hashing.
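
To show what "minimizes remapping as the number of buckets grows" means in practice, here is a sketch of the published jump consistent hash algorithm (Lamping and Veach). It is offered as an example of a consistent hash with this property, not as a copy of Guava's code; the class name and the method bucketFor are hypothetical.

```java
class JumpConsistentHashDemo {
    // Maps a 64-bit key to a bucket in [0, buckets) with the jump consistent hash algorithm.
    static int bucketFor(long key, int buckets) {
        if (buckets <= 0) {
            throw new IllegalArgumentException("buckets must be positive");
        }
        long current = -1;
        long next = 0;
        while (next < buckets) {
            current = next;
            // Advance a 64-bit linear congruential generator seeded by the key.
            key = key * 2862933555777941757L + 1;
            // Jump ahead to the next bucket count at which this key would move.
            next = (long) ((current + 1)
                    * ((double) (1L << 31) / (double) ((key >>> 33) + 1)));
        }
        return (int) current;
    }

    public static void main(String[] args) {
        long key = 42L;
        for (int buckets = 1; buckets <= 8; buckets++) {
            System.out.println(buckets + " buckets -> bucket " + bucketFor(key, buckets));
        }
    }
}
```

When the bucket count goes from n to n + 1, a key either stays in its bucket or moves to the new bucket n, so only about 1/(n + 1) of the keys are remapped.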

Original article. When republishing, please credit: reproduced from the Concurrent Programming Network (ifeve.com). Link to this article: [Google Guava] 10-Hash
