Java implementation of a Bloom filter


The principle of a Bloom filter is simple. Hash a string into an integer key, and take a very long sequence of bits that all start at 0. Set the bit at position key from 0 to 1. The next time a string comes in, hash it to get its key; if the bit at that position is already 1, the string is taken to exist.

Done this way, however, it is no different from an ordinary hashing algorithm, and hashing has collisions: two different strings can hash to the same key.
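The single-hash scheme and its weakness can be sketched as follows. This is only an illustration, not code from the original article: the class name `SingleHashBitmap` is hypothetical, the bitmap is deliberately tiny (16 bits), and `String.hashCode()` stands in for whatever hash function is actually used.

```java
import java.util.BitSet;

// Hypothetical illustration: a bitmap indexed by ONE hash value per string.
class SingleHashBitmap {
    private final BitSet bits;
    private final int size; // must be a power of two so (size - 1) works as a mask

    SingleHashBitmap(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    // Maps a string to a single bit position in [0, size).
    private int index(String s) {
        return s.hashCode() & (size - 1);
    }

    void add(String s) {
        bits.set(index(s));
    }

    boolean mightContain(String s) {
        return bits.get(index(s));
    }

    public static void main(String[] args) {
        SingleHashBitmap map = new SingleHashBitmap(16); // deliberately tiny
        map.add("a");
        System.out.println(map.mightContain("a")); // true: "a" was added
        System.out.println(map.mightContain("q")); // true: "q" collides with "a"
        System.out.println(map.mightContain("b")); // false: that bit was never set
    }
}
```

With 16 bits, "a" (hash 97) and "q" (hash 113) share the same low four bits, so "q" is reported as present although it was never added.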

A Bloom filter instead hashes each string into multiple keys. I will simply follow the description given in the book.

Create a binary vector of 1.6 billion bits and set all 1.6 billion bits to 0. For each string, use 8 different random number generators (F1, F2, ..., F8) to produce 8 information fingerprints (f1, f2, ..., f8). Then use a random number generator G to map these eight fingerprints to eight natural numbers g1, g2, ..., g8 in the range 1 to 1.6 billion. Now set the bits at these 8 positions to 1. Once every string has been processed this way, the Bloom filter is built.
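The insertion step might look like the sketch below. It rests on several assumptions that are not in the book's description: a seeded polynomial hash plays the role of the generators F1 to F8, a bit mask plays the role of G, positions are 0-based, and the bit count `M` is far smaller than 1.6 billion. The names `InsertSketch`, `fingerprint` and `positions` are hypothetical.

```java
import java.util.Arrays;
import java.util.BitSet;

// Sketch of the insertion step; seeds and sizes are illustrative assumptions.
class InsertSketch {
    static final int M = 1 << 20; // number of bits (the text uses 1.6 billion)
    static final int[] SEEDS = {3, 5, 7, 11, 13, 31, 37, 61}; // one seed per fingerprint

    // Fingerprint for one seed: a polynomial hash of s with the seed as multiplier.
    static int fingerprint(String s, int seed) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = seed * h + s.charAt(i);
        }
        return h;
    }

    // Maps the eight fingerprints to eight bit positions in [0, M).
    static int[] positions(String s) {
        int[] g = new int[SEEDS.length];
        for (int i = 0; i < SEEDS.length; i++) {
            g[i] = fingerprint(s, SEEDS[i]) & (M - 1); // works because M is a power of two
        }
        return g;
    }

    // Sets the bit at every position computed for s.
    static void insert(BitSet bits, String s) {
        for (int p : positions(s)) {
            bits.set(p);
        }
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet(M);
        insert(bits, "www.example.com");
        System.out.println(Arrays.toString(positions("www.example.com")));
        System.out.println(bits.cardinality() + " bits set"); // at most 8
    }
}
```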

So how do you detect if a string already exists?

Use the same 8 random number generators (F1, F2, ..., F8) to produce 8 information fingerprints s1, s2, ..., s8 for the string, then map these 8 fingerprints to 8 bits of the Bloom filter, t1, t2, ..., t8. If the string exists, the bits at t1, t2, ..., t8 must all be 1; if any one of them is 0, the string has never been added. This is how to judge whether a string already exists.
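The lookup can be sketched as below, again under assumptions: the same seeded polynomial hash stands in for the eight generators, the sizes are illustrative, and the names `LookupSketch` and `mightContain` are hypothetical.

```java
import java.util.BitSet;

// Sketch of the lookup step; the hash scheme and sizes are illustrative assumptions.
class LookupSketch {
    static final int M = 1 << 20; // number of bits, a power of two
    static final int[] SEEDS = {3, 5, 7, 11, 13, 31, 37, 61};
    static final BitSet BITS = new BitSet(M);

    // Bit position of s for one seed, in [0, M).
    static int position(String s, int seed) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = seed * h + s.charAt(i);
        }
        return h & (M - 1);
    }

    static void insert(String s) {
        for (int seed : SEEDS) {
            BITS.set(position(s, seed));
        }
    }

    static boolean mightContain(String s) {
        for (int seed : SEEDS) {
            if (!BITS.get(position(s, seed))) {
                return false; // a single 0 bit proves s was never inserted
            }
        }
        return true; // all eight bits are 1: s was probably inserted
    }

    public static void main(String[] args) {
        insert("spam@example.com");
        System.out.println(mightContain("spam@example.com")); // true
        System.out.println(mightContain("friend@example.com"));
    }
}
```

A `false` answer is always correct, because inserting a string sets all of its bits. Only a `true` answer can be mistaken.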

In fact, a Bloom filter is an extension of the hashing algorithm. Since it is a hash at heart, it has the same weakness: it can misjudge. A string that has never appeared may be reported as present (a false positive). The probability is very small, but it does exist.

So how can this probability be reduced? The first idea is to use more fingerprints per string, for example 16 instead of 8. While the bit array is sparsely filled this lowers the error rate, but each string now sets twice as many bits, so the array fills up twice as fast and the number of strings the filter can hold is roughly halved. The other way is to choose good hash functions. There are many ways to hash a string, and some of them are much better than others.
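The trade-off can be checked numerically. The sketch below uses the standard approximation for the false-positive rate, p ≈ (1 - e^(-kn/m))^k, which assumes independent, uniformly distributed hash functions; m is the number of bits, n the number of stored strings and k the number of fingerprints per string. Neither the formula nor the names `FalsePositiveRate`, `rate` and `optimalK` appear in the original article.

```java
// Standard approximation of the Bloom filter false-positive rate (idealized hashes assumed).
class FalsePositiveRate {
    // m = bits in the filter, n = strings inserted, k = fingerprints per string
    static double rate(long m, long n, int k) {
        double fractionOfBitsSet = 1.0 - Math.exp(-(double) k * n / m);
        return Math.pow(fractionOfBitsSet, k);
    }

    // The k that minimizes the rate for given m and n: (m / n) * ln 2.
    static double optimalK(long m, long n) {
        return (double) m / n * Math.log(2);
    }

    public static void main(String[] args) {
        long m = 1_600_000_000L; // bit count used in the text
        for (long n : new long[] {50_000_000L, 100_000_000L, 200_000_000L}) {
            System.out.printf("n=%d  k=8: %.2e  k=16: %.2e  best k: %.1f%n",
                    n, rate(m, n, 8), rate(m, n, 16), optimalK(m, n));
        }
    }
}
```

Under this approximation, 16 fingerprints beat 8 when the filter holds 50 million strings, but do worse than 8 once it holds 200 million, which is why the number of fingerprints has to be chosen together with the expected number of strings.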

Bloom filters are mainly used for filtering malicious websites: all known malicious URLs are loaded into one filter, every URL a user visits is checked against it, and the user is warned if the URL is flagged as malicious. To handle misjudgments, we can also keep a whitelist of URLs that are often flagged by mistake: a URL that the filter flags is looked up in the whitelist, and released if it is there. This whitelist will not be large, and does not need to be, because the error probability of a Bloom filter is very small. Interested readers can look up the analysis of the Bloom filter's error rate.
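A filter combined with a whitelist could be sketched as below. Everything here is an assumption made for illustration: the class `UrlChecker`, its method names, the placeholder URLs, and above all the 16-bit filter, which is absurdly small so that a false positive is guaranteed to show up.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: a Bloom-filter blacklist backed by a small exact whitelist.
class UrlChecker {
    private static final int M = 16; // absurdly small, to force a false positive
    private static final int[] SEEDS = {3, 5, 7};
    private final BitSet blacklist = new BitSet(M);
    private final Set<String> whitelist = new HashSet<>();

    private static int position(String s, int seed) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = seed * h + s.charAt(i);
        }
        return h & (M - 1);
    }

    void addMalicious(String url) {
        for (int seed : SEEDS) {
            blacklist.set(position(url, seed));
        }
    }

    void addToWhitelist(String url) {
        whitelist.add(url);
    }

    // True if the Bloom filter alone reports the URL as malicious.
    boolean filterFlags(String url) {
        for (int seed : SEEDS) {
            if (!blacklist.get(position(url, seed))) {
                return false;
            }
        }
        return true;
    }

    // A flagged URL is released if it is on the whitelist.
    boolean isBlocked(String url) {
        return filterFlags(url) && !whitelist.contains(url);
    }

    public static void main(String[] args) {
        UrlChecker checker = new UrlChecker();
        checker.addMalicious("a.example");
        System.out.println(checker.filterFlags("q.example")); // true: a false positive
        checker.addToWhitelist("q.example");
        System.out.println(checker.isBlocked("q.example"));   // false: released by the whitelist
        System.out.println(checker.isBlocked("a.example"));   // true: still blocked
    }
}
```

The two URLs differ only in a character whose code differs by 16, so with a 16-bit mask every seed maps them to the same bit. A real filter would be sized so that such collisions are rare.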

The Java source for the filter follows:

import java.util.BitSet;

/**
 * @author xkey
 */
public class BloomFilter {
    private static final int DEFAULT_SIZE = 2 << 24; // bit length of the Bloom filter (2^25 bits)
    private static final int[] seeds = {3, 5, 7, 11, 13, 31, 37, 61}; // prime seeds, chosen to help reduce the error rate
    private static final BitSet bits = new BitSet(DEFAULT_SIZE);
    private static final SimpleHash[] func = new SimpleHash[seeds.length];

    static { // build one hash function per seed before any add or contains call
        for (int i = 0; i < seeds.length; i++) {
            func[i] = new SimpleHash(DEFAULT_SIZE, seeds[i]);
        }
    }

    public static void addValue(String value) {
        for (SimpleHash f : func) { // hash the string to 8 integers and set the bit at each of them to 1
            bits.set(f.hash(value), true);
        }
    }

    public static void add(String value) {
        if (value != null) addValue(value);
    }

    public static boolean contains(String value) {
        if (value == null) return false;
        for (SimpleHash f : func) { // no need to check every bit: the first 0 bit means the string is not contained
            if (!bits.get(f.hash(value))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String value = "www.jb51.net";
        add(value);
        System.out.println(contains(value));
    }
}

class SimpleHash { // plays the role of a C++ struct
    private final int cap;
    private final int seed;

    public SimpleHash(int cap, int seed) {
        this.cap = cap;
        this.seed = seed;
    }

    public int hash(String value) { // string hash; choosing a good hash function matters
        int result = 0;
        int len = value.length();
        for (int i = 0; i < len; i++) {
            result = seed * result + value.charAt(i);
        }
        return (cap - 1) & result; // cap is a power of two, so this keeps the result in [0, cap)
    }
}

Summary: a Bloom filter is an innovation built on the hashing algorithm. It needs very little space and has a very low error rate. In short, this way of thinking is worth learning, and it is a good use of the bit as a data type.
