The principle of the Bloom filter is simple: hash a string to an integer key, then take a very long sequence of bits, all initially 0, and flip the bit at position key from 0 to 1. The next time a string comes in, hash it to get its key; if the bit at that position is already 1, the string is taken to exist.
If you stop at the procedure above, it is no different from a plain hashing algorithm, and plain hashing suffers from collisions.
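The single-hash scheme and its collision problem can be sketched as follows. This is only an illustration: the class name `SingleHashSketch`, the 65,536-bit size, and the use of Java's built-in `String.hashCode()` are assumptions, not part of the original text.

```java
import java.util.BitSet;

// Hypothetical class: one hash, one bit per string. Not yet a Bloom filter.
class SingleHashSketch {
    private static final int SIZE = 1 << 16;      // assumed bit count (power of two)
    private final BitSet bits = new BitSet(SIZE); // all bits start at 0

    // Map a string to a single bit position using String.hashCode().
    private static int position(String s) {
        return s.hashCode() & (SIZE - 1);
    }

    void add(String s) {
        bits.set(position(s));
    }

    boolean mightContain(String s) {
        return bits.get(position(s));
    }

    public static void main(String[] args) {
        SingleHashSketch sketch = new SingleHashSketch();
        sketch.add("Aa");
        System.out.println(sketch.mightContain("Aa")); // true: it was added
        System.out.println(sketch.mightContain("BB")); // true: "Aa" and "BB" have the same hashCode
    }
}
```

"BB" was never added, yet it is reported as present because it lands on the same bit as "Aa". That is the collision problem the multi-key approach is meant to reduce.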
A Bloom filter instead hashes each string into multiple keys. I'll simply follow the book's description here.
Create a vector of 1.6 billion bits and set all 1.6 billion bits to 0. For each string, generate 8 information fingerprints (f1, f2, ..., f8) with 8 different random number generators (F1, F2, ..., F8). Then use a random number generator G to map these eight fingerprints to eight natural numbers g1, g2, ..., g8 in the range 1 to 1.6 billion. Now set the bits at these 8 positions to 1. With that, the Bloom filter is built.
So how do you detect if a string already exists?
Use the same 8 random number generators (F1, F2, ..., F8) to generate 8 information fingerprints s1, s2, ..., s8 for this string, then map these 8 fingerprints to 8 bits of the Bloom filter, t1, t2, ..., t8. If the string exists, the bits at t1, t2, ..., t8 must obviously all be 1; if any one of them is 0, the string has never been added. This is how to judge whether a string already exists.
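The build and lookup steps above can be sketched in a few lines. Everything named here is hypothetical: the class `EightBitSketch`, the bit count of 2^20 (instead of 1.6 billion), and the seeded polynomial hash that stands in for the generators F1..F8 and G, which the text does not specify.

```java
import java.util.BitSet;

// Hypothetical class: 8 bit positions per string, as described in the text.
class EightBitSketch {
    private static final int M = 1 << 20; // assumed bit count (the text uses 1.6 billion)
    private static final int K = 8;       // eight positions per string
    private final BitSet bits = new BitSet(M);

    // Stand-in for F1..F8 followed by G: a seeded polynomial hash reduced to [0, M).
    private static int position(String s, int i) {
        int h = i + 1;
        for (int c = 0; c < s.length(); c++) {
            h = h * (31 + 2 * i) + s.charAt(c);
        }
        return Math.floorMod(h, M);
    }

    // Build step: set all K bits for the string.
    void add(String s) {
        for (int i = 0; i < K; i++) {
            bits.set(position(s, i));
        }
    }

    // Lookup step: a single 0 bit proves the string was never added.
    boolean mightContain(String s) {
        for (int i = 0; i < K; i++) {
            if (!bits.get(position(s, i))) {
                return false;
            }
        }
        return true; // all K bits are 1: present, or a rare false positive
    }

    public static void main(String[] args) {
        EightBitSketch filter = new EightBitSketch();
        filter.add("www.jb51.net");
        System.out.println(filter.mightContain("www.jb51.net"));
        System.out.println(filter.mightContain("example.org"));
    }
}
```

A `false` answer is always reliable, while a `true` answer only means "probably present".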
In fact, the Bloom filter is an extension of the hashing algorithm. Since it is a hash at heart, it has a shortcoming: it can misjudge. A string that never appeared may be reported as present (a false positive). The probability is very small, but it does exist.
So how do you reduce this probability? The first idea is to extend the 8 fingerprints to 16: at the same fill level the error rate certainly drops, but each string now sets twice as many bits, so the filter can hold only half as many strings. The other is to choose a good hash function; there are many ways to hash strings, and some are much better than others.
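The trade-off above can be checked numerically. The sketch below uses the standard approximation p = (1 - e^(-kn/m))^k for m bits, n stored strings, and k fingerprints per string. The formula does not appear in the text and assumes independent, uniformly distributed hash functions; the class name and the string counts are made up for illustration.

```java
// Hypothetical helper: estimates the false-positive rate of a Bloom filter.
class ErrorRateSketch {
    // Standard approximation: p = (1 - e^(-k*n/m))^k
    static double falsePositiveRate(long m, long n, int k) {
        return Math.pow(1.0 - Math.exp(-(double) k * n / m), k);
    }

    public static void main(String[] args) {
        long m = 1_600_000_000L; // bit count from the text
        long full = 100_000_000L; // assumed number of stored strings
        long half = full / 2;

        System.out.printf("k=8,  n=%d: %.3e%n", full, falsePositiveRate(m, full, 8));
        System.out.printf("k=16, n=%d: %.3e%n", half, falsePositiveRate(m, half, 16));
        System.out.printf("k=16, n=%d: %.3e%n", full, falsePositiveRate(m, full, 16));
    }
}
```

With 16 fingerprints and half as many strings, the bit array is exactly as full as before and the estimated error rate is the square of the 8-fingerprint value. Keeping the full number of strings with 16 fingerprints fills the array further, so the gain is not automatic.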
Bloom filters are mainly used for filtering malicious Web sites: all known malicious sites are loaded into a filter, each site a user visits is checked against it, and the user is warned if the site is flagged as malicious. On top of this, we can keep a whitelist of URLs that are often misjudged: a URL that the filter flags is looked up in the whitelist and released if it is found there. This whitelist must not grow too large, and it will not, because the Bloom filter's error probability is very small. Interested readers can look up the analysis of the Bloom filter's error rate.
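The filter-plus-whitelist arrangement might look like the sketch below. The class `UrlCheckerSketch`, its method names, and the bit count are all hypothetical. The hash positions are deliberately derived from `String.hashCode()` alone so that a false positive can be forced with two short strings; a real deployment would use stronger hash functions and real URLs.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Hypothetical class: a blacklist Bloom filter backed by a small exact whitelist.
class UrlCheckerSketch {
    private static final int M = 1 << 20; // assumed bit count
    private static final int K = 8;       // positions per URL
    private final BitSet malicious = new BitSet(M);
    private final Set<String> whitelist = new HashSet<>();

    // Positions depend only on hashCode(), so strings with equal hash codes
    // collide on all K bits. This is a demo convenience, not a good hash.
    private static int position(String s, int i) {
        int h1 = s.hashCode();
        int h2 = (h1 >>> 16) | 1;
        return Math.floorMod(h1 + i * h2, M);
    }

    void addMalicious(String url) {
        for (int i = 0; i < K; i++) {
            malicious.set(position(url, i));
        }
    }

    void addToWhitelist(String url) {
        whitelist.add(url);
    }

    boolean filterSaysMalicious(String url) {
        for (int i = 0; i < K; i++) {
            if (!malicious.get(position(url, i))) {
                return false;
            }
        }
        return true;
    }

    // Warn only if the filter flags the URL and the whitelist does not release it.
    boolean shouldWarn(String url) {
        return filterSaysMalicious(url) && !whitelist.contains(url);
    }

    public static void main(String[] args) {
        UrlCheckerSketch checker = new UrlCheckerSketch();
        checker.addMalicious("Aa");                   // stand-in for a malicious URL
        System.out.println(checker.shouldWarn("BB")); // true: forced false positive (same hashCode as "Aa")
        checker.addToWhitelist("BB");
        System.out.println(checker.shouldWarn("BB")); // false: released by the whitelist
        System.out.println(checker.shouldWarn("Aa")); // true: still flagged
    }
}
```

The whitelist is an exact set, so it costs far more memory per entry than the filter. That is affordable only because false positives are rare.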
The Java source for the filter is given below:
import java.util.BitSet;

/**
 * @author xkey
 */
public class BloomFilter {
    private static final int DEFAULT_SIZE = 2 << 24; // bit length of the Bloom filter
    private static final int[] seeds = {3, 5, 7, 11, 13, 31, 37, 61}; // prime seeds help reduce the error rate
    private static BitSet bits = new BitSet(DEFAULT_SIZE);
    private static SimpleHash[] func = new SimpleHash[seeds.length];

    static {
        // Build one hash function per seed up front, so add() and contains()
        // never see an uninitialized array.
        for (int i = 0; i < seeds.length; i++) {
            func[i] = new SimpleHash(DEFAULT_SIZE, seeds[i]);
        }
    }

    public static void addValue(String value) {
        // Hash the string to 8 (or more) integers, then set the bits at those positions to 1.
        for (SimpleHash f : func) {
            bits.set(f.hash(value), true);
        }
    }

    public static void add(String value) {
        if (value != null) addValue(value);
    }

    public static boolean contains(String value) {
        if (value == null) return false;
        boolean ret = true;
        // There is really no need to run all of these: once ret == false,
        // the string is not contained.
        for (SimpleHash f : func) {
            ret = ret && bits.get(f.hash(value));
        }
        return ret;
    }

    public static void main(String[] args) {
        String value = "www.jb51.net";
        add(value);
        System.out.println(contains(value));
    }
}

class SimpleHash { // roughly the equivalent of a C++ struct
    private int cap;
    private int seed;

    public SimpleHash(int cap, int seed) {
        this.cap = cap;
        this.seed = seed;
    }

    public int hash(String value) { // string hash; choosing a good hash function matters
        int result = 0;
        int len = value.length();
        for (int i = 0; i < len; i++) {
            result = seed * result + value.charAt(i);
        }
        // cap is a power of two, so the mask keeps the result in [0, cap).
        return (cap - 1) & result;
    }
}
Summary: the Bloom filter is an innovation on the hashing algorithm; it needs very little space and its error rate is very low. In short, this kind of innovative thinking is worth learning, and it is a good example of putting the bit data type to use.