Finally, we compare the d-left counting Bloom filter with the standard counting Bloom filter. Assume that the set to be represented has $m$ elements and that the d-left counting Bloom filter is built with the following parameters:
1. The d-left hash table consists of four subtables;
2. Each subtable contains $m/24$ buckets, so the average load per bucket is 6 elements;
3. Each bucket holds 8 cells, which with high probability is enough to keep any bucket from overflowing;
4. Each cell stores a fingerprint together with a 2-bit counter, so one cell can count up to four copies of the same fingerprint. This requires reserving one fingerprint value (for example, all zeros) to mark an empty cell; the counter then needs no state of its own for "empty", and its four values can represent counts 1 through 4.
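To make the parameters above concrete, here is a minimal Python sketch of a d-left counting Bloom filter with four subtables, $m/24$ buckets per subtable, 8 cells per bucket and counters capped at 4. It is a simplification, not the paper's construction: the class name `DLeftCBF`, the use of a single SHA-256 digest to derive both the fingerprint and the four bucket indices, and the overflow handling are all assumptions made for illustration.

```python
import hashlib


class DLeftCBF:
    """Simplified d-left counting Bloom filter (illustrative sketch only)."""

    def __init__(self, m, d=4, cells_per_bucket=8, r=14, max_count=4):
        self.d = d
        self.n_buckets = max(1, m // 24)   # m/24 buckets per subtable
        self.cells_per_bucket = cells_per_bucket
        self.r = r
        self.max_count = max_count         # a 2-bit counter encodes counts 1..4
        # tables[t][b] is a list of [fingerprint, count] cells
        self.tables = [[[] for _ in range(self.n_buckets)] for _ in range(d)]

    def _locate(self, item):
        """Derive an r-bit fingerprint and one bucket index per subtable (assumed scheme)."""
        h = int.from_bytes(hashlib.sha256(item.encode("utf-8")).digest(), "big")
        fp = (h & ((1 << self.r) - 1)) or 1  # fingerprint 0 is reserved for "empty"
        h >>= self.r
        idx = []
        for _ in range(self.d):
            idx.append(h % self.n_buckets)
            h //= self.n_buckets
        return fp, idx

    def _find(self, fp, idx):
        """Return (subtable, bucket, cell position) of a matching fingerprint, or None."""
        for t, b in enumerate(idx):
            for pos, cell in enumerate(self.tables[t][b]):
                if cell[0] == fp:
                    return t, b, pos
        return None

    def insert(self, item):
        fp, idx = self._locate(item)
        hit = self._find(fp, idx)
        if hit is not None:
            t, b, pos = hit
            cell = self.tables[t][b][pos]
            if cell[1] == self.max_count:
                raise OverflowError("counter overflow")
            cell[1] += 1
            return
        # d-left rule: least-loaded candidate bucket, ties go to the leftmost subtable
        loads = [len(self.tables[t][b]) for t, b in enumerate(idx)]
        t = loads.index(min(loads))
        if loads[t] >= self.cells_per_bucket:
            raise OverflowError("bucket overflow")
        self.tables[t][idx[t]].append([fp, 1])

    def delete(self, item):
        fp, idx = self._locate(item)
        hit = self._find(fp, idx)
        if hit is None:
            return False
        t, b, pos = hit
        bucket = self.tables[t][b]
        bucket[pos][1] -= 1
        if bucket[pos][1] == 0:
            del bucket[pos]                # the cell becomes empty again
        return True

    def contains(self, item):
        fp, idx = self._locate(item)
        return self._find(fp, idx) is not None


f = DLeftCBF(m=2400)
f.insert("apple")
f.insert("apple")
f.delete("apple")
print(f.contains("apple"))  # True: one copy is still counted
```

New fingerprints go into the least-loaded candidate bucket, with ties broken toward the leftmost subtable, which is the d-left rule. Because the bucket indices in this sketch are independent of the fingerprint, the same fingerprint can sit in more than one candidate bucket, so deleting one item may decrement another item's cell; the paper addresses this with a more careful choice of hash functions, which the sketch does not attempt.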
If fingerprints are $r$ bits long, the false positive probability is approximately $24 \cdot 2^{-r}$. Two different fingerprints are identical with probability $(1/2)^r = 2^{-r}$. A lookup examines four buckets, one in each subtable, and each bucket holds 6 fingerprints on average, so about $4 \times 6 = 24$ fingerprints are compared; hence the factor 24. The whole d-left counting Bloom filter needs $4m(r+2)/3$ bits: each cell takes $r+2$ bits (fingerprint plus counter), and there are $4 \times (m/24) \times 8 = 4m/3$ cells. Put differently, each bucket has 8 cells but an average load of only 6, so the $m(r+2)$ bits that are actually occupied must be multiplied by $8/6 = 4/3$.
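The two formulas in the paragraph above can be checked with a few lines of Python. This is a sketch: the function names are hypothetical, and `dleft_false_positive` is the same approximation as in the text (the expected number of fingerprints compared, times $2^{-r}$), not an exact probability.

```python
def dleft_false_positive(r, d=4, avg_load=6):
    """Approximate false positive rate: about d * avg_load fingerprints are
    compared per lookup, and each matches with probability 2**-r."""
    return d * avg_load * 2.0 ** -r


def dleft_total_bits(m, r, d=4, avg_load=6, cells_per_bucket=8, counter_bits=2):
    """Total bits: d subtables, m/(d*avg_load) buckets each, cells_per_bucket
    cells per bucket, and (r + counter_bits) bits per cell."""
    buckets_per_subtable = m / (d * avg_load)
    return d * buckets_per_subtable * cells_per_bucket * (r + counter_bits)


m, r = 3000, 14
print(dleft_false_positive(r))   # 24 * 2**-14
print(dleft_total_bits(m, r))    # equals 4 * m * (r + 2) / 3
```

With the default parameters, `dleft_total_bits` reduces to $4 \cdot (m/24) \cdot 8 \cdot (r+2) = 4m(r+2)/3$.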
Now consider the standard counting Bloom filter. For the same set of $m$ elements, suppose it uses $cm$ counters of 4 bits each, $4cm$ bits in total. With the optimal number of hash functions, $k = c\ln 2$, its false positive probability is $(2^{-\ln 2})^{c} \approx 0.6185^{c}$.
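A small sketch of the standard counting Bloom filter figures, assuming the usual approximation $(1 - e^{-k/c})^{k}$ for the false positive rate with $c$ counters per element and $k$ hash functions; the function names are illustrative. Substituting $k = c\ln 2$ gives $e^{-k/c} = 1/2$, which is where $(2^{-\ln 2})^{c}$ comes from.

```python
import math


def cbf_false_positive(c, k=None):
    """Approximate false positive rate with c counters per element and k hash
    functions; k defaults to the real-valued optimum c * ln 2."""
    if k is None:
        k = c * math.log(2)
    return (1.0 - math.exp(-k / c)) ** k


def cbf_total_bits(m, c, counter_bits=4):
    """Total bits for m elements with c counters per element."""
    return counter_bits * c * m


c = 16 / 3
print(cbf_false_positive(c), (2 ** -math.log(2)) ** c)  # the two values agree
```

In practice $k$ must be an integer, so a real filter rounds $c\ln 2$; passing an explicit `k` covers that case.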
Setting $c = (r+2)/3$ makes the two structures use the same number of bits, since then $4cm = 4m(r+2)/3$. Comparing their false positive probabilities, we find that for $r \ge 7$

$$\left(2^{-\ln 2}\right)^{(r+2)/3} > 24 \cdot 2^{-r}.$$
Moreover, the gap between the two false positive probabilities widens as $r$, and hence the number of bits, grows. For $r = 14$, i.e. $c = 16/3$, the two structures use the same space, yet the formulas above give about 0.077 for the standard counting Bloom filter against about 0.0015 for the d-left counting Bloom filter: a false positive probability more than 50 times higher.
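The comparison at equal space can be reproduced numerically. The sketch below (function names are hypothetical) plugs $c = (r+2)/3$ into the standard filter's formula, searches for the smallest $r$ at which the standard counting Bloom filter does worse, and prints the ratio of the two false positive probabilities for a few values of $r$.

```python
import math


def dleft_fp(r):
    return 24 * 2.0 ** -r                      # d-left estimate, 24 * 2^-r


def cbf_fp_equal_space(r):
    c = (r + 2) / 3                            # same total bits: 4cm = 4m(r+2)/3
    return (2 ** -math.log(2)) ** c            # (2^-ln2)^c with k = c ln 2


def first_r_where_cbf_is_worse(r_max=64):
    for r in range(1, r_max + 1):
        if cbf_fp_equal_space(r) > dleft_fp(r):
            return r
    return None


print(first_r_where_cbf_is_worse())
for r in (7, 10, 14, 20):
    print(r, round(cbf_fp_equal_space(r) / dleft_fp(r), 1))
```

The ratio keeps growing because its base-2 logarithm, $r(1 - \ln 2/3) - 2\ln 2/3 - \log_2 24$, increases linearly in $r$; at $r = 14$ it is about 52.6.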
Now compare the space the two structures need for roughly the same false positive probability. Suppose the standard counting Bloom filter uses 9 four-bit counters per element (36 bits per element) and 6 independent hash functions; its false positive probability is $(1 - e^{-6/9})^{6} \approx 0.01327$. The d-left counting Bloom filter with 11-bit fingerprints uses $4 \cdot 13/3 = 52/3 \approx 17.3$ bits per element and has a false positive probability of $24 \cdot 2^{-11} \approx 0.01172$. Since $(52/3)/36 \approx 0.48$, the d-left counting Bloom filter uses less than half the space of the standard counting Bloom filter while achieving a lower false positive rate.
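The figures in the paragraph above follow from the formulas already given; this short script recomputes them. The variable names are illustrative, and the standard filter's rate uses $(1 - e^{-k/c})^{k}$ with the integer $k = 6$ rather than the idealized $(2^{-\ln 2})^{c}$.

```python
import math

# Standard counting Bloom filter: 9 four-bit counters per element, k = 6 hash functions.
c, k = 9, 6
cbf_bits_per_element = 4 * c                   # 36
cbf_fp = (1 - math.exp(-k / c)) ** k           # about 0.0133

# d-left counting Bloom filter: 11-bit fingerprints plus 2-bit counters.
r = 11
dleft_bits_per_element = 4 * (r + 2) / 3       # 52/3, about 17.33
dleft_fp = 24 * 2.0 ** -r                      # 0.01171875

space_ratio = dleft_bits_per_element / cbf_bits_per_element
print(f"CBF:    {cbf_bits_per_element} bits/element, FP {cbf_fp:.5f}")
print(f"d-left: {dleft_bits_per_element:.2f} bits/element, FP {dleft_fp:.5f}")
print(f"space ratio: {space_ratio:.2f}")
```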
Reference: An Improved Construction for Counting Bloom Filters