Counting Bloom Filter


Jomeng January 30, 2007

 

As the previous articles on the Bloom filter showed, the standard Bloom filter is a very simple data structure that supports only insert and lookup operations. It works well when the set it represents is static. When the set changes frequently, however, its main weakness becomes apparent: it does not support deletion.

 

The counting Bloom filter solves this problem. It extends each bit of the standard Bloom filter's bit array into a small counter. When an element is inserted, each of the k corresponding counters is incremented by 1 (k is the number of hash functions); when an element is deleted, each of the same k counters is decremented by 1. The counting Bloom filter thus adds a delete operation at the cost of several times more storage space. The next question is: how many times more?
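The insert and delete rules can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the class name, the salted SHA-256 hashing scheme, and the choice to saturate a counter at its maximum instead of letting it wrap around are all assumptions made for the example.

```python
import hashlib


class CountingBloomFilter:
    """Minimal counting Bloom filter with m counters and k hash functions."""

    def __init__(self, m, k, counter_bits=4):
        self.m = m
        self.k = k
        self.max_count = (1 << counter_bits) - 1  # 15 for 4-bit counters
        # One Python int per counter, for clarity rather than compactness.
        self.counters = [0] * m

    def _indexes(self, item):
        # Assumption: derive k positions by salting SHA-256 with the hash number.
        # Any k reasonably independent hash functions would do.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for idx in self._indexes(item):
            # Saturate instead of wrapping around on overflow.
            if self.counters[idx] < self.max_count:
                self.counters[idx] += 1

    def remove(self, item):
        # Refuse to delete an item that does not appear to be present;
        # decrementing its counters would corrupt other items' entries.
        if item not in self:
            return False
        for idx in self._indexes(item):
            # A saturated counter may have lost increments, so leave it stuck.
            if self.counters[idx] < self.max_count:
                self.counters[idx] -= 1
        return True

    def __contains__(self, item):
        return all(self.counters[idx] > 0 for idx in self._indexes(item))


cbf = CountingBloomFilter(m=1024, k=4)
cbf.add("apple")
print("apple" in cbf)   # True
cbf.remove("apple")
print("apple" in cbf)   # False: every counter is back to zero
```

As with the standard Bloom filter, a lookup can still return a false positive; the counters only make deletion possible.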

 

 

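First, we calculate the probability that the i-th counter is incremented exactly j times. Here n is the number of elements in the set, k is the number of hash functions, m is the number of counters (the size of the original bit array), and c(i) is the value of the i-th counter:

$$
P\bigl(c(i) = j\bigr) = \binom{nk}{j} \left(\frac{1}{m}\right)^{j} \left(1 - \frac{1}{m}\right)^{nk - j}
$$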


On the right-hand side of the equation above, the first factor counts the ways to choose j of the nk hash operations, the middle factor is the probability that those j operations all select the i-th counter, and the last factor is the probability that the other nk - j operations do not select it. The probability that the value c(i) of the i-th counter is at least j can therefore be bounded:

$$
P\bigl(c(i) \ge j\bigr) \le \binom{nk}{j} \frac{1}{m^{j}} \le \left(\frac{e\,nk}{j\,m}\right)^{j}
$$
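As a numerical sanity check, the exact binomial probability for a single counter can be compared with the closed-form bound (e n k / (j m))^j. The sketch below is illustrative only; the function names and the sizes n = 1000, k = 7, m = 10000 are hypothetical values chosen so that k is close to (ln 2) m/n.

```python
from math import comb, e


def prob_counter_equals(j, n, k, m):
    """Exact probability that one fixed counter is incremented exactly j times."""
    p = 1.0 / m
    return comb(n * k, j) * p**j * (1.0 - p) ** (n * k - j)


def prob_counter_at_least(j, n, k, m):
    """Exact tail probability P(c(i) >= j) from the binomial distribution."""
    return 1.0 - sum(prob_counter_equals(t, n, k, m) for t in range(j))


def upper_bound(j, n, k, m):
    """Closed-form bound (e*n*k / (j*m))**j on P(c(i) >= j)."""
    return (e * n * k / (j * m)) ** j


n, k, m = 1000, 7, 10000  # hypothetical sizes, roughly k = (ln 2) m / n
for j in (2, 4, 8):
    # Columns: j, exact tail probability, upper bound.
    print(j, prob_counter_at_least(j, n, k, m), upper_bound(j, n, k, m))
```

For each j the exact tail should come out below the bound, and the gap shows how conservative the bound is.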

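The second step of the bound above uses the following estimate of the factorial:

$$
j! \ge \left(\frac{j}{e}\right)^{j}
\quad\Longrightarrow\quad
\binom{nk}{j} \le \frac{(nk)^{j}}{j!} \le \left(\frac{e\,nk}{j}\right)^{j}
$$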

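In the article on the concept and principle of the Bloom filter, we mentioned that the optimal value of k is (ln 2) m/n. If we restrict k to k ≤ (ln 2) m/n, then nk/m ≤ ln 2, and summing the per-counter bound over all m counters gives the following conclusion:

$$
P\Bigl(\max_{i} c(i) \ge j\Bigr) \le m \left(\frac{e \ln 2}{j}\right)^{j}
$$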

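If each counter is allocated 4 bits, a counter overflows when its value reaches 16. The probability of this is bounded by:

$$
P\Bigl(\max_{i} c(i) \ge 16\Bigr) \le m \left(\frac{e \ln 2}{16}\right)^{16} \approx 1.37 \times 10^{-15} \times m
$$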

This value is small enough, so 4 bits per counter are sufficient for most applications.
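To get a feel for the numbers, the overflow bound m (e ln 2 / 16)^16 can be evaluated for a few filter sizes. This is a small sketch; the function name and the example values of m are illustrative assumptions, and the bound presumes k ≤ (ln 2) m/n.

```python
from math import e, log


def overflow_bound(m, bits=4):
    """Upper bound on P(some counter reaches 2**bits), assuming k <= (ln 2) m / n."""
    j = 2 ** bits
    return m * (e * log(2) / j) ** j


for m in (10**6, 10**9):
    # Columns: number of counters, bound on the overflow probability.
    print(m, overflow_bound(m))
```

Even with a billion 4-bit counters, the bound stays on the order of one in a million.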

 

The earliest paper on the counting Bloom filter: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol
