Fundamentals and variants of Bloom filter

Source: Internet
Author: User

To learn something, first find out what it is and what it is for, then understand its benefits, and finally learn how it works. Let's take a brief look at the Bloom filter and its variants from these three angles.
  1. What: A Bloom filter tests whether an element belongs to a set in situations where a certain error rate is acceptable. It may wrongly report that an element outside the set belongs to it (a false positive), but it never reports a stored element as absent. Typical uses include checking whether a crawler has already visited a URL, sharing web caches over a network, and string matching.
  2. Why: High time and space efficiency (compared with a hash table).
  3. How:
    • Storing an element: Use an array of m bits and k hash functions. The k hash functions map the element to k values, each in the range 0 to m-1 (that is, array subscripts), and the bits at those k subscripts are set to 1.
    • Querying an element: As above, compute the element's k subscripts. If the bits at all k subscripts are 1, the element is reported as belonging to the set (possibly a false positive); if any of them is 0, the element is definitely not in the set.
    • Advantages: High time and space efficiency. A lookup needs only k hash computations and the structure occupies only m bits, so both time and space are fixed regardless of how many elements are stored or how large they are.
    • Disadvantages: There is a certain error rate, elements cannot be deleted, and the original element values cannot be recovered (because the values themselves are not stored).
  4. From hash table to Bloom filter:
    1. Traditional hash table:
      • Storing an element: Allocate h slots. For each element of the set, hash it to a slot subscript and store the element in that slot. Hash collisions can occur and are resolved by chaining (a linked list per slot) or by rehashing.
      • Lookup: Hash the element to a slot subscript, then compare it with the element values stored at that subscript.
      • Disadvantages: All values of the set must be stored, which takes a lot of space. On a hash collision the linked list must be searched or the element rehashed, so the time complexity is not fixed.
    2. Improved hash table:
      1. Storing an element: Allocate h slots. For each element of the set, hash it to a slot subscript, but store an encoding of the element rather than the element value itself. The encoding usually has fewer bits than the element value. Because the encoding is itself a hash of the value, collisions are possible (different elements can have the same encoding).
      2. Lookup: Same as for the traditional hash table, except that encodings are compared instead of element values.
      3. Advantage: Uses less space than the traditional hash table.
      4. Disadvantages: There is a certain error rate (accuracy is traded for space), and the original element values cannot be recovered.
    3. Bloom filter:
      1. Storage and lookup: See point 3 above.
      2. Advantages: Collisions need no special handling, because a certain error rate is allowed. Time and space efficiency are high.
  5. Variants of the Bloom filter
    1. Counting Bloom Filter (CBF): The original Bloom filter does not support deletion. CBF widens each position of the array from 1 bit to a t-bit counter. Storing an element increments the counters at its k hashed subscripts by 1, and deleting an element decrements those same k counters by 1, so the set supports deletion.
    2. Partial Bloom Filter (PBF, more commonly called the partitioned Bloom filter): In the original Bloom filter every hash function ranges over 0 to m-1, that is, the subscripts of the whole bit array. In PBF the bit array is divided into k non-overlapping regions and each hash function is responsible for one region, so its range is smaller. The advantage is that the k regions can be accessed in parallel, which improves program performance. The false positive rate is close to that of the original (in general very slightly higher, not lower).
    3. Compressed Bloom filter: Compresses the original Bloom filter for applications that send it over a network. With suitably chosen parameters, the compressed filter can achieve a lower error rate, fewer transmitted bits, and fewer hash functions per element, at the cost of compression and decompression work and a larger uncompressed array in memory.
