Extension of the hash table: the Bloom filter

Source: Internet
Author: User
Tags bitset bool sort
Introduction to the Bloom filter

What is a Bloom filter?

The Bloom filter was proposed by Burton Bloom in 1970. It is essentially a very long binary vector combined with a series of random mapping (hash) functions. A Bloom filter can be used to test whether an element is in a set. Its advantage is that its space efficiency and query time are far better than those of general-purpose algorithms. Its disadvantages are a certain false-positive rate and the difficulty of deleting elements.
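
To see why deletion is difficult, consider a toy filter in which two keys happen to share a bit. The sketch below is only an illustration: the 16-bit size, the two hash functions h1 and h2, and the ToyFilter type are assumptions chosen so that the shared bit is easy to follow by hand.

```cpp
#include <bitset>
#include <cstddef>
#include <string>

// Toy parameters (assumptions for illustration): 16 bits, 2 hash functions.
constexpr std::size_t kBits = 16;

// h1: sum of the character codes, modulo the table size.
std::size_t h1(const std::string& s) {
    std::size_t sum = 0;
    for (unsigned char c : s) sum += c;
    return sum % kBits;
}

// h2: first character code times 7, modulo the table size.
std::size_t h2(const std::string& s) {
    return s.empty() ? 0 : (static_cast<unsigned char>(s[0]) * 7u) % kBits;
}

struct ToyFilter {
    std::bitset<kBits> bits;

    void add(const std::string& s) { bits.set(h1(s)); bits.set(h2(s)); }

    bool mayContain(const std::string& s) const {
        return bits.test(h1(s)) && bits.test(h2(s));
    }

    // Naive removal: clears the key's bits even if another key relies on them.
    void naiveRemove(const std::string& s) { bits.reset(h1(s)); bits.reset(h2(s)); }
};

// Returns whether "ba" is still reported after "ab" has been naively removed.
bool survivesNaiveRemoval() {
    ToyFilter f;
    f.add("ab");  // sets bits 3 and 7
    f.add("ba");  // sets bits 3 and 14 (bit 3 is shared with "ab")
    f.naiveRemove("ab");
    return f.mayContain("ba");  // false: clearing the shared bit 3 also lost "ba"
}
```

Clearing bits turns a structure that only produces false positives into one that can also produce false negatives, which is why a plain Bloom filter normally offers no delete operation.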

Why do we need a Bloom filter?

In everyday life, and in the design of computer software, we often need to decide whether an element is in a set. Examples include checking whether a word is spelled correctly (that is, whether it is in a known dictionary), whether a suspect's name is already on a list of suspects, or whether a web crawler has already visited a URL. The most straightforward approach is to store all of the elements of the set in the computer and, when a new element arrives, compare it directly with the stored elements. Sets are typically stored in a hash table. The advantage is that lookups are fast and accurate. The disadvantage is that it costs storage space. To improve efficiency, we can use a hash function to map each element of the set to a single bit in a bitmap, which saves both space and lookup time. But because of hash collisions we may misjudge: different elements may map to the same position after hashing, so an element that was never added can appear to be present.
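
The misjudgment caused by a hash collision can be reproduced with a bitmap and a single hash function. This is a minimal sketch, assuming a deliberately weak hash (the sum of the character codes, called weakHash here) so that the collision is predictable. The names weakHash, OneHashBitmap and anagramCollides are made up for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Deliberately weak hash (an assumption for this demo): sum of character codes.
std::size_t weakHash(const std::string& s) {
    std::size_t sum = 0;
    for (unsigned char c : s) sum += c;
    return sum;
}

// A bitmap "set" that records exactly one bit per inserted element.
class OneHashBitmap {
public:
    explicit OneHashBitmap(std::size_t bits) : bits_(bits, false) {}

    void insert(const std::string& s) { bits_[weakHash(s) % bits_.size()] = true; }

    bool mayContain(const std::string& s) const {
        return bits_[weakHash(s) % bits_.size()];
    }

private:
    std::vector<bool> bits_;
};

// "listen" and "silent" are anagrams, so their character sums are equal
// and both map to the same bit.
bool anagramCollides() {
    OneHashBitmap seen(1024);
    seen.insert("listen");
    return seen.mayContain("silent");  // true, although "silent" was never inserted
}
```

A Bloom filter reduces this risk by setting several bits per element, each chosen by a different hash function, so a false positive requires all of those bits to collide at once.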

What are the applications of the Bloom filter?
1. Google's well-known distributed database Bigtable uses Bloom filters to detect lookups for rows or columns that do not exist, which reduces the number of disk I/O operations.
2. The Squid web proxy cache server uses Bloom filters in its cache digests.
3. The Venti archival storage system also uses a Bloom filter to detect previously stored data.
4. The SPIN model checker uses Bloom filters to track the reachable state space in large-scale verification problems.
5. The Google Chrome browser uses a Bloom filter to speed up its Safe Browsing service.
6. Many key-value systems also use Bloom filters to speed up queries, for example HBase, Accumulo, and LevelDB.

Simple implementation of the Bloom filter

BitSet.h

#pragma once
#include <cstddef>
#include <vector>

class BitSet
{
public:
    // Constructor: allocate enough 32-bit words to hold bits 0..range.
    explicit BitSet(size_t range)
    {
        _a.resize((range >> 5) + 1, 0);
    }

    // Set the bit that corresponds to num to 1.
    void Set(size_t num)
    {
        size_t index = num >> 5;  // which 32-bit word
        size_t pos = num % 32;    // which bit inside that word
        _a[index] |= (1u << pos);
    }

    // Reset the bit that corresponds to num to 0.
    void ReSet(size_t num)
    {
        size_t index = num >> 5;
        size_t pos = num % 32;
        _a[index] &= ~(1u << pos);
    }

    // Return true if the bit that corresponds to num is 1.
    bool Test(size_t num) const
    {
        size_t index = num >> 5;
        size_t pos = num % 32;
        return (_a[index] & (1u << pos)) != 0;
    }

protected:
    // Unsigned words, so that shifting a 1 into bit 31 is well defined.
    std::vector<unsigned int> _a;
};

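
The index arithmetic shared by Set, ReSet and Test can be checked in isolation. The helper names wordIndex and bitMask below are hypothetical, introduced only for this sketch. They mirror the `num >> 5` and `num % 32` expressions of the listing.

```cpp
#include <cstddef>
#include <cstdint>

// Which 32-bit word holds bit number num (the same as num / 32).
constexpr std::size_t wordIndex(std::size_t num) { return num >> 5; }

// Mask that selects bit number num inside its word.
constexpr std::uint32_t bitMask(std::size_t num) {
    return std::uint32_t{1} << (num % 32);
}

// Bit 70 lives in word 2 (70 / 32) at position 6 (70 % 32), so its mask is 64.
static_assert(wordIndex(70) == 2, "70 >> 5 == 2");
static_assert(bitMask(70) == 64u, "1 << (70 % 32) == 64");

// Position 31 is the top bit of a word; an unsigned mask keeps the shift well defined.
static_assert(bitMask(31) == 0x80000000u, "top bit of the word");
```

For a range of 1000, the highest word index is 1000 >> 5 = 31, which is why the constructor allocates (range >> 5) + 1 = 32 words.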
BloomFilter.h

#pragma once
#include <cstddef>
#include <iostream>  // used by the TestBloomFilter() demo in this header
#include <string>
#include "BitSet.h"

// BKDR hash
template <typename K>
struct _Func1
{
    size_t BKDRHash(const char* str)
    {
        size_t hash = 0;
        while (size_t ch = (size_t)*str++)
        {
            hash = hash * 131 + ch;  // the multiplier can also be 31, 1313, 13131, 131313, ...
        }
        return hash;
    }

    size_t operator()(const std::string& key) { return BKDRHash(key.c_str()); }
};

// SDBM hash
template <typename K>
struct _Func2
{
    size_t SDBMHash(const char* str)
    {
        size_t hash = 0;
        while (size_t ch = (size_t)*str++)
        {
            hash = 65599 * hash + ch;
            // equivalent: hash = (size_t)ch + (hash << 6) + (hash << 16) - hash;
        }
        return hash;
    }

    size_t operator()(const std::string& key) { return SDBMHash(key.c_str()); }
};

// RS hash
template <typename K>
struct _Func3
{
    size_t RSHash(const char* str)
    {
        size_t hash = 0;
        size_t magic = 63689;
        while (size_t ch = (size_t)*str++)
        {
            hash = hash * magic + ch;
            magic *= 378551;
        }
        return hash;
    }

    size_t operator()(const std::string& key) { return RSHash(key.c_str()); }
};

// AP hash
template <typename K>
struct _Func4
{
    size_t APHash(const char* str)
    {
        size_t hash = 0;
        size_t ch;
        for (long i = 0; (ch = (size_t)*str++) != 0; i++)
        {
            if ((i & 1) == 0)
                hash ^= ((hash << 7) ^ ch ^ (hash >> 3));
            else
                hash ^= ((hash << 11) ^ ch ^ (hash >> 5));
        }
        return hash;
    }

    size_t operator()(const std::string& key) { return APHash(key.c_str()); }
};

// JS hash
template <typename K>
struct _Func5
{
    size_t JSHash(const char* str)
    {
        if (!*str)  // added by the author so that an empty string hashes to 0
            return 0;
        size_t hash = 1315423911;
        while (size_t ch = (size_t)*str++)
        {
            hash ^= ((hash << 5) + ch + (hash >> 2));
        }
        return hash;
    }

    size_t operator()(const std::string& key) { return JSHash(key.c_str()); }
};

template <typename K = std::string,
          typename Func1 = _Func1<K>,
          typename Func2 = _Func2<K>,
          typename Func3 = _Func3<K>,
          typename Func4 = _Func4<K>,
          typename Func5 = _Func5<K>>
class BloomFilter
{
public:
    BloomFilter(const size_t range)
        : _s1(range), _size(range)
    {}

    // Set the five bits selected by the five hash functions.
    void Set(const K& key)
    {
        _s1.Set(Func1()(key) % _size);
        _s1.Set(Func2()(key) % _size);
        _s1.Set(Func3()(key) % _size);
        _s1.Set(Func4()(key) % _size);
        _s1.Set(Func5()(key) % _size);
    }

    // Return false as soon as one bit is 0: the key was definitely never set.
    // Return true only if all five bits are 1: the key is probably present.
    bool Test(const K& key) const
    {
        if (!_s1.Test(Func1()(key) % _size)) return false;
        if (!_s1.Test(Func2()(key) % _size)) return false;
        if (!_s1.Test(Func3()(key) % _size)) return false;
        if (!_s1.Test(Func4()(key) % _size)) return false;
        if (!_s1.Test(Func5()(key) % _size)) return false;
        return true;
    }

protected:
    BitSet _s1;
    size_t _size;
};

inline void TestBloomFilter()
{
    BloomFilter<> bf1(1000);
    bf1.Set("sort");
    bf1.Set("man");
    bf1.Set("left");
    bf1.Set("123");
    bf1.Set("true");
    bf1.Set("https://hao.360.cn/?a1006");
    bf1.Set("https://hao.360.cn/?a10061");
    bf1.Set("https://hao.360.cn/?a10062");
    bf1.Set("https://hao.360.cn/?a10063");
    bf1.Set("https://hao.360.cn/?a10064");

    // Keys that were set always print 1. Keys that were never set should
    // print 0 unless a false positive occurs.
    std::cout << "is True?:" << bf1.Test("sort") << std::endl;
    std::cout << "is True?:" << bf1.Test("123") << std::endl;
    std::cout << "is True?:" << bf1.Test("Left1") << std::endl;
    std::cout << "is True?:" << bf1.Test("true") << std::endl;
    std::cout << "is True?:" << bf1.Test("false") << std::endl;
    std::cout << "is True?:" << bf1.Test("https://hao.360.cn/?a1006") << std::endl;
    std::cout << "is True?:" << bf1.Test("https://hao.360.cn/?a10064") << std::endl;
    std::cout << "is True?:" << bf1.Test("https://hao.360.cn/?a10067") << std::endl;
}

Test.cpp

#include <iostream>
#include <string>
#include <cassert>
#include <cstdlib>
using namespace std;

#include "BitSet.h"
#include "BloomFilter.h"

int main()
{
    TestBloomFilter();
    return 0;
}
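
The error rate mentioned in the introduction can be estimated with the standard approximation p = (1 - e^(-k*n/m))^k for m bits, k hash functions and n inserted keys. The sketch below applies it to the values used in the demo (m = 1000, k = 5, n = 10). The function names are made up for illustration, and the formula assumes the hash functions behave independently and uniformly, which the five string hashes in the listing only approximate.

```cpp
#include <cmath>
#include <cstddef>

// Approximate false-positive probability of a Bloom filter with
// m bits, k hash functions and n inserted keys.
double estimatedFalsePositiveRate(std::size_t m, std::size_t k, std::size_t n) {
    double exponent = -static_cast<double>(k) * static_cast<double>(n)
                      / static_cast<double>(m);
    return std::pow(1.0 - std::exp(exponent), static_cast<double>(k));
}

// Number of hash functions that minimizes the estimate: (m / n) * ln 2.
double optimalHashCount(std::size_t m, std::size_t n) {
    return static_cast<double>(m) / static_cast<double>(n) * std::log(2.0);
}
```

For the demo, estimatedFalsePositiveRate(1000, 5, 10) is roughly 2.8e-7, so the three keys that were never set are very unlikely to be reported as present. The estimate grows quickly as more keys are added to the same 1000 bits.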
