Bloom filter and MurmurHash algorithm in LevelDB

Source: Internet
Author: User
Tags: memory usage

1. LevelDB Bloom filter storage format

Support for Bloom filters was added in the LevelDB 1.4 release, so that the filter block can be consulted directly during a DB::Get() call. This reduces the number of random SSTable reads for keys that do not exist.
The filter block in LevelDB is stored in the meta block section. In the current version the meta block section contains only the Bloom filter; later versions may add new content.

The Bloom filter is stored in the meta block with the following layout:

[Filter 0]
[Filter 1]
[Filter 2]
...
[Filter N-1]

[Offset of filter 0]: 4 bytes
[Offset of filter 1]: 4 bytes
[Offset of filter 2]: 4 bytes
...
[Offset of filter N-1]: 4 bytes

[offset of beginning of offset array]: 4 bytes
lg(base): 1 byte

The last byte holds the base, stored as its base-2 logarithm lg(base); the default base is 2 KB, so lg(base) is 11. Data whose file offset falls in [i*base, (i+1)*base) is mapped to filter i, so i can be computed directly from a data block's offset. The "offset of beginning of offset array" field then locates the offset array, from which the offsets of filter i and filter i+1 are read; the bytes between those two offsets are the Bloom filter for that range. Table::InternalGet first uses the filter to check whether the key may match. If it does not match, the call returns immediately without reading the corresponding data block. The code is in table/table.cc.
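The lookup described above can be sketched as follows. This is a minimal illustration, not LevelDB's own reader: `FilterRange`, `LocateFilter`, and `DecodeFixed32LE` are hypothetical names, and the sketch assumes the 4-byte fields are little-endian. Anything malformed is reported as "not found", which a caller would treat as "the key may be present".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode a 4-byte little-endian unsigned integer.
inline uint32_t DecodeFixed32LE(const unsigned char* p) {
  assert(p != nullptr);
  return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}

// Byte range [start, limit) of one filter inside the filter block.
struct FilterRange {
  bool found;
  uint32_t start;
  uint32_t limit;
};

// Find the filter that covers the data block starting at block_offset.
FilterRange LocateFilter(const std::vector<unsigned char>& block,
                         uint64_t block_offset) {
  const FilterRange kNotFound = {false, 0, 0};
  const size_t n = block.size();
  if (n < 5) return kNotFound;  // need lg(base) plus the array offset

  const unsigned base_lg = block[n - 1];  // last byte: lg(base)
  if (base_lg >= 64) return kNotFound;    // shift would be undefined
  const uint32_t array_start = DecodeFixed32LE(&block[n - 5]);
  if (array_start > n - 5) return kNotFound;

  const size_t num_filters = (n - 5 - array_start) / 4;
  const uint64_t index = block_offset >> base_lg;  // i = offset / base
  if (index >= num_filters) return kNotFound;

  // Entry i is the start of filter i. Entry i+1 is its end; for the last
  // filter that slot is the "offset of beginning of offset array" field.
  const unsigned char* offsets = block.data() + array_start;
  const uint32_t start = DecodeFixed32LE(offsets + index * 4);
  const uint32_t limit = DecodeFixed32LE(offsets + index * 4 + 4);
  if (start > limit || limit > array_start) return kNotFound;
  return {true, start, limit};
}
```

With the default lg(base) of 11, a data block at file offset 2048 maps to filter 1, and one at offset 0 maps to filter 0.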

2. Bloom filter construction algorithm

The concrete construction algorithm of the Bloom filter is in util/bloom.cc.

From the filter creation code in bloom.cc, the memory occupied by a Bloom filter is determined by n (the number of keys) and the bits_per_key_ parameter. Across the whole of LevelDB, the memory occupied by Bloom filters should be the sum over all open SSTables; the number of open SSTable files is limited by max_open_files, which defaults to 1000. The total Bloom filter memory in LevelDB is therefore determined by the number of keys in all open SSTables and by the bits_per_key_ value passed to NewBloomFilterPolicy. For example, with a million keys and the suggested 10 bits per key as the argument to NewBloomFilterPolicy, the memory usage would be approximately 10 million bits, or about 1.25 MB.
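The arithmetic can be checked with a small sketch. `EstimateFilterBytes` is a hypothetical helper, not a LevelDB function. It assumes the sizing rules used when a filter is created: the bit count is the number of keys times bits per key, padded to at least 64 bits and rounded up to whole bytes, plus one trailing byte that records the number of probes.

```cpp
#include <cassert>
#include <cstdint>

// Approximate size in bytes of one Bloom filter built over num_keys keys.
// Hypothetical helper for illustration only.
uint64_t EstimateFilterBytes(uint64_t num_keys, uint64_t bits_per_key) {
  assert(bits_per_key > 0);
  uint64_t bits = num_keys * bits_per_key;
  if (bits < 64) bits = 64;               // very small filters are padded
  const uint64_t bytes = (bits + 7) / 8;  // round up to whole bytes
  return bytes + 1;                       // one byte stores the probe count
}
```

For one million keys at 10 bits per key this gives 1,250,001 bytes, which matches the 1.25 MB figure. In a real database the filters are built per range of the file rather than as one large filter, so the padding and per-filter byte make the true total somewhat larger.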

3. Bloom filter hash algorithm

The Bloom filter uses k_ hash functions, where k_ is between 1 and 30 and is computed as bits_per_key_ * ln(2). These hash values are not computed independently: a single BloomHash value is computed for each key, and the remaining values are derived from it by repeatedly adding a bit-rotated copy of that hash.
The computation in BloomHash is similar to MurmurHash.
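The probing scheme can be sketched as below. This is an illustration under stated assumptions, not the code from bloom.cc: `StandInHash` is FNV-1a, swapped in for BloomHash to keep the example short, and `NumProbes`, `SketchBloomFilter`, `BuildFilter`, and `KeyMayMatch` are hypothetical names. The constant 0.69 approximates ln(2), and the 17-bit rotation is assumed to match the derivation of the successive probe positions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in 32-bit hash (FNV-1a). NOT LevelDB's BloomHash.
uint32_t StandInHash(const std::string& key) {
  uint32_t h = 2166136261u;
  for (unsigned char c : key) {
    h ^= c;
    h *= 16777619u;
  }
  return h;
}

// Number of probes k: bits_per_key * ln(2), clamped to [1, 30].
size_t NumProbes(size_t bits_per_key) {
  size_t k = static_cast<size_t>(bits_per_key * 0.69);  // 0.69 =~ ln(2)
  if (k < 1) k = 1;
  if (k > 30) k = 30;
  return k;
}

struct SketchBloomFilter {
  std::vector<unsigned char> bits;  // the bit array
  size_t k;                         // probes per key
};

SketchBloomFilter BuildFilter(const std::vector<std::string>& keys,
                              size_t bits_per_key) {
  assert(bits_per_key > 0);
  SketchBloomFilter f;
  f.k = NumProbes(bits_per_key);
  size_t nbits = keys.size() * bits_per_key;
  if (nbits < 64) nbits = 64;  // avoid tiny filters with high error rates
  const size_t nbytes = (nbits + 7) / 8;
  nbits = nbytes * 8;
  f.bits.assign(nbytes, 0);
  for (const std::string& key : keys) {
    uint32_t h = StandInHash(key);
    const uint32_t delta = (h >> 17) | (h << 15);  // rotate right 17 bits
    for (size_t j = 0; j < f.k; ++j) {
      const size_t bitpos = h % nbits;
      f.bits[bitpos / 8] |= static_cast<unsigned char>(1u << (bitpos % 8));
      h += delta;  // next probe position, derived from the same hash
    }
  }
  return f;
}

bool KeyMayMatch(const SketchBloomFilter& f, const std::string& key) {
  const size_t nbits = f.bits.size() * 8;
  if (nbits == 0) return false;
  uint32_t h = StandInHash(key);
  const uint32_t delta = (h >> 17) | (h << 15);
  for (size_t j = 0; j < f.k; ++j) {
    const size_t bitpos = h % nbits;
    if ((f.bits[bitpos / 8] & (1u << (bitpos % 8))) == 0) return false;
    h += delta;
  }
  return true;  // may be a false positive
}
```

Every key passed to `BuildFilter` is reported as a possible match afterwards; a key that was never added is usually rejected, but can occasionally pass as a false positive. Deriving all probe positions from one hash avoids computing k_ separate hash functions per key.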

4. MurmurHash algorithm

MurmurHash is a non-cryptographic hash function that is suitable for general hash-based lookup. It was invented by Austin Appleby in 2008.
MurmurHash2 can produce a 32-bit or 64-bit hash value. MurmurHash is used in several open source projects, including libstdc++, libmemcached, Nginx, Hadoop, and more.
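For comparison, here is a sketch of the 32-bit MurmurHash2 variant, following the structure of the reference MurmurHash2.cpp listed in the references. `MurmurHash2Sketch` is a name chosen for this example. One deliberate difference is that the input is read byte by byte in little-endian order instead of through a pointer cast, so results should agree with the reference on little-endian machines only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 32-bit MurmurHash2-style hash; reads input as little-endian words.
uint32_t MurmurHash2Sketch(const void* key, size_t len, uint32_t seed) {
  assert(key != nullptr || len == 0);
  const uint32_t m = 0x5bd1e995;  // mixing constant
  const int r = 24;               // mixing shift
  const unsigned char* data = static_cast<const unsigned char*>(key);

  // Initialize the hash from the seed and the input length.
  uint32_t h = seed ^ static_cast<uint32_t>(len);

  // Mix four bytes at a time into the hash.
  while (len >= 4) {
    uint32_t k = static_cast<uint32_t>(data[0]) |
                 (static_cast<uint32_t>(data[1]) << 8) |
                 (static_cast<uint32_t>(data[2]) << 16) |
                 (static_cast<uint32_t>(data[3]) << 24);
    k *= m;
    k ^= k >> r;
    k *= m;
    h *= m;
    h ^= k;
    data += 4;
    len -= 4;
  }

  // Handle the last one to three bytes of the input.
  if (len == 3) h ^= static_cast<uint32_t>(data[2]) << 16;
  if (len >= 2) h ^= static_cast<uint32_t>(data[1]) << 8;
  if (len >= 1) {
    h ^= data[0];
    h *= m;
  }

  // Final avalanche so the last bytes are well mixed.
  h ^= h >> 13;
  h *= m;
  h ^= h >> 15;
  return h;
}
```

The overall shape (a multiply-and-shift loop over 4-byte words, a tail step for leftover bytes, then a final mix) is what the Bloom filter hash in section 3 resembles.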

5. References

http://leveldb.googlecode.com/svn/trunk/doc/table_format.txt

https://code.google.com/p/smhasher/source/browse/trunk/MurmurHash2.cpp

http://duanple.blog.163.com/blog/static/7097176720123227403134/

http://zh.wikipedia.org/wiki/Murmur%E5%93%88%E5%B8%8C

https://code.google.com/p/leveldb/source/detail?r=85584d497e7b354853b72f450683d59fcf6b9c5c
