Bloom filter and MurmurHash algorithm in LevelDB

Last Update:2015-05-15 Source: Internet

Author: User

Tags: memory usage


1. LevelDB Bloom filter storage format

LevelDB 1.4 added support for Bloom filters, so that the filter block can be consulted directly during a call to DB::Get(). This reduces the number of random reads against sstable files for keys that do not exist.
The filter block in LevelDB is stored in the meta block section of an sstable. In the current version the meta block section contains only the Bloom filter; later versions may add more content.

The Bloom filters are laid out in the meta block as follows:

[Filter 0]
[Filter 1]
[Filter 2]
...
[Filter N-1]

[Offset of filter 0]: 4 bytes
[Offset of filter 1]: 4 bytes
[Offset of filter 2]: 4 bytes
...
[Offset of filter N-1]: 4 bytes

[offset of beginning of offset array]: 4 bytes
lg(base): 1 byte

The block ends with the base, stored as lg(base) in a single byte; the default base is 2 KB. Data whose file offset falls in the range [i*base, (i+1)*base) is mapped to filter i, so the value of i can be computed directly from the offset of a data block. Reading [offset of beginning of offset array] locates the offset array, from which the offsets of filter i and filter i+1 are read. The bytes between those two offsets are the Bloom filter for that range of data.

Table::InternalGet first uses the filter to check whether the key may match. If it does not match, the call returns immediately and the corresponding data block is never read. The code is in table/table.cc.
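The lookup described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not LevelDB's actual reader: `FindFilter`, `BuildFilterBlock`, and `PutFixed32` are hypothetical helper names, and the sketch assumes offsets are stored as 4-byte little-endian integers, as in the layout shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Append a 32-bit value as 4 little-endian bytes.
void PutFixed32(std::string* dst, uint32_t v) {
  for (int shift = 0; shift < 32; shift += 8)
    dst->push_back(static_cast<char>((v >> shift) & 0xff));
}

// Read a 32-bit little-endian value.
uint32_t DecodeFixed32(const char* p) {
  const unsigned char* b = reinterpret_cast<const unsigned char*>(p);
  return static_cast<uint32_t>(b[0]) | (static_cast<uint32_t>(b[1]) << 8) |
         (static_cast<uint32_t>(b[2]) << 16) | (static_cast<uint32_t>(b[3]) << 24);
}

// Hypothetical writer: filters, then the offset array, then the offset of
// the offset array, then lg(base) in the final byte.
std::string BuildFilterBlock(const std::vector<std::string>& filters,
                             uint8_t base_lg) {
  assert(base_lg < 64);
  std::string block;
  std::vector<uint32_t> offsets;
  for (const std::string& f : filters) {
    offsets.push_back(static_cast<uint32_t>(block.size()));
    block += f;
  }
  const uint32_t array_start = static_cast<uint32_t>(block.size());
  for (uint32_t off : offsets) PutFixed32(&block, off);
  PutFixed32(&block, array_start);
  block.push_back(static_cast<char>(base_lg));
  return block;
}

// Hypothetical reader: returns the bytes of the filter covering the data at
// `block_offset`, or an empty string if there is none or the block is malformed.
std::string FindFilter(const std::string& block, uint64_t block_offset) {
  const size_t n = block.size();
  if (n < 5) return "";  // 4 bytes for the array offset + 1 byte for lg(base)
  const char* data = block.data();
  const uint32_t base_lg = static_cast<unsigned char>(data[n - 1]);
  const uint32_t array_start = DecodeFixed32(data + n - 5);
  if (base_lg > 63 || array_start > n - 5) return "";
  const size_t num_filters = (n - 5 - array_start) / 4;
  const uint64_t i = block_offset >> base_lg;  // i = offset / base
  if (i >= num_filters) return "";
  const uint32_t start = DecodeFixed32(data + array_start + i * 4);
  // For the last filter, the "next offset" slot is the array-offset field
  // itself, which is exactly where the last filter ends.
  const uint32_t limit = DecodeFixed32(data + array_start + i * 4 + 4);
  if (start > limit || limit > array_start) return "";
  return block.substr(start, limit - start);
}
```

With lg(base) = 11 (a 2 KB base), a data block at file offset 2048 maps to filter 1, and a block at offset 2047 still maps to filter 0.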

2. Bloom filter construction algorithm

The Bloom filter construction algorithm is implemented in util/bloom.cc.

The filter-creation code in bloom.cc shows that the memory occupied by a Bloom filter is determined by n (the number of keys) and the bits_per_key_ parameter. The total Bloom filter memory in LevelDB is the sum over all open sstables; the number of open sstable files is limited by max_open_files, which defaults to 1000. The total is therefore determined by the number of keys in all open sstables and the configured bits_per_key_. For example, with a million keys and the suggested 10 bits per key as the argument to NewBloomFilterPolicy, the memory usage would be approximately 10 million bits =~ 1.25 MB.
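The arithmetic behind that estimate can be checked with a short sketch. `BloomFilterBytes` is a hypothetical helper, not a LevelDB function. It follows the sizing rule used in bloom.cc as I understand it: n * bits_per_key bits, with a floor of 64 bits, rounded up to whole bytes, plus one trailing byte that records the number of probes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes used by a single filter built over n keys.
size_t BloomFilterBytes(size_t n, size_t bits_per_key) {
  assert(bits_per_key > 0);
  size_t bits = n * bits_per_key;
  if (bits < 64) bits = 64;             // tiny filters get a 64-bit floor
  const size_t bytes = (bits + 7) / 8;  // round up to whole bytes
  return bytes + 1;                     // one extra byte stores the probe count
}
```

For one million keys at 10 bits per key this gives 1,250,001 bytes, which matches the figure of about 1.25 MB. In practice LevelDB builds one filter per base-sized range of data rather than one per million keys, so the per-filter floor and trailing byte add a small overhead on top of this estimate.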

3. Bloom filter hash algorithm

The Bloom filter uses k_ hash functions, where k_ is computed as bits_per_key_ * ln(2) and clamped to the range 1 to 30. The k_ hash values are not computed independently: a single BloomHash value is computed for the key, and each successive probe position is derived from it by adding a delta obtained by rotating that hash.
BloomHash is computed in a way similar to MurmurHash.
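The following is a sketch of this scheme, modeled on util/hash.cc and util/bloom.cc but written from memory. The constants (the multiplier, the seed passed by `BloomHash`, and the 0.69 approximation of ln(2)) should be treated as assumptions and checked against the LevelDB source; the function names are chosen here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Murmur-like hash (constants assumed from LevelDB's util/hash.cc).
uint32_t Hash(const char* data, size_t n, uint32_t seed) {
  const uint32_t m = 0xc6a4a793;
  const unsigned char* p = reinterpret_cast<const unsigned char*>(data);
  const unsigned char* limit = p + n;
  uint32_t h = seed ^ (static_cast<uint32_t>(n) * m);
  while (limit - p >= 4) {  // consume 4 bytes at a time, little-endian
    const uint32_t w = static_cast<uint32_t>(p[0]) |
                       (static_cast<uint32_t>(p[1]) << 8) |
                       (static_cast<uint32_t>(p[2]) << 16) |
                       (static_cast<uint32_t>(p[3]) << 24);
    p += 4;
    h += w;
    h *= m;
    h ^= (h >> 16);
  }
  switch (limit - p) {  // mix in the remaining 1 to 3 bytes
    case 3: h += static_cast<uint32_t>(p[2]) << 16;  // fall through
    case 2: h += static_cast<uint32_t>(p[1]) << 8;   // fall through
    case 1: h += p[0]; h *= m; h ^= (h >> 24); break;
  }
  return h;
}

uint32_t BloomHash(const std::string& key) {
  return Hash(key.data(), key.size(), 0xbc9f1d34);  // seed is an assumption
}

// k = bits_per_key * ln(2), with ln(2) approximated as 0.69, clamped to [1, 30].
size_t ProbeCount(size_t bits_per_key) {
  size_t k = static_cast<size_t>(bits_per_key * 0.69);
  if (k < 1) k = 1;
  if (k > 30) k = 30;
  return k;
}

std::string CreateFilter(const std::vector<std::string>& keys,
                         size_t bits_per_key) {
  assert(bits_per_key > 0);
  const size_t k = ProbeCount(bits_per_key);
  size_t bits = keys.size() * bits_per_key;
  if (bits < 64) bits = 64;
  const size_t bytes = (bits + 7) / 8;
  bits = bytes * 8;
  std::string filter(bytes, '\0');
  for (const std::string& key : keys) {
    uint32_t h = BloomHash(key);
    const uint32_t delta = (h >> 17) | (h << 15);  // rotate right by 17 bits
    for (size_t j = 0; j < k; j++) {
      const uint32_t bitpos = static_cast<uint32_t>(h % bits);
      filter[bitpos / 8] |= static_cast<char>(1 << (bitpos % 8));
      h += delta;  // next probe: double hashing from a single hash value
    }
  }
  filter.push_back(static_cast<char>(k));  // remember k for readers
  return filter;
}

bool KeyMayMatch(const std::string& key, const std::string& filter) {
  if (filter.size() < 2) return false;
  const size_t bits = (filter.size() - 1) * 8;
  const size_t k = static_cast<unsigned char>(filter[filter.size() - 1]);
  if (k > 30) return true;  // unknown encoding: treat as a possible match
  uint32_t h = BloomHash(key);
  const uint32_t delta = (h >> 17) | (h << 15);
  for (size_t j = 0; j < k; j++) {
    const uint32_t bitpos = static_cast<uint32_t>(h % bits);
    if ((filter[bitpos / 8] & (1 << (bitpos % 8))) == 0) return false;
    h += delta;
  }
  return true;
}
```

A key that was added always matches, since every bit it set is still set. A key that was not added usually fails on one of the k probes, but may occasionally match by chance; that is the false-positive rate that bits_per_key_ controls.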

4. MurmurHash algorithm

MurmurHash is a non-cryptographic hash function suited to general hash-based lookup. It was created by Austin Appleby in 2008.
MurmurHash2 can produce a 32-bit or 64-bit hash value. MurmurHash is used in several open source projects, including libstdc++, libmemcached, nginx, Hadoop, and more.
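For comparison with BloomHash, here is a sketch of the 32-bit MurmurHash2 variant, following the structure of the reference MurmurHash2.cpp. It is written from memory, so the constants should be verified against the reference before relying on exact hash values. Like the reference, it reads 4-byte words in native byte order, so results differ between little-endian and big-endian machines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

uint32_t MurmurHash2(const void* key, size_t len, uint32_t seed) {
  assert(key != nullptr || len == 0);
  const uint32_t m = 0x5bd1e995;  // mixing multiplier
  const int r = 24;               // mixing shift
  uint32_t h = seed ^ static_cast<uint32_t>(len);
  const unsigned char* data = static_cast<const unsigned char*>(key);

  // Mix 4 bytes at a time into the hash.
  while (len >= 4) {
    uint32_t k;
    std::memcpy(&k, data, sizeof(k));  // avoids unaligned access
    k *= m;
    k ^= k >> r;
    k *= m;
    h *= m;
    h ^= k;
    data += 4;
    len -= 4;
  }

  // Handle the last 1 to 3 bytes.
  switch (len) {
    case 3: h ^= static_cast<uint32_t>(data[2]) << 16;  // fall through
    case 2: h ^= static_cast<uint32_t>(data[1]) << 8;   // fall through
    case 1: h ^= data[0]; h *= m;
  }

  // Final avalanche so the last bytes affect all output bits.
  h ^= h >> 13;
  h *= m;
  h ^= h >> 15;
  return h;
}
```

The overall shape (multiply, shift-xor, tail handling for the last few bytes) is the same as in the BloomHash sketch; the main differences are the constants and that MurmurHash2 mixes each word before combining it and applies a final avalanche step.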

5. References

http://leveldb.googlecode.com/svn/trunk/doc/table_format.txt

https://code.google.com/p/smhasher/source/browse/trunk/MurmurHash2.cpp

http://duanple.blog.163.com/blog/static/7097176720123227403134/

http://zh.wikipedia.org/wiki/Murmur%E5%93%88%E5%B8%8C

https://code.google.com/p/leveldb/source/detail?r=85584d497e7b354853b72f450683d59fcf6b9c5c
