In HBase, reads are very frequent. Many operations go through the meta table: the client uses it to locate the specific RegionServer and then queries the data in that region.
The problem is that a region consists of a memstore and multiple store files (HFiles). The memstore is a write cache in server memory that improves insertion efficiency; when it reaches a certain size (set by hbase.hregion.memstore.flush.size) or when the user flushes it manually, it is persisted to a distributed file system such as HDFS. In other words, a region can correspond to many files containing valid data, and although the data within each file is sorted by rowkey, the rowkeys across files are not ordered relative to one another (unless a major_compact merges them into one file).
Suppose a client now requests a single column (cf1:col1) of some rowkey (row1),
for example with the shell command get 'tab', 'row1', 'cf1:col1'.
row1 may fall between the startKey and endKey of every file, so the RegionServer has to scan the relevant blocks of each file, incurring multiple physical I/Os. But there is no guarantee that row1 actually exists in every file, so many of those physical I/Os are wasted, which hurts performance greatly. Hence the Bloom filter, which can determine, to a certain extent, whether a file contains a given rowkey.
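In HBase the filter is enabled per column family when the table is created, via the BLOOMFILTER attribute (NONE, ROW, or ROWCOL). A minimal HBase shell example, with table and family names purely illustrative:

```
create 'tab', {NAME => 'cf1', BLOOMFILTER => 'ROWCOL'}
```
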
Bloom filters come in two types, ROW and ROWCOL; the principle is similar, so take the ROWCOL type as an example:
When a memstore is flushed to HDFS to form a file, part of that file is a section called meta, and the write process follows this algorithm:
1. First initialize a long bit array, say bit arr[n] = {0};
2. Using k hash functions (k < n), hash each (row:cf:col) entry k times, guaranteeing that every result falls in [0, n-1];
3. If the result of one hash function is r, set arr[r] = 1; thus each (row:cf:col) yields k results, and the corresponding positions in arr are set to 1;
4. Repeat until all data has been written to the file, then write arr into the meta section of the file.
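The steps above, together with the read-time check described later, can be sketched in Python. This is a toy model, not HBase's actual implementation (HBase uses its own hashing and a byte-packed bit array); the double-hashing trick and the parameters n and k are illustrative choices:

```python
import hashlib


class BloomFilter:
    """Toy ROWCOL-style Bloom filter: a bit array plus k hash functions."""

    def __init__(self, n=1024, k=5):
        self.n, self.k = n, k
        self.arr = [0] * n  # step 1: bit array initialized to all zeros

    def _hashes(self, key):
        # Derive k values in [0, n-1] from two base digests (double
        # hashing), standing in for k independent hash functions.
        h1 = int.from_bytes(hashlib.md5(key.encode()).digest()[:8], 'big')
        h2 = int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], 'big')
        return [(h1 + i * h2) % self.n for i in range(self.k)]

    def add(self, row, cf, col):
        # steps 2-3: hash (row:cf:col) k times and set those positions to 1
        for r in self._hashes(f"{row}:{cf}:{col}"):
            self.arr[r] = 1

    def might_contain(self, row, cf, col):
        # read path: if any of the k positions is 0, the key is
        # definitely absent; if all are 1, it is only probably present
        return all(self.arr[r] == 1
                   for r in self._hashes(f"{row}:{cf}:{col}"))


# Usage: add one entry, then probe for it.
bf = BloomFilter()
bf.add('row1', 'cf1', 'col1')
print(bf.might_contain('row1', 'cf1', 'col1'))  # an added key always passes
```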
Because of the structural characteristics of the bitmap index itself, arr[n] is guaranteed not to be large, so even when cached in memory (not in the memstore) it does not take up much space. Bitmap indexes can cause heavy lock contention in relational databases, especially OLTP systems, but in HBase a file that has already been written is almost never modified (except by compaction), so this is not a problem.
Now look at get 'tab', 'row1', 'cf1:col1' again. To decide whether a file contains (row1:cf1:col1), the RegionServer only needs to hash row1:cf1:col1 with the k functions and check whether every corresponding position in arr is 1. If any one of them is not, the column data is definitely not in that file (although all 1s does not guarantee that it is), which avoids reading unnecessary files and improves query efficiency.
In short, a Bloom filter can to some extent avoid reading unnecessary files, but because it is based on hash functions it is not completely accurate; and for operations such as large-range scans, the filter is of no help.
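How inaccurate is it? The standard estimate of a Bloom filter's false-positive probability, for an n-bit array, k hash functions, and m inserted keys, is (1 - e^(-km/n))^k. A quick calculation with illustrative sizes (a 1 MiB bit array, 5 hash functions, one million entries):

```python
import math


def false_positive_prob(n, k, m):
    """Approximate false-positive rate of a Bloom filter with an
    n-bit array, k hash functions, and m inserted keys."""
    return (1 - math.exp(-k * m / n)) ** k


# Illustrative sizes: 1 MiB of bits, 5 hashes, 1M (row:cf:col) entries.
n = 8 * 1024 * 1024
print(f"{false_positive_prob(n, 5, 1_000_000):.4%}")  # around 1-2%
```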
2017.1.15
Bitmap index in HBase--the Bloom filter