Bitmap index in HBase: the Bloom filter

Source: Internet
Author: User

In HBase, reads are very frequent. For most operations the client uses the meta table to locate the specific RegionServer, then queries the data in the relevant region.

The problem is that a region consists of a MemStore and multiple StoreFiles. The MemStore acts like a cache in server memory and makes inserts more efficient. When the MemStore reaches a certain size (set by hbase.hregion.memstore.flush.size) or the user flushes it manually, its contents are persisted to a disk-based system such as HDFS. In other words, a region can correspond to many files that hold valid data. The data within each file is sorted by row key, but the row keys across files are in no particular order (unless a major compaction, major_compact, merges them into one file).
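A toy model can make the flush behavior concrete. The sketch below is only an illustration: the ToyRegion class and FLUSH_SIZE are hypothetical names, and the threshold counts cells rather than bytes, unlike the real hbase.hregion.memstore.flush.size setting.

```python
# Toy model only: FLUSH_SIZE counts cells, whereas the real setting
# hbase.hregion.memstore.flush.size is a size in bytes.
FLUSH_SIZE = 3


class ToyRegion:
    def __init__(self):
        self.memstore = {}  # in-memory buffer: row key -> value
        self.files = []     # flushed files, each a sorted list of (row key, value)

    def put(self, row, value):
        self.memstore[row] = value
        if len(self.memstore) >= FLUSH_SIZE:
            self.flush()

    def flush(self):
        if self.memstore:
            # Each file is sorted internally, independently of earlier files.
            self.files.append(sorted(self.memstore.items()))
            self.memstore = {}


region = ToyRegion()
for row in ["row5", "row1", "row9", "row2", "row7", "row3"]:
    region.put(row, "v")

# (start key, end key) of every flushed file
key_ranges = [(f[0][0], f[-1][0]) for f in region.files]
print(key_ranges)
```

In this run the two flushed files cover the ranges row1..row9 and row2..row7, so their key ranges overlap even though each file is sorted on its own.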

Suppose a user now requests a single column (cf1:col1) of some row key (row1), for example with this command:

get 'tab', 'row1', 'cf1:col1'

row1 may fall between the start key and end key of every file, so the RegionServer has to scan the relevant blocks of each file, which costs multiple physical I/Os. There is no guarantee that row1 actually exists in every file, so many of those physical I/Os are wasted, which hurts performance badly. This is where the Bloom filter comes in: to a certain extent it can determine whether a file contains the specified row key.
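The cost of a lookup without a filter can be sketched as follows. The file contents and the helper names candidate_files and files_containing are made up for illustration; real store files hold sorted cells in blocks, not plain Python lists.

```python
# Hypothetical store files: each is a sorted list of row keys.
files = [
    ["row0", "row1", "row9"],
    ["row0", "row4", "row8"],
    ["row0", "row5", "row7"],
]


def candidate_files(files, row):
    """Indexes of files whose [start key, end key] range covers the row."""
    return [i for i, f in enumerate(files) if f[0] <= row <= f[-1]]


def files_containing(files, row):
    """Indexes of files that really hold the row."""
    return [i for i, f in enumerate(files) if row in f]


to_read = candidate_files(files, "row1")  # files that must be read without a filter
hits = files_containing(files, "row1")    # files where the read actually pays off
wasted = len(to_read) - len(hits)
print(to_read, hits, wasted)
```

Here all three files cover row1 by key range, but only the first one contains it, so two of the three reads are wasted.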

Bloom filters come in two types, ROW and ROWCOL. The principle is similar, so take the ROWCOL type as an example:

When the MemStore is written to HDFS to form a file, one part of that file is called Meta. It is populated during the write according to the following algorithm:

1. First, initialize a fairly long bit array, say bit arr[n] = {0};

2. Apply k hash functions (k < n) to each (row:cf:col) entry, ensuring that every result falls in [0, n-1];

3. If a hash function returns r, set arr[r] = 1. Each (row:cf:col) entry thus yields k results, and the corresponding positions in arr are set to 1;

4. Repeat this until all data has been written to the file, then write arr to the Meta section of the file.
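The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not HBase's implementation: the k hash functions are simulated by salting SHA-256 with an index, which is a stand-in chosen for convenience, and the names bit_positions and build_filter are hypothetical.

```python
import hashlib


def bit_positions(key, k, n):
    """Return k positions in [0, n-1] for key, one per simulated hash function."""
    positions = []
    for i in range(k):
        # Salting with i simulates k different hash functions (an assumption).
        digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
        positions.append(int.from_bytes(digest[:8], "big") % n)
    return positions


def build_filter(keys, k, n):
    arr = [0] * n                            # step 1: all bits start at 0
    for key in keys:                         # step 4: repeat for every entry
        for r in bit_positions(key, k, n):   # step 2: k hashes in [0, n-1]
            arr[r] = 1                       # step 3: set the matching bits
    return arr


cells = ["row1:cf1:col1", "row2:cf1:col1", "row3:cf1:col2"]
arr = build_filter(cells, k=3, n=64)
print(sum(arr), "of", len(arr), "bits set")
```

With three entries and k = 3, at most nine bits can be set; fewer are set whenever two hashes land on the same position.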

Because of the structural characteristics of a bitmap index, arr[n] is guaranteed not to be large, so caching it in memory (not in the MemStore) does not take up much space. Bitmap indexes can cause heavy lock contention in relational databases, especially OLTP systems, but in HBase a file that has already been written is hardly ever modified, except by compaction.

Now look at get 'tab', 'row1', 'cf1:col1' again. To determine whether a file contains (row1:cf1:col1), apply the k hash functions to row1:cf1:col1 and check whether the arr value at each resulting position is 1. If any of them is not 1, the column data does not exist in that file (although all of them being 1 does not necessarily mean that it does exist). This avoids reading unnecessary files and improves query efficiency.
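The lookup can be sketched the same way. The ToyBloom class below is a hypothetical illustration, with salted SHA-256 again standing in for the k hash functions, and the per-file cell sets are invented for the example.

```python
import hashlib


class ToyBloom:
    def __init__(self, n=256, k=3):
        self.n, self.k = n, k
        self.arr = [0] * n

    def _positions(self, key):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.n

    def add(self, key):
        for r in self._positions(key):
            self.arr[r] = 1

    def might_contain(self, key):
        # Any 0 bit proves absence; all 1 bits only means "possibly present".
        return all(self.arr[r] == 1 for r in self._positions(key))


# Hypothetical contents of three store files, as row:cf:col entries.
file_cells = [
    {"row1:cf1:col1", "row9:cf1:col1"},
    {"row4:cf1:col1", "row8:cf1:col2"},
    {"row5:cf1:col1", "row7:cf1:col1"},
]
blooms = []
for cells in file_cells:
    bloom = ToyBloom()
    for cell in cells:
        bloom.add(cell)
    blooms.append(bloom)

target = "row1:cf1:col1"
files_to_read = [i for i, b in enumerate(blooms) if b.might_contain(target)]
print(files_to_read)
```

File 0 always appears in files_to_read, because a filter never reports an entry it has stored as absent. The other files appear only if a false positive occurs.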

As shown above, a Bloom filter can to some extent avoid reading unnecessary files. Because it is based on hash functions, though, it is not completely accurate: it can report false positives. For operations such as large-scale scans, there is no need to use the filter.
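How inaccurate the filter is depends on the array length n, the number of hash functions k, and the number of stored entries. A rough feel for this comes from the standard textbook estimate (1 - e^(-k*m/n))^k, where m is the number of entries. The function name and the sample sizes below are illustrative assumptions, not values taken from HBase.

```python
import math


def false_positive_rate(n_bits, k_hashes, m_keys):
    """Standard estimate (1 - e^(-k*m/n))^k for a Bloom filter."""
    return (1.0 - math.exp(-k_hashes * m_keys / n_bits)) ** k_hashes


m = 1000  # hypothetical number of entries in one file
rates = {}
for bits_per_key in (4, 8, 16):
    rates[bits_per_key] = false_positive_rate(bits_per_key * m, 3, m)
    print(bits_per_key, "bits per entry:", round(rates[bits_per_key], 4))
```

With k fixed at 3, the estimated false positive rate drops as more bits are spent per entry, which is the trade-off between filter size and wasted reads.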

2017.1.15



