New Features of HBase: Stripe Compaction

Drawing on the compaction approaches of LevelDB and Cassandra, Stripe Compaction was proposed in HBASE-7667 (https://issues.apache.org/jira/browse/HBASE-7667) [1].

Motivation:
1) Too many Regions increase RegionServer (RS) maintenance overhead and hurt RS read/write performance. As the data volume grows, adding Regions does raise overall system throughput, but the growing number of Regions served by one RS also increases its memory overhead: each Store keeps its own MemStore, so Flush operations become more frequent and the system's read/write performance suffers. A more lightweight "mini-Region" could therefore reduce the cost of serving many Regions while improving read/write efficiency.

2) Region Compaction is prone to amplification. For example, if a Region's range is [1FFF, 2FFF) and only the sub-range [1FFF, 21FF) receives heavy writes (put, delete), a Major Compaction must still rewrite all files of the Region once its trigger condition is reached, producing a large amount of unnecessary IO.

3) The Region Split operation is costly.

For background on HBase's Compaction and Flush processes, see "HBase Compaction mechanism" and "The impact of HBase Flush on read/write".

Core ideas of the Stripe Compaction design:
1) Sub-divide the rowkey range of a Region. For example, the range [1FFF, 2FFF) can be split into two intervals, [1FFF, 24FF) and [24FF, 2FFF); each such interval becomes a Stripe.
2) The Region's data files are organized in two layers, Level-0 and Level-1. Level-0 mainly stores temporary data files (for example, files produced by bulkload or by a MemStore flush), while Level-1 files are partitioned by Stripe.
3) Two configuration modes are supported: a fixed number of stripes (mini-Regions), or an automatic splitting mechanism triggered by stripe size (see the configuration sketch after this list).
4) Fault tolerance: if a hole appears between Stripes, then, depending on the Store's settings, all Level-1 files can be demoted back to Level-0 and compacted again.
5) A Get on a row involves the MemStore, all Level-0 files, and only the Level-1 files of the Stripe that covers the row. According to Stack's comments, Level-0 files are only a temporary state; most files live under Level-1 Stripes, so the files a read touches are well localized (see the stripe-lookup sketch after this list).
6) A Scan first locates the startrow; during scanning, data is traversed stripe by stripe according to the row intervals of the Stripes.
7) Compaction is the process of promoting Level-0 files into Level-1; at the same time, files within Level-1 Stripes are also merged.
8) A Split locates the midpoint of the rowkey range and refines it using the recorded Stripe boundaries. Pre-configured Stripes therefore make splitting easier: most HFile files can go directly into the daughter Region's directory, which speeds up the Split operation.
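To make point 3 concrete, here is a minimal sketch of enabling Stripe Compaction for one column family through the HBase 2.x client API. The table name demo_table, the family name cf, and all numeric values are illustrative assumptions; the stripe-specific property keys below should be verified against the reference guide of the HBase version in use, since they have varied across releases. The essential step is switching the store engine to StripeStoreEngine.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class StripeCompactionConfig {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      ColumnFamilyDescriptorBuilder cf =
          ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("cf"))
              // Switch this column family's store engine to Stripe Compaction.
              .setConfiguration("hbase.hstore.engine.class",
                  "org.apache.hadoop.hbase.regionserver.StripeStoreEngine")
              // Option A: a fixed number of stripes ("mini-Regions") per region.
              .setConfiguration("hbase.store.stripe.fixed.count", "4")
              // Option B (size-based, use instead of Option A): split a stripe
              // once it grows past this many bytes (2 GB here).
              // .setConfiguration("hbase.store.stripe.sizeToSplit", "2147483648")
              // How many Level-0 files may accumulate before they are compacted
              // into the Level-1 stripes.
              .setConfiguration("hbase.store.stripe.compaction.minFilesL0", "4");

      TableDescriptorBuilder table =
          TableDescriptorBuilder.newBuilder(TableName.valueOf("demo_table"))
              .setColumnFamily(cf.build());
      admin.createTable(table.build());
    }
  }
}

The same keys can also be set cluster-wide in hbase-site.xml instead of per column family; family-level settings simply override the global ones.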
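The read path in points 5 and 6 ultimately relies on mapping a row key onto exactly one stripe. The class below is not HBase's internal implementation, only an illustrative sketch of that mapping, reusing the [1FFF, 24FF) and [24FF, 2FFF) stripes from point 1.

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StripeLookup {
  // Exclusive upper boundaries of the stripes, sorted ascending; the last entry
  // is the end key of the Region.
  private final byte[][] stripeEndKeys;

  public StripeLookup(byte[][] stripeEndKeys) {
    this.stripeEndKeys = stripeEndKeys;
  }

  // Returns the index of the stripe whose row interval contains the given key.
  public int stripeFor(byte[] row) {
    for (int i = 0; i < stripeEndKeys.length; i++) {
      if (Arrays.compareUnsigned(row, stripeEndKeys[i]) < 0) {
        return i; // row sorts before the end key of stripe i
      }
    }
    return stripeEndKeys.length - 1; // fall back to the last stripe
  }

  public static void main(String[] args) {
    StripeLookup lookup = new StripeLookup(new byte[][] {
        "24FF".getBytes(StandardCharsets.US_ASCII),
        "2FFF".getBytes(StandardCharsets.US_ASCII)});
    System.out.println(lookup.stripeFor("21AA".getBytes(StandardCharsets.US_ASCII))); // prints 0
    System.out.println(lookup.stripeFor("25AA".getBytes(StandardCharsets.US_ASCII))); // prints 1
  }
}

A Get for row 21AA thus needs to consult the MemStore, the Level-0 files, and only the files of stripe 0, instead of every file in the Store.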

The following briefly describes the multi-level compaction algorithm used by Cassandra and LevelDB.

1) Leveled compaction divides the data into levels L0, L1, L2, and so on. The maximum data size of each level is 10 times that of the previous level, with the L0 limit set to 5 MB (configurable); a sketch of this size rule follows the list.
2) For levels greater than 0, the rowkey ranges of files within the same level do not overlap. Therefore, when merging level n with level n + 1, it is always clear which data block a given key belongs to, and blocks can be merged one at a time: each merge produces new blocks and makes the old ones obsolete, and the old blocks only need to be removed once the whole merge has finished.
3) The overall flow is a cascade of merges from L0 -> L1 -> L2 and onward.
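As a minimal illustration of the size rule in point 1 (the 5 MB base and the growth factor of 10 are simply the values quoted above, both configurable), the snippet below prints each level's capacity limit:

public class LeveledSizes {
  static final long L0_LIMIT_BYTES = 5L * 1024 * 1024; // 5 MB base limit
  static final int GROWTH_FACTOR = 10;                 // each level is 10x the previous

  // Maximum allowed size of level n under the 10x rule.
  static long maxSizeOfLevel(int level) {
    long limit = L0_LIMIT_BYTES;
    for (int i = 0; i < level; i++) {
      limit *= GROWTH_FACTOR;
    }
    return limit;
  }

  public static void main(String[] args) {
    for (int level = 0; level <= 4; level++) {
      System.out.printf("L%d limit: %d MB%n", level, maxSizeOfLevel(level) / (1024 * 1024));
    }
    // Prints 5, 50, 500, 5000, and 50000 MB for L0 through L4.
  }
}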

From this, we can see that the smaller a block's level number, the newer its data. During the downward merges, files are merged by rowkey range, obsolete versions are removed, and the relevant delete markers are applied. In the worst case, therefore, a read has to start at Level 0 and descend level by level all the way to the bottom level n (a simplified read-path sketch follows).
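The worst-case read described above can be sketched as follows. This is simplified illustrative code, not LevelDB's or Cassandra's actual API: every Level-0 file may need to be checked because their ranges overlap, while from Level-1 downward at most one file per level can contain the key, and the search stops at the first hit because smaller level numbers hold newer data.

import java.util.List;
import java.util.Optional;

interface DataFile {
  boolean mayContain(byte[] key);    // cheap key-range / bloom-filter check
  Optional<byte[]> get(byte[] key);  // actual lookup inside the file
}

class LeveledReader {
  // levels.get(0) holds the Level-0 files ordered newest first; every other
  // entry holds one level's files, whose key ranges do not overlap.
  private final List<List<DataFile>> levels;

  LeveledReader(List<List<DataFile>> levels) {
    this.levels = levels;
  }

  Optional<byte[]> get(byte[] key) {
    for (List<DataFile> level : levels) {
      for (DataFile file : level) {
        if (file.mayContain(key)) {
          Optional<byte[]> value = file.get(key);
          if (value.isPresent()) {
            return value; // first hit wins: lower levels hold newer data
          }
        }
      }
    }
    return Optional.empty(); // key not found in any level
  }
}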

This compaction scheme has the following advantages:
1) Because most reads have LRU-like locality, they fall into the lower levels: the hotter the data, the lower its level. This property can also help with placing HFiles on different storage media in the future.
2) A merge only involves a subset of files from the adjacent levels rather than a Compaction over all files, which speeds up Compaction.

The disadvantage is that, with too many levels, the recursive merges can easily trigger a Compaction storm within some key interval, hurting the throughput of data operations on that interval.
For this reason, HBase Stripe Compaction uses only two layers, Level-0 and Level-1. This keeps the total number of files down while retaining the advantages of leveled compaction, and it also makes it easier for the RS to perform Split and Merge operations.

References:
[1] HBASE-7667, https://issues.apache.org/jira/browse/HBASE-7667

This article is an original post from Binos_ICT's personal technical blog, Binospace. The original article is at http://www.binospace.com/index.php/hbase-new-features-stripe-compaction; reproduction without permission is not allowed.
