New Features of HBase: Stripe Compaction

Drawing on the compaction approaches of LevelDB and Cassandra, Stripe Compaction was proposed in HBASE-7667 (https://issues.apache.org/jira/browse/HBASE-7667) [1].

Motivation:
1) Too many Regions increase RegionServer (RS) maintenance overhead and hurt RS read/write performance. As the data volume grows, adding Regions does raise overall system throughput, but the growing number of Regions served by one RS also increases its memory overhead: each Store keeps its own MemStore, so Flush operations become more frequent and the system's read/write performance suffers. A more lightweight "mini-Region" could therefore reduce the cost of serving many Regions while improving read/write efficiency.

2) Region Compaction is prone to amplification. For example, if a Region's range is [1FFF, 2FFF) and only the sub-range [1FFF, 21FF) receives heavy writes (put, delete), a Major Compaction must still rewrite all files of the Region once its trigger condition is reached, producing a large amount of unnecessary IO.

3) The Region Split operation is costly.

For background on HBase's Compaction and Flush processes, see "HBase Compaction mechanism" and "The impact of HBase Flush on read/write".

Core ideas of the Stripe Compaction design:
1) Sub-divide the rowkey range of a Region. For example, the range [1FFF, 2FFF) can be split into two intervals, [1FFF, 24FF) and [24FF, 2FFF); each such interval becomes a Stripe.
2) The Region's data files are organized in two layers, Level-0 and Level-1. Level-0 mainly stores temporary data files (for example, files produced by bulkload or by a MemStore flush), while Level-1 files are partitioned by Stripe.
3) Two configuration modes are supported: a fixed number of stripes (mini-Regions), or an automatic splitting mechanism triggered by stripe size (see the configuration sketch after this list).
4) Fault tolerance: if a hole appears between Stripes, then, depending on the Store's settings, all Level-1 files can be demoted back to Level-0 and compacted again.
5) A Get on a row involves the MemStore, all Level-0 files, and only the Level-1 files of the Stripe that covers the row. According to Stack's comments, Level-0 files are only a temporary state; most files live under Level-1 Stripes, so the files a read touches are well localized (see the stripe-lookup sketch after this list).
6) A Scan first locates the startrow; during scanning, data is traversed stripe by stripe according to the row intervals of the Stripes.
7) Compaction is the process of promoting Level-0 files into Level-1; at the same time, files within Level-1 Stripes are also merged.
8) A Split locates the midpoint of the rowkey range and refines it using the recorded Stripe boundaries. Pre-configured Stripes therefore make splitting easier: most HFile files can go directly into the daughter Region's directory, which speeds up the Split operation.
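To make point 3 concrete, here is a minimal sketch of enabling Stripe Compaction for one column family through the HBase 2.x client API. The table name demo_table, the family name cf, and all numeric values are illustrative assumptions; the stripe-specific property keys below should be verified against the reference guide of the HBase version in use, since they have varied across releases. The essential step is switching the store engine to StripeStoreEngine.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class StripeCompactionConfig {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      ColumnFamilyDescriptorBuilder cf =
          ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("cf"))
              // Switch this column family's store engine to Stripe Compaction.
              .setConfiguration("hbase.hstore.engine.class",
                  "org.apache.hadoop.hbase.regionserver.StripeStoreEngine")
              // Option A: a fixed number of stripes ("mini-Regions") per region.
              .setConfiguration("hbase.store.stripe.fixed.count", "4")
              // Option B (size-based, use instead of Option A): split a stripe
              // once it grows past this many bytes (2 GB here).
              // .setConfiguration("hbase.store.stripe.sizeToSplit", "2147483648")
              // How many Level-0 files may accumulate before they are compacted
              // into the Level-1 stripes.
              .setConfiguration("hbase.store.stripe.compaction.minFilesL0", "4");

      TableDescriptorBuilder table =
          TableDescriptorBuilder.newBuilder(TableName.valueOf("demo_table"))
              .setColumnFamily(cf.build());
      admin.createTable(table.build());
    }
  }
}

The same keys can also be set cluster-wide in hbase-site.xml instead of per column family; family-level settings simply override the global ones.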
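The read path in points 5 and 6 ultimately relies on mapping a row key onto exactly one stripe. The class below is not HBase's internal implementation, only an illustrative sketch of that mapping, reusing the [1FFF, 24FF) and [24FF, 2FFF) stripes from point 1.

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StripeLookup {
  // Exclusive upper boundaries of the stripes, sorted ascending; the last entry
  // is the end key of the Region.
  private final byte[][] stripeEndKeys;

  public StripeLookup(byte[][] stripeEndKeys) {
    this.stripeEndKeys = stripeEndKeys;
  }

  // Returns the index of the stripe whose row interval contains the given key.
  public int stripeFor(byte[] row) {
    for (int i = 0; i < stripeEndKeys.length; i++) {
      if (Arrays.compareUnsigned(row, stripeEndKeys[i]) < 0) {
        return i; // row sorts before the end key of stripe i
      }
    }
    return stripeEndKeys.length - 1; // fall back to the last stripe
  }

  public static void main(String[] args) {
    StripeLookup lookup = new StripeLookup(new byte[][] {
        "24FF".getBytes(StandardCharsets.US_ASCII),
        "2FFF".getBytes(StandardCharsets.US_ASCII)});
    System.out.println(lookup.stripeFor("21AA".getBytes(StandardCharsets.US_ASCII))); // prints 0
    System.out.println(lookup.stripeFor("25AA".getBytes(StandardCharsets.US_ASCII))); // prints 1
  }
}

A Get for row 21AA thus needs to consult the MemStore, the Level-0 files, and only the files of stripe 0, instead of every file in the Store.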

The following briefly describes the multi-level compaction algorithm used by Cassandra and LevelDB.

1) Leveled compaction divides the data into levels L0, L1, L2, and so on. The maximum data size of each level is 10 times that of the previous level, with the L0 limit set to 5 MB (configurable); a sketch of this size rule follows the list.
2) For levels greater than 0, the rowkey ranges of files within the same level do not overlap. Therefore, when merging level n with level n + 1, it is always clear which data block a given key belongs to, and blocks can be merged one at a time: each merge produces new blocks and makes the old ones obsolete, and the old blocks only need to be removed once the whole merge has finished.
3) The overall flow is a cascade of merges from L0 -> L1 -> L2 and onward.
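As a minimal illustration of the size rule in point 1 (the 5 MB base and the growth factor of 10 are simply the values quoted above, both configurable), the snippet below prints each level's capacity limit:

public class LeveledSizes {
  static final long L0_LIMIT_BYTES = 5L * 1024 * 1024; // 5 MB base limit
  static final int GROWTH_FACTOR = 10;                 // each level is 10x the previous

  // Maximum allowed size of level n under the 10x rule.
  static long maxSizeOfLevel(int level) {
    long limit = L0_LIMIT_BYTES;
    for (int i = 0; i < level; i++) {
      limit *= GROWTH_FACTOR;
    }
    return limit;
  }

  public static void main(String[] args) {
    for (int level = 0; level <= 4; level++) {
      System.out.printf("L%d limit: %d MB%n", level, maxSizeOfLevel(level) / (1024 * 1024));
    }
    // Prints 5, 50, 500, 5000, and 50000 MB for L0 through L4.
  }
}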

From this, we can see that the smaller a block's level number, the newer its data. During the downward merges, files are merged by rowkey range, obsolete versions are removed, and the relevant delete markers are applied. In the worst case, therefore, a read has to start at Level 0 and descend level by level all the way to the bottom level n (a simplified read-path sketch follows).
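The worst-case read described above can be sketched as follows. This is simplified illustrative code, not LevelDB's or Cassandra's actual API: every Level-0 file may need to be checked because their ranges overlap, while from Level-1 downward at most one file per level can contain the key, and the search stops at the first hit because smaller level numbers hold newer data.

import java.util.List;
import java.util.Optional;

interface DataFile {
  boolean mayContain(byte[] key);    // cheap key-range / bloom-filter check
  Optional<byte[]> get(byte[] key);  // actual lookup inside the file
}

class LeveledReader {
  // levels.get(0) holds the Level-0 files ordered newest first; every other
  // entry holds one level's files, whose key ranges do not overlap.
  private final List<List<DataFile>> levels;

  LeveledReader(List<List<DataFile>> levels) {
    this.levels = levels;
  }

  Optional<byte[]> get(byte[] key) {
    for (List<DataFile> level : levels) {
      for (DataFile file : level) {
        if (file.mayContain(key)) {
          Optional<byte[]> value = file.get(key);
          if (value.isPresent()) {
            return value; // first hit wins: lower levels hold newer data
          }
        }
      }
    }
    return Optional.empty(); // key not found in any level
  }
}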

This compaction scheme has the following advantages:
1) Because most reads have LRU-like locality, they fall into the lower levels: the hotter the data, the lower its level. This property can also help with placing HFiles on different storage media in the future.
2) A merge only involves a subset of files from the adjacent levels rather than a Compaction over all files, which speeds up Compaction.

The disadvantage is that, with too many levels, the recursive merges can easily trigger a Compaction storm within some key interval, hurting the throughput of data operations on that interval.
For this reason, HBase Stripe Compaction uses only two layers, Level-0 and Level-1. This keeps the total number of files down while retaining the advantages of leveled compaction, and it also makes it easier for the RS to perform Split and Merge operations.

References:
[1] HBASE-7667, https://issues.apache.org/jira/browse/HBASE-7667

This article is an original post from Binos_ICT's personal technical blog, Binospace. The original article is at http://www.binospace.com/index.php/hbase-new-features-stripe-compaction; reproduction without permission is not allowed.
