An analysis of HBase compaction

Source: Internet
Author: User

HBase is a distributed NoSQL database built on the LSM-tree storage model. Compared with the popular B+ tree, the LSM tree achieves high random-write performance while keeping random reads reliably fast. To serve a read request, the LSM tree has to query and merge several subtrees (each similar in structure to a B+ tree); in HBase these subtrees are the HFiles, plus the MemStore, which is a tree structure held in memory. Therefore, the fewer subtrees a query has to merge, the better the read performance.
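To make the cost of many subtrees concrete, here is a minimal sketch of a point lookup across several sorted runs. It is not HBase code: the function `lsm_get` and the dictionaries standing in for the MemStore and HFiles are hypothetical, and real HBase merges scanners rather than probing dictionaries.

```python
def lsm_get(key, runs):
    """Look up `key` in `runs`, ordered newest first.

    Returns (value, probes): the newest value found (or None) and the
    number of runs that had to be consulted.
    """
    probes = 0
    for run in runs:
        probes += 1
        if key in run:
            return run[key], probes   # newest version wins
    return None, probes


# Hypothetical data: one in-memory run plus two on-disk runs.
memstore = {"row3": "c2"}
hfile_new = {"row1": "a2", "row3": "c1"}
hfile_old = {"row1": "a1", "row2": "b1"}
runs = [memstore, hfile_new, hfile_old]
```

A key that lives only in the oldest run (or does not exist at all) forces a probe of every run, which is why fewer runs means faster reads.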

The role of compaction

As described in the earlier article on write requests, every write must be written to both the MemStore and the HLog before the transaction is committed. When the MemStore exceeds a threshold, it is flushed to HDFS, producing an HFile. The number of HFiles therefore grows as writes continue and, as noted above, too many HFiles degrade read performance. To avoid this, HFiles can be compacted, that is, several HFiles are merged into a single one. Compaction rewrites HBase data, possibly several times over, so it generates a lot of I/O. In essence, compaction trades I/O now for better read performance later.

The compaction process

HBase compaction operates on an HStore of an HRegion. There are two kinds, major and minor. A major compaction merges all HFiles of the HStore into one HFile and drops the KeyValues marked as deleted (a deleted KeyValue is only really deleted during compaction). As you can imagine, a major compaction generates a lot of I/O and affects HBase read and write performance. A minor compaction selects only a few HFiles and merges them into one; it is generally faster and needs relatively little I/O. Major compaction is usually disabled during normal working hours and run only during idle periods.
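The difference in how delete markers are handled can be illustrated with a small sketch. It is a simplification under stated assumptions: files are modeled as dictionaries, `DELETE` is a hypothetical tombstone value, and versions and timestamps are ignored.

```python
DELETE = object()  # hypothetical tombstone standing in for a delete marker


def compact(files, major):
    """Merge `files` (ordered oldest first) into a single sorted mapping.

    Each file maps key -> value, where the value may be DELETE.
    """
    merged = {}
    for f in files:
        merged.update(f)  # entries from newer files overwrite older ones
    if major:
        # Every file of the store takes part, so the tombstones and the
        # data they hide can both be dropped for good.
        merged = {k: v for k, v in merged.items() if v is not DELETE}
    return dict(sorted(merged.items(), key=lambda kv: kv[0]))


old = {"a": "1", "b": "2"}
new = {"b": DELETE, "c": "3"}
```

In this sketch a minor compaction keeps the tombstone for `b`, because an older file outside the selection could still hold a value that the marker must continue to hide.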

Compaction entry points

Compaction can be requested from many places: opening a region, a MemStore flush, and so on all check whether a compaction is needed (after the MemStore of a single HStore is flushed, if compaction is triggered, every HStore of the owning HRegion is compacted separately). In addition, HRegionServer.CompactionChecker periodically (every 1000 s) inspects all HRegions to determine whether any of their HStores needs compaction.

CompactionChecker decides whether compaction is required as follows:

1. If the number of HFiles in the HStore, not counting those already being compacted, is >= hbase.hstore.compaction.min (default 3), a compaction is needed.

2. If condition 1 does not hold, check whether a major compaction is needed. The main question is whether too much time has passed since the last compaction. The check proceeds as follows:

1) Obtain the major compaction interval. hbase.hregion.majorcompaction (default 7 days) is the base period and hbase.hregion.majorcompaction.jitter (default 0.5) is the jitter, expressed as a fraction of the base period. With jitter = base × jitter fraction, the formula base + jitter - Math.round(2 × jitter × random), where random is a number between 0 and 1, gives a randomly jittered interval to use for major compaction. The jitter keeps a large number of major compactions from starting at the same time and causing heavy I/O, for example after an HRegionServer restart.

2) If the age of the oldest HFile (the one with the smallest timestamp) exceeds the major compaction interval, a major compaction is executed. However, if there is only one HFile and none of its KeyValues has exceeded the TTL, the major compaction is skipped.

When condition 1 or 2 holds, a compaction request is sent to CompactSplitThread. The difference is that in case 1 the HFiles to compact are selected asynchronously, whereas in case 2 they are selected synchronously.
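The interval calculation in step 2 can be sketched as follows. This is an illustration rather than the HBase implementation: the function names are hypothetical, and the sketch assumes the jitter setting is a fraction of the base period (0.5 here), so the resulting interval falls between 0.5 and 1.5 times the base.

```python
import random

WEEK_MS = 7 * 24 * 60 * 60 * 1000  # base period of 7 days, in milliseconds


def next_major_interval(base_ms=WEEK_MS, jitter_pct=0.5, rnd=None):
    """Return base + jitter - round(2 * jitter * rnd), with rnd in [0, 1)."""
    if rnd is None:
        rnd = random.random()
    jitter = round(base_ms * jitter_pct)  # assumption: jitter is a fraction of base
    return base_ms + jitter - round(2 * jitter * rnd)


def needs_major(oldest_file_ts_ms, now_ms, interval_ms):
    """A major compaction is due when the oldest HFile is older than the interval."""
    return now_ms - oldest_file_ts_ms > interval_ms
```

Passing `rnd` explicitly makes the result reproducible; leaving it out draws a fresh random value, so each store ends up with a different deadline.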

Compaction requests

CompactSplitThread is the component inside HRegionServer dedicated to executing minor compactions, major compactions, splits, and merges. Internally it has a separate thread pool for each of these four operations. Giving each of these time-consuming operations its own thread pool improves overall system throughput and prevents one kind of operation from blocking the others.

Each compaction request has to be classified as major or minor and then handed to the corresponding thread pool. If the total size of the files to compact is greater than hbase.regionserver.thread.compaction.throttle (default 2 × hbase.hstore.compaction.max × MemStore flush size = 2 × 10 × 128 MB = 2560 MB), the request is treated as a major compaction; otherwise it is treated as a minor compaction.
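The threshold arithmetic and the resulting pool choice can be written out as a short sketch. The function names and the `"major"`/`"minor"` labels are hypothetical; only the default values come from the text above.

```python
MB = 1024 * 1024


def default_throttle(max_files_to_compact=10, memstore_flush_size=128 * MB):
    """Default throttle: 2 * max files per compaction * MemStore flush size."""
    return 2 * max_files_to_compact * memstore_flush_size


def choose_pool(total_size, throttle=None):
    """Pick the thread pool for a request based on the total size to compact."""
    if throttle is None:
        throttle = default_throttle()
    return "major" if total_size > throttle else "minor"
```

With the defaults the threshold is 2560 MB, so a request totalling exactly 2560 MB still goes to the minor pool.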

The files to compact are selected by the corresponding HStore. In case 2 of CompactionChecker the files are selected synchronously, so the thread pool that will run the compaction is known immediately. In case 1, where the HFiles are selected asynchronously, HBase first performs the file selection in the minor compaction thread pool; if the selection shows that a major compaction is required, the request is resubmitted to the major compaction thread pool for the rest of the operation.

HStore's compaction file selection

File selection first determines whether the compaction is major or minor. For a major compaction all HFiles of the HStore are selected; otherwise only some files are selected for a minor compaction. Because compaction consumes a lot of I/O, the goal of a minor compaction is to gain the most read performance for the least I/O. In newer versions, the HStore's file selection strategy can consider the whole set of files to choose the best option. The whole process is as follows:

    • Delete invalid files. HFiles that have exceeded the TTL are selected as compaction files. A compaction record for these files is written to the WAL, all scanners serving read requests are notified to update, the total file size of the HStore is updated, and so on.
    • Select the files to compact.
    • Update the internal data according to the selected files.

Selecting the files to compact is the main step. It works as follows:

    • Take all HFiles currently in the HStore as candidate compaction files.
    • Exclude the candidates that are older than the newest file currently being compacted. File age is always judged by comparing the maximum sequence ID saved in each HFile (during HLog replay this ID is used to determine which records have already been written to HFiles). The sequence ID is a monotonically increasing ID that the HRegion stores as part of the key when an inserted KeyValue is written to the HLog, so the larger the sequence ID, the newer the record, and likewise the HFile.
    • Exclude candidates that are larger than hbase.hstore.compaction.max.size (default Long.MAX_VALUE) and are not reference files. This step is skipped for a forced major compaction. A reference file is a temporary file produced by a region split; it is only a reference and must normally be removed during compaction.
    • Determine whether the compaction is major. It is a major compaction if any of these three conditions holds: the user explicitly forced a major compaction; too much time has passed since the last compaction (case 2 of CompactionChecker) and the number of candidate files is less than hbase.hstore.compaction.max (default 10); or the candidates include reference files.
    • For a minor compaction, continue excluding files: (1) exclude HFiles whose metadata marks them as excluded from minor compaction (set at bulk load); (2) run applyCompactionPolicy (detailed later); (3) if the number of candidate files is less than hbase.hstore.compaction.min (default 3), exclude all candidates.
    • Exclude the candidates beyond hbase.hstore.compaction.max (default 10); this step is skipped for a major compaction. Note that the files excluded are the newest ones: if there are 12 candidates, the 2 newest HFiles are removed.
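Two of the steps above, the major/minor decision and the final trimming, are simple enough to sketch directly. The function and parameter names are hypothetical, and the candidate list is assumed to be ordered oldest first.

```python
def is_major(forced_by_user, overdue, n_candidates, has_references, max_files=10):
    """Major if forced, or overdue with few enough candidates, or references exist."""
    return (
        forced_by_user
        or (overdue and n_candidates < max_files)
        or has_references
    )


def trim_excess(candidates, max_files=10):
    """Keep at most `max_files` candidates, dropping the newest surplus files.

    `candidates` is ordered oldest first, so the surplus sits at the end.
    """
    return candidates[:max_files]


candidates = [f"hfile{i:02d}" for i in range(12)]  # hfile00 is the oldest
```

With 12 candidates and a limit of 10, `hfile10` and `hfile11` (the two newest) are left for a later compaction.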

During file selection, the first decision is whether the compaction is major or minor; files are then chosen within the maximum and minimum limits set in the configuration. The core of the process is applyCompactionPolicy. Users can implement their own selection strategy, and HBase ships two main ones: RatioBasedCompactionPolicy and ExploringCompactionPolicy.

Before looking at them, consider the following situation: write traffic is so heavy that HFiles are produced continuously, while compaction falls far behind the rate at which HFiles are generated. The number of HFiles keeps growing and read performance drops sharply. To avoid this, writes are throttled when there are too many HFiles: before each MemStore flush, if the number of HFiles in the HStore exceeds hbase.hstore.blockingStoreFiles (default 7), the flush is blocked for up to hbase.hstore.blockingWaitTime. If compaction brings the number of HFiles back down to this value during that time, the blocking stops; the flush also resumes once the blocking time has elapsed. This effectively limits the rate of heavy write traffic, but it is also one of the main reasons write requests become slow.
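The blocking behaviour before a flush can be sketched as a bounded wait. This is an illustration, not HBase code: `wait_before_flush` and `count_files` are hypothetical names, the 90-second default wait is an assumption (the text above gives no default), and a real server would wait on a notification instead of polling.

```python
import time


def wait_before_flush(count_files, blocking_store_files=7,
                      blocking_wait_s=90.0, poll_s=0.1):
    """Delay a flush while the store has too many files.

    `count_files` is a callable returning the current number of HFiles.
    Returns True if compaction brought the count down in time, False if
    the wait timed out (the flush proceeds in both cases).
    """
    deadline = time.monotonic() + blocking_wait_s
    while count_files() > blocking_store_files:
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    return True
```

Either way the flush goes ahead afterwards; the wait only gives compaction a chance to catch up first.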

The two policies are implemented as follows:

    • RatioBasedCompactionPolicy. Walks the candidates from the oldest file towards the newest and skips every file whose size exceeds the larger of hbase.hstore.compaction.min.size (default: the MemStore flush size, 128 MB) and the total size of the newer files to be compacted × ratio. The scan stops at the first file that satisfies this limit, and that file and all newer ones are selected. The ratio is variable: peak hours can be configured, and the ratio is 1.2 during peak hours and 5 off-peak, so larger compactions are allowed off-peak (when more I/O can be spent). The purpose is to pick small files for minor compaction as far as possible. If the number of files after this compaction would still be high enough to block flushes, the policy simply excludes, starting from the oldest, (number of candidate files - hbase.hstore.compaction.min (default 3)) files.
    • ExploringCompactionPolicy. Traverses the candidates starting from the oldest file and examines every contiguous run of files, looking for a selection that satisfies the constraints [total size of the files to compact below hbase.hstore.compaction.max.size (default Long.MAX_VALUE), and no file larger than the combined size of the other files × ratio] and is the most efficient [the most files compacted or, failing that, the smallest total size]. The ratio is the same peak/off-peak ratio as above. Note that, because of these constraints, every candidate may end up excluded. In that case, if the number of files after compaction would still be high enough to block flushes, the policy selects a subset of at least hbase.hstore.compaction.min (default 3) files with the smallest total size (the size limits being the minimum compaction size, 128 MB, and the maximum, Long.MAX_VALUE).
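The ratio-based scan can be sketched in a few lines. This is a simplified illustration, not the HBase source: `ratio_based_select` is a hypothetical name, sizes are plain numbers in MB ordered oldest first, the window of newer files is capped at `max_files` as an assumption, and the stuck fallback follows the description above (keep only the newest `min_files` files).

```python
def ratio_based_select(sizes, ratio=1.2, min_size=128, min_files=3,
                       max_files=10, may_be_stuck=False):
    """Return the sizes selected for compaction (possibly an empty list)."""
    n = len(sizes)
    start = 0
    while n - start >= min_files:
        newer = sizes[start + 1:start + max_files]  # files after `start`
        if sizes[start] <= max(min_size, sum(newer) * ratio):
            break            # first qualifying file: stop scanning
        start += 1           # too large compared with the newer files: skip
    selected = sizes[start:]
    if len(selected) >= min_files:
        return selected
    if may_be_stuck and n >= min_files:
        # Flushes may block: drop the oldest (n - min_files) candidates.
        return sizes[n - min_files:]
    return []
```

For sizes `[1000, 400, 100, 50, 30]` the scan skips the 1000 MB and 400 MB files and selects the three small ones, which matches the stated goal of compacting small files first.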

As can be seen, ExploringCompactionPolicy takes all candidate files into account, whereas RatioBasedCompactionPolicy stops scanning as soon as it finds a match. ExploringCompactionPolicy is the strategy used in newer versions. The older RatioBasedCompactionPolicy only considered the case where the largest file is also the oldest; some cases, such as bulk-loaded files, break this rule, so the RatioBasedCompactionPolicy algorithm is not the optimal compaction strategy.
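The exhaustive search of the exploring strategy can be sketched as follows. Again this is a simplified illustration with hypothetical names: sizes are in MB and ordered oldest first, selections smaller than `min_size` are assumed to bypass the ratio check, and the stuck fallback is left out.

```python
def files_in_ratio(sizes, ratio):
    """True if no file is larger than the combined size of the others * ratio."""
    total = sum(sizes)
    return all(s <= (total - s) * ratio for s in sizes)


def exploring_select(sizes, ratio=1.2, min_files=3, max_files=10,
                     min_size=128, max_size=float("inf")):
    """Return the best contiguous run: most files, then smallest total size."""
    best, best_total = [], 0
    n = len(sizes)
    for start in range(n):
        for end in range(start + min_files, min(start + max_files, n) + 1):
            run = sizes[start:end]
            total = sum(run)
            if total > max_size:
                continue
            if total >= min_size and not files_in_ratio(run, ratio):
                continue
            if len(run) > len(best) or (len(run) == len(best) and total < best_total):
                best, best_total = run, total
    return best


# Hypothetical store in which a large bulk-loaded file is newer than small files.
bulk_loaded_case = [50, 60, 70, 2000, 40]
```

In this example the search settles on `[50, 60, 70]` and leaves the 2000 MB file alone, whereas a scan that stops at the first qualifying file (the oldest 50 MB file here) would pull the large file into the compaction.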

After the file selection is complete, the HStore saves the result of the selection and returns it to CompactSplitThread.

Execution of the compaction

CompactSplitThread then asks the HRegion to carry out the compaction request. The HRegion increments its compaction counter to indicate that a compaction is in progress, which prevents the HRegion from being closed during the compaction. The HRegion then calls the compact method of the specific HStore to perform the actual compaction.

The HStore's compaction essentially rewrites the selected HFiles as a single HFile. The main steps are:

    • Create a scanner for each file; reference files get a special scanner. The scanner hierarchy is described in the earlier article on read requests. The end result is a StoreScanner object, and for a major compaction the scanner is told to drop deleted KeyValues.
    • Create a temporary file, loop over the scanner's next() method, write the ordered KeyValues it returns to the temporary file, and write the maximum sequence ID of the KeyValues to the file's metadata.
    • Move the temporary file holding the compaction output to the storage directory of the HStore.
    • Write the compaction result to the WAL, so that the RegionServer (RS) can delete the old StoreFiles based on the WAL.
    • Update the HStore's internal data with the new compacted file.
    • Notify the scanners serving read requests to update the HFiles they read, delete the old files (in practice, archive them), and recalculate the total size of all HFiles.
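The write-to-temporary-file-then-move pattern in these steps can be sketched with ordinary text files. This is only an analogy under stated assumptions: each input is a sorted text file with one newline-terminated entry per line, the names `compact_files`, `.tmp` and `compacted` are hypothetical, and the WAL record, scanner notification and archiving steps are omitted.

```python
import heapq
import os
import tempfile
from contextlib import ExitStack


def compact_files(paths, store_dir):
    """Merge sorted text files into one new file inside `store_dir`."""
    tmp_dir = os.path.join(store_dir, ".tmp")
    os.makedirs(tmp_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=tmp_dir)
    with ExitStack() as stack:
        out = stack.enter_context(os.fdopen(fd, "w"))
        readers = [stack.enter_context(open(p)) for p in paths]
        # heapq.merge streams the inputs in sorted order, much like a
        # scanner's next() loop, without loading whole files into memory.
        out.writelines(heapq.merge(*readers))
    final_path = os.path.join(store_dir, "compacted")
    os.replace(tmp_path, final_path)  # move the finished file into place
    for p in paths:
        os.remove(p)                  # old inputs are no longer needed
    return final_path
```

Readers keep seeing the old files until the finished output has been moved into place, which mirrors why only the last steps of a compaction touch read requests.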

As you can see, over the whole compaction only the final steps have any impact on read requests.

Once the HStore's compaction is complete, the HRegion decrements the compaction counter it incremented earlier. Back in CompactSplitThread, the current number of HFiles in the HStore is subtracted from hbase.hstore.blockingStoreFiles (default 7). If the result is < 0, the HRegion would block subsequent MemStore flushes; in this stuck state requestSystemCompaction is invoked again. Otherwise requestSplit is executed to check whether a split is required.
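The follow-up decision after a compaction comes down to one subtraction. In this sketch the function name is hypothetical and the returned strings merely name the HBase methods mentioned above; nothing is actually invoked.

```python
def after_compaction(num_hfiles, blocking_store_files=7):
    """Decide the next step from blockingStoreFiles minus the HFile count."""
    priority = blocking_store_files - num_hfiles
    if priority < 0:
        return "requestSystemCompaction"  # still too many files: compact again
    return "requestSplit"                 # healthy: check whether to split
```

With the default of 7, a store left with 9 HFiles asks for another compaction, while a store with 4 moves on to the split check.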

This completes the compaction of the HStore.

Summary

HBase compaction gains later read performance by spending I/O now. To limit its impact on the system, each compaction should select files so that it uses as little I/O as possible while improving read performance as much as possible (by merging the largest number of files). In addition, major compactions should, where possible, be triggered manually while the cluster is idle.

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
