HBase Compaction Analysis
HBase is a distributed NoSQL database built on the LSM-tree storage model. Compared with the common B+ tree, an LSM tree achieves high random-write performance while keeping random-read performance reliable. To serve a read request, an LSM tree has to merge several sub-trees (each similar in structure to a B+ tree). In HBase these sub-trees are the HFiles, plus the in-memory tree structure, the MemStore. Therefore, the fewer sub-trees a query has to merge, the better the read performance.
The Role of Compaction
As introduced in an earlier article, every write request must be written to both the MemStore and the HLog before the transaction is committed. When the MemStore exceeds its threshold, it is flushed to HDFS as a new HFile. As writes continue, the number of HFiles keeps growing, and, as described above, too many HFiles degrade read performance. To avoid this, HBase compacts the HFiles, merging several HFiles into one. Compaction has to read and rewrite HBase data, so it generates a large amount of IO. In essence, compaction trades IO now for better read performance later.
Compaction Process
HBase compaction operates on an HStore of an HRegion. There are two kinds: major and minor. A major compaction merges all HFiles in the HStore into a single HFile and drops KeyValues marked as deleted (a deleted KeyValue is only physically removed during compaction). As you would expect, a major compaction generates a large amount of IO and affects HBase read/write performance. A minor compaction selects only a few HFiles and merges them into one; it is generally fast and its IO cost is relatively low. Major compactions are usually kept out of busy working hours and run only during idle periods.
Compaction Entry Points
Compaction can be requested from many places, including region open and MemStore flush; each of these checks whether compaction is needed (if a flush of one HStore's MemStore triggers compaction, compaction is requested for all HStores of the corresponding HRegion). In addition, HRegionServer.CompactionChecker periodically (every 10 s) checks the HStores of all HRegions to decide whether compaction is needed.
CompactionChecker decides whether compaction is needed using the following conditions:
1. If the number of HFiles in the HStore that are not already being compacted is >= hbase.hstore.compaction.min (default 3), compaction is required.
2. If condition 1 does not hold, check whether a major compaction is due, that is, whether too much time has passed since the last one. The check works as follows:
1) Compute the major compaction interval. hbase.hregion.majorcompaction (default 7 days) is the base period and hbase.hregion.majorcompaction.jitter (default 0.5) is the jitter ratio. The formula base + jitter - Math.round(2 * jitter * randomNum), where jitter is the base period multiplied by the jitter ratio, yields an interval that varies randomly each time. The jitter prevents the heavy IO that would result if, after an HRegionServer restart, all major compactions came due at the same time.
2) If the age of the oldest HFile (the one with the smallest timestamp) exceeds the major compaction interval, a major compaction is performed. The exception: if the HStore has only one HFile and none of the KeyValues in it has exceeded the TTL, a major compaction is unnecessary and is skipped this time.
When condition 1 or 2 holds, a compaction request is submitted to CompactSplitThread. The difference is that for condition 1 the HFiles to compact are selected asynchronously, while for condition 2 they are selected synchronously.
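The interval formula and the two checks can be sketched as follows. This is a minimal Python sketch, not HBase's code: the function names and return strings are hypothetical, and it assumes that the jitter term in the formula is the base period multiplied by a jitter ratio and that randomNum lies in [0, 1).

```python
DAY_MS = 24 * 60 * 60 * 1000


def major_compaction_interval_ms(base_ms, jitter_ratio, random_num):
    """Jittered interval: base + jitter - round(2 * jitter * randomNum).

    Assumption: jitter = base * jitter_ratio, random_num in [0, 1),
    so the result falls between base - jitter and base + jitter.
    """
    jitter = round(base_ms * jitter_ratio)
    return base_ms + jitter - round(2 * jitter * random_num)


def check_store(files_not_compacting, oldest_file_age_ms, interval_ms, min_files=3):
    """Return which CompactionChecker condition fires, or None (hypothetical helper)."""
    if files_not_compacting >= min_files:
        return "condition 1"  # enough files: request a compaction
    if oldest_file_age_ms > interval_ms:
        return "condition 2"  # oldest HFile is too old: major compaction is due
    return None


if __name__ == "__main__":
    import random

    base = 7 * DAY_MS
    interval = major_compaction_interval_ms(base, 0.5, random.random())
    print(interval / DAY_MS, "days")
    print(check_store(1, 8 * DAY_MS, base))
```

Because every store draws its own random number, stores flushed at the same moment still come due at different times, which is the point of the jitter.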
Compaction Requests
CompactSplitThread is the component in HRegionServer that holds the thread pools for minor compaction, major compaction, split, and merge. Each of the four operations has its own thread pool. Running these time-consuming operations in separate pools improves overall system throughput and keeps one kind of operation from blocking another.
Each compaction request must be classified as major or minor and assigned to the corresponding thread pool. The criterion: if the total size of the files to compact is greater than hbase.regionserver.thread.compaction.throttle (default 2 * maxFileCompacts * memstoreFlushSize = 2 * 10 * 128 MB), the request is treated as major; otherwise it is treated as minor.
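The throttle arithmetic works out as shown below. This is a small Python sketch with hypothetical function names; the default values are the ones quoted in the text.

```python
MB = 1024 * 1024


def default_throttle_bytes(max_files_to_compact=10, memstore_flush_size=128 * MB):
    """Default throttle: 2 * maxFileCompacts * memstoreFlushSize."""
    return 2 * max_files_to_compact * memstore_flush_size


def choose_pool(total_compact_size, throttle=None):
    """Pick the thread pool for a request from the total size of its files."""
    if throttle is None:
        throttle = default_throttle_bytes()
    return "major" if total_compact_size > throttle else "minor"


if __name__ == "__main__":
    print(default_throttle_bytes() // MB, "MB")  # 2 * 10 * 128 = 2560 MB
    print(choose_pool(3000 * MB), choose_pool(500 * MB))
```

With the defaults, any request whose files total more than 2560 MB goes to the major pool.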
Selecting the files to compact is done by the corresponding HStore. For condition 2 of CompactionChecker the files are selected synchronously, so the thread pool that will run the compaction can be chosen immediately. For condition 1, where the files are selected asynchronously, HBase does not yet know the total file size, so it first runs the selection in the minor compaction thread pool; if the selection turns out to require a major compaction, the request is resubmitted to the major compaction thread pool, which performs the rest of the compaction.
Selecting the HStore Files to Compact
To select the files, HBase first decides whether the compaction is major or minor. For a major compaction, all HFiles in the HStore are selected; otherwise only some files are selected for a minor compaction. Because compaction consumes a lot of IO, the goal of a minor compaction is to gain the most read performance for the least IO. In newer versions, the HStore file selection policy considers the candidates as a whole in order to pick the best option. The process is as follows:
- Delete invalid files. Select the HFiles that have exceeded the TTL as files to compact, write the compaction record for these files to the WAL, notify all scanners serving read requests, and update the total file size of the HStore.
- Select the files to compact.
- Update internal state according to the selected files.
The procedure for selecting the files to compact is as follows:
- Take all HFiles in the current HStore as candidate files.
- Exclude candidates older than the newest file currently being compacted. Age is determined by the maximum SequenceId stored in each HFile (during HLog replay, this value is also used to determine which records have already been written to HFiles). The SequenceId is a monotonically increasing ID that forms part of the key when an HRegion writes a KeyValue to the HLog, so a larger SequenceId means a newer record and hence a newer HFile.
- Exclude candidate HFiles that are larger than hbase.hstore.compaction.max.size (Long maximum by default) and are not Reference files. This step is skipped for forceMajor. A Reference file is a temporary file produced by a region split; it simply references another file and must be removed during compaction.
- Decide whether this is a major compaction. It is major if any one of the following three conditions holds: forceMajor was specified; a major compaction is overdue (CompactionChecker condition 2) and the number of candidate files is smaller than hbase.hstore.compaction.max (default 10); or the candidates include a Reference file.
- For a minor compaction, continue excluding files: 1. exclude HFiles whose metadata marks them as excluded from minor compaction (set during bulk load); 2. run applyCompactionPolicy (described later); 3. if the number of candidate files is smaller than hbase.hstore.compaction.min (default 3), exclude all candidates.
- If the number of candidate files exceeds hbase.hstore.compaction.max (default 10), exclude the excess; this step is skipped for a major compaction. Note that files are excluded starting from the newest HFile: with 12 candidates, the 2 newest HFiles are excluded.
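The major/minor decision and the minimum and maximum limits from the list above can be sketched like this. It is a simplified Python illustration with hypothetical names; it follows the three conditions as stated in the text and leaves out applyCompactionPolicy and the other exclusion steps.

```python
def is_major(force_major, overdue, has_reference, n_candidates, max_files=10):
    """Major if any one of the three conditions from the text holds."""
    return (
        force_major
        or (overdue and n_candidates < max_files)
        or has_reference
    )


def apply_limits(candidates, major, min_files=3, max_files=10):
    """Apply the min/max limits; candidates are ordered oldest -> newest."""
    if major:
        return list(candidates)      # a major compaction takes every file
    if len(candidates) < min_files:
        return []                    # too few files: nothing to compact
    return list(candidates[:max_files])  # drop the newest files beyond max


if __name__ == "__main__":
    files = ["hfile-%d" % i for i in range(12)]
    print(apply_limits(files, major=False))  # the 2 newest files are left out
```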
During selection, the main work is deciding between major and minor and then choosing files within the configured minimum and maximum limits. The heart of the procedure is applyCompactionPolicy, which is pluggable: you can implement your own selection policy. HBase has two main policies: RatioBasedCompactionPolicy and ExploringCompactionPolicy.
Before describing them, consider what happens under heavy write load: HFiles are generated continuously, but compaction falls far behind the rate at which they are created, so the number of HFiles keeps growing and read performance drops sharply. To avoid this, HBase throttles writes when there are too many HFiles: before each MemStore flush, if the number of HFiles in the HStore exceeds hbase.hstore.blockingStoreFiles (default 7), the flush is blocked for up to hbase.hstore.blockingWaitTime. If compaction brings the number of HFiles back down to that value during the wait, blocking ends; otherwise the flush resumes once the wait time has elapsed. This effectively limits the rate of heavy write traffic, but it is also one of the main causes of slow writes.
The two policies are implemented as follows:
- RatioBasedCompactionPolicy. Traverse the candidate files from oldest to newest and look for the first file whose size is no larger than the maximum of [hbase.hstore.compaction.min.size (by default the MemStore flush size, 128 MB) and the total size of the newer files * ratio]; the search stops as soon as such a file is found, and that file and all newer candidates are selected. Ratio is a variable that can be changed by configuring off-peak hours: in peak hours the ratio is 1.2 and in off-peak hours it is 5, which allows larger files to be compacted off-peak (when more IO can be spent). The goal is to pick files that are as small as possible for minor compaction. If no file qualifies, and the number of files remaining after this compaction would be large enough to block flushes, the policy simply skips the oldest files, (number of candidates - hbase.hstore.compaction.min (default 3)) of them, and selects the rest.
- ExploringCompactionPolicy. Starting from the oldest file, examine every run of consecutive candidate files and find the one that satisfies [total size of the files to compact smaller than hbase.hstore.compaction.max.size (Long maximum by default), and no file larger than the total size of the other files * ratio] and is the most efficient [the largest number of files, or, for the same number, the smallest total size]. Ratio is the peak-hours ratio. Note that because of these restrictions the selection may end up empty. In that case, if the number of files remaining after this compaction would be large enough to block flushes, the policy picks, among the subsets of at least hbase.hstore.compaction.min (default 3) files, the one with the smallest total size.
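The ratio-based scan can be illustrated with the following Python sketch. It is a simplified model, not HBase's implementation: the function name is hypothetical, sizes are plain numbers in MB ordered oldest to newest, the sum covers all newer candidates, and the fallback for a store whose flushes are blocked is omitted.

```python
def ratio_based_select(sizes, ratio=1.2, min_compact_size=128, min_files=3):
    """Skip old files that are too large relative to the newer ones.

    sizes: file sizes in MB, oldest first. Returns the selected sizes.
    """
    n = len(sizes)
    start = 0
    # Skip a file while it is larger than max(min size, newer total * ratio).
    while n - start >= min_files and sizes[start] > max(
        min_compact_size, sum(sizes[start + 1:]) * ratio
    ):
        start += 1
    selected = sizes[start:]
    # Fewer than min_files left: exclude all candidates.
    return selected if len(selected) >= min_files else []


if __name__ == "__main__":
    store = [1000, 200, 50, 40, 30]
    print(ratio_based_select(store, ratio=1.2))  # peak hours
    print(ratio_based_select(store, ratio=5))    # off-peak hours
```

In this example the peak ratio of 1.2 leaves the two large old files alone, while the off-peak ratio of 5 lets the scan accept the oldest file and therefore the whole store.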
As you can see, ExploringCompactionPolicy examines all candidate files, whereas RatioBasedCompactionPolicy scans and stops at the first match. ExploringCompactionPolicy is the policy used in newer versions. The older RatioBasedCompactionPolicy assumes that the largest file is always the oldest, but bulk-loaded files and other cases break this assumption, so the RatioBasedCompactionPolicy algorithm is not the optimal compaction policy.
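The exploring search can be sketched as below. Again this is a simplified Python model with hypothetical names: sizes are numbers in MB ordered oldest to newest, only runs of consecutive files are considered, and the "best" run is the one with the most files, ties broken by the smaller total size. It assumes, as a simplification, that runs smaller than the minimum compaction size skip the ratio test.

```python
def files_in_ratio(window, ratio):
    """True if no file is larger than the total of the other files * ratio."""
    total = sum(window)
    return all(size <= (total - size) * ratio for size in window)


def exploring_select(sizes, ratio=1.2, min_files=3, max_files=10,
                     min_compact_size=128, max_compact_size=float("inf"),
                     may_be_stuck=False):
    """Examine every run of consecutive files and keep the best one."""
    best, best_size = [], 0
    smallest, smallest_size = [], float("inf")
    n = len(sizes)
    for start in range(n):
        for end in range(start + min_files, min(start + max_files, n) + 1):
            window = sizes[start:end]
            total = sum(window)
            if may_be_stuck and total < smallest_size:
                smallest, smallest_size = window, total  # fallback candidate
            if total > max_compact_size:
                continue
            if total >= min_compact_size and not files_in_ratio(window, ratio):
                continue
            if len(window) > len(best) or (
                len(window) == len(best) and total < best_size
            ):
                best, best_size = window, total
    if not best and may_be_stuck:
        return smallest  # nothing qualified: take the cheapest run
    return best


if __name__ == "__main__":
    # A large bulk-loaded file sits between smaller, older and newer files.
    print(exploring_select([100, 90, 1000, 60, 50, 40]))
```

With the bulk-loaded 1000 MB file in the middle, the search settles on the three small files after it and never rewrites the large one, which is the case where a scan that stops at the first match does worse.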
After the files have been selected, HStore saves the selection and returns it to CompactSplitThread.
Compaction Execution
CompactSplitThread then asks the HRegion to carry out the compaction request. The HRegion increments its compaction counter to indicate that a compaction is in progress, which prevents the HRegion from being closed during compaction. It then calls the compact method of the specific HStore to perform the compaction.
The compaction performed by HStore essentially writes the selected HFiles into a single HFile. The main steps are:
- Create a scanner for every file (Reference files get a special scanner; see the earlier discussion of read requests) and combine them into a StoreScanner object. For a major compaction, KeyValues marked as deleted are dropped during the scan.
- Create a temporary file, repeatedly call the scanner's next() method, write the ordered KeyValues it returns to the temporary file, and record the largest SequenceId of these KeyValues in the file metadata.
- Move the temporary file into the storage directory of the HStore.
- Write the compaction result to the WAL, so that if the RegionServer crashes, the old store files can be deleted based on the WAL.
- Update the HStore's internal state with the new compacted file.
- Notify the scanners serving read requests to update the HFiles they read, delete the old files (in practice, archive them), and recalculate the total size of all HFiles.
As you can see, in the whole compaction only the final steps affect read requests.
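The merge at the core of these steps can be sketched in Python. This is a toy model under stated assumptions, not HBase's data format: each HFile is a list of (key, sequence_id, value) tuples sorted by key and then by descending sequence id, a value of None stands for a delete marker, and a major compaction keeps only the newest live version of each key.

```python
import heapq
from itertools import groupby


def compact(hfiles, major):
    """Merge sorted HFiles into one; return (cells, metadata)."""
    # heapq.merge plays the role of the scanner: it yields cells in order.
    merged = heapq.merge(*hfiles, key=lambda cell: (cell[0], -cell[1]))
    out = []
    max_seq_id = 0
    for _, versions in groupby(merged, key=lambda cell: cell[0]):
        versions = list(versions)        # newest version comes first
        max_seq_id = max(max_seq_id, versions[0][1])
        if major:
            newest = versions[0]
            if newest[2] is not None:    # drop keys whose newest cell is a delete
                out.append(newest)
        else:
            out.extend(versions)         # minor: keep everything, markers included
    return out, {"max_seq_id": max_seq_id}


if __name__ == "__main__":
    older = [("a", 1, "v1"), ("b", 2, "v2")]
    newer = [("a", 3, None), ("c", 4, "v4")]  # "a" was deleted later
    print(compact([older, newer], major=True))
    print(compact([older, newer], major=False))
```

The minor result still carries the delete marker for "a" because older files outside the selection might hold versions that the marker must continue to hide; only a major compaction, which sees every file, can drop it safely.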
Once the HStore compaction finishes, the HRegion decrements the compaction counter it incremented earlier, and control returns to CompactSplitThread. CompactSplitThread then subtracts the number of HFiles in the current HStore from hbase.hstore.blockingStoreFiles (default 7). If the result is < 0, the store is in the stuck state, in which the HRegion blocks subsequent MemStore flushes, and requestSystemCompaction is called again. Otherwise, requestSplit is called to check whether a split is needed.
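The follow-up check can be written in a few lines. This Python sketch uses hypothetical function names and follows the comparison as stated in the text (< 0); the returned strings name the HBase methods mentioned above.

```python
def compact_priority(num_hfiles, blocking_store_files=7):
    """blockingStoreFiles minus the current number of HFiles."""
    return blocking_store_files - num_hfiles


def next_action(num_hfiles, blocking_store_files=7):
    """Decide what happens after a store compaction completes."""
    if compact_priority(num_hfiles, blocking_store_files) < 0:
        return "requestSystemCompaction"  # still stuck: compact again
    return "requestSplit"                 # healthy: check whether to split


if __name__ == "__main__":
    print(next_action(9), next_action(5))
```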
This completes the compaction of the HStore.
Summary
HBase compaction trades IO now for better read performance later. To limit its impact on the system, each compaction should choose the files that cost the least IO while improving read performance the most (that is, merging as many files as possible). For major compaction, it is best to trigger it manually while the cluster is idle.