HBase: The Definitive Guide, Learning Notes: Architecture (Storage)

Source: Internet
Author: User

HBase mainly handles two kinds of files: the write-ahead log (WAL) and the actual data files.

The basic flow is as follows. The client first contacts the ZooKeeper ensemble to find the name of the region server that hosts the -ROOT- region. It then queries that server to get the name of the region server hosting the .META. table region that contains the requested row key. Both of these results are cached, so each is looked up only once. Finally, the client queries the .META. server to get the name of the region server that hosts the region containing the requested row key. Once the client knows where the data actually lives, that is, which region holds it, it caches this information as well and contacts the HRegionServer managing that region directly. From then on, the client can locate the data it wants from its cache without querying the .META. table again.
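The lookup chain and its caching can be sketched as follows. This is a minimal model, not the HBase client API: `FakeCluster`, `Client`, the host names, and the region boundaries are all hypothetical, and the cluster object simply records which remote lookups were made.

```python
class FakeCluster:
    """Hypothetical stand-in for ZooKeeper, -ROOT- and .META. (not a real API)."""

    def __init__(self):
        self.lookups = []  # records every remote lookup, in order
        # Made-up .META. rows: (region start key, region end key, server).
        self.meta_rows = [("", "m", "rs1.example.com"),
                          ("m", None, "rs2.example.com")]

    def zookeeper_root_location(self):
        self.lookups.append("zookeeper")
        return "root-server.example.com"

    def root_lookup(self, root_server, row_key):
        self.lookups.append("-ROOT-")
        return "meta-server.example.com"

    def meta_lookup(self, meta_server, row_key):
        self.lookups.append(".META.")
        for start, end, server in self.meta_rows:
            if start <= row_key and (end is None or row_key < end):
                return (start, end, server)
        raise KeyError(row_key)


class Client:
    def __init__(self, cluster):
        self.cluster = cluster
        self.root_server = None   # cached after the first ZooKeeper lookup
        self.meta_server = None   # cached after the first -ROOT- lookup
        self.region_cache = []    # cached (start, end, server) entries

    def locate(self, row_key):
        # 1. Answer from the region cache when possible.
        for start, end, server in self.region_cache:
            if start <= row_key and (end is None or row_key < end):
                return server
        # 2. Otherwise walk the chain, reusing any cached intermediate result.
        if self.root_server is None:
            self.root_server = self.cluster.zookeeper_root_location()
        if self.meta_server is None:
            self.meta_server = self.cluster.root_lookup(self.root_server, row_key)
        region = self.cluster.meta_lookup(self.meta_server, row_key)
        self.region_cache.append(region)
        return region[2]


cluster = FakeCluster()
client = Client(cluster)
first = client.locate("apple")    # walks ZooKeeper, -ROOT-, .META.
second = client.locate("banana")  # same region, served from the cache
```

In this sketch a second row in the same region costs no remote lookups at all, and a row in a different region costs only one more .META. query.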

The HRegionServer is responsible for opening the region and creating the corresponding HRegion instance. When the HRegion is opened, it creates a Store instance for each HColumnFamily of the table, as defined by the user when the table was created. Each Store instance contains one or more StoreFile instances, which are lightweight wrappers around the actual data storage file, the HFile. Each Store also has a corresponding MemStore, and each HRegionServer has one shared HLog instance.
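The containment relationships described above can be pictured with a few plain Python classes. The class names mirror the HBase names, but the fields and the `open_region` method are illustrative assumptions, not the real Java classes.

```python
from dataclasses import dataclass, field


@dataclass
class StoreFile:
    hfile_path: str  # lightweight wrapper around one HFile on disk


@dataclass
class Store:
    family: str                                       # one Store per column family
    memstore: list = field(default_factory=list)      # in-memory edits
    store_files: list = field(default_factory=list)   # StoreFile instances


@dataclass
class HRegion:
    name: str
    stores: dict = field(default_factory=dict)  # column family -> Store


class HRegionServer:
    def __init__(self):
        self.hlog = []      # one write-ahead log shared by all regions
        self.regions = {}   # region name -> HRegion

    def open_region(self, name, column_families):
        # Opening a region creates one Store per column family of the table.
        region = HRegion(name, {cf: Store(cf) for cf in column_families})
        self.regions[name] = region
        return region


server = HRegionServer()
region = server.open_region("testtable,,1", ["colfam1", "colfam2"])
```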

After the data is written to the WAL, it is placed in the MemStore. The MemStore is then checked to see whether it is full, and if it is, a flush to disk is requested (the flush request is handled by another thread in the HRegionServer). When a region is being closed and its memstore is at least hbase.hregion.preclose.flush.size (default 5 MB), a "pre-flush" runs to clear out the memstore before the region is marked as closed and taken offline. Once the region is offline, it cannot accept any more writes, and if the memstore is large, the flush performed during the close takes a long time. The pre-flush empties most of the memstore while the region is still online, so the flush that runs during the final close is quick. Shutting down the region server forces all memstores to be flushed to disk, regardless of whether they have reached the configured maximum size.
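The write path can be sketched as below. This is a simplified single-threaded model: in HBase the flush is requested and handled by a separate thread, whereas here it runs inline. `RegionWriter` and its fields are hypothetical names, and sizes are counted as plain string lengths.

```python
class RegionWriter:
    """Toy model of one region's write path (hypothetical, single-threaded)."""

    def __init__(self, flush_size_bytes):
        self.wal = []              # write-ahead log entries
        self.memstore = []         # in-memory edits not yet on disk
        self.memstore_bytes = 0
        self.flush_size_bytes = flush_size_bytes
        self.flushed_files = []    # each flush produces one sorted "store file"
        self.online = True

    def put(self, row, value):
        if not self.online:
            raise RuntimeError("region is offline")
        self.wal.append((row, value))       # 1. write to the WAL first
        self.memstore.append((row, value))  # 2. then to the memstore
        self.memstore_bytes += len(row) + len(value)
        if self.memstore_bytes >= self.flush_size_bytes:
            self.flush()                    # 3. flush when the memstore is full

    def flush(self):
        if self.memstore:
            self.flushed_files.append(sorted(self.memstore))
            self.memstore = []
            self.memstore_bytes = 0

    def close(self, preclose_flush_size):
        # Pre-flush while still online if the memstore is large enough.
        if self.memstore_bytes >= preclose_flush_size:
            self.flush()
        self.online = False
        self.flush()  # final flush once the region no longer accepts writes


writer = RegionWriter(flush_size_bytes=10)
writer.put("r1", "aaaa")   # 6 bytes, stays in the memstore
writer.put("r2", "bbbb")   # 12 bytes in total, triggers a flush
```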

When a store file in a region grows larger than the configured hbase.hregion.max.filesize, or the size configured at the column family level, the region is split in two. The region server does this by creating a splits directory in the parent region and then closing the region so that it no longer accepts requests. The region server then sets up the necessary file structure in the splits directory to prepare the two new daughter regions. Once this is done, it moves the two daughter regions into the table directory, opens them in parallel on the same server, and updates the .META. table at the same time. The region contents are then compacted as well: the store files of the parent region are rewritten asynchronously into the two daughter regions, replacing the reference files.
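The split trigger and the resulting key ranges can be illustrated with a small sketch. Both function names are hypothetical, and the choice of split key is left to the caller because the text above does not say how it is picked.

```python
def needs_split(store_file_sizes, max_filesize):
    """True when any store file exceeds the configured maximum size."""
    return any(size > max_filesize for size in store_file_sizes)


def split_region(start_key, end_key, split_key):
    """Return the key ranges of the two daughter regions.

    end_key=None means the region is unbounded on the right.
    """
    in_range = start_key < split_key and (end_key is None or split_key < end_key)
    if not in_range:
        raise ValueError("split key must fall strictly inside the region")
    return (start_key, split_key), (split_key, end_key)


if needs_split([100, 300], max_filesize=256):
    daughters = split_region("", None, "m")
```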

HBase supports two types of compaction: minor and major. A minor compaction rewrites the most recently generated files into one larger file; a major compaction rewrites all the files into a single file.

To keep the number of small files (memstores flushed to disk) from growing too large and hurting query efficiency, HBase merges these small store files into relatively large ones when necessary. This process is called compaction. There are two main types of compaction in HBase: minor compaction and major compaction.
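At its core, a compaction merges several sorted store files into one sorted file. A minimal sketch of that merge step, assuming each store file is modeled as a list of (row key, value) pairs already sorted by row key (the function name `compact` is hypothetical, and real compactions do more than this merge):

```python
import heapq


def compact(store_files):
    """Merge sorted store files into a single sorted list."""
    return list(heapq.merge(*store_files))


merged = compact([
    [("a", 1), ("c", 3)],
    [("b", 2), ("d", 4)],
])
```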

A major compaction merges all store files into one. It can be triggered by the major_compact shell command, by the majorCompact() API, or automatically by the region server. The relevant parameters are hbase.hregion.majorcompaction (default 24 hours) and hbase.hregion.majorcompaction.jitter (default 0.2), which keeps the region servers from all running major compactions at the same time. The jitter parameter makes the period set by hbase.hregion.majorcompaction float by that fraction: with both parameters at their defaults of 24 and 0.2, the interval actually used for major compaction falls in the range 19.2 to 28.8 hours.
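The jitter arithmetic is easy to check. The sketch below only reproduces the calculation described above; the function names are hypothetical, and drawing the delay uniformly from the window is an assumption made for illustration.

```python
import random


def major_compaction_window(period_hours=24.0, jitter=0.2):
    """Return (earliest, latest) interval in hours after applying jitter."""
    delta = period_hours * jitter
    return period_hours - delta, period_hours + delta


def next_major_compaction_delay(period_hours=24.0, jitter=0.2, rng=random):
    """Pick one delay inside the window (uniform choice is an assumption)."""
    low, high = major_compaction_window(period_hours, jitter)
    return rng.uniform(low, high)


low, high = major_compaction_window()   # roughly 19.2 and 28.8 hours
delay = next_major_compaction_delay()
```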

The mechanics of minor compaction are more complicated and are determined by several parameters together:

hbase.hstore.compaction.min: the default value is 3, meaning a minor compaction starts only when at least three store files meet the selection criteria.

hbase.hstore.compaction.max: the default value is 10, meaning at most 10 store files are selected for a single minor compaction.

hbase.hstore.compaction.min.size: a store file smaller than this value is always included in the minor compaction.

hbase.hstore.compaction.max.size: a store file larger than this value is always excluded from the minor compaction.

hbase.hstore.compaction.ratio: store files are ordered by age (older to younger), and minor compaction always starts from the older store files. If a file's size is less than the sum of the sizes of the store files after it (up to hbase.hstore.compaction.max of them) multiplied by the ratio, that store file is also added to the minor compaction.
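How these parameters interact can be sketched as a small selection function. This is a simplified model of the rules listed above, not HBase's actual implementation: the function name and signature are hypothetical, the default `ratio=1.2` is an assumption (the text above gives no default), and the window of newer files is capped so that the whole selection stays within `max_files`.

```python
def select_minor_compaction(sizes, min_files=3, max_files=10,
                            min_size=0, max_size=float("inf"), ratio=1.2):
    """Pick store files for a minor compaction.

    sizes: store file sizes ordered from oldest to newest.
    Returns the indices of the selected files, or [] if too few qualify.
    """
    # Files larger than max_size are always excluded.
    candidates = [i for i, s in enumerate(sizes) if s <= max_size]

    # Walk from the oldest file; skip a file while it is too large compared
    # with the newer files after it (and is not below min_size).
    start = 0
    while start < len(candidates):
        size = sizes[candidates[start]]
        newer = [sizes[j] for j in candidates[start + 1:start + max_files]]
        if size <= max(min_size, ratio * sum(newer)):
            break
        start += 1

    selected = candidates[start:start + max_files]
    return selected if len(selected) >= min_files else []


# The 1000-unit file is far larger than 1.2 times the sum of the newer files,
# so it is skipped; the four smaller files are compacted together.
chosen = select_minor_compaction([1000, 100, 50, 30, 20])
```

With these inputs the oldest file is left alone, which matches the intent of the ratio rule: large, old files are not rewritten over and over just because small new files keep arriving.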

Reference:

1. http://blog.csdn.net/azhao_dn/article/details/8867036
