HBase System Architecture

Source: Internet
Author: User

Client
The HBase client uses HBase's RPC mechanism to communicate with HMaster and HRegionServer. For management operations, the client issues RPCs to HMaster; for data read and write operations, it issues RPCs to HRegionServer.
ZooKeeper
The ZooKeeper quorum stores the address of the -ROOT- table and the address of HMaster. Each HRegionServer also registers itself with ZooKeeper as an ephemeral node, so that HMaster can detect the health of every HRegionServer at any time. In addition, ZooKeeper keeps HMaster from being a single point of failure, as described below.
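
The ephemeral registration described above can be illustrated with a small in-memory model. This is only a sketch: `ToyZooKeeper`, its methods, and the `/rs/...` paths are hypothetical names invented for illustration and are not the real ZooKeeper or HBase API. It shows the one property that matters here: a node tied to a session disappears when that session ends, and a watcher (standing in for HMaster) is told about it.

```python
class ToyZooKeeper:
    """In-memory stand-in for the coordination service; not the real ZooKeeper API."""

    def __init__(self):
        self.ephemeral = {}   # node path -> owning session id
        self.watchers = []    # callbacks invoked with the path of a removed node

    def register_ephemeral(self, path, session_id):
        # An ephemeral node lives only as long as the session that created it.
        self.ephemeral[path] = session_id

    def watch_removals(self, callback):
        self.watchers.append(callback)

    def expire_session(self, session_id):
        # When a session ends (for example, the server crashed), drop its
        # nodes and notify every watcher.
        dead = [p for p, s in self.ephemeral.items() if s == session_id]
        for path in dead:
            del self.ephemeral[path]
            for notify in self.watchers:
                notify(path)


zk = ToyZooKeeper()
lost_servers = []                        # what the toy master has noticed
zk.watch_removals(lost_servers.append)

zk.register_ephemeral("/rs/server-a", session_id=1)
zk.register_ephemeral("/rs/server-b", session_id=2)

zk.expire_session(1)                     # server-a goes away
print(lost_servers)                      # ['/rs/server-a']
print(sorted(zk.ephemeral))              # ['/rs/server-b']
```

In a real deployment the session would expire through missed heartbeats rather than an explicit call; the explicit `expire_session` here just makes the effect visible.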
HMaster
HMaster is not a single point of failure: HBase can start multiple HMaster instances, and ZooKeeper's master election mechanism ensures that one master is always running. HMaster is mainly responsible for table and region management:
1. Managing user operations that create, delete, modify, and query tables
2. Managing load balancing across HRegionServers and adjusting region distribution
3. Assigning the new regions after a region split
4. Migrating the regions of a failed HRegionServer after it shuts down
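
The fourth responsibility, moving the regions of a failed server elsewhere, can be sketched as a simple greedy reassignment. This is an assumption-laden toy, not HBase's actual balancer: the function name `reassign_regions`, the server and region names, and the "fewest regions wins" rule are all chosen here for illustration.

```python
def reassign_regions(assignments, failed_server):
    """Move the regions of failed_server to the least-loaded surviving servers.

    assignments maps server name -> list of region names.
    Returns a new mapping that no longer contains failed_server.
    """
    survivors = {s: list(r) for s, r in assignments.items() if s != failed_server}
    if not survivors:
        raise RuntimeError("no surviving region server to take over")
    for region in assignments.get(failed_server, []):
        # Greedy choice: the survivor with the fewest regions takes the next
        # one; ties are broken by server name to keep the result deterministic.
        target = min(survivors, key=lambda s: (len(survivors[s]), s))
        survivors[target].append(region)
    return survivors


before = {
    "rs1": ["r1", "r2", "r3"],
    "rs2": ["r4"],
    "rs3": ["r5", "r6"],
}
after = reassign_regions(before, "rs1")
print(after)   # {'rs2': ['r4', 'r1', 'r2'], 'rs3': ['r5', 'r6', 'r3']}
```

Counting regions is the simplest possible load measure; a production balancer would weigh more than region count.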
HRegionServer
HRegionServer is mainly responsible for responding to user I/O requests and for reading and writing data in the HDFS file system. It is the core module of HBase.

HRegionServer internally manages a series of HRegion objects, each of which corresponds to one region of a table. An HRegion consists of multiple HStores. Each HStore corresponds to the storage of one column family in the table, so each column family is in effect a centralized storage unit. It is therefore most efficient to place columns with common I/O characteristics in the same column family.
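
The containment hierarchy can be written down as a few plain data classes. This is a minimal sketch: the class names mirror the terms in the text, but the fields (`cells`, `start_row`) and the `put` helper are assumptions made for illustration and do not reflect HBase's real classes.

```python
from dataclasses import dataclass, field


@dataclass
class HStore:
    column_family: str
    cells: dict = field(default_factory=dict)    # (row, qualifier) -> value


@dataclass
class HRegion:
    table: str
    start_row: str
    stores: dict = field(default_factory=dict)   # column family -> HStore

    def put(self, row, family, qualifier, value):
        # Every column family gets its own store, created on first use.
        store = self.stores.setdefault(family, HStore(family))
        store.cells[(row, qualifier)] = value


@dataclass
class HRegionServer:
    regions: list = field(default_factory=list)


region = HRegion(table="users", start_row="")
region.put("row1", "profile", "name", "Ann")
region.put("row1", "profile", "email", "ann@example.com")
region.put("row1", "activity", "last_login", "2024-01-01")

server = HRegionServer(regions=[region])
print(sorted(region.stores))                     # ['activity', 'profile']
print(len(region.stores["profile"].cells))       # 2
```

The two `profile` columns land in the same store while `activity` is kept apart, which is why grouping columns with similar access patterns into one family pays off.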
The HStore is the core of HBase storage. It consists of two parts: the MemStore and the StoreFiles. The MemStore is a sorted memory buffer. Data written by the user is first placed in the MemStore; when the MemStore is full, it is flushed into a StoreFile (implemented underneath as an HFile). When the number of StoreFiles grows past a certain threshold, a compaction is triggered that merges multiple StoreFiles into a single StoreFile, reconciling versions and removing deleted data in the process. HBase therefore only ever appends data: all updates and deletions are applied later, during compaction. This lets a user's write return as soon as it reaches memory, which keeps HBase I/O performance high.

As StoreFiles are compacted they gradually form larger and larger StoreFiles. When the size of a single StoreFile exceeds a certain threshold, a split is triggered: the current region is split into two regions, the parent region is taken offline, and HMaster assigns the two new child regions to the appropriate HRegionServers, so that the load of the original region is spread across two regions.

[Figure: the process of compaction and split]
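
The write, flush, and compaction path can be sketched in a few lines. This is a toy model under stated assumptions, not HBase's implementation: `ToyStore`, the `flush_size` and `compact_at` thresholds, and the tombstone marker are hypothetical, thresholds are counted in entries and files rather than bytes, and region splitting is not modelled.

```python
class ToyStore:
    """Append-only store: an in-memory buffer plus immutable flushed files."""

    TOMBSTONE = object()   # marker written by delete()

    def __init__(self, flush_size=2, compact_at=3):
        self.memstore = {}
        self.storefiles = []            # oldest first; each is a sorted list of (key, value)
        self.flush_size = flush_size    # hypothetical threshold: entries per flush
        self.compact_at = compact_at    # hypothetical threshold: files per compaction

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_size:
            self.flush()

    def delete(self, key):
        # A delete is just another write; the data goes away at compaction time.
        self.put(key, self.TOMBSTONE)

    def flush(self):
        if self.memstore:
            self.storefiles.append(sorted(self.memstore.items()))
            self.memstore = {}
        if len(self.storefiles) >= self.compact_at:
            self.compact()

    def compact(self):
        merged = {}
        for storefile in self.storefiles:   # newer files overwrite older ones
            merged.update(storefile)
        live = [(k, v) for k, v in sorted(merged.items()) if v is not self.TOMBSTONE]
        self.storefiles = [live]


store = ToyStore()
store.put("a", 1)
store.put("b", 2)        # buffer full: first file flushed
store.put("a", 10)
store.delete("b")        # second file flushed, still holds the old "b" on disk
store.put("c", 3)
store.put("d", 4)        # third file flushed, which triggers compaction
print(store.storefiles)  # [[('a', 10), ('c', 3), ('d', 4)]]
```

Until the compaction runs, both the old value of `a` and the deleted `b` still sit in the first file; only the merge discards them, which matches the append-only behaviour described above.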


Beyond the basic workings of the HStore described above, you also need to understand the role of the HLog. The HStore works fine as long as the system runs normally, but in a distributed environment errors and downtime cannot be avoided. If an HRegionServer exits unexpectedly, the in-memory data in its MemStore is lost, which is why the HLog is needed. Each HRegionServer has one HLog object. HLog is a class that implements a Write Ahead Log: every time a user write goes to the MemStore, a copy of the data is also written to the HLog file (the HLog file format is described later). HLog files are rolled periodically: a new file is started, and old files whose data has already been persisted to StoreFiles are deleted.

When an HRegionServer terminates unexpectedly, HMaster detects it through ZooKeeper. HMaster first processes the leftover HLog files, splitting the log data by region and placing each part in the directory of the corresponding region. It then reassigns the failed regions. When the HRegionServer that receives one of these regions loads it, it finds the historical HLog that needs to be processed, replays the HLog data into the MemStore, and then flushes to StoreFiles to complete the data recovery.
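
The recovery sequence (log every write, split the leftover log by region, replay, flush) can be sketched as follows. This is an illustrative model only: `ToyRegionServer`, `split_log_by_region`, and `recover` are hypothetical names, a Python list stands in for the durable HLog file, and plain dictionaries stand in for the MemStore and StoreFiles.

```python
class ToyRegionServer:
    def __init__(self, hlog):
        self.hlog = hlog        # list standing in for the durable HLog file
        self.memstore = {}      # volatile: lost if the process dies
        self.storefile = {}     # standing in for persisted StoreFiles

    def put(self, region, key, value):
        self.hlog.append((region, key, value))    # log first
        self.memstore[(region, key)] = value      # then buffer in memory

    def flush(self):
        self.storefile.update(self.memstore)
        self.memstore.clear()
        self.hlog.clear()       # logged edits are now persisted, so the log can go


def split_log_by_region(hlog):
    """Group leftover log entries by region, preserving write order."""
    per_region = {}
    for region, key, value in hlog:
        per_region.setdefault(region, []).append((key, value))
    return per_region


def recover(edits):
    """Replay one region's edits into a fresh MemStore and 'flush' the result."""
    memstore = {}
    for key, value in edits:    # later edits overwrite earlier ones
        memstore[key] = value
    return dict(memstore)


hlog = []
rs = ToyRegionServer(hlog)
rs.put("region-1", "a", 1)
rs.flush()                      # "a" is persisted; the log is emptied
rs.put("region-1", "b", 2)
rs.put("region-2", "x", 9)
rs.put("region-1", "b", 3)
del rs                          # simulated crash: the MemStore is gone, the log survives

per_region = split_log_by_region(hlog)
recovered = {region: recover(edits) for region, edits in per_region.items()}
print(recovered)                # {'region-1': {'b': 3}, 'region-2': {'x': 9}}
```

Only the edits made after the last flush need replaying, which is the reason old log files can be deleted once their data has reached StoreFiles.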
