HBase Physical Model and Architecture
HBase Workflow
The HRegionServer is responsible for opening a region and creating an HRegion instance for it. The HRegion creates a Store instance for each HColumnFamily of the table (column families are defined when the user creates the table), and each Store instance contains one or more StoreFile instances. A StoreFile is a lightweight wrapper around HFile, the file that actually stores the data. Each Store also has one MemStore. A write is first recorded in the HLog, and only after that succeeds is the data written to the MemStore. Because memory is limited, the data in the MemStore must be flushed to StoreFiles periodically, and each flush generates a new StoreFile. When processing a flush request, the HRegionServer writes the data as an HFile, stores it permanently on HDFS, and records the sequence number of the last written edit.
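The flush step can be sketched as a toy model. This is a minimal illustration under stated assumptions, not HBase code: `Store`, `StoreFile`, `apply`, and the row-count `flush_threshold` are hypothetical names (real flushes are triggered by MemStore size in bytes), and the caller is assumed to have already appended each edit to the HLog. It shows that each flush produces a new sorted, immutable file that records the sequence number of the last edit it contains.

```python
from dataclasses import dataclass, field

@dataclass
class StoreFile:
    cells: list        # sorted (row, value) pairs; immutable once written
    max_seq_id: int    # sequence number of the last edit in this file

@dataclass
class Store:
    """One column family: a MemStore plus the StoreFiles produced by flushes."""
    flush_threshold: int = 3                         # hypothetical: flush after N rows
    memstore: dict = field(default_factory=dict)     # row -> (seq_id, value)
    storefiles: list = field(default_factory=list)   # oldest first

    def apply(self, seq_id, row, value):
        # Assumed to be called only after the edit has been written to the HLog.
        self.memstore[row] = (seq_id, value)
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if not self.memstore:
            return
        cells = sorted((row, value) for row, (_, value) in self.memstore.items())
        last_seq = max(seq_id for seq_id, _ in self.memstore.values())
        self.storefiles.append(StoreFile(cells, last_seq))  # one flush, one new file
        self.memstore.clear()
```

Recording `max_seq_id` per file is what lets recovery skip log entries whose data is already on disk.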
Client: the entry point for accessing an HBase cluster
Communicates with the HMaster and HRegionServers through the HBase RPC mechanism
Administrative operations go to the HMaster
Read and write operations go to the HRegionServers
Contains the interfaces for accessing HBase, and maintains caches to speed up access, such as the location information of regions
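A client-side region location cache can be sketched as a sorted map from region start keys to servers. In this hypothetical model (`RegionLocationCache`, `put`, and `locate` are illustrative names, not the HBase client API), each cached entry covers a key range [start, end), and an empty end key marks the last region of a table. A lookup that falls into an uncached gap returns `None`, which is the point where a real client would go back to the catalog tables.

```python
import bisect

class RegionLocationCache:
    """Hypothetical cache: region start key -> (end key, server)."""

    def __init__(self):
        self._starts = []    # sorted region start keys
        self._entries = []   # (end_key, server) for the region at the same index

    def put(self, start_key, end_key, server):
        i = bisect.bisect_left(self._starts, start_key)
        if i < len(self._starts) and self._starts[i] == start_key:
            self._entries[i] = (end_key, server)       # refresh a stale entry
        else:
            self._starts.insert(i, start_key)
            self._entries.insert(i, (end_key, server))

    def locate(self, row):
        """Return the cached server for `row`, or None on a cache miss."""
        # The candidate region has the greatest start key <= row.
        i = bisect.bisect_right(self._starts, row) - 1
        if i < 0:
            return None
        end_key, server = self._entries[i]
        if end_key == "" or row < end_key:   # "" marks the last region (no upper bound)
            return server
        return None                          # gap in the cache: ask the catalog again
```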
ZooKeeper: ensures that only one Master is running in the cluster at any time. The Master and RegionServers register with ZooKeeper when they start. By default, HBase manages the ZooKeeper instances itself, for example starting and stopping them. The introduction of ZooKeeper means the Master is no longer a single point of failure
Stores the addressing entry point for all regions
Monitors the status of RegionServers in real time and notifies the Master immediately when a RegionServer comes online or goes offline
Stores the HBase schema and table metadata
HMaster: manages users' operations that create, delete, and modify tables
Assigns the new regions produced by a region split
Balances load across RegionServers by adjusting the distribution of regions
Reassigns the regions of a failed RegionServer after it goes down
If the HMaster fails, only metadata can no longer be modified; data reads and writes continue to work normally
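Region-count-based balancing can be illustrated with a greedy sketch. `balance` is a hypothetical function, not HBase's actual balancer, which weighs more factors; it only shows the basic idea of moving regions from the busiest RegionServer to the least busy one until region counts differ by at most one.

```python
def balance(assignment):
    """Greedy sketch of region-count balancing.

    `assignment` maps server name -> list of region names (hypothetical model).
    Returns (new_assignment, moves), where each move is (region, from, to).
    """
    servers = {s: list(regions) for s, regions in assignment.items()}  # work on a copy
    moves = []
    while True:
        most = max(servers, key=lambda s: len(servers[s]))
        least = min(servers, key=lambda s: len(servers[s]))
        if len(servers[most]) - len(servers[least]) <= 1:
            return servers, moves          # counts are as even as they can get
        region = servers[most].pop()       # move one region off the busiest server
        servers[least].append(region)
        moves.append((region, most, least))
```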
HRegionServer: maintains regions and handles I/O requests to those regions
Splits regions that grow too large during operation
As can be seen, a client accessing HBase data does not need the Master: addressing goes through ZooKeeper and the RegionServers, and data reads and writes go directly to the RegionServers.
The HRegionServer is mainly responsible for serving user I/O requests and reading and writing data in the HDFS file system; it is the core module of HBase.
1. All rows in a table are sorted in lexicographic (dictionary) order of their rowkeys.
2. A table is split along the row direction into multiple regions.
3. Regions are split by size. Each table starts with a single region; as data is added the region grows, and when it reaches a threshold it is split into two new regions, so the number of regions keeps increasing.
4. A region is the smallest unit of distributed storage and load balancing in HBase. Different regions are distributed across different RegionServers, but a single region is never split across multiple RegionServers.
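The ordering and splitting rules above can be sketched with byte-string rowkeys. Rowkeys are compared as raw bytes, so the order is lexicographic rather than numeric. `maybe_split` is a hypothetical helper that counts rows as a stand-in for the size threshold the text describes (real HBase splits on store size, not row count) and splits at the middle key.

```python
def sort_rowkeys(keys):
    # Byte-wise comparison: b"row10" sorts before b"row2".
    return sorted(keys)

def maybe_split(region_rows, max_rows):
    """Split a region (a sorted list of rowkeys) in two once it exceeds max_rows.

    `max_rows` is a hypothetical stand-in for the size threshold.
    Returns a list of one or two sorted regions.
    """
    if len(region_rows) <= max_rows:
        return [region_rows]
    mid = len(region_rows) // 2
    # region_rows[mid] becomes the end key of the first daughter
    # and the start key of the second.
    return [region_rows[:mid], region_rows[mid:]]
```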
A table is split along the row direction into multiple HRegions; each region is identified by its key range [startKey, endKey), where the start key is inclusive and the end key is exclusive.
A region is the smallest unit of distributed storage, but not the smallest unit of storage.
1. A region consists of one or more Stores, one per column family.
2. Each Store consists of one MemStore and zero or more StoreFiles.
3. The MemStore is kept in memory; StoreFiles are stored on HDFS.
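The Region, Store, MemStore, and StoreFile nesting can be modeled with plain dictionaries. This sketch is hypothetical (the class names mirror the text but are not HBase's classes) and simplifies reads to "first hit wins": it checks the MemStore, then StoreFiles from newest to oldest. Real HBase merges cells by key and timestamp, which gives the same answer here only because newer files are assumed to hold newer data.

```python
class Store:
    """One column family: a MemStore plus zero or more immutable StoreFiles."""

    def __init__(self):
        self.memstore = {}     # row -> value, held in memory
        self.storefiles = []   # list of dicts, oldest first (stand-ins for HFiles on HDFS)

    def get(self, row):
        # The MemStore holds the most recent writes, so it is consulted first...
        if row in self.memstore:
            return self.memstore[row]
        # ...then StoreFiles from newest to oldest; the first hit wins.
        for storefile in reversed(self.storefiles):
            if row in storefile:
                return storefile[row]
        return None

class Region:
    """A region holds one Store per column family."""

    def __init__(self, families):
        self.stores = {family: Store() for family in families}

    def get(self, row, family):
        return self.stores[family].get(row)
```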
Internal Structure of a Region
1. A table is divided by row into multiple regions, and each region is assigned to a specific RegionServer for management (see Data volume).
2. Each region is divided into multiple HStores, one per column family.
3. The data in each HStore is persisted in one or more HFiles.
4. A region grows as data is inserted and is split once it reaches a certain threshold.
5. As regions split, each RegionServer manages more and more regions.
6. The HMaster balances load based on the number of regions managed by each RegionServer.
7. Data in a region has an in-memory cache, the MemStore; reads check the MemStore first.
8. Because MemStore space is limited, its data must be flushed to StoreFiles periodically; each flush generates a new StoreFile.
9. The number of StoreFiles keeps growing over time, so the RegionServer periodically merges many StoreFiles together (compaction).
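The merge in step 9 can be sketched as a k-way merge of sorted files. `compact` is a hypothetical, simplified helper: each StoreFile is a list of `(row, seq_id, value)` tuples sorted by row with unique sequence numbers, and when a row appears in several files only the version with the highest `seq_id` survives. Real compactions also handle timestamps, version limits, and delete markers, which are omitted here.

```python
import heapq

def compact(storefiles):
    """Merge several sorted StoreFiles into one sorted file.

    Each input is a list of (row, seq_id, value) tuples sorted by row.
    """
    merged = []
    # heapq.merge streams the inputs in (row, seq_id) order, so for a repeated
    # row the entry with the highest seq_id arrives last.
    for row, seq_id, value in heapq.merge(*storefiles):
        if merged and merged[-1][0] == row:
            merged[-1] = (row, seq_id, value)   # newer edit replaces older one
        else:
            merged.append((row, seq_id, value))
    return merged
```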
Data Block segment – stores the table's data; this part can be compressed.
Meta Block segment (optional) – stores user-defined key-value pairs; this part can be compressed.
File Info segment – HFile metadata, not compressed; users can add their own metadata to this section.
Data Block Index segment – the index of the Data Blocks; the key of each index entry is the key of the first record in the indexed block.
Meta Block Index segment (optional) – the index of the Meta Blocks.
Trailer – this segment has a fixed length and stores the offset of each of the other segments. When an HFile is read, the Trailer is read first; it holds the starting position of each segment (each segment's Magic Number is used as a sanity check). The Data Block Index is then loaded into memory, so looking up a key does not require scanning the whole HFile: the reader finds the block that contains the key in the in-memory index, reads the entire block into memory with a single disk I/O, and then finds the key within it. The Data Block Index is evicted using an LRU mechanism.
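The index lookup described for the Trailer and Data Block Index can be sketched with a binary search over the first key of each block. Both functions are hypothetical; the list `blocks` stands in for data blocks on disk, and indexing into it stands in for the single disk read. Because blocks are sorted and non-overlapping, the only candidate block is the last one whose first key is not greater than the search key.

```python
import bisect

def find_block(index_first_keys, key):
    """Return the position of the block that may contain `key`, or -1.

    `index_first_keys` models the in-memory Data Block Index:
    the first key of each data block, in sorted order.
    """
    return bisect.bisect_right(index_first_keys, key) - 1  # -1: key precedes all blocks

def get(blocks, index_first_keys, key):
    i = find_block(index_first_keys, key)
    if i < 0:
        return None
    block = blocks[i]             # stands in for reading one whole block from disk
    return dict(block).get(key)   # then search inside the block in memory
```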
Compressing HFiles reduces network I/O and disk I/O; the accompanying cost, of course, is the CPU time spent on compression and decompression.
HFile compression supports two algorithms: Gzip and LZO.
The length of an HFile is not fixed; only two blocks have a fixed length: the Trailer and FileInfo.
The Trailer contains pointers to the starting offsets of the other blocks. It is written at the end of the file when the data is persisted, after which the file becomes an immutable store file.
FileInfo records some metadata about the file, for example AVG_KEY_LEN, AVG_VALUE_LEN, LAST_KEY, COMPARATOR, and MAX_SEQ_ID_KEY.
The Data Index and Meta Index blocks record the starting offset of each Data block and Meta block.
The Data block is the basic unit of HBase I/O. To improve efficiency, the HRegionServer uses an LRU-based block cache.
The size of each Data block can be specified by a parameter when creating a table: large blocks favor sequential scans, while small blocks favor random lookups.
Apart from the Magic at its beginning, each Data block is a concatenation of KeyValue pairs. The Magic is a random number whose purpose is to detect data corruption.
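The LRU policy behind the block cache can be sketched with an ordered dictionary. `LruBlockCache` is a hypothetical illustration of the idea, not the RegionServer's implementation: capacity is counted in blocks rather than bytes, and blocks are keyed by an assumed `(file name, block offset)` pair.

```python
from collections import OrderedDict

class LruBlockCache:
    """Minimal LRU cache for data blocks keyed by (file name, block offset)."""

    def __init__(self, capacity):
        self.capacity = capacity     # hypothetical: number of blocks, not bytes
        self._blocks = OrderedDict()

    def get(self, key):
        if key not in self._blocks:
            return None                   # miss: caller reads the block from HDFS
        self._blocks.move_to_end(key)     # mark as most recently used
        return self._blocks[key]

    def put(self, key, block):
        self._blocks[key] = block
        self._blocks.move_to_end(key)
        if len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)  # evict the least recently used block
```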
Each KeyValue pair in an HFile is a simple byte array. This byte array contains several fields and has a fixed structure.
KeyLength and ValueLength: two fixed-length fields giving the lengths of the key and the value. Because these lengths are stored up front, a reader can skip the key to access the value directly, or jump from one record to the next.
Key part: RowLength is a fixed-length field giving the length of the rowkey, followed by Row, the rowkey itself. Next comes ColumnFamilyLength, a fixed-length field giving the length of the column family, followed by the ColumnFamily, then the Qualifier, and finally two fixed-length fields: the Timestamp and the KeyType (Put/Delete).
The Value part has no such complex structure; it is pure binary data.
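The KeyValue layout can be sketched with Python's `struct` module. The field widths used here (4-byte key and value lengths, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte type, big-endian) and the type codes 4 for Put and 8 for Delete are, to our knowledge, those of HBase's `KeyValue` class; `encode_keyvalue` and `decode_keyvalue` are hypothetical helpers. Note that the qualifier has no length field: its length is whatever remains of the key after the other fields.

```python
import struct

PUT, DELETE = 4, 8   # KeyValue type codes assumed from HBase's KeyValue.Type

def encode_keyvalue(row, family, qualifier, timestamp, key_type, value):
    """Serialize one cell in the KeyValue layout described above (big-endian)."""
    key = (struct.pack(">H", len(row)) + row           # row length (2 bytes) + row
           + struct.pack(">B", len(family)) + family   # family length (1 byte) + family
           + qualifier                                 # qualifier: length is implied
           + struct.pack(">qB", timestamp, key_type))  # timestamp (8) + key type (1)
    return struct.pack(">II", len(key), len(value)) + key + value

def decode_keyvalue(buf):
    key_len, value_len = struct.unpack_from(">II", buf, 0)
    key = buf[8:8 + key_len]
    value = buf[8 + key_len:8 + key_len + value_len]   # skip straight past the key
    row_len = struct.unpack_from(">H", key, 0)[0]
    row = key[2:2 + row_len]
    fam_len = key[2 + row_len]                         # indexing bytes yields an int
    fam_start = 3 + row_len
    family = key[fam_start:fam_start + fam_len]
    qualifier = key[fam_start + fam_len:key_len - 9]   # last 9 bytes: timestamp + type
    timestamp, key_type = struct.unpack_from(">qB", key, key_len - 9)
    return row, family, qualifier, timestamp, key_type, value
```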
The Role of ZooKeeper
1. HBase depends on ZooKeeper; by default, HBase manages ZooKeeper itself (starting and stopping it).
2. The Master and RegionServers register with ZooKeeper when they start.
3. The introduction of ZooKeeper means the Master is no longer a single point of failure.
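Why ZooKeeper removes the single point of failure can be shown with a toy model of ephemeral nodes. `ToyZooKeeper` and `elect_master` are hypothetical and do not reflect the real ZooKeeper client API, and the `/hbase/master` path is illustrative. The idea: candidates race to create one ephemeral node, exactly one succeeds, and when the winner's session expires the node disappears so a backup can take over.

```python
class ToyZooKeeper:
    """In-memory stand-in for ephemeral znodes (hypothetical, not the ZooKeeper API)."""

    def __init__(self):
        self.nodes = {}   # path -> owning session

    def create_ephemeral(self, path, session):
        if path in self.nodes:
            return False            # node exists: another candidate got there first
        self.nodes[path] = session
        return True

    def expire_session(self, session):
        # Ephemeral nodes vanish when the session that created them ends.
        self.nodes = {p: s for p, s in self.nodes.items() if s != session}

def elect_master(zk, candidates, path="/hbase/master"):
    """Each candidate tries to create the master node; the current owner is returned."""
    for candidate in candidates:
        if zk.create_ephemeral(path, candidate):
            return candidate
    return zk.nodes.get(path)       # someone else already holds the role
```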
1. ZooKeeper (where the location of the -ROOT- table can be found)
2. -ROOT- (stored in only one region; the location of the .META. table is found from it)
3. .META. (stores the locations of the regions of user tables)
4. User tables
1. The -ROOT- table contains the list of regions of the .META. table; -ROOT- itself has only one region.
2. The location of the -ROOT- table is recorded in ZooKeeper.
3. The .META. table contains the list of all user-space regions together with their RegionServer addresses.
So, before accessing user data, a client must first access ZooKeeper, then the -ROOT- table, then the .META. table; only then can it find the location of the user data it wants to access.
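The lookup chain can be sketched as three hops over nested dictionaries. Everything here is a hypothetical model: `zk["root_location"]` stands in for the ZooKeeper entry, each server maps a catalog table name to sorted `(start_key, value)` rows, and catalog rows are keyed by a simplified `"table,row"` string (real .META. row keys also carry a region id). Later HBase versions (0.96 onward) dropped the -ROOT- level, so this models the scheme described here, not current releases.

```python
import bisect

def find_entry(regions, key):
    """Pick the value whose start key is the greatest one <= key.

    `regions` is a list of (start_key, value) pairs sorted by start_key.
    """
    starts = [start for start, _ in regions]
    i = bisect.bisect_right(starts, key) - 1
    if i < 0:
        raise KeyError(key)
    return regions[i][1]

def locate(zk, servers, table, row):
    """Resolve the RegionServer holding `row` of `table`."""
    meta_key = f"{table},{row}"                                        # simplified catalog key
    root_server = zk["root_location"]                                  # 1. ZooKeeper -> -ROOT-
    meta_server = find_entry(servers[root_server]["-ROOT-"], meta_key) # 2. -ROOT- -> .META.
    return find_entry(servers[meta_server][".META."], meta_key)        # 3. .META. -> user region
```

A client would cache the result of this walk, so the three hops are only paid on a cache miss.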
HBase Fault Tolerance
Master fault tolerance: ZooKeeper elects a new Master.
1. While no Master process is running, data reads continue as usual;
2. While no Master process is running, region splitting, load balancing, and similar operations cannot be carried out.
RegionServer fault tolerance: each RegionServer periodically reports a heartbeat to ZooKeeper. If a heartbeat does not arrive in time:
1. The Master reassigns the regions on that RegionServer to other RegionServers;
2. The write-ahead log on the failed server is split by the Master and sent to the new RegionServers.
ZooKeeper fault tolerance: ZooKeeper is itself a reliable service, typically deployed with 3 or 5 instances.
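The choice of 3 or 5 instances follows from majority quorums: an ensemble of n servers stays available while a strict majority is alive, so it tolerates floor((n - 1) / 2) failures. The helper names below are illustrative. The arithmetic also shows why even sizes are rarely used: 4 servers tolerate no more failures than 3.

```python
def tolerated_failures(ensemble_size):
    """Servers that can fail while a strict majority remains."""
    return (ensemble_size - 1) // 2

def has_quorum(ensemble_size, alive):
    """True if `alive` servers form a strict majority of the ensemble."""
    return alive > ensemble_size // 2
```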
Write-Ahead Log: this mechanism is used for fault tolerance and data recovery.
Each HRegionServer has an HLog object; HLog is the class that implements the write-ahead log. Each time a user operation is written to the MemStore, a copy of the data is also written to the HLog (the HLog file format is described later). HLog files are rolled periodically: a new file is started, and old files whose data has already been persisted to StoreFiles are deleted.
When an HRegionServer terminates unexpectedly, the HMaster detects this through ZooKeeper. The HMaster first processes the leftover HLog files: it splits the log data by region, places each region's entries in that region's directory, and then reassigns the failed regions. While loading these regions, the HRegionServers that receive them find that historical HLog data needs to be processed, so they replay the HLog data into the MemStore and then flush it to StoreFiles, completing the data recovery.
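The split-and-replay recovery can be sketched in two functions. Both are hypothetical: log entries are modeled as `(seq_id, region, row, value)` tuples, and `flushed_seq_id` stands for the last-written sequence number that, per the workflow above, is recorded when a region flushes. That number lets replay skip edits already persisted in StoreFiles.

```python
from collections import defaultdict

def split_log(hlog):
    """Master's step: group a RegionServer's shared HLog entries by region.

    Each entry is (seq_id, region, row, value); log order is preserved per region.
    """
    per_region = defaultdict(list)
    for entry in hlog:
        per_region[entry[1]].append(entry)
    return dict(per_region)

def replay(entries, flushed_seq_id):
    """New RegionServer's step: rebuild a region's MemStore from its split log.

    Edits with seq_id <= flushed_seq_id are already in StoreFiles and are skipped.
    """
    memstore = {}
    for seq_id, _region, row, value in entries:
        if seq_id > flushed_seq_id:
            memstore[row] = value    # later edits to the same row overwrite earlier ones
    return memstore
```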
Write-Ahead Log (WAL)
1. When a client submits data to a RegionServer, the data is written to the WAL first. The client is told that the commit succeeded only after the WAL write succeeds; if the WAL write fails, the client is told that the commit failed. Data lost when a server fails can be recovered from the WAL.
2. All regions on a RegionServer share one HLog; each commit writes to the WAL first and then to the MemStore.
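The commit rule can be sketched as follows. `RegionServerWritePath` is a hypothetical class, and `wal_append` is an assumed callable that raises `OSError` when the log write fails. The sketch shows two points from the list: every region on the server appends to the same log, and the MemStore is updated and success reported only after the WAL append returns.

```python
class RegionServerWritePath:
    """Sketch: one shared WAL for all regions; acknowledge only after the WAL write."""

    def __init__(self, wal_append):
        self.wal_append = wal_append   # hypothetical callable; raises OSError on failure
        self.memstores = {}            # region -> {row: value}
        self.next_seq_id = 1

    def commit(self, region, row, value):
        seq_id = self.next_seq_id
        try:
            self.wal_append((seq_id, region, row, value))    # 1. WAL first
        except OSError:
            return False                                      # client is told the commit failed
        self.next_seq_id += 1
        self.memstores.setdefault(region, {})[row] = value   # 2. then the MemStore
        return True                                           # client is told the commit succeeded
```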
HBase Architecture: Core Modules