Distributed Database HBase

HBase (Hadoop Database) is a highly reliable, high-performance, column-oriented, scalable distributed storage system. HBase can be used to build large-scale structured storage clusters on low-cost PC servers.

HBase is an open-source implementation of Google Bigtable, with analogous building blocks: where Bigtable uses GFS as its file storage system, HBase uses Hadoop HDFS; where Google runs MapReduce to process the massive data in Bigtable, HBase uses Hadoop MapReduce; and where Bigtable relies on Chubby as its coordination service, HBase uses Zookeeper.

Viewed against the layers of the Hadoop ecosystem, HBase sits at the structured storage layer. Hadoop HDFS gives HBase highly reliable underlying storage, Hadoop MapReduce gives it high-performance computing capabilities, and Zookeeper provides stable coordination services and a failover mechanism for HBase.

In addition, Pig and Hive provide high-level language support for HBase, which makes statistical processing of HBase data very simple, and Sqoop provides convenient RDBMS data import, which makes it easy to migrate data from traditional databases into HBase.
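As a sketch of that Sqoop import path (the connection string, database, table, and column names here are placeholders for illustration, not from the article), a single command can copy an RDBMS table into HBase:

```sh
# Hypothetical example: import MySQL table "products" into an HBase table,
# keyed by product_id, with all imported columns under family "info".
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --username reader --password-file /user/me/db.pass \
  --table products \
  --hbase-table products \
  --column-family info \
  --hbase-row-key product_id \
  --hbase-create-table
```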

HBase Access Interfaces

1. Native Java API: the most common and efficient access method, suitable for Hadoop MapReduce jobs that process HBase table data concurrently in batch (a sketch follows this list).

2. HBase Shell: the HBase command-line tool and the simplest interface, suitable for HBase administration.

3. Thrift Gateway: uses Thrift serialization and supports multiple languages such as C++, PHP, and Python, suitable for heterogeneous systems that need online access to HBase table data.

4. REST Gateway: exposes a RESTful HTTP API to HBase, removing language restrictions.

5. Pig: HBase data can be manipulated with the Pig Latin dataflow language. Like Hive, Pig Latin is ultimately compiled into MapReduce jobs that process HBase table data, which makes it suitable for data statistics.

6. Hive: the current Hive release does not yet support HBase, but support is planned for the next version, Hive 0.7.0, which will allow HBase to be accessed with a SQL-like language.
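To make item 1 concrete, here is a minimal sketch of a write/read round trip through the native Java client. The article predates the modern client API, so this uses the Connection/Table API found in HBase 1.x and later; the table name "webtable" and column family "uri" are assumptions chosen to match the data-model example below.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Minimal write/read round trip against an existing table.
public class HBaseClientSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("webtable"))) {
      // Write: row key "r1", column family "uri", qualifier "url"
      Put put = new Put(Bytes.toBytes("r1"));
      put.addColumn(Bytes.toBytes("uri"), Bytes.toBytes("url"),
                    Bytes.toBytes("http://www.taobao.com"));
      table.put(put);

      // Read the same cell back
      Get get = new Get(Bytes.toBytes("r1"));
      Result result = table.get(get);
      byte[] value = result.getValue(Bytes.toBytes("uri"), Bytes.toBytes("url"));
      System.out.println(Bytes.toString(value));
    }
  }
}
```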

HBase Data Model

Table & Column Family
Row Key | Timestamp | Column Family "URI"          | Column Family "Parser"
--------|-----------|------------------------------|------------------------
R1      | T3        | Url = http://www.taobao.com  | Title = daily specials
R1      | T2        | Host = taobao.com            |
R1      | T1        |                              |
R2      | T5        | Url = http://www.alibaba.com | Content = every day...
R2      | T4        | Host = alibaba.com           |

• Row Key: the primary key of the Table. Records in a Table are sorted by Row Key.

• Timestamp: the timestamp of each data operation, which can be viewed as the version number of the data.

• Column Family: a column family. Horizontally, a Table is composed of one or more Column Families. A Column Family can contain an arbitrary number of columns, i.e. it supports dynamic expansion, with no need to define the number or type of columns in advance. All columns are stored in binary format, so users must perform type conversion themselves.
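As a hedged illustration of the versioning behavior just described (using the same assumed table and column family as the earlier sketch), the following Java fragment writes two versions of one cell with explicit timestamps and reads both back. Get.readVersions is the HBase 2.x name; 1.x uses setMaxVersions.

```java
import java.io.IOException;
import java.util.Arrays;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Assumes "table" is an open Table handle, and that the column family "uri"
// was created to retain more than one version (e.g. VERSIONS => 3).
class VersionSketch {
  static void writeAndReadVersions(Table table) throws IOException {
    byte[] row = Bytes.toBytes("r1");
    byte[] family = Bytes.toBytes("uri");
    byte[] qualifier = Bytes.toBytes("host");

    // Two versions of the same cell, distinguished only by their timestamps.
    Put older = new Put(row).addColumn(family, qualifier, 1L, Bytes.toBytes("taobao.com"));
    Put newer = new Put(row).addColumn(family, qualifier, 2L, Bytes.toBytes("www.taobao.com"));
    table.put(Arrays.asList(older, newer));

    Get get = new Get(row);
    get.readVersions(3); // use setMaxVersions(3) on HBase 1.x
    for (Cell cell : table.get(get).getColumnCells(family, qualifier)) {
      // Cells come back newest first; values are raw bytes, typed by the application.
      System.out.println(cell.getTimestamp() + " -> "
          + Bytes.toString(CellUtil.cloneValue(cell)));
    }
  }
}
```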

Table & Region

As the number of records grows, a Table gradually splits into multiple Regions. Each Region is denoted by [startkey, endkey), and different Regions are assigned by the Master to the appropriate RegionServers for management.

-ROOT- and .META. Tables

HBase has two special tables: -ROOT- and .META.

• .META.: records the Region information of user tables; .META. itself can have multiple Regions.

• -ROOT-: records the Region information of the .META. table; -ROOT- has only one Region.

• The location of the -ROOT- table is recorded in Zookeeper.

Before a Client can access user data, it must first consult Zookeeper, then the -ROOT- table, and then the .META. table before it can finally locate the user data. This involves several network round trips, but the Client caches the lookup results.

MapReduce on HBase

The most convenient and practical model for running batch processing on HBase is MapReduce:

The relationship between an HBase Table and its Regions is similar to that between an HDFS File and its Blocks. HBase provides the TableInputFormat and TableOutputFormat APIs so that an HBase Table can conveniently serve as the Source and Sink of a Hadoop MapReduce job; MapReduce application developers basically do not need to pay attention to the internals of the HBase system.
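For example, the sketch below uses TableMapReduceUtil (which wires up TableInputFormat) to count the rows of a table, in the spirit of the RowCounter tool that ships with HBase; the table name "webtable" is again an assumption.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

// Map-only job: one map task per Region, one counter increment per row.
public class RowCount {
  static class CountMapper extends TableMapper<ImmutableBytesWritable, Result> {
    @Override
    protected void map(ImmutableBytesWritable rowKey, Result columns, Context context) {
      context.getCounter("hbase", "ROWS").increment(1);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "hbase-rowcount");
    job.setJarByClass(RowCount.class);

    Scan scan = new Scan();
    scan.setCaching(500);       // fetch rows in larger batches per RPC
    scan.setCacheBlocks(false); // scans shouldn't churn the RegionServer block cache
    TableMapReduceUtil.initTableMapperJob("webtable", scan, CountMapper.class,
        ImmutableBytesWritable.class, Result.class, job);

    job.setOutputFormatClass(NullOutputFormat.class); // the count lives in job counters
    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```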

HBase System Architecture

Client

An HBase Client uses the HBase RPC mechanism to communicate with HMaster and HRegionServer. For management operations, the Client performs RPC with HMaster; for data read/write operations, it performs RPC with HRegionServer.

Zookeeper

In addition to storing the address of the -ROOT- table and the address of HMaster, Zookeeper also keeps track of the HRegionServers: each HRegionServer registers itself with Zookeeper as an Ephemeral node, so HMaster can detect the health status of every HRegionServer at any time. Zookeeper also removes the HMaster single point of failure, as described below.

HMaster

HMaster is not a single point of failure: multiple HMasters can be started in HBase, and Zookeeper's Master Election mechanism ensures that exactly one Master is running at any time. Functionally, HMaster is mainly responsible for Table and Region management:

1. Manage the operations for adding, deleting, modifying, and querying tables.

2. Manage the load balancing of HRegionServers and adjust the Region distribution.

3. After a Region split, assign the new Regions.

4. After an HRegionServer goes down, migrate the Regions that were on the failed HRegionServer.

HRegionServer

HRegionServer is mainly responsible for responding to user I/O requests and reading and writing data to the HDFS file system. It is the core module in HBase.

Internally, an HRegionServer manages a series of HRegion objects. Each HRegion corresponds to one Region of a Table and consists of multiple HStores, and each HStore corresponds to the storage of one Column Family in the Table. From this it can be seen that each Column Family is in fact a separate storage unit, so columns with similar I/O characteristics are best placed in the same Column Family; this is the most efficient arrangement.

HStore is the core of HBase storage and consists of two parts: a MemStore and StoreFiles. The MemStore is a sorted memory buffer: data written by the user is first put into the MemStore, and when the MemStore is full it is flushed into a StoreFile (implemented at the underlying layer as an HFile). When the number of StoreFiles grows past a certain threshold, a Compact operation is triggered that merges multiple StoreFiles into one, merging versions and physically removing deleted data along the way. This shows that HBase only ever appends data: update and delete operations are all carried out during the later compaction, so a user's write can return as soon as it enters memory, which guarantees high I/O performance.

As StoreFiles are compacted, a progressively larger StoreFile forms. When the size of a single StoreFile exceeds a certain threshold, a Split operation is triggered: the current Region splits into two Regions, the parent Region is retired, and the two newly split child Regions are assigned by HMaster to the appropriate HRegionServers, so that the load of the original Region is spread over two Regions.
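The thresholds in this flush / compact / split cycle are configurable. A hedged hbase-site.xml sketch follows; the property names come from mainline HBase, but defaults (and the split policy that consults the region size) vary by version.

```xml
<!-- hbase-site.xml: knobs governing the flush / compact / split cycle described above. -->
<configuration>
  <!-- Flush a MemStore to a StoreFile once it reaches this size -->
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>134217728</value> <!-- 128 MB -->
  </property>
  <!-- Compact once a Store accumulates this many StoreFiles -->
  <property>
    <name>hbase.hstore.compactionThreshold</name>
    <value>3</value>
  </property>
  <!-- Consider splitting the Region once a StoreFile grows beyond this size -->
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>10737418240</value> <!-- 10 GB -->
  </property>
</configuration>
```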

Once the basic principle of HStore is understood, the role of HLog must also be understood. HStore alone is sufficient while the system works normally, but in a distributed environment system errors and machine failures cannot be avoided, and if an HRegionServer exits unexpectedly, the in-memory data in its MemStores is lost. This is why HLog is introduced. Each HRegionServer has one HLog object, a class implementing a Write Ahead Log: whenever a user operation is written to a MemStore, a copy of the data is also written to the HLog file (the HLog file format is described below). The HLog file periodically rolls over to a new file and deletes old ones (those whose data has already been persisted to StoreFiles).

When an HRegionServer terminates unexpectedly, HMaster learns of it through Zookeeper. HMaster first processes the HLog files left behind, splitting the log data of the different Regions into the corresponding Region directories, and then reassigns the orphaned Regions. The HRegionServers that receive these Regions discover, while loading a Region, that there is historical HLog data to process, so they replay the data in the HLog into the MemStore and then flush it to StoreFiles, completing the data recovery.

HBase Storage Format

All HBase data files are stored in the Hadoop HDFS file system and mainly comprise the two file types mentioned above:

1. HFile: the storage format for KeyValue data in HBase. An HFile is a Hadoop binary-format file; a StoreFile is in fact a lightweight wrapper around an HFile, i.e. the underlying layer of a StoreFile is an HFile.

2. HLog File: the storage format of the WAL (Write Ahead Log) in HBase; physically it is a Hadoop Sequence File.

HFile

The HFile storage format is as follows. An HFile is variable-length; only two of its sections, the Trailer and FileInfo, have fixed formats. The Trailer holds pointers to the starting points of the other data blocks. FileInfo records meta information about the file, such as AVG_KEY_LEN, AVG_VALUE_LEN, LAST_KEY, COMPARATOR, and MAX_SEQ_ID_KEY. The Data Index and Meta Index blocks record the starting point of each Data block and Meta block.

The Data Block is the basic unit of HBase I/O. For efficiency, HRegionServer has an LRU-based Block Cache mechanism. The size of each Data Block can be specified by a parameter when a Table is created: larger blocks favor sequential scans, smaller blocks favor random reads. Apart from the Magic number at its beginning, each Data block is a concatenation of KeyValue pairs; the Magic content is a random number used to guard against data corruption. The internal structure of a KeyValue pair is described in detail below.

Each KeyValue pair in an HFile is a simple byte array. However, this byte array has a fixed internal structure with many fields. Let's look at the specific layout:

It starts with two fixed-length numbers giving the Key length and the Value length. The Key follows: it begins with a fixed-length number giving the RowKey length, followed by the RowKey, then a fixed-length number giving the Family length, followed by the Family, then the Qualifier, and finally two fixed-length numbers for the Timestamp and the Key Type (Put/Delete). The Value part has no such complex structure; it is pure binary data.
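Read concretely, the layout looks like the sketch below. The byte widths shown match the KeyValue encoding in mainline HBase and are an assumption beyond the article's "fixed-length" wording:

```
<Key Length: 4 bytes> <Value Length: 4 bytes> <Key> <Value: raw bytes>

Key:
<RowKey Length: 2 bytes> <RowKey> <Family Length: 1 byte> <Family>
<Qualifier> <Timestamp: 8 bytes> <Key Type: 1 byte>
```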

HLogFile

Structurally, the HLog file is an ordinary Hadoop Sequence File. The Key of the Sequence File is an HLogKey object, which records the provenance of the written data: in addition to the table and region names, it contains a sequence number and a timestamp. The timestamp is the write time, and the sequence number starts at 0, or at the sequence number most recently persisted to the file system.

The Value of the HLog Sequence File is an HBase KeyValue object, corresponding to the KeyValue in an HFile; see the description above.

End

This article has given a general introduction to the functions and design of HBase. Due to limited space, it does not cover HBase internals in depth. Currently, yitao's storage system is built on HBase technology; in a future article, "yitao Distributed Storage System", we will introduce more HBase applications through real-world cases.
