HBase Technology Introduction


Source: http://www.searchtb.com/2011/01/understanding-hbase.html

HBase Introduction

HBase, the Hadoop database, is a highly reliable, high-performance, column-oriented, scalable distributed storage system. HBase can be used to build large-scale structured storage clusters on inexpensive commodity PC servers.

HBase is an open-source implementation of Google Bigtable, and the analogy runs throughout the stack: Bigtable uses GFS as its file storage system, while HBase uses Hadoop HDFS; Google runs MapReduce to process the massive data in Bigtable, and HBase likewise uses Hadoop MapReduce to process the massive data it stores; Bigtable uses Chubby as its coordination service, and HBase uses ZooKeeper in the same role.

Within the Hadoop ecosystem, HBase sits at the structured storage layer. Hadoop HDFS provides highly reliable underlying storage for HBase, Hadoop MapReduce provides high-performance computing capabilities, and ZooKeeper provides a stable coordination service and a failover mechanism.

In addition, Pig and Hive provide high-level language support for HBase, which makes statistical processing of HBase data very simple. Sqoop provides convenient RDBMS data import, so migrating data from traditional databases into HBase is straightforward.

HBase Access Interfaces

1. Native Java API: the most common and efficient access method, well suited to concurrent batch processing of HBase table data by Hadoop MapReduce jobs (a code sketch follows this list).

2. HBase shell: HBase's command-line tool and the simplest interface, suitable for HBase administration.

3. Thrift gateway: uses Thrift serialization and supports multiple languages such as C++, PHP, and Python, suitable for heterogeneous systems that access HBase table data online.

4. REST gateway: exposes a RESTful HTTP API to HBase, removing language restrictions.

5. Pig: the Pig Latin dataflow language can operate on HBase data. As with Hive, Pig Latin is ultimately compiled into MapReduce jobs that process HBase table data, which makes it suitable for data statistics.

6. Hive: the current Hive release does not yet support HBase, but support is planned for the next version, Hive 0.7.0, which will allow HBase to be queried with a SQL-like language.
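
As a minimal sketch of the native Java API (using the client classes of this era; the table, family, qualifier, and value names are placeholders, not from the original article):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class NativeApiExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();  // reads hbase-site.xml
        HTable table = new HTable(conf, "uri_table");      // placeholder table name

        // Write one cell: row key, column family, qualifier, value.
        Put put = new Put(Bytes.toBytes("r1"));
        put.add(Bytes.toBytes("parser"), Bytes.toBytes("title"),
                Bytes.toBytes("daily specials"));
        table.put(put);

        // Read the cell back.
        Get get = new Get(Bytes.toBytes("r1"));
        Result result = table.get(get);
        byte[] value = result.getValue(Bytes.toBytes("parser"),
                                       Bytes.toBytes("title"));
        System.out.println(Bytes.toString(value));

        table.close();
      }
    }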

HBase Data Model

Table & Column Family

| Row Key | Timestamp | Column Family "uri"        | Column Family "parser" |
|---------|-----------|----------------------------|------------------------|
| r1      | t3        | url=http://www.taobao.com  | title=daily specials   |
| r1      | t2        | host=taobao.com            |                        |
| r1      | t1        |                            |                        |
| r2      | t5        | url=http://www.alibaba.com | content=every day...   |
| r2      | t4        | host=alibaba.com           |                        |

Ø Row key: the row key is the table's primary key, and the records in the table are sorted by row key.

Ø Timestamp: the timestamp associated with each data operation; it can be regarded as the version number of the data.

Ø Column family: the table consists of one or more column families in the horizontal direction. A column family can contain any number of columns; that is, column families support dynamic expansion, and there is no need to define the number or types of columns in advance. All columns are stored in binary form, so users must handle type conversion themselves.
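
As a minimal sketch of creating such a table with the administrative API of this era (the table name "uri_table" is a placeholder; the column families match the example above):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CreateTableExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Only the table name and its column families are declared up front;
        // individual columns (qualifiers) are created implicitly by writes.
        HTableDescriptor desc = new HTableDescriptor("uri_table");
        desc.addFamily(new HColumnDescriptor("uri"));
        desc.addFamily(new HColumnDescriptor("parser"));
        admin.createTable(desc);
      }
    }

No column names or types appear anywhere: writing a new qualifier under an existing family is all it takes to add a column.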

Table & Region

As the number of records grows, a table is gradually split into multiple regions. A region is denoted by [startkey, endkey); different regions are assigned by the master to the appropriate RegionServers for management.

-ROOT- & .META. Tables

HBase has two special tables: -ROOT- and .META.

Ø .META.: records the region information of user tables; .META. itself can consist of multiple regions.

Ø -ROOT-: records the region information of the .META. table; -ROOT- has only one region.

Ø The location of the -ROOT- table is recorded in ZooKeeper.

Before a client can access user data, it must first contact ZooKeeper, then the -ROOT- table, and then the .META. table before it finally reaches the user data. Although this involves multiple network round trips, the client caches the lookup results.
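
In client code this chain is invisible. A minimal sketch (the quorum hosts and table name are placeholders): the only address the client needs is the ZooKeeper quorum, and the -ROOT-/.META. lookups and their caching happen inside the client library.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class ClientLookupExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // The only address the client needs: the ZooKeeper quorum.
        conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3");

        // Opening a table triggers the ZooKeeper -> -ROOT- -> .META. lookup;
        // the resolved region locations are then cached by the client.
        HTable table = new HTable(conf, "uri_table");
        table.close();
      }
    }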

MapReduce on HBase

The most convenient and practical model for running batch processing over HBase is MapReduce.

The relationship between an HBase table and its regions is analogous to the relationship between an HDFS file and its blocks. HBase provides the TableInputFormat and TableOutputFormat APIs so that an HBase table can serve directly as the source or sink of a Hadoop MapReduce job, and job developers do not need to concern themselves with the internals of the HBase system.
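
For example, a minimal sketch of a row-counting job (the table name is a placeholder; TableInputFormat hands the mapper one (row key, Result) pair per row):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class HBaseRowCounter {
      static class CountMapper extends TableMapper<NullWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context ctx)
            throws IOException, InterruptedException {
          ctx.getCounter("hbase", "rows").increment(1);  // one count per row
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "hbase-row-counter");
        job.setJarByClass(HBaseRowCounter.class);

        Scan scan = new Scan();   // full-table scan; narrow it as needed
        scan.setCaching(500);     // fetch rows in larger batches per RPC

        // Wires TableInputFormat into the job with the table as the source.
        TableMapReduceUtil.initTableMapperJob(
            "uri_table", scan, CountMapper.class,   // placeholder table name
            NullWritable.class, NullWritable.class, job);
        job.setOutputFormatClass(NullOutputFormat.class);
        job.setNumReduceTasks(0);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }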

HBase System Architecture

Client

The HBase client uses HBase's RPC mechanism to communicate with the HMaster and the HRegionServers: for management operations, the client performs RPC with the HMaster; for data read and write operations, it performs RPC with the HRegionServers.

ZooKeeper

In addition to storing the address of the -ROOT- table and the address of the HMaster, the ZooKeeper quorum is where each HRegionServer registers itself as an ephemeral node, so the HMaster can detect the health status of every HRegionServer at any time. ZooKeeper also eliminates the HMaster as a single point of failure (SPOF); see the description below.

HMaster

The HMaster is not a single point of failure: multiple HMasters can be started in an HBase cluster, and ZooKeeper's master election mechanism ensures that exactly one of them is active. Functionally, the HMaster is mainly responsible for table and region management:

1. Managing add, delete, modify, and query operations on tables.

2. Managing HRegionServer load balancing and adjusting the region distribution.

3. Allocating the new regions after a region split.

4. Migrating the regions of a failed HRegionServer after it goes down.

HRegionServer

The HRegionServer is mainly responsible for responding to user I/O requests and reading and writing data to the HDFS file system. It is the core module of HBase.

An HRegionServer internally manages a series of HRegion objects. Each HRegion corresponds to one region of a table and is made up of multiple HStores, each of which corresponds to the storage of one column family in the table. Each column family is thus effectively a separate storage unit, so it is most efficient to place columns with similar I/O characteristics in the same column family.

The HStore is the core of HBase storage and consists of two parts: a MemStore and StoreFiles. The MemStore is a sorted memory buffer: data written by the user is first placed in the MemStore, and when the MemStore fills up, it is flushed into a StoreFile (whose underlying implementation is the HFile). When the number of StoreFiles grows past a certain threshold, a compaction is triggered that merges multiple StoreFiles into one; during this merge, row versions are consolidated and deleted data is discarded. HBase therefore only ever appends data, and all updates and deletes are applied during subsequent compactions, which lets a user's write return as soon as it reaches memory and guarantees high I/O performance. As compactions proceed, progressively larger StoreFiles are formed, and when the size of a single StoreFile exceeds a certain threshold, a split is triggered: the current region is split into two regions, the parent region goes offline, and the two new child regions are assigned by the HMaster to the appropriate HRegionServers, so that the load of the original region is spread across two regions.
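
A minimal sketch of the configuration knobs behind these thresholds (the property names are standard hbase-site.xml settings; the values shown are illustrative defaults of this era, not recommendations):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class FlushCompactSplitConfig {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // MemStore size at which a flush to a StoreFile is triggered.
        conf.setLong("hbase.hregion.memstore.flush.size", 64 * 1024 * 1024L);
        // Number of StoreFiles in one HStore that triggers a compaction.
        conf.setInt("hbase.hstore.compactionThreshold", 3);
        // StoreFile size at which a region split is triggered.
        conf.setLong("hbase.hregion.max.filesize", 256 * 1024 * 1024L);
      }
    }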

Having covered the basic workings of the HStore, you also need to understand the role of the HLog. The HStore mechanism above suffices while the system works normally, but in a distributed environment, system errors and machine failures cannot be avoided; if an HRegionServer exits unexpectedly, the in-memory data in its MemStores is lost. This is why the HLog is introduced. Every HRegionServer has one HLog object, an implementation of a write-ahead log: each time the user writes data to a MemStore, a copy of the data is also written to the HLog file (see the HLog file format below). The HLog file rolls periodically, and old files whose data has already been persisted to StoreFiles are deleted. When an HRegionServer terminates unexpectedly, the HMaster learns of it through ZooKeeper; it first processes the HLog file left behind, splitting the log entries of the different regions into the corresponding region directories, and then reassigns the affected regions. When the HRegionServers that load these regions find historical HLog entries to process, they replay the data from the HLog into the MemStore and then flush to StoreFiles, completing the data recovery.

HBase Storage Format

All HBase data files are stored on the Hadoop HDFS file system. They mainly comprise the two file types mentioned above:

1. HFile: the storage format for KeyValue data in HBase. An HFile is a binary-format Hadoop file; in fact, a StoreFile is a lightweight wrapper around an HFile, that is, the underlying layer of a StoreFile is an HFile.

2. HLog file: the storage format of the WAL (write-ahead log) in HBase; physically it is a Hadoop SequenceFile.

HFile

An HFile is variable-length; only two of its blocks have a fixed length: the Trailer and the FileInfo. The Trailer holds pointers to the starting points of the other blocks. The FileInfo records some meta information about the file, such as AVG_KEY_LEN, AVG_VALUE_LEN, LAST_KEY, COMPARATOR, and MAX_SEQ_ID_KEY. The Data Index and Meta Index blocks record the starting point of each Data block and Meta block.

The Data block is the basic unit of HBase I/O. To improve efficiency, the HRegionServer maintains an LRU-based block cache. The size of each Data block can be specified by a parameter when a table is created: larger blocks favor sequential scans, smaller blocks favor random reads. Apart from the Magic value at its beginning, each Data block is a concatenation of KeyValue pairs; the Magic content is a random number used to detect data corruption. The internal structure of a KeyValue pair is described in detail below.

Each KeyValue pair in an HFile is a simple byte array. This byte array, however, has a fixed internal structure with many fields. Let's look at the specific layout:

It starts with two fixed-length values indicating the key length and the value length, respectively. The key follows: it begins with a fixed-length value for the row key length, then the row key itself, then a fixed-length value for the column family length, then the column family, then the qualifier, and finally two fixed-length values holding the timestamp and the key type (Put/Delete). The value part has no such complex structure; it is pure binary data.
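
A minimal sketch that walks this layout (the field widths are assumptions typical of the KeyValue format: 4-byte int lengths, a 2-byte row key length, a 1-byte family length, an 8-byte timestamp, and a 1-byte key type; the qualifier length is derived from the total key length):

    import java.nio.ByteBuffer;

    public class KeyValueReader {
      // Assumes buf is positioned at the start of one serialized KeyValue pair.
      public static void dump(ByteBuffer buf) {
        int keyLen = buf.getInt();        // fixed-length: total key length
        int valueLen = buf.getInt();      // fixed-length: value length
        short rowLen = buf.getShort();    // fixed-length: row key length
        byte[] row = new byte[rowLen];
        buf.get(row);
        int famLen = buf.get();           // fixed-length: column family length
        byte[] family = new byte[famLen];
        buf.get(family);
        // The qualifier occupies whatever remains of the key before the
        // 8-byte timestamp and the 1-byte key type (Put/Delete).
        int qualLen = keyLen - 2 - rowLen - 1 - famLen - 8 - 1;
        byte[] qualifier = new byte[qualLen];
        buf.get(qualifier);
        long timestamp = buf.getLong();
        byte keyType = buf.get();
        byte[] value = new byte[valueLen];  // the value is raw binary data
        buf.get(value);
        System.out.printf("row=%s family=%s qualifier=%s ts=%d type=%d%n",
            new String(row), new String(family), new String(qualifier),
            timestamp, keyType);
      }
    }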

HLog File

The HLog file is physically an ordinary Hadoop SequenceFile. The key of each SequenceFile record is an HLogKey object, which records the provenance of the written data: besides the table and region names, it includes a sequence number and a timestamp. The timestamp is the write time, and the sequence number starts at 0, or at the sequence number most recently persisted to the file system.

The value of each HLog SequenceFile record is HBase's KeyValue object, corresponding to the KeyValue format in an HFile described above.

Conclusion

This article has given a general introduction to the functions and design of HBase; due to limited space, it does not describe HBase's internals in depth. Yitao's storage system is currently built on HBase; a future article on the Yitao distributed storage system will introduce more HBase applications through real cases.
