Apache Open Source Project -- HBase


HBase (the Hadoop Database) is a highly reliable, high-performance, column-oriented, scalable distributed storage system. It can be used to build large-scale structured storage clusters on inexpensive commodity servers.

HBase is an open-source implementation of Google's Bigtable. Just as Bigtable uses GFS as its file storage system, HBase uses Hadoop HDFS; just as Google runs MapReduce to process the massive amounts of data in Bigtable, HBase uses Hadoop MapReduce to process the data it holds; and where Bigtable uses Chubby as its coordination service, HBase uses ZooKeeper as its counterpart.

Within the Hadoop ecosystem, HBase sits at the structured-storage layer: Hadoop HDFS provides highly reliable underlying storage for HBase, Hadoop MapReduce provides high-performance computation, and ZooKeeper provides a stable coordination service and failover mechanism.

In addition, Pig and Hive provide high-level language support for HBase, which makes statistical processing of HBase data very simple. Sqoop provides convenient RDBMS data import for HBase, making it easy to migrate data from traditional databases into HBase.

HBase Access Interface

1. Native Java API: the most conventional and efficient access method, well suited to parallel batch processing of HBase table data by Hadoop MapReduce jobs

2. HBase Shell: HBase's command-line tool and the simplest interface, intended for HBase administration

3. Thrift Gateway: uses Thrift serialization to support multiple languages such as C++, PHP, and Python, letting heterogeneous systems access HBase table data online

4. REST Gateway: exposes a REST-style HTTP API for accessing HBase, lifting language restrictions

5. Pig: the Pig Latin dataflow language can be used to manipulate data in HBase; similar to Hive, Pig scripts are ultimately compiled into MapReduce jobs that process HBase table data, which suits statistical workloads

6. Hive: the current Hive release does not yet include HBase support, but the upcoming Hive 0.7.0 will, allowing HBase to be accessed with an SQL-like language
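As a toy illustration of the REST gateway's addressing, the sketch below composes cell URLs following HBase's `/table/row/family:qualifier` path convention; the host, port, table name, and row key are hypothetical examples, and no request is actually sent:

```python
# Sketch: building HBase REST gateway URLs for single-cell access.
# Gateway address and table/row names below are hypothetical.
from urllib.parse import quote

def cell_url(base, table, row, family, qualifier):
    """Compose a REST-gateway URL of the form <base>/<table>/<row>/<family>:<qualifier>."""
    column = f"{family}:{qualifier}"
    return f"{base}/{quote(table)}/{quote(row)}/{quote(column, safe=':')}"

url = cell_url("http://localhost:8080", "webtable", "com.taobao.www", "URI", "url")
print(url)  # http://localhost:8080/webtable/com.taobao.www/URI:url
```

A real client would issue an HTTP GET against such a URL with an `Accept` header selecting JSON, XML, or protobuf encoding.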

HBase Data Model: Table & Column Family

| Row Key | Timestamp | Column Family "URI"        | Column Family "Parser" |
|---------|-----------|----------------------------|------------------------|
| r1      | t3        | url=http://www.taobao.com  | title=daily specials   |
| r1      | t2        | host=taobao.com            |                        |
| r1      | t1        |                            |                        |
| r2      | t5        | url=http://www.alibaba.com | content=per day ...    |
| r2      | t4        | host=alibaba.com           |                        |

- Row Key: the table's primary key; records in a table are sorted by row key

- Timestamp: the timestamp of each data operation; it can be regarded as the data's version number

- Column Family: a table consists horizontally of one or more column families. A column family can contain any number of columns, i.e. column families support dynamic extension: there is no need to pre-define the number or type of columns. All columns are stored in binary form, so users must perform type conversion themselves.
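The model above — a sparse, multi-versioned map keyed by row, then column (family:qualifier), then timestamp — can be sketched in a few lines; all names and values below are illustrative:

```python
# Sketch: HBase's logical data model as a sparse nested map.
# table[row][family:qualifier][timestamp] = value (raw bytes in real HBase).
from collections import defaultdict

table = defaultdict(lambda: defaultdict(dict))

def put(row, family, qualifier, ts, value):
    table[row][f"{family}:{qualifier}"][ts] = value

def get_latest(row, family, qualifier):
    """Return the highest-timestamp version of the cell, or None if absent."""
    versions = table[row].get(f"{family}:{qualifier}", {})
    return versions[max(versions)] if versions else None

put("r1", "URI", "url", 3, "http://www.taobao.com")
put("r1", "URI", "host", 2, "taobao.com")
print(get_latest("r1", "URI", "url"))  # http://www.taobao.com
```

Note how sparseness falls out naturally: cells that were never written simply do not exist in the map, and each cell keeps its own version history.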

Table & Region

As the number of records grows, a table gradually splits into multiple regions, each represented by a key range [startKey, endKey). HMaster assigns the different regions to appropriate HRegionServers for management:
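A minimal sketch of locating a region by row key, assuming three regions with illustrative start keys (the real client resolves this through the .META. table described below):

```python
# Sketch: each region covers [start_key, end_key); regions are sorted by start key,
# so the owning region is found with a binary search over the start keys.
import bisect

region_starts = ["", "g", "p"]            # three regions: [""..g), [g..p), [p..)
region_names  = ["region-0", "region-1", "region-2"]

def find_region(row_key):
    """Return the region whose [start, end) range contains row_key."""
    idx = bisect.bisect_right(region_starts, row_key) - 1
    return region_names[idx]

print(find_region("apple"))  # region-0
print(find_region("house"))  # region-1
print(find_region("zebra"))  # region-2
```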

-ROOT- and .META. Tables

HBase has two special tables: -ROOT- and .META.

- .META.: records the region information of user tables; .META. itself can have multiple regions

- -ROOT-: records the region information of the .META. table; -ROOT- has exactly one region

- The location of the -ROOT- table is recorded in ZooKeeper

Before a client can access user data, it first contacts ZooKeeper, then reads the -ROOT- table, then the .META. table, and only then locates the user data. This requires several network round trips, but the client caches the results.
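That lookup chain can be sketched as a three-level dictionary walk with a client-side cache; all the tables below are illustrative in-memory stand-ins for ZooKeeper, -ROOT-, and .META.:

```python
# Sketch: the -ROOT- -> .META. -> user-region lookup chain with client caching.
zookeeper  = {"-ROOT-": "rs1"}                   # ZooKeeper holds -ROOT-'s location
root_table = {".META.-region-1": "rs2"}          # -ROOT- maps .META. regions
meta_table = {("user_table", "row-a"): "rs3"}    # .META. maps user-table regions

cache = {}
network_hops = 0

def locate(table, row):
    """Find the RegionServer holding (table, row), caching the answer."""
    global network_hops
    if (table, row) in cache:
        return cache[(table, row)]               # cached: no network traffic
    network_hops += 1; _ = zookeeper["-ROOT-"]
    network_hops += 1; _ = root_table[".META.-region-1"]
    network_hops += 1; server = meta_table[(table, row)]
    cache[(table, row)] = server
    return server

locate("user_table", "row-a")   # first lookup: 3 hops
locate("user_table", "row-a")   # second lookup: served from cache
print(network_hops)  # 3
```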

MapReduce on HBase

The most convenient and practical model for running batch operations on HBase is still MapReduce.

The relationship between an HBase table and its regions is analogous to that between an HDFS file and its blocks. HBase provides matching TableInputFormat and TableOutputFormat APIs that make it easy to use an HBase table as a source or sink for Hadoop MapReduce; developers of MapReduce jobs rarely need to concern themselves with the details of the HBase system itself.

HBase System Architecture

Client

The HBase client communicates with HMaster and HRegionServer via HBase's RPC mechanism: for management operations the client issues RPCs to HMaster, and for data read/write operations it issues RPCs to HRegionServer.

Zookeeper

In addition to storing the address of the -ROOT- table and the address of HMaster, the ZooKeeper quorum also holds ephemeral registrations from each HRegionServer, so that HMaster can detect the health of every HRegionServer at any time. ZooKeeper also eliminates HMaster as a single point of failure, as described below.

Hmaster

HMaster is not a single point of failure: HBase can start multiple HMasters, and ZooKeeper's master-election mechanism ensures there is always one master running. HMaster is mainly responsible for table and region management:

1. Managing users' add, delete, modify, and query operations on tables

2. Managing HRegionServer load balancing and adjusting the distribution of regions

3. Assigning the new regions after a region split

4. Migrating the regions of a failed HRegionServer after it goes down
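Duties 2 and 4 (load balancing and failover reassignment) can be sketched with a toy round-robin assignment; the server and region names are hypothetical, and real HMaster balancing is considerably more sophisticated:

```python
# Sketch: round-robin region assignment across live RegionServers,
# a toy stand-in for HMaster's load balancing and failover handling.
from itertools import cycle

def assign(regions, servers):
    """Assign each region to a server in round-robin order."""
    rotation = cycle(servers)
    return {region: next(rotation) for region in regions}

plan = assign(["r0", "r1", "r2", "r3"], ["rs1", "rs2"])
# plan: r0->rs1, r1->rs2, r2->rs1, r3->rs2

# Failover: rs2 goes down, so its regions are reassigned among survivors.
failed = [r for r, s in plan.items() if s == "rs2"]
plan.update(assign(failed, ["rs1"]))
print(plan)
```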

Hregionserver

HRegionServer is the core module of HBase; it is primarily responsible for responding to user I/O requests and reading and writing data against the HDFS file system.

Internally, HRegionServer manages a series of HRegion objects, each corresponding to one region of a table. An HRegion in turn consists of multiple HStores, each corresponding to the storage of one column family in the table. Each column family is thus effectively a centralized storage unit, so it is most efficient to place columns with similar I/O characteristics in the same column family.

HStore is the core of HBase storage. It consists of two parts: a MemStore and a set of StoreFiles. The MemStore is a sorted memory buffer: data written by the user goes into the MemStore first, and when the MemStore fills up it is flushed into a StoreFile (implemented underneath as an HFile). When the number of StoreFiles grows past a threshold, a compact operation is triggered that merges multiple StoreFiles into one; version merging and data deletion are carried out during this merge. HBase therefore only ever appends data: all updates and deletions are performed during later compactions, which lets a user's write return as soon as it reaches memory and guarantees HBase's high I/O performance.

As StoreFiles are compacted, progressively larger StoreFiles form. When a single StoreFile exceeds a threshold, a split is triggered: the current region is split into two, the parent region is taken offline, and the two new child regions are assigned by HMaster to appropriate HRegionServers, so the load on the original region is spread across two regions.
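The flush-then-compact pipeline can be sketched with toy thresholds (real HBase triggers flushes and compactions on sizes and configurable limits, not cell counts):

```python
# Sketch: MemStore flush and StoreFile compaction with toy thresholds.
FLUSH_LIMIT = 3      # flush the MemStore after this many cells
COMPACT_LIMIT = 2    # compact once this many StoreFiles accumulate

memstore = {}
storefiles = []

def write(key, value):
    memstore[key] = value
    if len(memstore) >= FLUSH_LIMIT:
        # Flush: the sorted MemStore contents become a new immutable StoreFile.
        storefiles.append(dict(sorted(memstore.items())))
        memstore.clear()
        if len(storefiles) >= COMPACT_LIMIT:
            # Compact: merge StoreFiles into one; later entries win, which is
            # where updates and deletes are actually resolved.
            merged = {}
            for sf in storefiles:
                merged.update(sf)
            storefiles[:] = [merged]

for i in range(6):
    write(f"k{i}", i)
print(len(storefiles))  # 1 (two flushed files were compacted into one)
```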

With the basic workings of HStore understood, the role of HLog also needs explaining. HStore as described above works only as long as the system operates normally, but in a distributed environment system errors and outages cannot be avoided, and if an HRegionServer exits unexpectedly, the in-memory data in its MemStores would be lost. HLog is introduced for exactly this reason. Each HRegionServer holds one HLog object, an implementation of a write-ahead log: on every user write, a copy of the data is also appended to the HLog (the HLog file format is described below). The HLog periodically rolls to a new file and deletes old files whose data has already been persisted to StoreFiles. When an HRegionServer terminates unexpectedly, HMaster learns of it through ZooKeeper. It first processes the leftover HLog files, splitting their log entries by region and placing them in the corresponding region directories, and then redistributes the failed regions. The HRegionServers that pick up these regions discover, while loading them, that there is historical HLog data to process; they replay the HLog entries into the MemStore and then flush to StoreFiles, completing the data recovery.
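The write-ahead-log discipline and crash replay can be sketched in miniature; in-memory lists stand in for HLog files, and clearing the dictionary stands in for a RegionServer crash:

```python
# Sketch: write-ahead logging and replay after a simulated crash.
hlog = []        # the WAL: survives a "crash" in this toy model
memstore = {}    # lost when the RegionServer dies

def put(row, value, seq=[0]):
    seq[0] += 1
    hlog.append((seq[0], row, value))  # 1) append to the HLog first
    memstore[row] = value              # 2) then apply to the MemStore

put("r1", "a")
put("r2", "b")

memstore.clear()                       # simulate an unexpected RegionServer exit

for _, row, value in sorted(hlog):     # replay the HLog in sequence order
    memstore[row] = value
print(memstore)  # {'r1': 'a', 'r2': 'b'}
```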

HBase storage Format

All HBase data files are stored on the Hadoop HDFS file system. There are two main file types, both introduced above:

1. HFile: the storage format for HBase KeyValue data. HFile is a Hadoop binary-format file; a StoreFile is in fact a lightweight wrapper around an HFile, i.e. a StoreFile's underlying storage is an HFile.

2. HLog File: the storage format for HBase's WAL (write-ahead log), which is physically a Hadoop SequenceFile.

HFile

The HFile storage format is as follows:

An HFile has no fixed length; only two of its blocks have fixed length: Trailer and FileInfo. The Trailer holds pointers to the starting points of the other data blocks. FileInfo records meta-information about the file, such as AVG_KEY_LEN, AVG_VALUE_LEN, LAST_KEY, COMPARATOR, MAX_SEQ_ID_KEY, and so on. The Data Index and Meta Index blocks record the starting point of each Data block and Meta block.

The Data block is the basic unit of HBase I/O. To improve efficiency, HRegionServer keeps data blocks in an LRU-based block cache. The size of each data block can be specified as a parameter at table-creation time: larger blocks favor sequential scans, while smaller blocks favor random reads. Apart from the Magic field at its start, each data block is a concatenation of KeyValue pairs; the Magic content is a random number whose purpose is to detect data corruption. The internal structure of each KeyValue pair is described in detail below.
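The LRU block cache mechanism can be sketched with an ordered dictionary; the block IDs, contents, and capacity are illustrative:

```python
# Sketch: an LRU block cache of the kind HRegionServer uses for data blocks.
from collections import OrderedDict

class LRUBlockCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def get(self, block_id):
        if block_id not in self.blocks:
            return None                          # cache miss: read from HDFS
        self.blocks.move_to_end(block_id)        # mark as most recently used
        return self.blocks[block_id]

    def put(self, block_id, data):
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)      # evict the least recently used

cache = LRUBlockCache(2)
cache.put("b1", b"block-1-bytes")
cache.put("b2", b"block-2-bytes")
cache.get("b1")                    # touch b1, so b2 is now least recently used
cache.put("b3", b"block-3-bytes")  # exceeds capacity: evicts b2
```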

Each KeyValue pair inside an HFile is a simple byte array. The array, however, contains many fields and has a fixed internal structure. Let us look at that structure in detail:

It starts with two fixed-length numbers giving the length of the key and the length of the value, respectively. Next comes the key, which begins with a fixed-length value for the length of the row key, followed by the row key itself, then a fixed-length value for the length of the column family, then the family, then the qualifier, and finally two fixed-length values for the timestamp and the key type (Put/Delete). The value part has no such complex structure; it is pure binary data.
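This layout can be sketched as an encoder. The field widths below (4-byte key/value lengths, 2-byte row-key length, 1-byte family length, 8-byte timestamp, 1-byte type) mirror the description above but should be treated as illustrative rather than wire-exact:

```python
# Sketch: encoding one HFile KeyValue in the layout described above.
# Field widths are illustrative approximations of the real format.
import struct

def encode_keyvalue(row, family, qualifier, timestamp, key_type, value):
    """Pack one KeyValue: [key_len][value_len][key bytes][value bytes]."""
    key = (struct.pack(">H", len(row)) + row        # 2-byte row-key length + row key
           + struct.pack(">B", len(family)) + family  # 1-byte family length + family
           + qualifier                                # qualifier (no length prefix)
           + struct.pack(">QB", timestamp, key_type)) # 8-byte timestamp + 1-byte type
    return struct.pack(">II", len(key), len(value)) + key + value

kv = encode_keyvalue(b"r1", b"URI", b"url", 3, 4, b"http://www.taobao.com")
key_len, value_len = struct.unpack_from(">II", kv)
print(key_len, value_len)  # 20 21
```

Decoding walks the same fields in reverse: read the two lengths, slice out the key, peel off its fixed-width prefixes and suffixes, and what remains is the qualifier.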

HLog File

An HLog file is in fact an ordinary Hadoop SequenceFile. The SequenceFile's key is an HLogKey object, which records the provenance of the written data: besides the table and region names, it includes a sequence number and a timestamp. The timestamp is the write time; the sequence number starts at 0, or at the last sequence number persisted to the file system.

The SequenceFile's value is an HBase KeyValue object, corresponding to the KeyValue in an HFile described above.

End

This article has introduced the function and design of HBase; due to limited space, it has not explored HBase's internals in great depth. A Taobao storage system is currently built on HBase; a follow-up article, "A Taobao Distributed Storage System", will further introduce HBase applications through a real case.

Source: http://www.searchtb.com/2011/01/understanding-hbase.html
