HBase basics, column-oriented, real-time distributed database

Last Update: 2015-11-17 Source: Internet

Author: User

Tags: zookeeper

HBase is a NoSQL database that runs on top of Hadoop as a distributed, scalable big data store. This means HBase can take advantage of the distributed storage of HDFS and benefit from Hadoop's MapReduce programming model.

1. HBase definition

HBase is a distributed, column-oriented, open-source database modeled on the Google paper by Fay Chang et al., "Bigtable: A Distributed Storage System for Structured Data." Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase began as a sub-project of the Apache Hadoop project and is now a top-level Apache project. HBase differs from a typical relational database in two ways: it is suited to storing unstructured data, and it is column-based rather than row-based.

2. HBase Features:

High reliability
High performance
Column-oriented
Scalable
Large-scale structured storage clusters can be built on inexpensive PC servers

3. HBase vs. RDBMS

HBase is suited to storing unstructured data; its storage model sits somewhere between a map entry and a database row. An RDBMS, by contrast, is a database that follows Codd's 12 rules. The main differences are as follows:

Data type: HBase has only one simple type. It stores every value as an uninterpreted string, and interpreting that string is left to the user. A relational database offers a choice of column types.

Data manipulation: HBase supports only simple operations such as insert and query. Tables are independent of one another, and there are no joins.

Storage mode: HBase uses column-based storage. Each column family is saved in several files, and the files of different column families are kept separate. A traditional relational database stores data by table structure, row by row.

Data maintenance: When HBase performs an update, the old version is retained and the new data is actually inserted as a new version. A traditional relational database replaces the old value in place.

Scalability: HBase can easily scale out or in by adding or removing hardware.
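
The data maintenance difference above can be sketched with a small in-memory model. This is not HBase client code; the class name `VersionedCell` and its methods are hypothetical, and the sketch only illustrates the idea that an update adds a new timestamped version instead of overwriting the old one.

```python
class VersionedCell:
    """Toy model of a versioned cell: every write adds a timestamped version."""

    def __init__(self):
        self._versions = {}  # timestamp -> value (bytes)

    def put(self, value: bytes, timestamp: int) -> None:
        # Nothing is overwritten unless the same timestamp is reused.
        self._versions[timestamp] = value

    def get(self, max_versions: int = 1):
        """Return up to max_versions (timestamp, value) pairs, newest first."""
        newest_first = sorted(self._versions.items(), reverse=True)
        return newest_first[:max_versions]


cell = VersionedCell()
cell.put(b"alice@old.example", timestamp=1)
cell.put(b"alice@new.example", timestamp=2)  # an "update" is just another insert

latest = cell.get()                 # newest version only
history = cell.get(max_versions=2)  # the old version is still readable

# A relational-style update, by contrast, replaces the value in place.
rdbms_row = {"email": "alice@old.example"}
rdbms_row["email"] = "alice@new.example"
```

Reading with `max_versions=2` returns both values, while the dictionary standing in for a relational row only ever holds the latest one.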

4. Data Model

Component Description:

Row key: the primary key of the table. Records in the table are sorted by row key.

Timestamp: the timestamp associated with each data operation, which serves as the version number of the data.

Column family: horizontally, a table consists of one or more column families. A column family can contain any number of columns and can be extended dynamically, with no need to predefine the number or type of columns. Values are stored in binary form, so the user must handle type conversion.
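
As a rough illustration of these components, the logical layout can be modeled as nested maps: row key, then column family, then column, then timestamp. The sketch below is a plain in-memory stand-in, not the HBase API; the helper names `put`, `get_latest`, and `scan`, the family name `info`, and the sample keys are all assumptions made for the example.

```python
import struct

# Toy layout: row key -> column family -> column -> {timestamp: value}
table = {}


def put(row_key: bytes, family: str, column: str, value: bytes, ts: int) -> None:
    row = table.setdefault(row_key, {})
    cells = row.setdefault(family, {}).setdefault(column, {})
    cells[ts] = value


def get_latest(row_key: bytes, family: str, column: str) -> bytes:
    cells = table[row_key][family][column]
    return cells[max(cells)]  # highest timestamp is the newest version


def scan():
    """Yield rows in row-key order, mimicking a table sorted by row key."""
    for row_key in sorted(table):
        yield row_key, table[row_key]


put(b"user#002", "info", "name", b"Bob", ts=1)
put(b"user#001", "info", "name", b"Alice", ts=1)
# Values are raw bytes, so the caller encodes and decodes typed data itself.
put(b"user#001", "info", "age", struct.pack(">i", 30), ts=1)
# A new column inside an existing family needs no schema change.
put(b"user#001", "info", "logins", struct.pack(">q", 7), ts=2)

row_order = [key for key, _ in scan()]
age = struct.unpack(">i", get_latest(b"user#001", "info", "age"))[0]
```

The rows come back in key order regardless of insertion order, and the integer only becomes an integer again after the caller unpacks it.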

5. System Architecture

Component Description:

Client:
Communicates with HMaster and HRegionServer using the HBase RPC mechanism
For administrative operations, the client communicates with HMaster
For data read and write operations, the client communicates with HRegionServer

ZooKeeper:
The ZooKeeper quorum stores the address of the -ROOT- table and the address of HMaster
Each HRegionServer registers itself in ZooKeeper as an ephemeral node, so HMaster can sense the health of every HRegionServer at any time
ZooKeeper prevents HMaster from becoming a single point of failure
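
The ephemeral registration described above can be sketched with a toy coordinator. This is not the ZooKeeper client API: the class `ToyCoordinator`, its methods, the session ids, and the paths under `/hbase/rs/` are illustrative assumptions. The point is only that a node tied to a session disappears when the session ends, and a watcher (the master) is told about it.

```python
class ToyCoordinator:
    """Stand-in for ZooKeeper: ephemeral nodes vanish when their session ends."""

    def __init__(self):
        self._ephemeral = {}  # path -> owning session id
        self._watchers = []   # callbacks invoked with the path of a removed node

    def create_ephemeral(self, path: str, session_id: str) -> None:
        self._ephemeral[path] = session_id

    def watch_deletions(self, callback) -> None:
        self._watchers.append(callback)

    def children(self, prefix: str):
        return sorted(p for p in self._ephemeral if p.startswith(prefix))

    def expire_session(self, session_id: str) -> None:
        """Simulate a crashed server: drop its nodes and notify watchers."""
        gone = [p for p, s in self._ephemeral.items() if s == session_id]
        for path in gone:
            del self._ephemeral[path]
            for callback in self._watchers:
                callback(path)


zk = ToyCoordinator()
dead_servers = []
zk.watch_deletions(dead_servers.append)  # the master's view of failures

zk.create_ephemeral("/hbase/rs/server-a", session_id="s1")
zk.create_ephemeral("/hbase/rs/server-b", session_id="s2")
zk.expire_session("s2")                  # server-b stops responding

live_servers = zk.children("/hbase/rs/")
```

After the session for `server-b` expires, only `server-a` remains registered and the watcher has recorded the lost node.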

HMaster:
HMaster is not a single point of failure: HBase can start multiple HMasters, and ZooKeeper's master election mechanism ensures that there is always one master running
Mainly responsible for managing tables and regions:
1 Manages users' operations on tables, such as changes and deletes
2 Manages load balancing across HRegionServers and adjusts the distribution of regions
3 After a region split, assigns the new regions
4 After an HRegionServer goes down, migrates the regions that were on the failed HRegionServer
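
Responsibilities 2 and 4 above can be sketched as a simple reassignment routine. The policy shown here (send each orphaned region to the least-loaded surviving server) is an assumption chosen for illustration, not HBase's actual balancer, and the function and server names are hypothetical.

```python
def reassign_regions(assignment: dict, failed_server: str) -> dict:
    """Move every region from failed_server to the least-loaded survivor.

    Returns a new assignment and leaves the input untouched.
    """
    survivors = {s: list(r) for s, r in assignment.items() if s != failed_server}
    if not survivors:
        raise RuntimeError("no surviving region servers")
    for region in assignment.get(failed_server, []):
        # Break ties by server name so the result is deterministic.
        target = min(survivors, key=lambda s: (len(survivors[s]), s))
        survivors[target].append(region)
    return survivors


assignment = {
    "rs1": ["region-a", "region-b"],
    "rs2": ["region-c"],
    "rs3": ["region-d", "region-e", "region-f"],
}
after_failure = reassign_regions(assignment, failed_server="rs3")
```

With `rs3` gone, its three regions are spread so that the two remaining servers end up with three regions each.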

HRegionServer:
The core module of HBase, primarily responsible for responding to user I/O requests and for reading and writing data in the HDFS file system

6. Table design

In table structure design, HBase has two design patterns: tall-narrow and flat-wide. A tall-narrow table has many rows with few columns, so the overall structure is tall and narrow. A flat-wide table has far fewer rows with many columns, so the structure is flat and wide. However, HBase can only split at row boundaries. If you choose the flat-wide structure and a particular row grows very large (beyond the upper limit of a file or region), that row will trigger compaction, and compaction has to read the row into memory. It is therefore strongly recommended to design tables with the tall-narrow schema, which is closer to a key-value structure and performs better.
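
The difference between the two patterns can be sketched with plain dictionaries. The example data (users and their messages), the `#` separator in the row key, and the helper names are assumptions made for illustration; the idea is to move the varying part of the column name into the row key so that no single row grows without bound.

```python
# Flat-wide: one row per user, one column per message (the row keeps growing).
wide_table = {
    b"user1": {"msg:0001": b"hello", "msg:0002": b"hi again", "msg:0003": b"bye"},
    b"user2": {"msg:0001": b"ping"},
}


def to_tall_narrow(wide: dict) -> dict:
    """Fold the message id into the row key so each row holds one small cell."""
    tall = {}
    for user, columns in wide.items():
        for column, value in columns.items():
            message_id = column.split(":", 1)[1]
            row_key = user + b"#" + message_id.encode("ascii")
            tall[row_key] = {"msg:body": value}
    return tall


def scan_prefix(table: dict, prefix: bytes):
    """Return row keys starting with prefix, in sorted order (a toy prefix scan)."""
    return [key for key in sorted(table) if key.startswith(prefix)]


tall_table = to_tall_narrow(wide_table)
user1_rows = scan_prefix(tall_table, b"user1#")
widest_row = max(len(cols) for cols in tall_table.values())
```

Because rows are sorted by key, all of a user's messages remain adjacent and can still be read together with a prefix scan, while every row stays small enough to fall cleanly on a split boundary.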

This article refers to Hbase.apache. If you reprint it, please state that it is reproduced from the HPE control network.

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.