HBase basics, column-oriented, real-time distributed database

Source: Internet
Author: User
Tags: zookeeper

HBase is a NoSQL database that runs on Hadoop and serves as a distributed, scalable store for big data. This means HBase can take advantage of the distributed storage provided by HDFS and benefit from Hadoop's MapReduce programming model.

1. HBase definition

HBase is a distributed, column-oriented, open-source database modeled on the Google paper “Bigtable: A Distributed Storage System for Structured Data” by Fay Chang et al. Just as Bigtable builds on the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase began as a sub-project of the Apache Hadoop project. It differs from a typical relational database in that it is suited to storing unstructured data. Another difference is that HBase is column-based rather than row-based.

[Image: HBase Learning (http://image.evget.com/images/article/2015/Hbase01.jpg)]

2. HBase Features

    • High reliability
    • High performance
    • Column-oriented
    • Scalable
    • Large-scale structured storage clusters can be built on inexpensive PC servers

3. HBase vs. RDBMS

HBase is suited to storing unstructured data; its storage model sits somewhere between a map entry (a key-value pair) and a database row. An RDBMS, by contrast, is a database that follows “Codd's 12 rules”. The main differences are as follows:

Data type: HBase has only one simple type, the string. Whatever it is given is stored as a string, and interpreting the type is left to the user. Relational databases offer a choice of data types.

Data manipulation: HBase supports only simple operations such as inserts and queries. Tables are independent of each other, and there are no joins.

Storage mode: HBase stores data by column. Each column family is saved in several files, and the files of different column families are kept separate. A traditional relational database stores data by table structure and by row.

Data maintenance: When HBase performs an update, the old version is retained and the new data is actually inserted as a new version. A traditional relational database replaces the old value in place.

Scalability: HBase makes it easy to add or remove hardware.
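The "data type" and "data maintenance" points above can be made concrete with a small sketch. This is a toy in-memory model, not the HBase client API: the class `VersionedCells`, its methods, and the row and column names are all hypothetical, chosen only for illustration. It assumes the caller supplies timestamps and encodes every value to bytes.

```python
import struct
from collections import defaultdict


class VersionedCells:
    """Toy in-memory model of HBase-style versioned cells (not the HBase API)."""

    def __init__(self):
        # (row, column) -> list of (timestamp, value_bytes), oldest first
        self._cells = defaultdict(list)

    def put(self, row, column, value, timestamp):
        if not isinstance(value, bytes):
            raise TypeError("values must be bytes; the caller does the encoding")
        # An update appends a new version instead of replacing the old one.
        self._cells[(row, column)].append((timestamp, value))
        self._cells[(row, column)].sort(key=lambda version: version[0])

    def get(self, row, column):
        """Return the newest value, or None if the cell was never written."""
        versions = self._cells.get((row, column))
        return versions[-1][1] if versions else None

    def history(self, row, column):
        """Return every stored version, newest first."""
        return list(reversed(self._cells.get((row, column), [])))


store = VersionedCells()
# The store only sees bytes, so the caller encodes an int as 8 big-endian bytes.
store.put(b"user1", b"info:age", struct.pack(">q", 30), timestamp=1)
store.put(b"user1", b"info:age", struct.pack(">q", 31), timestamp=2)

latest_age = struct.unpack(">q", store.get(b"user1", b"info:age"))[0]
all_versions = store.history(b"user1", b"info:age")
print(latest_age, len(all_versions))
```

After the second `put`, the read returns the newer value while the older version is still present in the history, which is the contrast with in-place replacement in a relational database.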

4. Data Model

[Image: HBase Learning (http://image.evget.com/images/article/2015/Hbase02.jpg)]

Component Description:

Row key: The primary key of the table. Records in a table are sorted by row key.

Timestamp: The timestamp associated with each data operation. It serves as the version number of the data.

Column Family: In the horizontal direction, a table consists of one or more column families. A column family can be made up of any number of columns and supports dynamic expansion, so the number and types of columns do not have to be defined in advance. Data is stored in binary form, and the user has to do the type conversion.
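As a rough illustration of the three components above, the following sketch models a table as nested maps: row key, then column family, then column, then timestamp. It is an assumption-laden toy, not HBase code: `ToyTable`, `put`, `scan`, and the sample keys are hypothetical names. It keeps row keys sorted, fixes the column families when the table is created, and lets columns appear on first write.

```python
import bisect


class ToyTable:
    """Toy model: row key -> column family -> column -> {timestamp: value}."""

    def __init__(self, families):
        self.families = set(families)   # column families are fixed up front
        self._row_keys = []             # kept sorted, like rows in a table
        self._rows = {}

    def put(self, row_key, family, column, value, timestamp):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family!r}")
        if row_key not in self._rows:
            bisect.insort(self._row_keys, row_key)
            self._rows[row_key] = {name: {} for name in self.families}
        # Columns are created on first write; nothing is declared beforehand.
        versions = self._rows[row_key][family].setdefault(column, {})
        versions[timestamp] = value

    def scan(self, start_key, stop_key):
        """Yield (row_key, row) for start_key <= row_key < stop_key, in key order."""
        lo = bisect.bisect_left(self._row_keys, start_key)
        hi = bisect.bisect_left(self._row_keys, stop_key)
        for row_key in self._row_keys[lo:hi]:
            yield row_key, self._rows[row_key]


table = ToyTable(families=[b"info", b"stats"])
table.put(b"row-b", b"info", b"name", b"Bob", timestamp=1)
table.put(b"row-a", b"info", b"name", b"Alice", timestamp=1)
table.put(b"row-a", b"info", b"email", b"alice@example.com", timestamp=2)  # new column
table.put(b"row-c", b"stats", b"visits", b"7", timestamp=1)

scanned_keys = [key for key, _ in table.scan(b"row-a", b"row-c")]
print(scanned_keys)
```

Because the row keys are kept in sorted order, a range scan reduces to two binary searches, regardless of the order in which the rows were written.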

5. System Architecture

Component Description:

[Image: HBase Learning (http://image.evget.com/images/article/2015/Hbase03.jpg)]

Client:
Communicates with HMaster and HRegionServer using the HBase RPC mechanism
Talks to HMaster for management operations
Talks to HRegionServer for data read and write operations

ZooKeeper:
The ZooKeeper quorum stores the address of the -ROOT- table and the address of the HMaster
Each HRegionServer registers itself in ZooKeeper as an ephemeral node, so HMaster can sense the health of every HRegionServer at any time
ZooKeeper keeps HMaster from being a single point of failure
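The ephemeral registration described above can be sketched as a small simulation. This is not the ZooKeeper API: `ToyCoordinator`, the `/rs/...` paths, and the timeout value are hypothetical, and time is passed in explicitly so the example stays deterministic. The idea shown is that a node disappears once its owner stops sending heartbeats, which is how a master can learn that a region server has died.

```python
class ToyCoordinator:
    """Toy stand-in for ZooKeeper ephemeral nodes (not the ZooKeeper API)."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self._last_heartbeat = {}   # ephemeral node path -> time of last heartbeat

    def register(self, path, now):
        self._last_heartbeat[path] = now

    def heartbeat(self, path, now):
        if path in self._last_heartbeat:
            self._last_heartbeat[path] = now

    def expire_sessions(self, now):
        """Delete nodes whose owner has been silent too long; return their paths."""
        expired = [path for path, seen in self._last_heartbeat.items()
                   if now - seen > self.session_timeout]
        for path in expired:
            del self._last_heartbeat[path]
        return sorted(expired)

    def live_nodes(self):
        return sorted(self._last_heartbeat)


zk = ToyCoordinator(session_timeout=3)
zk.register("/rs/server-1", now=0)
zk.register("/rs/server-2", now=0)

zk.heartbeat("/rs/server-1", now=4)     # server-2 stays silent
dead = zk.expire_sessions(now=5)        # what the master would be told about
alive = zk.live_nodes()
print(dead, alive)
```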

HMaster:
HMaster is not a single point of failure: HBase can start multiple HMasters, and ZooKeeper's master election mechanism ensures that there is always one master running
Mainly responsible for managing tables and regions:
1 Handles user requests to change and delete tables
2 Manages load balancing across HRegionServers and adjusts the distribution of regions
3 After a region split, assigns the new regions
4 After an HRegionServer goes down, migrates the regions that were on the failed HRegionServer
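Responsibilities 2 and 4 in the list above can be sketched as two small functions. This is a simplified illustration under stated assumptions, not how HBase's balancer is implemented: the round-robin placement, the "fewest regions wins" rule, and the names `assign_regions`, `reassign_after_failure`, `rs-a`, and `region-1` are all hypothetical.

```python
def assign_regions(regions, servers):
    """Spread regions over servers round-robin; returns {server: [regions]}."""
    assignment = {server: [] for server in servers}
    for index, region in enumerate(sorted(regions)):
        assignment[servers[index % len(servers)]].append(region)
    return assignment


def reassign_after_failure(assignment, failed_server):
    """Move the failed server's regions to whichever survivor has the fewest."""
    survivors = {server: list(regions)
                 for server, regions in assignment.items()
                 if server != failed_server}
    if not survivors:
        raise RuntimeError("no region server left to take over")
    for region in assignment.get(failed_server, []):
        least_loaded = min(survivors,
                           key=lambda server: (len(survivors[server]), server))
        survivors[least_loaded].append(region)
    return survivors


regions = ["region-1", "region-2", "region-3", "region-4", "region-5", "region-6"]
servers = ["rs-a", "rs-b", "rs-c"]

before = assign_regions(regions, servers)
after = reassign_after_failure(before, "rs-b")
print(before)
print(after)
```

In this sketch every region stays assigned to exactly one live server after the failure, and the surviving servers end up with equal load.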

HRegionServer:
The core module of HBase, primarily responsible for responding to user I/O requests and for reading and writing data in the HDFS file system

6. Table Design

For table structure design, HBase has two design patterns: tall-narrow and flat-wide. A tall-narrow table has many rows and few columns, so the table as a whole is tall and narrow. A flat-wide table has far fewer rows and many columns, so the table is flat and wide. Because HBase can only split at row boundaries, the flat-wide structure carries a risk: when a particular row grows extremely large (beyond the upper limit of a file or region), compaction is triggered, and compaction has to read the whole row into memory. It is therefore strongly recommended to design tables with the tall-narrow schema, which is closer to a plain key-value layout and performs better.
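To make the two patterns concrete, here is a small sketch that lays out the same data both ways and counts how many cells land in each row. The messaging scenario, the `user1` and `msg:` names, and the row-key format are hypothetical, chosen only to illustrate the trade-off described above.

```python
def flat_wide_cells(user_id, messages):
    """One row per user; every message becomes a column in that row."""
    return [(user_id, "msg:" + message_id, body) for message_id, body in messages]


def tall_narrow_cells(user_id, messages):
    """One row per message; the message id moves into the row key."""
    return [(user_id + "-" + message_id, "msg:body", body)
            for message_id, body in messages]


def row_sizes(cells):
    """Count cells per row key: a rough proxy for how large each row gets."""
    sizes = {}
    for row_key, _column, _value in cells:
        sizes[row_key] = sizes.get(row_key, 0) + 1
    return sizes


messages = [("0001", "hello"), ("0002", "lunch?"), ("0003", "bye")]

wide = row_sizes(flat_wide_cells("user1", messages))
tall = row_sizes(tall_narrow_cells("user1", messages))
print(wide)
print(tall)
```

In the flat-wide layout the single row grows with every message, while in the tall-narrow layout each row stays at one cell, so a split at a row boundary can always separate the data.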

This article refers to Hbase.apache. When reprinting, please state that it is reproduced from the HPE control network.
