HBase Introduction and Data Model

Source: Internet
Author: User
Tags hadoop ecosystem
HBase Introduction

HBase is a distributed storage system built on top of HDFS.
HBase is a typical key/value system modeled on Google's BigTable.
HBase is an important member of the Apache Hadoop ecosystem and is mainly used to store large amounts of structured data.
Logically, HBase organizes data into tables, rows, and columns.
Like Hadoop, HBase is designed to scale horizontally: computing and storage capacity increase as inexpensive commodity servers are added.

Advantages of HBase

Random writes are converted into sequential writes, which suits workloads with highly concurrent writes;
Load balancing works well, and read and write performance scales roughly linearly with the number of machines;
A column that holds no data in a row is not stored and occupies no storage space;
Distributed features: built on HDFS and ZooKeeper (ZK), with consistency, availability, partition tolerance, large-scale data storage, and easy expansion.

Characteristics of HBase tables

Big: a table can have billions of rows and millions of columns;
Schemaless: each row has a sortable primary key and any number of columns; columns can be added dynamically as needed, and different rows in the same table can have different columns;
Column oriented: storage and access control are organized by column (family), and each column (family) can be retrieved independently;
Sparse: empty (NULL) columns occupy no storage space, so a table can be designed to be very sparse;
Multiple versions: each cell can hold multiple versions of its data; by default the version number is assigned automatically and is the timestamp at which the cell was written;
Single data type: all data in HBase is stored as uninterpreted bytes (often described as strings) and has no type.

HBase vs. RDBMS

All data in HBase is untyped and stored as byte strings;
HBase supports only basic operations such as inserts and deletes; there are no join queries between tables;
HBase uses column-oriented storage, while an RDBMS uses row-oriented storage;
HBase is suitable for storing large amounts of data, and lookups by row key are very efficient.

HBase Data Model

Applications store data in HBase tables. A table is made up of rows and columns, and every column belongs to a particular column family. The intersection of a row and a column is called a cell, and cells are versioned. The content of a cell is an indivisible array of bytes.
A table's row key is also a byte array, so anything can be used as a key, whether it is a string or a number. The rows of an HBase table are sorted by row key, in byte order. Every table must have a row key, which acts as its primary key.
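
The byte ordering of row keys can be illustrated with a short Python sketch. This is not HBase client code: a plain sorted list stands in for a table, and `range_scan` and `fixed_width_key` are hypothetical helpers introduced only for this example. It shows that numeric suffixes sort lexicographically rather than numerically, and one possible way (an assumption, not something the text prescribes) to encode integers so that byte order matches numeric order.

```python
import bisect

# Row keys are raw bytes. Python compares bytes objects lexicographically,
# byte by byte, which mirrors the ordering described in the text.
row_keys = sorted([b"row2", b"row10", b"row1", b"com.cnn.www"])
# b"row10" sorts before b"row2" because the byte for "1" is smaller than "2".


def range_scan(sorted_keys, start, stop):
    """Return keys k with start <= k < stop (hypothetical helper, not an HBase API)."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


def fixed_width_key(n, width=8):
    """Encode a non-negative integer as big-endian bytes so byte order equals numeric order."""
    return n.to_bytes(width, "big")


print(row_keys)
print(range_scan(row_keys, b"row1", b"row2"))
print(fixed_width_key(2) < fixed_width_key(10))
```

Because rows are kept in this order, a range scan only needs to locate the start key and read forward, which is why row key design matters so much.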

The logical view is as follows:

Row key       Timestamp   Column family contents    Column family anchor
com.cnn.www   T1                                    anchor:cnnsi.com = "CNN"
com.cnn.www   T2                                    anchor:my.look.ca = "CNN.com"
com.cnn.www   T3          contents:html = "..."
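
As a rough illustration, the logical view above can be approximated in memory as a nested mapping from row key to column to timestamp to value. The sketch below is a toy model, not the HBase client API: `ToyTable` and its methods are hypothetical names, the timestamps T1 to T3 are represented by the integers 1 to 3, and the fourth version of `contents:html` is invented purely to show versioning.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for the logical view; not the HBase client API."""

    def __init__(self, families):
        # Column families are fixed when the table is defined.
        self.families = set(families)
        # row key -> "family:qualifier" -> {timestamp: value}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Qualifiers are created on first write; nothing is declared up front.
        self.rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the value at `timestamp`, or the newest version if it is omitted."""
        versions = self.rows.get(row, {}).get(column)
        if not versions:
            return None  # absent cells are simply not stored (sparse table)
        if timestamp is None:
            timestamp = max(versions)
        return versions.get(timestamp)


table = ToyTable(["contents", "anchor"])
row = b"com.cnn.www"
table.put(row, "anchor:cnnsi.com", 1, b"CNN")
table.put(row, "anchor:my.look.ca", 2, b"CNN.com")
table.put(row, "contents:html", 3, b"...")
table.put(row, "contents:html", 4, b"... (newer)")  # hypothetical extra version

print(table.get(row, "contents:html"))       # newest version
print(table.get(row, "contents:html", 3))    # explicit older version
print(table.get(row, "anchor:example.org"))  # never written, so nothing is stored
```

In this model a cell is addressed by row key, column (family:qualifier), and timestamp, and a column that was never written takes up no space at all.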

HBase Basic Concepts

1. Row key (Rowkey)

Rowkey: a byte array that serves as the "primary key" of each record in the table and enables fast lookup; Rowkey design is very important.
A row key is an array of bytes, so any string can be used as a row key;
The rows in a table are sorted by row key, in byte order;
All access to a table goes through the row key: a single-Rowkey lookup, a Rowkey range scan, or a full table scan.

2. Column family (ColumnFamily)

Column family: a named (string) group that contains one or more related columns;
Column families must be specified when the table is defined;
Each column family (CF) can have one or more column members (column qualifiers). Column members do not need to be declared when the table is defined; new members can be added to a column family dynamically, on demand;
Data is stored separately per column family. What HBase calls column storage is really storage by column family (each column family corresponds to a Store); this design is well suited to data analysis.

3. Column (Column)

Column: belongs to a column family and is named familyName:columnName; columns can be added dynamically to each record.

4. Timestamp (TimeStamp)

Each cell may hold more than one version; the versions are distinguished by timestamp.

5. Cell (Cell)

A cell is uniquely determined by row key, column family:qualifier, and timestamp;
The data in a cell has no type; it is all stored as bytes.

6. Region (Region)

HBase automatically divides a table horizontally (by row) into multiple regions; each region holds a contiguous range of the table's rows;
Each table starts with only one region. As data is inserted, the region grows; when it reaches a threshold, it splits into two new regions;
As the number of rows keeps growing, there are more and more regions, so a complete table ends up stored across multiple regions;
An HRegion is the smallest unit of distributed storage and load balancing in HBase. "Smallest unit" means that different HRegions can be placed on different HRegionServers, but a single HRegion is never split across multiple servers.
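
The splitting behavior described above can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyRegion`, `ToyRegionedTable`, and `assign_round_robin` are hypothetical names, the split threshold is a row count (HBase itself decides based on region size), and the round-robin placement is a stand-in for the real load balancer.

```python
import bisect


class ToyRegion:
    """A contiguous, sorted slice of row keys covering [start_key, end_key)."""

    def __init__(self, start_key=b"", end_key=None, keys=None):
        self.start_key = start_key
        self.end_key = end_key      # None means "no upper bound"
        self.keys = keys or []      # kept sorted in byte order


class ToyRegionedTable:
    def __init__(self, max_rows_per_region=4):
        # Assumption: a row-count threshold replaces HBase's size-based one.
        self.max_rows = max_rows_per_region
        self.regions = [ToyRegion()]  # every table starts with one region

    def _find_region(self, key):
        starts = [r.start_key for r in self.regions]
        return bisect.bisect_right(starts, key) - 1

    def put(self, key):
        idx = self._find_region(key)
        region = self.regions[idx]
        pos = bisect.bisect_left(region.keys, key)
        if pos == len(region.keys) or region.keys[pos] != key:
            region.keys.insert(pos, key)
        if len(region.keys) > self.max_rows:
            self._split(idx)

    def _split(self, idx):
        # Split at the middle key into two new, adjacent regions.
        region = self.regions[idx]
        mid = len(region.keys) // 2
        split_key = region.keys[mid]
        left = ToyRegion(region.start_key, split_key, region.keys[:mid])
        right = ToyRegion(split_key, region.end_key, region.keys[mid:])
        self.regions[idx:idx + 1] = [left, right]


def assign_round_robin(regions, servers):
    """Place each whole region on exactly one server (simplified placement)."""
    return {r.start_key: servers[i % len(servers)] for i, r in enumerate(regions)}


toy = ToyRegionedTable(max_rows_per_region=4)
for i in range(10):
    toy.put(b"row%02d" % i)

for r in toy.regions:
    print(r.start_key, r.end_key, r.keys)
print(assign_round_robin(toy.regions, ["rs1", "rs2"]))
```

Each region covers a contiguous key range whose end is the next region's start, and placement always works on whole regions, which matches the statement that a region is never split across servers.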
