How HBase stores data

Source: Internet
Author: User

1, HBase is a distributed, scalable, highly reliable, column-oriented open-source database. Unlike a traditional relational database, HBase uses the Bigtable data model and is well suited to storing unstructured data. HBase began as a sub-project of the Apache Hadoop project and is now a top-level Apache project. The HBase data model is a sparse, sorted map (key/value) whose keys consist of a row key, a column key, and a timestamp. HBase provides random, real-time read and write access to large-scale data. Data saved in HBase can be processed with MapReduce, which combines data storage and parallel computing.

Data model: schema --> table --> column family --> rowkey --> timestamp --> value
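
The key/value model described above can be illustrated with a small in-memory sketch. This is not HBase client code; the `table` dict and the `put` and `get` helpers are hypothetical names used only to show that a value is addressed by (rowkey, column key, timestamp) and that a read normally returns the newest version.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for one HBase table: a flat map from
# (rowkey, "family:qualifier", timestamp) to an uninterpreted byte value.
Key = Tuple[bytes, str, int]
table: Dict[Key, bytes] = {}

def put(rowkey: bytes, column: str, ts: int, value: bytes) -> None:
    """Store one versioned value under its three-part key."""
    table[(rowkey, column, ts)] = value

def get(rowkey: bytes, column: str) -> Optional[bytes]:
    """Return the value with the largest timestamp, or None if no version exists."""
    versions = [(ts, v) for (r, c, ts), v in table.items()
                if r == rowkey and c == column]
    if not versions:
        return None
    return max(versions, key=lambda item: item[0])[1]

put(b"user1", "info:name", 1, b"Ann")
put(b"user1", "info:name", 2, b"Anna")
latest = get(b"user1", "info:name")    # newest version wins
missing = get(b"user1", "info:age")    # never written, so nothing is stored
```

Older versions stay in the map, so a reader that wants history can still reach them by timestamp.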


2, Characteristics of HBase tables

Large: a table can have billions of rows and millions of columns;

Schema-less: each row has a sortable primary key and any number of columns; columns can be added dynamically as needed, and different rows in the same table can have different columns;

Column-oriented: data is stored by column (family), and columns can be retrieved independently;

Sparse: empty columns do not occupy storage space, so a table can be designed to be very sparse;

Single data type: all data in HBase is stored as uninterpreted strings of bytes, with no type information.
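
A minimal sketch of the sparse, schema-less properties listed above, using plain Python dicts rather than HBase itself. The row contents and column names are invented for illustration.

```python
# Each row stores only the columns it actually has (hypothetical sample data).
rows = {
    b"row1": {"info:name": b"Ann", "info:email": b"ann@example.com"},
    b"row2": {"info:name": b"Bob"},
}

# Columns can be added per row at write time; no table-wide schema change is needed.
rows[b"row2"]["info:logins"] = b"7"

# Every distinct column seen in any row.
all_columns = sorted({col for cols in rows.values() for col in cols})

# Cells that really exist versus the size of a dense rows-by-columns grid.
stored_cells = sum(len(cols) for cols in rows.values())
dense_cells = len(rows) * len(all_columns)
```

Here the dense grid would have 6 slots but only 4 cells are stored; the two absent cells cost nothing. The values are bytes throughout, which mirrors the point that HBase attaches no type to stored data.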


HBase Basic Concepts

RowKey: a byte array that serves as the "primary key" of each record in the table;

Column Family: a named (string) group that contains one or more related columns;

Column: belongs to a column family and is addressed as familyname:columnname; columns can be added dynamically for each record;

Version Number: of type long; defaults to the system timestamp and can be set by the user;

Value (cell): a byte array.
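
The concepts above can be gathered into one record type. The sketch below is an assumption-laden illustration, not the HBase client API: `Cell` and `make_cell` are hypothetical names, and the default version is taken from the local clock in milliseconds to mimic "defaults to the system timestamp".

```python
import time
from typing import NamedTuple, Optional

class Cell(NamedTuple):
    rowkey: bytes      # primary key of the record, a byte array
    family: str        # column family name
    qualifier: str     # column name inside the family
    version: int       # long-valued version, usually a timestamp
    value: bytes       # the stored value, a byte array

def make_cell(rowkey: bytes, column: str, value: bytes,
              version: Optional[int] = None) -> Cell:
    """Build a cell from a 'familyname:columnname' string."""
    family, qualifier = column.split(":", 1)
    if version is None:
        # Default version: current system time in milliseconds.
        version = int(time.time() * 1000)
    return Cell(rowkey, family, qualifier, version, value)

explicit = make_cell(b"r1", "info:name", b"Ann", version=5)
defaulted = make_cell(b"r1", "info:name", b"Ann")
```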


Physical storage:

1. All rows in a table are sorted in lexicographic (dictionary) order of row key;

2. A table is split in the row direction into multiple regions.
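
Because rows are sorted by row key and each region covers a contiguous key range, finding the region for a row key is a binary search over the region start keys. The sketch below assumes four regions with invented split points; `start_keys` and `region_for` are hypothetical names.

```python
import bisect

# Hypothetical region boundaries: region i holds row keys in
# [start_keys[i], start_keys[i + 1]); the last region is open-ended.
start_keys = [b"", b"g", b"n", b"t"]

def region_for(rowkey: bytes) -> int:
    """Return the index of the region whose key range contains rowkey."""
    return bisect.bisect_right(start_keys, rowkey) - 1

first = region_for(b"apple")     # falls in [b"", b"g")
boundary = region_for(b"g")      # a start key belongs to the region it opens
middle = region_for(b"mango")    # falls in [b"g", b"n")
last = region_for(b"zebra")      # falls in [b"t", end)
```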


HBase vs. HDFS

Both have good fault tolerance and scalability, and can scale to hundreds of nodes;

HDFS is suited to batch processing scenarios;

HDFS is not suitable for incremental data processing;

HDFS does not support data updates.


Three-dimensional ordered storage in HBase means that data is stored in order along three dimensions: rowkey (row primary key), column key, and timestamp.

Rowkey: the rowkey is the primary key of a row, and HBase indexes rows by this single key only. Rowkey design is critical at the application layer because it directly affects query efficiency. Rowkeys are stored as bytes and sorted in lexicographic (dictionary) order of those bytes; for letters, this is alphabetical order. For example, given two rowkeys, rowkey1: aaa222 and rowkey2: bbb111, rowkey1 sorts before rowkey2.
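
Python compares `bytes` values byte by byte, so it can be used to check how a set of rowkeys would order. The sketch below reuses the two rowkeys from the example and adds two invented ones, row9 and row10, to show a common consequence of byte ordering: numbers embedded as text do not sort numerically unless they are zero-padded. The padding width of 4 is an arbitrary assumption.

```python
# Rowkeys from the text plus two hypothetical numeric-suffix keys.
rowkeys = [b"bbb111", b"aaa222", b"row9", b"row10"]

# Byte-wise lexicographic order, the same rule described for rowkeys.
ordered = sorted(rowkeys)

# "row10" sorts before "row9" because the byte for "1" is smaller than "9".
# Zero-padding the number restores the intended numeric order.
padded = sorted(f"row{n:04d}".encode() for n in (10, 9))
```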

Column key: the column key is the second dimension. After data is sorted by rowkey, entries with the same rowkey are sorted by column key, also in lexicographic order.

Timestamp: the timestamp is the third dimension. It is sorted in descending order, so the most recent data comes first.
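
The three sort dimensions can be combined into a single sort key: rowkey ascending, then column key ascending, then timestamp descending. The following is a sketch over invented sample cells, not HBase's own comparator.

```python
# Hypothetical cells as (rowkey, column key, timestamp, value) tuples.
cells = [
    (b"row2", "info:name", 1, b"Bob"),
    (b"row1", "info:name", 1, b"Ann"),
    (b"row1", "info:name", 3, b"Anna"),
    (b"row1", "info:age", 2, b"30"),
]

# Negating the timestamp turns the ascending sort into newest-first
# within each (rowkey, column key) pair.
ordered = sorted(cells, key=lambda c: (c[0], c[1], -c[2]))

# Keep only the three key dimensions to make the resulting order easy to read.
keys_in_order = [(r, c, ts) for r, c, ts, _ in ordered]
```

With this ordering, the first entry found for a given rowkey and column key is always its latest version.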
