1, HBase is a scalable, highly reliable, column-oriented open-source database. Unlike a traditional relational database, it uses the Bigtable data model and is well suited to storing unstructured data. HBase began as a sub-project of the Apache Hadoop project and is now a top-level Apache project. Its data model is an enhanced sparse sorted map (key/value) whose keys consist of a row key, a column key, and a timestamp. HBase provides random, real-time read and write access to large-scale data, and data stored in HBase can be processed with MapReduce, which combines data storage with parallel computation.
Data model: Schema --> Table --> Column Family --> RowKey --> Timestamp --> Value
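The sparse sorted map described above can be sketched in a few lines of Python. This is only a toy in-memory model and not the HBase client API: the class name SketchTable and its put, get, and scan methods are hypothetical names chosen for illustration.

```python
import time
from collections import defaultdict


class SketchTable:
    """Toy in-memory model of an HBase-style table (not the HBase API)."""

    def __init__(self):
        # row key -> column key ("family:qualifier") -> timestamp -> value
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts=None):
        # Default the version to the current time in milliseconds.
        if ts is None:
            ts = int(time.time() * 1000)
        self._rows[row][column][ts] = value

    def get(self, row, column):
        # Return the value with the largest timestamp, or None if absent.
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self):
        # Yield the newest version of every cell, in row then column order.
        for row in sorted(self._rows):
            for column in sorted(self._rows[row]):
                yield row, column, self.get(row, column)


table = SketchTable()
table.put(b"row2", b"info:name", b"bob", ts=1)
table.put(b"row1", b"info:name", b"alice", ts=1)
table.put(b"row1", b"info:name", b"alicia", ts=2)
table.put(b"row1", b"info:email", b"a@example.com", ts=1)
rows = list(table.scan())
```

Only the cells that were written exist in the map, which is why empty columns cost nothing in this model.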
2, Characteristics of HBase tables
Large: a table can have billions of rows and millions of columns;
Schema-less: each row has a sortable primary key and any number of columns; columns can be added dynamically as needed, and different rows in the same table can have different columns;
Column-oriented: columns (column families) are stored and retrieved independently;
Sparse: empty columns take up no storage space, so tables can be designed to be very sparse;
Single data type: all data in HBase is stored as untyped byte strings.
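Because values carry no type, the application has to encode and decode them itself. The following is a minimal sketch of that idea, assuming an 8-byte big-endian layout for integers and UTF-8 for text. The helper names and the sample rows are hypothetical and exist only for this illustration.

```python
import struct


def encode_long(n: int) -> bytes:
    # Assumption for this sketch: signed 64-bit integer, big-endian.
    return struct.pack(">q", n)


def decode_long(raw: bytes) -> int:
    return struct.unpack(">q", raw)[0]


def encode_text(s: str) -> bytes:
    return s.encode("utf-8")


# Two rows of the same table with different columns (a sparse table).
# Columns that a row does not have are simply not stored.
rows = {
    b"user1": {b"info:name": encode_text("alice"), b"info:age": encode_long(30)},
    b"user2": {b"info:email": encode_text("bob@example.com")},
}

age = decode_long(rows[b"user1"][b"info:age"])
```

The store only ever sees bytes; whether a value is a number or text is a convention between the writer and the reader.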
HBase Basic Concepts
RowKey: a byte array that serves as the "primary key" of each record in the table;
Column Family: a named (string) group that contains one or more related columns;
Column: belongs to a column family and is written as familyName:columnName; columns can be added dynamically for each record;
Version Number: of type long; defaults to the system timestamp and can be set by the user;
Value (cell): a byte array.
Physical storage:
1, All rows in a table are sorted in dictionary (lexicographic) order by row key;
2, A table is split along the row direction into multiple regions;
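Since rows are sorted by row key and a table is split by rows, each region can be thought of as owning a contiguous row-key range. The sketch below locates the region for a key with a binary search. The start keys are invented for the example, and the lookup is a simplified model, not how an HBase client actually resolves regions.

```python
import bisect

# Hypothetical region start keys, sorted. The first region starts at the
# empty key, so every possible row key falls into some region.
region_start_keys = [b"", b"g", b"n", b"t"]


def find_region(row_key: bytes) -> int:
    """Return the index of the region whose key range contains row_key."""
    # The owning region is the last one whose start key is <= row_key.
    return bisect.bisect_right(region_start_keys, row_key) - 1


first = find_region(b"apple")   # falls in [b"", b"g")
second = find_region(b"g")      # a start key belongs to its own region
last = find_region(b"zebra")    # falls in [b"t", end of table)
```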
HBase vs. HDFS
Both have good fault tolerance and scalability, and both can scale to hundreds of nodes;
HDFS is suited to batch processing scenarios;
HDFS is not suitable for incremental data processing;
HDFS does not support data updates.
HBase's three-dimensional ordered storage means that data is stored sorted along three dimensions: rowkey (row primary key), column key, and timestamp.
Rowkey: the rowkey is the primary key of a row, and each row has exactly one rowkey. Rowkey design is critical at the application layer because it directly affects query efficiency. Rowkeys are stored as bytes and sorted in dictionary (lexicographic) order; for letters this is alphabetical order. For example, given two rowkeys, rowkey1: aaa222 and rowkey2: bbb111, rowkey1 sorts before rowkey2.
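The rowkey ordering can be checked with plain Python byte strings, which also compare byte by byte. The second half of the sketch shows a common consequence of byte ordering for numeric suffixes; the zero-padding width used there is an arbitrary choice for the example.

```python
# The example from the text: aaa222 sorts before bbb111.
keys = [b"bbb111", b"aaa222"]
sorted_keys = sorted(keys)

# Byte order is not numeric order: "row10" sorts before "row2"
# because the byte for "1" is smaller than the byte for "2".
unpadded = sorted([b"row10", b"row9", b"row2"])

# Zero-padding the number (width 4 here, an assumption) makes
# byte order agree with numeric order.
padded = sorted(f"row{n:04d}".encode("ascii") for n in (10, 9, 2))
```

This is one reason rowkey design matters for queries: rows that should be scanned together need keys that are adjacent in byte order.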
Column key: the column key is the second dimension. After data is sorted by rowkey, cells with the same rowkey are sorted by column key, also in dictionary order.
Timestamp: the timestamp is the third dimension. It is sorted in descending order, so the most recent data comes first.
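The three-dimensional order can be sketched as a single sort key: rowkey ascending, then column key ascending, then timestamp descending. The cells below are invented sample data, and the tuple layout is only an illustration, not HBase's on-disk format.

```python
# Each cell: (rowkey, column key, timestamp, value). Sample data only.
cells = [
    (b"row2", b"info:name", 5, b"bob"),
    (b"row1", b"info:name", 1, b"alice"),
    (b"row1", b"info:name", 3, b"alicia"),
    (b"row1", b"info:age", 2, b"30"),
]

# Negating the timestamp sorts it in descending order while the
# first two dimensions stay ascending.
ordered = sorted(cells, key=lambda c: (c[0], c[1], -c[2]))

# The newest version of row1 / info:name is the first match in order.
newest = next(c for c in ordered if c[0] == b"row1" and c[1] == b"info:name")
```

Because the newest version of a cell comes first, a reader that wants the latest value can stop at the first match.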