[HBase] data model (logical structure)
HBase stores data in tables. A table consists of rows and columns, and the columns are grouped into column families. The logical view is as follows:
The following describes several key concepts:
1) Row key (RowKey)
-- The row key is a byte array, so any string can be used as a row key;
-- Rows in a table are sorted by row key and stored in the byte order of the row key;
-- All access to a table goes through the row key (access by a single row key, by a row key range, or by a full table scan).
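Byte ordering of row keys can be surprising, so here is a minimal Python sketch of it. It does not use the HBase client; the names `row_keys`, `range_scan`, and `zero_padded` are made up for illustration, and the inclusive-start, exclusive-stop range is simply the convention this sketch adopts.

```python
import bisect

# Row keys are raw bytes; rows are ordered by comparing keys byte by byte.
row_keys = sorted([b"row1", b"row2", b"row10", b"row3"])
# Byte order is not numeric order: b"row10" sorts before b"row2".


def range_scan(sorted_keys, start, stop):
    """Return the keys k with start <= k < stop, using binary search."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


def zero_padded(n, width=4):
    """Build a key whose byte order matches the numeric order of n."""
    return b"row" + str(n).zfill(width).encode("ascii")
```

Because the keys are kept sorted, a range scan only has to find two positions and read the slice between them. Zero-padding numeric parts of a key is one common way to make byte order agree with numeric order.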
2) Column family (ColumnFamily)
-- Column families (CFs) must be declared when the table is defined;
-- Each CF can contain one or more column qualifiers (ColumnQualifier). Qualifiers do not need to be declared when the table is defined; new ones can be added dynamically as needed;
-- Data is stored separately per CF. The so-called column-oriented storage of HBase means storage separated by CF (each CF corresponds to a Store). This design is well suited to data analysis.
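The rules for column families can be modeled in a few lines of Python. This is only a toy in-memory sketch, not the HBase API: `ToyTable`, `put`, and `scan_family` are hypothetical names, and a plain dict stands in for each Store.

```python
class ToyTable:
    def __init__(self, families):
        # Column families are fixed when the table is defined.
        # Each family gets its own store, kept apart from the others.
        self.stores = {cf: {} for cf in families}

    def put(self, row, family, qualifier, value):
        if family not in self.stores:
            raise KeyError(f"unknown column family: {family!r}")
        # Qualifiers need no declaration; they appear on first write.
        self.stores[family].setdefault(row, {})[qualifier] = value

    def scan_family(self, family):
        """Read one family without touching the data of other families."""
        return sorted(self.stores[family].items())


table = ToyTable(["info", "stats"])
table.put(b"r1", "info", b"name", b"alice")
table.put(b"r1", "stats", b"clicks", b"7")
table.put(b"r2", "info", b"email", b"bob@example.com")
```

Reading only the `info` family never looks at the `stats` store, which is the reason per-family storage suits analytical queries that touch a few columns across many rows.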
3) Timestamp (TimeStamp)
-- Each Cell may have multiple versions, which are distinguished by timestamps.
4) Cell
-- A cell is identified by the row key, the column (family:qualifier), and the timestamp;
-- Data in a cell has no type; everything is stored as raw bytes.
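To make the cell coordinates and versioning concrete, here is a small Python sketch, again not the HBase client. The `cells` dict and the `put`/`get` helpers are hypothetical, the timestamps 100 and 200 are arbitrary, and encoding the integer with `struct` is just one possible choice, since the store itself only sees bytes.

```python
import struct

# Each cell is addressed by (row key, b"family:qualifier", timestamp).
cells = {}


def put(row, column, timestamp, value):
    cells[(row, column, timestamp)] = value


def get(row, column, max_versions=1):
    """Return up to max_versions (timestamp, value) pairs, newest first."""
    versions = [
        (ts, v) for (r, c, ts), v in cells.items() if r == row and c == column
    ]
    versions.sort(key=lambda pair: pair[0], reverse=True)
    return versions[:max_versions]


# Two versions of the same cell, told apart only by their timestamps.
put(b"user1", b"info:age", 100, struct.pack(">i", 30))
put(b"user1", b"info:age", 200, struct.pack(">i", 31))

ts, raw = get(b"user1", b"info:age")[0]
age = struct.unpack(">i", raw)[0]  # the caller decides how to decode the bytes
```

The store never interprets `raw`; turning it back into an integer is entirely the reader's responsibility, which is what "no type" means in practice.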
5) Region
-- HBase automatically partitions a table horizontally (by row) into multiple regions, and each region stores a contiguous range of the table's rows;
-- Each table starts with a single region. As data is inserted, the region grows; when its size reaches a threshold, the region is split into two new regions;
-- As the number of rows in the table grows, so does the number of regions, so a complete table ends up stored across multiple regions;
-- HRegion is the smallest unit of distributed storage and load balancing in HBase. "Smallest unit" means that different HRegions can be placed on different HRegionServers, but a single HRegion is never split across multiple servers.
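The split behavior can be sketched with a toy model in Python. This is a simplification under stated assumptions: `ToyRegion` and `ToyRegionTable` are hypothetical names, the threshold here counts rows purely for illustration (the text above only says that a threshold exists), and the split point is simply the middle key.

```python
import bisect


class ToyRegion:
    def __init__(self, start_key, keys=None):
        self.start_key = start_key  # inclusive lower bound of the region
        self.keys = keys or []      # sorted row keys held by this region


class ToyRegionTable:
    def __init__(self, max_rows_per_region=4):
        self.max_rows = max_rows_per_region  # hypothetical split threshold
        self.regions = [ToyRegion(b"")]      # a new table starts as one region

    def _locate(self, row_key):
        """Index of the region whose key range contains row_key."""
        starts = [r.start_key for r in self.regions]
        return bisect.bisect_right(starts, row_key) - 1

    def put(self, row_key):
        idx = self._locate(row_key)
        region = self.regions[idx]
        pos = bisect.bisect_left(region.keys, row_key)
        if pos < len(region.keys) and region.keys[pos] == row_key:
            return  # row already present
        region.keys.insert(pos, row_key)
        if len(region.keys) > self.max_rows:
            # Split at the middle key into two contiguous regions.
            mid = len(region.keys) // 2
            right = ToyRegion(region.keys[mid], region.keys[mid:])
            region.keys = region.keys[:mid]
            self.regions.insert(idx + 1, right)


table = ToyRegionTable(max_rows_per_region=4)
for i in range(10):
    table.put(b"row%02d" % i)
```

Each region still covers one contiguous key range after a split, so locating the region for a row key stays a binary search over the region start keys. Whole regions, never fragments of one, are what would be assigned to different servers.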