HBase is a NoSQL database that runs on Hadoop and serves as a distributed, scalable big data store. It relies on HDFS for distributed storage and can benefit from Hadoop's MapReduce programming model.
1. HBase definition
HBase is a distributed, column-oriented, open-source database modeled on the Google paper "Bigtable: A Distributed Storage System for Structured Data" by Fay Chang et al. Just as Bigtable builds on the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase began as a sub-project of Apache Hadoop and is now a top-level Apache project. Unlike a general relational database, HBase is suited to storing unstructured data. The other difference is that HBase organizes data by column rather than by row.
2. HBase Features:
3. HBase vs. RDBMS
HBase is suited to storing unstructured data; its storage model sits somewhere between a map entry and a database row. An RDBMS, by contrast, is a database that follows “Codd's 12 rules”. The main differences are as follows:
Data types: HBase has only a simple string type. It stores every value as an uninterpreted string and leaves the handling of types to the user. A relational database offers a choice of column types.
Data operations: HBase supports only simple operations such as inserts and queries. Tables are independent of one another, and there are no joins.
Storage model: HBase stores data by column. Each column family is saved in several files, and the files of different column families are kept separate. A traditional relational database stores data according to the table structure, row by row.
Data maintenance: An HBase update does not overwrite the old value. The old version is retained and the new data is inserted as a new version. A traditional relational database replaces the value in place.
Scalability: HBase can easily scale up or down by adding or removing hardware.
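The data-type and data-maintenance points above can be illustrated with a toy model. The sketch below is plain Python, not the HBase client API; the `VersionedCell` class and its methods are hypothetical names chosen for illustration. It assumes values are stored as raw bytes (so the caller does the type conversion) and that an update adds a version instead of replacing the old one.

```python
import struct


class VersionedCell:
    """Toy model of one cell: every write adds a version, nothing is overwritten."""

    def __init__(self):
        self._versions = {}  # timestamp -> raw bytes

    def put(self, timestamp, value):
        if not isinstance(value, bytes):
            raise TypeError("values are stored as raw bytes; encode first")
        self._versions[timestamp] = value

    def get(self, max_versions=1):
        """Return up to max_versions (timestamp, value) pairs, newest first."""
        newest_first = sorted(self._versions.items(), reverse=True)
        return newest_first[:max_versions]


cell = VersionedCell()
cell.put(100, struct.pack(">q", 41))  # caller encodes the integer as 8 big-endian bytes
cell.put(200, struct.pack(">q", 42))  # an "update" is just a newer version

latest_ts, latest_raw = cell.get()[0]
latest_value = struct.unpack(">q", latest_raw)[0]  # caller decodes on read
history = [ts for ts, _ in cell.get(max_versions=5)]
```

Reading with `max_versions` greater than 1 returns the older versions as well, which is the behavior a relational in-place update would not give you.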
4. Data Model
Component Description:
Row key: the primary key of the table. The records in a table are sorted by row key.
Timestamp: the timestamp attached to each data operation. It serves as the version number of the data.
Column family: horizontally, a table consists of one or more column families. A column family can contain any number of columns and can be extended dynamically, so the number and types of columns do not need to be defined in advance. Values are stored in binary form, and the user is responsible for type conversion.
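One way to picture the components above is as a nested, sorted map: row key, then column, then timestamp, then value. The following is a minimal sketch under that assumption, written in plain Python rather than against the HBase API; `ToyTable` and the sample rows are made up for illustration.

```python
class ToyTable:
    """Toy model: row key -> (family, qualifier) -> timestamp -> value, all bytes."""

    def __init__(self, families):
        self.families = set(families)  # column families are declared up front
        self.rows = {}

    def put(self, row, family, qualifier, timestamp, value):
        if family not in self.families:
            raise KeyError("unknown column family: %r" % family)
        # Columns inside a family are not declared in advance; new ones are accepted.
        columns = self.rows.setdefault(row, {})
        versions = columns.setdefault((family, qualifier), {})
        versions[timestamp] = value

    def get(self, row, family, qualifier):
        """Return the value with the highest timestamp, or None if absent."""
        versions = self.rows.get(row, {}).get((family, qualifier), {})
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self):
        """Return row keys in byte-wise sorted order."""
        return sorted(self.rows)


table = ToyTable(families=[b"info"])
table.put(b"row2", b"info", b"name", 1, b"alice")
table.put(b"row10", b"info", b"name", 1, b"bob")
table.put(b"row10", b"info", b"email", 2, b"bob@example.com")  # new column, no schema change
table.put(b"row2", b"info", b"name", 5, b"alicia")  # newer version of an existing cell

row_order = table.scan()
current_name = table.get(b"row2", b"info", b"name")
```

In this sketch `row10` sorts before `row2`, because the keys are compared byte by byte rather than numerically.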
5. System Architecture
Component Description:
Client:
Uses the HBase RPC mechanism to communicate with HMaster and HRegionServer
Communicates with HMaster for management operations
Communicates with HRegionServer for data read and write operations
ZooKeeper:
The ZooKeeper quorum stores the address of the -ROOT- table and the address of HMaster
Each HRegionServer registers itself with ZooKeeper as an ephemeral node, so HMaster can track the health of every HRegionServer at any time
ZooKeeper prevents HMaster from being a single point of failure
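The ephemeral-node behavior described above can be mimicked in a few lines. The sketch below is an in-memory stand-in, not the ZooKeeper client API; `ToyCoordinator`, the session names, and the `/rs` and `/master` paths are hypothetical and chosen only for illustration. It assumes that a node can be created once and vanishes when its owning session ends.

```python
class ToyCoordinator:
    """In-memory stand-in for ephemeral nodes: a node lives as long as its session."""

    def __init__(self):
        self._ephemeral = {}  # path -> owning session id

    def create_ephemeral(self, path, session):
        """Create path for session; return False if the path already exists."""
        if path in self._ephemeral:
            return False
        self._ephemeral[path] = session
        return True

    def close_session(self, session):
        """When a session ends, all of its ephemeral nodes disappear."""
        self._ephemeral = {p: s for p, s in self._ephemeral.items() if s != session}

    def children(self, prefix):
        return sorted(p for p in self._ephemeral if p.startswith(prefix + "/"))

    def owner(self, path):
        return self._ephemeral.get(path)


zk = ToyCoordinator()

# Region servers register themselves; the master reads this list to see who is alive.
zk.create_ephemeral("/rs/server-a", "session-a")
zk.create_ephemeral("/rs/server-b", "session-b")

# Two masters race for the same node; only the first create succeeds.
first_won = zk.create_ephemeral("/master", "master-1")
second_won = zk.create_ephemeral("/master", "master-2")

# server-a and master-1 go down: their sessions end and their nodes vanish.
zk.close_session("session-a")
zk.close_session("master-1")
live_servers = zk.children("/rs")
standby_took_over = zk.create_ephemeral("/master", "master-2")
```

The same mechanism covers both roles in the list: a missing node under `/rs` signals a dead region server, and a missing `/master` node lets a standby master take over.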
HMaster:
HMaster is not a single point of failure: HBase can start multiple HMaster instances, and the ZooKeeper master election mechanism ensures that there is always one master running
It is mainly responsible for managing tables and regions:
1 Handles users' table operations, such as changing and deleting tables
2 Manages load balancing across HRegionServers and adjusts the distribution of regions
3 After a region split, assigns the new regions
4 After an HRegionServer goes down, migrates the regions that were on the failed HRegionServer
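Responsibility 4 above, moving regions off a failed server, can be sketched with a simple greedy policy. This is an illustration only: the "least-loaded server first" rule, the function name `reassign_regions`, and the server and region names are assumptions made for the example, not a description of how HBase's own assignment and balancing logic works.

```python
def reassign_regions(assignments, failed_server):
    """Move every region of failed_server to the least-loaded live server.

    assignments maps server name -> list of region names and is modified in place.
    """
    orphaned = assignments.pop(failed_server, [])
    if not assignments:
        raise RuntimeError("no live region servers left")
    for region in orphaned:
        # Pick the live server with the fewest regions (ties broken by name).
        target = min(assignments, key=lambda s: (len(assignments[s]), s))
        assignments[target].append(region)
    return assignments


cluster = {
    "rs1": ["r1", "r2", "r3"],
    "rs2": ["r4"],
    "rs3": ["r5", "r6"],
}
reassign_regions(cluster, "rs1")
```

After the call, the three regions of `rs1` are spread over `rs2` and `rs3` so that both end up with three regions each.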
HRegionServer:
The core module of HBase. It is primarily responsible for responding to user I/O requests and for reading data from and writing data to the HDFS file system
6. Table design
For table structure, HBase has two design patterns: tall-narrow and flat-wide. A tall-narrow table has many rows with few columns each, so the table as a whole is tall and narrow. A flat-wide table has far fewer rows, each with many columns, so the table is flat and wide. Because HBase can only split at row boundaries, a flat-wide design runs into trouble when a particular row grows very large (beyond the upper limit of a file or region): that row triggers compaction, and compaction has to read the row into memory. It is therefore strongly recommended to design tables with the tall-narrow schema, which is closer to a key-value layout and performs better.
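To make the two patterns concrete, the sketch below lays out the same records both ways using plain Python dictionaries. The sample data (users and messages), the `user-msgid` row key format, and the function names are hypothetical and serve only as an illustration of the idea.

```python
# Hypothetical example data: messages received by users.
messages = [
    ("user1", "msg001", b"hello"),
    ("user1", "msg002", b"lunch?"),
    ("user2", "msg001", b"status report"),
]


def flat_wide(records):
    """One row per user; each message becomes another column in that row."""
    table = {}
    for user, msg_id, body in records:
        table.setdefault(user.encode(), {})[msg_id.encode()] = body
    return table


def tall_narrow(records):
    """One row per message; the row key is user + separator + message id."""
    table = {}
    for user, msg_id, body in records:
        row_key = ("%s-%s" % (user, msg_id)).encode()
        table[row_key] = {b"body": body}
    return table


wide = flat_wide(messages)
tall = tall_narrow(messages)

# In the wide table, a user's row grows with every message and cannot be split.
widest_row = max(len(columns) for columns in wide.values())
# In the tall table every row stays one column wide, and the rows of one user
# share a key prefix, so they sit next to each other in sorted order.
user1_rows = sorted(k for k in tall if k.startswith(b"user1-"))
```

With the tall-narrow layout a very active user produces many small rows instead of one huge row, so a split can still fall between that user's rows.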
This article refers to Hbase.apache. When reprinting, please state that it is reproduced from the HPE control network.