"HBase: The Definitive Guide" in-depth study, part one: Understanding HBase

Source: Internet
Author: User
One, what is HBase?


First, HBase is a distributed NoSQL database built on top of HDFS, the distributed storage system of Hadoop. Its data model is a key/value mapping, and it has the following characteristics:


1, provides random, real-time read and write access, made possible by the HBase architecture and the LSM-tree data structure;


2, high fault tolerance, which is guaranteed by HDFS;


3, distributed, with linear scalability;


4, column-oriented storage of loosely structured (sparse) data;


5, stored data can have multiple versions;


6, configurable table splitting (sharding);


7, automatic node failure recovery and an election mechanism, both provided by ZooKeeper;


8, convenient integration with MapReduce, Hive and Pig.
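The key/value mapping, the loose column layout and the multiple versions listed above can be pictured with a small in-memory model. This is only a sketch in plain Python and does not use any HBase client; the class `ToyTable`, its methods and the `max_versions` parameter are hypothetical names chosen for illustration.

```python
import time
from collections import defaultdict


class ToyTable:
    """In-memory toy model of a sparse, multi-versioned key/value table."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # row key -> "family:qualifier" -> list of (timestamp, value), newest first
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, ts=None):
        """Store a new version of a cell; older versions are kept, not overwritten."""
        ts = time.time_ns() if ts is None else ts
        versions = self.rows[row][column]
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]  # keep only the newest N versions

    def get(self, row, column, versions=1):
        """Return up to `versions` values for a cell, newest first."""
        cell = self.rows.get(row, {}).get(column, [])
        return [value for _, value in cell[:versions]]


table = ToyTable(max_versions=2)
table.put("user1", "info:name", "alice", ts=1)
table.put("user1", "info:name", "alicia", ts=2)
table.put("user1", "info:name", "ali", ts=3)
table.put("user2", "info:email", "bob@example.com", ts=1)  # rows need not share columns

print(table.get("user1", "info:name"))              # ['ali']
print(table.get("user1", "info:name", versions=5))  # ['ali', 'alicia']
print(table.get("user2", "info:name"))              # []
```

Each cell is addressed by row key, column and timestamp, and a row only holds the columns that were actually written to it, which is what makes the layout sparse.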


HBase is an important part of the Hadoop ecosystem. It makes up for the fact that Hadoop's MapReduce offers only high-latency batch processing: below it, HDFS provides the storage, and above it, applications get real-time operations and queries. In addition, the MapReduce parallel computing model can still be used for large-scale data processing, so HBase combines data storage with parallel computing, and real-time access with batch processing, almost perfectly.




Two, node classification and communication in the HBase cluster

The nodes in an HBase cluster are divided into two kinds, HMaster servers and HRegion servers, arranged in master-slave mode. Unlike the master in a Hadoop cluster, however, the HMaster is not a single point of failure.

Nodes in the HBase cluster communicate through the ZooKeeper cluster, and HBase tracks the status of its nodes by watching the state they register in ZooKeeper.

The HBase cluster can be set up with multiple HMaster server nodes, but only one of them can be active and providing service at any given time, so the HMaster is not a single point of failure. When the ZooKeeper cluster does not hear from the active HMaster within a listening cycle, another HMaster node is elected through the ZooKeeper election mechanism to serve the entire HBase cluster. ZooKeeper thus ensures that an HMaster is always available in the HBase cluster. ZooKeeper also monitors the status of the HRegion servers, which enables automatic failover when one of them fails.


Because node state and communication between nodes in the HBase cluster are both provided by ZooKeeper, a ZooKeeper cluster must be set up.
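The election described above can be sketched with a toy coordinator. This is a simulation in plain Python, not the ZooKeeper API: `ToyCoordinator`, `create_ephemeral`, `expire_session`, `elect_master` and the path `/toy/master` are all hypothetical names, and the sketch assumes the common pattern in which the first candidate to create a session-bound node becomes active.

```python
class ToyCoordinator:
    """Stand-in for a coordination service: holds nodes tied to live sessions."""

    def __init__(self):
        self.nodes = {}  # path -> owner session name

    def create_ephemeral(self, path, owner):
        """Create the node if absent; return True only for the winner."""
        if path in self.nodes:
            return False
        self.nodes[path] = owner
        return True

    def expire_session(self, owner):
        """Simulate a missed heartbeat: drop every node the session owned."""
        self.nodes = {p: o for p, o in self.nodes.items() if o != owner}


def elect_master(coordinator, candidates, path="/toy/master"):
    """Each live candidate tries to claim the master node; return the active one."""
    for name in candidates:
        coordinator.create_ephemeral(path, name)
    return coordinator.nodes[path]


zk = ToyCoordinator()
masters = ["master-a", "master-b", "master-c"]

active = elect_master(zk, masters)
print(active)  # master-a

# The active master stops responding, so its node disappears
# and one of the standbys takes over.
zk.expire_session(active)
standbys = [m for m in masters if m != active]
new_active = elect_master(zk, standbys)
print(new_active)  # master-b
```

Only one candidate can hold the node at a time, which matches the rule that a single HMaster is active while the others wait as standbys.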




Three, the functions of the nodes in the HBase cluster


The HMaster is primarily responsible for managing HBase tables and HRegion servers, including the following:


1, managing load balancing across the HRegion servers, adjusting how HRegions are distributed among them to avoid hot spots;


2, after a region splits, assigning the new HRegions;


3, when an HRegion server crashes or goes down, migrating the HRegions that were on the failed HRegion server.
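The three duties above all come down to deciding which server hosts which region. The following sketch shows one simple way to do that, assuming load is measured only by the number of regions per server; a real balancer weighs more factors. The functions `assign`, `rebalance` and `fail_over` and the server and region names are hypothetical.

```python
def assign(assignment, region):
    """Give a region (for example a new one created by a split) to the idlest server."""
    idlest = min(assignment, key=lambda s: len(assignment[s]))
    assignment[idlest].append(region)
    return idlest


def rebalance(assignment):
    """Move regions from the most loaded server to the least loaded one
    until the region counts differ by at most one."""
    while True:
        busiest = max(assignment, key=lambda s: len(assignment[s]))
        idlest = min(assignment, key=lambda s: len(assignment[s]))
        if len(assignment[busiest]) - len(assignment[idlest]) <= 1:
            return assignment
        assignment[idlest].append(assignment[busiest].pop())


def fail_over(assignment, dead_server):
    """Reassign every region of a failed server to the surviving servers."""
    orphaned = assignment.pop(dead_server)
    for region in orphaned:
        assign(assignment, region)
    return assignment


cluster = {
    "rs1": ["r1", "r2", "r3", "r4", "r5"],
    "rs2": ["r6"],
    "rs3": [],
}

rebalance(cluster)
print({s: len(r) for s, r in cluster.items()})  # {'rs1': 2, 'rs2': 2, 'rs3': 2}

fail_over(cluster, "rs1")
print({s: len(r) for s, r in cluster.items()})  # {'rs2': 3, 'rs3': 3}
```

No region is lost in either step: balancing and failover only change which server a region is listed under.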

The main functions of the HRegion server are as follows:


1, responding to user requests (that is, insert, delete, query and other operations on an HTable);


2, compacting the HFiles on the HRegion server, using two strategies: minor compaction and major compaction;


3, automatically splitting HRegions horizontally.
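Items 2 and 3 can be illustrated with a toy model in which each store file is a plain dictionary. It is a sketch under several assumptions: a minor merge here simply takes the newest `count` files, a major merge rewrites everything into one file and discards deleted keys, and a split cuts a region at its middle row key. The names `TOMBSTONE`, `merge`, `minor_compact`, `major_compact` and `split_region` are hypothetical.

```python
TOMBSTONE = object()  # marker written by a delete


def merge(files):
    """Merge store files (oldest first) so the newest write for each key wins."""
    merged = {}
    for f in files:
        merged.update(f)
    return dict(sorted(merged.items()))


def minor_compact(files, count=2):
    """Merge the `count` newest files into one; tombstones are kept because
    older files that still hold the deleted keys are not rewritten."""
    if len(files) < count:
        return files
    return files[:-count] + [merge(files[-count:])]


def major_compact(files):
    """Merge every file into one and drop deleted keys for good."""
    merged = merge(files)
    return [{k: v for k, v in merged.items() if v is not TOMBSTONE}]


def split_region(region):
    """Split a region (row key -> value) into two halves at the middle row key."""
    keys = sorted(region)
    mid = keys[len(keys) // 2]
    lower = {k: region[k] for k in keys if k < mid}
    upper = {k: region[k] for k in keys if k >= mid}
    return lower, upper


# Three flushed files, oldest first.
store_files = [
    {"row1": "a", "row2": "b"},
    {"row2": TOMBSTONE, "row3": "c"},
    {"row1": "a2"},
]

after_minor = minor_compact(store_files)
print(len(after_minor))  # 2

after_major = major_compact(after_minor)
print(after_major)  # [{'row1': 'a2', 'row3': 'c'}]

lower, upper = split_region(after_major[0])
print(lower, upper)  # {'row1': 'a2'} {'row3': 'c'}
```

The minor merge reduces the number of files cheaply, while the major merge is the step that actually reclaims the space taken by deleted data.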





Four, differences between HBase and traditional RDBMS databases


1, storage model: HBase uses column-based storage. A table can have multiple column families, and the column families must be specified when the HTable is created, which is the equivalent of defining the table structure in a traditional database. Data is kept separately per column family, and each column family has its own storage files. An RDBMS uses a row-based tabular storage structure;


2, scalability: HBase is highly scalable by nature. You can easily add or remove cluster nodes while still maintaining high fault tolerance, whereas scaling out an RDBMS is considerably more difficult;


3, transactions: HBase is not a transactional database in the general sense (its operations are atomic only within a single row), while an RDBMS is a transactional database;


4, data volume: HBase can easily store data at the terabyte or petabyte level and above, while an RDBMS is generally suited to storing gigabyte-level data;


5, data operations: HBase handles only very simple insert, delete, update and query operations. Tables are independent of one another, there are no complex relationships between tables, and join operations across tables are not possible. Traditional RDBMS databases usually offer rich join operations between tables and a variety of functions;


6, data types: HBase stores every value as an uninterpreted byte array (in effect, a simple string), and all type conversion is handled by the client; relational databases have rich data types;


7, multi-versioning of stored data: data stored in HBase can have multiple versions, while data in relational databases generally cannot;


8, data maintenance: strictly speaking, HBase has no update operation. Because it maintains multiple versions, an update is actually an insert of new data, and the old version continues to exist. A modification in a traditional RDBMS directly changes the data itself.
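Item 6 leaves type conversion to the client. A minimal sketch of what that looks like, assuming values end up as raw bytes and that integers are written as 8 big-endian bytes (an encoding chosen for this example): the store only ever holds bytes, and the client encodes on write and decodes on read. The helper functions and the `cell_store` dictionary are hypothetical and do not use any HBase client library.

```python
import struct


def encode_int(n):
    """Encode a signed 64-bit integer as 8 big-endian bytes."""
    return struct.pack(">q", n)


def decode_int(raw):
    """Decode 8 big-endian bytes back into a signed 64-bit integer."""
    return struct.unpack(">q", raw)[0]


def encode_str(s):
    """Encode text as UTF-8 bytes."""
    return s.encode("utf-8")


def decode_str(raw):
    """Decode UTF-8 bytes back into text."""
    return raw.decode("utf-8")


# The "store" only ever sees bytes; it knows nothing about the types.
cell_store = {}
cell_store[(b"user1", b"info:age")] = encode_int(42)
cell_store[(b"user1", b"info:name")] = encode_str("alice")

print(decode_int(cell_store[(b"user1", b"info:age")]))   # 42
print(decode_str(cell_store[(b"user1", b"info:name")]))  # alice
print(len(cell_store[(b"user1", b"info:age")]))          # 8
```

Because the store cannot tell an integer from a string, reader and writer must agree on the encoding of each column, which is the price of having no schema-level data types.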

