Why does HBase need to build an SQL engine layer?
Existing SQL solutions generally do not scale horizontally, so they run into obstacles as data volume grows. The emergence of NoSQL has greatly mitigated this problem, and as NoSQL technology improves and matures, it is expected to be solved at the root.
One well-known difference between NoSQL and relational databases is that NoSQL does not use SQL as its query language. Even so, there are several reasons to provide an SQL interface on a NoSQL data store such as HBase:
1. An easy-to-understand language such as SQL makes HBase easier for people to use.
2. Writing in a higher-level language such as SQL reduces the amount of code that has to be written.
3. Placing an abstraction such as SQL between data access and execution leaves room for many query optimizations. For example, for a GROUP BY query, aggregation can be performed on the server with an HBase coprocessor rather than on the client, which greatly reduces the amount of data transferred between client and server. In addition, the GROUP BY can be run in parallel from the client by splitting the scan into chunks according to row key ranges. With parallel execution, results are returned faster. None of these optimizations requires user involvement; you only need to issue the query.
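The two optimizations in point 3 can be illustrated with a small simulation. This is only a sketch under stated assumptions: an in-memory TreeMap stands in for an HBase table sorted by row key, and the class name ChunkedGroupBy, the methods countChunk and groupCount, and the sample row keys are all hypothetical. It is not how any particular SQL engine implements the feature. Each chunk of the key range is aggregated on its own (as a coprocessor would do on the server side), and only the small partial results are merged on the client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical simulation: a TreeMap plays the role of a table sorted by row key.
class ChunkedGroupBy {

    // Aggregates one row key range [startKey, endKey), like one chunk of a scan.
    static Map<String, Long> countChunk(NavigableMap<String, String> table,
                                        String startKey, String endKey) {
        Map<String, Long> partial = new TreeMap<>();
        for (String value : table.subMap(startKey, true, endKey, false).values()) {
            partial.merge(value, 1L, Long::sum);
        }
        return partial;
    }

    // Runs one task per key range in parallel, then merges the partial counts.
    // boundaries must be sorted; consecutive entries delimit one chunk.
    static Map<String, Long> groupCount(NavigableMap<String, String> table,
                                        List<String> boundaries) {
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, boundaries.size() - 1));
        try {
            List<Future<Map<String, Long>>> futures = new ArrayList<>();
            for (int i = 0; i + 1 < boundaries.size(); i++) {
                final String start = boundaries.get(i);
                final String end = boundaries.get(i + 1);
                Callable<Map<String, Long>> task = () -> countChunk(table, start, end);
                futures.add(pool.submit(task));
            }
            // Only the per-chunk summaries reach this point, not the raw rows.
            Map<String, Long> total = new TreeMap<>();
            for (Future<Map<String, Long>> future : futures) {
                for (Map.Entry<String, Long> e : future.get().entrySet()) {
                    total.merge(e.getKey(), e.getValue(), Long::sum);
                }
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("chunk aggregation failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("row01", "US");
        table.put("row02", "DE");
        table.put("row03", "US");
        table.put("row04", "US");
        table.put("row05", "DE");
        // Two chunks: [row00, row03) and [row03, row06).
        System.out.println(groupCount(table, Arrays.asList("row00", "row03", "row06")));
    }
}
```

The point of the design is that the merge step handles one small map per chunk, so the cost of moving data to the client depends on the number of groups rather than the number of rows.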
HBase-based SQL engine implementation
The industry has already made a number of attempts at an SQL engine layer for HBase, and some stable solutions and practices now exist.
1. Integrating Hive with HBase
Integration between Hive and HBase first appeared in Hive 0.6.0. The two systems communicate through their APIs, relying mainly on Hive storage handlers. Because HBase has gone through major version changes, not every version of Hive can be integrated with a given HBase version, so pay special attention to version compatibility between the two.
2. Phoenix
Phoenix was open-sourced by Salesforce.com and is an SQL middle layer built on Apache HBase. It allows developers to execute SQL queries against HBase. Phoenix is written entirely in Java. The code is hosted on GitHub, and it provides a JDBC driver that can be embedded in the client. For simple queries with 10 to … rows, Phoenix performs better than Hive.
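As a rough illustration of what using the embedded JDBC driver might look like, the sketch below runs a GROUP BY query through the standard java.sql API. Several things here are assumptions rather than facts from the text: the class and method names, the table name WEB_STAT and column HOST, and the ZooKeeper quorum passed on the command line are all hypothetical, and the Phoenix client JAR must be on the classpath for the jdbc:phoenix: URL to resolve.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; only java.sql types are used directly.
class PhoenixGroupByExample {

    // Builds the GROUP BY statement. Identifiers cannot be bound as JDBC
    // parameters, so pass only trusted table and column names here.
    static String groupByCountSql(String table, String column) {
        return "SELECT " + column + ", COUNT(*) FROM " + table
                + " GROUP BY " + column;
    }

    // Runs the query and collects one count per group, in result order.
    static Map<String, Long> countByColumn(Connection conn, String table, String column)
            throws SQLException {
        Map<String, Long> counts = new LinkedHashMap<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(groupByCountSql(table, column))) {
            while (rs.next()) {
                counts.put(rs.getString(1), rs.getLong(2));
            }
        }
        return counts;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length == 0) {
            System.out.println("usage: PhoenixGroupByExample <zookeeper-quorum>");
            return;
        }
        // Assumed URL form: jdbc:phoenix:<ZooKeeper quorum>.
        String url = "jdbc:phoenix:" + args[0];
        try (Connection conn = DriverManager.getConnection(url)) {
            // WEB_STAT and HOST are placeholder names for illustration.
            System.out.println(countByColumn(conn, "WEB_STAT", "HOST"));
        }
    }
}
```

The client code contains no scan or aggregation logic at all; how the query is executed is left entirely to the SQL layer.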
3. Kundera
Kundera is a JPA 2.0-compatible object mapping framework for NoSQL data stores. It is built on existing class libraries and wraps them in a simple API. Its main features include:
1) It supports cross-datastore persistence, meaning that users can store and retrieve related entities in different data stores through a single approach.
2) It manages transactions well and supports EntityTransaction and the Java Transaction API (JTA).
3) It is compatible with JPA 2.0 and uses JPA annotations to map objects to data store tables.
4) The NoSQL servers currently supported include HBase, MongoDB, Redis, and Neo4j.
There are other solutions, such as Lealone, hbase-SQL, and Impala, but they are either immature, no longer maintained, or limited in some way. If you are interested, you can look into them on your own.