Why does HBase need to build an SQL engine layer?

Source: Internet
Author: User

Why does HBase need to build an SQL engine layer?

Existing SQL solutions generally do not scale horizontally, so they run into obstacles as data volume grows. The emergence of NoSQL has greatly mitigated this problem, and as NoSQL technology improves and matures, the problem is expected to be solved fundamentally.

One well-known difference between NoSQL systems and relational databases is that NoSQL systems do not use SQL as their query language. Even so, there are several reasons to provide an SQL interface on top of the NoSQL data store HBase:

1. An easy-to-understand language such as SQL makes HBase easier for people to use.

2. Writing in a higher-level language such as SQL reduces the amount of code that has to be written.

3. Placing an abstraction such as SQL between data access and execution leaves room for many query optimizations. For example, for a GROUP BY query, aggregation can be performed on the server by an HBase coprocessor rather than on the client, which greatly reduces the amount of data transferred between the client and the server. In addition, the GROUP BY can be run in parallel from the client by splitting the scan according to row key ranges; with parallel execution, results are returned faster. None of these optimizations requires user involvement. You only need to issue the query.
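The two optimizations in point 3 can be illustrated with a small simulation. This is only a sketch under stated assumptions, not how any particular engine implements it: the table is a hypothetical in-memory list of sorted rows rather than a real HBase table, and the names ROWS, scan_and_count, and parallel_group_by are invented for illustration. Each range scan returns only partial counts (standing in for coprocessor-side aggregation), and the client merges the partial results.

```python
from bisect import bisect_left
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for an HBase table: (row key, dept) pairs sorted by row key.
ROWS = sorted([
    ("row01", "east"), ("row02", "west"), ("row03", "east"),
    ("row04", "north"), ("row05", "west"), ("row06", "east"),
    ("row07", "north"), ("row08", "east"),
])
KEYS = [key for key, _ in ROWS]


def scan_and_count(start_key, stop_key):
    """Scan the key range [start_key, stop_key) and return partial GROUP BY counts.

    Only the small Counter leaves this function, which mimics aggregating
    near the data instead of shipping every row to the client.
    """
    lo = bisect_left(KEYS, start_key)
    hi = bisect_left(KEYS, stop_key)
    return Counter(dept for _, dept in ROWS[lo:hi])


def parallel_group_by(split_points):
    """Split the full scan at split_points, run the pieces in parallel, and merge."""
    # "" sorts before every key and "\uffff" after every key in this sample data.
    bounds = [""] + list(split_points) + ["\uffff"]
    ranges = list(zip(bounds, bounds[1:]))
    total = Counter()
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        for partial in pool.map(lambda r: scan_and_count(*r), ranges):
            total += partial
    return dict(total)


# Equivalent in spirit to: SELECT dept, COUNT(*) FROM t GROUP BY dept
result = parallel_group_by(["row03", "row06"])
```

The split points play the role of region boundaries here. The merged result does not depend on where the scan is split, which is why the user never has to be aware of the splitting.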

HBase-based SQL engine implementation

The industry has made several attempts at an SQL engine layer for HBase, and some stable solutions and practices already exist.

1. Integrate Hive into HBase

Hive and HBase integration first appeared in Hive 0.6.0. The two systems communicate through their APIs, relying mainly on Hive Storage Handlers. Because HBase has gone through major version changes, not every version of Hive can be integrated with a given HBase version, so pay special attention to version compatibility between the two.

2. Phoenix

Phoenix was open-sourced by Salesforce.com and is an SQL middle layer built on Apache HBase. It allows developers to execute SQL queries on HBase. Phoenix is written entirely in Java. The code is hosted on GitHub, and it provides a JDBC driver that can be embedded in the client. For simple queries with 10 to rows, Phoenix performs better than Hive.

3. Kundera

Kundera is a JPA 2.0 compatible object mapping framework for NoSQL data stores. Kundera is built on existing class libraries and wraps them in a simple API. Its main features include:

1) Supports cross-datastore persistence, which means that users can store and retrieve related entities in different data stores through a single interface.

2) Manages transactions well and supports EntityTransaction and the Java Transaction API (JTA).

3) Compatible with JPA 2.0: objects are mapped to data store tables using JPA annotations.

4) Currently supported NoSQL servers include HBase, MongoDB, Redis, and Neo4j.

There are other solutions, such as Lealone, hbase-SQL, and Impala, but each is either immature, discontinued, or limited in scope. If you are interested, you can explore them on your own.
