Hive HBase Differences

Source: Internet
Author: User
Tags: hadoop, mapreduce

Hive was created to simplify the writing of MapReduce programs. People who did data analysis with MapReduce found that many analysis programs were essentially the same apart from the business logic, so a higher-level user programming interface such as Hive was needed. Hive itself does not store or compute data; it relies entirely on HDFS for storage and MapReduce for computation. A Hive table is purely logical: Hive holds only the table definition, that is, the table's metadata. Hive uses SQL because SQL is familiar and the cost of switching to it is low; Pig plays a similar role but does not use SQL.
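To make the "same program, different business logic" point concrete, here is a minimal Python sketch of what a query such as SELECT dept, COUNT(*) FROM employees GROUP BY dept amounts to when written as map and reduce phases. The table, the column names, and the sample rows are hypothetical, and this is a single-process illustration of the pattern rather than Hive's actual execution plan.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for a file in HDFS: (name, dept)
ROWS = [("ann", "sales"), ("bob", "ops"), ("cho", "sales"),
        ("dev", "ops"), ("eve", "sales")]


def map_phase(rows):
    """Emit one (key, 1) pair per row, keyed by the GROUP BY column."""
    for _name, dept in rows:
        yield dept, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values for each key, which gives COUNT(*) per group."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_dept(rows):
    """Run the three phases in sequence on one machine."""
    return reduce_phase(shuffle(map_phase(rows)))
```

Only map_phase and reduce_phase carry the business logic; the surrounding plumbing is identical for every such job, which is the repetition a SQL layer removes.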


HBase is built for querying. By pooling the memory of all the machines in the cluster, it provides what amounts to a very large in-memory hash table. It has to organize its own data structures, both on disk and in memory, which Hive does not do. A table in HBase is a physical table, not a logical one; search engines use it to store indexes in order to meet the real-time requirements of queries.
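The lookup model described above can be sketched in a few lines of Python. This is a toy, single-machine stand-in, not the HBase client API: the class name ToyTable and the sample keys are hypothetical. It keeps rows in row-key order, as HBase does, so that it supports range scans as well as the point lookups a plain hash table would give.

```python
import bisect


class ToyTable:
    """Toy stand-in for an HBase-style table: row key -> {"family:qualifier": value}."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> dict of column -> value

    def put(self, row_key, column, value):
        """Insert or overwrite one cell."""
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Point lookup by row key; returns a copy of the row, or None."""
        row = self._rows.get(row_key)
        return dict(row) if row is not None else None

    def scan(self, start, stop):
        """Return (row_key, row) pairs with start <= row_key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(key, dict(self._rows[key])) for key in self._keys[lo:hi]]
```

A search index could, for example, use the term as the row key and store the matching document ids in a column, so that serving a query is a single get by key.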

Hive is similar to Cloudbase: it is a set of software that provides data warehouse SQL capabilities on the Hadoop distributed computing platform. It simplifies summarization and ad hoc querying of the massive amounts of data stored in Hadoop. Hive provides its own query language, QL, which is based on SQL and easy to use.

HBase is a distributed, column-oriented, non-relational database. HBase queries are highly efficient, and it is used mainly for looking up and displaying results.

Hive offers a relational, SQL-style view of distributed data rather than being a database engine in its own right. It is mainly used for parallel, distributed processing of large amounts of data. Every query in Hive except "SELECT * FROM table;" has to be run as a MapReduce job. Because of that, even on a table with only 1 row and 1 column, any query other than a plain SELECT * FROM table may take 8 or 9 seconds. Hive is better suited to processing large amounts of data: when there is a lot of data to process and the Hadoop cluster is large enough, its advantages show.

Hive and HBase can be integrated through Hive's storage handler interface.
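As a rough illustration of that integration, the Python sketch below builds the kind of HiveQL statement that maps a Hive table onto an existing HBase table. The helper hbase_backed_table_ddl and all table and column names are hypothetical; the storage handler class and the hbase.columns.mapping and hbase.table.name properties are the ones described in Hive's HBase integration documentation, but check them against the version you run.

```python
def hbase_backed_table_ddl(hive_table, hbase_table, key_column, columns):
    """Build a CREATE EXTERNAL TABLE statement backed by an HBase table.

    key_column: (hive_column_name, hive_type) mapped to the HBase row key.
    columns:    list of (hive_column_name, hive_type, "family:qualifier").
    """
    key_name, key_type = key_column
    col_defs = [key_name + " " + key_type]
    col_defs += [name + " " + hive_type for name, hive_type, _ in columns]
    # ":key" maps the first Hive column to the HBase row key.
    mapping = ",".join([":key"] + [cell for _, _, cell in columns])
    cols = ", ".join(col_defs)
    return (
        f"CREATE EXTERNAL TABLE {hive_table} ({cols})\n"
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        f"WITH SERDEPROPERTIES ('hbase.columns.mapping' = '{mapping}')\n"
        f"TBLPROPERTIES ('hbase.table.name' = '{hbase_table}')"
    )
```

Calling hbase_backed_table_ddl("hive_users", "users", ("user_id", "string"), [("name", "string", "info:name")]) yields a statement that could be pasted into a Hive session, after which the HBase data can be queried with ordinary Hive queries.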


1. Hive uses the SQL language to operate on files in HDFS as if they were database tables. It exists to simplify programming, and the underlying computation is MapReduce.

2. Hive stores data in a row-oriented way.

3. Hive itself does not store or compute data; it relies entirely on HDFS and MapReduce, and its tables are purely logical.

4. HBase is built for querying; by pooling the memory of all the machines in the cluster, it provides a very large in-memory hash table.

5. HBase is not a relational database but a column-oriented distributed database built on HDFS, and it does not support SQL.

6. An HBase table is a physical table, not a logical one. Search engines use it to store indexes, which makes query operations convenient.

7. HBase is a column store.
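The row-oriented versus column-oriented distinction in the list above can be illustrated with a small Python sketch. The table and its values are hypothetical, and the layouts are simplified: HBase in particular groups data by column family rather than storing every column separately.

```python
# Hypothetical three-row table with columns (id, name, age)
COLUMNS = ("id", "name", "age")
RECORDS = [(1, "ann", 31), (2, "bob", 45), (3, "cho", 28)]


def row_layout(records):
    """Row-oriented: all values of one record are stored next to each other."""
    return [value for record in records for value in record]


def column_layout(records, columns):
    """Column-oriented: all values of one column are stored next to each other."""
    return {name: [record[i] for record in records]
            for i, name in enumerate(columns)}


def average_age(col_store):
    """An aggregate over one column touches only that column's values."""
    ages = col_store["age"]
    return sum(ages) / len(ages)
```

With the row layout, reading one whole record is a contiguous read, while with the column layout a query that needs only age can skip id and name entirely.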


Hive is suitable only for maintenance-style batch work; lookups in it are really very slow!

This is because the underlying execution is distributed through MapReduce, which is the case for Hive and Pig, and for batch jobs run over HBase. Taken as a whole, though, Hadoop is fast for what it does, namely massive data storage and distributed computing.


Hive and HBase have different characteristics: Hive is high latency, structured, and analysis-oriented, while HBase is low latency, unstructured, and programming-oriented. As a data warehouse on Hadoop, Hive is inherently high latency.


HBase sits in the structured storage tier of the Hadoop ecosystem. Hadoop HDFS provides highly reliable low-level storage for HBase, and Hadoop MapReduce provides it with high-performance computing power. ZooKeeper provides HBase with a stable coordination service and a failover mechanism.

In addition, Pig and Hive provide high-level language support for HBase, which makes statistical processing of data in HBase very simple. Sqoop provides a convenient RDBMS data import function for HBase, which makes it easy to migrate data from traditional databases to HBase.
