Hive HBase Differences

Last Update: 2015-03-18 Source: Internet

Author: User

Tags hadoop mapreduce


Hive was created to simplify writing MapReduce programs. Anyone who has done data analysis with MapReduce knows that many analysis programs are essentially the same and differ only in their business logic, so a higher-level programming interface such as Hive is needed. Hive itself neither stores nor computes data. It relies entirely on HDFS for storage and on MapReduce for computation, and what Hive defines is purely logical: the tables and their metadata. Hive uses SQL because SQL is familiar, which keeps the cost of switching low. Pig plays a similar role but does not use SQL.
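To make the point concrete, here is a minimal, single-process Python sketch of the kind of job Hive saves you from writing by hand: a GROUP BY count split into map, shuffle, and reduce steps. The `employees` table, its columns, and the function names are hypothetical. A real Hive query is compiled into MapReduce jobs that run across the cluster, not into code like this.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical "employees" table: each dict is one row.
ROWS = [
    {"name": "alice", "dept": "eng"},
    {"name": "bob", "dept": "sales"},
    {"name": "carol", "dept": "eng"},
]


def map_phase(rows: Iterable[dict]) -> Iterator[Tuple[str, int]]:
    # Mapper: emit (group key, 1) for every input row.
    for row in rows:
        yield row["dept"], 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: collect all values that share a key (done by the framework in Hadoop).
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Reducer: add up the ones for each key to get COUNT(*) per group.
    return {key: sum(values) for key, values in groups.items()}


def group_by_count(rows: Iterable[dict]) -> Dict[str, int]:
    # Hand-written equivalent of: SELECT dept, COUNT(*) FROM employees GROUP BY dept
    return reduce_phase(shuffle_phase(map_phase(rows)))


print(group_by_count(ROWS))  # {'eng': 2, 'sales': 1}
```

With Hive, the same result comes from the one-line SQL statement in the comment; the boilerplate of mapper, shuffle, and reducer is generated for you.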

HBase is built for querying. By pooling the memory of all the machines in the cluster, it provides a huge in-memory key-value table. To do this, HBase has to manage its own data structures on both disk and in memory, which Hive does not do. A table in HBase is a physical table rather than a logical one, and search engines use it to store their indexes so that queries can meet real-time requirements.
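HBase's data model can be pictured as a map from row key to a set of `family:qualifier` columns. HBase keeps rows sorted by row key, so it supports both fast point lookups and range scans. The Python sketch below is a toy, in-memory model of just that idea, assuming string keys and values; the class name and methods are hypothetical and ignore versions, regions, and persistence.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class ToyHBaseTable:
    """In-memory stand-in for an HBase table.

    Each row maps "family:qualifier" column names to values, and rows are
    kept sorted by row key. Versions, regions, and disk storage are omitted.
    """

    def __init__(self) -> None:
        self._keys: List[str] = []  # row keys in sorted order
        self._rows: Dict[str, Dict[str, str]] = {}

    def put(self, row_key: str, column: str, value: str) -> None:
        # A new row key is inserted at its sorted position.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key: str) -> Optional[Dict[str, str]]:
        # Point lookup by row key; other rows are never touched.
        row = self._rows.get(row_key)
        return dict(row) if row is not None else None

    def scan(self, start: str, stop: str) -> List[Tuple[str, Dict[str, str]]]:
        # Range scan over start <= row key < stop, returned in key order.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(key, dict(self._rows[key])) for key in self._keys[lo:hi]]


# Usage: a tiny term -> document index, the kind of data a search engine keeps.
index = ToyHBaseTable()
index.put("hbase", "postings:doc1", "3")
index.put("hive", "postings:doc2", "1")
index.put("hadoop", "postings:doc1", "5")
print(index.get("hive"))  # {'postings:doc2': '1'}
print([key for key, _ in index.scan("ha", "hc")])  # ['hadoop', 'hbase']
```

The point lookup answers immediately without touching other rows, which is the low-latency access pattern that distinguishes HBase from a MapReduce-backed Hive query.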

Hive is similar to Cloudbase: it is software that provides data warehouse SQL capabilities on top of the Hadoop distributed computing platform, making it simpler to summarize and run ad hoc queries over the massive amounts of data stored in Hadoop. Hive provides its own query language, HiveQL, which is based on SQL and is easy to use.

HBase is a distributed, column-oriented, non-relational database. HBase queries are highly efficient, and HBase is mainly used for querying data and presenting the results.

Hive presents data as relational tables on top of a distributed system and is mainly used for parallel, distributed processing of large amounts of data. Apart from simple statements such as "SELECT * FROM table;", every Hive query is executed as MapReduce jobs. Because of the MapReduce startup overhead, even a query against a table with only one row and one column can take 8 or 9 seconds if it is anything other than "SELECT * FROM table". Hive is, however, good at processing large volumes of data: when there is a lot of data to process and the Hadoop cluster is large enough, this becomes an advantage.

Hive and HBase can be integrated through Hive's storage handler interface.
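With Hive's HBase storage handler, a Hive table is declared over an HBase table, and the `hbase.columns.mapping` SerDe property lists, in order, which HBase element backs each Hive column: `:key` for the row key, `family:qualifier` for a single column. The Python sketch below only mimics that mapping step for one row so the idea is visible. The function names are hypothetical, and the real handler does much more (type conversion, serialization, whole-family `MAP` columns).

```python
from typing import Dict, List, Optional, Tuple


def parse_mapping(mapping: str) -> List[Tuple[str, Optional[str]]]:
    # ":key" is the HBase row key; "family:qualifier" is a single HBase column.
    # Whole-family entries such as "cf:" are not handled in this sketch.
    entries: List[Tuple[str, Optional[str]]] = []
    for entry in mapping.split(","):
        entry = entry.strip()
        if entry == ":key":
            entries.append((":key", None))
            continue
        family, sep, qualifier = entry.partition(":")
        if not (sep and family and qualifier):
            raise ValueError(f"unsupported mapping entry: {entry!r}")
        entries.append((family, qualifier))
    return entries


def to_hive_row(
    hive_columns: List[str],
    mapping: str,
    row_key: str,
    cells: Dict[str, str],
) -> Dict[str, Optional[str]]:
    # Project one HBase row (row key + cells) onto the Hive column list.
    parsed = parse_mapping(mapping)
    if len(parsed) != len(hive_columns):
        raise ValueError("the mapping needs exactly one entry per Hive column")
    row: Dict[str, Optional[str]] = {}
    for name, (family, qualifier) in zip(hive_columns, parsed):
        if family == ":key":
            row[name] = row_key
        else:
            # A cell missing from this HBase row shows up as None (NULL in Hive).
            row[name] = cells.get(f"{family}:{qualifier}")
    return row


print(to_hive_row(["id", "name", "age"], ":key,info:name,info:age",
                  "user42", {"info:name": "alice"}))
# {'id': 'user42', 'name': 'alice', 'age': None}
```

Because the mapping is positional, the physical HBase table stays as it is while Hive sees an ordinary logical table with named columns.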

1. Hive uses a SQL language to work with the HDFS file system as if it were a database. Its purpose is to simplify programming; the underlying computation is MapReduce.

2. Hive tables use row-oriented storage by default.

3. Hive itself neither stores nor computes data; it relies entirely on HDFS and MapReduce, and a Hive table is purely logical.

4. HBase is built for queries: by pooling the memory of all the machines in the cluster, it provides a huge in-memory key-value table.

5. HBase is not a relational database but a column-oriented distributed database built on HDFS, and it does not support SQL.

6. An HBase table is a physical table, not a logical one. It provides a huge in-memory key-value table in which search engines store their indexes, which makes lookups convenient.

7. HBase is a column store.
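The row-store versus column-store contrast in the list can be illustrated by laying out the same small table two ways. This Python sketch is a simplification under stated assumptions: real HBase groups columns into column families and stores each family separately, and the table contents and function names here are hypothetical.

```python
from typing import Dict, List, Tuple

# One small table, laid out two different ways.
TABLE_COLUMNS = ["id", "name", "city"]
TABLE_ROWS: List[Tuple[str, ...]] = [
    ("1", "alice", "paris"),
    ("2", "bob", "tokyo"),
    ("3", "carol", "paris"),
]


def row_layout(rows: List[Tuple[str, ...]]) -> List[str]:
    # Row-oriented: every field of a row sits next to the other fields of that row.
    return [field for row in rows for field in row]


def column_layout(columns: List[str], rows: List[Tuple[str, ...]]) -> Dict[str, List[str]]:
    # Column-oriented: every value of a column sits next to the other values of that column.
    return {col: [row[i] for row in rows] for i, col in enumerate(columns)}


print(row_layout(TABLE_ROWS))
# ['1', 'alice', 'paris', '2', 'bob', 'tokyo', '3', 'carol', 'paris']
print(column_layout(TABLE_COLUMNS, TABLE_ROWS)["city"])
# ['paris', 'tokyo', 'paris']
```

Reading only `city` touches one contiguous list in the column layout but has to pass over every record in the row layout, which is why column-oriented storage suits workloads that read a few columns at a time.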

Hive is mainly meant for maintaining and batch-processing data, and it really is very slow for lookups!

This is because queries are executed as distributed MapReduce jobs underneath, which is also the case for Pig (HBase tables can likewise be processed by MapReduce jobs). Over very large datasets, however, Hadoop is fast overall, because its combination of massive distributed storage and distributed computing delivers good throughput.

Hive and HBase have different characteristics: Hive is high-latency, structured, and analysis-oriented, while HBase is low-latency, unstructured, and programming-oriented. Hive is a high-latency data warehouse built on Hadoop.

Within the Hadoop ecosystem, HBase sits in the structured storage layer. Hadoop HDFS provides highly reliable underlying storage for HBase, Hadoop MapReduce provides high-performance computing power, and ZooKeeper provides stable coordination services and a failover mechanism.

In addition, Pig and Hive provide high-level language support for HBase, which makes statistical processing of data stored in HBase very simple. Sqoop provides convenient RDBMS data import for HBase, making it easy to migrate data from traditional databases into HBase.
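Conceptually, importing relational data into HBase turns each relational row into one HBase row: one chosen column supplies the row key, and the remaining columns become cells in a column family. The Python sketch below shows only that reshaping step, assuming a single column family and string-encoded values. It is not Sqoop's code, and the function, table, and column names are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple


def rows_to_cells(
    rows: Iterable[Dict[str, Any]], row_key_column: str, column_family: str
) -> List[Tuple[str, str, str]]:
    """Convert relational rows into (row key, "family:qualifier", value) cells."""
    cells: List[Tuple[str, str, str]] = []
    for row in rows:
        if row.get(row_key_column) is None:
            raise ValueError(f"row has no value for key column {row_key_column!r}")
        key = str(row[row_key_column])
        for column, value in row.items():
            # The key column becomes the row key rather than a cell, and NULLs
            # are skipped here because an absent value needs no cell in HBase.
            if column == row_key_column or value is None:
                continue
            cells.append((key, f"{column_family}:{column}", str(value)))
    return cells


orders = [
    {"order_id": 1001, "customer": "alice", "amount": 25.5},
    {"order_id": 1002, "customer": "bob", "amount": None},
]
for cell in rows_to_cells(orders, "order_id", "o"):
    print(cell)
# ('1001', 'o:customer', 'alice')
# ('1001', 'o:amount', '25.5')
# ('1002', 'o:customer', 'bob')
```

Skipping NULLs is a design choice of this sketch: since HBase rows are sparse, leaving a cell out is the natural way to represent a missing value.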


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
