Talking about the difference between hive and HBase

Last Update:2016-07-11 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Source:

Http://www.cnblogs.com/justinzhang/p/4273470.html

Https://www.xplenty.com/blog/2014/05/hive-vs-hbase/?%20utm_source=dzone&utm_medium=link&utm_term=upd &utm_campaign=may%202014%20blog%205.22

1. What are the two differences?

Apache Hive is a data warehouse built on top of the Hadoop infrastructure. Hive allows you to query the data stored on HDFS using the HQL language. HQL is a class of SQL language that eventually translates to map/reduce. Although hive provides SQL query functionality, hive is not able to query interactively-because it can only execute Hadoop in batches on Haoop.

Apache HBase is a key/value system that runs on top of HDFs. Unlike Hive, HBase can run in real time on its database, rather than running a mapreduce task. Hive is partitioned into tables, and tables are further segmented into columns. A column cluster must use the schema definition, and the column family will assemble a type column (the column does not require a schema definition). For example, the "Message" Column cluster may contain: "To", "from" "Date", "Subject", and "body". Each key/value pair is defined in HBase as a cell, each key consists of Row-key, a column cluster, a column, and a timestamp. In HBase, a row is a collection of key/value mappings that are uniquely identified by Row-key. HBase leverages the infrastructure of Hadoop to scale horizontally with common devices.

2. characteristics of both

Hive helps people who are familiar with SQL to run the MapReduce task. Because it is JDBC-compatible, it can also be integrated with existing SQL tools. Running a hive query can take a long time because it iterates through all the data in the table by default. Despite this shortcoming, the amount of data that is traversed at once can be controlled by the partitioning mechanism of hive. Partitioning allows filtering queries to be run on datasets that are stored in different folders and that only traverse the data in the specified folder (partition) when queried. This mechanism can be used, for example, to only process files within a certain time frame, as long as these file names include the time format.

HBase works by storing key/value. It supports four main operations: Add or update rows, view a cell within a range, get a specified row, delete a specified row, column, or version of a column. Version information is used to obtain historical data (the history of each row can be deleted, and then the space can be freed through HBase compactions). Although HBase includes tables, the schema is only required by tables and columns, and the columns do not need schema. HBase tables include the Add/Count feature.

3. Restrictions

Hive does not currently support update operations. Also, because Hive runs a bulk operation on Hadoop, it takes a long time, usually minutes to hours to get the results of the query. Hive must provide a pre-defined schema to map files and directories to columns, and hive is incompatible with acid.

HBase queries are written in a particular language that needs to be re-learned. The functionality of class SQL can be implemented through Apache Phonenix, but this is at the cost of having to provide the schema. In addition, HBase is not compatible with all ACID properties, although it supports certain features. Last but not least-in order to run HBase,zookeeper is a must , and zookeeper is a service for distributed coordination that includes configuration services, maintenance meta-information, and namespace services.

4. Application Scenarios

Hive is ideal for analyzing queries for data over a period of time, such as the logs used to calculate trends or websites. Hive should not be used for real-time queries. Because it takes a long time to return the results.

HBase is ideal for real-time queries for big data. Facebook uses HBase for messages and real-time analytics. It can also be used to count the number of Facebook connections.

5. Summary

Hive and HBase are two different technologies based on Hadoop--hive is a kind of SQL-like engine and runs the MapReduce task, HBase is a nosql key/vale database on top of Hadoop. Of course, both of these tools can be used at the same time . Like using Google to search and socialize with Facebook, hive can be used for statistical queries , HBase can be used for real-time queries, and data can be written from hive to HBase, set up to be written back to hive from HBase.

Talking about the difference between hive and HBase

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Talking about the difference between hive and HBase

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Talking about the difference between hive and HBase

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support