Talking about the difference between Hive and HBase

Sources:

http://www.cnblogs.com/justinzhang/p/4273470.html

https://www.xplenty.com/blog/2014/05/hive-vs-hbase/?%20utm_source=dzone&utm_medium=link&utm_term=upd &utm_campaign=may%202014%20blog%205.22

1. What are Hive and HBase?

Apache Hive is a data warehouse built on top of the Hadoop infrastructure. Hive lets you query data stored on HDFS using HQL, a SQL-like language whose queries are eventually translated into MapReduce jobs. Although Hive provides SQL query functionality, it cannot run queries interactively, because it only executes batch jobs on Hadoop.

Apache HBase is a key/value store that runs on top of HDFS. Unlike Hive, HBase operations run in real time against its database rather than as MapReduce jobs. HBase is partitioned into tables, and tables are further split into column families. Column families must be declared in the schema; each one groups a certain kind of columns, and the columns themselves do not require a schema definition. For example, the "message" column family may contain the columns "to", "from", "date", "subject", and "body". Each key/value pair in HBase is defined as a cell, and each key consists of a row key, a column family, a column, and a timestamp. In HBase, a row is a collection of key/value mappings uniquely identified by its row key. HBase leverages the Hadoop infrastructure to scale horizontally on commodity hardware.
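To make the cell structure concrete, here is a minimal in-memory sketch in Python. It only models the idea described above (a key made of row key, column family, column, and timestamp); the names CellKey and get_row and the sample data are assumptions made up for illustration, not part of any HBase client API.

```python
from collections import namedtuple

# Hypothetical names for illustration; this is not the HBase client API.
# A cell key has the four parts described in the text.
CellKey = namedtuple("CellKey", ["row_key", "family", "column", "timestamp"])

# A table is modeled as a mapping from cell key to cell value.
cells = {
    CellKey("msg-001", "message", "to", 1): "alice@example.com",
    CellKey("msg-001", "message", "from", 1): "bob@example.com",
    CellKey("msg-001", "message", "subject", 1): "Hello",
    CellKey("msg-002", "message", "to", 1): "carol@example.com",
}


def get_row(cells, row_key):
    """Collect every cell that shares the given row key into one row."""
    return {
        (key.family, key.column): value
        for key, value in cells.items()
        if key.row_key == row_key
    }


row = get_row(cells, "msg-001")
print(row[("message", "subject")])
```

Note that only the column family ("message") would need to be declared up front; the individual columns simply appear as cells are written.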

2. Characteristics of both

Hive helps people who are familiar with SQL run MapReduce jobs. Because it is JDBC-compliant, it can also be integrated with existing SQL-based tools. Running a Hive query can take a long time because, by default, it scans all the data in the table. This limitation can be mitigated by Hive's partitioning feature, which controls how much data a query has to read. Partitioning lets a filter query run over data stored in separate folders and read only the data in the matching folders (partitions). It can be used, for example, to process only the files created within a certain time range, as long as the file names include the date.
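The effect of partitioning can be sketched in a few lines of Python. This is only a toy model of the idea, not how Hive is implemented: the folder names, the dt= naming convention as used here, and the functions partition_value and scan are assumptions chosen for the example.

```python
# Hypothetical partition layout: one folder per day. Paths and rows are made up.
partitions = {
    "logs/dt=2014-05-20": ["GET /a", "GET /b"],
    "logs/dt=2014-05-21": ["GET /a"],
    "logs/dt=2014-05-22": ["GET /c", "GET /a", "GET /b"],
}


def partition_value(path):
    """Extract the date from a folder name such as 'logs/dt=2014-05-21'."""
    return path.rsplit("dt=", 1)[1]


def scan(partitions, start, end):
    """Read only the folders whose date falls in the range [start, end]."""
    touched = []
    rows = []
    for path in sorted(partitions):
        # ISO dates compare correctly as plain strings.
        if start <= partition_value(path) <= end:
            touched.append(path)
            rows.extend(partitions[path])
    return touched, rows


touched, rows = scan(partitions, "2014-05-21", "2014-05-22")
print(touched)    # folders that were actually read
print(len(rows))  # number of rows read from those folders
```

The folder for 2014-05-20 is never opened, which is the saving that partitioning provides over a full table scan.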

HBase works by storing data as key/value pairs. It supports four primary operations: put to add or update rows, scan to retrieve a range of cells, get to return the cells of a specified row, and delete to remove rows, columns, or column versions from a table. Versioning is available so that previous values of the data can be fetched (the history can be deleted from time to time, and the space is then reclaimed through HBase compactions). Although HBase includes tables, a schema is only required for tables and column families, not for columns. HBase also provides increment/counter functionality.
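The operations listed above can be illustrated with a toy in-memory table in Python. This is a sketch under stated assumptions, not the HBase client API: the class TinyTable, its method signatures, the "family:column" naming, and the logical clock used for timestamps are all hypothetical.

```python
import itertools


class TinyTable:
    """Toy in-memory table; a hypothetical stand-in, not the HBase client API."""

    def __init__(self):
        # row key -> column name -> list of (timestamp, value), newest last
        self.rows = {}
        self._clock = itertools.count(1)  # logical timestamps

    def put(self, row_key, column, value):
        """Add or update a cell by appending a new version."""
        versions = self.rows.setdefault(row_key, {}).setdefault(column, [])
        versions.append((next(self._clock), value))

    def get(self, row_key, column, versions=1):
        """Return up to `versions` most recent values, newest first."""
        history = self.rows.get(row_key, {}).get(column, [])
        return [value for _, value in reversed(history)][:versions]

    def scan(self, start_row, stop_row):
        """Yield (row key, latest values) for start_row <= key < stop_row."""
        for row_key in sorted(self.rows):
            if start_row <= row_key < stop_row:
                latest = {c: v[-1][1] for c, v in self.rows[row_key].items()}
                yield row_key, latest

    def delete(self, row_key, column=None):
        """Delete a whole row, or only one column of it."""
        if column is None:
            self.rows.pop(row_key, None)
        else:
            self.rows.get(row_key, {}).pop(column, None)

    def increment(self, row_key, column, amount=1):
        """Counter: add `amount` to the latest value (0 if absent)."""
        current = self.get(row_key, column)
        new_value = (current[0] if current else 0) + amount
        self.put(row_key, column, new_value)
        return new_value


table = TinyTable()
table.put("user1", "info:name", "Ann")
table.put("user1", "info:name", "Anna")   # a second version of the same cell
table.put("user2", "info:name", "Bob")
table.increment("user1", "stats:likes")
table.increment("user1", "stats:likes", 2)
print(table.get("user1", "info:name", versions=2))
```

Keeping every version in a list mirrors the point that older values stay readable until they are cleaned up; a real store would reclaim that space during compaction.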

3. Limitations

Hive does not currently support update operations. In addition, because Hive runs batch jobs on Hadoop, a query usually takes minutes to hours to return results. Hive also requires a predefined schema that maps files and directories to columns, and it is not ACID compliant.

HBase queries are written in a custom language that has to be learned. SQL-like functionality can be achieved through Apache Phoenix, but at the cost of having to provide a schema. In addition, HBase is not fully ACID compliant, although it does support certain properties. Last but not least, running HBase requires ZooKeeper, a service for distributed coordination that provides configuration management, maintenance of meta-information, and naming services.

4. Application Scenarios

Hive is ideal for analytical queries over data collected over a period of time, for example to calculate trends or to analyze website logs. Hive should not be used for real-time queries, because it takes a long time to return results.
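The kind of batch aggregation meant here can be sketched in Python as a full pass over log lines that groups them by day, roughly what a GROUP BY query over a log table would compute. The log format, the sample lines, and the function hits_per_day are assumptions made up for this example.

```python
from collections import Counter

# Made-up log lines in the form "<date> <path>".
log_lines = [
    "2014-05-20 /home",
    "2014-05-20 /about",
    "2014-05-21 /home",
    "2014-05-22 /home",
    "2014-05-22 /home",
    "2014-05-22 /contact",
]


def hits_per_day(lines):
    """Full scan over every line, grouping by the date field."""
    counts = Counter()
    for line in lines:
        day, _path = line.split(" ", 1)
        counts[day] += 1
    return dict(sorted(counts.items()))


trend = hits_per_day(log_lines)
print(trend)
```

Every line is read before any result is available, which is why this style of query suits periodic reporting rather than real-time lookups.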

HBase is ideal for real-time queries on big data. Facebook uses HBase for messaging and real-time analytics. It may also be used to count Facebook likes.

5. Summary

Hive and HBase are two different Hadoop-based technologies: Hive is a SQL-like engine that runs MapReduce jobs, and HBase is a NoSQL key/value database on top of Hadoop. Of course, the two can be used together. Just as you can use Google for search and Facebook for social networking, Hive can be used for analytical queries while HBase is used for real-time queries. Data can also be written from Hive to HBase and written back from HBase to Hive.
