What is the difference between hbase and hive?

Last Update:2014-08-21 Source: Internet

Author: User

Tags hadoop mapreduce

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Hive was created to simplify mapreduce programming. Anyone who has used mapreduce for data analysis knows that many analysis programs have the same procedure except for different business logic. In this case, you need to use APIs such as hive. Hive itself does not store and compute data. It relies entirely on HDFS and mapreduce. The pure table logic in hive is the definition of some tables, that is, the table metadata. Using SQL to implement hive is because SQL is familiar to everyone and the conversion cost is low. Pig, which works similarly, is not SQL.

Hbase is generated for query. It organizes the memory of all machines in the node and provides a large memory hash table. It needs to organize its own data structure, this includes disks and In-memory, while hive does not. In hbase, a table is a physical table rather than a logical table. Search engines use it to store indexes, to meet the real-time query requirements.

Hive is similar to cloudbase and is a set of software that provides data warehouse SQL functions based on hadoop distributed computing platform. This simplifies ad-hoc queries by aggregating massive data stored in hadoop. Hive provides a set of Ql query languages, which are SQL-based and easy to use.

Hbase is a distributed column-based non-relational database. The query efficiency of hbase is very high, mainly because of the query and display results.

Hive is a Distributed Relational Database. It is mainly used for parallel distributed processing of large amounts of data. All queries in hive except "select * from table;" must be executed in map \ reduce mode. Because map \ reduce is required, it may take 8 to 9 seconds to query a table with only one row and one column, if it is not queried using the select * from table; method. However, hive is better at processing large amounts of data. When there is a lot of data to be processed and the hadoop cluster has enough capacity, it can reflect its advantages.

Hive and hbase can be integrated through the hive storage interface.

Further summary:

1. hive is an SQL language. It operates HDFS file systems through databases. To simplify programming, the underlying computing mode is mapreduce.

2. hive is a row-oriented database.

3. Hive itself does not store and compute data. It relies entirely on the pure table logic in HDFS, mapreduce, and hive.

4. hbase is generated for query. It organizes the memory of all machines in the node and provides a large memory hash table.

5. hbase is not a relational database, but a column-Oriented Distributed Database developed on HDFS. It does not support SQL.

6. hbase is a physical table, not a logical table. It provides a large memory hash table through which search engines store indexes to facilitate query operations.

7. hbase is a column store.

Note:Hive is only for maintenance. It is very slow to check!

This is because the underlying layer is to be computed through mapreduce distributed computing, and the underlying layers of hbase, hive, and pig are all like this. However, hadoop is still relatively fast, because it is used for massive data storage and distributed computing, and this speed is already very good.

Let's look at what hbase is. hbase is very special and has many framework location services.

Hive and hbase have different features: hive is highly-delayed, structured, and analysis-oriented, while hbase is low-latency, unstructured, and programming-oriented. Hive Data Warehouses have high latency on hadoop.

Hbase is located at the structured storage layer, and hadoop HDFS provides hbase with high-reliability underlying storage support. hadoop mapreduce provides hbase with high-performance computing capabilities, zookeeper provides stable services and Failover mechanisms for hbase.

In addition, pig and hive provide high-level language support for hbase, making data statistics processing on hbase very simple. Sqoop provides a convenient RDBMS data import function for hbase, making it very convenient for traditional database data to be migrated to hbase.

Now let's look at the image version Resolution:

I think that before asking for the difference, I should show the same points. I am confused when I think about it. hive and hbase seem to be different from each other. Since hive and hbase are different from each other, what is the difference between them.
So what is hive?
Hive can be considered as a package of Map-reduce. Hive converts a well-written hive SQL into a complex and hard-to-write map-Reduce program.
So what is hbase?
Similarly, hbase can be considered as a package of HDFS. Its essence is data storage, which is a nosql database. hbase is deployed on HDFS and overcomes the disadvantages of HDFS in random read/write.
To ask the difference between hive and hbase, we should ask the difference between map-Reduce and HDFS. To ask the difference, we must first talk about the differences between them.
So where did you say map-Reduce and HDFS look like?

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More