
The Hadoop ecosystem: how Hive, Pig, and HBase relate and how they differ

Last Update:2014-12-22 Source: Internet

Author: User


People who work with Hadoop are often confused by the open-source projects that have grown up around it, and Hive, Pig, and HBase are among the most confusing. A typical beginner's question is: when should I use HBase, and when should I use Hive? No problem. Here I will help you sort out the principles and ideas behind each technology.

Pig is a lightweight scripting language for Hadoop. It was originally introduced by Yahoo but is now in decline: as Yahoo slowly pulled out of Pig maintenance, the project was contributed to the open-source community and is now maintained by volunteers. Some companies still use it, but I think Hive is a better choice than Pig. :)

Pig is a data flow language for processing huge amounts of data quickly and easily.

It has two parts: the Pig interface and the Pig Latin language.

Pig can work with HDFS and HBase data very conveniently. Like Hive, Pig processes this data efficiently, and writing Pig queries directly saves a lot of labor and time. Use Pig when you want to transform your data and do not want to write MapReduce jobs.
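The data flow style described above can be sketched in plain Python. This is not Pig Latin and uses no Pig API. The sample records and the stage names (`load`, `keep_errors`, `count_by_host`) are hypothetical, chosen only to mirror the load, filter, group, and aggregate steps that a data flow script chains together.

```python
from collections import Counter

# Hypothetical sample input standing in for lines of a log file on HDFS.
RAW_LINES = [
    "host1 200",
    "host2 500",
    "host1 500",
    "host3 200",
    "host2 500",
]


def load(lines):
    """Load stage: parse each line into a (host, status) record."""
    for line in lines:
        host, status = line.split()
        yield host, int(status)


def keep_errors(records):
    """Filter stage: keep only records with a 5xx status."""
    return (record for record in records if record[1] >= 500)


def count_by_host(records):
    """Group-and-aggregate stage: count the remaining records per host."""
    return Counter(host for host, _ in records)


# Each stage consumes the output of the previous one, like a pipeline.
error_counts = count_by_host(keep_errors(load(RAW_LINES)))
print(dict(error_counts))
```

Each stage is a small transformation over a stream of records, which is the idea behind describing a job as a flow of data rather than as hand-written map and reduce functions.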

People who do not want to develop MapReduce jobs in a programming language, such as DBAs who are familiar with SQL, can use Hive for offline data processing and analysis.

Note that Hive is suited to offline data operations. It is not suitable for real-time online queries or operations in a real production environment, for one reason: it is slow.

Hive originated at Facebook and plays the role of the data warehouse in Hadoop. It sits on top of a Hadoop cluster and provides a SQL-like interface to the data stored there. You can use HiveQL to run select, join, and similar operations.

If you need a data warehouse, are good at writing SQL, and do not want to write MapReduce jobs, you can use Hive.
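To give a feel for the kind of SQL-style select and join mentioned above, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in. It does not connect to Hive, and HiveQL is similar to but not identical with SQLite's dialect. The `users` and `orders` tables and their contents are hypothetical.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a warehouse table store.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (user_id INTEGER, country TEXT);
    CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'CN'), (2, 'US'), (3, 'CN');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 2, 5.0),
                              (12, 3, 7.5), (13, 1, 2.5);
    """
)

# A select with a join and an aggregate: total order amount per country.
query = """
    SELECT u.country, SUM(o.amount) AS total
    FROM orders o
    JOIN users u ON o.user_id = u.user_id
    GROUP BY u.country
    ORDER BY u.country
"""
totals = conn.execute(query).fetchall()
print(totals)
conn.close()
```

The point of the sketch is the shape of the query: the analyst writes a declarative statement and never writes a map or reduce function by hand.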

HBase is a column-oriented database that runs on top of HDFS. HDFS lacks immediate read and write operations, and that is the gap HBase fills. HBase is modeled on Google's BigTable and stores data as key-value pairs. The goal of the project is to quickly locate and access the data you need in tables that hold billions of rows.

HBase is a database, specifically a NoSQL database, and like any other database it provides immediate read and write access. Hadoop cannot meet real-time needs; HBase can. If you need to access some data in real time, put it in HBase.
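The key-value model described above can be illustrated with a toy in-memory table. This is a sketch only: `TinyTable`, its methods, and the sample row keys are hypothetical and are not the HBase client API. It assumes the usual picture of a row key that maps to a set of `family:qualifier` columns, with row keys kept in sorted order.

```python
import bisect


class TinyTable:
    """Toy in-memory table: row key -> {"family:qualifier": value}."""

    def __init__(self):
        self._rows = {}
        self._sorted_keys = []  # row keys kept in sorted order

    def put(self, row_key, column, value):
        """Write one cell, creating the row if it does not exist yet."""
        if row_key not in self._rows:
            bisect.insort(self._sorted_keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Read one row directly by its key."""
        return self._rows.get(row_key, {})

    def scan_prefix(self, prefix):
        """Yield rows whose key starts with prefix, in key order."""
        start = bisect.bisect_left(self._sorted_keys, prefix)
        for key in self._sorted_keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._rows[key]


table = TinyTable()
table.put("user#0002", "info:name", "Bob")
table.put("user#0001", "info:name", "Alice")
table.put("user#0001", "info:city", "Hangzhou")
table.put("order#0001", "info:total", "30")

row = table.get("user#0001")
user_keys = [key for key, _ in table.scan_prefix("user#")]
print(row, user_keys)
```

Reads and writes address a single row by key, so the cost of one lookup does not depend on scanning the rest of the table.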

You can use Hadoop as a static data warehouse and HBase as the store for data that some operations will change.

Hive is better suited to data warehouse tasks: it is mainly used for data with a static structure that needs frequent analysis. Hive's similarity to SQL makes it an ideal meeting point between Hadoop and other BI tools.

Pig gives developers more flexibility in the big data area and lets them write concise scripts that transform data flows and can be embedded in larger applications.

Pig is relatively lightweight compared with Hive, and its main advantage is that it needs far less code than using the Hadoop Java APIs directly. Because of this, Pig still attracts a large number of software developers.

Hive and Pig can both be used in combination with HBase. They provide high-level language support for HBase, which makes computing statistics over HBase data very simple.

Hive is a batch system built on top of Hadoop to reduce the work of writing MapReduce jobs. HBase is a project designed to make up for Hadoop's shortcomings in real-time operations.

Imagine you are working with a relational database (RDBMS): Hive + Hadoop corresponds to a full table scan, and HBase + Hadoop corresponds to indexed access.
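The analogy can be made concrete with a small sketch that counts how many rows each access path has to examine. The data set, the `full_scan` and `indexed_lookup` helpers, and the cost counter are hypothetical and only illustrate the contrast; they say nothing about actual Hive or HBase internals.

```python
# Hypothetical table of 10,000 (key, value) rows.
rows = [(f"user{i:05d}", i * 2) for i in range(10_000)]


def full_scan(table_rows, wanted_key):
    """Check rows one by one, as a full table scan would."""
    examined = 0
    for key, value in table_rows:
        examined += 1
        if key == wanted_key:
            return value, examined
    return None, examined


# Build an index once: key -> value.
index = dict(rows)


def indexed_lookup(table_index, wanted_key):
    """Jump straight to the row through the index: one probe."""
    return table_index.get(wanted_key), 1


scan_value, scan_cost = full_scan(rows, "user09999")
lookup_value, lookup_cost = indexed_lookup(index, "user09999")
print(scan_cost, lookup_cost)
```

Both paths return the same value, but the scan touches every row before it while the indexed path touches one entry, which is why the scan style fits batch analysis and the indexed style fits real-time access.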

Hive queries run as MapReduce jobs and can take anywhere from more than 5 minutes to several hours. HBase is very efficient for this kind of access, certainly far more so than Hive.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
