Relationships and differences among several Hadoop ecosystem technologies: Hive, Pig, and HBase

Source: Internet
Author: User

Anyone who works with Hadoop is bound to be confused by the many open-source projects that live within its ecosystem. I promise that open-source technologies such as Hive, Pig, and HBase will confuse you at some point, and you are not the only one: one newcomer, for example, posted the question "When should I use HBase, and when should I use Hive?" Any advice? ^_^ No problem. Here I will help you get clear on the principles and ideas behind each technology.

Pig is a lightweight scripting language used on Hadoop. It was originally introduced by Yahoo!, but it is now in decline. As Yahoo! gradually withdrew from maintaining Pig, the project was handed over to its open-source community of enthusiasts. Some companies still use it, but I think Hive is a better choice than Pig. :)

Pig is a dataflow language for processing very large amounts of data quickly and easily.

It consists of two parts: the Pig interface and the Pig Latin language.

Pig can process data stored in HDFS and HBase very conveniently. Like Hive, Pig handles this data efficiently, and querying directly with Pig can save a lot of labor and time. Use Pig when you want to transform your data but do not want to write MapReduce jobs.
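To give a feel for Pig's dataflow style, here is a minimal Python sketch that mimics a typical Pig Latin pipeline (load, filter, group, count) over in-memory records. The log records and field names are hypothetical, and the Pig Latin in the comments is only a rough equivalent; real Pig compiles such statements into MapReduce jobs instead of running Python.

```python
from collections import defaultdict
from typing import Iterable

# Hypothetical input: tab-separated "user<TAB>status" log lines.
RAW_LINES = [
    "alice\t200",
    "bob\t404",
    "alice\t200",
    "carol\t200",
    "bob\t200",
]


def load(lines: Iterable[str]) -> list[tuple[str, int]]:
    # Roughly: logs = LOAD 'logs' USING PigStorage('\t') AS (user:chararray, status:int);
    records = []
    for line in lines:
        user, status = line.split("\t")
        records.append((user, int(status)))
    return records


def filter_ok(records: list[tuple[str, int]]) -> list[tuple[str, int]]:
    # Roughly: ok = FILTER logs BY status == 200;
    return [r for r in records if r[1] == 200]


def group_by_user(records: list[tuple[str, int]]) -> dict[str, list[tuple[str, int]]]:
    # Roughly: by_user = GROUP ok BY user;
    groups: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for user, status in records:
        groups[user].append((user, status))
    return groups


def count_per_group(groups: dict[str, list[tuple[str, int]]]) -> list[tuple[str, int]]:
    # Roughly: counts = FOREACH by_user GENERATE group, COUNT(ok);
    return sorted((user, len(rows)) for user, rows in groups.items())


counts = count_per_group(group_by_user(filter_ok(load(RAW_LINES))))
print(counts)  # [('alice', 2), ('bob', 1), ('carol', 1)]
```

Each step takes the previous step's output as its input, which is the essence of a dataflow language: you describe a chain of transformations and leave the parallel execution to the engine.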

People who don't want to develop MapReduce programs in a general-purpose programming language, such as DBAs and others who are familiar with SQL, can use Hive for offline data processing and analysis.

Note that Hive is currently suited to offline data operations; it is not suitable for real-time online queries or operations in a live production environment because, in a word, it is slow.

Hive originated at Facebook and plays the role of a data warehouse in Hadoop. It is built on top of a Hadoop cluster and provides a SQL-like interface for operating on the data stored in that cluster: you can use HiveQL to run SELECT, JOIN, and similar statements.
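HiveQL reads much like standard SQL. As a rough illustration only, the sketch below runs a SELECT with a JOIN and GROUP BY through Python's built-in `sqlite3` module. SQLite is a stand-in here, not Hive, and the `page_views` and `users` tables are hypothetical; in Hive, a statement of the same shape would be compiled into jobs over data stored in HDFS.

```python
import sqlite3

# SQLite stands in for Hive purely to show the SQL-style query shape.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables: page views and user profiles.
cur.execute("CREATE TABLE page_views (user_id INTEGER, url TEXT)")
cur.execute("CREATE TABLE users (user_id INTEGER, country TEXT)")
cur.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [(1, "/home"), (1, "/about"), (2, "/home"), (3, "/home")],
)
cur.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "CN"), (2, "US"), (3, "CN")],
)

# SELECT + JOIN + GROUP BY: the kind of statement HiveQL also supports.
query = """
SELECT u.country, COUNT(*) AS views
FROM page_views pv
JOIN users u ON pv.user_id = u.user_id
GROUP BY u.country
ORDER BY u.country
"""
rows = cur.execute(query).fetchall()
print(rows)  # [('CN', 3), ('US', 1)]
conn.close()
```

The point is that an analyst writes a declarative query like this instead of hand-coding the join and aggregation as MapReduce jobs.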

If you have data warehousing needs, are good at writing SQL, and don't want to write MapReduce jobs, you can use Hive instead.

HBase is a column-oriented database that runs on top of HDFS. HDFS lacks immediate read and write operations, and that is the gap HBase fills. HBase is modeled on Google's BigTable and stores data as key-value pairs. The goal of the project is to quickly locate and access the required data among billions of rows.
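The data model described above can be pictured as a map from a sorted row key to "family:qualifier" columns. The following is a minimal, hypothetical in-memory sketch of that idea in Python; the `ToyTable` class and its methods are illustrative names, not the HBase client API, and real HBase also versions each cell by timestamp and spreads rows across region servers.

```python
import bisect


class ToyTable:
    """Hypothetical in-memory model of an HBase-style table (not the HBase API).

    Row keys are kept sorted; each row maps "family:qualifier" column names
    to values.
    """

    def __init__(self) -> None:
        self._keys: list[str] = []            # sorted row keys
        self._rows: dict[str, dict] = {}      # row key -> {column: value}

    def put(self, row_key: str, column: str, value) -> None:
        # Insert a new row key in sorted position, then set the cell.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key: str) -> dict:
        # Direct lookup by row key; no other rows are touched.
        return dict(self._rows.get(row_key, {}))

    def scan(self, start_key: str, stop_key: str) -> list[tuple[str, dict]]:
        # Range scan over [start_key, stop_key), relying on sorted row keys.
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return [(k, dict(self._rows[k])) for k in self._keys[lo:hi]]


table = ToyTable()
table.put("user#0001", "info:name", "alice")
table.put("user#0001", "stats:logins", 42)
table.put("user#0002", "info:name", "bob")
table.put("user#0003", "info:name", "carol")

print(table.get("user#0001"))  # {'info:name': 'alice', 'stats:logins': 42}
print([k for k, _ in table.scan("user#0001", "user#0003")])  # ['user#0001', 'user#0002']
```

Because the keys stay sorted, both a point lookup and a range of adjacent rows can be reached without reading the whole table, which is what makes real-time access practical.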

HBase is a database, specifically a NoSQL database, and like other databases it provides immediate read and write capabilities. Hadoop cannot meet real-time needs; HBase can. If you need to access some data in real time, put it in HBase.

You can use Hadoop as a static data warehouse and HBase as the data store for data that will be changed by ongoing operations.

Hive is better suited to data warehouse tasks; it is mainly used for statically structured data and for work that requires frequent analysis. Hive's similarity to SQL makes it an ideal meeting point between Hadoop and other BI tools.

Pig gives developers more flexibility in the big data space and lets them write concise scripts that transform data flows and can be embedded into larger applications.

Compared with Hive, Pig is relatively lightweight. Its main advantage is that it significantly reduces the amount of code compared with using the Hadoop Java APIs directly. Because of this, Pig still attracts a large number of software developers.

Both Hive and Pig can be used in combination with HBase. They also provide high-level language support for HBase, which makes statistical processing of HBase data very simple.

Hive is a batch-processing system built on top of Hadoop to reduce the work of writing MapReduce jobs. HBase is a project designed to make up for Hadoop's shortcomings in real-time operations.
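To see what kind of work Hive saves you from writing, consider a query such as `SELECT country, COUNT(*) FROM visits GROUP BY country` over a hypothetical `visits` table. Conceptually it is executed as a map phase, a shuffle, and a reduce phase. The single-process Python sketch below only illustrates those phases; it is an assumption-laden simplification, not Hive's query planner or the Hadoop Java API.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input split into two "blocks", as a file in HDFS might be.
SPLITS = [
    ["CN", "US", "CN"],
    ["US", "CN", "DE"],
]


def map_phase(split: list[str]) -> list[tuple[str, int]]:
    # Mapper: emit (key, 1) for each record.
    return [(country, 1) for country in split]


def shuffle(mapped: list[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle/sort: bring all values for the same key together, ordered by key.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    # Reducer: aggregate each key's values (the COUNT(*) of the GROUP BY).
    return {key: sum(values) for key, values in groups.items()}


mapped = list(chain.from_iterable(map_phase(s) for s in SPLITS))
result = reduce_phase(shuffle(mapped))
print(result)  # {'CN': 3, 'DE': 1, 'US': 2}
```

On a real cluster each split is mapped on a different node and the shuffle moves data over the network, which is one reason these batch jobs take minutes rather than milliseconds.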

Imagine you are working with a relational database (RDBMS): Hive + Hadoop is like a full table scan, while HBase + Hadoop is like indexed access.
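The analogy can be made concrete by counting how many rows each access pattern examines. In this hypothetical sketch, a linear scan plays the role of a full table scan, and a binary search over sorted keys stands in for an index lookup (HBase's sorted row keys give it a comparable ability to jump straight to a row). Neither function is how Hive or HBase is actually implemented.

```python
# Hypothetical table of 100,000 row keys; zero-padding keeps them sorted.
ROW_KEYS = [f"row{i:06d}" for i in range(100_000)]


def full_scan(keys: list[str], target: str) -> tuple[bool, int]:
    """Examine rows one by one, as a full table scan does."""
    examined = 0
    for key in keys:
        examined += 1
        if key == target:
            return True, examined
    return False, examined


def indexed_lookup(keys: list[str], target: str) -> tuple[bool, int]:
    """Binary search over sorted keys, standing in for an index lookup."""
    lo, hi, examined = 0, len(keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        examined += 1
        if keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(keys) and keys[lo] == target
    return found, examined


target = "row099999"
print(full_scan(ROW_KEYS, target))  # (True, 100000)
print(indexed_lookup(ROW_KEYS, target))
```

The scan's cost grows linearly with the number of rows, while the indexed lookup needs at most about log2(n) steps, which is the intuition behind the batch-versus-real-time split between Hive and HBase.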

A Hive query runs as MapReduce jobs, which can take anywhere from 5 minutes to several hours or more. HBase is very efficient, certainly far more efficient than Hive.
