Hadoop ecological hive, pig, hbase relationship and difference

Source: Internet
Author: User
Keywords Can Difference Real-time if
Tags access aliyun data data warehouse difference get get you hadoop

Hadoop technology friends will certainly be confused about its system under the parasitic open-source projects confused, and I promise Hive, Pig, http://www.aliyun.com/zixun/aggregation/13713.html "> HBase these open source Technology will engage you some confused, do not confused more than just one, such as a rookie post doubt, when to use Hbase and when to use Hive? .... Consult ^ _ ^ No problem here, I help you manage Clear each technology principles and ideas.

Pig

A lightweight scripting language that had been used by hadoop, originally introduced by Yahoo Inc., but now is on the decline. When Yahoo slowly pulled out of pig maintenance, it contributed to its open source community by all hobbyists. However, some companies are still in use, but I think it is better to use hive than pig. :)

Pig is a data flow language used to quickly and easily manipulate huge amounts of data.

Pig consists of two parts: Pig Interface, Pig Latin.

Pig can handle HDFS and HBase data very easily. Like Hive, Pig can handle its needs very efficiently and can save a lot of labor and time by directly manipulating Pig queries. When you want to do some conversion on your data, and you do not want to write MapReduce jobs you can use Pig.

Hive

Do not want to use the programming language to develop MapReduce friends such as DB who are familiar with SQL friends can use Hive off-line for data processing and analysis.

Note that Hive is now suitable for offline operation of data, that is not suitable for real-time online query or operation in a real production environment, because the word "slow." in contrast

Originated in FaceBook, Hive plays the role of data warehouse in Hadoop. Built on the top level of a Hadoop cluster, it operates on a SQL-like interface to the data stored on the Hadoop cluster. You can use HiveQL select, join, and so on.

If you have a data warehouse needs and you are good at writing SQL and do not want to write MapReduce jobs you can use Hive instead.

HBase

HBase runs as a column-oriented database on top of HDFS, and HDFS lacks immediate read and write operations, which is what HBase does. HBase is Google BigTable-based, stored as key-value pairs. The goal of the project is to quickly locate and access the data needed in billions of rows of data in the host.

HBase is a database, a NoSql database, providing read and write capabilities like any other database, Hadoop can not meet real-time needs, HBase can meet. If you need to access some data in real time, put it in HBase.

You can use Hadoop as a static data warehouse, HBase as data storage, put some data that will change some operations.

Pig VS Hive

Hive is more suitable for data warehouse tasks, Hive is mainly used for static structure and the need to constantly analyze the work. Hive's similarity to SQL makes it an ideal intersection of Hadoop and other BI tools.

Pig gives developers more flexibility in the big data set and allows for the development of concise scripts for transforming data streams for embedding into larger applications.

Pig is relatively lightweight compared to Hive and has the main advantage of drastically reducing the amount of code compared to using Hadoop Java APIs directly. Because of this, Pig is still attracting a large number of software developers.

Both Hive and Pig can be used in combination with HBase, and Hive and Pig also provide HBase with high-level language support that makes it easy to count data on HBase

Hive VS HBase

Hive is a batch system built on top of Hadoop to reduce the amount of work that MapReduce jobs is written in. HBase is designed to support projects that make up for the shortcomings of Hadoop's real-time operations.

Imagine you are operating the RMDB database, Hive + Hadoop if it is a full table scan, HBase + Hadoop if it is indexed access.

Hive query is MapReduce jobs can be more than 5 minutes to several hours, HBase is very efficient, certainly more efficient than Hive.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.