Hadoop technology friends will certainly be confused about its system under the parasitic open-source projects confused, and I promise Hive, Pig, http://www.aliyun.com/zixun/aggregation/13713.html "> HBase these open source Technology will engage you some confused, do not confused more than just one, such as a rookie post doubt, when to use Hbase and when to use Hive? .... Consult ^ _ ^ No problem here, I help you manage Clear each technology principles and ideas.
A lightweight scripting language that had been used by hadoop, originally introduced by Yahoo Inc., but now is on the decline. When Yahoo slowly pulled out of pig maintenance, it contributed to its open source community by all hobbyists. However, some companies are still in use, but I think it is better to use hive than pig. :)
Is a data flow language used to quickly and easily handle huge amounts of data.
Contains two parts: Pig Interface, Pig Latin.
HFS can handle HDFS and HBase data very conveniently. Just like Hive, Pig can handle it very efficiently and can save a lot of labor and time by directly manipulating Pig query. When you want to do some conversion on your data, and do not want to write MapReduce jobs can be used
Do not want to use the programming language to develop MapReduce friends such as DB who are familiar with SQL friends can use Hive off-line for data processing and analysis.
Note that Hive is now suitable for offline operation of data, that is not suitable for real-time online query or operation in a real production environment, because the word "slow." in contrast
Originated in FaceBook, Hive plays the role of data warehouse in Hadoop. Built on the top level of a Hadoop cluster, it operates on a SQL-like interface to the data stored on the Hadoop cluster. You can use HiveQL select, join, and so on.
If you have a data warehouse needs and you are good at writing SQL and do not want to write MapReduce jobs you can use Hive instead.
As a column-oriented database running on top of HDFS, HDFS lacks immediate read and write operations, which is what HBase does. HBase is Google BigTable-based, stored as key-value pairs. The goal of the project is to quickly locate and access the data needed in billions of rows of data in the host.
Is a database, a NoSql database, like any other database to provide instant read and write capabilities, Hadoop can not meet real-time needs, HBase is to meet. If you need to access some data in real time, put it in HBase.
You can use Hadoop as a static data warehouse, HBase as data storage, put some data that will change some operations.
More suitable for the data warehouse tasks, Hive is mainly used for static structure and the need for frequent analysis of the work. Hive's similarity to SQL makes it an ideal intersection of Hadoop and other BI tools.
Give developers more flexibility in the area of big data and allow the development of concise scripts for transforming data streams for embedding into larger applications.
The main advantage over Hive is its relative strength in terms of significantly reducing the amount of code compared to using Hadoop Java APIs directly. Because of this, Pig is still attracting a large number of software developers.
And Pig can be used in combination with HBase, Hive and Pig also provide high-level language support for HBase, making statistics on HBase become very simple
Is a batch system built on top of Hadoop to reduce the amount of work that MapReduce jobs is written in. HBase is designed to support projects that make up for the shortcomings of Hadoop's real-time operations.
Imagine you are operating the RMDB database, Hive + Hadoop if it is a full table scan, HBase + Hadoop if it is indexed access.
It is MapReduce jobs can be more than 5 minutes to several hours, HBase is very efficient, certainly more efficient than Hive.