Is it a win or a draw? Pig vs Hive

Last Update:2015-06-19 Source: Internet

Author: User

Translated from: Http://www.aptibook.com/Articles/Pig-and-hive-advantages-disadvantages-features

This article discusses the characteristics of Pig and Hive.
Developers typically choose the technology within an ecosystem that best meets their business needs. In the Hadoop ecosystem, Pig and Hive are similar and can produce almost the same results, but which one is better suited to a particular business scenario? Below are some comparisons between Pig and Hive.

Pig and Hive:
Language type:
Pig is a procedural data flow language. Procedural programs are usually written step by step, and you can control and optimize each step.
Hive is more like SQL, so it is a declarative language: you specify what you need rather than how to compute it. Hive relies on its own optimizer, so tuning a query by hand is difficult.
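The procedural versus declarative distinction can be sketched without a Hadoop cluster. The example below is only an analogy in Python, not Pig Latin or HiveQL: the first function spells out each step the way a Pig script would, and the second hands a SQL statement to SQLite (standing in for Hive) and lets the engine choose the plan. The sample data, the column names, and both function names are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical sample data: (visitor, page, seconds_on_page)
VISITS = [
    ("alice", "home", 12),
    ("bob", "home", 3),
    ("alice", "docs", 40),
    ("carol", "docs", 7),
    ("bob", "docs", 25),
]


def procedural_totals(rows, min_seconds):
    """Pig-like style: every step is explicit and can be tuned on its own."""
    # Step 1: filter out short visits.
    filtered = [row for row in rows if row[2] >= min_seconds]
    # Step 2: group the remaining visits by page.
    grouped = {}
    for _visitor, page, seconds in filtered:
        grouped.setdefault(page, []).append(seconds)
    # Step 3: aggregate each group.
    return {page: sum(values) for page, values in grouped.items()}


def declarative_totals(rows, min_seconds):
    """Hive-like style: describe the result and let the engine plan the work."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (visitor TEXT, page TEXT, seconds INTEGER)")
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    cursor = conn.execute(
        "SELECT page, SUM(seconds) FROM visits WHERE seconds >= ? GROUP BY page",
        (min_seconds,),
    )
    result = dict(cursor.fetchall())
    conn.close()
    return result


if __name__ == "__main__":
    print(procedural_totals(VISITS, 10))
    print(declarative_totals(VISITS, 10))
```

Both functions return the same totals. The difference is where the control sits: in the first, the order of filtering, grouping, and aggregating is chosen by the author; in the second, it is left to the query engine.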
Ease of use:
Pig has a new and different syntax (Pig Latin) that takes extra time to learn.
Hive is more like SQL, so developers are more willing to use it.
General scenario:
Pig is recommended for program developers. The main reason is that it is computationally efficient; when a query has a large number of joins and filters, Pig is the more appropriate choice.
Hive is used more for analysis and follows data warehouse conventions on Hadoop. It is generally preferred for generating reports. If your queries have few joins and filters, continue to use Hive; for a query with many joins, Hive's performance may degrade.
Data types:
Pig can efficiently handle both structured and unstructured data.
Hive efficiently processes structured data.
Intermediate results:
Pig uses variables to represent data. When you want to keep an intermediate result, you can simply assign it to a variable and reference it later.
Hive uses tables to represent data, so storing intermediate results is harder: you need to create a table and insert into it from other tables. As a result, a complex query may need hundreds of lines of code.
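The difference in handling intermediate results can be illustrated with a small Python analogy (again, this is not Pig or Hive code). In the first function the intermediate result is just a variable; in the second, SQLite stands in for Hive and the intermediate result needs its own table that is created and then filled from another table. The data, the table names, and the function names are hypothetical.

```python
import sqlite3

# Hypothetical sample data: (customer, order_amount)
ORDERS = [("a", 120), ("b", 15), ("a", 60), ("c", 300), ("b", 80)]


def totals_with_variables(rows, threshold):
    """Pig-like style: the intermediate result is held in a plain variable."""
    large_orders = [row for row in rows if row[1] >= threshold]  # intermediate result
    totals = {}
    for customer, amount in large_orders:  # reuse the variable in a later step
        totals[customer] = totals.get(customer, 0) + amount
    return totals


def totals_with_tables(rows, threshold):
    """Hive-like style: the intermediate result needs its own table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    # The intermediate result is materialized: create a table, then insert
    # into it from the source table.
    conn.execute("CREATE TABLE large_orders (customer TEXT, amount INTEGER)")
    conn.execute(
        "INSERT INTO large_orders "
        "SELECT customer, amount FROM orders WHERE amount >= ?",
        (threshold,),
    )
    cursor = conn.execute(
        "SELECT customer, SUM(amount) FROM large_orders GROUP BY customer"
    )
    totals = dict(cursor.fetchall())
    conn.close()
    return totals


if __name__ == "__main__":
    print(totals_with_variables(ORDERS, 50))
    print(totals_with_tables(ORDERS, 50))
```

With only one intermediate step the table-based version is already longer; each extra step adds another table definition and another insert statement.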
Debugging:
Pig can be debugged using local mode.
Debugging Hive in its native mode is complex and time-consuming.
Extensibility:
Writing a UDF (user-defined function) in Pig is easy.
Writing a UDF in Hive is relatively troublesome.
Maintainability:
Pig is about the same as Hive.
Hive is relatively simple to maintain.
Durability:
In Pig, the value of a variable may not be preserved; you need to re-execute the Pig code each time you want to retrieve it.
In Hive, an external table still exists after you exit the current session, because the external table still points to the files in HDFS.
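A rough Python analogy for this durability difference: a value held in a variable disappears with the code that computed it and has to be recomputed, while a table backed by a file can be read again from a later session. Here an SQLite file stands in for an external table over HDFS files; the file name, the table name, and the function names are all made up for this sketch.

```python
import os
import sqlite3
import tempfile


def recompute_totals():
    """Pig-like: nothing is stored, so every caller reruns the computation."""
    visits = [("home", 12), ("docs", 40), ("docs", 25)]
    totals = {}
    for page, seconds in visits:
        totals[page] = totals.get(page, 0) + seconds
    return totals


def write_session(db_path, totals):
    """First session: store the result in a file-backed table, then exit."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS totals (page TEXT, seconds INTEGER)")
    conn.executemany("INSERT INTO totals VALUES (?, ?)", sorted(totals.items()))
    conn.commit()
    conn.close()


def read_session(db_path):
    """Later session: the table is still there because the file still exists."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT page, seconds FROM totals ORDER BY page").fetchall()
    conn.close()
    return rows


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "warehouse.db")  # hypothetical file name
        write_session(db_path, recompute_totals())
        return read_session(db_path)


if __name__ == "__main__":
    print(demo())
```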
Development time:
Pig development takes more time and depends more on the developer's familiarity with Pig.
Hive uses SQL statements, which take little development time.
Compatibility:
Pig is less compatible with an RDBMS because Pig's syntax is completely different.
Most SQL written for an RDBMS can be executed in Hive, and only a few statements need to be modified.
Data volume:
Pig handles big data very efficiently.
Hive sometimes has memory leaks and unreliable performance; however, some parameters can be adjusted to locate and work around these problems.
Backing from major companies:
Pig: Yahoo, Twitter, LinkedIn
Hive: Facebook

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
