Translated from: Http://www.aptibook.com/Articles/Pig-and-hive-advantages-disadvantages-features
This article discusses the characteristics of Pig and Hive.
Developers typically pick, from within a technology ecosystem, the tools that best meet their business needs. In the Hadoop ecosystem, Pig and Hive are similar and can produce almost the same results. So which one is better suited to a particular business scenario? Below is a point-by-point comparison of Pig and Hive.
Pig and Hive:
Language paradigm:
Pig is a procedural data-flow language. As in most programming languages, a Pig script is written step by step, so you can control and optimize each step.
Hive is SQL-like and therefore declarative: you specify what you want rather than how to compute it. Hive relies on its own optimizer, so it is difficult to hand-optimize individual steps in Hive.
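To make the difference concrete, here is a loose analogy in Python rather than in Pig Latin or HiveQL. The standard-library `sqlite3` module stands in for a SQL engine, and the table, column names, and data are hypothetical. The first function builds the result step by step, naming each intermediate relation the way a Pig script does; the second states the desired result in a single SQL query and leaves the execution plan to the engine's optimizer, as Hive does.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (user, url, seconds spent on the page)
VISITS = [
    ("alice", "/home", 30),
    ("bob", "/home", 5),
    ("alice", "/docs", 120),
    ("carol", "/docs", 45),
    ("bob", "/docs", 2),
]


def procedural_total_time(visits):
    """Data-flow style: each step is explicit and yields a named intermediate result."""
    # Step 1: drop very short visits (analogous to a Pig FILTER)
    long_visits = [v for v in visits if v[2] >= 10]
    # Step 2: group by URL (analogous to a Pig GROUP)
    by_url = defaultdict(list)
    for _user, url, seconds in long_visits:
        by_url[url].append(seconds)
    # Step 3: aggregate each group (analogous to FOREACH ... GENERATE SUM(...))
    return {url: sum(times) for url, times in sorted(by_url.items())}


def declarative_total_time(visits):
    """Declarative style: describe the result; the SQL engine decides how to compute it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (user TEXT, url TEXT, seconds INTEGER)")
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", visits)
    rows = conn.execute(
        "SELECT url, SUM(seconds) FROM visits "
        "WHERE seconds >= 10 GROUP BY url ORDER BY url"
    ).fetchall()
    conn.close()
    return dict(rows)


print(procedural_total_time(VISITS))   # {'/docs': 165, '/home': 30}
print(declarative_total_time(VISITS))  # {'/docs': 165, '/home': 30}
```

Both functions return the same totals. In the procedural version you could reorder or tune any step yourself; in the declarative version only the query text is yours to change.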
Ease of Use:
Pig has a new, unfamiliar syntax that takes extra time to learn.
Hive is SQL-like, so developers who already know SQL are generally more willing to use it.
Typical use cases:
Pig is recommended for programmers, mainly because it is computationally efficient; when a query involves a large number of joins and filters, Pig is the better fit.
Hive is used more for analytics and follows data warehouse conventions on top of Hadoop; it is generally the preferred choice for generating reports. Hive remains a good choice for queries with few joins and filters, but its performance may degrade for queries with many joins.
Data type:
Pig can efficiently handle both structured and unstructured data.
Hive is able to efficiently process structured data.
Intermediate results:
Pig represents data as variables, so an intermediate result can easily be saved in a variable and referenced later.
Hive represents data as tables, which makes intermediate results harder to keep: you need to create a table and populate it by inserting from other tables. As a result, a complex query may require hundreds of lines of code.
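The contrast in handling intermediate results can be sketched with a similar Python analogy, again using `sqlite3` as a stand-in for Hive; the table names and data are hypothetical. In the data-flow version the intermediate result is just a variable that later steps reuse. In the SQL version it must first be materialized as its own table, here with `CREATE TABLE ... AS SELECT` (HiveQL has comparable `CREATE TABLE ... AS SELECT` and `INSERT ... SELECT` statements).

```python
import sqlite3

# Hypothetical order data: (order_id, customer, amount)
ORDERS = [(1, "alice", 250), (2, "bob", 40), (3, "alice", 80), (4, "carol", 300)]

# Data-flow style: the intermediate result lives in a variable and is reused directly.
big_orders = [o for o in ORDERS if o[2] >= 100]
big_order_count = len(big_orders)
big_order_customers = sorted({o[1] for o in big_orders})

# Table style: the intermediate result must be stored as a table before reuse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
# Materialize the intermediate result as a new table populated from another table
conn.execute("CREATE TABLE big_orders AS SELECT * FROM orders WHERE amount >= 100")
sql_count = conn.execute("SELECT COUNT(*) FROM big_orders").fetchone()[0]
sql_customers = [
    row[0]
    for row in conn.execute("SELECT DISTINCT customer FROM big_orders ORDER BY customer")
]
conn.close()

print(big_order_count, big_order_customers)  # 2 ['alice', 'carol']
print(sql_count, sql_customers)              # 2 ['alice', 'carol']
```

Each additional intermediate step in the table style adds another CREATE/INSERT statement, which is how complex Hive scripts grow long.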
Debugging:
Pig can be debugged in local mode.
Debugging Hive in its native mode is complex and time-consuming.
Extensibility:
Writing UDFs (user-defined functions) in Pig is easy.
Writing UDFs in Hive is relatively cumbersome.
Maintainability:
Pig is roughly on par with Hive.
Hive is relatively simple to maintain.
Persistence:
Variable values in Pig are not necessarily preserved; each time, you may need to re-run the Pig code to recompute them.
In Hive, an external table still exists after you exit the current session, because it continues to point to files in HDFS.
Development time:
Pig development takes more time and depends heavily on the developer's familiarity with Pig.
Hive development takes little time, since queries are written as SQL statements.
Compatibility:
Pig has little compatibility with RDBMSs, because its syntax is completely different from SQL.
Most SQL written for an RDBMS can run in Hive; only a few statements need to be modified.
Data Volume:
Pig handles large data volumes very efficiently.
Hive sometimes suffers from memory leaks and unpredictable performance, although some of its configuration parameters can be tuned to locate and address these problems.
Backed by major companies:
Pig: Yahoo, Twitter, LinkedIn
Hive: Facebook
So is it a win or a draw? Pig vs Hive!