This series of articles analyzes Pig's main execution flow on Hadoop in order to explore the feasibility of running Pig Latin (or a Pig Latin-like language) on Spark.
Pig Overview
Apache Pig was developed at Yahoo! so that researchers and engineers could more easily process, analyze, and mine large-scale data. From the data-access point of view, if YARN is regarded as the operating system of big data, then Pig is an important member of the family of data applications that run on top of it.
Although Pig has a steeper learning curve than Hive, its advantages are expressiveness and flexibility. With Hive's declarative HQL, users state only what data they want; with Pig's procedural Pig Latin, users combine a series of statements and thereby control every step of the data analysis process.
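To make the procedural style concrete, here is a minimal Python sketch (not Pig itself) that mimics how a Pig Latin script builds its result: each statement produces a new, named relation from the previous one. The input data, field layout, and helper names are hypothetical; the comments only indicate which Pig Latin operator each step loosely corresponds to.

```python
from collections import defaultdict

# Hypothetical input relation: each tuple is (user, url, seconds).
# A relation is modelled here as a list of tuples (an outer bag).
visits = [
    ("alice", "a.com", 30),
    ("bob", "b.com", 5),
    ("alice", "c.com", 12),
    ("carol", "a.com", 40),
    ("bob", "a.com", 25),
]

# Step 1 (roughly FILTER): keep visits longer than 10 seconds.
long_visits = [t for t in visits if t[2] > 10]

# Step 2 (roughly GROUP ... BY user): each output tuple is (key, inner_bag),
# i.e. the grouped relation contains nested bags.
def group_by(relation, key_index):
    groups = defaultdict(list)
    for t in relation:
        groups[t[key_index]].append(t)
    return [(key, bag) for key, bag in groups.items()]

by_user = group_by(long_visits, 0)

# Step 3 (roughly FOREACH ... GENERATE with SUM): aggregate each inner bag.
totals = [(user, sum(t[2] for t in bag)) for user, bag in by_user]

# Step 4 (roughly ORDER ... BY total DESC).
ranked = sorted(totals, key=lambda t: t[1], reverse=True)
print(ranked)
```

In HQL the same result would typically be expressed as a single SELECT with GROUP BY and ORDER BY, leaving the execution steps to Hive; in the Pig style every intermediate relation is named explicitly, which is what gives the user step-by-step control.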
Pig Overall Process
Terminology
Term | Explanation | Notes
Pig Latin | Pig's data-flow processing language |
Loader/Store | Functions Pig uses to load and store data |
Schema | The data format specified when data is loaded | Pig data types are either scalar or complex. Scalar types largely correspond to Java's basic data types; complex types are tuple (an ordered set of fields), map (a set of key/value pairs), and bag (an unordered collection of tuples).
Relation | The data collection that Pig operates on | A bag of tuples (more precisely an outer bag, since bags can also be nested inside tuples as inner bags)
Logical Plan | Logical execution plan |
Physical Plan | Physical execution plan |
Optimizer | Plan optimizer | Rule-based logical optimizer
DAG | Directed acyclic graph |
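Pig's logical and physical plans can be viewed as DAGs of operators, and a plan can only be executed in an order where every operator runs after the operators that feed it. The following minimal Python sketch models a hypothetical logical plan as an adjacency list and derives such an order with Kahn's topological sort. The operator names and plan shape are illustrative assumptions, not Pig's internal classes (Pig itself is implemented in Java).

```python
from collections import deque

# Hypothetical logical plan: operator -> list of downstream operators.
# Edges point in the direction data flows (producer -> consumer).
logical_plan = {
    "LOAD visits": ["FILTER"],
    "LOAD users": ["JOIN"],
    "FILTER": ["JOIN"],
    "JOIN": ["GROUP"],
    "GROUP": ["FOREACH"],
    "FOREACH": ["STORE"],
    "STORE": [],
}

def topological_order(plan):
    """Kahn's algorithm: order operators so each producer precedes its consumers.
    Raises ValueError if the graph contains a cycle (i.e. it is not a DAG)."""
    indegree = {op: 0 for op in plan}
    for successors in plan.values():
        for succ in successors:
            indegree[succ] += 1
    ready = deque(op for op, d in indegree.items() if d == 0)
    order = []
    while ready:
        op = ready.popleft()
        order.append(op)
        for succ in plan[op]:
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)
    if len(order) != len(plan):
        raise ValueError("plan contains a cycle; not a DAG")
    return order

def is_dag(plan):
    """Return True if the plan has no cycles."""
    try:
        topological_order(plan)
        return True
    except ValueError:
        return False

print(topological_order(logical_plan))
```

A DAG rather than a simple chain is needed because operators such as JOIN consume several inputs; the acyclicity check is what guarantees that some valid execution order exists.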