Pig parses Pig Latin with an ANTLR-based parser and generates a logical execution plan. The logical plan corresponds essentially one-to-one to the operation steps in the Pig Latin script, arranged as a DAG.
The following code (based on the example in the Pig Latin paper from SIGMOD 2008) is a sample analysis that includes common operations such as LOAD, FILTER, JOIN, GROUP, FOREACH, the COUNT function, and STORE.
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

PigServer pigServer = new PigServer(ExecType.LOCAL);
pigServer.registerQuery("A = LOAD 'file1' AS (x, y, z);");
pigServer.registerQuery("B = LOAD 'file2' AS (t, u, v);");
pigServer.registerQuery("C = FILTER A BY y > 0;");
pigServer.registerQuery("D = JOIN C BY x, B BY u;");
pigServer.registerQuery("E = GROUP D BY z;");
pigServer.registerQuery("F = FOREACH E GENERATE group, COUNT(D);");
// Print the plans for alias F in dot format.
pigServer.explain("F", "dot", true, false, System.out, System.out, System.out);
pigServer.store("F", "output");
The resulting logical execution plan is as follows:
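The shape of that DAG can also be approximated in a few lines of Java. This is only a rough sketch and does not use Pig's own plan classes: `LogicalPlanSketch`, `Op`, and the `out` label for the store node are hypothetical names chosen for illustration. Each statement of the script becomes one node, and each alias it reads becomes an incoming edge.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LogicalPlanSketch {
    // Hypothetical operator node: the alias it defines, the kind of
    // operation, and the aliases it reads from (its incoming DAG edges).
    static final class Op {
        final String alias;
        final String kind;
        final List<String> inputs;

        Op(String alias, String kind, String... inputs) {
            this.alias = alias;
            this.kind = kind;
            this.inputs = Arrays.asList(inputs);
        }
    }

    // One node per statement of the example script, in script order.
    static Map<String, Op> buildPlan() {
        Map<String, Op> plan = new LinkedHashMap<>();
        for (Op op : Arrays.asList(
                new Op("A", "LOAD"),
                new Op("B", "LOAD"),
                new Op("C", "FILTER", "A"),
                new Op("D", "JOIN", "C", "B"),
                new Op("E", "GROUP", "D"),
                new Op("F", "FOREACH", "E"),
                new Op("out", "STORE", "F"))) { // "out" is an illustrative label
            plan.put(op.alias, op);
        }
        return plan;
    }

    // Lists every edge as "producer -> consumer".
    static List<String> edges(Map<String, Op> plan) {
        List<String> edges = new ArrayList<>();
        for (Op op : plan.values()) {
            for (String in : op.inputs) {
                edges.add(in + " -> " + op.alias);
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        for (String edge : edges(buildPlan())) {
            System.out.println(edge);
        }
    }
}
```

The JOIN node is the only one with two incoming edges, which is why the plan is a DAG and not a simple chain.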
Parse process
QueryLexer.g and QueryParser.g are the lexer and parser grammar files used by Pig Latin, respectively.
Pig uses the ANTLR-generated lexer and parser to validate user input (AstValidator), and it also does two other things at the same time (ANTLR details are not covered here):
(1) It embeds actions in the grammar file, adding Java code that processes expressions further.
(2) It uses ANTLR's abstract syntax tree rewrite syntax to turn the parsed user input into an abstract syntax tree, for example:
foreach_plan_complex : LEFT_CURLY nested_blk RIGHT_CURLY -> ^( FOREACH_PLAN_COMPLEX nested_blk )
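To make the effect of such a rewrite rule concrete, here is a minimal sketch in plain Java. It does not use the ANTLR runtime: the `Tree` class and the method `rewriteForeachPlanComplex` are hypothetical stand-ins, and the node text `NESTED_BLK` and `GENERATE` is made up for the example. The point it illustrates is that the curly-brace tokens are matched but dropped, and the imaginary token FOREACH_PLAN_COMPLEX becomes the root of the subtree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AstRewriteSketch {
    // Hypothetical minimal tree node; ANTLR's own tree classes are not used.
    static final class Tree {
        final String text;
        final List<Tree> children = new ArrayList<>();

        Tree(String text, Tree... kids) {
            this.text = text;
            children.addAll(Arrays.asList(kids));
        }

        // LISP-style rendering: (root child1 child2 ...)
        String toStringTree() {
            if (children.isEmpty()) {
                return text;
            }
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Tree child : children) {
                sb.append(' ').append(child.toStringTree());
            }
            return sb.append(')').toString();
        }
    }

    // Mimics: LEFT_CURLY nested_blk RIGHT_CURLY -> ^( FOREACH_PLAN_COMPLEX nested_blk )
    static Tree rewriteForeachPlanComplex(Tree leftCurly, Tree nestedBlk, Tree rightCurly) {
        // The braces were needed to recognise the input but carry no
        // meaning afterwards, so only nestedBlk is kept as a child.
        return new Tree("FOREACH_PLAN_COMPLEX", nestedBlk);
    }

    public static void main(String[] args) {
        Tree nested = new Tree("NESTED_BLK", new Tree("GENERATE"));
        Tree ast = rewriteForeachPlanComplex(new Tree("{"), nested, new Tree("}"));
        System.out.println(ast.toStringTree());
    }
}
```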
The following figure is a visual representation of the syntax rules for a JOIN statement:
The parse sequence diagram is as follows (omitting macro expansion, substitution of user REGISTER statements, and so on; QueryLexer, QueryParser, and AstValidator are all ANTLR-generated classes):
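The lexer, parser, validator ordering can be illustrated with a toy pipeline. This is a sketch under strong assumptions, not Pig's implementation: it handles only a single `alias = FILTER input BY condition;` statement, every class and method name in it is hypothetical, and the "undefined alias" check is just one plausible example of a rule that a validation pass over the tree might enforce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ParsePipelineSketch {
    // Hypothetical parse result for: alias = FILTER input BY condition;
    static final class FilterStmt {
        final String alias;
        final String input;
        final String condition;

        FilterStmt(String alias, String input, String condition) {
            this.alias = alias;
            this.input = input;
            this.condition = condition;
        }
    }

    // Stage 1 (lexer stand-in): isolate '=' and ';', then split on whitespace.
    // Conditions containing '=' are outside the scope of this toy.
    static List<String> lex(String statement) {
        String spaced = statement.replace("=", " = ").replace(";", " ; ");
        return new ArrayList<>(Arrays.asList(spaced.trim().split("\\s+")));
    }

    // Stage 2 (parser stand-in): check the token pattern and build the result.
    static FilterStmt parse(List<String> tokens) {
        int n = tokens.size();
        if (n < 7
                || !tokens.get(1).equals("=")
                || !tokens.get(2).equalsIgnoreCase("FILTER")
                || !tokens.get(4).equalsIgnoreCase("BY")
                || !tokens.get(n - 1).equals(";")) {
            throw new IllegalArgumentException("Syntax error: " + tokens);
        }
        String condition = String.join(" ", tokens.subList(5, n - 1));
        return new FilterStmt(tokens.get(0), tokens.get(3), condition);
    }

    // Stage 3 (validator stand-in): a semantic check on the parsed statement.
    static void validate(FilterStmt stmt, Set<String> definedAliases) {
        if (!definedAliases.contains(stmt.input)) {
            throw new IllegalStateException("Undefined alias: " + stmt.input);
        }
    }

    // Runs the three stages in order.
    static FilterStmt run(String statement, Set<String> definedAliases) {
        FilterStmt stmt = parse(lex(statement));
        validate(stmt, definedAliases);
        return stmt;
    }

    public static void main(String[] args) {
        Set<String> defined = new HashSet<>(Arrays.asList("A"));
        FilterStmt stmt = run("C = FILTER A BY y>0;", defined);
        System.out.println(stmt.alias + " filters " + stmt.input + " by " + stmt.condition);
    }
}
```

Keeping validation as a separate pass after parsing means syntax errors and semantic errors are reported by different stages, which mirrors the split between the parser and the validator named in the sequence above.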