Pig 0.9 supports Python as an embedding (host) language. It runs the script with the Jython interpreter, so Python 2.5 features are available. The top-level class of this interface is org.apache.pig.scripting.Pig.
The Python script first compiles a Pig Latin script, then binds variables defined in Python to it, and finally runs it:
1) Call Pig.compile or Pig.compileFromFile to pre-compile the Pig Latin code.
2) Call the bind method to bind variables in the control flow to parameters in the Pig Latin script. It returns a BoundScript object.
3) Call runSingle on the BoundScript object to execute it; this returns a PigStats object. If the script was bound to a list of maps of parameters, call run instead, which returns a list of PigStats objects, one per binding.
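The three steps above can be sketched as follows. This is a minimal illustration, not a definitive recipe: the Pig Latin script, the parameter names (input, output, min_score) and the helper names are all hypothetical. The Pig class is only importable when the file is run by Pig's own Jython (for example with `pig script.py`), so the import is guarded to keep the file loadable elsewhere.

```python
# Sketch of compile / bind / run. The script text, parameter names and
# helper names are assumptions made for illustration.
try:
    # Only available when Pig runs this file with its bundled Jython.
    from org.apache.pig.scripting import Pig
except ImportError:
    Pig = None

# Pig Latin with $-parameters that bind() fills in.
PIG_LATIN = """
raw = LOAD '$input' AS (name:chararray, score:int);
good = FILTER raw BY score >= $min_score;
STORE good INTO '$output';
"""


def make_params(input_path, output_path, min_score):
    """Build the parameter map handed to bind(); values are passed as strings."""
    return {"input": input_path,
            "output": output_path,
            "min_score": str(min_score)}


def run_once(params):
    """Steps 1-3 for a single binding; returns True if the job succeeded."""
    compiled = Pig.compile(PIG_LATIN)   # step 1: pre-compile
    bound = compiled.bind(params)       # step 2: returns a BoundScript
    stats = bound.runSingle()           # step 3: returns a PigStats
    return stats.isSuccessful()


def run_many(param_list):
    """Bind a list of parameter maps and run one job per map."""
    bound = Pig.compile(PIG_LATIN).bind(param_list)
    return [stats.isSuccessful() for stats in bound.run()]
```

Inside Pig, a call such as `run_once(make_params("scores.txt", "out", 3))` would launch one job, while `run_many` takes a list of such maps and launches one job per map.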
Pig constructs a separate instance of a user-written UDF in each map or reduce task and runs it there. Constructor parameters are one way of passing information to the UDF.
Correspondence between Pig types and Python types:
Pig type     Python type
int          number
long         number
float        number
double       number
chararray    string
bytearray    string
map          dictionary
tuple        tuple
bag          list of tuples
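To make the type correspondence concrete, here is a small sketch of two Python UDFs. The function names and schemas are hypothetical. When a file is registered with `using jython`, Pig supplies the outputSchema decorator; the fallback definition below is an assumption added only so the file also loads in a plain Python interpreter.

```python
# Sketch of Python UDFs that rely on the Pig-to-Python type mapping.
# Function names and schemas are made up for illustration.
try:
    outputSchema  # supplied by Pig when the file is registered "using jython"
except NameError:
    def outputSchema(schema):
        # Stand-in decorator for running outside Pig: just records the schema.
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


@outputSchema("total:long")
def sum_scores(bag):
    """A bag arrives as a list of tuples; add up the first field of each."""
    if bag is None:
        return None
    return sum(t[0] for t in bag if t[0] is not None)


@outputSchema("pairs:{t:(key:chararray, value:chararray)}")
def map_to_bag(m):
    """A map arrives as a dictionary; return a list of tuples, i.e. a bag."""
    if m is None:
        return None
    return sorted((k, str(v)) for k, v in m.items())
```

For example, `sum_scores([(1,), (2,), (None,)])` adds the non-null first fields, and `map_to_bag({"b": 2, "a": 1})` yields one (key, value) tuple per map entry.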
Pig's load functions are built on top of Hadoop's InputFormat. The base class is LoadFunc, whose default implementation targets HDFS. Pig calls a load function's prepareToRead method to give it a chance to initialize itself before reading. If the load function also implements the getSchema method, the LOAD statement no longer needs to declare a schema.
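The lifecycle described above can be modeled in a few lines of plain Python. This is not Pig's API: real load functions are Java classes that extend LoadFunc. The class and method names below (LineLoaderModel, prepare_to_read, get_next, get_schema) are hypothetical stand-ins that only mirror the roles of prepareToRead, getNext and getSchema.

```python
# Plain-Python model of a load function's lifecycle. Not Pig's Java API;
# all names here are hypothetical.
class LineLoaderModel(object):
    """Stand-in for a LoadFunc subclass that reads tab-separated lines."""

    def __init__(self, field_names):
        self.field_names = list(field_names)
        self.reader = None

    def prepare_to_read(self, reader):
        # Role of prepareToRead: called once per task with that task's reader.
        self.reader = iter(reader)

    def get_next(self):
        # Role of getNext: one tuple per record, None when input is exhausted.
        for line in self.reader:
            return tuple(line.rstrip("\n").split("\t"))
        return None

    def get_schema(self):
        # Role of getSchema: report field names so LOAD needs no schema clause.
        return self.field_names


def read_all(loader, records):
    """Drive the model the way a task would: initialize, then pull tuples."""
    loader.prepare_to_read(records)
    rows = []
    row = loader.get_next()
    while row is not None:
        rows.append(row)
        row = loader.get_next()
    return rows
```

Calling `read_all(LineLoaderModel(["name", "score"]), ["alice\t3\n", "bob\t5\n"])` pulls one tuple per input line until `get_next` signals the end of the input.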
Similarly, store functions are built on top of Hadoop's OutputFormat, and the base class is StoreFunc. Pig calls the store function's prepareToWrite method in each map or reduce task. A store function accepts Pig tuples, turns each one into a key-value pair suitable for the output format, and writes it to storage. putNext is the core method of a store function.
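The write path can be modeled the same way in plain Python. Again, this is not Pig's API: real store functions are Java classes that extend StoreFunc. The names LineStorerModel, prepare_to_write and put_next are hypothetical, and a Python list stands in for the Hadoop record writer.

```python
# Plain-Python model of a store function's write path. Not Pig's Java API;
# all names here are hypothetical, and a list plays the record writer.
class LineStorerModel(object):
    """Stand-in for a StoreFunc subclass that writes delimited text."""

    def __init__(self, delimiter="\t"):
        self.delimiter = delimiter
        self.writer = None

    def prepare_to_write(self, writer):
        # Role of prepareToWrite: called once per task with that task's writer.
        self.writer = writer

    def put_next(self, tup):
        # Role of putNext: turn one tuple into a key-value pair and write it.
        # Null fields are written as empty strings; the key is unused here.
        value = self.delimiter.join(
            "" if field is None else str(field) for field in tup)
        self.writer.append((None, value))


def store_all(storer, tuples):
    """Drive the model the way a task would: initialize, then push tuples."""
    sink = []
    storer.prepare_to_write(sink)
    for tup in tuples:
        storer.put_next(tup)
    return sink
```

Calling `store_all(LineStorerModel(), [("alice", 3), ("bob", None)])` returns the key-value pairs the model handed to its writer, one per tuple.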
Pig and Python