Piggybank contains a large number of functions, which can be called after REGISTER and DEFINE. You can also develop your own functions in Java, modeled on Piggybank.
For example, to read binary sequence files you can use Piggybank's SequenceFileLoader, or develop your own loader.
--REGISTER piggybank.jar;
REGISTER Wizad-etl-udf-0.1.jar;
--DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();
DEFINE SequenceFileLoader com.etl.pig.SequenceFileCSVLoader();
--origin_cleaned_data = LOAD '$Cleaned_log' USING PigStorage(',');
origin_cleaned_data = LOAD '$Cleaned_log' USING SequenceFileLoader;
The following Pig built-in functions can be used directly:
Load functions: PigStorage, HBaseStorage
TextLoader reads a text file line by line; each line becomes a tuple with a single field of type chararray
Store functions: PigStorage (HDFS), HBaseStorage (HBase)
Built-in math functions:
ABS(): absolute value
TAN(): tangent
ATAN(): arctangent
TANH(): hyperbolic tangent
SQRT(): square root
CBRT(): cube root
SIN(): sine
SINH(): hyperbolic sine
COS(): cosine
ACOS(): arccosine
COSH(): hyperbolic cosine
EXP(): e raised to the given power (the exponential function with base e)
LOG(): natural logarithm
LOG10(): base-10 logarithm
ROUND(): rounds to the nearest integer
CEIL(): rounds up
FLOOR(double): the largest integer less than or equal to the expression
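The three rounding functions are easy to mix up, so here is a small sketch of their behavior in plain Python (not Pig code). It assumes that ROUND follows Java's Math.round, i.e. floor(x + 0.5), which differs from Python's own round() on ties. The names pig_round, pig_ceil and pig_floor are hypothetical helpers for illustration only, and they return Python ints where Pig may return a long or double.

```python
import math


def pig_round(x: float) -> int:
    # Assumption: ROUND behaves like Java's Math.round, i.e. floor(x + 0.5),
    # so ties are rounded towards positive infinity.
    return math.floor(x + 0.5)


def pig_ceil(x: float) -> int:
    # CEIL: smallest integer greater than or equal to x.
    return math.ceil(x)


def pig_floor(x: float) -> int:
    # FLOOR: largest integer less than or equal to x.
    return math.floor(x)


if __name__ == "__main__":
    for v in (2.5, -2.5, 3.2, -3.7):
        print(v, pig_round(v), pig_ceil(v), pig_floor(v))
```

Note the negative inputs: under this assumption -2.5 rounds to -2, while FLOOR(-2.5) is -3 and CEIL(-2.5) is -2.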
Note: {(int)} denotes a bag of tuples, each containing a field of type int.
Built-in aggregation functions:
AVG({(int)}): average of all values; nulls are ignored.
AVG({(long)}): average of all values; nulls are ignored.
AVG({(float)}): average of all values; nulls are ignored.
AVG({(double)}): average of all values; nulls are ignored.
AVG({(bytearray)}): all bytearray values are converted to double before averaging; nulls are ignored.
COUNT: counts the tuples in a bag; tuples whose first field is null are not counted.
COUNT_STAR: equivalent to COUNT(*) in SQL; counts all tuples, including those with null fields.
SUM({(int)}): sum of all values; there are also SUM({(float)}) and so on.
SUM({(bytearray)}): bytearray values are converted to double before summing; nulls are ignored.
MAX(): finds the maximum
MIN(): finds the minimum
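The null handling of the aggregation functions above can be sketched in plain Python (not Pig code). A bag is modeled as a list of one-field tuples and null as None. The function names are hypothetical, and returning None for a bag with no non-null values is an assumption of this sketch.

```python
from typing import List, Optional, Tuple

Bag = List[Tuple[Optional[float]]]


def non_null(bag: Bag) -> List[float]:
    # Keep only the first field of tuples where that field is not null.
    return [t[0] for t in bag if t[0] is not None]


def count(bag: Bag) -> int:
    # COUNT skips tuples whose first field is null.
    return len(non_null(bag))


def count_star(bag: Bag) -> int:
    # COUNT_STAR counts every tuple, like COUNT(*) in SQL.
    return len(bag)


def total(bag: Bag) -> Optional[float]:
    # SUM ignores nulls; None for "no values" is an assumption of this sketch.
    vals = non_null(bag)
    return sum(vals) if vals else None


def avg(bag: Bag) -> Optional[float]:
    # AVG ignores nulls in both the numerator and the denominator.
    vals = non_null(bag)
    return sum(vals) / len(vals) if vals else None


if __name__ == "__main__":
    bag = [(1,), (None,), (4,), (7,)]
    print(count(bag), count_star(bag), total(bag), avg(bag))
```

With the sample bag, the null tuple lowers COUNT to 3 but not COUNT_STAR, and the average is taken over the three non-null values only.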
Built-in UDFs for chararray and bytearray
CONCAT(chararray a, chararray b): concatenates a and b
CONCAT(bytearray a, bytearray b): concatenates a and b
Character search, returning the first or last position of the match:
INDEXOF(chararray source, chararray search): searches source for search and returns the position of the first occurrence; returns -1 if there is none
For example: SPLIT iOS INTO iOS6 IF (INDEXOF(os_version, '7') != 0), iOS7 IF (INDEXOF(os_version, '7') == 0);
LAST_INDEX_OF(chararray source, chararray search): searches source for search and returns the position of the last occurrence; returns -1 if there is none
LCFIRST(chararray input): converts the first character to lowercase
UCFIRST(chararray input): converts the first character to uppercase
LOWER(chararray input): converts all characters to lowercase
UPPER(chararray input): converts all characters to uppercase
REGEX_EXTRACT(chararray source, chararray regex, int n): regex is a regular expression; matches regex against source and returns the n-th matched group (n starts from 1); returns null if there is no match
Example: Alladid = FOREACH allrow GENERATE REGEX_EXTRACT((chararray)$, '(.*) (.*)', 1) AS time, REGEX_EXTRACT((chararray)$ A, '(.*)_(.*)', 1) AS adn, $6 AS ad_id;
(chararray) REGEX_EXTRACT_ALL(chararray source, chararray regex): returns all groups of source matched by regex as a tuple of chararray fields; returns null if there is no match.
chararray REPLACE(chararray source, chararray toReplace, chararray newValue): in source, replaces all occurrences of toReplace with newValue; toReplace is treated as a regular expression.
long SIZE(chararray input): returns the number of characters in input
(chararray) STRSPLIT(chararray source): splits source on whitespace and returns a tuple with one field per piece, such as (aa,bb,cc)
(chararray) STRSPLIT(chararray source, chararray regex): splits source on the regular expression regex and returns a tuple with one field per piece
(chararray) STRSPLIT(chararray source, chararray regex, int maxsplits): splits source on the regular expression regex into at most maxsplits pieces; the delimiter is applied at most maxsplits - 1 times and the last field holds the rest of the string. Returns a tuple with one field per piece
SUBSTRING(chararray source, int start, int end): extracts the substring of source from start to end (the character at the end position is excluded). Positions start from 0, not from 1. A source string shorter than start causes an error.
{(chararray)} TOKENIZE(chararray source): splits source into tokens on spaces, puts each token into its own tuple, and returns them all as a bag.
chararray TRIM(chararray input): removes all leading and trailing whitespace from the string
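The return shapes of the string functions above (a position, a tuple, or a bag of tuples) are easier to see side by side. The following is a rough model in plain Python, not Pig code. All function names are hypothetical stand-ins. It assumes that REGEX_EXTRACT searches anywhere in the string while REGEX_EXTRACT_ALL must match the whole string, that STRSPLIT follows Java's String.split (the last piece keeps the remainder), and that TOKENIZE splits on whitespace only. Python's re module is also not identical to Java regular expressions.

```python
import re
from typing import List, Optional, Tuple


def indexof(source: str, search: str) -> int:
    # Position of the first occurrence, -1 if absent.
    return source.find(search)


def last_index_of(source: str, search: str) -> int:
    # Position of the last occurrence, -1 if absent.
    return source.rfind(search)


def substring(source: str, start: int, end: int) -> str:
    # 0-based, end exclusive; the text above says a too-short source is an error.
    if start > len(source):
        raise ValueError("start is beyond the end of the source string")
    return source[start:min(end, len(source))]


def strsplit(source: str, regex: str = r"\s", limit: int = -1) -> Tuple[str, ...]:
    # Returns ONE tuple with a field per piece.
    if limit == 1:
        return (source,)  # the delimiter is applied zero times
    if limit > 1:
        return tuple(re.split(regex, source, maxsplit=limit - 1))
    return tuple(re.split(regex, source))


def tokenize(source: str) -> List[Tuple[str]]:
    # Returns a bag (list) of single-field tuples; whitespace-only simplification.
    return [(tok,) for tok in source.split()]


def regex_extract(source: str, regex: str, n: int) -> Optional[str]:
    # n-th capture group (1-based) of the first match, or None.
    m = re.search(regex, source)
    if m is None or n > m.re.groups:
        return None
    return m.group(n)


def regex_extract_all(source: str, regex: str) -> Optional[Tuple[str, ...]]:
    # All capture groups as a tuple when the whole string matches, else None.
    m = re.fullmatch(regex, source)
    return m.groups() if m else None


if __name__ == "__main__":
    print(indexof("7.0.4", "7"), indexof("6.1.3", "7"))
    print(substring("piggybank", 0, 5))
    print(strsplit("aa bb cc"), strsplit("a,b,c,d", ",", 2))
    print(tokenize("aa bb cc"))
    print(regex_extract("192.168.1.5:8020", r"(.*):(.*)", 1))
    print(regex_extract_all("192.168.1.5:8020", r"(.*):(.*)"))
```

The main thing to take from the sketch is the difference between STRSPLIT, which yields a single tuple, and TOKENIZE, which yields a bag of one-field tuples that can be flattened into rows.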
There are many more.
RANDOM(): returns a random number between 0 and 1
Emptiness test: IsEmpty(bag) and IsEmpty(map)
Small summary of Pig's built-in functions (incomplete)