A short summary of Pig's built-in functions (incomplete)


Piggybank contains many more functions. You can use them after loading its jar with REGISTER and giving them a short alias with DEFINE. You can also write your own functions in Java, modeled on the ones in Piggybank.

For example, to read binary SequenceFiles you can use Piggybank's SequenceFileLoader, or you can develop your own loader:

-- REGISTER piggybank.jar;
REGISTER wizad-etl-udf-0.1.jar;

-- DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();
DEFINE SequenceFileLoader com.etl.pig.SequenceFileCSVLoader();

-- origin_cleaned_data = LOAD '$cleaned_log' USING PigStorage(',');
origin_cleaned_data = LOAD '$cleaned_log' USING SequenceFileLoader;



The following Pig built-in functions can be used directly, without REGISTER:


Load functions: PigStorage, HBaseStorage, TextLoader

TextLoader reads a text file line by line. It returns each line as a tuple with a single field of type chararray.
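The Python model below shows how each of the two text loaders turns lines into tuples. It is not Pig code and not how Pig implements them. It assumes PigStorage's default tab delimiter and ignores schemas, types and escaping. The function names `text_loader` and `pig_storage` are hypothetical.

```python
from typing import List, Tuple


def text_loader(lines: List[str]) -> List[Tuple[str]]:
    """Hypothetical model of TextLoader: one single-field tuple per line."""
    return [(line,) for line in lines]


def pig_storage(lines: List[str], delimiter: str = "\t") -> List[Tuple[str, ...]]:
    """Hypothetical model of PigStorage: one field per delimited value (no types)."""
    return [tuple(line.split(delimiter)) for line in lines]


log = ["2015-01-01,ad_1,3", "2015-01-02,ad_2,5"]
print(text_loader(log))       # each whole line is one field
print(pig_storage(log, ","))  # like LOAD ... USING PigStorage(',')
```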


Store functions: PigStorage (files on HDFS), HBaseStorage (HBase tables)


Built-in math functions:
ABS(): absolute value
TAN(): tangent
ATAN(): arc tangent
TANH(): hyperbolic tangent

SQRT(): square root
CBRT(): cube root

SIN(): sine
SINH(): hyperbolic sine
COS(): cosine
ACOS(): arc cosine
COSH(): hyperbolic cosine
EXP(): e raised to the given power
LOG(): natural logarithm
LOG10(): base-10 logarithm

ROUND(): rounds to the nearest integer
CEIL(): rounds up to the nearest integer value

FLOOR(double): the largest integer value less than or equal to the argument
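ROUND, CEIL and FLOOR differ from Python's built-ins in ways that are easy to trip over. The sketch below models them in Python under two assumptions. First, Pig's ROUND behaves like Java's Math.round, which is floor(x + 0.5), so halves round toward positive infinity. Second, CEIL and FLOOR return doubles, like Java's Math.ceil and Math.floor. The helper names are hypothetical, and Math.round's handling of extreme floating-point edge cases is not modeled.

```python
import math


def pig_round(x: float) -> int:
    """Hypothetical model of ROUND on a double, assuming Java Math.round rules."""
    # floor(x + 0.5): halves always move toward positive infinity
    return math.floor(x + 0.5)


def pig_ceil(x: float) -> float:
    """Hypothetical model of CEIL: the result stays a double."""
    return float(math.ceil(x))


def pig_floor(x: float) -> float:
    """Hypothetical model of FLOOR: the result stays a double."""
    return float(math.floor(x))


for v in (2.5, -2.5, 2.4, -2.6):
    # Python's own round() uses round-half-to-even, so it differs on 2.5
    print(v, pig_round(v), round(v), pig_ceil(v), pig_floor(v))
```

Python's built-in round(2.5) returns 2 (banker's rounding), while the modeled ROUND gives 3. Keep this in mind when cross-checking Pig results in Python.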






Note: {(int)} denotes a bag of tuples, each containing a single int field.
Built-in aggregation functions:
AVG({(int)}): average of all values; nulls are ignored.
AVG({(long)}): average of all values; nulls are ignored.
AVG({(float)}): average of all values; nulls are ignored.
AVG({(double)}): average of all values; nulls are ignored.
AVG({(bytearray)}): all bytearray values are cast to double before averaging; nulls are ignored.
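The null and bytearray rules for AVG can be checked with a small Python model. The name `pig_avg` is hypothetical, and Python `bytes`/`str` values stand in for Pig's bytearray. The model assumes that a bag with no non-null values yields null (None here).

```python
from typing import Iterable, Optional, Tuple, Union

Value = Union[int, float, str, bytes, None]


def pig_avg(bag: Iterable[Tuple[Value]]) -> Optional[float]:
    """Hypothetical model of AVG over a bag of one-field tuples."""
    total = 0.0
    count = 0
    for (value,) in bag:
        if value is None:
            continue  # nulls are skipped and do not count toward the average
        if isinstance(value, bytes):
            value = value.decode()  # stand-in for Pig's bytearray
        if isinstance(value, str):
            value = float(value)  # bytearray values are cast to double first
        total += value
        count += 1
    # Assumption: with no non-null values the result is null (None here)
    return total / count if count else None


print(pig_avg([(1,), (None,), (4,)]))  # the null is ignored: (1 + 4) / 2
```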


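COUNT: counts the tuples in a bag, ignoring nulls (a tuple is skipped if its first field is null)
COUNT_STAR: counts all tuples, including nulls; equivalent to COUNT(*) in SQL
SUM({(int)}): sum of all values; there are also SUM({(float)}) and so on
SUM({(bytearray)}): bytearray values are cast to double before summing; nulls are ignored.
MAX(): maximum value
MIN(): minimum value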
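COUNT and COUNT_STAR differ only in how they treat nulls. Here is a hedged Python model with hypothetical function names. It assumes, as Pig's documentation describes for COUNT, that a tuple is skipped only when its *first* field is null. A null in any other field does not affect the count.

```python
from typing import Any, List, Tuple


def pig_count(bag: List[Tuple[Any, ...]]) -> int:
    """Hypothetical model of COUNT: skip tuples whose first field is null."""
    return sum(1 for t in bag if len(t) > 0 and t[0] is not None)


def pig_count_star(bag: List[Tuple[Any, ...]]) -> int:
    """Hypothetical model of COUNT_STAR: count every tuple, like SQL COUNT(*)."""
    return len(bag)


bag = [(1, "a"), (None, "b"), (3, None)]
print(pig_count(bag), pig_count_star(bag))  # 2 3
```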


Built-in functions for chararray and bytearray:
CONCAT(chararray a, chararray b): concatenates a and b
CONCAT(bytearray a, bytearray b): concatenates a and b
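Concatenating fields that may be missing is where CONCAT most often surprises. The Python sketch below uses a hypothetical `pig_concat` helper. It assumes CONCAT returns null as soon as any argument is null. For convenience it accepts any number of chararray values, while the signatures above show two.

```python
from typing import Optional


def pig_concat(*parts: Optional[str]) -> Optional[str]:
    """Hypothetical model of CONCAT on chararray values."""
    # Assumption: if any argument is null, the whole result is null
    if any(p is None for p in parts):
        return None
    return "".join(parts)


print(pig_concat("ad_", "123"))  # ad_123
print(pig_concat("ad_", None))   # None
```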


Searching for a substring (returns the first or last position where it occurs):
INDEXOF(chararray source, chararray search): searches source for search and returns the 0-based position of the first occurrence, or -1 if there is none.
For example, to split iOS records by version (versions starting with '7' go to ios7, all others to ios6): SPLIT ios INTO ios6 IF INDEXOF(os_version, '7') != 0, ios7 IF INDEXOF(os_version, '7') == 0;

LAST_INDEX_OF(chararray source, chararray search): searches source for search and returns the position where the last occurrence starts, or -1 if there is none.
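The search functions and the iOS SPLIT condition can be checked with a Python model. The helper names are hypothetical, and Python's str.find and str.rfind are assumed to behave like Java's indexOf and lastIndexOf. The model also shows why the SPLIT condition tests != 0 rather than == -1. A version that contains a '7' only later in the string, such as 6.1.7, still lands in ios6.

```python
def indexof(source: str, search: str) -> int:
    """Hypothetical model of INDEXOF: first 0-based position, or -1."""
    return source.find(search)


def last_index_of(source: str, search: str) -> int:
    """Hypothetical model of LAST_INDEX_OF: start of the last match, or -1."""
    return source.rfind(search)


def route_ios(os_version: str) -> str:
    # Same condition as the SPLIT example: a leading '7' means iOS 7;
    # anything else, including versions without a '7' (index -1), goes to ios6
    return "ios7" if indexof(os_version, "7") == 0 else "ios6"


print(indexof("banana", "an"), last_index_of("banana", "an"))  # 1 3
print(route_ios("7.1.2"), route_ios("6.1.7"), route_ios("6.0"))  # ios7 ios6 ios6
```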


LCFIRST(chararray input): converts the first character to lowercase
UCFIRST(chararray input): converts the first character to uppercase
LOWER(chararray input): converts all characters to lowercase
UPPER(chararray input): converts all characters to uppercase


REGEX_EXTRACT(chararray source, chararray regex, int n): matches the regular expression regex against source and returns the n-th matched group (groups are numbered from 1). Returns null if there is no match.

Example: alladid = FOREACH allrow GENERATE REGEX_EXTRACT((chararray)$, '(.*) (.*)', 1) AS time, REGEX_EXTRACT((chararray)$A, '(.*)_(.*)', 1) AS adn, $6 AS ad_id;
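The example relies on a detail of regular expressions worth seeing in isolation: `.*` is greedy, so in '(.*)_(.*)' the first group runs up to the *last* underscore. Below is a Python model of REGEX_EXTRACT with hypothetical names and sample values. It assumes search-anywhere matching, as with Java's Matcher.find(). It also assumes Python's regex syntax agrees with Java's for these simple patterns.

```python
import re
from typing import Optional


def regex_extract(source: Optional[str], regex: str, n: int) -> Optional[str]:
    """Hypothetical model of REGEX_EXTRACT: return group n, or None if no match."""
    if source is None:
        return None
    # Assumption: the regex may match anywhere in source (Java Matcher.find())
    m = re.search(regex, source)
    return m.group(n) if m else None


print(regex_extract("2015-01-01 12:30:00", r"(.*) (.*)", 1))  # 2015-01-01
print(regex_extract("banner_top_42", r"(.*)_(.*)", 1))        # banner_top
```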


(chararray) REGEX_EXTRACT_ALL(chararray source, chararray regex): matches regex against source and returns all matched groups as a tuple of chararray fields. Returns null if there is no match.
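REGEX_EXTRACT_ALL returns every group at once, which is handy for splitting a field into several columns in one step. The Python model below uses a hypothetical name. It assumes the regex has to match the whole string, as with Java's Matcher.matches().

```python
import re
from typing import Optional, Tuple


def regex_extract_all(source: Optional[str], regex: str) -> Optional[Tuple[str, ...]]:
    """Hypothetical model of REGEX_EXTRACT_ALL: all groups as a tuple, or None."""
    if source is None:
        return None
    # Assumption: the regex has to match the whole string (Java Matcher.matches())
    m = re.fullmatch(regex, source)
    return m.groups() if m else None


print(regex_extract_all("192.168.1.5:8020", r"(.*):(.*)"))  # ('192.168.1.5', '8020')
print(regex_extract_all("no digits", r"(\d+)"))            # None
```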


chararray REPLACE(chararray source, chararray toReplace, chararray newValue): replaces every occurrence of toReplace in source with newValue. toReplace is interpreted as a regular expression.
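Pig's REPLACE treats its second argument as a regular expression, like Java's String.replaceAll. As a result, characters such as '.' are not matched literally. A small Python model with a hypothetical name shows the effect:

```python
import re


def pig_replace(source: str, to_replace: str, new_value: str) -> str:
    """Hypothetical model of REPLACE, treating to_replace as a regex (replaceAll)."""
    # A function as the replacement inserts new_value literally, sidestepping the
    # different back-reference syntax of Java ($1) and Python (\1)
    return re.sub(to_replace, lambda _m: new_value, source)


print(pig_replace("a.b.c", ".", "-"))    # ----- ('.' matches any character)
print(pig_replace("a.b.c", r"\.", "-"))  # a-b-c
```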


long SIZE(chararray input): returns the number of characters in input




(chararray) STRSPLIT(chararray source): splits source on whitespace and returns the pieces as a tuple; for example, 'aa bb cc' becomes (aa,bb,cc)
(chararray) STRSPLIT(chararray source, chararray regex): splits source around matches of the regular expression regex and returns the pieces as a tuple
(chararray) STRSPLIT(chararray source, chararray regex, int maxsplits): like the previous form, but the regex is applied at most maxsplits - 1 times, so the tuple has at most maxsplits fields. Nothing is discarded: the last field contains everything after the last matched delimiter.
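The limit argument of STRSPLIT follows Java's String.split(regex, limit) semantics. The pattern is applied at most limit - 1 times, and the remainder stays in the last field. With a limit of 0, trailing empty strings are dropped. The Python model below reproduces those rules. Its names are hypothetical, and it assumes `\s` (a single whitespace character) as the default separator. Patterns with capturing groups would behave differently, because Python's re.split also returns the captured text.

```python
import re
from typing import Tuple


def strsplit(source: str, regex: str = r"\s", limit: int = 0) -> Tuple[str, ...]:
    """Hypothetical model of STRSPLIT using Java String.split(regex, limit) rules."""
    if limit == 1:
        return (source,)  # the pattern is applied limit - 1 = 0 times
    # limit <= 0 means "no limit"; Python's maxsplit=0 also means no limit
    parts = re.split(regex, source, maxsplit=max(limit - 1, 0))
    if limit == 0 and len(parts) > 1:
        # Java drops trailing empty strings when limit is 0
        while parts and parts[-1] == "":
            parts.pop()
    return tuple(parts)


print(strsplit("aa bb cc"))            # ('aa', 'bb', 'cc')
print(strsplit("a,b,c,d", ",", 2))     # ('a', 'b,c,d'): the remainder is kept
```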


SUBSTRING(chararray source, int start, int end): returns the substring of source from start up to, but not including, end. Positions start at 0, not 1. A start position beyond the length of source causes an error.
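Zero-based, end-exclusive indexing is easy to get wrong when coming from SQL, whose SUBSTRING is 1-based. Below is a Python sketch of the rule with hypothetical names. It assumes an end past the string is clamped to the string's length. For an out-of-range start it raises an exception to mirror the error described above, without modeling exactly how Pig reports that error.

```python
def substring(source: str, start: int, end: int) -> str:
    """Hypothetical model of SUBSTRING(source, start, end)."""
    end = min(end, len(source))  # assumption: an end past the string is clamped
    if start < 0 or start > end:
        # Mirrors the error case: start lies beyond the (clamped) end of the string
        raise IndexError(f"start {start} out of range for {source!r}")
    return source[start:end]


print(substring("2015-01-01", 0, 4))  # 2015 (positions 0..3)
print(substring("2015-01-01", 5, 7))  # 01
```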


{(chararray)} TOKENIZE(chararray source): splits source into words, puts each word into its own tuple, and returns all of these tuples as a bag. Space, double quote ("), comma, parentheses and asterisk are treated as word separators.
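Tabs and most punctuation are not separators for TOKENIZE by default, so it helps to see exactly which characters split words. The Python model below assumes the default separator set of space, double quote, comma, parentheses and asterisk. It drops empty tokens the way Java's StringTokenizer does. The names are hypothetical.

```python
import re
from typing import List, Tuple

# Assumed default TOKENIZE separators: space, double quote, comma, ( ) and *
_SEPARATORS = re.compile(r'[ ",()*]+')


def tokenize(source: str) -> List[Tuple[str]]:
    """Hypothetical model of TOKENIZE: a 'bag' (list) of one-word tuples."""
    # Runs of separators produce no empty words, like java.util.StringTokenizer
    return [(word,) for word in _SEPARATORS.split(source) if word]


print(tokenize('pig, (hive)*  "spark"'))  # [('pig',), ('hive',), ('spark',)]
print(tokenize("tab\tseparated"))         # tabs are not separators: one word
```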


chararray TRIM(chararray input): removes leading and trailing whitespace


There are many more, for example:
RANDOM(): returns a pseudo-random double between 0 and 1

IsEmpty(bag), IsEmpty(map): checks whether a bag or map is empty



