Summary of Pig's built-in functions (incomplete)

Source: Internet
Author: User
Tags: mathematical functions, natural logarithm

Piggybank provides many functions that can be used via REGISTER and DEFINE. You can also write your own functions in Java, building on Piggybank.

For example, to read a binary SequenceFile, you can use the SequenceFileLoader function in Piggybank or develop a loader of your own.

-- REGISTER piggybank.jar;
REGISTER wizad-etl-udf-0.1.jar;


-- DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();
-- Custom loader from the registered jar; the class name must match the jar's exact casing.
DEFINE SequenceFileLoader com.etl.pig.SequenceFileCSVLoader();

-- origin_cleaned_data = LOAD '$cleaned_log' USING PigStorage(',');
origin_cleaned_data = LOAD '$cleaned_log' USING SequenceFileLoader;



The following are Pig built-in functions that can be used directly:


Load functions: PigStorage, HBaseStorage

TextLoader reads text files line by line; each line becomes a tuple with a single chararray field.


Storage functions: PigStorage (HDFS) and HBaseStorage (HBase)
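
As a minimal sketch of the load and storage functions above: the file paths and field names below are hypothetical, chosen only for illustration.

```pig
-- Assumption: 'input/users.csv' is a hypothetical comma-separated file (name,age).
users = LOAD 'input/users.csv' USING PigStorage(',') AS (name:chararray, age:int);

-- TextLoader yields one chararray field per input line.
raw_lines = LOAD 'input/users.csv' USING TextLoader() AS (line:chararray);

-- Write the parsed relation back out, tab-separated.
STORE users INTO 'output/users_tsv' USING PigStorage('\t');
```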


Built-in mathematical functions:
ABS(): absolute value
TAN(): tangent
ATAN(): arc tangent
TANH(): hyperbolic tangent


SQRT(): square root
CBRT(): cube root


SIN(): sine
SINH(): hyperbolic sine
COS(): cosine
ACOS(): arc cosine
COSH(): hyperbolic cosine
EXP(): e raised to the given power (base-e exponential function)
LOG(): natural logarithm
LOG10(): base-10 logarithm


ROUND(): rounds to the nearest integer
CEIL(): rounds up to the nearest integer


FLOOR(): rounds down, returning the largest integer less than or equal to the expression
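
A short sketch of how the mathematical functions might be applied inside FOREACH ... GENERATE. The input file and the field name x are assumptions made for the example.

```pig
-- Assumption: 'input/values.txt' is a hypothetical file with one number per line.
vals = LOAD 'input/values.txt' AS (x:double);

math_out = FOREACH vals GENERATE
    x,
    ABS(x)   AS abs_x,
    SQRT(x)  AS sqrt_x,
    LOG(x)   AS ln_x,      -- natural logarithm
    ROUND(x) AS rounded,
    CEIL(x)  AS ceil_x,
    FLOOR(x) AS floor_x;

DUMP math_out;
```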






Built-in aggregate functions:
Note: {(int)} denotes a bag of tuples, each containing an int field.
AVG({(int)}): average of all values. null is ignored.
AVG({(long)}): average of all values. null is ignored.
AVG({(float)}): average of all values. null is ignored.
AVG({(double)}): average of all values. null is ignored.
AVG({(bytearray)}): average of all bytearray values after conversion to double. null is ignored.


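COUNT: counts the tuples in a bag; tuples whose first field is null are not counted.
COUNT_STAR: equivalent to COUNT(*) in SQL; null values are included.
SUM({(int)}), SUM({(float)}), ...: sum of all values. null is ignored.
SUM({(bytearray)}): sum of all values after conversion from bytearray to double. null is ignored.
MAX(): finds the maximum value
MIN(): finds the minimum value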
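
Aggregate functions take a bag, so they are normally used after GROUP. The following is a sketch under assumed input; the file path and field names are hypothetical.

```pig
-- Assumption: 'input/scores.csv' is a hypothetical file with lines like "alice,90".
scores = LOAD 'input/scores.csv' USING PigStorage(',') AS (name:chararray, score:int);

-- Grouping produces one bag of tuples per name.
by_name = GROUP scores BY name;

stats = FOREACH by_name GENERATE
    group              AS name,
    COUNT(scores)      AS n,        -- skips tuples whose first field is null
    COUNT_STAR(scores) AS n_all,    -- counts every tuple
    SUM(scores.score)  AS total,
    AVG(scores.score)  AS mean,
    MIN(scores.score)  AS lowest,
    MAX(scores.score)  AS highest;

DUMP stats;
```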


Built-in chararray and bytearray UDFs
CONCAT(chararray a, chararray b): concatenates the string fields a and b
CONCAT(bytearray a, bytearray b): concatenates the bytearray fields a and b


String search: returns the first or last position of a match.
INDEXOF(chararray source, chararray search): searches source for search and returns the position of its first occurrence. If there is no match, -1 is returned.
For example: SPLIT ios INTO ios6 IF (INDEXOF(os_version, '7') != 0), ios7 IF (INDEXOF(os_version, '7') == 0);


LAST_INDEX_OF(chararray source, chararray search): searches source for search and returns the position of its last occurrence. If there is no match, -1 is returned.
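
A small sketch of both search functions on one field. The input path and field name are hypothetical; INDEXOF is shown here in its three-argument form, where the third argument is the index to start searching from.

```pig
-- Assumption: 'input/paths.txt' is a hypothetical file with one path per line, e.g. a/b/c.txt
paths = LOAD 'input/paths.txt' AS (p:chararray);

positions = FOREACH paths GENERATE
    p,
    INDEXOF(p, '/', 0)    AS first_slash,  -- -1 when '/' does not occur
    LAST_INDEX_OF(p, '/') AS last_slash;

DUMP positions;
```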


LCFIRST(chararray input): converts the first character to lowercase.
UCFIRST(chararray input): converts the first character to uppercase.
LOWER(chararray input): converts all characters to lowercase.
UPPER(chararray input): converts all characters to uppercase.
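
For illustration, the case functions might be combined like this (the input file and field name are assumptions):

```pig
-- Assumption: 'input/names.txt' is a hypothetical file with one name per line.
names = LOAD 'input/names.txt' AS (name:chararray);

cased = FOREACH names GENERATE
    LOWER(name)          AS lower_name,
    UPPER(name)          AS upper_name,
    UCFIRST(LOWER(name)) AS capitalized;  -- lowercase everything, then capitalize the first letter

DUMP cased;
```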


REGEX_EXTRACT(chararray source, chararray regex, int n): matches source against the regular expression regex and returns the nth matched group (n starts from 1). If there is no match, null is returned.

Example: alladid = FOREACH allrow GENERATE REGEX_EXTRACT((chararray)$3, '(.*)(.*)', 1) AS time, REGEX_EXTRACT((chararray)$0, '(.*)_(.*)', 1) AS adn, $6 AS ad_id;


tuple REGEX_EXTRACT_ALL(chararray source, chararray regex): matches source against regex and returns all matched groups as a tuple of chararray fields. If there is no match, null is returned.
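
A sketch contrasting the two functions on the same pattern. The input file and field names are hypothetical, and each line is assumed to look like host:port.

```pig
-- Assumption: 'input/hosts.txt' is a hypothetical file with one "host:port" string per line.
hosts = LOAD 'input/hosts.txt' AS (addr:chararray);

-- REGEX_EXTRACT returns a single group (group 1 here).
only_host = FOREACH hosts GENERATE REGEX_EXTRACT(addr, '(.*):(.*)', 1) AS host;

-- REGEX_EXTRACT_ALL returns every group as one tuple; FLATTEN turns it into fields.
host_port = FOREACH hosts GENERATE
    FLATTEN(REGEX_EXTRACT_ALL(addr, '(.*):(.*)')) AS (host:chararray, port:chararray);

DUMP host_port;
```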


chararray REPLACE(chararray source, chararray toReplace, chararray newValue): replaces every occurrence of toReplace in source with newValue. toReplace is treated as a regular expression.


long SIZE(chararray input): returns the number of characters in input.
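
A brief sketch of REPLACE and SIZE together, assuming a hypothetical input file of hyphenated names:

```pig
-- Assumption: 'input/names.txt' is a hypothetical file with one name per line, e.g. mary-jane
names = LOAD 'input/names.txt' AS (name:chararray);

cleaned = FOREACH names GENERATE
    REPLACE(name, '-', ' ') AS spaced,  -- the second argument is a regular expression
    SIZE(name)              AS len;     -- number of characters

DUMP cleaned;
```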




tuple STRSPLIT(chararray source): splits the source string on whitespace and returns a tuple with one field per part, such as (aa,bb,cc).
tuple STRSPLIT(chararray source, chararray regex): splits the source string on the regular expression regex and returns a tuple with one field per part.
tuple STRSPLIT(chararray source, chararray regex, int maxsplits): splits the source string on the regular expression regex into at most maxsplits parts, with the last part holding the remainder of the string, and returns a tuple with one field per part.
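
The effect of the limit can be sketched as follows. The input file is hypothetical, and every line is assumed to contain at least one '=' so that the flattened tuple always has two fields.

```pig
-- Assumption: 'input/pairs.txt' is a hypothetical file with lines like key=value=extra
kv = LOAD 'input/pairs.txt' AS (line:chararray);

-- With a limit of 2, only the first '=' splits the line; the rest stays in the second field.
split_kv = FOREACH kv GENERATE
    FLATTEN(STRSPLIT(line, '=', 2)) AS (key:chararray, value:chararray);

DUMP split_kv;
```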


SUBSTRING(chararray source, int start, int end): extracts the substring of source from index start (inclusive) to index end (exclusive); indexes start at 0. If the input string is shorter than start, an error is returned.


{(chararray)} TOKENIZE(chararray source): splits the source string on spaces into multiple parts, stores each part in its own tuple, and returns the tuples as a bag.
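
TOKENIZE is typically combined with FLATTEN, as in the familiar word-count pattern. This is a sketch; the input path is hypothetical.

```pig
-- Assumption: 'input/text.txt' is a hypothetical plain-text file.
lines = LOAD 'input/text.txt' USING TextLoader() AS (line:chararray);

-- TOKENIZE returns a bag of single-field tuples; FLATTEN emits one row per word.
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;

grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;

DUMP counts;
```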


chararray TRIM(chararray input): removes leading and trailing whitespace from a string.
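
SUBSTRING and TRIM can be nested, for example to cut fixed-position parts out of a padded date string. The file and field names here are assumptions made for the example.

```pig
-- Assumption: 'input/dates.txt' is a hypothetical file with dates like 2014-05-01, possibly padded with spaces.
dates = LOAD 'input/dates.txt' AS (d:chararray);

date_parts = FOREACH dates GENERATE
    SUBSTRING(TRIM(d), 0, 4) AS year,   -- characters 0..3
    SUBSTRING(TRIM(d), 5, 7) AS month;  -- characters 5..6

DUMP date_parts;
```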


There are many more, for example:
RANDOM(): returns a random number between 0 and 1

IsEmpty(bag) and IsEmpty(map): check whether a bag or map is empty
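
A sketch of two common uses: IsEmpty to find keys missing from one side of a COGROUP, and RANDOM for rough sampling. The relation names and input files are hypothetical.

```pig
-- Assumption: both input files are hypothetical and tab-separated, keyed by an integer id.
a = LOAD 'input/a.txt' AS (id:int, v:chararray);
b = LOAD 'input/b.txt' AS (id:int, w:chararray);

-- Keep the ids that appear in a but have no matching rows in b.
cg     = COGROUP a BY id, b BY id;
only_a = FILTER cg BY IsEmpty(b);

-- Keep roughly 10% of the rows of a.
sampled = FILTER a BY RANDOM() < 0.1;

DUMP only_a;
```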



