Apache Pig Study Notes (II)

Source: Internet
Author: User


This post mainly organizes the meaning and usage of some keywords in Pig. Although Pig is a framework built around data-flow processing, most of the keywords and operations found in a database have a corresponding feature in Pig, which makes it flexible and concise. This is the last article before the Spring Festival; I wish you a happy Spring Festival!
1, Reserved keywords:
-- A: assert, and, any, all, arrange, as, asc, AVG
-- B: bag, BinStorage, by, bytearray, BIGINTEGER, BIGDECIMAL
-- C: cache, CASE, cat, cd, chararray, cogroup, CONCAT, copyFromLocal, copyToLocal, COUNT, cp, cross
-- D: datetime, %declare, %default, define, dense, desc, describe, DIFF, distinct, double, du, dump
-- E: e, E, eval, exec, explain
-- F: f, F, filter, flatten, float, foreach, full
-- G: generate, group
-- H: help
-- I: if, illustrate, import, inner, input, int, into, is
-- J: join
-- K: kill
-- L: l, L, left, limit, load, long, ls
-- M: map, matches, MAX, MIN, mkdir, mv
-- N: not, null
-- O: onschema, or, order, outer, output
-- P: parallel, pig, PigDump, PigStorage, pwd
-- Q: quit
-- R: register, returns, right, rm, rmf, rollup, run
-- S: sample, set, ship, SIZE, split, stderr, stdin, stdout, store, stream, SUM
-- T: TextLoader, TOKENIZE, through, tuple
-- U: union, using
-- V, W, X, Y, Z: void
2, Case sensitivity: aliases are case sensitive, while keywords are not. For example, LOAD, GROUP, FOREACH and load, group, foreach are equivalent.
3, Alias definition: the first character must be a letter; the remaining characters can be letters, digits, or underscores.
4, Collection types
Bags: similar to a table, can contain multiple rows
Tuples: like a row, can have more than one field
Fields: the individual data values
5, Column references: in a relational database we use a column name to locate a field value in a row of data, and in JDBC we can refer to a column either by name or by index. Pig supports both kinds of reference as well; a positional reference is written as a dollar sign followed by a number, such as $0 and $1.
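The two reference styles can be mimicked in a few lines of Python. This is only a sketch of the idea, not Pig code: the schema, the sample row, and the helper `field` are made up for illustration.

```python
# Hypothetical schema and row, standing in for one tuple of a Pig relation.
schema = ("name", "age", "gpa")
row = ("john", 18, 3.9)

def field(row, ref, schema=schema):
    """Return a field by positional reference ("$1") or by name ("age")."""
    if ref.startswith("$"):
        return row[int(ref[1:])]      # $0 is the first field
    return row[schema.index(ref)]     # look the name up in the schema

# Both references reach the same value.
print(field(row, "$1"), field(row, "age"))
```

Positional references keep working when no schema has been declared, which is why both forms exist side by side.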
6, Data types
(Basic types)
int: signed 32-bit integer
long: signed 64-bit integer
float: 32-bit single-precision floating point
double: 64-bit double-precision floating point
chararray: the string type, like String in Java; must be UTF-8 encoded
bytearray: blob (byte array) type
boolean: Boolean type
datetime: date and time type
biginteger: Java BigInteger
bigdecimal: Java BigDecimal
(Collection types)
tuple: an ordered set of field values, similar to a List in Java
bag: a collection of tuples, similar to the Collection super-interface in Java
map: like a Map in Java, holding key/value pairs; the key and value are separated by the # sign, and a lookup by key also uses the # sign
7, Operators:
(1) Comparison operators: ==, !=, <, >, >=, <=
(2) Comparison operator matches: applies to strings and supports regular expressions
(3) Arithmetic operators: +, -, *, /, %, ?:, CASE
(4) Null operators: is null and is not null
(5) Dereference operators for collection types: tuple (.), map (#)
(6) Relational operators: COGROUP, GROUP, JOIN
(7) Functions: COUNT_STAR, SUM, MIN, MAX, COUNT, AVG, CONCAT, SIZE
8, When several data sources are joined, fields with the same name are told apart by the relation alias, as in a::name, b::name
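A small Python sketch may make the naming rule concrete. It assumes two hypothetical relations aliased a and b that both carry fields called id and name; the helper `inner_join` is illustrative only and is not how Pig executes a join.

```python
# Hypothetical relations; each has a schema (field names) and a list of tuples.
a_schema, a_rows = ("id", "name"), [(1, "alice"), (2, "bob")]
b_schema, b_rows = ("id", "name"), [(1, "apple"), (3, "cherry")]

def inner_join(a_rows, b_rows, key_a=0, key_b=0):
    """Inner join on one key column; output tuples are a's fields then b's."""
    return [ra + rb for ra in a_rows for rb in b_rows if ra[key_a] == rb[key_b]]

# Prefix every field with its source alias so duplicate names stay distinct.
joined_schema = tuple(f"a::{f}" for f in a_schema) + tuple(f"b::{f}" for f in b_schema)
joined = inner_join(a_rows, b_rows)

row = dict(zip(joined_schema, joined[0]))
print(row["a::name"], row["b::name"])
```

Without the prefix, the two name columns in the joined row could not be referenced unambiguously by name.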
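9, FLATTEN un-nests a collection type or nested type. Flattening a tuple lifts its fields into the enclosing row, while flattening a bag yields one row per tuple in the bag. See the following example:
B = {(a,b,c), (b,b,c)}
After FLATTEN(B)
(a,b,c) and (b,b,c) come out as two separate rows of data.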
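How FLATTEN treats the two nested types can be sketched in Python. The functions below are hypothetical helpers that model the semantics on plain tuples and lists: a tuple's fields are spliced into the enclosing row, whereas a bag produces one output row per tuple it contains.

```python
def flatten_tuple(row, i):
    """Flatten the tuple at position i: its fields are spliced into the row."""
    return row[:i] + tuple(row[i]) + row[i + 1:]

def flatten_bag(row, i):
    """Flatten the bag at position i: one output row per tuple in the bag."""
    return [row[:i] + tuple(t) + row[i + 1:] for t in row[i]]

# A row whose second field is a tuple: the result is still a single row.
print(flatten_tuple(("k", ("a", "b", "c")), 1))
# A row whose second field is a bag of two tuples: the result is two rows.
print(flatten_bag(("k", [("a", "b", "c"), ("b", "b", "c")]), 1))
```

The other fields of the row (here the key "k") are repeated on every row that a flattened bag produces.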
10, COGROUP: grouping across multiple tables
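As a rough model of multi-table grouping, the Python sketch below groups two hypothetical relations on their first field. Each output tuple holds the key followed by one bag per input relation; a key that is missing from one side gets an empty bag there. The function name and sample data are illustrative, not part of Pig.

```python
from collections import defaultdict

def cogroup(a_rows, b_rows, key_a=0, key_b=0):
    """Group two relations by key: (key, bag of a's rows, bag of b's rows)."""
    groups = defaultdict(lambda: ([], []))
    for r in a_rows:
        groups[r[key_a]][0].append(r)
    for r in b_rows:
        groups[r[key_b]][1].append(r)
    # Sorted only to make the sketch's output deterministic.
    return [(k, bag_a, bag_b) for k, (bag_a, bag_b) in sorted(groups.items())]

owners = [(1, "alice"), (2, "bob")]
pets = [(1, "cat"), (1, "dog"), (3, "fish")]
for row in cogroup(owners, pets):
    print(row)
```

Unlike a join, the rows are not combined pairwise; they stay nested in their bags until something such as FLATTEN un-nests them.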
11, CROSS: combines two data sources and produces their Cartesian product
12, DISTINCT: removes duplicates. Unlike in a relational database, it cannot deduplicate on a single field; it works on whole rows. To deduplicate on one field, first extract that field on its own and then apply DISTINCT.
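The whole-row behaviour, and the project-then-deduplicate workaround, can be sketched in Python. The helpers `distinct` and `project` and the sample rows are hypothetical; keeping the order of first appearance is a convenience of this sketch and should not be read as a guarantee Pig makes.

```python
def distinct(rows):
    """Drop duplicate rows, comparing whole tuples."""
    return list(dict.fromkeys(rows))

def project(rows, i):
    """Keep only field i of every row, as a one-field tuple."""
    return [(r[i],) for r in rows]

rows = [("alice", "nyc"), ("alice", "sf"), ("alice", "nyc"), ("bob", "sf")]

# Whole-row deduplication: ("alice", "sf") survives because the row differs.
print(distinct(rows))
# Deduplicating on the first field alone requires projecting it first.
print(distinct(project(rows, 0)))
```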
13, FILTER: filters rows, similar to the WHERE clause of a database; the condition evaluates to a Boolean value.
14, FOREACH: iterates over the data and extracts one or more columns.
15, GROUP: grouping, similar to GROUP BY in a database
16, PARTITION BY: the same as the Partitioner component in Hadoop
17, JOIN: inner and outer joins, similar to a relational database; on Hadoop there are also different join strategies: replicated join, merge join, skewed join, etc.
18, LIMIT: limits the number of rows returned in the result set, similar to the LIMIT keyword in MySQL
19, LOAD: a Pig-specific keyword responsible for loading a data source from a specified path; the path may use wildcards, consistent with Hadoop's path wildcards
20, MAPREDUCE: runs a jar package as a MapReduce job from within Pig
21, ORDER BY: similar to ORDER BY in a relational database
22, RANK: given a set, generates a sequence number for each row, like the self-incrementing index of a for loop
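The for-loop comparison translates almost directly into Python. This sketch models only the simplest case, where no sort field is given and every row just receives a running number; the function name is hypothetical, and numbering from 1 is an assumption of the sketch.

```python
def rank(rows):
    """Prepend a sequential number, starting at 1, to every row."""
    return [(i,) + tuple(r) for i, r in enumerate(rows, start=1)]

for row in rank([("alice", 30), ("bob", 25), ("carol", 41)]):
    print(row)
```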
23, SAMPLE: a sampler that randomly selects roughly a specified fraction of the records (a value between 0 and 1) from a data set; the exact number of records returned is not guaranteed
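One way to picture SAMPLE is as a filter that keeps each record with a fixed probability. The Python sketch below makes that assumption explicit; the function name and the `seed` parameter are hypothetical and exist only to make the sketch repeatable.

```python
import random

def sample(rows, fraction, seed=None):
    """Keep each row independently with probability `fraction` (0.0 to 1.0)."""
    rng = random.Random(seed)
    return [r for r in rows if rng.random() < fraction]

rows = list(range(1000))
picked = sample(rows, 0.1, seed=42)
# Roughly a tenth of the rows, but the exact count varies with the seed.
print(len(picked))
```

Because every row is decided independently, the size of the result only approximates the requested fraction.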
24, SPLIT: splits a large data set by condition, producing several smaller data sets
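Splitting by condition can be modelled in Python as one filter per output. In this sketch (hypothetical function and data), each condition is checked independently, so a row may land in more than one output or in none at all.

```python
def split(rows, **conditions):
    """Return one list per named condition, each holding the rows that match."""
    return {name: [r for r in rows if cond(r)] for name, cond in conditions.items()}

rows = [(1,), (5,), (10,), (20,)]
parts = split(
    rows,
    small=lambda r: r[0] < 10,
    medium=lambda r: 5 <= r[0] < 20,
)
# (5,) matches both conditions; (20,) matches neither.
print(parts["small"], parts["medium"])
```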
25, STORE: Pig's function for storing results; it saves a collection to a specified location using a specified storage format.
26, STREAM: provides a streaming way to interact with other programming languages from a Pig script, such as passing the intermediate results of Pig processing to Python, Perl, or the shell
27, UNION: a union of similar data, merging two result sets into one result set
28, REGISTER: used with UDFs; this keyword registers our component, which may be a jar package or a Python file
29, DEFINE: defines an alias for a reference to a UDF
30, IMPORT: in a Pig script, use the IMPORT keyword to bring in another Pig script
