A short note recording how to extract part of a string in Apache Pig.
The requirement is to extract the value of the second column (the value immediately after the colon) from each of the following strings:
Sample data
a:ab#c#d
a:c#c#d
a:dd#c#d
a:zz#c#d
In Java there are many ways to do this, such as substring or several calls to split. In Pig you could also use the built-in SUBSTRING function, but the approach below is recommended: it is more flexible and suits most data extraction scenarios. It relies on two built-in functions:
(1) REGEX_EXTRACT('source string', 'regular expression', index of the capture group to return, of type int)
(2) STRSPLIT('source string', 'regular expression', limit on the number of elements returned)
The Pig script:
Pig code
-- Logic: first take the text after the colon, then split it into a tuple on '#',
-- and finally access the element we need by its index ($0).
a = LOAD '/tmp/data' AS (data:chararray);
b = FOREACH a GENERATE STRSPLIT(REGEX_EXTRACT(data, '(.*):(.*)', 2), '#', 5).$0;
DUMP b;
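For comparison, the same two-step logic (take the text after the colon, split it on '#', keep the first element) can be sketched in plain Java. This is only an illustration, not part of the Pig script: the class name PigStyleExtract and the method extract are hypothetical, and returning null for a line without a colon is an assumption made for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that mirrors the Pig expression
// STRSPLIT(REGEX_EXTRACT(data, '(.*):(.*)', 2), '#', 5).$0 in plain Java.
class PigStyleExtract {
    // Same regular expression as in the Pig script: group 2 is the text after the colon.
    private static final Pattern AFTER_COLON = Pattern.compile("(.*):(.*)");

    static String extract(String line) {
        Matcher m = AFTER_COLON.matcher(line);
        if (!m.matches()) {
            return null; // assumption for this sketch: no colon means no value
        }
        String afterColon = m.group(2);            // e.g. "ab#c#d"
        String[] parts = afterColon.split("#", 5); // at most 5 elements, like the limit argument
        return parts[0];                           // first element, like .$0 in Pig
    }

    public static void main(String[] args) {
        String[] rows = {"a:ab#c#d", "a:c#c#d", "a:dd#c#d", "a:zz#c#d"};
        for (String row : rows) {
            System.out.println(extract(row));
        }
    }
}
```

Run against the four sample rows, the sketch prints ab, c, dd and zz, one per line. Changing the index on the last line of extract picks a different element, which is the same flexibility the Pig version gets from its tuple subscript.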