First, I want to process many documents in a folder in batches. These documents hold the data I want to process. Because I am a beginner with Pig, I did not know how to load data in batches, and I had not written anything to do it.
As far as I could tell, the built-in load function can only load files one by one and then process them.
But this is definitely not the way I want to handle it, so I thought it should be possible to call the Pig script from a shell script and execute it in a loop.
The attempt was eventually successful. Of course, I believe I could define this kind of load method myself in a Pig UDF, but for a quick implementation I am using this method first.
The following is the shell code:
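mkdir result_0925_d2
# Loop over every file whose name contains "00" (ls -B skips backup files ending in ~)
for eachfile in $(ls -B | grep '00.*')
do
    echo "$eachfile"
    input_=$eachfile
    output_=./result_0925_d2/$input_
    echo "$output_"
    file="$input_"
    file_out="$output_"
    # Pass the current input and output paths to the Pig script as parameters
    pig -param input="$file" -param output="$file_out" -x local new_getresult.pig
done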
The most important thing is that the input and output file names change on each pass of the loop and are handed to Pig as parameters when it runs.
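The loop above can also be written as a small function, which avoids parsing the output of ls and keeps working when file names contain spaces. This is only a sketch under some assumptions: the names batch_pig and PIG_BIN are made up for illustration, the pattern *00* is meant to mirror the grep (names containing "00"), and the Pig script is assumed to read the values as $input and $output, which is how Pig substitutes values given with -param.

```shell
#!/bin/sh
# Sketch only: batch_pig and PIG_BIN are hypothetical names, not part of Pig.

# Run one Pig job per matching input file.
#   $1 = input directory, $2 = output directory, $3 = Pig script
batch_pig() {
    in_dir=$1
    out_dir=$2
    script=$3
    mkdir -p "$out_dir"
    for path in "$in_dir"/*00*; do
        # Skip the unexpanded pattern when nothing matches, and skip directories
        [ -f "$path" ] || continue
        name=$(basename "$path")
        # Skip backup files ending in ~, as ls -B does
        case "$name" in
            *~) continue ;;
        esac
        # PIG_BIN defaults to "pig"; set PIG_BIN=echo to print the commands instead
        "${PIG_BIN:-pig}" -param input="$path" -param output="$out_dir/$name" \
            -x local "$script"
    done
}
```

With the function defined, the original run would look like: batch_pig . ./result_0925_d2 new_getresult.pig. Running it once with PIG_BIN=echo first is a cheap way to check which files would be processed before starting any Pig jobs.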
Pig is used in shell to load and process files in batches