The First Pig Program
Environment:
Hadoop 1.1.2
Pig 0.11.1
CentOS 6.4 (Linux)
JDK 1.6
Running in pseudo-distributed mode
Start Pig with "pig" or "pig -x mapreduce".
After startup you will see the Grunt shell prompt, indicating that Pig started successfully.
Let's run an example.
The input data student.txt is as follows:
201000101:zhanglong:man:20:computer
201000102:wangli:woman:19:software
201000103:liuhua:woman:18:computer
201000104:lixiao:man:19:datastructure
201000105:wuda:man:19:system
201000106:huake:man:19:computersystem
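As a sanity check on the record layout, here is a small Python sketch of what Pig's PigStorage(':') loader will do with this file. The field names anticipate the AS clause used below; this is an illustration of the parsing, not how Pig itself executes.

```python
# Split each colon-delimited line of student.txt into a
# (sno, sname, ssex, sage, sdept) tuple, mirroring PigStorage(':')
# with the schema from the tutorial's AS clause.
records = """201000101:zhanglong:man:20:computer
201000102:wangli:woman:19:software
201000103:liuhua:woman:18:computer
201000104:lixiao:man:19:datastructure
201000105:wuda:man:19:system
201000106:huake:man:19:computersystem"""

students = []
for line in records.splitlines():
    sno, sname, ssex, sage, sdept = line.split(":")
    # sage is declared int in the schema, so convert it here too
    students.append((sno, sname, ssex, int(sage), sdept))

print(students[0])  # ('201000101', 'zhanglong', 'man', 20, 'computer')
```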
Upload student.txt to the /input directory on HDFS.
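Assuming Hadoop's bin directory is on the PATH and the file sits in the current local directory, the upload can be done with the standard HDFS shell commands (the paths match those used in this tutorial):

```
hadoop fs -mkdir /input
hadoop fs -put student.txt /input
```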
Then view it from the Grunt shell:
grunt> fs -ls /input
The listing shows student.txt.
Running method 1
-- Load the data (note that spaces are required on both sides of "=")
grunt> A = LOAD '/input/student.txt' USING PigStorage(':') AS (sno:chararray, sname:chararray, ssex:chararray, sage:int, sdept:chararray);
-- Project the sname and sage fields from A
grunt> B = FOREACH A GENERATE sname, sage;
-- Dump the contents of B to the screen
grunt> DUMP B;
-- Store the contents of B to a file on HDFS
grunt> STORE B INTO '/output/result.txt';
grunt> fs -cat /output/result.txt/part-m-00000
The result is as follows:
The first Pig program ran successfully.
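The expected contents of part-m-00000 can be reproduced with a short Python sketch of the same pipeline (load with ':' delimiter, project sname and sage, store as tab-separated lines, which is Pig's default STORE format). This illustrates the data flow only, not Pig's actual MapReduce execution.

```python
# Mimic the Pig pipeline: LOAD with ':' delimiter, then
# B = FOREACH A GENERATE sname, sage; then STORE with Pig's
# default tab delimiter (what ends up in part-m-00000).
lines = """201000101:zhanglong:man:20:computer
201000102:wangli:woman:19:software
201000103:liuhua:woman:18:computer
201000104:lixiao:man:19:datastructure
201000105:wuda:man:19:system
201000106:huake:man:19:computersystem""".splitlines()

# Project fields 1 (sname) and 3 (sage) from each record
projected = [(f.split(":")[1], f.split(":")[3]) for f in lines]

# STORE writes one tab-separated line per tuple
output = "\n".join(f"{sname}\t{sage}" for sname, sage in projected)
print(output)
```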
Running method 2
Create a script file, script.pig, containing all the statements executed above:
A = LOAD '/input/student.txt' USING PigStorage(':') AS (sno:chararray, sname:chararray, ssex:chararray, sage:int, sdept:chararray);
B = FOREACH A GENERATE sname, sage;
DUMP B;
STORE B INTO '/result1.txt';
Save the script locally on the Linux system, then run "pig script.pig" from the same directory.
The script runs successfully as well.