The previous post used Hive to compute the per-day total and maximum with MapReduce jobs. This post implements the same functionality with another powerful tool, Pig.
First, download pig-0.10.1.tar.gz and unpack it to hadoop/pig, then configure the HADOOP_HOME and PATH environment variables.
# Load the data from HDFS into a variable
grunt> line = LOAD 'inputtest/test.log' USING PigStorage(' ') AS (day, bytes, tag, user);
grunt> describe line;
line: {day:bytearray,bytes:bytearray,tag:bytearray,user:bytearray}
grunt> dump line;
(20121221,04567,user,s00001)
(20121221,75531,user,s00003)
(20121222,52369,user,s00002)
(20121222,01297,user,s00001)
(20121223,61223,user,s00002)
(20121223,33121,user,s00003)
# Group the variable by day
grunt> groupd_line = GROUP line BY day;
grunt> describe groupd_line;
groupd_line: {group:bytearray,line: {(day:bytearray,bytes:bytearray,tag:bytearray,user:bytearray)}}
grunt> dump groupd_line;
(20121221,{(20121221,04567,user,s00001),(20121221,75531,user,s00003)})
(20121222,{(20121222,52369,user,s00002),(20121222,01297,user,s00001)})
(20121223,{(20121223,61223,user,s00002),(20121223,33121,user,s00003)})
# Compute the total
grunt> sum_groupd_line = FOREACH groupd_line GENERATE group, SUM(line.bytes);
grunt> describe sum_groupd_line;
sum_groupd_line: {group:bytearray,double}
grunt> dump sum_groupd_line;
(20121221,80098.0)
(20121222,53666.0)
(20121223,94344.0)
# Store the result in HDFS
grunt> store sum_groupd_line into 'sumoutput';
# View the result through HDFS
root:~/hadoop # hadoop fs -cat /user/root/sumoutput/part-r-00000
20121221 80098.0
20121222 53666.0
20121223 94344.0
# Computing the maximum works the same way
grunt> max_groupd_line = FOREACH groupd_line GENERATE group, MAX(line.bytes);
grunt> dump max_groupd_line;
(20121221,75531.0)
(20121222,52369.0)
(20121223,61223.0)
grunt> store max_groupd_line into 'maxoutput';
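As a quick sanity check on the numbers Pig reports, the same grouping can be reproduced on the six sample rows in plain Python. This is only an illustrative sketch and not part of the Pig workflow: the function name `group_bytes_by_day` and the in-memory `SAMPLE_LOG` string (assumed here to be whitespace-separated) are assumptions made for the example.

```python
from collections import defaultdict

# Sample rows copied from the dump output: day, bytes, tag, user.
# Assumption: fields are separated by whitespace.
SAMPLE_LOG = """\
20121221 04567 user s00001
20121221 75531 user s00003
20121222 52369 user s00002
20121222 01297 user s00001
20121223 61223 user s00002
20121223 33121 user s00003
"""


def group_bytes_by_day(text):
    """Mimic GROUP ... BY day: map each day to the list of its byte counts."""
    groups = defaultdict(list)
    for row in text.splitlines():
        day, nbytes, _tag, _user = row.split()
        # Pig reports the aggregates as doubles, so convert to float here too.
        groups[day].append(float(nbytes))
    return dict(groups)


groups = group_bytes_by_day(SAMPLE_LOG)
# Counterparts of SUM(line.bytes) and MAX(line.bytes) for each group.
totals = {day: sum(values) for day, values in groups.items()}
maxima = {day: max(values) for day, values in groups.items()}

for day in sorted(groups):
    print(day, totals[day], maxima[day])
```

The printed totals and maxima should agree with the sum_groupd_line and max_groupd_line results above.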
The same operations work on files in LZO-compressed format:
grunt> line = LOAD 'inputtest/test.log.lzo' USING PigStorage(' ') AS (day, bytes, tag, user);
grunt> filt = FILTER line BY day == 20121221;
grunt> dump filt;
(20121221,04567,user,s00001)
(20121221,75531,user,s00003)
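The filter step keeps only the rows whose day field equals the given value. A minimal Python sketch of that selection on the sample rows follows; the helper name `filter_by_day` and the hard-coded `rows` list are assumptions for illustration, and LZO decompression is not modelled.

```python
# Rows as (day, bytes, tag, user) tuples, copied from the sample data.
rows = [
    ("20121221", "04567", "user", "s00001"),
    ("20121221", "75531", "user", "s00003"),
    ("20121222", "52369", "user", "s00002"),
    ("20121222", "01297", "user", "s00001"),
    ("20121223", "61223", "user", "s00002"),
    ("20121223", "33121", "user", "s00003"),
]


def filter_by_day(records, day):
    """Keep only the records whose first field, read as an integer, equals day."""
    return [record for record in records if int(record[0]) == day]


filt = filter_by_day(rows, 20121221)
for record in filt:
    print(record)
```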
Reference: Installation and use of Pig