Installing and Using Pig

Source: Internet
Author: User

The previous post used Hive to run MapReduce jobs that computed a maximum and a total. This post does the same with another powerful tool, Pig.

First, download pig-0.10.1.tar.gz into hadoop/pig and unpack it, then set the HADOOP_HOME and PATH environment variables.
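Before starting grunt, it can help to confirm that both variables took effect. The following is a minimal sketch, not part of the original setup: the helper name `check_pig_setup` is hypothetical, and it assumes the Pig launcher is an executable named `pig` in a directory listed on PATH.

```python
import os
import shutil


def check_pig_setup(environ=None):
    """Return a list of human-readable problems with the Pig environment.

    `environ` defaults to the real process environment; passing a dict
    makes the check easy to try out without touching the shell.
    """
    env = os.environ if environ is None else environ
    problems = []

    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        problems.append("HADOOP_HOME is not set")
    elif not os.path.isdir(hadoop_home):
        problems.append("HADOOP_HOME is not a directory: " + hadoop_home)

    # shutil.which searches the given PATH string for an executable "pig".
    if shutil.which("pig", path=env.get("PATH", "")) is None:
        problems.append("no 'pig' executable found on PATH")

    return problems


if __name__ == "__main__":
    for problem in check_pig_setup():
        print("WARNING:", problem)
```

An empty result means both variables look usable; it does not prove that Pig can actually reach the Hadoop cluster.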

-- Load the data from HDFS into a relation
grunt> line = LOAD 'inputtest/test.log' USING PigStorage(' ') AS (day, bytes, tag, user);
grunt> DESCRIBE line;
line: {day:bytearray,bytes:bytearray,tag:bytearray,user:bytearray}
grunt> DUMP line;
(20121221,04567,user,s00001)
(20121221,75531,user,s00003)
(20121222,52369,user,s00002)
(20121222,01297,user,s00001)
(20121223,61223,user,s00002)
(20121223,33121,user,s00003)

-- Group the relation by day
grunt> groupd_line = GROUP line BY day;
grunt> DESCRIBE groupd_line;
groupd_line: {group:bytearray,line: {(day:bytearray,bytes:bytearray,tag:bytearray,user:bytearray)}}
grunt> DUMP groupd_line;
(20121221,{(20121221,04567,user,s00001),(20121221,75531,user,s00003)})
(20121222,{(20121222,52369,user,s00002),(20121222,01297,user,s00001)})
(20121223,{(20121223,61223,user,s00002),(20121223,33121,user,s00003)})

-- Compute the total per day
grunt> sum_groupd_line = FOREACH groupd_line GENERATE group, SUM(line.bytes);
grunt> DESCRIBE sum_groupd_line;
sum_groupd_line: {group:bytearray,double}
grunt> DUMP sum_groupd_line;

(20121221,80098.0)
(20121222,53666.0)
(20121223,94344.0)

-- Save the result to HDFS
grunt> STORE sum_groupd_line INTO 'sumoutput';

# View the result through HDFS
root:~/hadoop # hadoop fs -cat /user/root/sumoutput/part-r-00000
20121221	80098.0
20121222	53666.0
20121223	94344.0

-- Computing the maximum works the same way
grunt> max_groupd_line = FOREACH groupd_line GENERATE group, MAX(line.bytes);
grunt> DUMP max_groupd_line;
(20121221,75531.0)
(20121222,52369.0)
(20121223,61223.0)
grunt> STORE max_groupd_line INTO 'maxoutput';
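To cross-check the numbers from the grunt session above, the same load, group, and aggregate steps can be sketched in plain Python. This is only an illustration of what the Pig statements compute, not how Pig runs them. The function names are hypothetical, the space delimiter is an assumption about the input file, and the conversion to float mirrors the doubles that Pig reported for the untyped `bytes` field.

```python
from collections import defaultdict

# The six sample records from the session, assumed to be space-delimited.
SAMPLE_LINES = [
    "20121221 04567 user s00001",
    "20121221 75531 user s00003",
    "20121222 52369 user s00002",
    "20121222 01297 user s00001",
    "20121223 61223 user s00002",
    "20121223 33121 user s00003",
]

FIELDS = ("day", "bytes", "tag", "user")


def load(lines, delimiter=" "):
    """Split each line into a dict keyed by field name; values stay strings."""
    return [dict(zip(FIELDS, line.split(delimiter))) for line in lines]


def group_by_day(records):
    """Collect records into one list per day, similar to GROUP ... BY day."""
    groups = defaultdict(list)
    for record in records:
        groups[record["day"]].append(record)
    return dict(groups)


def aggregate(groups, func):
    """Apply func (for example sum or max) to the bytes of each group."""
    return {
        day: func(float(record["bytes"]) for record in bag)
        for day, bag in sorted(groups.items())
    }


if __name__ == "__main__":
    groups = group_by_day(load(SAMPLE_LINES))
    for day, total in aggregate(groups, sum).items():
        print(day, total)  # first line printed: 20121221 80098.0
    for day, biggest in aggregate(groups, max).items():
        print(day, biggest)  # first line printed: 20121221 75531.0
```

Passing `sum` or `max` as an argument keeps the grouping step shared, which matches the way both Pig statements reuse `groupd_line`.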

The same operations work on files in LZO-compressed format:

grunt> line = LOAD 'inputtest/test.log.lzo' USING PigStorage(' ') AS (day, bytes, tag, user);
grunt> filt = FILTER line BY day == 20121221;
grunt> DUMP filt;
(20121221,04567,user,s00001)
(20121221,75531,user,s00003)
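The filter step can be sketched the same way. The snippet below is a hypothetical Python equivalent that works on already-decompressed records; it does not read LZO files. It converts the `day` field to an integer before comparing, on the assumption that this mirrors how Pig compares an untyped field with a numeric constant.

```python
# Sample records as (day, bytes, tag, user) tuples, all fields as strings.
RECORDS = [
    ("20121221", "04567", "user", "s00001"),
    ("20121221", "75531", "user", "s00003"),
    ("20121222", "52369", "user", "s00002"),
    ("20121222", "01297", "user", "s00001"),
    ("20121223", "61223", "user", "s00002"),
    ("20121223", "33121", "user", "s00003"),
]


def filter_by_day(records, day):
    """Keep the records whose first field, read as an integer, equals day."""
    return [record for record in records if int(record[0]) == day]


if __name__ == "__main__":
    for record in filter_by_day(RECORDS, 20121221):
        print(record)
```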

Reference: Installation and use of pig
