Reprinted from: http://blog.csdn.net/a925907195/article/details/42325579
1 Installation
Install Pig only on the NameNode.
1.1 Download and Unzip
Download: get pig-0.12.1.tar.gz (Pig 0.12.1) from http://pig.apache.org/releases.html
Storage path: /home/hadoop/
Decompress: tar -zxvf pig-0.12.1.tar.gz
Rename: mv pig-0.12.1 pig, then move the pig directory under /usr/local/hadoop
1.2 Changing the owner of Pig
chown -R hadoop:hadoop /usr/local/hadoop/pig
1.3 Modifying a configuration file
Add Pig to PATH: open the /etc/profile file (vi /etc/profile) and add the following at the end:
#pig path
export PATH=$PATH:/usr/local/hadoop/pig/bin
Make the change take effect:
source /etc/profile
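Re-running the step above by hand tends to leave duplicate export lines in /etc/profile. A minimal sketch of an idempotent append follows; add_line_once is a hypothetical helper (not part of Pig or Hadoop), and a scratch file stands in for /etc/profile so that nothing on the system is modified.

```shell
#!/bin/sh
# add_line_once: hypothetical helper that appends a line to a file
# only if that exact line is not already present.
add_line_once() {
    line=$1
    file=$2
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    grep -qxF -e "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstration on a scratch file; on a real install the target
# would be /etc/profile, edited as root.
profile=$(mktemp)
add_line_once 'export PATH=$PATH:/usr/local/hadoop/pig/bin' "$profile"
add_line_once 'export PATH=$PATH:/usr/local/hadoop/pig/bin' "$profile"   # second call is a no-op
cat "$profile"
```

The single quotes keep $PATH unexpanded, so the profile receives the literal text rather than the current value of the variable.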
1.4 Verifying that the installation is successful
Enter the pig -x local command. If the "grunt>" prompt appears, Pig has been installed successfully:
pig -x local
1.5 Configuring the MapReduce mode for pig
Edit the /etc/profile file and add the Hadoop conf path:
vim /etc/profile
export PATH=$PATH:/usr/local/hadoop/pig/bin
export PIG_CLASSPATH=/usr/local/hadoop/conf
Make the change take effect:
source /etc/profile
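Before starting Pig in MapReduce mode it can help to confirm that the classpath variable really points at the Hadoop configuration. The sketch below uses a hypothetical helper, check_pig_classpath, and assumes that the conf directory contains a core-site.xml file; adjust the file name if your Hadoop layout differs.

```shell
#!/bin/sh
# check_pig_classpath: hypothetical helper. It only verifies that the
# directory is set and holds a core-site.xml (an assumption about the
# Hadoop conf layout); it does not contact the cluster.
check_pig_classpath() {
    dir=$1
    if [ -z "$dir" ]; then
        echo "PIG_CLASSPATH is not set"
        return 1
    fi
    if [ ! -f "$dir/core-site.xml" ]; then
        echo "no core-site.xml in $dir"
        return 1
    fi
    echo "PIG_CLASSPATH looks usable: $dir"
}

# Report on the current environment without aborting the shell.
check_pig_classpath "${PIG_CLASSPATH:-}" || true
```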
1.6 Verifying the MapReduce mode of pig
Enter the pig command directly; the "grunt>" prompt should appear (Hadoop must be started first).
1.7 Modifying the log file directory for pig
By default Pig writes its logs to the current directory, which is inconvenient for analysis and management, so change the log directory as follows:
1) Create a logs folder under the /usr/local/hadoop/pig directory:
mkdir /usr/local/hadoop/pig/logs
2) Open the /usr/local/hadoop/pig/conf/pig.properties file, find pig.logfile, and set it as follows:
pig.logfile=/usr/local/hadoop/pig/logs
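The two steps above can also be scripted. This is a sketch only: set_property is a hypothetical helper, and a temporary directory stands in for /usr/local/hadoop/pig so the example can be run safely. It assumes the property may already be present in the file, possibly commented out with a leading "#".

```shell
#!/bin/sh
# set_property: hypothetical helper. Replaces an existing "key=" line
# (commented out or not) with "key=value", or appends it if missing.
# Note: dots in the key are treated as regex wildcards, which is
# acceptable for a sketch.
set_property() {
    key=$1
    value=$2
    file=$3
    if grep -q "^#\{0,1\} *$key=" "$file"; then
        sed "s|^#\{0,1\} *$key=.*|$key=$value|" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Demonstration in a scratch directory standing in for /usr/local/hadoop/pig.
pig_home=$(mktemp -d)
mkdir -p "$pig_home/conf" "$pig_home/logs"
printf '%s\n' '# pig.logfile=' > "$pig_home/conf/pig.properties"
set_property pig.logfile "$pig_home/logs" "$pig_home/conf/pig.properties"
cat "$pig_home/conf/pig.properties"
```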
1.8 Pig Common Commands
pig -x local: start Pig in local mode
pig: start Pig in MapReduce mode (on HDFS)
Testing Pig Latin statements
Common statements:
LOAD: specifies how the data is loaded
FOREACH: scans row by row and applies some processing
FILTER: filters rows
DUMP: displays the results on the screen
STORE: saves the results to a file
The usual order when writing a script:
LOAD --> FOREACH --> STORE
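As an illustration of that order, the sketch below writes a small Pig Latin script to a file and runs it in local mode if pig is on the PATH. The helper write_pipeline_script, the input name passwd.txt (expected in the current directory), the output directory pig_out, and the UID threshold are all assumptions made for the example.

```shell
#!/bin/sh
# write_pipeline_script: hypothetical helper that writes a
# LOAD -> FILTER -> FOREACH -> STORE pipeline to the given path.
write_pipeline_script() {
    cat > "$1" <<'EOF'
-- LOAD: read colon-delimited records
A = LOAD 'passwd.txt' USING PigStorage(':');
-- FILTER: keep rows whose third field (the UID) is below 10
B = FILTER A BY (int)$2 < 10;
-- FOREACH: project the first field (the user name)
C = FOREACH B GENERATE $0 AS id;
-- STORE: write the result to a directory
STORE C INTO 'pig_out';
EOF
}

script=$(mktemp)
write_pipeline_script "$script"

# Run in local mode only when Pig is actually installed.
if command -v pig >/dev/null 2>&1; then
    pig -x local "$script"
else
    echo "pig not found on PATH; script left in $script"
fi
```

Replacing STORE with DUMP C; prints the result to the screen instead of writing it to pig_out.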
1.9 Testing the execution of pig jobs in MapReduce mode
Step one: upload passwd.txt to the HDFS file system
cat /home/hadoop/fjshtest/passwd.txt
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
lp:x:7:7:lp:/var/spool/lpd:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
uucp:x:10:10:uucp:/var/spool/uucp:/bin/sh
proxy:x:13:13:proxy:/bin:/bin/sh
www-data:x:33:33:www-data:/var/www:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
list:x:38:38:Mailing List Manager:/var/list:/bin/sh
bin/hadoop fs -put /home/hadoop/fjshtest/passwd.txt /user/hadoop/in
bin/hadoop fs -ls /user/hadoop/in
-rw-r--r-- 2 hadoop supergroup 1705 2015-01-01 22:46 /user/hadoop/in/passwd.txt
-rw-r--r-- 2 hadoop supergroup 1026 2015-01-01 22:23 /user/hadoop/in/pigtest
-rw-r--r-- 2 hadoop supergroup 2014-11-14 23:18 /user/hadoop/in/test1.txt
-rw-r--r-- 2 hadoop supergroup 2014-11-14 23:18 /user/hadoop/in/test2.txt
Step two: run the following statements in turn at the Grunt shell prompt
A = LOAD '/user/hadoop/in/passwd.txt' USING PigStorage(':');
B = FOREACH A GENERATE $0 AS id;
DUMP B;
The results of the command can be viewed directly on the screen:
- Total input paths to process : 1
(root)
(daemon)
(bin)
(sys)
(sync)
(games)
(man)
(lp)
(mail)
(news)
(uucp)
(proxy)
(www-data)
(backup)
(list)
(irc)
(gnats)
(nobody)
(libuuid)
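To cross-check the Pig output without Hadoop, the same projection (the first colon-separated field of each line) can be reproduced with standard tools on a local copy of the file. This is only a sanity check, not a substitute for the Pig job; first_field is a hypothetical helper, and the two sample lines are taken from the listing in step one.

```shell
#!/bin/sh
# first_field: hypothetical helper that mimics
#   B = FOREACH A GENERATE $0;  DUMP B;
# by printing the first ':'-separated field of each line in parentheses.
first_field() {
    cut -d: -f1 "$1" | sed 's/.*/(&)/'
}

# Small sample built from two lines of passwd.txt.
sample=$(mktemp)
printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'daemon:x:1:1:daemon:/usr/sbin:/bin/sh' > "$sample"

first_field "$sample"
# prints:
# (root)
# (daemon)
```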
Common errors:
Problem 1: a Pig statement needs a space on each side of the equals sign, otherwise it fails to parse.
A = load 'test.txt' using PigStorage(' ') as (ip:chararray,other:chararray);
--Error
grunt> A=load 'test.txt' using PigStorage(' ') as (ip:chararray,other:chararray);
2014-07-04 16:05:35,935 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " <PATH> "A=load "" at line 2, column 1.
Problem 2: when loading data with LOAD, the using PigStorage(' ') clause must be written before the as clause.
A = load 'test.txt' using PigStorage(' ') as (ip:chararray,other:chararray);
--Error
grunt> A = load 'test.txt' as (ip:chararray,other:chararray) using PigStorage(' ');
2014-07-04 16:03:35,421 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 1, column 54> mismatched input 'using' expecting SEMI_COLON
Problem 3: some function names, such as COUNT and PigStorage, are case-sensitive; written in the wrong case, Pig reports that they cannot be resolved.
C = foreach B {generate A.ip,COUNT(A.ip);};
--Error
grunt> C = foreach B {generate A.ip,count(A.ip);};
2014-07-04 16:19:40,167 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve count using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Details at logfile: /app01/pig-0.13.0/pig_1404460981802.log
Problem 4: when referring to a field by name, qualify it with its relation (A.ip). As the schema in the error message shows, B contains only group and the bag A, so a bare ip cannot be resolved.
C = foreach B {generate A.ip,COUNT(A.ip);};
--Error
grunt> C = foreach B {generate ip,COUNT(ip);};
2014-07-04 16:18:54,919 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
<line 4, column 24> Invalid field projection. Projected field [ip] does not exist in schema: group:chararray,A:bag{:tuple(ip:chararray,other:chararray)}.