Install Pig under Ubuntu

Last Update:2016-05-25 Source: Internet

Author: User

Tags builtin hadoop fs


Reprinted from: http://blog.csdn.net/a925907195/article/details/42325579

1 Installation

Install Pig only on the NameNode.

1.1 Download and Unzip

Download: from http://pig.apache.org/releases.html, download pig-0.12.1.tar.gz (Pig version 0.12.1).

Storage path: /home/hadoop/

Decompress: tar -zxvf pig-0.12.1.tar.gz

Rename: mv pig-0.12.1 pig, then move the pig directory under /usr/local/hadoop
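
The unpack-and-rename step can be sketched as a small shell function. This is only a sketch under assumptions: the function name install_pig and its two arguments are hypothetical, and unlike the steps above it extracts straight into the target directory rather than unpacking in /home/hadoop first.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pig): unpack a Pig tarball and rename
# the versioned directory to "pig".
install_pig() {
    tarball="$1"   # e.g. /home/hadoop/pig-0.12.1.tar.gz
    dest="$2"      # e.g. /usr/local/hadoop
    mkdir -p "$dest" &&
    tar -zxf "$tarball" -C "$dest" &&
    mv "$dest/pig-0.12.1" "$dest/pig"
}

# Example call (commented out; needs the downloaded tarball and write access):
# install_pig /home/hadoop/pig-0.12.1.tar.gz /usr/local/hadoop
```

Chaining the three commands with && means the rename is attempted only if the extraction succeeded.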

1.2 Changing the owner of Pig

chown -R hadoop:hadoop /usr/local/hadoop/pig

1.3 Modifying a configuration file

Add Pig to PATH: open the /etc/profile file (vi /etc/profile) and add the following at the end:

#pig path
export PATH=$PATH:/usr/local/hadoop/pig/bin

Make the change take effect:

source /etc/profile
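
Editing /etc/profile by hand works, but running the same edit twice leaves duplicate entries. A minimal sketch of an idempotent variant follows; the helper name add_pig_path is hypothetical, and it takes the profile file as an argument so it can be tried on a scratch file before touching /etc/profile.

```shell
#!/bin/sh
# Hypothetical helper: append the Pig PATH entry to a profile file,
# but only if that exact line is not already present.
add_pig_path() {
    profile="$1"   # e.g. /etc/profile (needs root to modify)
    line='export PATH=$PATH:/usr/local/hadoop/pig/bin'
    # -F fixed string, -x whole-line match, -q no output
    grep -Fxq "$line" "$profile" 2>/dev/null ||
        printf '%s\n' "$line" >> "$profile"
}

# Example (commented out): add_pig_path /etc/profile && . /etc/profile
```

The single quotes around the export line keep $PATH unexpanded, so the literal text is written to the file and evaluated only when the profile is sourced.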

1.4 Verifying that the installation is successful

Enter the pig -x local command. If the "grunt>" prompt appears, Pig has been installed successfully:

pig -x local
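
If the prompt does not appear, one thing worth checking is whether the Pig bin directory really is on PATH in the current shell. The following is a rough sketch of such a check; path_contains is a hypothetical helper, not a Pig or Ubuntu command.

```shell
#!/bin/sh
# Hypothetical helper: succeed if directory $1 is an entry of the
# colon-separated list $2 (defaults to the current $PATH).
path_contains() {
    dir="$1"
    list="${2:-$PATH}"
    case ":$list:" in
        *":$dir:"*) return 0 ;;
        *)          return 1 ;;
    esac
}

if path_contains /usr/local/hadoop/pig/bin; then
    echo "pig/bin is on PATH"
else
    echo "pig/bin is missing from PATH; try: source /etc/profile"
fi
```

Wrapping the list in colons lets the first and last entries match the same pattern as entries in the middle.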

1.5 Configuring the MapReduce mode for pig

Edit the /etc/profile file and add the Hadoop conf path:

vim /etc/profile

export PATH=$PATH:/usr/local/hadoop/pig/bin

export PIG_CLASSPATH=/usr/local/hadoop/conf

Make the configuration take effect:

source /etc/profile

1.6 Verifying the MapReduce mode of pig

Enter the pig command directly; the "grunt>" prompt should appear (Hadoop must be started first).

1.7 Modifying the log file directory for pig

By default Pig writes its logs to the current directory, which is inconvenient for analysis and management. To change the log directory:

1) Create a logs folder under the /usr/local/hadoop/pig directory

mkdir /usr/local/hadoop/pig/logs

2) Set pig.logfile in the /usr/local/hadoop/pig/conf/pig.properties file

Open the /usr/local/hadoop/pig/conf/pig.properties file, find pig.logfile, and change it as follows:

pig.logfile=/usr/local/hadoop/pig/logs
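
The two steps above could also be scripted. The sketch below assumes nothing about the existing file beyond it being a plain key=value properties file; set_pig_logfile is a hypothetical helper that removes any existing pig.logfile line (commented out or not) and appends the new setting.

```shell
#!/bin/sh
# Hypothetical helper: create the log directory and point pig.logfile at it.
set_pig_logfile() {
    props="$1"    # e.g. /usr/local/hadoop/pig/conf/pig.properties
    logdir="$2"   # e.g. /usr/local/hadoop/pig/logs
    mkdir -p "$logdir" || return 1
    tmp="$props.tmp"
    # Keep every line except an existing (possibly commented) pig.logfile entry.
    # grep -v exits non-zero when nothing is left, which is not an error here.
    grep -v '^[#[:space:]]*pig\.logfile=' "$props" > "$tmp" 2>/dev/null || true
    printf 'pig.logfile=%s\n' "$logdir" >> "$tmp"
    mv "$tmp" "$props"
}

# Example (commented out):
# set_pig_logfile /usr/local/hadoop/pig/conf/pig.properties /usr/local/hadoop/pig/logs
```

Writing to a temporary file and renaming it avoids leaving a half-written pig.properties if the script is interrupted.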

1.8 Pig Common Commands

pig -x local: start Pig in local mode

pig: start Pig in MapReduce mode, working against HDFS

Testing Pig Latin statements

Common statements:

LOAD: specifies how the data is loaded

FOREACH: scans the data row by row and applies some processing to each row

FILTER: filters rows

DUMP: displays the results on the screen

STORE: saves the results to a file

The usual order of statements in a script:

LOAD --> FOREACH --> STORE
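
As an illustration of that LOAD, FOREACH, STORE order, the sketch below writes a three-statement Pig script to a file from the shell. The helper name write_pig_script, the input file passwd.txt, and the output directory id_out are all assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal LOAD -> FOREACH -> STORE script to $1.
write_pig_script() {
    out="$1"
    # The quoted 'EOF' stops the shell from expanding $0 inside the script.
    cat > "$out" <<'EOF'
-- LOAD: read colon-separated records
A = LOAD 'passwd.txt' USING PigStorage(':');
-- FOREACH: keep only the first field of every row
B = FOREACH A GENERATE $0 AS id;
-- STORE: write the result to a directory
STORE B INTO 'id_out';
EOF
}

# Example (commented out; needs Pig and a passwd.txt in the current directory):
# write_pig_script first_field.pig && pig -x local first_field.pig
```

Replacing the STORE line with DUMP B; would print the tuples to the screen instead of writing them to a file.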

1.9 Testing the execution of pig jobs in MapReduce mode

Step one: Upload passwd.txt to the HDFS file system

cat /home/hadoop/fjshtest/passwd.txt

root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
lp:x:7:7:lp:/var/spool/lpd:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
uucp:x:10:10:uucp:/var/spool/uucp:/bin/sh
proxy:x:13:13:proxy:/bin:/bin/sh
www-data:x:33:33:www-data:/var/www:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
list:x:38:38:Mailing List Manager:/var/list:/bin/sh

bin/hadoop fs -put /home/hadoop/fjshtest/passwd.txt /user/hadoop/in

bin/hadoop fs -ls /user/hadoop/in

-rw-r--r-- 2 hadoop supergroup 1705 2015-01-01 22:46 /user/hadoop/in/passwd.txt
-rw-r--r-- 2 hadoop supergroup 1026 2015-01-01 22:23 /user/hadoop/in/pigtest
-rw-r--r-- 2 hadoop supergroup 2014-11-14 23:18 /user/hadoop/in/test1.txt
-rw-r--r-- 2 hadoop supergroup 2014-11-14 23:18 /user/hadoop/in/test2.txt

Step two: Run the following statements one after another at the grunt prompt

A = LOAD '/user/hadoop/in/passwd.txt' USING PigStorage(':');

B = FOREACH A GENERATE $0 AS id;

DUMP B;

The results are printed directly to the screen:

- Total input paths to process : 1
(root)
(daemon)
(bin)
(sys)
(sync)
(games)
(man)
(lp)
(mail)
(news)
(uucp)
(proxy)
(www-data)
(backup)
(list)
(irc)
(gnats)
(nobody)
(libuuid)
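
To cross-check this output without Pig, the same projection can be approximated on the local copy of the file with standard tools. This is not how Pig computes the result; it is just a sketch using cut and sed, and first_fields is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print field 1 of each ':'-separated line in file $1,
# wrapped in parentheses so it resembles the tuples printed by DUMP.
first_fields() {
    cut -d: -f1 "$1" | sed 's/.*/(&)/'
}

# Example (commented out): first_fields /home/hadoop/fjshtest/passwd.txt
```

Comparing this listing with the DUMP output is a quick way to confirm that the ':' delimiter was passed to PigStorage correctly.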

Common errors:

Issue 1: A Pig statement needs a space on each side of the equals sign; otherwise parsing fails
a=load 'test.txt' as (ip:chararray, other:chararray) using PigStorage(' ');
-- error
grunt> a=load 'test.txt' as (ip:chararray,other:chararray) using PigStorage(' ');
2014-07-04 16:05:35,935 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " <PATH> "a=load "" at line 2, column 1.

Issue 2: When loading data with LOAD, the USING PigStorage(' ') clause must be written before the AS clause
a = load 'test.txt' as (ip:chararray, other:chararray) using PigStorage(' ');
-- error
grunt> a = LOAD 'test.txt' as (ip:chararray,other:chararray) using PigStorage(' ');
2014-07-04 16:03:35,421 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 1, column 54> mismatched input 'using' expecting SEMI_COLON

Issue 3: Some function names, such as COUNT and PigStorage, are case-sensitive; with the wrong case Pig reports that the function cannot be resolved
c = foreach b {generate ip,count(ip);};
-- error
grunt> c = foreach b {generate ip,count(ip);};
2014-07-04 16:19:40,167 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve count using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Details at logfile: /app01/pig-0.13.0/pig_1404460981802.log

Issue 4: When referring to a field by name, qualify it with the relation it belongs to (a.ip)
c = foreach b {generate ip,COUNT(ip);};
-- error
grunt> c = foreach b {generate ip,COUNT(ip);};
2014-07-04 16:18:54,919 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
<line 4, column 24> Invalid field projection. Projected field [ip] does not exist in schema: group:chararray,a:bag{:tuple(ip:chararray,other:chararray)}.
As the schema in the message shows, b contains only group and the bag a, so the field has to be referenced as a.ip, for example COUNT(a.ip).
