Install Pig under Ubuntu


Reprinted from: http://blog.csdn.net/a925907195/article/details/42325579

1 Installation

Pig only needs to be installed on the NameNode.

1.1 Download and Unzip

Download: from http://pig.apache.org/releases.html, download pig-0.12.1.tar.gz (the pig-0.12.1 release).

Storage path: /home/hadoop/

Decompress: tar -zxvf pig-0.12.1.tar.gz

Rename the directory to pig and move it under /usr/local/hadoop: mv pig-0.12.1 /usr/local/hadoop/pig

1.2 Changing the owner of Pig

chown -R hadoop:hadoop /usr/local/hadoop/pig

1.3 Modifying a configuration file

Add the Pig path to the PATH variable: open the /etc/profile file (vi /etc/profile) and append the following at the end:

# pig path

export PATH=$PATH:/usr/local/hadoop/pig/bin

Make the change take effect: source /etc/profile

1.4 Verifying that the installation is successful

Enter the pig -x local command. If the "grunt>" prompt appears, Pig has been installed successfully, as follows:

pig -x local
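A minimal session looks roughly like this (startup log lines are omitted and vary by version); typing quit leaves the Grunt shell:

pig -x local
grunt> quit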

1.5 Configuring MapReduce mode for Pig

Edit the /etc/profile file and add the hadoop/conf path:

vim /etc/profile

export PATH=$PATH:/usr/local/hadoop/pig/bin

export PIG_CLASSPATH=/usr/local/hadoop/conf

Execute the following to make the configuration file take effect:

source /etc/profile
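A quick sanity check that the variables are visible in the current shell:

echo $PATH
echo $PIG_CLASSPATH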

1.6 Verifying the MapReduce mode of Pig

Enter the pig command directly; the "grunt>" prompt should appear (Hadoop must be started first).
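For example, with the layout used in this article (start script names and locations vary between Hadoop versions, so this is only a sketch):

/usr/local/hadoop/bin/start-all.sh
pig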

1.7 Modifying the log file directory for Pig

By default, Pig writes its logs to the current directory, which is inconvenient for analysis and management, so change the log file directory as follows:

1) Create a new logs folder under the /usr/local/hadoop/pig directory:

mkdir /usr/local/hadoop/pig/logs

2) Set pig.logfile in the /usr/local/hadoop/pig/conf/pig.properties file. Open the file, find the pig.logfile entry, and change it to:

pig.logfile=/usr/local/hadoop/pig/logs

1.8 Common Pig Commands

pig -x local: enter Pig in local mode

pig: enter Pig directly in HDFS (MapReduce) mode

Testing Pig Latin statements

Common statements:

LOAD: specifies how data is loaded

FOREACH: scans row by row and applies some processing

FILTER: filters rows

DUMP: displays the results on the screen

STORE: saves the results to a file

The usual order when writing a script:

LOAD --> FOREACH --> STORE
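As a minimal sketch of that pattern (the file name, delimiter, and field names are made up for illustration):

-- load colon-separated records
A = LOAD 'input.txt' USING PigStorage(':') AS (name:chararray, shell:chararray);
-- keep only the name field
B = FOREACH A GENERATE name;
-- write the result to the output directory
STORE B INTO 'output';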

1.9 Testing the execution of Pig jobs in MapReduce mode

Step 1: Upload passwd.txt to the HDFS file system.

cat /home/hadoop/fjshtest/passwd.txt

root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
lp:x:7:7:lp:/var/spool/lpd:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
uucp:x:10:10:uucp:/var/spool/uucp:/bin/sh
proxy:x:13:13:proxy:/bin:/bin/sh
www-data:x:33:33:www-data:/var/www:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
list:x:38:38:mailinglist manager:/var/list:/bin/sh

bin/hadoop fs -put /home/hadoop/fjshtest/passwd.txt /user/hadoop/in

bin/hadoop fs -ls /user/hadoop/in

-rw-r--r--   2 hadoop supergroup   1705 2015-01-01 22:46 /user/hadoop/in/passwd.txt
-rw-r--r--   2 hadoop supergroup   1026 2015-01-01 22:23 /user/hadoop/in/pigtest
-rw-r--r--   2 hadoop supergroup        2014-11-14 23:18 /user/hadoop/in/test1.txt
-rw-r--r--   2 hadoop supergroup        2014-11-14 23:18 /user/hadoop/in/test2.txt
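To double-check the upload, you can print the file back from HDFS:

bin/hadoop fs -cat /user/hadoop/in/passwd.txt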

Step 2: Execute the following commands in turn at the Grunt command line:

A = LOAD '/user/hadoop/in/passwd.txt' USING PigStorage(':');

B = FOREACH A GENERATE $0 AS id;

DUMP B;

The results of the execution can be viewed directly on the screen:

Total input paths to process : 1

(root)
(daemon)
(bin)
(sys)
(sync)
(games)
(man)
(lp)
(mail)
(news)
(uucp)
(proxy)
(www-data)
(backup)
(list)
(irc)
(gnats)
(nobody)
(libuuid)
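To keep the result in HDFS instead of printing it, replace DUMP with STORE; the output path below is made up for illustration, and Pig creates it as a directory (it must not already exist):

STORE B INTO '/user/hadoop/out/passwd_ids' USING PigStorage();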

Common errors:

Problem 1: A Pig statement needs a space on each side of the equals sign, otherwise parsing fails.

-- Error
grunt> A=load 'test.txt' as (ip:chararray, other:chararray) using PigStorage(' ');
2014-07-04 16:05:35,935 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered "<PATH>" A=load "" at line 2, column 1.
Problem 2: When loading data, the using PigStorage(...) clause must be written before as, not after it.

-- Error
grunt> A = LOAD 'test.txt' as (ip:chararray, other:chararray) using PigStorage(' ');
2014-07-04 16:03:35,421 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 1, column 54> mismatched input 'using' expecting SEMI_COLON
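A form that parses cleanly, with spaces around the equals sign and the using clause ahead of as (the file name and delimiter are reconstructed for illustration):

grunt> A = LOAD 'test.txt' USING PigStorage(' ') AS (ip:chararray, other:chararray);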

Problem 3: Functions and keywords such as COUNT and PigStorage are case-sensitive; written in the wrong case, Pig reports that they do not exist.

-- Error
grunt> c = foreach B {generate ip, count(ip);};
2014-07-04 16:19:40,167 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve count using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Details at logfile: /app01/pig-0.13.0/pig_1404460981802.log

Problem 4: When projecting a field out of a grouped relation, the field must be qualified with its relation (A.ip); a bare field name fails.

-- Error
grunt> C = foreach B {generate ip, COUNT(ip);};
2014-07-04 16:18:54,919 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
<line 4, column 24> Invalid field projection. Projected field [ip] does not exist in schema: group:chararray, A:bag{:tuple(ip:chararray, other:chararray)}.
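Fixing problems 3 and 4 together, a version that runs, assuming B was produced by a grouping step such as B = GROUP A BY ip (that step is not shown in the original transcript):

grunt> C = foreach B generate group, COUNT(A.ip);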

