[Hadoop Series] Pig Installation and a Simple Demo

Original article by inkfish. Commercial reproduction is not permitted; when reposting, please credit the source (http://blog.csdn.net/inkfish).

Pig is a project that Yahoo! donated to Apache. It is currently in the Apache Incubator, and its current version is 0.5.0. Pig is a Hadoop-based platform for large-scale data analysis: it provides an SQL-like language called Pig Latin and translates SQL-like data analysis requests written in it into a series of optimized MapReduce jobs. In this way Pig offers a simple operational and programming interface for parallel computation over massive data sets. This article describes how to install Pig and run a simple demo; it is largely based on, and partly translated from, the Pig Setup guide in the official documentation.

Prerequisites:

    • Linux/Unix, or Windows with Cygwin (I am using Ubuntu 8.04);
    • Hadoop 0.20.x;
    • JDK 1.6 or later;
    • Ant 1.7 (optional; needed only if you want to build Pig yourself);
    • JUnit 4.5 (optional; needed only if you want to run the unit tests).
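As a quick way to confirm the "JDK 1.6 or later" prerequisite, here is a minimal Java sketch (the class name JdkCheck is hypothetical, not part of Pig) that reads the standard java.specification.version system property and compares its major version with 6. It handles both the old "1.6" numbering and the plain "9", "11" numbering used since Java 9.

```java
// Hypothetical helper: checks the "JDK 1.6 or later" prerequisite for Pig 0.5.0.
final class JdkCheck {
    // Converts a java.specification.version value to a major version:
    // "1.6" -> 6 (pre-Java 9 scheme), "11" -> 11 (Java 9+ scheme).
    static int majorVersion(String spec) {
        String[] parts = spec.split("\\.");
        int first = Integer.parseInt(parts[0]);
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    // True when the version is at least Java 6 (JDK 1.6).
    static boolean meetsPigRequirement(String spec) {
        return majorVersion(spec) >= 6;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("Java specification version: " + spec
                + (meetsPigRequirement(spec) ? " (OK for Pig)" : " (too old for Pig)"));
    }
}
```

Running java -version by hand gives the same information; the sketch only shows how the two version-numbering schemes map to a single comparison.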

Installing Pig

1. Download Pig
You can download the latest Pig release from the official Pig homepage. At the time of writing, the latest version is Pig 0.5.0.
2. Unpack
  $ tar -xvf pig-0.5.0.tar.gz
I usually put Pig in the /opt/hadoop/pig-0.5.0 directory.
3. Set environment variables
To make later Pig upgrades easier, I create a symbolic link: the environment variable points to the link, and the link points to the directory of the latest Pig version.
  $ ln -s /opt/hadoop/pig-0.5.0 /opt/hadoop/pig
Edit /etc/environment and add Pig's bin subdirectory (/opt/hadoop/pig/bin) to PATH. You can also edit ~/.bashrc or ~/.profile instead.
4. Verify the installation
Log in again (changes to /etc/environment are read at login; a new terminal is enough if you edited ~/.bashrc) and run env: PATH should now include Pig's bin directory. Then run pig -help; if a help message appears, Pig is installed correctly.
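The PATH check in step 4 can also be expressed in Java. This is a minimal sketch, assuming the symlinked location /opt/hadoop/pig/bin used above and a Unix-style, colon-separated PATH; the class name PigPathCheck is hypothetical.

```java
// Hypothetical helper: reports whether Pig's bin directory is on a Unix-style PATH.
final class PigPathCheck {
    static boolean onPath(String path, String dir) {
        if (path == null) {
            return false;
        }
        // Unix-style PATH entries are separated by ':'.
        for (String entry : path.split(":")) {
            // Ignore one trailing slash so "/opt/hadoop/pig/bin/" also matches.
            String normalized = entry.endsWith("/") ? entry.substring(0, entry.length() - 1) : entry;
            if (normalized.equals(dir)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String pigBin = "/opt/hadoop/pig/bin"; // symlinked install location from step 3
        boolean found = onPath(System.getenv("PATH"), pigBin);
        System.out.println(pigBin + (found ? " is" : " is NOT") + " on PATH");
    }
}
```

Because PATH points at the symlink rather than at pig-0.5.0 itself, this check keeps passing after an upgrade that only re-points the link.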

Pig execution modes

1. Local mode
In local mode, Pig runs on a single machine and uses the local file system.
2. MapReduce mode
In MapReduce mode, Pig needs access to a Hadoop cluster with HDFS installed.

Ways to run Pig

    • Grunt shell: run tasks interactively by entering commands;
    • Pig script: run tasks from a script file;
    • Embedded: embed Pig in Java code and run tasks through Java calls.

Pig demo code

The three invocation methods are described below. The demo uses the following source code, which is taken from the official documentation with these changes:

    • Fixed an error in the official documentation: in the last line of id.pig, the full-width single quotes around id.out were replaced with half-width (ASCII) single quotes;
    • Fixed an error in the official documentation: added the missing semicolon at the end of the first line of the runIdQuery method in IdMapReduce.java;
    • Following common Java naming conventions, the class names are capitalized.

Script file: id.pig

    A = load 'passwd' using PigStorage(':');
    B = foreach A generate $0 as id;
    dump B;
    store B into 'id.out';
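To make clear what id.pig computes, here is a plain-Java sketch of the same transformation, with no Pig involved (the class name IdEmulation is hypothetical): PigStorage(':') splits each line of passwd on ':', and generate $0 keeps only the first field, which in a passwd file is the user name. It uses sample lines in memory rather than reading a real file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java analogue of:
//   A = load 'passwd' using PigStorage(':');
//   B = foreach A generate $0 as id;
final class IdEmulation {
    static List<String> firstFields(List<String> lines) {
        List<String> ids = new ArrayList<>();
        for (String line : lines) {
            // Limit -1 keeps empty fields, so even "::" yields an (empty) first field.
            String[] fields = line.split(":", -1);
            ids.add(fields[0]);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> passwd = Arrays.asList(
                "root:x:0:0:root:/root:/bin/bash",
                "daemon:x:1:1:daemon:/usr/sbin:/bin/sh");
        // Prints one id per line, like the rows of relation B.
        for (String id : firstFields(passwd)) {
            System.out.println(id);
        }
    }
}
```

The point of the comparison is that a foreach ... generate is a per-record projection; Pig runs the same logic as part of a MapReduce job instead of a single loop.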

Java file for local mode: IdLocal.java

    import java.io.IOException;
    import org.apache.pig.PigServer;

    public class IdLocal {
        public static void main(String[] args) {
            try {
                // Run Pig in local mode on this machine
                PigServer pigServer = new PigServer("local");
                runIdQuery(pigServer, "passwd");
            } catch (Exception e) {
                // Report failures instead of silently swallowing them
                e.printStackTrace();
            }
        }

        public static void runIdQuery(PigServer pigServer, String inputFile) throws IOException {
            // Load the input as ':'-delimited records
            pigServer.registerQuery("A = load '" + inputFile + "' using PigStorage(':');");
            // Keep only the first field (the user name)
            pigServer.registerQuery("B = foreach A generate $0 as id;");
            // Write relation B to id.out
            pigServer.store("B", "id.out");
        }
    }

Java file for MapReduce mode: IdMapReduce.java

    import java.io.IOException;
    import org.apache.pig.PigServer;

    public class IdMapReduce {
        public static void main(String[] args) {
            try {
                // Run Pig in MapReduce mode against the Hadoop cluster
                PigServer pigServer = new PigServer("mapreduce");
                runIdQuery(pigServer, "passwd");
            } catch (Exception e) {
                // Report failures instead of silently swallowing them
                e.printStackTrace();
            }
        }

        public static void runIdQuery(PigServer pigServer, String inputFile) throws IOException {
            // In MapReduce mode, input and output paths refer to HDFS
            pigServer.registerQuery("A = load '" + inputFile + "' using PigStorage(':');");
            pigServer.registerQuery("B = foreach A generate $0 as id;");
            pigServer.store("B", "idout");
        }
    }
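The two classes differ only in the execution-type string passed to PigServer. If one entry point is preferred over two, that string could be taken from the command line, defaulting to mapreduce just as the pig command does. The sketch below assumes that design; the class ExecModeResolver and its -x handling are hypothetical, and only the resolution logic is shown so it compiles without the Pig jar. Its result would be what gets passed to new PigServer(...).

```java
import java.util.Locale;

// Hypothetical helper (not part of Pig): picks the exec type string for
// new PigServer(...), mimicking the pig command's -x option.
final class ExecModeResolver {
    static String resolve(String[] args) {
        for (int i = 0; i + 1 < args.length; i++) {
            if (args[i].equals("-x")) {
                String mode = args[i + 1].toLowerCase(Locale.ROOT);
                if (mode.equals("local") || mode.equals("mapreduce")) {
                    return mode;
                }
                throw new IllegalArgumentException("Unknown exec type: " + args[i + 1]);
            }
        }
        // No -x given: fall back to MapReduce, the same default as the pig command.
        return "mapreduce";
    }

    public static void main(String[] args) {
        // In a real program this value would be passed to new PigServer(mode).
        String mode = resolve(args);
        System.out.println("Exec type: " + mode);
    }
}
```

For example, resolve(new String[]{"-x", "local"}) returns "local", while an empty argument list returns "mapreduce".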

Compile the two Java classes with the following commands:
    javac -cp .:/opt/hadoop/pig/pig-0.5.0-core.jar IdLocal.java
    javac -cp .:/opt/hadoop/pig/pig-0.5.0-core.jar IdMapReduce.java

If pig-0.5.0-core.jar is located elsewhere on your system, adjust its path in the classpath accordingly.

1. Grunt shell mode
The Grunt shell is started with the pig command. Add -x local to run in local mode or -x mapreduce to run in MapReduce mode; without -x, Pig defaults to MapReduce mode, so the last two commands below are equivalent:
    $ pig -x local
    $ pig
    $ pig -x mapreduce

Enter the commands line by line:
    grunt> A = load 'passwd' using PigStorage(':');
    grunt> B = foreach A generate $0 as id;
    grunt> dump B;
    grunt> store B into 'out';

Here, dump B prints the result to the screen, and store B into 'out' writes it to the file/directory out. In local mode, out is created in the current directory; in MapReduce mode, the out directory should be given as an absolute path on HDFS.
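The two outputs also differ in format. As far as I know, dump prints each tuple in parentheses with comma-separated fields, for example (root), while store with the default PigStorage writes one tuple per line with tab-separated fields. The hypothetical class below is plain Java, not Pig code; it only renders a tuple both ways to illustrate the difference.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the two output formats.
final class TupleFormats {
    // How dump displays a tuple: fields in parentheses, comma-separated.
    static String asDump(List<String> fields) {
        return "(" + String.join(",", fields) + ")";
    }

    // How store writes a tuple with the default PigStorage: tab-separated fields.
    static String asStored(List<String> fields) {
        return String.join("\t", fields);
    }

    public static void main(String[] args) {
        List<String> tuple = Arrays.asList("root", "0");
        System.out.println(asDump(tuple));   // (root,0)
        System.out.println(asStored(tuple)); // root, a tab, then 0
    }
}
```

For the single-field relation B in this demo, dump shows lines like (root), while the stored file contains just root on each line.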

2. Pig script mode
In script mode, run the pig command followed by the .pig file to execute (the -x option works the same way), for example:
    $ pig -x local id.pig
    $ pig id.pig
    $ pig -x mapreduce id.pig



3. Embedded mode

Running in embedded mode is no different from running an ordinary Java class, for example:
    java -cp .:/opt/hadoop/pig/pig-0.5.0-core.jar IdMapReduce
    java -cp .:/opt/hadoop/pig/pig-0.5.0-core.jar IdLocal
