The version used here is pig-0.12.0-cdh5.1.2, from the CDH release.
1. Pig introduction
Pig is a project that Yahoo donated to Apache. It provides an SQL-like, high-level query language (Pig Latin) built on MapReduce: it compiles operations into the map and reduce phases of the MapReduce model, and it lets you define your own functions. It is another Yahoo clone of a Google project, in this case Sawzall.
Pig is a client application. Even if you want to run Pig against a Hadoop cluster, you do not need to install anything extra on the cluster itself.
2. Installation
Extract the downloaded Pig tarball to a directory of your choice. Here it is extracted into the hadoop user's home directory.
    $ tar -xzvf ~/Downloads/pig-0.12.0-cdh5.1.2.tar.gz -C ~/
To simplify configuration, create a symbolic link named pig that points to the extracted directory. Run this from the home directory, because the link target is a relative path.
    $ ln -s pig-0.12.0-cdh5.1.2/ pig
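The two installation steps above can be sketched as one re-runnable helper. This is only an illustration: `install_pig` is a hypothetical function name, and it assumes the archive unpacks into a single directory named after the tarball, as pig-0.12.0-cdh5.1.2.tar.gz does here.

```shell
# Sketch: unpack a Pig tarball and point a stable "pig" symlink at it.
# install_pig is a hypothetical helper, not something Pig ships.
install_pig() {
    tarball=$1   # e.g. ~/Downloads/pig-0.12.0-cdh5.1.2.tar.gz
    dest=$2      # e.g. the hadoop user's home directory

    [ -f "$tarball" ] || { echo "no such tarball: $tarball" >&2; return 1; }

    # Assumption: the archive's top-level directory is the file name
    # without the .tar.gz suffix.
    release=$(basename "$tarball" .tar.gz)

    tar -xzf "$tarball" -C "$dest" || return 1

    # Relative target, created inside $dest, so the link keeps working
    # if the home directory is moved. -f replaces an existing link and
    # -n stops ln from descending into it, so upgrades can re-run this.
    ln -sfn "$release" "$dest/pig"
}
```

A call such as `install_pig ~/Downloads/pig-0.12.0-cdh5.1.2.tar.gz ~` would then leave `~/pig` pointing at the extracted release.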
3. Environment variable configuration
Set the variables in either /etc/profile or ~/.profile in the user's home directory. Here the ~/.profile of the hadoop user is edited:
    export PIG_HOME=/home/hadoop/pig
    export PIG_CLASSPATH=${HADOOP_HOME}/etc/hadoop
    export PATH=$PATH:$PIG_HOME/bin
PIG_CLASSPATH specifies the directory that holds the Hadoop configuration files. It is not needed in local mode, but it must be set if Pig is to access Hadoop.
Run source ~/.profile to make the configuration take effect.
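After sourcing the profile, a quick sanity check can confirm that the variables took effect. The following is a minimal sketch; `check_pig_env` is a hypothetical helper name, and the messages are made up for illustration.

```shell
# Sketch: verify the variables exported in ~/.profile.
# check_pig_env is a hypothetical helper, not part of Pig.
check_pig_env() {
    status=0

    if [ -z "${PIG_HOME:-}" ]; then
        echo "PIG_HOME is not set" >&2
        status=1
    elif [ ! -d "$PIG_HOME/bin" ]; then
        echo "missing directory: $PIG_HOME/bin" >&2
        status=1
    fi

    # The bare `pig` command only resolves if $PIG_HOME/bin is on PATH.
    case ":$PATH:" in
        *":${PIG_HOME:-}/bin:"*) ;;
        *) echo "\$PIG_HOME/bin is not on PATH" >&2; status=1 ;;
    esac

    # PIG_CLASSPATH is optional for local mode, so this is only a note.
    if [ -z "${PIG_CLASSPATH:-}" ]; then
        echo "note: PIG_CLASSPATH is unset (fine for local mode)" >&2
    fi

    return "$status"
}
```

Running `check_pig_env && echo ok` in a fresh shell is one way to confirm the profile is being picked up.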
4. Running in local mode
    $ pig -x local
    2014-10-13 19:17:34,862 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.0-cdh5.1.2 (rexported) compiled Aug 25 2014, 19:51:48
    2014-10-13 19:17:34,863 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop/pig-0.12.0-cdh5.1.2/conf/pig_1413199054861.log
    2014-10-13 19:17:34,905 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop/.pigbootup not found
    2014-10-13 19:17:35,204 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2014-10-13 19:17:35,205 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2014-10-13 19:17:35,206 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.3.0-cdh5.1.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/hadoop/hbase-0.98.1-cdh5.1.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    2014-10-13 19:17:35,732 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    2014-10-13 19:17:35,918 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
    2014-10-13 19:17:35,922 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    grunt>
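The grunt> prompt indicates that Pig started successfully. Note the line "Connecting to hadoop file system at: file:///", which shows that local mode uses the local file system.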
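Besides the interactive grunt shell, local mode can also run a script file in batch. The sketch below writes a small word-count script and runs it only if `pig` is on the PATH. The file names and the script itself are made up for illustration; `LOAD`, `TOKENIZE`, `FLATTEN`, `GROUP` and `COUNT` are standard Pig Latin.

```shell
# Sketch: run a tiny Pig Latin script in local mode (batch, no grunt shell).
# All file names here are illustrative.
workdir=$(mktemp -d)

printf 'hello pig\nhello grunt\n' > "$workdir/input.txt"

# Unquoted heredoc so that $workdir expands into the LOAD path.
cat > "$workdir/count.pig" <<EOF
lines  = LOAD '$workdir/input.txt' AS (line:chararray);
words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
groups = GROUP words BY word;
counts = FOREACH groups GENERATE group, COUNT(words);
DUMP counts;
EOF

if command -v pig >/dev/null 2>&1; then
    # -x local reads from and writes to the local file system.
    pig -x local "$workdir/count.pig"
else
    echo "pig not found on PATH; skipping run" >&2
fi
```

Because the script runs in local mode, it does not need PIG_CLASSPATH or a running cluster.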
5. Running on Hadoop
Start the Hadoop cluster first. Pig then identifies the cluster automatically from the configuration files in the directory given by PIG_CLASSPATH.
    $ pig
    2014-10-13 19:18:36,511 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.0-cdh5.1.2 (rexported) compiled Aug 25 2014, 19:51:48
    2014-10-13 19:18:36,511 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop/pig-0.12.0-cdh5.1.2/conf/pig_1413199116510.log
    2014-10-13 19:18:36,541 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop/.pigbootup not found
    2014-10-13 19:18:36,849 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2014-10-13 19:18:36,849 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2014-10-13 19:18:36,849 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://192.168.118.168:9100
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.3.0-cdh5.1.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/hadoop/hbase-0.98.1-cdh5.1.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    2014-10-13 19:18:37,071 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    2014-10-13 19:18:38,379 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    grunt>
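Since the mode that makes sense depends on whether PIG_CLASSPATH points at usable Hadoop configuration, a wrapper script might choose the `-x` value explicitly. This is a sketch under assumptions: `pig_mode` is a hypothetical helper, PIG_CLASSPATH is assumed to be a single directory (as configured above), and the presence of core-site.xml is used as a rough signal that cluster configuration is available.

```shell
# Sketch: choose an execution mode from what PIG_CLASSPATH points at.
# pig_mode is a hypothetical helper, not part of Pig.
pig_mode() {
    # Assumption: core-site.xml in that directory names the cluster's
    # file system (fs.defaultFS), so its presence suggests cluster mode.
    if [ -n "${PIG_CLASSPATH:-}" ] && [ -f "$PIG_CLASSPATH/core-site.xml" ]; then
        echo mapreduce
    else
        echo local
    fi
}

# Example use (not executed here):
#   pig -x "$(pig_mode)" script.pig
```

The helper only inspects the file system; it does not check that the cluster is actually running.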
The installation is now complete. Installing Pig is simple, but what it can do is not, and later posts will expand on that step by step. For example, you can use Pig to build an index for data in HDFS and push it to an Elasticsearch cluster. Coming soon.
Part of the Pig series: installation and operation of pig-0.12.0-cdh5.1.2