A preliminary look at Pig
Pig environment installation
Pig's installation is very simple. Decompress pig-0.14.0.tar.gz to the appropriate directory.
tar -zxvf pig-0.14.0.tar.gz
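The extraction step can be wrapped in a small helper so that the target directory is created first. This is only a sketch: the function name unpack_pig and the destination path in the example call are assumptions, not something Pig provides.

```shell
# unpack_pig ARCHIVE DEST: extract a Pig tarball into DEST, creating DEST if needed.
# (Hypothetical helper for illustration.)
unpack_pig() {
  archive=$1
  dest=$2
  mkdir -p "$dest" || return 1
  tar -zxf "$archive" -C "$dest"
}

# Example call (the destination directory is an assumption):
# unpack_pig pig-0.14.0.tar.gz /usr/local/cloud
```

Using -C keeps the archive itself wherever it was downloaded and puts only the extracted tree under the destination.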
Modify environment variables:
# Pig
export PIG_HOME=/usr/local/cloud/pig-0.11.1/pig-0.11.1
export PATH=.:$PIG_HOME/bin:$PATH
# Lets Pig find your Hadoop configuration. You do not need this option if you only want to use Pig's local mode.
export PIG_CLASSPATH=$HADOOP_HOME/etc/hadoop
Make environment variable changes take effect:
source /etc/profile
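To try the environment setup without touching /etc/profile, the same exports can be written to a scratch file and loaded from there. This is a minimal sketch: the PROFILE variable is an assumption introduced here, and it defaults to a temporary file so that no root access is needed.

```shell
# PROFILE is a hypothetical stand-in for /etc/profile; it defaults to a scratch file.
PROFILE="${PROFILE:-$(mktemp)}"

# Append the Pig settings (quoted 'EOF' keeps the $VARS unexpanded in the file).
cat >> "$PROFILE" <<'EOF'
export PIG_HOME=/usr/local/cloud/pig-0.11.1/pig-0.11.1
export PATH=$PIG_HOME/bin:$PATH
# Only needed for MapReduce mode, and only if HADOOP_HOME is set
if [ -n "${HADOOP_HOME:-}" ]; then
  export PIG_CLASSPATH=$HADOOP_HOME/etc/hadoop
fi
EOF

# "." is the POSIX spelling of "source"
. "$PROFILE"

# Confirm that the variables took effect
echo "PIG_HOME=$PIG_HOME"
case ":$PATH:" in
  *":$PIG_HOME/bin:"*) echo "PATH contains $PIG_HOME/bin" ;;
  *)                   echo "PATH is missing $PIG_HOME/bin" ;;
esac
```

This sketch leaves the current directory (.) out of PATH, which is a common precaution. Once Pig is installed at that location, running pig -version from the shell is a quick further check.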
Local mode: pig -x local
[root@leaf pig-0.11.1]# pig -x local
2014-11-24 07:50:19,622 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2014-11-24 07:50:19,622 [main] INFO org.apache.pig.Main - Logging error messages to: /usr/local/cloud/pig-0.11.1/pig-0.11.1/logs/pig_1416844219621.log
2014-11-24 07:50:19,663 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2014-11-24 07:50:19,901 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2014-11-24 07:50:19,903 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2014-11-24 07:50:19,907 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-11-24 07:50:19,907 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-11-24 07:50:20,188 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2014-11-24 07:50:20,190 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
Change Pig's default log directory
In pig-0.14.0/conf/pig.properties under the unpacked Pig directory, set pig.logfile to the appropriate directory:
pig.logfile=/usr/local/cloud/pig-0.11.1/pig-0.11.1/logs
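The same change can be scripted instead of edited by hand. The sketch below is an assumption-laden illustration: CONF stands in for the real conf/pig.properties file and LOG_DIR for the log directory, and both default to temporary locations so the commands can be tried safely.

```shell
# Hypothetical variables: point CONF at conf/pig.properties and LOG_DIR at your log directory.
CONF="${CONF:-$(mktemp)}"
LOG_DIR="${LOG_DIR:-$(mktemp -d)}"

# Make sure the log directory exists before Pig is pointed at it
mkdir -p "$LOG_DIR"

# Drop any existing pig.logfile line (active or commented out), then append the new one.
# grep -v exits non-zero when nothing is left to print, hence the "|| true".
grep -v '^[#[:space:]]*pig\.logfile=' "$CONF" > "$CONF.new" || true
echo "pig.logfile=$LOG_DIR" >> "$CONF.new"
mv "$CONF.new" "$CONF"

# Show the resulting setting
grep '^pig\.logfile=' "$CONF"
```

Rewriting the line rather than appending blindly keeps the file from accumulating several conflicting pig.logfile entries.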
MapReduce mode (Hadoop must already be running):
[root@leaf pig-0.11.1]# pig
2014-11-24 07:57:16,370 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2014-11-24 07:57:16,370 [main] INFO org.apache.pig.Main - Logging error messages to: /usr/local/cloud/pig-0.11.1/pig-0.11.1/logs/pig_1416844636369.log
2014-11-24 07:57:16,410 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2014-11-24 07:57:16,681 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://192.168.1.240:9000
2014-11-24 07:57:16,684 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2014-11-24 07:57:16,685 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-11-24 07:57:17,634 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
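The two startup logs differ mainly in the "Connecting to hadoop file system at:" line: file:/// in local mode and an hdfs:// address in MapReduce mode. A small filter can read that line and report the mode. This is a sketch only; the function name mode_from_log is hypothetical, and treating any hdfs:// address as "mapreduce" is an assumption based on the two logs shown here.

```shell
# mode_from_log: read Pig startup log lines on stdin and print
# "local", "mapreduce", or "unknown" based on the file system URI.
# (Hypothetical helper for illustration.)
mode_from_log() {
  fs=$(sed -n 's/.*Connecting to hadoop file system at: \(.*\)$/\1/p' | head -n 1)
  case "$fs" in
    file:*) echo local ;;
    hdfs:*) echo mapreduce ;;
    *)      echo unknown ;;
  esac
}

# Example with the line from the MapReduce-mode log:
printf '%s\n' '2014-11-24 07:57:16,681 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://192.168.1.240:9000' | mode_from_log
# prints: mapreduce
```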
Test whether the installation is successful:
grunt> ls hdfs:///
hdfs://192.168.1.240:9000/source    <dir>
hdfs://192.168.1.240:9000/testdata    <dir>
hdfs://192.168.1.240:9000/tmp    <dir>
hdfs://192.168.1.240:9000/user    <dir>
hdfs://192.168.1.240:9000/usr    <dir>
Common Pig commands:
grunt> cd /user/root/output
grunt> ls
hdfs://192.168.1.240:9000/user/root/output/_policy<r 1>    194
hdfs://192.168.1.240:9000/user/root/output/clusteredPoints    <dir>
hdfs://192.168.1.240:9000/user/root/output/clusters-0    <dir>
hdfs://192.168.1.240:9000/user/root/output/clusters-1    <dir>
hdfs://192.168.1.240:9000/user/root/output/clusters-10-final    <dir>
hdfs://192.168.1.240:9000/user/root/output/clusters-2    <dir>
hdfs://192.168.1.240:9000/user/root/output/clusters-3    <dir>
hdfs://192.168.1.240:9000/user/root/output/clusters-4    <dir>
hdfs://192.168.1.240:9000/user/root/output/clusters-5    <dir>
hdfs://192.168.1.240:9000/user/root/output/clusters-6    <dir>
hdfs://192.168.1.240:9000/user/root/output/clusters-7    <dir>
hdfs://192.168.1.240:9000/user/root/output/clusters-8    <dir>
hdfs://192.168.1.240:9000/user/root/output/clusters-9    <dir>
hdfs://192.168.1.240:9000/user/root/output/data    <dir>
hdfs://192.168.1.240:9000/user/root/output/random-seeds    <dir>
Note that pig -help must be run from the operating system shell. Typing it at the grunt> prompt fails with a parse error that lists the commands Grunt accepts (inside Grunt, use help instead):
grunt> pig -help
2014-11-24 08:04:11,969 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " <IDENTIFIER> "pig "" at line 1, column 1.
Was expecting one of:
    <EOF>
    "cat" ...
    "clear" ...
    "fs" ...
    "sh" ...
    "cd" ...
    "cp" ...
    "copyFromLocal" ...
    "copyToLocal" ...
    "dump" ...
    "describe" ...
    "aliases" ...
    "explain" ...
    "help" ...
    "history" ...
    "kill" ...
    "ls" ...
    "mv" ...
    "mkdir" ...
    "pwd" ...
    "quit" ...
    "register" ...
    "rm" ...
    "rmf" ...
    "set" ...
    "illustrate" ...
    "run" ...
    "exec" ...
    "scriptDone" ...
    "" ...
    "" ...
    <EOL> ...
    ";" ...
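The ls listings above mark directories with <dir> and files with a <r N> tag followed by a number. If a listing has been saved to a file, it can be split into the two kinds with a short filter. This is a rough sketch: the function name classify_ls is hypothetical, and reading <r N> as the replication factor and the trailing number as the size in bytes is an assumption about the output format.

```shell
# classify_ls: read Grunt "ls" output on stdin and label each entry.
# Directories end in "<dir>"; files carry "<r N>" followed by a number.
# (Hypothetical helper; field meanings are assumptions.)
classify_ls() {
  sed -n \
    -e 's/^\(.*[^[:space:]]\)[[:space:]]*<dir>$/dir \1/p' \
    -e 's/^\(.*\)<r \([0-9]*\)>[[:space:]]*\([0-9]*\)$/file \1 replication=\2 bytes=\3/p'
}

# Example with one file entry and one directory entry from the listing:
printf '%s\n' \
  'hdfs://192.168.1.240:9000/user/root/output/_policy<r 1>    194' \
  'hdfs://192.168.1.240:9000/user/root/output/data    <dir>' | classify_ls
```

Lines that match neither pattern, such as the grunt> prompt itself, are silently skipped.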