In an earlier post I described assembling CentOS 6.4 virtual machines under Ubuntu to build a hadoop 2.7.2 cluster. For MapReduce development I want to use eclipse, which needs the matching plugin, hadoop-eclipse-plugin-2.7.2.jar. Some background first: up through hadoop 1.x the official hadoop packages bundled an eclipse plugin, but as developers' eclipse versions have multiplied and diverged, the plugin must match the specific IDE, and no single bundled plugin could be compatible with them all. To keep things simple, today's hadoop packages ship without an eclipse plugin, and everyone compiles one against their own eclipse.
Here I build my own eclipse plugin with ant. First, my environment and tools:
Ubuntu 14.04 (the OS doesn't matter much, Windows works the same way); IDE: eclipse-jee-mars-2-linux-gtk-x86_64.tar.gz
ant (also up to you; a binary install or apt-get both work, just configure the environment variables):
export ANT_HOME=/usr/local/ant/apache-ant-1.9.7
export PATH=$PATH:$ANT_HOME/bin
If ant complains that it cannot find ant-launcher.jar, add it to the classpath:
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/jre/lib:$JAVA_HOME/lib/tools.jar:$ANT_HOME/lib/ant-launcher.jar
hadoop@hadoop:~$ ant -version
Apache Ant(TM) version 1.9.7 compiled on April 9 2016
Building the eclipse plugin with ant requires the hadoop2x-eclipse-plugin project; here is the github address:
https://github.com/winghc/hadoop2x-eclipse-plugin
Download it as a zip and extract it to a suitable path. Make sure the path's permissions and the directory's owner are the current user.
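For example, a minimal sketch (assuming the zip was saved to the home directory; adjust names and paths to your own setup):
hadoop@hadoop:~$ unzip hadoop2x-eclipse-plugin-master.zip -d /home/hadoop/
hadoop@hadoop:~$ chown -R hadoop:hadoop /home/hadoop/hadoop2x-eclipse-plugin-master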
The paths of the three ingredients, plugin source, hadoop installation and eclipse, are as follows:
hadoop@hadoop:~$ cd hadoop2x-eclipse-plugin-master
hadoop@hadoop:hadoop2x-eclipse-plugin-master$ pwd
/home/hadoop/hadoop2x-eclipse-plugin-master
hadoop@hadoop:hadoop2x-eclipse-plugin-master$ cd /opt/software/hadoop-2.7.2
hadoop@hadoop:hadoop-2.7.2$ pwd
/opt/software/hadoop-2.7.2
hadoop@hadoop:hadoop-2.7.2$ cd /home/hadoop/eclipse/
hadoop@hadoop:eclipse$ pwd
/home/hadoop/eclipse
Following the "How to build" section of the github README: unzip the downloaded hadoop2x-eclipse-plugin, enter the directory hadoop2x-eclipse-plugin-master/src/contrib/eclipse-plugin/, and run ant:
How to build
[hdpusr@demo hadoop2x-eclipse-plugin]$ cd src/contrib/eclipse-plugin
# Assume hadoop installation directory is /usr/share/hadoop
[hdpusr@apclt eclipse-plugin]$ ant jar -Dversion=2.4.1 -Dhadoop.version=2.4.1 -Declipse.home=/opt/eclipse -Dhadoop.home=/usr/share/hadoop
final jar will be generated at directory
${hadoop2x-eclipse-plugin}/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.4.1.jar
But I need the 2.7.2 eclipse plugin, while the hadoop2x-eclipse-plugin from github is configured for a hadoop 2.6 build, so before running ant the build.xml configuration and the related files have to be modified.
The first file: hadoop2x-eclipse-plugin-master/src/contrib/eclipse-plugin/build.xml
At line 83, find the <target name="jar" depends="compile" unless="skip.contrib"> tag and add/adjust copy sub-tags with the following content, i.e. below line 127:
<copy file="${hadoop.home}/share/hadoop/common/lib/htrace-core-${htrace.version}-incubating.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/share/hadoop/common/lib/servlet-api-${servlet-api.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/share/hadoop/common/lib/commons-io-${commons-io.version}.jar" todir="${build.dir}/lib" verbose="true"/>
Then find the <attribute name="Bundle-ClassPath" tag and add/adjust the matching lib entries in its value list, as follows:
lib/servlet-api-${servlet-api.version}.jar,
lib/commons-io-${commons-io.version}.jar,
lib/htrace-core-${htrace.version}-incubating.jar"/>
Save and exit. Note: without this change the jar may still compile, but once the plugin is in eclipse, configuring a connection will throw errors.
Adding and adjusting these libs alone is not enough, though. Many jar versions under share/hadoop/common/lib/ differ between hadoop 2.6 and hadoop 2.7, so the corresponding version numbers have to be updated as well. This cost me half a day, matching them up one by one.
These versions are configured in the ivy directory at the root of hadoop2x-eclipse-plugin-master, namely in hadoop2x-eclipse-plugin-master/ivy/libraries.properties.
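Rather than guessing, you can read the versions straight off the hadoop installation. A sketch (paths as in my setup; the jar names shown are what 2.7.2 should ship, matching the properties below):
hadoop@hadoop:~$ ls /opt/software/hadoop-2.7.2/share/hadoop/common/lib/ | grep -E 'commons-io|htrace|slf4j'
commons-io-2.4.jar
htrace-core-3.1.0-incubating.jar
slf4j-api-1.7.10.jar
slf4j-log4j12-1.7.10.jar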
The final result is shown below. To save you the trouble I've pasted the whole file; the lines commented out with # are the original values that got replaced:
hadoop.version=2.7.2
hadoop-gpl-compression.version=0.1.0

#These are the versions of our dependencies (in alphabetical order)
apacheant.version=1.7.0
ant-task.version=2.0.10

asm.version=3.2
aspectj.version=1.6.5
aspectj.version=1.6.11
checkstyle.version=4.2
commons-cli.version=1.2
commons-codec.version=1.4
# commons-collections.version=3.2.1
commons-collections.version=3.2.2
commons-configuration.version=1.6
commons-daemon.version=1.0.13
# commons-httpclient.version=3.0.1
commons-httpclient.version=3.1
commons-lang.version=2.6
# commons-logging.version=1.0.4
commons-logging.version=1.1.3
# commons-logging-api.version=1.0.4
commons-logging-api.version=1.1.3
# commons-math.version=2.1
commons-math.version=3.1.1
commons-el.version=1.0
commons-fileupload.version=1.2
# commons-io.version=2.1
commons-io.version=2.4
commons-net.version=3.1
core.version=3.1.1
coreplugin.version=1.3.2
# hsqldb.version=1.8.0.10
# htrace.version=3.0.4
hsqldb.version=2.0.0
htrace.version=3.1.0
ivy.version=2.1.0
jasper.version=5.5.12
jackson.version=1.9.13
#not able to figureout the version of jsp & jsp-api version to get it resolved throught ivy
# but still declared here as we are going to have a local copy from the lib folder
jsp.version=2.1
jsp-api.version=5.5.12
jsp-api-2.1.version=6.1.14
jsp-2.1.version=6.1.14
# jets3t.version=0.6.1
jets3t.version=0.9.0
jetty.version=6.1.26
jetty-util.version=6.1.26
# jersey-core.version=1.8
# jersey-json.version=1.8
# jersey-server.version=1.8
jersey-core.version=1.9
jersey-json.version=1.9
jersey-server.version=1.9
# junit.version=4.5
junit.version=4.11
jdeb.version=0.8
jdiff.version=1.0.9
json.version=1.0
kfs.version=0.1
log4j.version=1.2.17
lucene-core.version=2.3.1
mockito-all.version=1.8.5
jsch.version=0.1.42
oro.version=2.0.8
rats-lib.version=0.5.1
servlet.version=4.0.6
servlet-api.version=2.5
# slf4j-api.version=1.7.5
# slf4j-log4j12.version=1.7.5
slf4j-api.version=1.7.10
slf4j-log4j12.version=1.7.10
wagon-http.version=1.0-beta-2
xmlenc.version=0.52
# xerces.version=1.4.4
xerces.version=2.9.1
protobuf.version=2.5.0
guava.version=11.0.2
netty.version=3.6.2.Final
With these modifications done, the hard part is over; time to run ant. Enter src/contrib/eclipse-plugin/ and run the ant command as follows:
hadoop@hadoop:hadoop2x-eclipse-plugin-master$ cd src/contrib/eclipse-plugin/
hadoop@hadoop:eclipse-plugin$ ls
build.properties  build.xml.bak  ivy.xml  META-INF  resources
build.xml  ivy  makePlus.sh  plugin.xml  src
hadoop@hadoop:eclipse-plugin$ ant jar -Dhadoop.version=2.7.2 -Declipse.home=/home/hadoop/eclipse -Dhadoop.home=/opt/software/hadoop-2.7.2
The first run is a bit slow; afterwards it goes quickly. When the output ends like this, the ant build has succeeded:
compile:
     [echo] contrib: eclipse-plugin
    [javac] /home/hadoop/hadoop2x-eclipse-plugin-master/src/contrib/eclipse-plugin/build.xml:76: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
jar:
      [jar] Building jar: /home/hadoop/hadoop2x-eclipse-plugin-master/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.7.2.jar
BUILD SUCCESSFUL
Total time: 4 seconds
hadoop@hadoop:eclipse-plugin$
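As a quick sanity check, you can confirm that the extra libs from the build.xml edit actually made it into the jar:
hadoop@hadoop:eclipse-plugin$ jar tf /home/hadoop/hadoop2x-eclipse-plugin-master/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.7.2.jar | grep -E 'servlet-api|commons-io|htrace'
Given the versions above, this should list lib/servlet-api-2.5.jar, lib/commons-io-2.4.jar and lib/htrace-core-3.1.0-incubating.jar.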
Then copy the home-made plugin into the plugins directory of your eclipse installation, and restart eclipse. Relaunching it from the shell, as below, has the bonus that eclipse's runtime output stays visible in the shell, so when something goes wrong you can spot the cause immediately:
hadoop@hadoop:eclipse-plugin$ cp /home/hadoop/hadoop2x-eclipse-plugin-master/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.7.2.jar /home/hadoop/eclipse/plugins/
hadoop@hadoop:eclipse-plugin$ /home/hadoop/eclipse/eclipse -clean
Pick your workspace and enter eclipse. Click Window and open Preferences: a new Hadoop Map/Reduce entry appears in the list; point it at a hadoop installation directory.
A Distributed File System entry now appears in eclipse's Project Explorer. Click Window --> Show View and select MapReduce Tools to open the Map/Reduce Locations view, complete with the familiar elephant icon; then add a new M/R location and configure it as follows.
The Location name can be anything, but the Map/Reduce Master settings must correspond exactly to the core-site.xml and mapred-site.xml of your hadoop cluster (or pseudo-distributed setup); if they are wrong the connection will fail.
In my configuration the Host is hadoop (the master node's hostname; the master's ip address works too), and the ports are 9000 (the file system port) and 9001 (the port of the MapReduce jobtracker management node).
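For reference, a matching pseudo-distributed configuration might look roughly like this (a sketch inferred from the Host and ports above, not a dump of actual files; in particular the old-style mapred.job.tracker entry is an assumption):
hadoop@hadoop:~$ cat /opt/software/hadoop-2.7.2/etc/hadoop/core-site.xml
...
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop:9000</value>
</property>
...
hadoop@hadoop:~$ cat /opt/software/hadoop-2.7.2/etc/hadoop/mapred-site.xml
...
<property>
  <name>mapred.job.tracker</name>
  <value>hadoop:9001</value>
</property>
...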
Now start the hadoop cluster and smoke-test it in the shell, then test file transfers through eclipse's DFS Locations, and try both FileSystem API programming and MapReduce API programming. The goal here is only to verify that the plugin works; hdfs is trivial to test yourself, so I'll run one mr program: call statistics. The input format is shown below, with the calling number on the left and the called number on the right; the job tallies how often each number was called, and lists who called it.
11500001211 10086
11500001212 10010
15500001213 110
15500001214 120
11500001211 10010
11500001212 10010
15500001213 10086
15500001214 110
The code:
package hdfs;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MR extends Configured implements Tool {

    enum Counter {
        LINESKIP, // lines skipped because they were malformed
    }

    public static class WCMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            try {
                String[] lineSplit = line.split(" ");
                String anum = lineSplit[0]; // calling number
                String bnum = lineSplit[1]; // called number
                // emit <callee, caller> so the reducer groups by called number
                context.write(new Text(bnum), new Text(anum));
            } catch (Exception e) {
                context.getCounter(Counter.LINESKIP).increment(1); // error counter +1
                return;
            }
        }
    }

    public static class IntSumReduce extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String valueString;
            String out = "";
            // concatenate all callers of this called number
            for (Text value : values) {
                valueString = value.toString();
                out += valueString + "|";
            }
            context.write(key, new Text(out));
        }
    }

    public int run(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // strip generic options, keeping only the input/output paths
        String[] strs = new GenericOptionsParser(conf, args).getRemainingArgs();
        Job job = parseInputAndOutput(this, conf, strs);
        if (job == null) {
            return -1;
        }
        job.setJarByClass(MR.class);
        FileInputFormat.addInputPath(job, new Path(strs[0]));
        FileOutputFormat.setOutputPath(job, new Path(strs[1]));
        job.setMapperClass(WCMapper.class);
        job.setInputFormatClass(TextInputFormat.class);
        // job.setCombinerClass(IntSumReduce.class);
        job.setReducerClass(IntSumReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public Job parseInputAndOutput(Tool tool, Configuration conf, String[] args) throws Exception {
        // validate arguments
        if (args.length != 2) {
            System.err.printf("Usage: %s [generic options] <input> <output>%n",
                    tool.getClass().getSimpleName());
            return null;
        }
        // create the job
        return Job.getInstance(conf, tool.getClass().getSimpleName());
    }

    public static void main(String[] args) throws Exception {
        // run the map reduce job and exit with its status
        int status = ToolRunner.run(new MR(), args);
        System.exit(status);
    }
}
The input file was uploaded to hdfs like this:
hadoop@hadoop:~$ hdfs dfs -mkdir -p /user/hadoop/mr/wc/input
hadoop@hadoop:~$ hdfs dfs -put top.data /user/hadoop/mr/wc/input
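You can double-check the upload before running anything:
hadoop@hadoop:~$ hdfs dfs -ls /user/hadoop/mr/wc/input
which should list top.data.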
Then run the mr program from inside eclipse.
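Alternatively, the same job can be run from the shell; a sketch, where MR.jar and the output path are hypothetical names of your choosing:
hadoop@hadoop:~$ hadoop jar MR.jar hdfs.MR /user/hadoop/mr/wc/input /user/hadoop/mr/wc/output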
It runs successfully, eclipse's console prints the execution steps, and we can inspect the result.
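To read the result back (output path as in the hypothetical shell run above; in eclipse it is whatever you passed as the second program argument):
hadoop@hadoop:~$ hdfs dfs -cat /user/hadoop/mr/wc/output/part-r-00000
Given the sample input, each called number should come out with its callers concatenated, roughly like this (the order of callers within a line is not guaranteed):
10010	11500001211|11500001212|11500001212|
10086	11500001211|15500001213|
110	15500001213|15500001214|
120	15500001214|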
Which shows the plugin works without any problems.