In a previous post I described building a Hadoop 2.7.2 cluster on CentOS 6.4 virtual machines under Ubuntu. To do MapReduce development you need Eclipse, and with it the matching Hadoop plugin, hadoop-eclipse-plugin-2.7.2.jar. Back in the Hadoop 1.x days the official Hadoop installation package shipped with an Eclipse plugin, but as Eclipse versions multiplied and diverged among developers, the plugin had to match the development tool, and Hadoop's plugin packages are not all compatible with every Eclipse. To simplify things, today's Hadoop installation package no longer contains an Eclipse plugin: you have to compile one yourself against your own Eclipse.
Here is how to build the Eclipse plugin yourself with Ant. First, my environment and tools:
Ubuntu 14.04 (the OS doesn't matter much; Windows works too, the steps are the same); IDE: eclipse-jee-mars-2-linux-gtk-x86_64.tar.gz
Ant (also flexible; a binary install or an apt-get install both work; configure the environment variables):
export ANT_HOME=/usr/local/ant/apache-ant-1.9.7
export PATH=$PATH:$ANT_HOME/bin
If you get an error saying the ant-launcher.jar package cannot be found, add an environment variable:
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/jre/lib:$JAVA_HOME/lib/tools.jar:$ANT_HOME/lib/ant-launcher.jar
hadoop@ubuntu:~$ ant -version
Apache Ant(TM) version 1.9.7 compiled on April 9 2016
Building the Eclipse plugin with Ant requires the hadoop2x-eclipse-plugin project, available on GitHub:
https://github.com/winghc/hadoop2x-eclipse-plugin
Download it in zip format and unzip it to a suitable path. Make sure the current user owns the directory and has permissions on the path.
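If you prefer the command line, something like this should work (a sketch; the zip URL follows GitHub's standard archive convention for the master branch):
wget https://github.com/winghc/hadoop2x-eclipse-plugin/archive/master.zip
unzip master.zip -d /home/hadoop/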
The paths to the three tools/resources involved in the build are as follows:
hadoop@ubuntu:~$ cd hadoop2x-eclipse-plugin-master
hadoop@ubuntu:hadoop2x-eclipse-plugin-master$ pwd
/home/hadoop/hadoop2x-eclipse-plugin-master
hadoop@ubuntu:hadoop2x-eclipse-plugin-master$ cd /opt/software/hadoop-2.7.2
hadoop@ubuntu:hadoop-2.7.2$ pwd
/opt/software/hadoop-2.7.2
hadoop@ubuntu:hadoop-2.7.2$ cd /home/hadoop/eclipse/
hadoop@ubuntu:eclipse$ pwd
/home/hadoop/eclipse
Following the "How to build" section of the GitHub README, the Ant steps are as follows:
Unzip the downloaded hadoop2x-eclipse-plugin, go into the directory hadoop2x-eclipse-plugin-master/src/contrib/eclipse-plugin/, and perform the build there.
How to build
[user@host hadoop2x-eclipse-plugin]$ cd src/contrib/eclipse-plugin
# Assume hadoop installation directory is /usr/share/hadoop
[user@host eclipse-plugin]$ ant jar -Dversion=2.4.1 -Dhadoop.version=2.4.1 -Declipse.home=/opt/eclipse -Dhadoop.home=/usr/share/hadoop
final jar will be generated at directory
${hadoop2x-eclipse-plugin}/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.4.1.jar
But what I need at this point is the 2.7.2 Eclipse plugin, and the hadoop2x-eclipse-plugin project as downloaded from GitHub is configured for a hadoop 2.6 build environment, so before running ant you need to modify Ant's build.xml configuration file and the related files.
First file: hadoop2x-eclipse-plugin-master/src/contrib/eclipse-plugin/build.xml
At line 83, find the <target name="jar" depends="compile" unless="skip.contrib"> tag, then add and modify the <copy> child elements inside it, i.e. the content around line 127 below:
<copy file="${hadoop.home}/share/hadoop/common/lib/htrace-core-${htrace.version}-incubating.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/share/hadoop/common/lib/servlet-api-${servlet-api.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/share/hadoop/common/lib/commons-io-${commons-io.version}.jar" todir="${build.dir}/lib" verbose="true"/>
Then find the <attribute name="Bundle-ClassPath"> element and, in its value list, add the corresponding lib entries as follows:
lib/servlet-api-${servlet-api.version}.jar,
lib/commons-io-${commons-io.version}.jar,
lib/htrace-core-${htrace.version}-incubating.jar"/>
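For orientation, after the edit the tail of the Bundle-ClassPath value should look roughly like this (a sketch; the leading entries are whatever the stock build.xml already lists, elided here):
<attribute name="Bundle-ClassPath"
    value="classes/,
           ...(existing lib/*.jar entries unchanged)...,
           lib/servlet-api-${servlet-api.version}.jar,
           lib/commons-io-${commons-io.version}.jar,
           lib/htrace-core-${htrace.version}-incubating.jar"/>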
Save and exit. Note that if you skip this modification, even if the jar compiles and you put it into Eclipse, the configuration/connection step will throw errors.
However, just adding and modifying these libs is not enough: many jar versions under share/hadoop/common/lib/ differ between hadoop 2.6 and hadoop 2.7, so the corresponding version numbers also need updating. It took me half a day to check and change them one by one.
Note that these version numbers live in the ivy directory that ships with the plugin source, i.e. in hadoop2x-eclipse-plugin-master/ivy/libraries.properties.
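Rather than checking each jar by hand, you can list what your Hadoop actually ships; assuming my install path, something like this (the versions shown here are the ones that end up in the properties file below):
hadoop@ubuntu:~$ ls /opt/software/hadoop-2.7.2/share/hadoop/common/lib/ | grep -E 'commons-io|htrace|servlet-api'
commons-io-2.4.jar
htrace-core-3.1.0-incubating.jar
servlet-api-2.5.jar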
The final modified file is shown below. To make things easy for everyone I've pasted the whole thing; it simply overwrites the original configuration:
hadoop.version=2.7.2
hadoop-gpl-compression.version=0.1.0
#These are the versions of our dependencies (in alphabetical order)
apacheant.version=1.7.0
ant-task.version=2.0.10
asm.version=3.2
aspectj.version=1.6.5
aspectj.version=1.6.11
checkstyle.version=4.2
commons-cli.version=1.2
commons-codec.version=1.4
# commons-collections.version=3.2.1
commons-collections.version=3.2.2
commons-configuration.version=1.6
commons-daemon.version=1.0.13
# commons-httpclient.version=3.0.1
commons-httpclient.version=3.1
commons-lang.version=2.6
# commons-logging.version=1.0.4
commons-logging.version=1.1.3
# commons-logging-api.version=1.0.4
commons-logging-api.version=1.1.3
# commons-math.version=2.1
commons-math.version=3.1.1
commons-el.version=1.0
commons-fileupload.version=1.2
# commons-io.version=2.1
commons-io.version=2.4
commons-net.version=3.1
core.version=3.1.1
coreplugin.version=1.3.2
# hsqldb.version=1.8.0.10
# htrace.version=3.0.4
hsqldb.version=2.0.0
htrace.version=3.1.0
ivy.version=2.1.0
jasper.version=5.5.12
jackson.version=1.9.13
#not able to figureout the version of jsp & jsp-api version to get it resolved throught ivy
# but still declared here as we are going to have a local copy from the lib folder
jsp.version=2.1
jsp-api.version=5.5.12
jsp-api-2.1.version=6.1.14
jsp-2.1.version=6.1.14
# jets3t.version=0.6.1
jets3t.version=0.9.0
jetty.version=6.1.26
jetty-util.version=6.1.26
# jersey-core.version=1.8
# jersey-json.version=1.8
# jersey-server.version=1.8
jersey-core.version=1.9
jersey-json.version=1.9
jersey-server.version=1.9
# junit.version=4.5
junit.version=4.11
jdeb.version=0.8
jdiff.version=1.0.9
json.version=1.0
kfs.version=0.1
log4j.version=1.2.17
lucene-core.version=2.3.1
mockito-all.version=1.8.5
jsch.version=0.1.42
oro.version=2.0.8
rats-lib.version=0.5.1
servlet.version=4.0.6
servlet-api.version=2.5
# slf4j-api.version=1.7.5
# slf4j-log4j12.version=1.7.5
slf4j-api.version=1.7.10
slf4j-log4j12.version=1.7.10
wagon-http.version=1.0-beta-2
xmlenc.version=0.52
# xerces.version=1.4.4
xerces.version=2.9.1
protobuf.version=2.5.0
guava.version=11.0.2
netty.version=3.6.2.Final
After these modifications the preparation work is done, and it's time to start Ant.
Enter src/contrib/eclipse-plugin/ and run the ant command, as follows:
hadoop@ubuntu:hadoop2x-eclipse-plugin-master$ cd src/contrib/eclipse-plugin/
hadoop@ubuntu:eclipse-plugin$ ls
build.properties build.xml.bak ivy.xml META-INF resources
build.xml ivy makePlus.sh plugin.xml src
hadoop@ubuntu:eclipse-plugin$ ant jar -Dhadoop.version=2.7.2 -Declipse.home=/home/hadoop/eclipse -Dhadoop.home=/opt/software/hadoop-2.7.2
This process is slow the first time (Ivy downloads the dependencies) and quick afterwards.
When you finally see output like the following, the Ant build succeeded:
compile:
[echo] contrib: eclipse-plugin
[javac] /home/hadoop/hadoop2x-eclipse-plugin-master/src/contrib/eclipse-plugin/build.xml:76: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
jar:
[jar] Building jar: /home/hadoop/hadoop2x-eclipse-plugin-master/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.7.2.jar
BUILD SUCCESSFUL
Total time: 4 seconds
hadoop@ubuntu:eclipse-plugin$
Then copy the plugin you just built into Eclipse's plugins directory.
Then restart Eclipse, or relaunch it from the shell as shown below; launching from the shell also prints Eclipse's log to the terminal, so you can find the cause promptly if something goes wrong.
hadoop@ubuntu:eclipse-plugin$ cp /home/hadoop/hadoop2x-eclipse-plugin-master/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.7.2.jar /home/hadoop/eclipse/plugins/
hadoop@ubuntu:eclipse-plugin$ /home/hadoop/eclipse/eclipse -clean
Choose your workspace and enter Eclipse, then click Window --> Preferences; in the list you will find a Hadoop Map/Reduce entry, where you select the Hadoop installation directory.
A DFS Locations entry appears in Eclipse's Project Explorer. Click Window --> Show View and select MapReduce Tools.
Open the Map/Reduce Locations window (it has a friendly elephant icon), then add a new M/R configuration and fill it in as follows.
The location name is up to you, but the Map/Reduce master and DFS master here must match your own Hadoop cluster's (or pseudo-distributed setup's) core-site.xml and mapred-site.xml entries one by one; an incorrect configuration will show a connection failure.
My configuration is as follows: the host is hadoop (the master node's hostname; you could also write the master node's IP address here), and the port numbers are 9000 (the file system host port) and 9001 (the JobTracker port of the MapReduce management node).
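For reference, the cluster-side settings these two ports must match would look roughly like this (a sketch assuming the master's hostname is hadoop, as in my setup; mapred.job.tracker is the 1.x-style key that the plugin's Map/Reduce master corresponds to):
<!-- core-site.xml -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop:9000</value>
</property>
<!-- mapred-site.xml -->
<property>
  <name>mapred.job.tracker</name>
  <value>hadoop:9001</value>
</property>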
Then start the Hadoop cluster, test it briefly in the shell, and verify file transfer through Eclipse's DFS Locations, as well as API programming tests using the FileSystem interface and MapReduce. The point here is just to verify that the plugin works; test HDFS yourself, it's very simple. Here I test an MR program: phone-call statistics. The format is as follows: the left column is the caller, the right column is the callee; for each callee, count the calls and show who called it.
11500001211 10086
11500001212 10010
15500001213 110
15500001214 120
11500001211 10010
11500001212 10010
15500001213 10086
15500001214 110
The code is as follows:
package hdfs;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MR extends Configured implements Tool {

    enum Counter {
        LINESKIP, // lines skipped because they could not be parsed
    }

    public static class WCMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            try {
                // input format: "<caller> <callee>"; emit (callee, caller)
                String[] lineSplit = line.split(" ");
                String anum = lineSplit[0];
                String bnum = lineSplit[1];
                context.write(new Text(bnum), new Text(anum));
            } catch (Exception e) {
                context.getCounter(Counter.LINESKIP).increment(1); // error counter +1
            }
        }
    }

    public static class IntSumReduce extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // concatenate all callers of this number, separated by "|"
            StringBuilder out = new StringBuilder();
            for (Text value : values) {
                out.append(value.toString()).append("|");
            }
            context.write(key, new Text(out.toString()));
        }
    }

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        String[] strs = new GenericOptionsParser(conf, args).getRemainingArgs();
        Job job = parseInputAndOutput(this, conf, strs);
        if (job == null) {
            return -1;
        }
        job.setJarByClass(MR.class);
        FileInputFormat.addInputPath(job, new Path(strs[0]));
        FileOutputFormat.setOutputPath(job, new Path(strs[1]));
        job.setMapperClass(WCMapper.class);
        job.setInputFormatClass(TextInputFormat.class);
        //job.setCombinerClass(IntSumReduce.class);
        job.setReducerClass(IntSumReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public Job parseInputAndOutput(Tool tool, Configuration conf, String[] args) throws Exception {
        // validate: exactly one input and one output path
        if (args.length != 2) {
            System.err.printf("Usage: %s [generic options] <input> <output>%n",
                    tool.getClass().getSimpleName());
            return null;
        }
        // create the job, named after the tool class
        return Job.getInstance(conf, tool.getClass().getSimpleName());
    }

    public static void main(String[] args) throws Exception {
        // run the MapReduce job and exit with its status code
        int status = ToolRunner.run(new MR(), args);
        System.exit(status);
    }
}
Upload the file; the directory structure is as follows:
hadoop@ubuntu:~$ hdfs dfs -mkdir -p /user/hadoop/mr/wc/input
hadoop@ubuntu:~$ hdfs dfs -put top.data /user/hadoop/mr/wc/input
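To run the job from Eclipse, open Run As --> Run Configurations and pass the input and output paths as program arguments; with the paths above that would be something like the line below (note the output directory must not already exist, or the job will fail):
hdfs://hadoop:9000/user/hadoop/mr/wc/input hdfs://hadoop:9000/user/hadoop/mr/wc/output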
Run the MR program in Eclipse. It executes successfully, the execution steps are printed in the Eclipse console, and you can view the results.
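Given the sample input above and the reducer's concatenation logic, the result should look roughly like this (within a line, the order of callers depends on the shuffle, so take this as a sketch):
10010	11500001212|11500001211|11500001212|
10086	11500001211|15500001213|
110	15500001213|15500001214|
120	15500001214|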
This shows the plugin is working without any problems.