Configuring a Hadoop 1.2.1 + Eclipse (Juno) Development Environment and Running the WordCount Program
First, Requirements
Using the Eclipse IDE for Hadoop development on Ubuntu requires installing Hadoop's Eclipse plug-in. Recent Hadoop releases ship with the plug-in's source code; the hadoop-1.x source tree, for example, contains the source of the corresponding Eclipse plug-in, so you can compile a plug-in that matches your own Eclipse version. The following is a detailed walkthrough of building and installing the plug-in and configuring it in Eclipse.
Second, Environment
- VMware Workstation 10.04
- Ubuntu 14.04 (32-bit)
- Java JDK 1.6.0
- Hadoop 1.2.1
- Eclipse Juno Service Release 2
Third, Compiling the Eclipse (Juno) Plug-in for Hadoop 1.2.1
1) Install Ant
sudo apt-get install ant
2) Modify the build configuration files
In the directory where Hadoop was extracted, locate src/contrib/eclipse-plugin/build.xml and modify the following lines:
<path id= "Hadoop-core-jar" >
<fileset dir= "${hadoop.root}/" >
<include name= "Hadoop*.jar"/>
</fileset>
</path>
<!--Override classpath to include Eclipse SDK jars--
<path id= "Classpath" >
<pathelement location= "${build.classes}"/>
<pathelement location= "${hadoop.root}/build/classes"/>
<path refid= "Eclipse-sdk-jars"/>
<path refid= "Hadoop-core-jar"/>
</path>
......
<target name= "Jar" depends= "compile" unless= "Skip.contrib" >
<mkdir dir= "${build.dir}/lib"/>
<copy file= "${hadoop.root}/hadoop-core-${version}.jar" tofile= "${build.dir}/lib/hadoop-core.jar" verbose= "true "/>
<copy file= "${hadoop.root}/lib/commons-cli-1.2.jar" todir= "${build.dir}/lib" verbose= "true"/>
<copy file= "${hadoop.root}/lib/commons-lang-2.4.jar" todir= "${build.dir}/lib" verbose= "true"/>
<copy file= "${hadoop.root}/lib/commons-configuration-1.6.jar" todir= "${build.dir}/lib" verbose= "true"/>
<copy file= "${hadoop.root}/lib/jackson-mapper-asl-1.8.8.jar" todir= "${build.dir}/lib" verbose= "true"/>
<copy file= "${hadoop.root}/lib/jackson-core-asl-1.8.8.jar" todir= "${build.dir}/lib" verbose= "true"/>
<copy file= "${hadoop.root}/lib/commons-httpclient-3.0.1.jar" todir= "${build.dir}/lib" verbose= "true"/>
<jar
Jarfile= "${build.dir}/hadoop-${name}-${version}.jar"
Manifest= "${root}/meta-inf/manifest. MF ">
<fileset dir= "${build.dir}" includes= "classes/lib/"/>
<fileset dir= "${root}" includes= "Resources/plugin.xml"/>
</jar>
</target>
Next, find src/contrib/build-contrib.xml and add the following lines:
<property name= "version" value= "1.2.1"/>
<property name= "ivy.version" value= "2.1.0"/>
<property name= "Eclipse.home" location= "..."/>
Replace the eclipse.home location with the path where Eclipse is installed on your host.
3) Then open a terminal, change into the src/contrib/eclipse-plugin directory, and run the ant command to compile; if nothing goes wrong, the build succeeds.
Finally, you can find the compiled plug-in under the {HADOOP_HOME}/build/contrib/eclipse-plugin directory.
4) Several points of note:
The build must run with network access. If you need to go through a proxy to reach the Internet, you can add the following target to src/contrib/build-contrib.xml:
<target name= "proxy" >
<property name= "Proxy.host" value= ""/>
<property name= "Proxy.port" value= "/>"
<property name= "Proxy.user" value= ""/>
<property name= "Proxy.pass" value= ""/>
<setproxy proxyhost= "${proxy.host}" proxyport= "${proxy.port}"
Proxyuser= "${proxy.user}" proxypassword= "${proxy.pass}"/>
</target>
Then make the Ivy download task in the same XML file depend on the proxy target above, configured as:
<target name= "Ivy-download" depends= "proxy" description= "To download Ivy" unless= "Offline" >
<get src= "${ivy_repo_url}" dest= "${ivy.jar}" usetimestamp= "true"/>
</target>
If the compiler reports a class file version mismatch, make sure your Java version is at least 1.6.
Fourth, Configuring the Eclipse Development Environment for Hadoop 1.2.1
Once you have the Hadoop 1.2.1 Eclipse development plug-in (a jar package), place it in the eclipse/plugins directory and restart Eclipse. One thing to note: Eclipse sometimes fails to load the plug-in; if that happens, start it once with the eclipse -clean command. When Eclipse starts with the plug-in loaded, a blue elephant logo appears in its upper-right corner.
Fifth, Running the WordCount Program
After starting Eclipse, choose File -> New -> Project. If the Map/Reduce Project option appears, select it, click Next, enter a project name, and click Finish; the plug-in was installed successfully. If the Map/Reduce Project option appears but clicking Next produces an error, the plug-in you are using is not usable.
Configure the Hadoop installation directory under the Window -> Preferences option.
Then start Hadoop, click the yellow icon at the bottom of Eclipse (the Map/Reduce Locations view), and right-click in the empty area below it to choose New Hadoop location.
In the dialog, the Map/Reduce Master host and port on the left correspond to the mapred.job.tracker setting in the conf/mapred-site.xml file under your Hadoop installation directory, and the DFS Master on the right corresponds to fs.default.name in conf/core-site.xml. Once this is set up, you can browse and manipulate HDFS in Eclipse.
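If you are unsure which host and port to enter, you can read them straight from the configuration files. Below is a minimal sketch that prints both values; the installation path /usr/local/hadoop-1.2.1 and the class name PrintMasters are our own assumptions for illustration, not part of the original walkthrough:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Prints the two addresses the New Hadoop location dialog asks for.
public class PrintMasters {
    public static void main(String[] args) {
        Configuration conf = new Configuration(false); // skip built-in defaults
        // Adjust these paths if your conf directory lives elsewhere.
        conf.addResource(new Path("/usr/local/hadoop-1.2.1/conf/core-site.xml"));
        conf.addResource(new Path("/usr/local/hadoop-1.2.1/conf/mapred-site.xml"));
        System.out.println("DFS Master (fs.default.name): " + conf.get("fs.default.name"));
        System.out.println("Map/Reduce Master (mapred.job.tracker): " + conf.get("mapred.job.tracker"));
    }
}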
Next, let's try to run the WordCount example.
Right-click the src folder under the Map/Reduce project you just created and choose New -> Class.
Then copy the code from WordCount.java in src/examples/org/apache/hadoop/examples under the Hadoop installation directory into the WordCount.java of your project.
Pay attention to the first line of the file (see below), then save.
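The first line in question is the package declaration. The example ships in the org.apache.hadoop.examples package, so if you created your class under a different package, this line must be changed to match or Eclipse will flag an error:

package org.apache.hadoop.examples; // must match the package of the class you created in Eclipse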
In the Documents directory on Ubuntu, create a new file named input with the following content:
My name is Sun Bin Bin,what is your name?
Then upload the input file to HDFS:
bin/hadoop fs -put /home/binbin/Documents/input .
Pay attention to the trailing dot: it is the destination, meaning your home directory on HDFS.
Once the file is uploaded to HDFS, you can see it in the directory tree under myhadoop in Eclipse.
Then start running: right-click the WordCount.java you just created and choose Run As -> Run Configurations.
In the left pane, right-click Java Application and choose New.
On the Arguments tab, set the program arguments:
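The original post showed the values in a screenshot. WordCount expects two arguments, the input path followed by the output path; assuming the input file was uploaded to your HDFS home directory as above, a plausible setting is:

input output

Full HDFS URIs such as hdfs://localhost:9000/user/binbin/input should also work, with host and port taken from your fs.default.name setting.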
Make sure the output directory does not already exist in HDFS, or an exception will be thrown. Then click Run on Hadoop.
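If you rerun the job, delete the previous output directory first. As an alternative to deleting it by hand, here is a minimal sketch using the Hadoop 1.x FileSystem API (the ClearOutput class is our own illustration, not part of the original example):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Removes a stale output directory so WordCount can run again.
public class ClearOutput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path(args.length > 0 ? args[0] : "output");
        if (fs.exists(out)) {
            fs.delete(out, true); // true = delete recursively
        }
    }
}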
When the run finishes, you can see the output under DFS Locations/myhadoop on the left (right-click and choose Refresh first), or view it at the terminal via the command line (for example, bin/hadoop fs -cat output/part-r-00000).
Because WordCount treats only whitespace as a word separator, "Bin,what" is counted as a single word.
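That behavior comes from the mapper, which splits each line with java.util.StringTokenizer using its default delimiters (space, tab, newline). A minimal standalone sketch reproducing the tokenization of our input line:

import java.util.StringTokenizer;

// Demonstrates WordCount-style tokenization: punctuation is not a
// delimiter, so "Bin,what" and "name?" each come out as one token.
public class TokenizeDemo {
    public static void main(String[] args) {
        StringTokenizer itr = new StringTokenizer("My name is Sun Bin Bin,what is your name?");
        while (itr.hasMoreTokens()) {
            System.out.println(itr.nextToken());
        }
    }
}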
References:
http://www.cnblogs.com/alex-blog/p/3160619.html
http://blog.sina.com.cn/s/blog_7deb436e0101kh0d.html