For Hadoop developers, Java API programming is the first step in distributed MapReduce development. Eclipse does not support MapReduce development out of the box, so a few simple setup steps are required.
1. Install Hadoop.
Hadoop in this article is a pseudo distribution mode deployed on virtual machines. The related software environment is as follows:
JDK: Sun JDK 1.6.0_30
Hadoop: hadoop-0.20.203.0
Linux: Ubuntu 11.10
Eclipse: Eclipse Indigo 3.7.1
The configuration of the Hadoop pseudo-distributed environment is not covered here. The relevant setting is that fs.default.name in core-site.xml is set to hdfs://localhost:9000.
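For reference, the relevant part of core-site.xml for this pseudo-distributed setup would look roughly like this (only the fs.default.name property mentioned above is shown; other properties in your file can stay as they are):

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>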
2. Set the environment
Enter:
$ sudo vim /etc/profile
Modify the running environment and add the following content at the end of the file:
export HADOOP_HOME=/home/wangyucao/hadoop-0.20.203.0   (this is the Hadoop installation directory)
export PATH=$PATH:$HADOOP_HOME/bin
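After saving, reload the profile so the new variables take effect in the current shell, and optionally check that the hadoop command is found (a routine verification step, not part of the original instructions):

$ source /etc/profile
$ hadoop version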
3. Install Eclipse
Search for Eclipse in the Ubuntu Software Center and install it, or download the archive from the official website:
eclipse-jee-indigo-SR1-linux-gtk.tar.gz
Decompress the file and move it into the /usr directory:
$ tar -zxvf eclipse-jee-indigo-SR1-linux-gtk.tar.gz
$ sudo mv eclipse /usr/
This completes the installation.
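To confirm the installation, Eclipse can be started directly from the extracted directory (path assumed from the mv command above):

$ /usr/eclipse/eclipse &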
4. Install the hadoop-eclipse plug-in
The Hadoop release includes the hadoop-eclipse-plugin. Before developing a Hadoop application in Eclipse, you must install this plug-in.
Online tutorials generally say to copy HADOOP_HOME/lib/hadoop-eclipse-plugin-0.20.203.0.jar directly into the plugins directory of the Eclipse installation. In my experience, however, if this jar from hadoop-0.20.203.0 is copied over as-is, an error occurs when connecting to the DFS. The message is "error: failure to login", and a pop-up box reports: "An internal error occurred during: 'Connecting to DFS hadoop'. org/apache/commons/configuration/Configuration". Checking the Eclipse log shows that a jar package is missing. Digging further, it turns out that when hadoop-eclipse-plugin-0.20.203.0.jar is copied directly, its internal lib directory is missing several jar packages.
After gathering information online, the correct installation method is as follows:
First, modify hadoop-eclipse-plugin-0.20.203.0.jar. Opening the package with the archive manager reveals that its lib directory contains only two jars: commons-cli-1.2.jar and hadoop-core.jar. Copy commons-configuration-1.6.jar, commons-httpclient-3.0.1.jar, commons-lang-2.4.jar, jackson-core-asl-1.0.1.jar, and jackson-mapper-asl-1.0.1.jar from the HADOOP_HOME/lib directory into the lib directory of hadoop-eclipse-plugin-0.20.203.0.jar.
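If you prefer the command line to the archive manager, one way to add the jars is with the zip utility, which can append files to an existing archive. This is a sketch assuming zip is installed; it stages the jars in a directory mirroring the archive's internal lib/ path:

$ cd ~/hadoop-0.20.203.0/lib
$ mkdir lib                # staging dir matching the jar's internal lib/ layout
$ cp commons-configuration-1.6.jar commons-httpclient-3.0.1.jar \
     commons-lang-2.4.jar jackson-core-asl-1.0.1.jar \
     jackson-mapper-asl-1.0.1.jar lib/
$ zip -g hadoop-eclipse-plugin-0.20.203.0.jar lib/*.jar
$ rm -r lib                # clean up the staging dir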
Then, modify MANIFEST.MF under the META-INF directory of the package, changing the Bundle-ClassPath to the following:
Bundle-ClassPath: classes/, lib/hadoop-core.jar, lib/commons-cli-1.2.jar, lib/commons-httpclient-3.0.1.jar, lib/jackson-core-asl-1.0.1.jar, lib/jackson-mapper-asl-1.0.1.jar, lib/commons-configuration-1.6.jar, lib/commons-lang-2.4.jar
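Again, the archive manager can edit the manifest directly; a command-line alternative (assuming the same working directory as above) is to extract the manifest, edit it, and write it back. Note that the manifest format requires lines no longer than 72 bytes, with continuation lines starting with a single space:

$ unzip hadoop-eclipse-plugin-0.20.203.0.jar META-INF/MANIFEST.MF
$ vim META-INF/MANIFEST.MF     # set Bundle-ClassPath as shown above
$ zip -g hadoop-eclipse-plugin-0.20.203.0.jar META-INF/MANIFEST.MF
$ rm -r META-INF               # clean up the extracted copy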
This completes the modification to the hadoop-eclipse-plugin-0.20.203.0.jar.
Finally, copy the hadoop-eclipse-plugin-0.20.203.0.jar to the Eclipse plugins directory:
$ cd ~/hadoop-0.20.203.0/lib
$ sudo cp hadoop-eclipse-plugin-0.20.203.0.jar /usr/eclipse/plugins/
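After copying the plug-in, restart Eclipse. If the plug-in does not show up, the -clean flag forces Eclipse to rescan its plug-in directory on startup:

$ /usr/eclipse/eclipse -clean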