Using Eclipse to build a Hadoop 2.4 programming environment on Ubuntu


When writing MapReduce programs for Hadoop, most of us want to work in an IDE. This article focuses on how to use Eclipse for Hadoop programming.

If your cluster is not ready yet, you can refer to my previous article on building a pseudo-distributed cluster with Hadoop 2.4.

First, install Eclipse

Method one: install it directly from the Ubuntu Software Center.


Method two: download the Eclipse archive (for example from http://pan.baidu.com/s/1mgiHFok) and install it from the command line:

      sudo tar -zxvf eclipse-dsl-juno-sr1-linux-gtk.tar.gz
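
If you want Eclipse under /usr/local (the directory used later in this article), you can extract it there instead; a sketch, assuming the archive is in the current directory:

      sudo tar -zxvf eclipse-dsl-juno-sr1-linux-gtk.tar.gz -C /usr/local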

With that, Eclipse is installed.


Second, install the hadoop-eclipse-plugin

Download hadoop2x-eclipse-plugin and copy hadoop-eclipse-kepler-plugin-2.2.0.jar from its release folder into the plugins folder of the Eclipse installation directory (although the jar is labeled 2.2.0, it works fine under 2.4.1 and should be fine for 2.x versions in general; Eclipse may occasionally warn that something is deprecated, but as a learner I feel these details can be ignored for now). Then restart Eclipse with ./eclipse -clean. The default installation directory for Eclipse installed via method one is /usr/lib/eclipse; if you installed via method two, the directory is wherever you extracted it (mine is /usr/local/eclipse). The ./eclipse -clean command must be run from the Eclipse installation directory. After Eclipse reopens, you will be able to see the HDFS file system (DFS Locations) in the Project Explorer.
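
For reference, a minimal command sequence for this step might look like this, assuming the plugin jar is in the current directory and Eclipse lives in /usr/local/eclipse (adjust the paths to your own setup):

      sudo cp hadoop-eclipse-kepler-plugin-2.2.0.jar /usr/local/eclipse/plugins/
      cd /usr/local/eclipse
      ./eclipse -clean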


The plugin requires further configuration.

First step: select Preferences under the Window menu. In the dialog that pops up, there will be a Hadoop Map/Reduce option on the left; click it and set the Hadoop installation directory (for example /usr/local/hadoop).


Second step: switch to the Map/Reduce perspective. Choose Open Perspective under the Window menu, and in the dialog that pops up select Map/Reduce.


Third step: click the Map/Reduce Locations tab and click the icon on the right to open the Hadoop location configuration window:


Enter a location name (any name will do). Configure the Map/Reduce Master and DFS Master; the host and port must match the settings in core-site.xml.
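
For reference, in a pseudo-distributed setup core-site.xml typically contains something like the following (only a sketch; use whatever host and port your own file specifies). With this configuration the DFS Master host would be localhost and the port 9000:

    <configuration>
      <property>
        <!-- example value only; your core-site.xml may differ -->
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>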



Click the "Finish" button to close the window.

Click DFS Locations -> mapreduceproject (the location name from the previous step) on the left; if you can see the user folder, the installation was successful.

If instead you get an error like: Error: Call From mylinux/127.0.1.1 to localhost:9090 failed on connection exception: java.net.ConnectException: Connection refused.

First make sure that Hadoop has actually been started. I hit this myself because I had not started Hadoop, and it took me a long time to find the problem, so I hope this saves you some time.
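
A quick way to check is to run jps; in a pseudo-distributed setup you should see processes such as NameNode and DataNode. If they are missing, start HDFS from the Hadoop directory first (assuming /usr/local/hadoop as above):

      jps
      /usr/local/hadoop/sbin/start-dfs.sh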

For other possible causes, you can refer to another of my posts: a summary of solutions for connection problems when using Eclipse to connect to Hadoop.

Third, create the WordCount example

Choose File -> New -> Project, select Map/Reduce Project, enter the project name WordCount, and follow the prompts.

Create a new class in the WordCount project named WordCount with the following code:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  // Mapper: tokenizes each input line and emits (word, 1) for every token
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as combiner): sums the counts for each word
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    // Job setup: wire the mapper, combiner, and reducer together
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

1. Create an input directory on HDFS

    hadoop fs -mkdir input
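
If this fails because your HDFS home directory does not exist yet, the -p flag can create the full path first (the /user/hadoop prefix is an assumption; adjust it to your own user):

    hadoop fs -mkdir -p /user/hadoop/input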

That creates the directory from the command line; alternatively, we can right-click the Hadoop location in Eclipse (the exact entry depends on your configuration) and create it there.


2. Copy the local README.txt into the input directory on HDFS

hadoop fs -copyFromLocal /usr/local/hadoop/README.txt input

Similarly, we can right-click the input directory in Eclipse and choose the upload file option to upload files through the GUI.

3. Click WordCount.java, right-click and choose Run As -> Run Configurations, and configure the run parameters, namely the input and output folders.

Of course, we can also write the paths directly in the code; once you really understand the file system you will find there are many ways to do this, all of which only require modifying the Java code. A sketch of the hard-coded variant is shown after the argument example below.

The run configuration corresponds to this line in the code:

String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

hdfs://localhost:9000/user/hadoop/input hdfs://localhost:9000/user/hadoop/output
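
If you prefer hard-coded paths instead of run-configuration arguments, the argument-handling lines in main() can be replaced with something like this (only a sketch; the HDFS URIs are the ones used above and may differ on your cluster):

    // Point the job directly at fixed HDFS paths instead of otherArgs[0] / otherArgs[1]
    FileInputFormat.addInputPath(job, new Path("hdfs://localhost:9000/user/hadoop/input"));
    FileOutputFormat.setOutputPath(job, new Path("hdfs://localhost:9000/user/hadoop/output"));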


4. After the job completes, view the results

The first approach is to view the output directly from the command line in a terminal.

  hadoop fs -ls output

You can see there are two output files: _SUCCESS and part-r-00000.

Then execute:

hadoop fs -cat output/*

The second way is to view the results directly in Eclipse. Just remember to refresh the file system first.

Expand DFS Locations, as shown, and double-click part-r-00000 to view the results.




