Compiling and Running MapReduce Programs with Eclipse on Hadoop 2.6.0 (Ubuntu/CentOS)


Article source: http://www.powerxing.com/hadoop-build-project-using-eclipse/

This tutorial shows how to use Eclipse on Ubuntu/CentOS to develop MapReduce programs, validated under Hadoop 2.6.0. Although you can compile, package, and run your own MapReduce programs from the command line, writing code that way is not convenient. With Eclipse you can manipulate files in HDFS directly and run your code directly, saving a lot of tedious commands. This tutorial was produced by the Xiamen University Database Laboratory; please credit the source when reposting.

Environment

This tutorial was validated under Hadoop 2.6.0, is suitable for Ubuntu/CentOS systems, and should in theory work with any native Hadoop 2 version, such as Hadoop 2.4.1 or Hadoop 2.7.1.

The main test environment for this tutorial:

    • Ubuntu 14.04
    • Hadoop 2.6.0 (pseudo-distributed)
    • Eclipse 3.8

In addition, this tutorial has been validated on CentOS 6.4, and the places where the Ubuntu and CentOS configurations differ are noted.

Install Eclipse

Eclipse is installed differently in Ubuntu and CentOS, but the subsequent configuration and usage are the same.

To install Eclipse in Ubuntu, search for it and install it directly from the Ubuntu Software Center: in the left-hand taskbar on the desktop, click "Ubuntu Software Center".

Ubuntu Software Center

Search for Eclipse in the search bar in the upper-right corner, click Eclipse in the search results, and click Install.

Install Eclipse

Wait for the installation to complete; Eclipse's default installation directory is /usr/lib/eclipse.

Installing Eclipse in CentOS requires downloading the installer; we choose the Eclipse IDE for Java Developers edition:

    • 32-bit: http://eclipse.bluemix.net/packages/mars.1/?JAVA-LINUX32
    • 64-bit: http://eclipse.bluemix.net/packages/mars.1/?JAVA-LINUX64

After downloading, execute the following command to install Eclipse into the /usr/lib directory:

shell Command
sudo tar -zxf ~/download/eclipse-java-mars-1-linux-gtk*.tar.gz -C /usr/lib

Eclipse can be used as soon as it is unpacked. In CentOS you can create a desktop shortcut for the program: right-click the desktop, select Create Launcher, and fill in the name and the program location (/usr/lib/eclipse/eclipse):

Install Eclipse

Installing Hadoop-eclipse-plugin

To compile and run MapReduce programs in Eclipse, you need to install hadoop-eclipse-plugin: download hadoop2x-eclipse-plugin from GitHub (alternate download: http://pan.baidu.com/s/1i4ikiop).

After downloading, copy hadoop-eclipse-kepler-plugin-2.6.0.jar from the release folder (versions 2.2.0 and 2.4.1 are also provided there) into the plugins folder of the Eclipse installation directory, then restart Eclipse by running eclipse -clean (this command only needs to be run once after adding the plugin; afterwards start Eclipse as usual).

shell Command
unzip -qo ~/download/hadoop2x-eclipse-plugin-master.zip -d ~/download    # unzip into ~/download
sudo cp ~/download/hadoop2x-eclipse-plugin-master/release/hadoop-eclipse-kepler-plugin-2.6.0.jar /usr/lib/eclipse/plugins/    # copy into the plugins directory of the Eclipse installation
/usr/lib/eclipse/eclipse -clean    # start Eclipse this way once so the plugin takes effect

Configure Hadoop-eclipse-plugin

Make sure Hadoop is running before you continue with the configuration.
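
If the daemons are not running yet, they can be started first. This is a minimal sketch assuming the pseudo-distributed setup with Hadoop installed under /usr/local/hadoop, as elsewhere in this tutorial; jps then lets you check that NameNode, DataNode, and SecondaryNameNode are up.

shell Command
/usr/local/hadoop/sbin/start-dfs.sh    # start the HDFS daemons (NameNode, DataNode, SecondaryNameNode)
jps                                    # list running Java processes to verify the daemons started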

After you start Eclipse, you can see DFS Locations in the Project Explorer on the left (if the Welcome screen is shown, close it via the x in the upper left corner to see it; on CentOS you need to switch perspective, as described in the second configuration step below, before it appears).

Effect after installing the hadoop-eclipse-plugin

The plugin requires further configuration.

Step one: select Preferences under the Window menu.

Open preference

A dialog will pop up with a Hadoop Map/Reduce option on the left. Click this option and select the installation directory of Hadoop (for example /usr/local/hadoop; in Ubuntu the directory is hard to pick in the file chooser, so just type it in).

Select the installation directory for Hadoop

Step two: switch to the Map/Reduce development view. Select Open Perspective > Other under the Window menu (on CentOS: Window > Perspective > Open Perspective > Other); in the dialog that pops up, choose the Map/Reduce option to switch.

Toggle Map/reduce Development View

Step three: establish a connection to the Hadoop cluster. Click the Map/Reduce Locations panel in the lower right corner of Eclipse, right-click inside the panel, and select New Hadoop Location.

Establishing a connection to a Hadoop cluster

In the General tab of the panel that pops up, the settings must be consistent with your Hadoop configuration. The two Host values are generally the same; for a pseudo-distributed setup, just fill in localhost. In addition, since my pseudo-distributed configuration sets fs.defaultFS to hdfs://localhost:9000, the DFS Master Port has to be changed to 9000. The Map/Reduce (V2) Master Port can keep its default value, and the Location Name can be anything you like.

The final settings are as follows:

Settings for Hadoop location
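
If you are unsure which address and port your installation actually uses, you can read fs.defaultFS from the active configuration and make sure the DFS Master host and port in the dialog match it. This is a sketch assuming Hadoop is installed under /usr/local/hadoop.

shell Command
/usr/local/hadoop/bin/hdfs getconf -confKey fs.defaultFS    # prints e.g. hdfs://localhost:9000 for the pseudo-distributed setup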

The Advanced parameters tab is for configuring Hadoop parameters; it essentially exposes Hadoop's configuration entries (the configuration files under /usr/local/hadoop/etc/hadoop). For example, since I configured hadoop.tmp.dir, the corresponding entry would have to be changed here as well. Making those changes is cumbersome, though, and we can achieve the same thing by copying the configuration files into the project (as described below).

In short, just configure the General tab and click Finish, and the Map/Reduce Location is created.

Working with files in HDFS in Eclipse

Once configured, click the Map/Reduce Location in the Project Explorer on the left (click the triangle to expand it) and you can see the list of files in HDFS directly (the files already in HDFS here include WordCount output). Double-click a file to view its contents, and right-click to upload, download, or delete files in HDFS, without having to use cumbersome hdfs dfs -ls style commands.

Use Eclipse to view file contents in HDFS
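
For reference, the command-line equivalents of these plugin actions look roughly as follows. This is a sketch that assumes Hadoop's bin directory (/usr/local/hadoop/bin) is on the PATH and that the HDFS home directory is /user/hadoop.

shell Command
hdfs dfs -ls /user/hadoop            # list files, as expanding the location does
hdfs dfs -cat output/part-r-00000    # view a file's contents, as double-clicking does
hdfs dfs -put local.txt input        # upload a local file
hdfs dfs -get output ./output        # download a directory
hdfs dfs -rm -r output               # delete a file or directory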

If you cannot see anything, right-click the Location and try Reconnect, or restart Eclipse.

Tips

After the contents of HDFS change, Eclipse does not refresh automatically; you need to right-click the MapReduce Location in the Project Explorer and select Refresh to see the changed files.

Create a MapReduce project in Eclipse

Click the File menu and choose New > Project...:

Create Project

Select Map/reduce Project and click Next.

Create a MapReduce project

Fill in WordCount as the Project name and click Finish to create the project.

Fill in the project name

At this point you will see the project you just created in the Project Explorer on the left.

Project Creation Complete

Then right-click the WordCount project you just created and choose New > Class.

New class

Two fields need to be filled in: enter org.apache.hadoop.examples for the Package and WordCount for the Name.

Fill in the class information

Once the class has been created, you will see the WordCount.java file under src in the project. Copy the following WordCount code into that file.

package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Running MapReduce through Eclipse

Before running the MapReduce program, there is an important step to perform (this is the "copy the configuration files" solution to the parameter problem mentioned above): copy the configuration files you modified in /usr/local/hadoop/etc/hadoop (for the pseudo-distributed setup, core-site.xml and hdfs-site.xml), together with log4j.properties, into the src folder of the WordCount project (~/workspace/WordCount/src):

cp /usr/local/hadoop/etc/hadoop/core-site.xml ~/workspace/WordCount/src
cp /usr/local/hadoop/etc/hadoop/hdfs-site.xml ~/workspace/WordCount/src
cp /usr/local/hadoop/etc/hadoop/log4j.properties ~/workspace/WordCount/src

The program will not run correctly without copying these files; why they need to be copied is explained at the end of this tutorial.

After copying, be sure to right-click WordCount and select Refresh (Eclipse does not refresh automatically; you have to refresh manually). You should then see the following file structure:

WordCount Project file Structure

You can run the MapReduce program by clicking the Run icon in the toolbar, or by right-clicking WordCount.java in the Project Explorer and choosing Run As > Run on Hadoop. However, because no arguments have been specified, the run will only print "Usage: wordcount <in> <out>"; the run arguments need to be set through Eclipse.

Right-click the WordCount.java you just created, select Run As > Run Configurations, where the run-time arguments can be set (if there is no WordCount entry under Java Application, double-click Java Application to create one). Switch to the Arguments tab and fill in "input output" under Program arguments.

Set Program run parameters
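
Note that the relative paths input and output refer to directories under the current user's home directory in HDFS. If no input directory exists yet, you can create one and upload some sample files first; this is a sketch assuming the /usr/local/hadoop installation, with Hadoop's own configuration files used as example input.

shell Command
/usr/local/hadoop/bin/hdfs dfs -mkdir -p input                                  # create the input directory in HDFS
/usr/local/hadoop/bin/hdfs dfs -put /usr/local/hadoop/etc/hadoop/*.xml input    # upload sample files to count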

Alternatively, you can set the input arguments directly in the code by changing the line String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs(); in the main() function:

// String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
String[] otherArgs = new String[]{"input", "output"}; /* set the input and output paths directly */

After setting the arguments, run the program again; you should see a message that it ran successfully, and after refreshing DFS Locations you will also see the output folder.

WordCount Running Results
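
The result can also be inspected from the command line; this assumes the same /usr/local/hadoop installation and the default output file name produced by a single reducer.

shell Command
/usr/local/hadoop/bin/hdfs dfs -cat output/part-r-00000    # print the word counts written by the job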

At this point, you can use Eclipse to develop MapReduce programs conveniently.

Problems running a MapReduce program in Eclipse

When you run a MapReduce program from Eclipse, the plugin's Advanced parameters are read as the Hadoop run-time parameters. If we do not modify them, the defaults are effectively the standalone (non-distributed) parameters, so the program reads a local directory rather than an HDFS directory, and you are prompted that the input path does not exist:


Input path does not exist: file:/home/hadoop/workspace/wordcountproject/input

So we can either modify the plugin parameters or copy the configuration files into the project's src directory to override the defaults, so that the program runs correctly.
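
One way to see what is going on (a hedged illustration; the local path shown is just an example workspace) is that the input directory exists in HDFS but not in the local file system that the standalone defaults point to:

shell Command
/usr/local/hadoop/bin/hdfs dfs -ls input    # the input directory exists in HDFS
ls ~/workspace/WordCount/input              # but there is no such local directory, hence the error above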

In addition, log4j is used to record the program's log output and needs the log4j.properties configuration file. If that file is not copied into the project, a warning appears in the Console panel after running the program:

log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

Although this does not affect the correct operation of the program, you will not see any log messages while the program runs (only error output is visible).

