Hands-on: Building a Hadoop 2.5.1 + Eclipse Development and Debugging Environment (2)


In the previous post we set up the runtime environment; in this article we start building the development and debugging environment. This is the real essence of the series, distilled from countless tears!


4. Install Eclipse

Anyone familiar with Java will know Eclipse, so I won't say much here; http://www.eclipse.org has everything you need.

Note that Eclipse is installed inside the virtual machine here; don't put it in the wrong place!


5. Install the Maven Environment

Go to maven.apache.org, download Maven 3, and extract it to /home (since /home is usually a data disk, installing it there doesn't take up space on the system disk). Configure ~/.bash_profile and modify PATH:

PATH=$PATH:/home/maven3/bin/

Then run source ~/.bash_profile to make the environment variable take effect.

Run mvn -version to check whether the configuration succeeded.

Because access to repo.maven.org/maven2 from here is far too slow, I finally set up a local mirror. I use the oschina mirror; for instructions, see http://maven.oschina.net/help.html
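For reference, the mirror goes in ~/.m2/settings.xml. A minimal sketch follows; the URL below is the one oschina documented at the time and may have changed since, so treat the help page above as authoritative:

<settings>
    <mirrors>
        <mirror>
            <id>oschina</id>
            <mirrorOf>central</mirrorOf>
            <name>oschina maven mirror</name>
            <url>http://maven.oschina.net/content/groups/public/</url>
        </mirror>
    </mirrors>
</settings>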


Second, debugging the MapReduce code

1. A simple Hadoop MapReduce example

First we create a Maven project in Eclipse, starting directly from a simple template, and modify pom.xml to add the dependency:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.5.1</version>
</dependency>
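For orientation, after the edit the pom.xml of a quickstart-style project looks roughly like this; groupId, artifactId, and version here are placeholders from whatever template you used, not fixed values:

<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.example</groupId><!-- placeholder generated by the template -->
    <artifactId>dedup-demo</artifactId><!-- placeholder -->
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.5.1</version>
        </dependency>
    </dependencies>
</project>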


Then refer to http://www.cnblogs.com/xia520pi/archive/2012/06/04/2534533.html to write a simple MapReduce program. That author used the hadoop-eclipse-plugin, but that thing is complicated to configure and I don't recommend it; using the command line directly not only builds proficiency with the Hadoop commands but also saves a lot of trouble. For convenience, I used his first example, dedup (data deduplication): two sample input files and a Dedup class.
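Since the Dedup code itself only appears in the linked post, here is a minimal sketch of what such a deduplication job looks like; this is my own reconstruction, not the linked author's exact code. The map emits each whole input line as a key with an empty value; MapReduce groups identical keys before reduce, so the reducer writes each distinct line exactly once:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Dedup {

    // Emit each whole input line as the key; the value carries no information.
    public static class Map extends Mapper<Object, Text, Text, Text> {
        private static final Text EMPTY = new Text("");

        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(value, EMPTY);
        }
    }

    // Identical keys arrive grouped together, so writing each key once deduplicates.
    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        private static final Text EMPTY = new Text("");

        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            context.write(key, EMPTY);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Data deduplication");
        job.setJarByClass(Dedup.class);
        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Relative paths resolve under /user/root on HDFS, matching the directories below.
        FileInputFormat.addInputPath(job, new Path("dedup_in"));
        FileOutputFormat.setOutputPath(job, new Path("dedup_out"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}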

First create a directory on HDFS with bin/hadoop fs -mkdir /user/root/dedup_in.

The two sets of sample data can go into two txt files, say in.txt and in1.txt (the names are arbitrary), under the /home/temp/ directory. Then upload them to HDFS with bin/hadoop dfs -copyFromLocal /home/temp /user/root/dedup_in, and the data preparation is complete.
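As a purely hypothetical illustration of what the job does, suppose the two files contain:

/home/temp/in.txt:
2012-3-1 a
2012-3-2 b
2012-3-3 c

/home/temp/in1.txt:
2012-3-1 a
2012-3-2 d

After a successful run, dedup_out would hold each distinct line exactly once, in sorted key order:

2012-3-1 a
2012-3-2 b
2012-3-2 d
2012-3-3 c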

At the command line, run mvn install to package the program into a jar, then enter:

bin/hadoop jar [project directory]/target/xxx.jar [package].Dedup

to run the MapReduce job. Of course, you have to start Hadoop with sbin/start-all.sh first.
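Putting the whole round trip together in one place (the project directory, jar name, and package are the same placeholders as above):

sbin/start-all.sh                                              # start HDFS and YARN first
bin/hadoop fs -mkdir /user/root/dedup_in                       # input directory on HDFS
bin/hadoop dfs -copyFromLocal /home/temp /user/root/dedup_in   # upload the sample txt files
(cd [project directory] && mvn install)                        # package the job into target/xxx.jar
bin/hadoop jar [project directory]/target/xxx.jar [package].Dedup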

2. Modify the Dedup class

To trace and debug the MapReduce job locally, we also need to modify the Dedup class. The change is mainly to the main function: add a few lines after Configuration conf = new Configuration():

Configuration conf = new Configuration();
conf.addResource("classpath:mapred-site.xml");
conf.set("fs.defaultFS", "hdfs://virtual machine IP:8020");          // NameNode RPC address
conf.set("mapreduce.framework.name", "yarn");
conf.set("yarn.resourcemanager.address", "virtual machine IP:8032"); // ResourceManager address
conf.set("mapred.remote.os", "Linux");
conf.set("hadoop.job.ugi", "hadoop,hadoop");


3. Create a local runtime configuration

Add a mapred-site.xml file under src/main/resources to configure Hadoop's runtime parameters:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx800m -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000</value>
    </property>
    <property>
        <name>mapreduce.jobtracker.staging.root.dir</name>
        <value>/tmp</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.staging-dir</name>
        <value>/tmp</value>
    </property>
    <!-- mapreduce.framework.name appears a second time here; the later value (local) wins, so the job runs in local mode -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>local</value>
    </property>
    <property>
        <name>mapreduce.jobtracker.address</name>
        <value>local</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>local</value>
    </property>
</configuration>


Note: the address=8000 in -Xmx800m -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000 can be set to any unoccupied port number, in case 8000 is already taken.

4. Add a LocalJob class

The purpose of this class is to help hadoop-mapreduce-client find and publish the jar when running in local mode. The code is as follows:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;

/**
 * @author root
 */
public class LocalJob extends Job {

    public LocalJob() throws IOException {
        super();
    }

    public LocalJob(Configuration conf, String jobName) throws IOException {
        super(conf, jobName);
    }

    public LocalJob(Configuration conf) throws IOException {
        super(conf);
    }

    public static LocalJob getInstance(Configuration conf, String jobName) throws IOException {
        JobConf jobConf = new JobConf(conf);
        LocalJob job = new LocalJob(jobConf, jobName);
        return job;
    }

    @Override
    public void setJarByClass(Class<?> clazz) {
        super.setJarByClass(clazz);
        // Point the job at the jar built by mvn install so local mode can find it
        conf.setJar("file:///[project directory]/target/xxx.jar");
    }
}

Then modify the line Job job = new Job(conf, "Data deduplication") in the Dedup class's main function to:

Job job = LocalJob.getInstance(conf, "Data deduplication");
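Putting steps 2 and 4 together, the modified main of the Dedup sketch shown earlier would look roughly like this; it is a drop-in replacement for that sketch's main (replace virtual machine IP with your VM's address; the surrounding class and imports are unchanged):

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.addResource("classpath:mapred-site.xml");                       // the file from step 3
    conf.set("fs.defaultFS", "hdfs://virtual machine IP:8020");
    conf.set("mapreduce.framework.name", "yarn");
    conf.set("yarn.resourcemanager.address", "virtual machine IP:8032");
    conf.set("mapred.remote.os", "Linux");
    conf.set("hadoop.job.ugi", "hadoop,hadoop");

    Job job = LocalJob.getInstance(conf, "Data deduplication");          // was: new Job(conf, ...)
    job.setJarByClass(Dedup.class);                                      // LocalJob points this at the mvn-built jar
    job.setMapperClass(Map.class);
    job.setCombinerClass(Reduce.class);
    job.setReducerClass(Reduce.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path("dedup_in"));
    FileOutputFormat.setOutputPath(job, new Path("dedup_out"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}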

5. In Eclipse's debug configuration, add a Remote Java Application entry, debug8000, and set the port number to 8000, the port we configured earlier in mapred-site.xml. Then run mvn install once more on the command line to build the jar file.

6. Set breakpoints in the Dedup class's map or reduce code, then run the Dedup class as a Java application; note, not in debug mode, but in run mode. If it reports that the dedup_out directory already exists, delete the output directory with bin/hadoop fs -rm -r /user/root/dedup_out.

7. If you see the Dedup class start normally and report no errors, but then it just stops and doesn't move, don't be surprised: the breakpoint you just set has taken effect. Now run the debug8000 Remote Java Application; in Eclipse's debug perspective you can see the breakpoint we just set, and execution jumps straight to that line!


If you've made it through step by step, congratulations! You have opened the door to Hadoop development!



