Hands-on: Building a Hadoop 2.5.1 + Eclipse Development and Debugging Environment (2)


In the previous post we set up the runtime environment; in this article we start building the development and debugging environment. This is the real essence of the series, distilled from countless tears!


4. Install Eclipse

Anyone familiar with Java will know Eclipse, so I won't say much here; http://www.eclipse.org has everything you need.

Note that Eclipse is installed inside the virtual machine here; don't put it in the wrong place!


5. Install the Maven Environment

Go to maven.apache.org, download Maven 3, and extract it to /home (since /home is usually a data disk, installing it there doesn't take up space on the system disk). Configure ~/.bash_profile and modify PATH:

PATH=$PATH:/home/maven3/bin/

Then run source ~/.bash_profile to make the environment variable take effect.

Run mvn -version to check whether the configuration succeeded.

Because access to repo.maven.org/maven2 from here is far too slow, I finally set up a local mirror. I use the oschina mirror; for instructions, see http://maven.oschina.net/help.html
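For reference, the mirror goes in ~/.m2/settings.xml. A minimal sketch follows; the URL below is the one oschina documented at the time and may have changed since, so treat the help page above as authoritative:

<settings>
    <mirrors>
        <mirror>
            <id>oschina</id>
            <mirrorOf>central</mirrorOf>
            <name>oschina maven mirror</name>
            <url>http://maven.oschina.net/content/groups/public/</url>
        </mirror>
    </mirrors>
</settings>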


Second, debugging the MapReduce code

1. A simple Hadoop MapReduce example

First we create a Maven project in Eclipse, starting directly from a simple template, and modify pom.xml to add the dependency:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.5.1</version>
</dependency>
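For orientation, after the edit the pom.xml of a quickstart-style project looks roughly like this; groupId, artifactId, and version here are placeholders from whatever template you used, not fixed values:

<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.example</groupId><!-- placeholder generated by the template -->
    <artifactId>dedup-demo</artifactId><!-- placeholder -->
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.5.1</version>
        </dependency>
    </dependencies>
</project>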


Then refer to http://www.cnblogs.com/xia520pi/archive/2012/06/04/2534533.html to write a simple MapReduce program. That author used the hadoop-eclipse-plugin, but that thing is complicated to configure and I don't recommend it; using the command line directly not only builds proficiency with the Hadoop commands but also saves a lot of trouble. For convenience, I used his first example, dedup (data deduplication): two sample input files and a Dedup class.
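Since the Dedup code itself only appears in the linked post, here is a minimal sketch of what such a deduplication job looks like; this is my own reconstruction, not the linked author's exact code. The map emits each whole input line as a key with an empty value; MapReduce groups identical keys before reduce, so the reducer writes each distinct line exactly once:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Dedup {

    // Emit each whole input line as the key; the value carries no information.
    public static class Map extends Mapper<Object, Text, Text, Text> {
        private static final Text EMPTY = new Text("");

        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(value, EMPTY);
        }
    }

    // Identical keys arrive grouped together, so writing each key once deduplicates.
    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        private static final Text EMPTY = new Text("");

        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            context.write(key, EMPTY);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Data deduplication");
        job.setJarByClass(Dedup.class);
        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Relative paths resolve under /user/root on HDFS, matching the directories below.
        FileInputFormat.addInputPath(job, new Path("dedup_in"));
        FileOutputFormat.setOutputPath(job, new Path("dedup_out"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}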

First create a directory on HDFS with bin/hadoop fs -mkdir /user/root/dedup_in.

The two sets of sample data can go into two txt files, say in.txt and in1.txt (the names are arbitrary), under the /home/temp/ directory. Then upload them to HDFS with bin/hadoop dfs -copyFromLocal /home/temp /user/root/dedup_in, and the data preparation is complete.
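As a purely hypothetical illustration of what the job does, suppose the two files contain:

/home/temp/in.txt:
2012-3-1 a
2012-3-2 b
2012-3-3 c

/home/temp/in1.txt:
2012-3-1 a
2012-3-2 d

After a successful run, dedup_out would hold each distinct line exactly once, in sorted key order:

2012-3-1 a
2012-3-2 b
2012-3-2 d
2012-3-3 c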

At the command line, run mvn install to package the program into a jar, then enter:

bin/hadoop jar [project directory]/target/xxx.jar [package].Dedup

to run the MapReduce job. Of course, you have to start Hadoop with sbin/start-all.sh first.
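Putting the whole round trip together in one place (the project directory, jar name, and package are the same placeholders as above):

sbin/start-all.sh                                              # start HDFS and YARN first
bin/hadoop fs -mkdir /user/root/dedup_in                       # input directory on HDFS
bin/hadoop dfs -copyFromLocal /home/temp /user/root/dedup_in   # upload the sample txt files
(cd [project directory] && mvn install)                        # package the job into target/xxx.jar
bin/hadoop jar [project directory]/target/xxx.jar [package].Dedup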

2. Modify the Dedup class

To trace and debug the MapReduce job locally, we also need to modify the Dedup class. The change is mainly to the main function: add a few lines after Configuration conf = new Configuration():

Configuration conf = new Configuration();
conf.addResource("classpath:mapred-site.xml");
conf.set("fs.defaultFS", "hdfs://virtual machine IP:8020");          // NameNode RPC address
conf.set("mapreduce.framework.name", "yarn");
conf.set("yarn.resourcemanager.address", "virtual machine IP:8032"); // ResourceManager address
conf.set("mapred.remote.os", "Linux");
conf.set("hadoop.job.ugi", "hadoop,hadoop");


3. Create a local runtime configuration

Add a mapred-site.xml file under src/main/resources to configure Hadoop's runtime parameters:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx800m -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000</value>
    </property>
    <property>
        <name>mapreduce.jobtracker.staging.root.dir</name>
        <value>/tmp</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.staging-dir</name>
        <value>/tmp</value>
    </property>
    <!-- mapreduce.framework.name appears a second time here; the later value (local) wins, so the job runs in local mode -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>local</value>
    </property>
    <property>
        <name>mapreduce.jobtracker.address</name>
        <value>local</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>local</value>
    </property>
</configuration>


Note: the address=8000 in -Xmx800m -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000 can be set to any unoccupied port number, in case 8000 is already taken.

4. Add a LocalJob class

The purpose of this class is to help hadoop-mapreduce-client find and publish the jar when running in local mode. The code is as follows:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;

/**
 * @author root
 */
public class LocalJob extends Job {

    public LocalJob() throws IOException {
        super();
    }

    public LocalJob(Configuration conf, String jobName) throws IOException {
        super(conf, jobName);
    }

    public LocalJob(Configuration conf) throws IOException {
        super(conf);
    }

    public static LocalJob getInstance(Configuration conf, String jobName) throws IOException {
        JobConf jobConf = new JobConf(conf);
        LocalJob job = new LocalJob(jobConf, jobName);
        return job;
    }

    @Override
    public void setJarByClass(Class<?> clazz) {
        super.setJarByClass(clazz);
        // Point the job at the jar built by mvn install so local mode can find it
        conf.setJar("file:///[project directory]/target/xxx.jar");
    }
}

Then modify the line Job job = new Job(conf, "Data deduplication") in the Dedup class's main function to:

Job job = LocalJob.getInstance(conf, "Data deduplication");
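Putting steps 2 and 4 together, the modified main of the Dedup sketch shown earlier would look roughly like this; it is a drop-in replacement for that sketch's main (replace virtual machine IP with your VM's address; the surrounding class and imports are unchanged):

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.addResource("classpath:mapred-site.xml");                       // the file from step 3
    conf.set("fs.defaultFS", "hdfs://virtual machine IP:8020");
    conf.set("mapreduce.framework.name", "yarn");
    conf.set("yarn.resourcemanager.address", "virtual machine IP:8032");
    conf.set("mapred.remote.os", "Linux");
    conf.set("hadoop.job.ugi", "hadoop,hadoop");

    Job job = LocalJob.getInstance(conf, "Data deduplication");          // was: new Job(conf, ...)
    job.setJarByClass(Dedup.class);                                      // LocalJob points this at the mvn-built jar
    job.setMapperClass(Map.class);
    job.setCombinerClass(Reduce.class);
    job.setReducerClass(Reduce.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path("dedup_in"));
    FileOutputFormat.setOutputPath(job, new Path("dedup_out"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}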

5. In Eclipse's debug configuration, add a Remote Java Application entry, debug8000, and set the port number to 8000, the port we configured earlier in mapred-site.xml. Then run mvn install once more on the command line to build the jar file.

6. Set breakpoints in the Dedup class's map or reduce code, then run the Dedup class as a Java application; note, not in debug mode, but in run mode. If it reports that the dedup_out directory already exists, delete the output directory with bin/hadoop fs -rm -r /user/root/dedup_out.

7. If you see the Dedup class start normally and report no errors, but then it just stops and doesn't move, don't be surprised: the breakpoint you just set has taken effect. Now run the debug8000 Remote Java Application; in Eclipse's debug perspective you can see the breakpoint we just set, and execution jumps straight to that line!


If you've made it through step by step, congratulations! You have opened the door to Hadoop development!



