Build a Hadoop cluster (iii)

Last Update:2016-02-02 Source: Internet

Author: User

Tags hadoop mapreduce

In Build a Hadoop cluster (ii), we got our own WordCount program running smoothly.

This part covers how to create your own Java application, run it on a Hadoop cluster, and debug it.

What debug methods are there?

How to debug Hadoop in Eclipse

In general, the most common scenario is debugging the code logic in the MapReduce (MR) code; debugging the logic in the main method is less common.

You can debug in any of the standalone, pseudo-distributed, or fully-distributed modes.

In particular, if you just want to verify the code logic, you can debug in Eclipse even without Hadoop installed. This is called installation-free debug.

Comparison of several debug modes:

Installation-free debug (recommended)

This is the simplest way to debug: you only need to choose the Hadoop version you want.

Benefits: it is completely independent of your Hadoop installation and requires no changes to the current Hadoop environment.

You can even debug directly on a Windows host. (Entering paths can be troublesome: you need to convert them to Linux-style paths, or hardcode them in the main function instead of passing them in as input.)

Standalone

Almost identical to installation-free debug. The local file system is still used, and everything runs in a local JVM. Eclipse can attach directly with remote debug.

The differences:

1. Standalone uses the Hadoop jars under the installation path, while installation-free mode uses Hadoop jars downloaded separately.

2. Standalone requires modifying the Hadoop startup script.

pseudo-distributed

As with standalone, everything runs on the local machine, but the Hadoop daemons now run as well, each in its own JVM.

The differences:

1. HDFS is used instead of the local file system.

2. You must modify the Hadoop startup script to debug the main function. To debug the MR code, you must also add the "mapred.child.java.opts" property to mapred-site.xml.

fully-distributed

Debugging in this mode is tricky.

Debugging the main function is no problem, but getting into the MR code is difficult. Tasks run in JVMs on the worker nodes, not on the NameNode, so remote debug has to know (or guess) which node will run the task. This is hard to do in practice; unless it is really necessary, debugging in this mode is not recommended.

Hadoop-eclipse-plugin

The principle is the same as before. The configuration steps done by hand for pseudo-distributed mode are instead handled by configuring the plug-in, which avoids modifying the installed environment.

Configure the plug-in according to your installation settings:

Map/Reduce Master Host: localhost, Port: 9001
DFS Master Host: localhost, Port: 9000
User name: HM
dfs.data.dir, dfs.name.dir, dfs.tmp.dir, and so on are filled in according to core-site.xml and the other configuration files.
mapred.child.java.opts must also be set.

Plug-in repository: https://github.com/winghc/hadoop2x-eclipse-plugin

Illustrated guide: http://www.powerxing.com/hadoop-build-project-using-eclipse/

The following sections describe how to work in each debug mode.

To make it easy to use the existing Hadoop jars, I chose to install Eclipse and debug on a virtual machine that already has Hadoop installed.

Build the Linux development environment

Install Eclipse 4.4

Download the Eclipse Luna installation package and extract it to the /opt directory:

$ sudo tar -zxvf eclipse-jee-luna-SR2-linux-gtk-x86_64.tar.gz -C /opt

Create a symbolic link so that Eclipse can be started from the command line:

$ sudo ln -s /opt/eclipse/eclipse /usr/bin/eclipse

Add a shortcut to Applications:

vi /usr/share/applications/eclipse.desktop

[Desktop Entry]
Encoding=UTF-8
Name=Eclipse 4.4.1
Comment=Eclipse Luna
Exec=/usr/bin/eclipse
Icon=/opt/eclipse/icon.xpm
Categories=Application;Development;Java;IDE;
Version=1.0
Type=Application
Terminal=0

It can also be created by right-clicking Create launcher on the desktop.

The launcher then appears in the shortcut bar.

Type eclipse in a terminal, or start Eclipse with the shortcut above.

If you installed a non-JEE edition, you can install the plug-ins you need, such as EGit and Maven, yourself from the Eclipse Marketplace.

Get the MapReduce example code

The source code for the Hadoop MapReduce examples can be obtained in several ways:

1. From the Hadoop installation directory: ~/hadoop/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.6.2-sources.jar

2. From SVN: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/

Installation-Free Debug

This is the key part; most of the time this is the only kind of debug we need. The following uses the bundled WordCount example:

Create a new Java Project.

Copy the contents of WordCount.java from the examples into your Java project.

Create an input folder and put some text files into it.

Set the input and output paths as program arguments in the debug configuration.

For Hadoop 1, right-click the project, choose Properties -> Java Build Path -> Libraries -> Add External JARs, and add the Hadoop 1 jars.

For Hadoop 2.x, add the 2.x jars instead.

The difference is that 2.x no longer ships a single hadoop-core jar; it is split into hadoop-common, hadoop-mapreduce-client-core, and other jars.

Set breakpoints in the map and reduce functions, then click Debug to stop at them.
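When only the code logic needs checking, the same two steps can also be exercised with no Hadoop classes at all. The sketch below is a dependency-free approximation of WordCount, not the Hadoop API: the class and method names (WordCountLogic, map, reduce, count) are hypothetical, whereas the real example uses Mapper and Reducer subclasses from the Hadoop jars. It only marks the places where the breakpoints described above would stop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, Hadoop-free stand-in for the WordCount logic.
final class WordCountLogic {

    // "Map" step: split one input line on whitespace, one entry per token.
    static List<String> map(String line) {
        List<String> words = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            words.add(tokens.nextToken()); // a breakpoint here mirrors one in map()
        }
        return words;
    }

    // "Reduce" step: sum the occurrences of each word.
    static Map<String, Integer> reduce(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum); // a breakpoint here mirrors one in reduce()
        }
        return counts;
    }

    // Runs the map step over every line, then reduces all tokens together.
    static Map<String, Integer> count(List<String> lines) {
        List<String> all = new ArrayList<>();
        for (String line : lines) {
            all.addAll(map(line));
        }
        return reduce(all);
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("hello hadoop", "hello debug")));
    }
}
```

Because nothing here depends on Hadoop, it runs as an ordinary "Java Application" debug configuration on any host, Windows included.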

Standalone Mode Debug

This is similar to installation-free mode, except that in the Eclipse debug configuration you select "Remote Java Application" instead of "Java Application".

You also need to edit the bin/hadoop file under the installation directory and add the following line (the port is arbitrary, as long as it does not conflict with another one):

HADOOP_OPTS="$HADOOP_OPTS -agentlib:jdwp=transport=dt_socket,address=8883,server=y,suspend=y"
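To make the parts of that option string easier to read, here is a minimal sketch that assembles it from its components. The class JdwpOption and its build method are hypothetical helpers written for illustration; only the sub-option names (transport, address, server, suspend) come from the JVM itself.

```java
// Hypothetical helper: assembles the JDWP agent option from its parts.
final class JdwpOption {

    static String build(int port, boolean server, boolean suspend) {
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return "-agentlib:jdwp"
                + "=transport=dt_socket"                 // talk to the debugger over a TCP socket
                + ",address=" + port                     // port the debug connection uses
                + ",server=" + (server ? "y" : "n")      // y: this JVM listens and the debugger attaches
                + ",suspend=" + (suspend ? "y" : "n");   // y: pause at startup until a debugger attaches
    }

    public static void main(String[] args) {
        // Same values as the bin/hadoop line in the text.
        System.out.println(build(8883, true, true));
    }
}
```

With suspend=y the Hadoop command appears to hang after it is launched; it is waiting for Eclipse to attach, which is what allows breakpoints early in main to be hit.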

Set up local input (for example, create an input folder and put several txt files in it), then run:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar wordcount input/wordcount output

Start remote debug in Eclipse. The port must match the one set in bin/hadoop; if Eclipse is not running inside the VM, replace localhost with the VM's IP address or hostname.

At this point you can debug both main and the MR code.

Pseudo-distributed Mode Debug

First, format the NameNode.

To debug the main function, modify bin/hadoop and run, just as in standalone mode.

To debug the MapReduce code, you must also add the following to mapred-site.xml:

<property>
    <name>mapred.child.java.opts</name>
    <value>-agentlib:jdwp=transport=dt_socket,address=8887,server=y,suspend=y</value>
</property>

Create a new remote Debug configuration and set the port to 8887.
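Since each debug target needs a port that nothing else is using, it can help to confirm that a candidate port is free before writing it into the configuration. A minimal sketch, assuming the check is run on the same machine as Hadoop; the class name PortCheck is hypothetical, and the check is only a snapshot, since another process could take the port afterwards.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper: reports whether a local TCP port can be bound right now.
final class PortCheck {

    static boolean isFree(int port) {
        // Binding succeeds only if no other listener holds the port.
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false; // already in use, or binding is not permitted
        }
    }

    public static void main(String[] args) {
        // Defaults to the port used in the text; pass another port as the first argument.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8887;
        System.out.println("port " + port + (isFree(port) ? " is free" : " is in use"));
    }
}
```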

Then run sbin/start-dfs.sh; the log shows that port 8887 is being listened on.

Next, run sbin/start-yarn.sh, and then run the command:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar wordcount input/wordcount output

You can debug the corresponding code by starting the two remote debug sessions in turn.

A small trick

If you find it inconvenient to keep editing the Hadoop startup script, consider adding a custom argument at the end of the command, such as "debug":

debug=${!#}   # eval "debug=\$$#" also works
if [ "$debug" = "debug" ]; then
  HADOOP_OPTS="$HADOOP_OPTS -agentlib:jdwp=transport=dt_socket,address=8887,server=y,suspend=y"
fi

This way, only the start command changes:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar wordcount input/wordcount output debug

Debug mode is entered only when the argument is present; otherwise Hadoop starts normally.

However, a program like WordCount treats its last argument as the output directory and all other arguments as inputs, so this does not work as is. You must either modify the logic of the main function or give up on the extra-argument approach.
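One way to modify the main function is to remove a trailing "debug" argument before the remaining arguments are interpreted. The following is a sketch under that assumption; the class DebugArg and its methods are hypothetical and are not part of the Hadoop examples.

```java
import java.util.Arrays;

// Hypothetical helper: drops a trailing "debug" marker from the argument list.
final class DebugArg {

    static final String FLAG = "debug";

    // True when the last argument is the debug marker.
    static boolean isDebug(String[] args) {
        return args.length > 0 && FLAG.equals(args[args.length - 1]);
    }

    // Returns the arguments without the trailing marker, or unchanged if it is absent.
    static String[] stripDebug(String[] args) {
        return isDebug(args) ? Arrays.copyOf(args, args.length - 1) : args;
    }

    public static void main(String[] args) {
        String[] jobArgs = stripDebug(args);
        // From here on, jobArgs would be handled as before:
        // the last entry is the output directory, the rest are inputs.
        System.out.println(Arrays.toString(jobArgs));
    }
}
```

The trade-off is that a job can then no longer use an output directory literally named "debug" as its last argument.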
