Eclipse/IntelliJ IDEA Remote Debugging of Hadoop 2.6.0 (Java)


Many Hadoop beginners are in the same situation as me: without enough machine resources, the only option is to install a pseudo-distributed Hadoop on a Linux virtual machine and then write and test code with Eclipse or IntelliJ IDEA on the Windows 7 host. The question then becomes: how do you submit Map/Reduce jobs from Eclipse or IntelliJ IDEA on Win7 to the remote Hadoop instance and debug them with breakpoints?

1. Preparation

1.1 On the Win7 machine, pick a directory and unpack hadoop-2.6.0. In this article it is D:\yangjm\Code\study\hadoop\hadoop-2.6.0 (referred to below as $HADOOP_HOME).

1.2 Add a few environment variables on Win7:

HADOOP_HOME=D:\yangjm\Code\study\hadoop\hadoop-2.6.0

HADOOP_BIN_PATH=%HADOOP_HOME%\bin

HADOOP_PREFIX=D:\yangjm\Code\study\hadoop\hadoop-2.6.0

In addition, append ;%HADOOP_HOME%\bin to the end of the PATH variable.
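If you are unsure whether the variables took effect, here is a minimal sketch (my own addition, not part of the original setup; the class name is arbitrary) that prints what a freshly started JVM actually sees:

public class EnvCheck {
    public static void main(String[] args) {
        // These should print the paths configured above; null means the variable
        // is not visible to the JVM (restart the IDE after setting it).
        System.out.println("HADOOP_HOME   = " + System.getenv("HADOOP_HOME"));
        System.out.println("HADOOP_PREFIX = " + System.getenv("HADOOP_PREFIX"));
        System.out.println("PATH contains hadoop bin? "
                + System.getenv("PATH").contains("hadoop-2.6.0\\bin"));
    }
}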

2. Eclipse remote debugging

2.1 Download the hadoop-eclipse-plugin

hadoop-eclipse-plugin is an Eclipse plug-in for Hadoop that lets you browse HDFS directories and files directly inside the IDE. Its source code is hosted on GitHub at https://github.com/winghc/hadoop2x-eclipse-plugin

If you are interested, you can download the source and compile it yourself; there are plenty of articles about that. But if you just want to use it, https://github.com/winghc/hadoop2x-eclipse-plugin/tree/master/release already provides several pre-built versions. Copy the downloaded hadoop-eclipse-plugin-2.6.0.jar into the eclipse/plugins directory, then restart Eclipse, and you are done.

2.2 Download the Hadoop 2.6 support files for the 64-bit Windows platform (hadoop.dll, winutils.exe)

In the Hadoop 2.6.0 source tree, under hadoop-common-project\hadoop-common\src\main\winutils, there is a Visual Studio project; building it produces a set of output files, of which hadoop.dll and winutils.exe are the two that matter. Copy winutils.exe into the $HADOOP_HOME\bin directory and copy hadoop.dll into %windir%\system32 (this mainly prevents the plug-in from reporting various inexplicable errors, such as null reference exceptions).

Note: if you do not want to compile these yourself, you can download the pre-built files directly: hadoop2.6 (x64) V0.2.rar
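As an aside (my own note, not from the original article): if a program still complains that it cannot locate winutils.exe, the Hadoop client checks the hadoop.home.dir system property before falling back to the HADOOP_HOME environment variable, so a hedged workaround is to set that property at the very top of main(), before any Hadoop classes are used (assuming the unpacked distribution from step 1.1):

// Set before any Hadoop class is touched; path taken from step 1.1.
System.setProperty("hadoop.home.dir", "D:\\yangjm\\Code\\study\\hadoop\\hadoop-2.6.0");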

2.3 Configure the hadoop-eclipse-plugin

Start Eclipse, then open Window -> Show View -> Other...

In Window -> Preferences -> Hadoop Map/Reduce, specify the Hadoop root directory on Win7 (that is, $HADOOP_HOME).

Then, in the Map/Reduce Locations panel, click the icon shown in the figure to add a new location.

This dialog is quite important; a few of its parameters deserve explanation:

Location Name: any name you like.

Map/Reduce (V2) Master Host: the IP address of the Hadoop master in the virtual machine; the port below it corresponds to the port specified by the dfs.datanode.ipc.address property in hdfs-site.xml.

DFS Master Port: the port here corresponds to the port specified by fs.defaultFS in core-site.xml.

Finally, User name should be the same as the user that runs Hadoop in the virtual machine. I run Hadoop 2.6.0 as the user hadoop, so I enter hadoop here; if you installed it as root, change it to root.

Once these parameters are specified, click Finish and Eclipse knows how to connect to Hadoop. If everything goes well, you can see the HDFS directories and files in the Project Explorer panel.
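Independent of the plug-in, a quick way to confirm that the host and port are right is to list the HDFS root from a small Java program. This is only a sketch of mine, not part of the original walkthrough; replace the placeholder address with your virtual machine's IP and NameNode port:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsConnectionCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder address: use the same value as fs.defaultFS in core-site.xml on the VM.
        FileSystem fs = FileSystem.get(URI.create("hdfs://<your-vm-ip>:9000"), conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}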

You can right-click a file and try Delete. Usually the first attempt fails, with a long message whose gist is insufficient permissions. The reason is that the user currently logged in to Win7 is not the user running Hadoop in the virtual machine. There are many solutions, for example creating a new hadoop administrator user on Win7, logging in to Win7 as that user, and then developing in Eclipse, but that is too much trouble. The easiest way:

Add the following to hdfs-site.xml:

<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>

Then, in the virtual machine, run hadoop dfsadmin -safemode leave

To be on the safe side, also run hadoop fs -chmod 777 /

All of this essentially turns off Hadoop's security checks (acceptable while learning, but never do this in production). Finally, restart Hadoop, go back to Eclipse, and repeat the delete operation; it should now succeed.
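A gentler alternative (my own suggestion, not from the original text) is to leave dfs.permissions enabled and instead tell the HDFS client which user to act as. Hadoop 2.x resolves the caller from HADOOP_USER_NAME, either as an environment variable or as a system property, so setting it to the user that runs Hadoop on the VM usually avoids the permission errors without opening up the whole file system:

// Hedged sketch: set this before the first FileSystem/Job call in the JVM.
// "hadoop" is the user that runs Hadoop in the virtual machine; adjust if yours differs.
System.setProperty("HADOOP_USER_NAME", "hadoop");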

2.4 Creating the WordCount sample project

Create a new project and select Map/Reduce Project.

Click Next through the wizard, then add a WordCount.java with the following code:

package yjmyzz;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Then add a log4j.properties with the following contents (so you can conveniently see the various outputs while running):

log4j.rootLogger=INFO, stdout

#log4j.logger.org.springframework=INFO
#log4j.logger.org.apache.activemq=INFO
#log4j.logger.org.apache.activemq.spring=WARN
#log4j.logger.org.apache.activemq.store.journal=INFO
#log4j.logger.org.activeio.journal=INFO

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ABSOLUTE} | %-5.5p | %-16.16t | %-32.32c{1} | %-32.32C %4L | %m%n

The final directory structure is as follows:

Now you can run it, but of course it will not succeed yet, because WordCount has not been given any input parameters; see the next step:

2.5 Setting run parameters

WordCount reads an input file, counts the words in it, and writes the result to another folder, so it needs two arguments. As shown in the figure above, enter them under Program arguments:

hdfs://172.28.20.xxx:9000/jimmy/input/readme.txt
hdfs://172.28.20.xxx:9000/jimmy/output/

Adapt these to your setup (mainly change the IP to the IP of your own virtual machine). Note that if the input/readme.txt file does not exist, upload it manually, and the /output/ directory must not exist yet, otherwise the program will fail at the end when it finds that the target directory already exists. With that done, you can set a breakpoint in a suitable place and finally debug:
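If you prefer not to upload readme.txt by hand, a small sketch like the one below (my own addition; the local path is only an example) copies it into the input directory used above:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadInput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder: the same NameNode address as in the program arguments above.
        FileSystem fs = FileSystem.get(URI.create("hdfs://<your-vm-ip>:9000"), conf);
        // Example local path; point it at wherever your readme.txt actually lives.
        fs.copyFromLocalFile(new Path("D:/tmp/readme.txt"), new Path("/jimmy/input/readme.txt"));
        fs.close();
    }
}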

3. IntelliJ IDEA remote debugging

3.1 Create a Maven WordCount project

The pom file is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>yjmyzz</groupId>
    <artifactId>mapreduce-helloworld</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.6.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
            <version>2.6.0</version>
        </dependency>
        <dependency>
            <groupId>commons-cli</groupId>
            <artifactId>commons-cli</artifactId>
            <version>1.2</version>
        </dependency>
    </dependencies>

    <build>
        <finalName>${project.artifactId}</finalName>
    </build>
</project>

The project structure is as follows:

Right-click the project -> Open Module Settings (or press F12) to open the module properties.

Add a dependency library reference,

and select the corresponding jar packages under $HADOOP_HOME.

Give the imported library a name, for example hadoop2.6.

3.2 Setting Run parameters

Note two things:

1) Program arguments: just like in Eclipse, specify the input file and the output folder here.

2) Working directory: the working directory, which should be set to the $HADOOP_HOME directory.

And then you can debug it.

The only downside of IntelliJ is that there is no Hadoop plug-in like the one for Eclipse, so after each run of WordCount you have to manually delete the output directory before you can run or debug again. To solve this, you can improve the WordCount code so that it deletes the output directory before running; see the code below:

package yjmyzz;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    /**
     * Delete the specified directory.
     *
     * @param conf
     * @param dirPath
     * @throws IOException
     */
    private static void deleteDir(Configuration conf, String dirPath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path targetPath = new Path(dirPath);
        if (fs.exists(targetPath)) {
            boolean delResult = fs.delete(targetPath, true);
            if (delResult) {
                System.out.println(targetPath + " has been deleted successfully.");
            } else {
                System.out.println(targetPath + " deletion failed.");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }

        // Delete the output directory first
        deleteDir(conf, otherArgs[otherArgs.length - 1]);

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

But that is still not enough. When you run in the IDE, the IDE needs to know which HDFS instance to talk to (much like specifying a DataSource in a configuration XML when doing database development), so copy core-site.xml from $HADOOP_HOME\etc\hadoop into the resources directory, similar to the following:

The contents are as follows:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://172.28.20.***:9000</value>
    </property>
</configuration>

Replace the IP above with the IP of your own virtual machine.
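If you would rather not copy core-site.xml into the project at all, an alternative sketch (my own, not from the original article) is to set fs.defaultFS on the Configuration object in code before the Job is created:

import org.apache.hadoop.conf.Configuration;

public class DefaultFsFromCode {
    public static Configuration build() {
        Configuration conf = new Configuration();
        // Placeholder value: must match the NameNode address of the virtual machine.
        conf.set("fs.defaultFS", "hdfs://<your-vm-ip>:9000");
        return conf;
    }
}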
