Many Hadoop beginners (myself included) do not have enough machine resources, so they only install a pseudo-distributed Hadoop on a Linux virtual machine and then write and test code on the Win7 host with Eclipse or IntelliJ IDEA. The question then is: how do you submit map/reduce jobs from Eclipse or IntelliJ IDEA on Win7 to the remote Hadoop instance, and debug them with breakpoints?
1. Preparatory work
1.1 On Win7, pick a directory and unpack hadoop-2.6.0. In this article it is D:\yangjm\Code\study\hadoop\hadoop-2.6.0 (referred to as $HADOOP_HOME below).
1.2 Add a few environment variables on Win7:
HADOOP_HOME=D:\yangjm\Code\study\hadoop\hadoop-2.6.0
HADOOP_BIN_PATH=%HADOOP_HOME%\bin
HADOOP_PREFIX=D:\yangjm\Code\study\hadoop\hadoop-2.6.0
In addition, append ;%HADOOP_HOME%\bin to the end of the PATH variable.
2. Eclipse remote debugging
2.1 Download the hadoop-eclipse-plugin plugin
hadoop-eclipse-plugin is an Eclipse plug-in for Hadoop that lets you browse HDFS directories and files directly inside the IDE. Its source code is hosted on GitHub at https://github.com/winghc/hadoop2x-eclipse-plugin
If you are interested, you can download the source code and compile it yourself (there are plenty of articles about that). But if you just want to use it, https://github.com/winghc/hadoop2x-eclipse-plugin/tree/master/release already provides several pre-built versions. Download hadoop-eclipse-plugin-2.6.0.jar, copy it to the eclipse/plugins directory, and restart Eclipse; that is all.
2.2 Download the Hadoop 2.6 plug-in package for the Windows 64-bit platform (hadoop.dll, winutils.exe)
Under the hadoop 2.6.0 source tree, hadoop-common-project\hadoop-common\src\main\winutils contains a VS.NET project; building it produces a pile of output files, of which hadoop.dll and winutils.exe are the two that matter. Copy winutils.exe to the $HADOOP_HOME\bin directory and copy hadoop.dll to the %windir%\system32 directory (this mainly prevents the plug-in from reporting all sorts of inexplicable errors, such as null object references).
Note: if you do not want to compile them yourself, you can download the pre-built package hadoop2.6 (x64) V0.2.rar directly.
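If the environment variables are not picked up and you still get errors such as "Could not locate executable ...\bin\winutils.exe", a common workaround (my addition, not part of the original steps) is to set the hadoop.home.dir system property at the very beginning of main(), before any Hadoop class is touched; the path below is the one used in this article and should be replaced with your own:

// Hypothetical workaround: point Hadoop at the unpacked directory that contains bin\winutils.exe
System.setProperty("hadoop.home.dir", "D:\\yangjm\\Code\\study\\hadoop\\hadoop-2.6.0");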
2.3 Configure the hadoop-eclipse-plugin plugin
Start Eclipse, then Window -> Show View -> Other
Window -> Preferences -> Hadoop Map/Reduce: specify the Hadoop root directory on Win7 (i.e. $HADOOP_HOME)
Then, in the Map/Reduce Locations panel, click the icon shown in the figure to add a new location.
This dialog is quite important; a few of its parameters deserve explanation:
Location Name: any name you like.
Map/Reduce (V2) Master Host: the IP address of the Hadoop master in the virtual machine; the port below it corresponds to the port specified by the dfs.datanode.ipc.address property in hdfs-site.xml.
DFS Master Port: this port corresponds to the port specified by fs.defaultFS in core-site.xml.
The last field, User name, should be the user that runs Hadoop in the virtual machine. I run Hadoop 2.6.0 with a user named hadoop, so I fill in hadoop here; if you installed it as root, change it to root.
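For reference, the two ports in this dialog mirror the cluster-side configuration on the virtual machine. A sketch of the relevant properties (the IP is the one used later in this article, and 0.0.0.0:50020 is the Hadoop 2.x default for dfs.datanode.ipc.address; check your own files if you changed them):

<!-- core-site.xml on the VM: the DFS Master port comes from fs.defaultFS -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://172.28.20.xxx:9000</value>
</property>
<!-- hdfs-site.xml on the VM: the Map/Reduce (V2) Master port comes from dfs.datanode.ipc.address -->
<property>
  <name>dfs.datanode.ipc.address</name>
  <value>0.0.0.0:50020</value>
</property>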
Once these parameters are specified, click Finish. Eclipse now knows how to connect to Hadoop, and if everything goes well you can see the HDFS directories and files in the Project Explorer panel.
You can right-click a file and try Delete. The first attempt is usually unsuccessful: it prompts a bunch of messages that boil down to insufficient permissions, because the user currently logged in to Win7 is not the user that runs Hadoop in the virtual machine. There are many solutions; for example, you could create a new hadoop administrator user on Win7, log in to Win7 as that user, and then develop with Eclipse, but that is too much trouble. The easiest way:
Add the following to hdfs-site.xml:
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
Then, in the virtual machine, run hadoop dfsadmin -safemode leave. To be on the safe side, also run hadoop fs -chmod 777 /.
All in all, this simply turns off Hadoop's security checks (fine for the learning phase, but never do this in production). Finally restart Hadoop, go back to Eclipse, and repeat the delete operation you just tried; it should now succeed.
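Put together, the sequence on the virtual machine looks roughly like this (a sketch assuming a pseudo-distributed install that is started and stopped with the sbin scripts; adjust to however you normally restart Hadoop):

# restart HDFS after adding dfs.permissions=false to hdfs-site.xml
sbin/stop-dfs.sh
sbin/start-dfs.sh
# leave safe mode and, to be safe, open up permissions (learning environment only)
hadoop dfsadmin -safemode leave
hadoop fs -chmod 777 /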
2.4 Create the WordCount sample project
Create a new project and select Map/Reduce Project.
Click Next through the wizard, then add a WordCount.java with the following code:
package yjmyzz;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Then add a log4j.properties with the following contents (so the various outputs can be seen conveniently when running):
log4j.rootLogger=INFO, stdout
#log4j.logger.org.springframework=INFO
#log4j.logger.org.apache.activemq=INFO
#log4j.logger.org.apache.activemq.spring=WARN
#log4j.logger.org.apache.activemq.store.journal=INFO
#log4j.logger.org.activeio.journal=INFO
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ABSOLUTE} | %-5.5p | %-16.16t | %-32.32c{1} | %-32.32C %4L | %m%n
The final directory structure is as follows:
Then you can run it. Of course it will not succeed, because WordCount has not been given any input parameters; refer to the figure below:
2.5 Set the run parameters
WordCount reads a file, counts the words in it, and writes the result to another folder, so it needs two parameters. Referring to the figure above, enter the following in Program arguments:
hdfs://172.28.20.xxx:9000/jimmy/input/readme.txt
hdfs://172.28.20.xxx:9000/jimmy/output/
Adapt these to your own setup (mainly, replace the IP with the IP of your virtual machine). Note that if /jimmy/input/readme.txt does not exist, upload it manually first (see the commands below), and /jimmy/output/ must NOT already exist, otherwise the program will run to the end, find that the target directory exists, and fail. Once this is done, you can set a breakpoint in a suitable place and finally debug it.
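If the input file is not on HDFS yet, it can be uploaded from the virtual machine with the ordinary HDFS shell commands (a sketch; the paths follow the arguments above, and readme.txt is assumed to sit in the current directory on the VM):

hadoop fs -mkdir -p /jimmy/input
hadoop fs -put readme.txt /jimmy/input/readme.txt
# the output directory must not exist before the job runs
hadoop fs -rm -r /jimmy/output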
3. IntelliJ IDEA remote debugging Hadoop
3.1 Create a Maven WordCount project
The pom file is as follows:
<?xml version= "1.0" encoding= "UTF-8"?> <project xmlns= "http://maven.apache.org/POM/4.0.0" xmlns:xsi= "http ://www.w3.org/2001/XMLSchema-instance "xsi:schemalocation=" http://maven.apache.org/POM/4.0.0 http:// Maven.apache.org/xsd/maven-4.0.0.xsd "> <modelVersion>4.0.0</modelVersion> <groupid>yjmyzz </groupId> <artifactId>mapreduce-helloworld</artifactId> <version>1.0-snapshot</ version> <dependencies> <dependency> <groupId>org.apache.hadoop</groupId> <artifacti
d>hadoop-common</artifactid> <version>2.6.0</version> </dependency> <dependency> <groupId>org.apache.hadoop</groupId> <artifactid>hadoop-mapreduce-client-jobclient</ artifactid> <version>2.6.0</version> </dependency> <dependency> <groupId> Commons-cli</groupid> <artifactId>commons-cli</artifactId> <version>1.2</version> </dependency> </dependencies> <build> <finalname>${project.artifactid}
</finalName> </build> </project>
The project structure is as follows:
Right-click the project -> Open Module Settings, or press F12, to open the module properties,
then add a dependent library reference
and import the corresponding jars under $HADOOP_HOME.
The imported library can be given a name, e.g. hadoop2.6.
3.2 Set the run parameters
Note two things:
1) Program arguments: as in Eclipse, specify the input file and the output folder here.
2) Working directory: the working directory, set to the $HADOOP_HOME directory.
And then you can debug it.
The only annoyance with IntelliJ is that there is no Hadoop plug-in like the one for Eclipse, so after each run of WordCount you have to delete the output directory manually before you can run or debug again. To solve this, you can improve the WordCount code so that it deletes the output directory before running; see the following code:
package yjmyzz;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    /**
     * Delete the specified directory on HDFS (if it exists).
     */
    private static void deleteDir(Configuration conf, String dirPath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path targetPath = new Path(dirPath);
        if (fs.exists(targetPath)) {
            boolean delResult = fs.delete(targetPath, true);
            if (delResult) {
                System.out.println(targetPath + " has been deleted successfully.");
            } else {
                System.out.println(targetPath + " deletion failed.");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        // delete the output directory first, so the job can be re-run without manual cleanup
        deleteDir(conf, otherArgs[otherArgs.length - 1]);

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
But that is still not enough. When running in the IDE, the IDE needs to know which HDFS instance to talk to (much like specifying a DataSource in a configuration XML in database development), so copy core-site.xml from $HADOOP_HOME\etc\hadoop into the resources directory, similar to the following:
The contents are as follows:
<?xml version= "1.0" encoding= "UTF-8"?> <?xml-stylesheet type= "
text/xsl" href= "configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value> hdfs://172.28.20.***:9000</value>
</property>
</configuration>
Replace the IP above with the IP of your own virtual machine.