This article assumes that the Hadoop environment is on a remote machine (such as a Linux server), and the Hadoop version is 2.5.2
Note: This article is mainly based on "Eclipse/IntelliJ IDEA remote debugging Hadoop 2.6.0" and adjusts it for this setup.
Since I prefer to install 32-bit software on 64-bit Win7 (for example a 32-bit JDK and 32-bit Eclipse), the operating system in this article is 64-bit Win7, but all of the software is 32-bit.
Software versions:
Operating system: Win7 64-bit
Eclipse: eclipse-jee-mars-2-win32
Java: 1.8.0_77, 32-bit
Hadoop: 2.5.2
I. Installing Hadoop
1. On Win7, extract hadoop-2.5.2.tar.gz to a directory of your choice, such as D:\app\hadoop-2.5.2\
2. Configure the environment variable
HADOOP_HOME = D:\app\hadoop-2.5.2\
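If you prefer the command line, a minimal sketch (the path is only the example used above; setx affects newly started processes, so restart Eclipse afterwards):

setx HADOOP_HOME "D:\app\hadoop-2.5.2"

You should also add %HADOOP_HOME%\bin to your PATH, for example through Control Panel -> System -> Advanced system settings -> Environment Variables.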
II. Installing the Hadoop Eclipse Plugin
1. Download hadoop-eclipse-plugin
hadoop-eclipse-plugin is an Eclipse plug-in for Hadoop that lets you browse HDFS directories and files directly in the IDE. Its source code is hosted on GitHub at https://github.com/winghc/hadoop2x-eclipse-plugin; download hadoop-eclipse-plugin-2.6.0.jar from the release folder.
2. Download the Hadoop support files for 32-bit Windows (hadoop.dll, winutils.exe)
Since our software environment is 32-bit, we need the 32-bit hadoop.dll and winutils.exe; you can find download links by searching Baidu for "hadoop.dll 32".
For example: http://xiazai.jb51.net/201607/yuanma/eclipse-hadoop(jb51.net).rar
Copy winutils.exe to the %HADOOP_HOME%\bin directory, and copy hadoop.dll to C:\Windows\SysWOW64 (note: because our operating system is 64-bit while the software is 32-bit, the 32-bit DLL belongs in this directory; if your operating system is 32-bit, copy it directly to C:\Windows\System32 instead).
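Assuming the two files have been extracted into the current folder, the copy can be done from a command prompt (run as administrator for the SysWOW64 copy), roughly like this:

copy winutils.exe "%HADOOP_HOME%\bin\"
copy hadoop.dll C:\Windows\SysWOW64\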
3. Configure the hadoop-eclipse-plugin
Start Eclipse, open Window -> Preferences -> Hadoop Map/Reduce, and specify the Hadoop root directory on Win7 (that is, %HADOOP_HOME%).
Switch to the Map/Reduce view:
Window -> Show View -> Other... -> Map/Reduce Locations
Then add a new location in the Map/Reduce Locations panel at the bottom.
Configure it as follows:
Location name: any name you like.
Map/Reduce (V2) Master Host: the IP address of the Hadoop master in the virtual machine; its Port corresponds to the port specified by the dfs.datanode.ipc.address property in hdfs-site.xml (see the server-side snippets below for reference).
DFS Master Port: the port specified by fs.defaultFS in core-site.xml.
User name: the same user that runs Hadoop in the virtual machine. I run Hadoop under a user named hadoop, so I enter hadoop here; if you installed it as root, change it to root.
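For reference, the matching server-side settings might look roughly like this (the IP and ports are only examples; check your own core-site.xml and hdfs-site.xml, and note that 50020 is the default DataNode IPC port):

core-site.xml:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://192.168.1.6:9000</value>
</property>

hdfs-site.xml:
<property>
  <name>dfs.datanode.ipc.address</name>
  <value>0.0.0.0:50020</value>
</property>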
Once these parameters are set, click Finish. Eclipse now knows how to connect to Hadoop, and if everything goes well you can see the HDFS directories and files in the Project Explorer panel.
You can right-click a file and choose Delete to try it out. The first attempt usually fails with a long error message that boils down to insufficient permissions: the current Win7 login user is not the user that runs Hadoop in the virtual machine. There are several solutions; for example, you could create a matching hadoop user on Win7 and log in to Win7 as that user before working in Eclipse, but that is cumbersome. The easiest way is to add the following to hdfs-site.xml:
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
In short, this turns off HDFS permission checking (acceptable while learning, but never do this in production). Finally restart Hadoop, go back to Eclipse, and repeat the delete operation; it should now succeed.
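A minimal way to restart HDFS on the server, assuming the standard Hadoop 2.x sbin scripts and that HADOOP_HOME is also set there:

$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/start-dfs.sh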
Note: If you cannot connect, try telnet 192.168.1.6 9000 (replace the IP and port with those of your own Hadoop server) to make sure the port is reachable.
If telnet fails, the problem may be the value of fs.defaultFS in core-site.xml; for example, if it is configured as localhost:9000, the NameNode only listens on the loopback interface and cannot be reached remotely, so consider replacing localhost with the host name (or IP).
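For example, the corrected entry in core-site.xml might look like this (the host name "master" is only an illustration; use your own host name or IP, and make sure it resolves from the Windows machine, e.g. via C:\Windows\System32\drivers\etc\hosts):

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>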
III. Writing the WordCount Example
1. Create a new project and select Map/Reduce Project.
Accept the defaults on the following pages, then create a new class WordCount.java with the following code:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // Mapper: splits each input line into words and emits (word, 1)
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as combiner): sums the counts for each word
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // All arguments except the last are input paths; the last one is the output path
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Then create a log4j.properties file in the src directory with the following contents (so that you can see the various log output when running):
log4j.rootLogger=INFO, stdout
#log4j.logger.org.springframework=INFO
#log4j.logger.org.apache.activemq=INFO
#log4j.logger.org.apache.activemq.spring=WARN
#log4j.logger.org.apache.activemq.store.journal=INFO
#log4j.logger.org.activeio.journal=INFO
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ABSOLUTE} | %-5.5p | %-16.16t | %-32.32c{1} | %-32.32C %4L | %m%n
The final directory structure is as follows:
2. Configure running parameters
Because WordCount reads an input file, counts the words in it, and writes the result to another folder, it needs two parameters. Referring to the figure above, enter the following in Program arguments:
Hdfs://192.168.1.6:9000/user/nub1.txt
Hdfs://192.168.1.6:9000/user/output
Note: if the /user/nub1.txt file does not exist yet, upload it manually first (for example via the right-click menu of the DFS Locations tool in Eclipse). Also, the output directory (/user/output here) must not already exist; otherwise the job will fail because the target directory is found to exist.
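If you prefer to prepare the input from the server's command line instead of from Eclipse, something like the following should work (the paths are just the example paths used above; the last command is only needed if an output directory is left over from a previous run):

hdfs dfs -mkdir -p /user
hdfs dfs -put nub1.txt /user/nub1.txt
hdfs dfs -rm -r /user/output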
All right, run it.
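When the job finishes, you can inspect the result either through DFS Locations in Eclipse or on the server; the reducer output normally lands in a file named part-r-00000, for example:

hdfs dfs -cat /user/output/part-r-00000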
That is all for this article. I hope it helps with your learning, and thank you for your continued support.