Building a Hadoop Development Environment with Eclipse on Windows (personal memo)


Note: This article is a personal memo and may be updated at any time.
I. System environment

Windows 7, Eclipse 4.4.0, Hadoop 2.7.2. For the Hadoop installation tutorial, see: Hadoop 2.7.2 Installation Tutorial. Note that although Hadoop runs on a remote virtual machine, you also need a Hadoop environment on Windows so that Eclipse can debug remotely. After installing Hadoop on the virtual machine following the tutorial above, download the same version of Hadoop, unpack it on Windows, and configure the corresponding environment variables. Then copy the winutils.exe and hadoop.dll files into the bin folder of the Hadoop installation directory; you can find them online, or refer to http://blog.csdn.net/fly_leopard/article/details/51250443.

II. Installation steps

1. Install the Hadoop plugin: download hadoop-eclipse-plugin-2.7.2.jar (click Download) and copy it into the dropins folder under the Eclipse root directory.

2. Start Eclipse and open the perspective: "Window" -> "Open Perspective" -> "Other..." -> "Map/Reduce" -> "OK"
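If you prefer not to set HADOOP_HOME system-wide, the winutils.exe location can also be supplied from code before any Hadoop classes run. A minimal sketch, assuming the install path used later in this article (adjust to your own):

public class HadoopHomeFix {
    public static void main(String[] args) {
        // hadoop.home.dir tells the Hadoop client where to find bin\winutils.exe
        // on Windows; it must be set before the first FileSystem/Job call.
        System.setProperty("hadoop.home.dir", "D:\\hadoop\\hadoop-2.7.2"); // assumed path
    }
}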

3. Open the view: "Window" -> "Show View" -> "Other..." -> "MapReduce Tools" -> "Map/Reduce Locations" -> "OK"
4. Add the Hadoop location:
Click New Hadoop Location
Modify the settings. My Hadoop is installed on a virtual machine at 192.168.48.129.

In the Map/Reduce Master box:
Host: the cluster machine where the JobTracker runs; write 192.168.48.129.
Port: the JobTracker's port; write 9001.
These two values are the IP and port from the mapred.job.tracker property in mapred-site.xml.

In the DFS Master box:
Host: the cluster machine where the NameNode runs; write 192.168.48.129.
Port: the NameNode's port; write 9000.
These two values are the IP and port from the fs.defaultFS (or fs.default.name) property in core-site.xml.

User name: the user name used to connect to Hadoop. Because I run Hadoop as the user hadoop and created no other users, I use hadoop here. The remaining fields are not required. Then click the Finish button; a new record appears in this view.
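To check the host, port, and user name independently of the plugin, the connection can be tested from a small Java client. A minimal sketch, assuming the addresses above (FileSystem.get with an explicit URI and user name is the standard Hadoop client API):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsConnectionTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same values as the DFS Master box: NameNode host and port,
        // connecting as the "hadoop" user.
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://192.168.48.129:9000"), conf, "hadoop");
        // List the root directory to confirm the connection works.
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}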
Restart Eclipse and re-edit the connection record you just created; now edit the Advanced Parameters tab. (Why restart first? When a connection is newly created, some properties on the Advanced Parameters tab are not displayed and cannot be set; they only appear after Eclipse is restarted.)
Most of the properties here have been filled in automatically; they mirror configuration properties from core-site.xml, hdfs-site.xml, and mapred-site.xml. Because those site configuration files were changed when Hadoop was installed, the same changes must be made here. The main properties to check are the following:
fs.default.name (fs.defaultFS): already set on the General tab.
mapred.job.tracker: also set on the General tab.
dfs.replication: the default here is 3; because I set it to 1 in hdfs-site.xml, set it to 1 here as well.
hadoop.tmp.dir: fill in the hadoop.tmp.dir value you set in core-site.xml.
Modify whatever differs in your own setup, such as dfs.replication and hadoop.tmp.dir above.
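For reference, the same properties can also be set on the client side in code. A minimal sketch, assuming the values used in this article (the hadoop.tmp.dir value is an assumption; use whatever your core-site.xml contains):

import org.apache.hadoop.conf.Configuration;

public class ClientConfig {
    public static Configuration build() {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://192.168.48.129:9000"); // DFS Master
        conf.set("mapred.job.tracker", "192.168.48.129:9001");  // Map/Reduce Master
        conf.set("dfs.replication", "1");                       // matches hdfs-site.xml
        conf.set("hadoop.tmp.dir", "/usr/local/hadoop/tmp");    // assumed value
        return conf;
    }
}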


Then click Finish and connect (start the sshd service and the Hadoop processes on the virtual machine first). A successful connection looks like this:

Note 1: I ran into a small problem here. When I installed Hadoop on Ubuntu, the fs.defaultFS property in core-site.xml was set to localhost, so the NameNode only listened on the loopback interface; when I connected from Eclipse on Windows, the connection was refused by Hadoop, as shown below:

In this case, you only need to change the fs.defaultFS property to use the machine's IP address, as follows:
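A sketch of the corrected core-site.xml entry, assuming the VM address used in this article:

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.48.129:9000</value>
</property>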
Note 2: Right-click a file and try to delete it as a test. The first attempt usually fails with a long error message that amounts to insufficient permissions. The reason is that the current Windows 7 login user is not the user running Hadoop on the virtual machine. There are many solutions; for example, you can create a hadoop administrator user on Windows 7, log in to Windows as that user, and then use Eclipse. But that is annoying. The simplest way:

Add the following to hdfs-site.xml:

<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>

Then, on the virtual machine, run: hadoop dfsadmin -safemode leave

To be safe, also run: hadoop fs -chmod 777 /

In short, this shuts Hadoop's security off completely (fine for the learning phase; do not do this in production). Finally, restart Hadoop, go back to Eclipse, and repeat the file-deletion test; it should now succeed. A less drastic alternative is sketched below.
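An alternative that avoids disabling permissions entirely is to tell the Hadoop client which user to act as. A minimal sketch using the HADOOP_USER_NAME property that the Hadoop client checks when determining the login user (still only appropriate for the learning phase):

public class RunAsHadoopUser {
    public static void main(String[] args) {
        // Make the Hadoop client identify itself as the "hadoop" user
        // instead of the Windows login user; set before any HDFS access.
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        // ... FileSystem / Job code follows here ...
    }
}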

5. Run a WordCount example

(1) Create a Map/Reduce project: "File" -> "New" -> "Project..." -> "Map/Reduce" -> "Map/Reduce Project" -> Project name: TestHadoop -> "Configure Hadoop install directory..." -> Hadoop installation directory: D:\hadoop\hadoop-2.7.2\hadoop-2.7.2 -> "Apply" -> "OK" -> "Next" -> "Allow output folders for source folders" -> "Finish"
As shown below:


(2) Create a new WordCount class; the code is copied from Hadoop's own example, as follows:
package com.wimang.test;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // Mapper: splits each line into tokens and emits (word, 1) pairs.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    @SuppressWarnings("deprecation")
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
(3) Build some simulation data. To run the program, you need an input folder and an output folder; the output folder is generated automatically when the program finishes, so you only need to provide the input folder. There are two main ways:

Method one: create an input folder (named whatever you like) with the right mouse button in Eclipse (first resolve the permissions issue as described above), and upload one or two files containing words into it.

Method two: use commands directly on the virtual machine running Hadoop; see the "running Hadoop pseudo-distributed instances" section of the Hadoop installation tutorial (standalone/pseudo-distributed configuration, Hadoop 2.7.2/Ubuntu 14.04, reprinted with modifications).

A third, programmatic option is sketched below.
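A minimal sketch of creating the input folder and a sample file from a Java client, assuming the connection settings used earlier (the directory and file names are illustrative):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MakeInput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://192.168.48.129:9000"), conf, "hadoop");
        // Create the input directory and write one small sample file into it.
        Path input = new Path("/user/hadoop/input"); // illustrative path
        fs.mkdirs(input);
        try (FSDataOutputStream out = fs.create(new Path(input, "sample.txt"))) {
            out.writeBytes("hello hadoop hello eclipse\n");
        }
        fs.close();
    }
}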
(4) Configure the run parameters.
① In the new project, click WordCount.java, then right-click -> Run As -> Run Configurations.
② In the pop-up Run Configurations dialog, click Java Application, right-click -> New, and create a new configuration named WordCount.
③ Configure the run arguments: click Arguments and enter the input folder you want to pass to the program and the folder where the program should save its results, for example:
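With the illustrative paths used above (adjust the NameNode address and folder names to your own setup), the Program arguments box could contain:

hdfs://192.168.48.129:9000/user/hadoop/input hdfs://192.168.48.129:9000/user/hadoop/output1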

(5) Click Run, and the program runs.

Output like the following indicates that the run succeeded.


Alternatively, on the virtual machine running Hadoop, you can use a command in the terminal to check whether the output folder was generated:
bin/hadoop fs -ls
Use the following command to view the contents of the generated files:
bin/hadoop fs -cat output1/*


(6) Problems that may occur when running WordCount.

Situation 1: the error shown below appears when you double-click winutils.exe in the bin folder of the Windows Hadoop directory. Note: if these files are missing from bin, download hadoop_dll_winutil_2.7.2.zip and extract it into the bin directory, making sure hadoop.dll and the other files match your Hadoop version. To avoid dependency errors, also copy hadoop.dll into C:\Windows\System32. For the Hadoop environment configuration on Windows, see http://blog.csdn.net/fly_leopard/article/details/51250443.

If you run into the above problem, download and install the Visual C++ 2013 redistributable: http://www.microsoft.com/en-us/download/confirmation.aspx?id=40784

Situation 2:

This is a write-permission error; refer to the solution mentioned above in the Eclipse Hadoop plugin section. If you followed the installation process in this article, this problem should not appear.

At this point, you can remotely develop and debug Hadoop running on Ubuntu from Eclipse on Windows.
