Use Eclipse on Windows 7 to connect to Hadoop on the RedHat virtual machine

Objective: connect to the Hadoop instance in the VM through Eclipse on the local machine and run the WordCount sample program.

1. Plug-in Installation

In general, the downloaded hadoop-0.20.2 ships with an Eclipse plug-in, but it only supports Eclipse versions earlier than 3.2, so I downloaded the plug-in hadoop-eclipse-plugin-0.20.3-SNAPSHOT instead.

Copy it to the F:\eclipse\plugins directory and restart Eclipse.

Open Eclipse.

A DFS Locations entry now appears in the Project Explorer on the left.

Under Window > Preferences, a Hadoop Map/Reduce option has been added. Select it and point it to the root directory of the downloaded Hadoop distribution.

2. Configure Parameters

Open the Map/Reduce Locations view.

Click the icon for adding a new location; a dialog pops up in which you fill in the parameters.

Location name: any name you like, for example hadoop.

In the Map/Reduce Master box, Host is the cluster machine on which the JobTracker runs. This is a single-node pseudo-distributed setup, so the JobTracker is on this machine; fill in this machine's IP address.

Port: the JobTracker port, which is 9001.

These two parameters correspond to the IP and port of mapred.job.tracker in mapred-site.xml.
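For reference, a minimal mapred-site.xml matching this setup would look roughly like the snippet below (192.168.125.131 stands in for the VM's IP address, the same example address used for the hosts entry later in this article):

<configuration>
  <property>
    <!-- JobTracker address: the VM's IP and port 9001 -->
    <name>mapred.job.tracker</name>
    <value>192.168.125.131:9001</value>
  </property>
</configuration>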

In the DFS Master box, Host is the cluster machine on which the NameNode runs. Again, this is a single-node pseudo-distributed setup and the NameNode is on this machine, so fill in this machine's IP address.

Port: the NameNode port, 9000 here.

These two parameters correspond to the IP and port of fs.default.name in core-site.xml.
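Similarly, the matching entry in core-site.xml would look roughly like this (again using the example IP):

<configuration>
  <property>
    <!-- NameNode address: HDFS URI with the VM's IP and port 9000 -->
    <name>fs.default.name</name>
    <value>hdfs://192.168.125.131:9000</value>
  </property>
</configuration>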

(Use M/R master host: if this check box is selected, the DFS Master host defaults to the same host as the Map/Reduce Master; if it is not selected, you can enter the host yourself. Here the JobTracker and NameNode are on the same machine, so leave it checked.)

Username: the user name used to connect to Hadoop. Because I installed Hadoop as the root user on Linux and did not create any other users, I used root.

Leave the remaining fields blank.

Click the Finish button; a new record now appears in this view. Restart Eclipse and re-edit the connection record just created. What we entered in step 2 was the General tab; now we edit the Advanced parameters tab.

Most of the properties here have been filled in automatically. As the reader can see, what is displayed is really the set of configuration properties from core-default.xml, hdfs-default.xml and mapred-default.xml. Because we changed some of these in the *-site.xml files when installing Hadoop, the same changes need to be made here.

The following attributes are important:

fs.default.name: this was already set on the General tab.

mapred.job.tracker: this was also set on the General tab.

dfs.replication: the default value here is 3; because we set it to 1 in hdfs-site.xml, it also needs to be set to 1 here (see the site-file snippet after this list).

hadoop.tmp.dir: the default value is /tmp/hadoop-${user.name}; because we set hadoop.tmp.dir to /usr/local/hadoop/hadooptmp in core-site.xml, change it to /usr/local/hadoop/hadooptmp here as well. The other directory properties are derived from this one and change automatically with it.

hadoop.job.ugi: enter root,Tardis. The part before the comma is the user that connects to Hadoop; the part after the comma is simply left hard-coded as Tardis.
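As a sketch of the site-file values these Advanced parameters have to mirror, assuming the single-node installation described above (each property sits inside its file's <configuration> element):

<!-- hdfs-site.xml: single machine, so keep only one replica -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

<!-- core-site.xml: Hadoop working/temporary directory -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/hadoop/hadooptmp</value>
</property>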

Then click Finish, and the connection is established.

An elephant icon appears under DFS Locations, and below it a folder, (2), which is the root directory of HDFS; the directory structure of the distributed file system is displayed there.
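To double-check from the Linux side, you can list the HDFS root with the command-line client (run from the Hadoop installation directory, assuming the daemons are running):

bin/hadoop fs -ls /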

3. Write a WordCount program and execute it in Eclipse.

Create a Map/Reduce project in Eclipse named exam.

Note that you should use the Configure Hadoop install directory option and select the root directory of the Hadoop installation.

Then create a java class under this project as follows:

package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  // Mapper: emits (word, 1) for every token in each input line.
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as the combiner): sums the counts for each word.
  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
Go to the C:\Windows\System32\drivers\etc directory, open the hosts file, and add the line:

192.168.125.131 hadoopName

where the IP address is that of my Linux machine and hadoopName is the Linux machine's hostname.

Set the input and output paths as program arguments in Run Configurations.

In the input directory, create two files, file1 and file2, each containing a few words.

The output directory must not already exist, so its name cannot conflict with an existing one.
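A minimal example of the two program arguments, assuming the input files were uploaded to /user/root/input on HDFS and the output directory does not exist yet (both paths are illustrative):

hdfs://192.168.125.131:9000/user/root/input hdfs://192.168.125.131:9000/user/root/output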

Run it with Run on Hadoop.

After the execution completes, view the results in the output directory.
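For example, if file1 contained "hello world" and file2 contained "hello hadoop" (hypothetical input), the part-r-00000 file in the output directory would contain:

hadoop  1
hello   2
world   1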
