Notes on configuring the hadoop eclipse plug-in


1. Use the stable hadoop release 0.20.203.0rc1:

hadoop-0.20.203.0rc1.tar.gz

Accordingly, the plug-in should be the one shipped with that release, under hadoop-0.20.203.0/contrib/eclipse-plugin:

hadoop-eclipse-plugin-0.20.203.0.jar

The following Eclipse release can be used:

eclipse-jee-indigo-SR1-linux-gtk.tar.gz

2. Set the port numbers when configuring the hadoop development environment in eclipse

In Map/Reduce Master:

Port 9001, consistent with the mapred.job.tracker port in the hadoop configuration file mapred-site.xml:

[hadoop@hdp0 conf]$ more mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdp0:9001</value>
  </property>
</configuration>

In DFS Master:

Port 9000, consistent with the port number of fs.default.name in the hadoop configuration file core-site.xml:

[hadoop@hdp0 conf]$ more core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hdp0:9000</value>
  </property>
</configuration>
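The Map/Reduce Master and DFS Master settings in the plug-in must match these two properties exactly. As a quick sanity check, a small client program can print the values a Configuration actually resolves and try to connect to HDFS. This is only a minimal sketch: the class name and the conf path are assumptions, and it presumes core-site.xml is on the classpath (mapred-site.xml is not loaded by a plain Configuration, so it is added explicitly).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Assumed helper (not part of these notes): prints the addresses a client
// would use, so they can be compared with the values entered in the
// eclipse plug-in dialog.
public class ShowClusterAddresses {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // core-site.xml comes from the classpath; mapred-site.xml must be
        // added explicitly (path assumed).
        conf.addResource(new Path("/home/hadoop/hadoop-0.20.203.0/conf/mapred-site.xml"));

        System.out.println("fs.default.name    = " + conf.get("fs.default.name"));    // expect hdfs://hdp0:9000
        System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker")); // expect hdp0:9001

        // Connecting proves the DFS Master address is reachable.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Connected to: " + fs.getUri());
        fs.close();
    }
}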

A more detailed walkthrough follows.

Configuring the hadoop eclipse plug-in

1. Install the plug-in

Preparation:

Eclipse 3.3.2 (this version of the plug-in only works with this version of eclipse)
hadoop-0.20.2-eclipse-plugin.jar (found under the hadoop-0.20.2/contrib/eclipse-plugin directory)

Copy hadoop-0.20.2-eclipse-plugin.jar to the eclipse/plugins directory and restart eclipse.

2. Open the Map/Reduce perspective

Window -> Open Perspective -> Other: select Map/Reduce (its icon is blue).

3. Add a MapReduce environment

At the bottom of eclipse there will be a tab named "Map/Reduce Locations" next to the Console. Right-click the blank area in that view and select "New Hadoop location...":

In the displayed dialog box, enter the following:

Location name (a name of your choice)
Map/Reduce Master (the IP and port of the JobTracker, matching mapred.job.tracker configured in mapred-site.xml)
DFS Master (the IP and port of the NameNode, matching fs.default.name configured in core-site.xml)

4. Use eclipse to modify HDFS content

After the previous step, the configured HDFS should appear in the "Project Explorer" on the left. Right-click it to create folders, delete folders, upload files, download files, and delete files.

Note: changes are not shown in eclipse immediately after each operation; you must refresh the view.

5. Create a MapReduce project

5.1 Configure the hadoop path

Window -> Preferences: select "Hadoop Map/Reduce" and click "Browse..." to select the path of the hadoop folder.
This step has nothing to do with the runtime environment; it only allows all the jar packages under the hadoop root directory and lib directory to be imported automatically when a project is created.

5.2 Create a project

File -> New -> Project: select "Map/Reduce Project", enter the project name, and create the project. The plug-in automatically imports all the jar packages under the hadoop root directory and lib directory.

5.3 Create a Mapper or Reducer

File -> New -> Mapper creates a Mapper that automatically extends MapReduceBase from the mapred package and implements the old Mapper interface.
Note: the plug-in always generates code against the old classes and interfaces in the mapred package; a Mapper against the new mapreduce API must be written by hand, as sketched below.

The same applies to Reducer.
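For reference, a hand-written Mapper/Reducer pair against the new org.apache.hadoop.mapreduce API looks roughly like the following. This is only a minimal sketch; the class names, the type parameters, and the per-line counting logic are examples, not anything the wizard produces.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// New-API skeleton: extend Mapper/Reducer directly instead of
// MapReduceBase plus the mapred.Mapper interface the wizard generates.
public class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // emit one (line, 1) pair per input line, just to show the call pattern
        context.write(new Text(value), new IntWritable(1));
    }
}

class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}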

6. Run the WordCount program in eclipse

6.1 Import WordCount

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }

        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}


6.2 Configure running parameters

Run As -> Open Run Dialog...: select the WordCount program and set the running parameters under Arguments, for example:

/mapreduce/wordcount/input /mapreduce/wordcount/output/1

These are the input directory and the output directory under HDFS. The input directory should contain several text files, and the output directory must not exist before the run (see the sketch below for one way to clear it).
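Because the job fails if the output directory already exists, a stale output directory can be removed before each run, either with "bin/hadoop fs -rmr" or from code. A minimal programmatic sketch follows; the helper class is an assumption, and the default path simply mirrors the argument used above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Assumed helper (not part of the article): removes a leftover output
// directory so the next run does not fail with
// "Output directory ... already exists".
public class CleanOutputDir {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path out = new Path(args.length > 0 ? args[0] : "/mapreduce/wordcount/output/1");
        if (fs.exists(out)) {
            fs.delete(out, true);   // true = delete recursively
            System.out.println("Deleted " + out);
        }
        fs.close();
    }
}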

6.3 Run

Run As -> Run on Hadoop: select the previously configured MapReduce runtime environment and click "Finish" to run.

The console outputs relevant running information.

6.4 View running results

In the output directory /mapreduce/wordcount/output/1 you can see the output file of the WordCount program, along with a logs folder containing the running logs.
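The result file can also be read programmatically instead of through the eclipse tree. A minimal sketch, assuming a single reducer (whose default output file name is part-r-00000) and an assumed helper class:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Assumed helper (not part of the article): prints a WordCount result file
// from HDFS to the console, much like "bin/hadoop fs -cat".
public class CatOutput {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // part-r-00000 is the default output file of a single reducer
        Path part = new Path("/mapreduce/wordcount/output/1/part-r-00000");
        FSDataInputStream in = fs.open(part);
        try {
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            in.close();
            fs.close();
        }
    }
}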

Reference 2:

Full history of hadoop learning: running the first MapReduce program in eclipse

This is my second hadoop learning record. In this article, I will describe how to write the first MapReduce program in eclipse.

My development environment is described as follows:

Operating system: Ubuntu 10.10, installed with Wubi under Windows.
Hadoop version: hadoop-0.20.2.tar.gz
Eclipse version: eclipse-jee-helios-SR1-linux-gtk.tar.gz

For convenience, this example was developed against a hadoop installation in "pseudo-distributed mode".

Step 1: Start the hadoop daemons.
If you have read my first article, the full history of hadoop learning: getting started with hadoop, you should already know how to start the hadoop daemons in "pseudo-distributed mode".

Step 2: Install the hadoop plug-in in eclipse.

1. Copy hadoop-0.20.2-eclipse-plugin.jar from the hadoop installation directory (under contrib/eclipse-plugin) to the plugins folder of the eclipse installation directory.

2. Restart eclipse and configure the hadoop installation directory.
If the plug-in is installed successfully, open Window --> Preferences and you will find the Hadoop Map/Reduce option. In this option, configure the hadoop installation directory. Exit after the configuration is complete.

3. Configure Map/Reduce Locations.
Open Map/Reduce Locations via Window --> Show View.
Create a new hadoop location in the Map/Reduce Locations view: right-click --> New Hadoop location. In the pop-up dialog box, configure the Location name (for example, myubuntu) as well as Map/Reduce Master and DFS Master. The Host and Port here are the address and port you configured in mapred-site.xml and core-site.xml respectively. For example:

Map/Reduce Master:

  Host: localhost
  Port: 9001

DFS Master:

  Host: localhost
  Port: 9000

Exit after configuration. Expand DFS Locations --> myubuntu. If folders are displayed, the configuration is correct; if "no connection" is displayed, check your configuration.
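The same check can be made outside eclipse. A minimal sketch, assuming the pseudo-distributed setup above; the helper class is an assumption: it lists the HDFS root, i.e. the folders the DFS Locations tree should display, and a "Connection refused" here points to the same misconfiguration as "no connection" in eclipse.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Assumed helper (not part of the article): connects to the NameNode
// configured for DFS Master and lists the HDFS root directory.
public class CheckDfsLocation {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), new Configuration());
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}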

Step 3: Create a project.
File --> New --> Other --> Map/Reduce Project
The project name can be anything you like, such as hadoop-test.
Copy src/examples/org/apache/hadoop/examples/WordCount.java from the hadoop installation directory into the project you just created.

Step 4: Upload the simulated data folder.
To run the program, we need an input folder and an output folder. The output folder is generated automatically after the program runs successfully, so we only need to provide an input folder for the program.

1. Create the input folder in the current directory (such as the hadoop installation directory), and create two files file01 and file02 under the folder. The content of these two files is as follows:

file01:

  Hello World Bye World

file02:

  Hello Hadoop Goodbye Hadoop

2. Upload the folder input to the distributed file system.

In the terminal where the hadoop daemons were started, cd to the hadoop installation directory and run the following command:

  bin/hadoop fs -put input input01

This command uploads the input folder to the hadoop file system, where it appears as a folder named input01. You can verify it with:

  bin/hadoop fs -ls
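For completeness, the same two operations can be performed from Java. A minimal sketch; the helper class and the choice of the HDFS home directory as the listing target are assumptions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Assumed helper (not part of the article): the Java equivalent of
// "bin/hadoop fs -put input input01" followed by "bin/hadoop fs -ls".
public class UploadInput {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // copy the local "input" folder into the user's HDFS home as "input01"
        fs.copyFromLocalFile(new Path("input"), new Path("input01"));
        for (FileStatus status : fs.listStatus(fs.getHomeDirectory())) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}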

Step 5: Run the project.

1. In the newly created hadoop-test project, select WordCount.java, right-click --> Run As --> Run Configurations...
2. In the Run Configurations dialog box, click Java Application, right-click --> New; a new configuration named WordCount is created.
3. Configure the running parameters: click Arguments and, in Program arguments, enter the input folder you want to pass to the program and the folder where the program should save the results, for example:

  hdfs://localhost:9000/user/panhuizhi/input01 hdfs://localhost:9000/user/panhuizhi/output01

Here input01 is the folder you just uploaded. You can enter the folder address as needed.

4. Click Run to run the program.

Click Run to run the program. After a while the job finishes executing. When it is done, run the following command in the terminal:

  bin/hadoop fs -ls

to check whether the folder output01 has been generated.

Run the following command to view the contents of the generated files:

  bin/hadoop fs -cat output01/*

If output like the following is displayed, congratulations, you have successfully run the first MapReduce program in eclipse:

  Bye 1
  Goodbye 1
  Hadoop 2
  Hello 2
  World 2
