Basic points for writing a MapReduce program with Eclipse

Source: Internet
Author: User
Tags: create directory, safe mode

1. To write MapReduce programs in Eclipse, you need to install the Hadoop plug-in for Eclipse: copy hadoop-0.20.2-eclipse-plugin.jar from the contrib/eclipse-plugin/ directory of the Hadoop installation to the plugins directory of the Eclipse installation.

2. After the plug-in is installed, you can create a new Map/Reduce project. In the Java file you need to import the packages provided by Hadoop, specifically:

import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.*;

In the Java file, you need to subclass the Mapper and Reducer classes and override the map and reduce methods. In the MapReduce main class, you also implement the run method and the main method; the run method sets the job name and handles the input/output parameters.
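The flow these classes implement can be illustrated without a cluster. The sketch below is a plain-Java simulation, not Hadoop code (the class and method names here are made up for illustration): the map step emits a (word, 1) pair per token, the shuffle groups pairs by key, and the reduce step sums each group — which is exactly what the word-count job in this article computes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// A Hadoop-free simulation of the map/shuffle/reduce word-count flow.
public class WordCountSim {

    // "map" step: emit a (word, 1) pair for every token in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // "shuffle" + "reduce": group pairs by key, then sum each key's values.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two sample input lines, as in the job output shown later.
        String[] input = { "Hello I am", "your father Oh shit" };
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : input) {
            pairs.addAll(map(line));
        }
        // Print each word and its count, one per line, tab-separated.
        for (Map.Entry<String, Integer> e : reduce(pairs).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

In real Hadoop code the map and reduce methods live in Mapper and Reducer subclasses and the framework performs the shuffle; this sketch only shows the data flow.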

3. Running the MapReduce program

There are two ways to run a MapReduce program on Hadoop. The first is to run it directly in Eclipse with the required parameters; the results are printed to the console, which makes debugging easier. The setup parameters are shown in the following figure:


During the run, an "org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory ... Name node is in safe mode" error may appear. In that case, force the cluster to leave safe mode:

bin/hadoop dfsadmin -safemode leave

Run it again with the parameters, and the result is:



administrator@ml ~/hadoop-0.20.2
$ bin/hadoop dfs -ls
Found 4 items
-rw-r--r--   1 ml\root          supergroup          2013-12-28 12:10 /user/ml/root/input
drwxr-xr-x   - ml\root          supergroup          0 2013-12-28 12:13 /user/ml/root/output
drwxr-xr-x   - ml\administrator supergroup          0 2013-12-28 17:08 /user/ml/root/output_arg
drwxr-xr-x   - ml\root          supergroup          0 2013-12-28 15:45 /user/ml/root/output_ecl

administrator@ml ~/hadoop-0.20.2
$ bin/hadoop dfs -cat output_arg/part*
I       1
Oh      1
am      1
father  1
Hello   1
shit    1
your    1

The results are consistent with expectations.


The other way is to package the program into a jar, copy the jar to the Hadoop installation directory, and then run it from the command line:

administrator@ml ~/hadoop-0.20.2
$ bin/hadoop jar Mywordcount.jar mrtest input output_ecl
13/12/28 15:44:58 INFO input.FileInputFormat: Total input paths to process : 1
13/12/28 15:45:01 INFO mapred.JobClient: Running job: job_201312281151_0008
13/12/28 15:45:02 INFO mapred.JobClient:  map 0% reduce 0%
13/12/28 15:45:16 INFO mapred.JobClient:  map 100% reduce 0%
13/12/28 15:45:28 INFO mapred.JobClient:  map 100% reduce 100%
13/12/28 15:45:30 INFO mapred.JobClient: Job complete: job_201312281151_0008
13/12/28 15:45:30 INFO mapred.JobClient: Counters: 17
13/12/28 15:45:30 INFO mapred.JobClient:   Job Counters
13/12/28 15:45:30 INFO mapred.JobClient:     Launched reduce tasks=1
13/12/28 15:45:30 INFO mapred.JobClient:     Launched map tasks=1
13/12/28 15:45:30 INFO mapred.JobClient:     Data-local map tasks=1
13/12/28 15:45:30 INFO mapred.JobClient:   FileSystemCounters
13/12/28 15:45:30 INFO mapred.JobClient:     FILE_BYTES_READ=160
13/12/28 15:45:30 INFO mapred.JobClient:     HDFS_BYTES_READ=33
13/12/28 15:45:30 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=271
13/12/28 15:45:30 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=45
13/12/28 15:45:30 INFO mapred.JobClient:   Map-Reduce Framework
13/12/28 15:45:30 INFO mapred.JobClient:     Reduce input groups=7
13/12/28 15:45:30 INFO mapred.JobClient:     Combine output records=7
13/12/28 15:45:30 INFO mapred.JobClient:     Map input records=2
13/12/28 15:45:30 INFO mapred.JobClient:     Reduce shuffle bytes=0
13/12/28 15:45:30 INFO mapred.JobClient:     Reduce output records=7
13/12/28 15:45:30 INFO mapred.JobClient:     Spilled Records=14
13/12/28 15:45:30 INFO mapred.JobClient:     Map output bytes=59
13/12/28 15:45:30 INFO mapred.JobClient:     Combine input records=7
13/12/28 15:45:30 INFO mapred.JobClient:     Map output records=7
13/12/28 15:45:30 INFO mapred.JobClient:     Reduce input records=7
administrator@ml ~/hadoop-0.20.2
$ bin/hadoop dfs -ls output_ecl/*
drwxr-xr-x   - ml\root supergroup          0 2013-12-28 15:45 /user/ml/root/output_ecl/_logs/history
-rw-r--r--   1 ml\root supergroup         45 2013-12-28 15:45 /user/ml/root/output_ecl/part-r-00000

administrator@ml ~/hadoop-0.20.2
$ bin/hadoop dfs -cat output_ecl/part*
I       1
Oh      1
am      1
father  1
Hello   1
shit    1
your    1

The program executes perfectly.
