1. To write MapReduce programs in Eclipse, you need to install the Hadoop Eclipse plug-in: copy hadoop-0.20.2-eclipse-plugin.jar from contrib/eclipse-plugin/ in the Hadoop installation directory to the plugins directory of the Eclipse installation.
2. After the plug-in installation is complete, you can create a new Map/Reduce project. In the Java source file you need to import several packages provided by Hadoop, specifically:
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.*;
In the Java file, you need to subclass the Mapper and Reducer classes and override the map and reduce functions. In the main MapReduce class, you also implement a run method and the main method; the run method sets the job name and handles the command-line parameters.
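Before looking at the Hadoop API itself, it can help to see what the map and reduce functions actually compute. The sketch below uses only the JDK (no Hadoop classes) to simulate the word-count flow: map emits a (word, 1) pair per token, the framework groups the pairs by key, and reduce sums each group. The class name, helper signatures, and sample input lines are made up for illustration; in a real job the same logic lives in your Mapper.map and Reducer.reduce overrides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// JDK-only simulation of the word-count map/shuffle/reduce flow.
public class WordCountSketch {
    // map: emit one (word, 1) pair per whitespace-separated token
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // shuffle + reduce: group the emitted pairs by key and sum the values per key
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = {"Hello I am", "your father"};  // made-up sample input
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        reduce(pairs).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

The real driver's run method wires this logic together by setting the mapper and reducer classes, the input/output key and value types, and the input/output paths on a Job object before submitting it.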
3. Running the MapReduce program
There are two ways to run a MapReduce program on Hadoop. The first is to run it directly from Eclipse with the required parameters; the results are printed to the console, which makes debugging easier. The setup parameters are shown in the following figure:
During the run you may hit an "org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory ... Name node is in safe mode" error, in which case you need to force the cluster to leave safe mode:
bin/hadoop dfsadmin -safemode leave
Running again with the parameters, the result is:
administrator@ml ~/hadoop-0.20.2
$ bin/hadoop dfs -ls
Found 4 items
-rw-r--r--   1 ml\root supergroup 2013-12-28 12:10 /user/ml/root/input
drwxr-xr-x   - ml\root supergroup          0 2013-12-28 12:13 /user/ml/root/output
drwxr-xr-x   - ml\administrator supergroup          0 2013-12-28 17:08 /user/ml/root/output_arg
drwxr-xr-x   - ml\root supergroup          0 2013-12-28 15:45 /user/ml/root/output_ecl
administrator@ml ~/hadoop-0.20.2
$ bin/hadoop dfs -cat output_arg/part*
I 1
Oh 1
am 1
father 1
Hello 1
shit 1
your 1
The results are consistent with expectations.
The other way is to package the program into a jar, copy the resulting jar to the Hadoop installation directory,
and then execute the jar from the command line:
administrator@ml ~/hadoop-0.20.2
$ bin/hadoop jar Mywordcount.jar mrtest input output_ecl
13/12/28 15:44:58 INFO input.FileInputFormat: Total input paths to process : 1
13/12/28 15:45:01 INFO mapred.JobClient: Running job: job_201312281151_0008
13/12/28 15:45:02 INFO mapred.JobClient:  map 0% reduce 0%
13/12/28 15:45:16 INFO mapred.JobClient:  map 100% reduce 0%
13/12/28 15:45:28 INFO mapred.JobClient:  map 100% reduce 100%
13/12/28 15:45:30 INFO mapred.JobClient: Job complete: job_201312281151_0008
13/12/28 15:45:30 INFO mapred.JobClient: Counters: 17
13/12/28 15:45:30 INFO mapred.JobClient:   Job Counters
13/12/28 15:45:30 INFO mapred.JobClient:     Launched reduce tasks=1
13/12/28 15:45:30 INFO mapred.JobClient:     Launched map tasks=1
13/12/28 15:45:30 INFO mapred.JobClient:     Data-local map tasks=1
13/12/28 15:45:30 INFO mapred.JobClient:   FileSystemCounters
13/12/28 15:45:30 INFO mapred.JobClient:     FILE_BYTES_READ=160
13/12/28 15:45:30 INFO mapred.JobClient:     HDFS_BYTES_READ=33
13/12/28 15:45:30 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=271
13/12/28 15:45:30 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=45
13/12/28 15:45:30 INFO mapred.JobClient:   Map-Reduce Framework
13/12/28 15:45:30 INFO mapred.JobClient:     Reduce input groups=7
13/12/28 15:45:30 INFO mapred.JobClient:     Combine output records=7
13/12/28 15:45:30 INFO mapred.JobClient:     Map input records=2
13/12/28 15:45:30 INFO mapred.JobClient:     Reduce shuffle bytes=0
13/12/28 15:45:30 INFO mapred.JobClient:     Reduce output records=7
13/12/28 15:45:30 INFO mapred.JobClient:     Spilled Records=14
13/12/28 15:45:30 INFO mapred.JobClient:     Map output bytes=59
13/12/28 15:45:30 INFO mapred.JobClient:     Combine input records=7
13/12/28 15:45:30 INFO mapred.JobClient:     Map output records=7
13/12/28 15:45:30 INFO mapred.JobClient:     Reduce input records=7
administrator@ml ~/hadoop-0.20.2
$ bin/hadoop dfs -ls output_ecl/*
drwxr-xr-x   - ml\root supergroup          0 2013-12-28 15:45 /user/ml/root/output_ecl/_logs/history
-rw-r--r--   1 ml\root supergroup         45 2013-12-28 15:45 /user/ml/root/output_ecl/part-r-00000
administrator@ml ~/hadoop-0.20.2
$ bin/hadoop dfs -cat output_ecl/part*
I 1
Oh 1
am 1
father 1
Hello 1
shit 1
your 1
The program executes correctly.