1. To write MapReduce programs in Eclipse, you need to install the Hadoop Eclipse plug-in: copy hadoop-0.20.2-eclipse-plugin.jar from contrib/eclipse-plugin/ in the Hadoop installation directory to the plugins directory of the Eclipse installation.
2. After the plug-in installation is complete, you can create a new Map/Reduce project. In the Java source file you need to import several packages provided by Hadoop, specifically:
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.*;
In the Java file, you need to subclass the Mapper and Reducer classes and override the map and reduce functions. In the main MapReduce class, you also implement a run method and the main method; the run method sets the job name and handles the command-line parameters.
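Before looking at the Hadoop API itself, it can help to see what the map and reduce functions actually compute. The sketch below uses only the JDK (no Hadoop classes) to simulate the word-count flow: map emits a (word, 1) pair per token, the framework groups the pairs by key, and reduce sums each group. The class name, helper signatures, and sample input lines are made up for illustration; in a real job the same logic lives in your Mapper.map and Reducer.reduce overrides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// JDK-only simulation of the word-count map/shuffle/reduce flow.
public class WordCountSketch {
    // map: emit one (word, 1) pair per whitespace-separated token
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // shuffle + reduce: group the emitted pairs by key and sum the values per key
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = {"Hello I am", "your father"};  // made-up sample input
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        reduce(pairs).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

The real driver's run method wires this logic together by setting the mapper and reducer classes, the input/output key and value types, and the input/output paths on a Job object before submitting it.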
3. Running the MapReduce program
There are two ways to run a MapReduce program on Hadoop. The first is to run it directly from Eclipse with the required parameters; the results are printed to the console, which makes debugging easier. The setup parameters are shown in the following figure:
During the run you may hit an "org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory ... Name node is in safe mode" error, in which case you need to force the cluster to leave safe mode:
bin/hadoop dfsadmin -safemode leave
Running again with the parameters, the result is:
administrator@ml ~/hadoop-0.20.2
$ bin/hadoop dfs -ls
Found 4 items
-rw-r--r--   1 ml\root supergroup 2013-12-28 12:10 /user/ml/root/input
drwxr-xr-x   - ml\root supergroup          0 2013-12-28 12:13 /user/ml/root/output
drwxr-xr-x   - ml\administrator supergroup          0 2013-12-28 17:08 /user/ml/root/output_arg
drwxr-xr-x   - ml\root supergroup          0 2013-12-28 15:45 /user/ml/root/output_ecl
administrator@ml ~/hadoop-0.20.2
$ bin/hadoop dfs -cat output_arg/part*
I 1
Oh 1
am 1
father 1
Hello 1
shit 1
your 1
The results are consistent with expectations.
The other way is to package the program into a jar, copy the resulting jar to the Hadoop installation directory,
and then execute the jar from the command line:
administrator@ml ~/hadoop-0.20.2
$ bin/hadoop jar Mywordcount.jar mrtest input output_ecl
13/12/28 15:44:58 INFO input.FileInputFormat: Total input paths to process : 1
13/12/28 15:45:01 INFO mapred.JobClient: Running job: job_201312281151_0008
13/12/28 15:45:02 INFO mapred.JobClient:  map 0% reduce 0%
13/12/28 15:45:16 INFO mapred.JobClient:  map 100% reduce 0%
13/12/28 15:45:28 INFO mapred.JobClient:  map 100% reduce 100%
13/12/28 15:45:30 INFO mapred.JobClient: Job complete: job_201312281151_0008
13/12/28 15:45:30 INFO mapred.JobClient: Counters: 17
13/12/28 15:45:30 INFO mapred.JobClient:   Job Counters
13/12/28 15:45:30 INFO mapred.JobClient:     Launched reduce tasks=1
13/12/28 15:45:30 INFO mapred.JobClient:     Launched map tasks=1
13/12/28 15:45:30 INFO mapred.JobClient:     Data-local map tasks=1
13/12/28 15:45:30 INFO mapred.JobClient:   FileSystemCounters
13/12/28 15:45:30 INFO mapred.JobClient:     FILE_BYTES_READ=160
13/12/28 15:45:30 INFO mapred.JobClient:     HDFS_BYTES_READ=33
13/12/28 15:45:30 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=271
13/12/28 15:45:30 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=45
13/12/28 15:45:30 INFO mapred.JobClient:   Map-Reduce Framework
13/12/28 15:45:30 INFO mapred.JobClient:     Reduce input groups=7
13/12/28 15:45:30 INFO mapred.JobClient:     Combine output records=7
13/12/28 15:45:30 INFO mapred.JobClient:     Map input records=2
13/12/28 15:45:30 INFO mapred.JobClient:     Reduce shuffle bytes=0
13/12/28 15:45:30 INFO mapred.JobClient:     Reduce output records=7
13/12/28 15:45:30 INFO mapred.JobClient:     Spilled Records=14
13/12/28 15:45:30 INFO mapred.JobClient:     Map output bytes=59
13/12/28 15:45:30 INFO mapred.JobClient:     Combine input records=7
13/12/28 15:45:30 INFO mapred.JobClient:     Map output records=7
13/12/28 15:45:30 INFO mapred.JobClient:     Reduce input records=7
administrator@ml ~/hadoop-0.20.2
$ bin/hadoop dfs -ls output_ecl/*
drwxr-xr-x   - ml\root supergroup          0 2013-12-28 15:45 /user/ml/root/output_ecl/_logs/history
-rw-r--r--   1 ml\root supergroup         45 2013-12-28 15:45 /user/ml/root/output_ecl/part-r-00000
administrator@ml ~/hadoop-0.20.2
$ bin/hadoop dfs -cat output_ecl/part*
I 1
Oh 1
am 1
father 1
Hello 1
shit 1
your 1
The program executes correctly.