Environmental Requirements
Description: This document explains how to write and run the WordCount MapReduce job.
Operating system: Ubuntu 14, 64-bit
Hadoop: 2.7.0
Hadoop website: http://hadoop.apache.org/releases.html
MapReduce tutorial reference:
http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Source_Code
This chapter is based on the previous article "hadoop2.7.0 Practice-environment construction".
1. Install Eclipse
1) Download eclipse
Official website: http://www.eclipse.org/
2) Unpack the Eclipse archive
$ tar -xvf eclipse-jee-mars-R-linux-gtk-x86_64.tar.gz
3) Start eclipse
4) Write a test program
public class TestMore {
    public static void main(String[] args) {
        System.out.println("hello world!");
        System.out.println("I'm so glad to see that");
    }
}
2. Writing WordCount
1) Import the jar packages
Add the required jars to the project's build path in Eclipse.
Each directory under share/hadoop in the Hadoop distribution contains jar packages; the two needed here are:
hadoop-2.7.0/share/hadoop/common/hadoop-common-2.7.0.jar
hadoop-2.7.0/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.0.jar
2) Write the WordCount program
The corresponding source code:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Word Count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
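To see what the map step emits, here is a small plain-Java sketch (not part of the original article, and not using Hadoop) that applies the same StringTokenizer logic the mapper uses to one sample line; `mapLine` is a hypothetical helper name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class MapStepDemo {
    // Mirrors TokenizerMapper: emits one (token, 1) pair per word
    static List<String> mapLine(String line) {
        List<String> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            // In the real job this would be context.write(word, one)
            pairs.add(itr.nextToken() + "\t1");
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (String p : mapLine("Hello World bye World")) {
            System.out.println(p);
        }
    }
}
```

Running this prints four pairs, one per word; the shuffle and reduce phases then group and sum them.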
3) Export the JAR package
Name it wc.jar and export it directly into the Hadoop directory.
3. Running WordCount
1) Start the DFS service
Reference Document "hadoop2.7.0 Practice-environment construction".
Change into the Hadoop directory with cd, then run:
$ sbin/start-dfs.sh
The corresponding web UI: http://localhost:50070/
2) Prepare the input files
In the directory hadoop-2.7.0/wctest/input, create the file to be counted, file01.
Its contents: Hello World bye World
Create the HDFS directories (the commands are similar to local filesystem operations):
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/a
Copy the local files into HDFS:
$ bin/hdfs dfs -put wctest/input /user/a/input
Note: the corresponding delete command is:
$ bin/hadoop fs -rm -f -r /user/a/input
The files can be browsed at http://localhost:50070/
3) Start the YARN service
$ sbin/start-yarn.sh
4) Run WordCount program
$ bin/hadoop jar wc.jar WordCount /user/a/input /user/a/output
5) View Results
$ bin/hdfs dfs -cat /user/a/output/part-r-00000
Hello	1
World	2
bye	1
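As a sanity check on the result above, the whole map/combine/reduce pipeline can be simulated for one line in plain Java (a sketch, not part of the job; `WordCountLocal` and `count` are hypothetical names). A TreeMap reproduces the sorted key order of the reducer output, since Text keys sort uppercase letters before lowercase:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountLocal {
    // In-memory equivalent of the job: tokenize, then sum a count per word
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Same input as file01
        count("Hello World bye World")
            .forEach((w, c) -> System.out.println(w + "\t" + c));
    }
}
```

The printed counts match the part-r-00000 contents for the sample input.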
Common Errors and explanations
1) Running a MapReduce program while YARN is not started
Cause: YARN is configured but was never started.
Fix: start YARN
$ sbin/start-yarn.sh
hadoop2.7.0 Practice-WordCount