Summary: A MapReduce program that counts the occurrences of each word in a set of text files.
Keywords: MapReduce, word count
Data source: two manually constructed English documents, file1.txt and file2.txt.
file1.txt content:
Hello Hadoop
I am studying the Hadoop technology
file2.txt content:
Hello World
The world is very beautiful
I love the Hadoop and world
Problem Description:
Count the frequency of each word in the manually constructed English documents, and output the results sorted alphabetically by word.
Solution:
1 Development environment: VMware 10 + Ubuntu 12.04 + Hadoop 1.1.2
2 Design idea: split the document content into individual words (map), bring all occurrences of the same word together (shuffle), and then count the occurrences of each word (reduce). Because the framework sorts the intermediate keys during the shuffle, the final output comes out ordered by word, which satisfies the sorting requirement. A sketch of the data flow for the sample input is shown below.
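The following trace is an illustration only (not part of the program); the exact intermediate groupings depend on the input splits and on whether the combiner runs:

map input:      (0, "Hello Hadoop")
map output:     ("Hello", 1), ("Hadoop", 1)
map input:      (13, "I am studying the Hadoop technology")
map output:     ("I", 1), ("am", 1), ("studying", 1), ("the", 1), ("Hadoop", 1), ("technology", 1)
shuffle/sort:   ("Hadoop", [1, 1, 1]), ("Hello", [1, 1]), ("I", [1, 1]), ("The", [1]), ...
reduce output:  ("Hadoop", 3), ("Hello", 2), ("I", 2), ("The", 1), ...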
Program listing:
package com.wangluqing;

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  // Mapper: splits each input line into words and emits (word, 1) for every word.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as the combiner): sums the counts for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    // Summation is associative and commutative, so the reducer can safely double as a combiner.
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
3 Execution procedures
1) Create an input directory
hadoop fs -mkdir wordcount_input
2) Upload the local English documents
hadoop fs -put /usr/local/datasource/article/* wordcount_input
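Optionally, verify the upload by listing the input directory (assuming the two sample files were stored under /usr/local/datasource/article on the local file system):
hadoop fs -ls wordcount_input
The listing should show file1.txt and file2.txt.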
3) Compile WordCount.java and place the compiled classes in the WordCount directory under the current directory.
root@hadoop:/usr/local/program/hadoop# javac -classpath hadoop-core-1.1.2.jar:lib/commons-cli-1.2.jar -d WordCount WordCount.java
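Because the source declares package com.wangluqing, the -d option creates the corresponding package directories. As a quick check (listing shown for illustration only), the compiled classes should appear as:
ls WordCount/com/wangluqing/
WordCount.class  WordCount$IntSumReducer.class  WordCount$TokenizerMapper.class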
4) Package the compiled classes into a jar file. The class files must sit at the root of the jar so that the fully qualified class name resolves, hence the -C option:
jar -cvf WordCount.jar -C WordCount/ .
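To confirm the jar layout (an optional check), list its contents:
jar -tvf WordCount.jar
The listing should include com/wangluqing/WordCount.class along with the two inner classes.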
5) Run the WordCount program with wordcount_input as the input directory and wordcount_output as the output directory.
hadoop jar WordCount.jar com.wangluqing.WordCount wordcount_input wordcount_output
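Note that the output directory must not exist before the job starts; to re-run the job, remove it first with hadoop fs -rmr wordcount_output. After the job completes, the output directory can be inspected (a typical listing for a single-reducer job):
hadoop fs -ls wordcount_output
It should contain the _SUCCESS marker and the result file part-r-00000.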
6) View the frequency of each word
root@hadoop:/usr/local/program/hadoop# hadoop fs -cat wordcount_output/part-r-00000
Hadoop 3
Hello 2
I 2
The 1
am 1
and 1
beautiful 1
is 1
love 1
studying 1
technology 1
the 2
very 1
world 3
Summary:
WordCount is the simplest and most representative MapReduce program; to a certain extent it reflects the original motivation behind the design of MapReduce, namely the analysis of log files.
Resources:
1 http://www.wangluqing.com/2014/03/hadoop-mapreduce-programapp2/
2 "Hadoop in Action, 2nd Edition" (Lu Jiaheng), Chapter 5: MapReduce application cases