Hadoop's MapReduce Program Application II

Source: Internet
Author: User
Tags: static class, hadoop fs

Summary: A MapReduce program that performs a word count.

Keywords: MapReduce program, word count

Data source: two manually constructed English documents, file1.txt and file2.txt.

file1.txt content:

Hello Hadoop

I am studying the Hadoop technology

file2.txt content:

Hello World

The world is very beautiful

I love the Hadoop and world

Problem Description:

Count the frequency of each word in the manually constructed English documents; the output must be sorted alphabetically by word.

Solution:

1 Development environment: VMware 10 + Ubuntu 12.04 + Hadoop 1.1.2

2 Design idea: split the English document content into words, group identical words together, then count the frequency of each word.
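
This design can be sketched outside Hadoop with plain Java collections. The following is a hypothetical local simulation (the class name LocalWordCount is invented for illustration, not part of the original program): a TreeMap keeps its keys sorted, mirroring the sort the MapReduce framework applies to map output keys before they reach the reducer.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {

    // Simulates map (tokenize) plus shuffle/reduce (group and sum) in memory.
    // TreeMap keeps keys in sorted order, like Hadoop's sort of map output keys.
    static Map<String, Integer> count(String... lines) {
        Map<String, Integer> freq = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);  // split on whitespace
            while (itr.hasMoreTokens()) {
                freq.merge(itr.nextToken(), 1, Integer::sum); // add 1 per occurrence
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        Map<String, Integer> freq = count(
            "Hello Hadoop",
            "I am studying the Hadoop technology",
            "Hello World",
            "The world is very beautiful",
            "I love the Hadoop and world");
        freq.forEach((w, c) -> System.out.println(w + "\t" + c));
    }
}
```

Running this over the two sample files' lines prints each word with its frequency, already in sorted key order.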

Program listing:

package com.wangluqing;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  // Mapper: splits each input line into words and emits a (word, 1) pair per word.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as combiner): sums the counts collected for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
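
Note that job.setCombinerClass(IntSumReducer.class) reuses the reducer as a combiner, which pre-sums counts on each map node before the shuffle. This is safe here because addition is associative and commutative: summing per-split partial sums gives the same result as summing all the 1s at once. A minimal sketch of that reasoning (the class name CombinerDemo is invented for illustration):

```java
import java.util.List;

public class CombinerDemo {

    // Same logic as IntSumReducer: add up all values for one key.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) total += v;
        return total;
    }

    public static void main(String[] args) {
        // "Hadoop" appears twice in file1.txt and once in file2.txt.
        int split1 = sum(List.of(1, 1));           // combiner over file1.txt's map output
        int split2 = sum(List.of(1));              // combiner over file2.txt's map output
        int global = sum(List.of(split1, split2)); // reducer over the combined values
        System.out.println("Hadoop\t" + global);   // prints: Hadoop	3
    }
}
```

Summing the partial sums (2 + 1) equals summing the raw ones (1 + 1 + 1), which is exactly why the same class can serve both roles.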

3 Execution procedures

1) Create an input directory

hadoop fs -mkdir wordcount_input

2) Upload the local English documents

hadoop fs -put /usr/local/datasource/article/* wordcount_input

3) Compile WordCount.java and place the class files in the WordCount directory under the current directory.

root@hadoop:/usr/local/program/hadoop# javac -classpath hadoop-core-1.1.2.jar:lib/commons-cli-1.2.jar -d WordCount WordCount.java

4) Package the compiled classes into a jar

jar -cvf WordCount.jar -C WordCount/ .

5) Run the WordCount program, with input directory wordcount_input and output directory wordcount_output.

hadoop jar WordCount.jar com.wangluqing.WordCount wordcount_input wordcount_output

6) View the word-frequency results

root@hadoop:/usr/local/program/hadoop# hadoop fs -cat wordcount_output/part-r-00000

Hadoop 3
Hello 2
I 2
The 1
World 1
am 1
and 1
beautiful 1
is 1
love 1
studying 1
technology 1
the 2
very 1
world 2
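
The ordering of the output comes from the framework itself: reducer input keys arrive sorted, and Text compares keys byte by byte, so ASCII uppercase letters ('A'-'Z') sort before lowercase ones ('a'-'z'). A small sketch of that ordering (the class name KeyOrderDemo is invented for illustration), using String's natural order, which matches Text's byte order for plain ASCII:

```java
import java.util.Arrays;

public class KeyOrderDemo {
    public static void main(String[] args) {
        // Text compares byte by byte, so 'T' (84) sorts before 'a' (97):
        // uppercase words come first, as in the job's output above.
        String[] keys = {"the", "The", "am", "Hadoop", "world", "World"};
        Arrays.sort(keys);  // String natural order == Text byte order for ASCII
        System.out.println(Arrays.toString(keys));
        // prints: [Hadoop, The, World, am, the, world]
    }
}
```

This also explains why "The" and "the" (and "World" and "world") are counted as distinct words: the comparison is case-sensitive.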

Summary:

WordCount is the simplest and most representative MapReduce program, and to some extent it reflects the original design intent of MapReduce: the analysis of log files.

Resources:

1 http://www.wangluqing.com/2014/03/hadoop-mapreduce-programapp2/

2 Lu Jiaheng, Hadoop in Action, 2nd Edition, Chapter 5: MapReduce application cases
