Summary: A MapReduce program that counts the occurrences of each word in a set of text files.
Keywords: MapReduce, word count
Data source: two manually constructed English documents, file1.txt and file2.txt.
file1.txt content:
Hello Hadoop
I am studying the Hadoop technology
file2.txt content:
Hello World
The world is very beautiful
I love the Hadoop and world
Problem Description:
Count the frequency of each word in the manually constructed English documents, and output the results sorted alphabetically by word.
Solution:
1 Development environment: VMware 10 + Ubuntu 12.04 + Hadoop 1.1.2
2 Design idea: split the document content into individual words (map), bring all occurrences of the same word together (shuffle), and then count the occurrences of each word (reduce). Because the framework sorts the intermediate keys during the shuffle, the final output comes out ordered by word, which satisfies the sorting requirement. A sketch of the data flow for the sample input is shown below.
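The following trace is an illustration only (not part of the program); the exact intermediate groupings depend on the input splits and on whether the combiner runs:

map input:      (0, "Hello Hadoop")
map output:     ("Hello", 1), ("Hadoop", 1)
map input:      (13, "I am studying the Hadoop technology")
map output:     ("I", 1), ("am", 1), ("studying", 1), ("the", 1), ("Hadoop", 1), ("technology", 1)
shuffle/sort:   ("Hadoop", [1, 1, 1]), ("Hello", [1, 1]), ("I", [1, 1]), ("The", [1]), ...
reduce output:  ("Hadoop", 3), ("Hello", 2), ("I", 2), ("The", 1), ...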
Program listing:
package com.wangluqing;

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  // Mapper: splits each input line into words and emits (word, 1) for every word.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as the combiner): sums the counts for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    // Summation is associative and commutative, so the reducer can safely double as a combiner.
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
3 Execution procedures
1) Create an input directory
hadoop fs -mkdir wordcount_input
2) Upload the local English documents
hadoop fs -put /usr/local/datasource/article/* wordcount_input
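Optionally, verify the upload by listing the input directory (assuming the two sample files were stored under /usr/local/datasource/article on the local file system):
hadoop fs -ls wordcount_input
The listing should show file1.txt and file2.txt.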
3) Compile WordCount.java and place the compiled classes in the WordCount directory under the current directory.
root@hadoop:/usr/local/program/hadoop# javac -classpath hadoop-core-1.1.2.jar:lib/commons-cli-1.2.jar -d WordCount WordCount.java
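Because the source declares package com.wangluqing, the -d option creates the corresponding package directories. As a quick check (listing shown for illustration only), the compiled classes should appear as:
ls WordCount/com/wangluqing/
WordCount.class  WordCount$IntSumReducer.class  WordCount$TokenizerMapper.class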
4) Package the compiled classes into a jar file. The class files must sit at the root of the jar so that the fully qualified class name resolves, hence the -C option:
jar -cvf WordCount.jar -C WordCount/ .
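To confirm the jar layout (an optional check), list its contents:
jar -tvf WordCount.jar
The listing should include com/wangluqing/WordCount.class along with the two inner classes.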
5) Run the WordCount program with wordcount_input as the input directory and wordcount_output as the output directory.
hadoop jar WordCount.jar com.wangluqing.WordCount wordcount_input wordcount_output
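Note that the output directory must not exist before the job starts; to re-run the job, remove it first with hadoop fs -rmr wordcount_output. After the job completes, the output directory can be inspected (a typical listing for a single-reducer job):
hadoop fs -ls wordcount_output
It should contain the _SUCCESS marker and the result file part-r-00000.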
6) View the frequency of each word
root@hadoop:/usr/local/program/hadoop# hadoop fs -cat wordcount_output/part-r-00000
Hadoop 3
Hello 2
I 2
The 1
am 1
and 1
beautiful 1
is 1
love 1
studying 1
technology 1
the 2
very 1
world 3
Summary:
WordCount is the simplest and most representative MapReduce program; to a certain extent it reflects the original motivation behind the design of MapReduce, namely the analysis of log files.
Resources:
1 http://www.wangluqing.com/2014/03/hadoop-mapreduce-programapp2/
2 "Hadoop in Action, 2nd Edition" (Lu Jiaheng), Chapter 5: MapReduce application cases