The Hadoop MapReduce WordCount Program


Requirement: count the number of occurrences of each word in a file.

Sample input: the file word.log contains the line "Hadoop hive hbase Hadoop hive".

Expected output:

Hadoop 2
hive 2
hbase 1

MapReduce Design Method:

First, the <k,v> key-value pair design for the map phase:

1. The text file is cut into <K1,V1> pairs, where K1 is the position (byte offset) of the line in the file and V1 is one line of data. The map() method is called once for each <K1,V1> pair.

2. Inside the map() method, each line is split further into <K2,V2> pairs, where K2 is a single word from the split and V2 is that word's count, which is always 1 at this stage.

Second, the <k,v> key-value pair design for the reduce phase:

3. Between map and reduce, the pairs pass through a series of steps such as combine, partition, and shuffle, which produce the <K3,V3> pairs that reach reduce. K3 is a key after all identical keys have been merged, and V3 is the list of values belonging to that key (all 1 here). The reduce() method is called once for each <K3,V3> pair.

4. The reducer totals the counts and emits <K4,V4> pairs, where K4 is a word and V4 is that word's total count. A worked trace of all four stages follows this list.
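Putting the four steps together on the sample line "Hadoop hive hbase Hadoop hive" (assuming the line starts at byte offset 0), the pairs at each stage would look like this:

<K1,V1>: <0, "Hadoop hive hbase Hadoop hive">
<K2,V2>: <Hadoop,1>, <hive,1>, <hbase,1>, <Hadoop,1>, <hive,1>
<K3,V3>: <Hadoop,[1,1]>, <hbase,[1]>, <hive,[1,1]>
<K4,V4>: <Hadoop,2>, <hbase,1>, <hive,2>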

Program implementation:

WordCountMapper class

package com.cn;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line into words on whitespace.
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            // Emit <word, 1> for every word in the line.
            context.write(word, one);
        }
    }
}
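To check the mapper's <K2,V2> output in isolation, a minimal unit-test sketch is shown below. It assumes the MRUnit test library and JUnit are on the classpath (neither is part of the article's code), and the test class name is hypothetical:

package com.cn;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

public class WordCountMapperTest {

    @Test
    public void mapperEmitsOneForEachWord() throws Exception {
        // MRUnit drives the mapper with one input record and checks the emitted pairs.
        MapDriver.newMapDriver(new WordCountMapper())
                 .withInput(new LongWritable(0), new Text("Hadoop hive hbase Hadoop hive"))
                 .withOutput(new Text("Hadoop"), new IntWritable(1))
                 .withOutput(new Text("hive"), new IntWritable(1))
                 .withOutput(new Text("hbase"), new IntWritable(1))
                 .withOutput(new Text("Hadoop"), new IntWritable(1))
                 .withOutput(new Text("hive"), new IntWritable(1))
                 .runTest();
    }
}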

WordCountReducer class

package com.cn;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the 1s collected for this word.
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        // Emit <word, total count>.
        context.write(key, result);
    }
}
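A matching sketch for the reducer (same assumptions about MRUnit and JUnit; the class name is again hypothetical) confirms that a <K3,V3> value list is summed into a single count:

package com.cn;

import java.util.Arrays;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Test;

public class WordCountReducerTest {

    @Test
    public void reducerSumsValuesPerKey() throws Exception {
        // After shuffle, "Hadoop" arrives with the value list [1, 1].
        ReduceDriver.newReduceDriver(new WordCountReducer())
                    .withInput(new Text("Hadoop"),
                               Arrays.asList(new IntWritable(1), new IntWritable(1)))
                    .withOutput(new Text("Hadoop"), new IntWritable(2))
                    .runTest();
    }
}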

WordCount class

package com.cn;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }

        /** Create a job and give it a name so the task can be tracked. */
        Job job = new Job(conf, "word count");

        /** When running on a Hadoop cluster, the code must be packaged into a jar file
            (Hadoop distributes it across the cluster); setJarByClass tells Hadoop which
            class to use to locate that jar. */
        job.setJarByClass(WordCount.class);

        /** Set the mapper, combiner, and reducer classes to use. */
        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);

        /** No input format is set because the default TextInputFormat is used: it cuts
            the text file into InputSplits, and LineRecordReader parses each split into
            <key, value> pairs, where key is the line's position (offset) in the file
            and value is the line itself. */

        /** Set the output key and value types of the map and reduce functions. */
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        /** Set the input and output paths. */
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

        /** Submit the job and wait for it to complete. */
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
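One caveat: the Job constructor used above is deprecated as of Hadoop 2.x. A sketch of the equivalent setup with the newer factory method (assuming a Hadoop 2.x client library) would be:

// Hadoop 2.x style: Job.getInstance replaces the deprecated Job(conf, name) constructor.
Job job = Job.getInstance(conf, "word count");
job.setJarByClass(WordCount.class);
// ... the remaining mapper/combiner/reducer and path settings are unchanged.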

Hadoop commands to run the job:

hadoop fs -mkdir /tmp/input

hadoop fs -put /tmp/log/word.log /tmp/input/

hadoop jar /tmp/hadoop/wordcount.jar /tmp/input /tmp/output
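If the jar's manifest does not declare a main class, the driver class has to be named explicitly on the command line (com.cn.WordCount here, matching the package above):

hadoop jar /tmp/hadoop/wordcount.jar com.cn.WordCount /tmp/input /tmp/output

Once the job finishes, the result can be inspected with hadoop fs -cat, assuming the default reduce output file name part-r-00000:

hadoop fs -cat /tmp/output/part-r-00000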

Record every step of the learning process, and try to analyze how the job actually runs.
