The Hadoop MapReduce WordCount Program


Requirement: count the number of occurrences of each word in a file.

Sample input: the file word.log contains the line "Hadoop hive hbase Hadoop hive".

Expected output:

Hadoop 2
hive 2
hbase 1

MapReduce Design Method:

First, the <k,v> key-value pair design for the map phase:

1. The text file is cut into <K1,V1> pairs, where K1 is the position (byte offset) of the line in the file and V1 is one line of data. The map() method is called once for each <K1,V1> pair.

2. Inside the map() method, each line is split further into <K2,V2> pairs, where K2 is a single word from the split and V2 is that word's count, which is always 1 at this stage.

Second, the <k,v> key-value pair design for the reduce phase:

3. Between map and reduce, the pairs pass through a series of steps such as combine, partition, and shuffle, which produce the <K3,V3> pairs that reach reduce. K3 is a key after all identical keys have been merged, and V3 is the list of values belonging to that key (all 1 here). The reduce() method is called once for each <K3,V3> pair.

4. The reducer totals the counts and emits <K4,V4> pairs, where K4 is a word and V4 is that word's total count. A worked trace of all four stages follows this list.
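Putting the four steps together on the sample line "Hadoop hive hbase Hadoop hive" (assuming the line starts at byte offset 0), the pairs at each stage would look like this:

<K1,V1>: <0, "Hadoop hive hbase Hadoop hive">
<K2,V2>: <Hadoop,1>, <hive,1>, <hbase,1>, <Hadoop,1>, <hive,1>
<K3,V3>: <Hadoop,[1,1]>, <hbase,[1]>, <hive,[1,1]>
<K4,V4>: <Hadoop,2>, <hbase,1>, <hive,2>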

Program implementation:

WordCountMapper class

package com.cn;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line into words on whitespace.
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            // Emit <word, 1> for every word in the line.
            context.write(word, one);
        }
    }
}
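To check the mapper's <K2,V2> output in isolation, a minimal unit-test sketch is shown below. It assumes the MRUnit test library and JUnit are on the classpath (neither is part of the article's code), and the test class name is hypothetical:

package com.cn;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

public class WordCountMapperTest {

    @Test
    public void mapperEmitsOneForEachWord() throws Exception {
        // MRUnit drives the mapper with one input record and checks the emitted pairs.
        MapDriver.newMapDriver(new WordCountMapper())
                 .withInput(new LongWritable(0), new Text("Hadoop hive hbase Hadoop hive"))
                 .withOutput(new Text("Hadoop"), new IntWritable(1))
                 .withOutput(new Text("hive"), new IntWritable(1))
                 .withOutput(new Text("hbase"), new IntWritable(1))
                 .withOutput(new Text("Hadoop"), new IntWritable(1))
                 .withOutput(new Text("hive"), new IntWritable(1))
                 .runTest();
    }
}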

WordCountReducer class

package com.cn;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the 1s collected for this word.
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        // Emit <word, total count>.
        context.write(key, result);
    }
}
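A matching sketch for the reducer (same assumptions about MRUnit and JUnit; the class name is again hypothetical) confirms that a <K3,V3> value list is summed into a single count:

package com.cn;

import java.util.Arrays;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Test;

public class WordCountReducerTest {

    @Test
    public void reducerSumsValuesPerKey() throws Exception {
        // After shuffle, "Hadoop" arrives with the value list [1, 1].
        ReduceDriver.newReduceDriver(new WordCountReducer())
                    .withInput(new Text("Hadoop"),
                               Arrays.asList(new IntWritable(1), new IntWritable(1)))
                    .withOutput(new Text("Hadoop"), new IntWritable(2))
                    .runTest();
    }
}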

WordCount class

package com.cn;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }

        /** Create a job and give it a name so the task can be tracked. */
        Job job = new Job(conf, "word count");

        /** When running on a Hadoop cluster, the code must be packaged into a jar file
            (Hadoop distributes it across the cluster); setJarByClass tells Hadoop which
            class to use to locate that jar. */
        job.setJarByClass(WordCount.class);

        /** Set the mapper, combiner, and reducer classes to use. */
        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);

        /** No input format is set because the default TextInputFormat is used: it cuts
            the text file into InputSplits, and LineRecordReader parses each split into
            <key, value> pairs, where key is the line's position (offset) in the file
            and value is the line itself. */

        /** Set the output key and value types of the map and reduce functions. */
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        /** Set the input and output paths. */
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

        /** Submit the job and wait for it to complete. */
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
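One caveat: the Job constructor used above is deprecated as of Hadoop 2.x. A sketch of the equivalent setup with the newer factory method (assuming a Hadoop 2.x client library) would be:

// Hadoop 2.x style: Job.getInstance replaces the deprecated Job(conf, name) constructor.
Job job = Job.getInstance(conf, "word count");
job.setJarByClass(WordCount.class);
// ... the remaining mapper/combiner/reducer and path settings are unchanged.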

Hadoop commands to run the job:

hadoop fs -mkdir /tmp/input

hadoop fs -put /tmp/log/word.log /tmp/input/

hadoop jar /tmp/hadoop/wordcount.jar /tmp/input /tmp/output
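If the jar's manifest does not declare a main class, the driver class has to be named explicitly on the command line (com.cn.WordCount here, matching the package above):

hadoop jar /tmp/hadoop/wordcount.jar com.cn.WordCount /tmp/input /tmp/output

Once the job finishes, the result can be inspected with hadoop fs -cat, assuming the default reduce output file name part-r-00000:

hadoop fs -cat /tmp/output/part-r-00000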

Record every step of the learning process, and try to analyze how the job actually runs.
