Requirements: count the number of occurrences of each word in a file.
Sample input: the file Word.log contains "Hadoop hive hbase Hadoop hive".
Expected output (word counting is case-sensitive, and results are sorted by key):
Hadoop 2
hbase 1
hive 2
MapReduce design:
First, the <k,v> key-value pair design for the map phase:
1. The text file is split into <k1,v1> pairs, where k1 is the offset of the line in the file and v1 is the content of that line. The map() method is called once for each <k1,v1> pair.
2. Inside map(), each line is split into <k2,v2> pairs, where k2 is a single word and v2 is its count, which at this stage is always 1.
Second, the <k,v> key-value pair design for the reduce phase:
3. Between map and reduce, the pairs go through a series of steps such as combine, partition, and shuffle. The pairs arriving at reduce are <k3,v3>, where k3 is a distinct key (identical keys have been merged together) and v3 is the list of values emitted for that key, here all 1s. The reduce() method is called once for each <k3,v3> pair.
4. Reduce sums the counts and outputs <k4,v4> pairs, where k4 is a word and v4 is the total number of occurrences of that word.
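The four steps above can be traced end to end without a cluster. The following sketch (a plain Java simulation, not Hadoop API code; the class name KvFlowDemo is made up for illustration) runs the sample line through map, shuffle, and reduce in memory to show each key-value transition:

```java
import java.util.*;

public class KvFlowDemo {
    public static void main(String[] args) {
        // v1: one line of the input file (k1, the offset, is irrelevant here).
        String line = "Hadoop hive hbase Hadoop hive";

        // Map: emit one <k2,v2> = <word, 1> pair per token.
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String w : line.split("\\s+")) {
            mapped.add(new AbstractMap.SimpleEntry<>(w, 1));
        }

        // Shuffle: group values by key, producing <k3, list<v3>> pairs.
        // TreeMap also sorts keys, as the shuffle phase does.
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> e : mapped) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }

        // Reduce: sum each key's value list, producing <k4,v4> pairs.
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) sum += v;
            System.out.println(e.getKey() + "\t" + sum);
        }
    }
}
```

Because TreeMap orders keys lexicographically (uppercase before lowercase), the output is Hadoop 2, hbase 1, hive 2.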
Program implementation:
WordCountMapper class:

package com.cn;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line into words and emit <word, 1> for each one.
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}
WordCountReducer class:

package com.cn;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum all the counts emitted for this word.
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
WordCount class:

package com.cn;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }

        /* Create a job and name it so the task can be tracked. */
        Job job = new Job(conf, "word count");

        /* When running on a Hadoop cluster, the code must be packaged into a jar
         * file (Hadoop distributes it across the cluster). setJarByClass tells
         * Hadoop which class to use to locate that jar. */
        job.setJarByClass(WordCount.class);

        /* Set the mapper, combiner, and reducer classes. */
        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);

        /* No input format is set because the default TextInputFormat is used:
         * it splits the text file into InputSplits, and LineRecordReader parses
         * each split into <key,value> pairs where key is the offset of the line
         * in the file and value is the line itself. */

        /* Set the output key and value types for map and reduce. */
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        /* Set the input and output paths. */
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

        /* Submit the job and wait for it to complete. */
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
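Note that the driver reuses WordCountReducer as the combiner. This is valid only because summation is associative and commutative: pre-summing partial counts on the map side and then summing those partial sums gives the same total as summing everything at once. The following sketch (plain Java, with a made-up class name CombinerDemo) illustrates the equivalence:

```java
import java.util.*;

public class CombinerDemo {
    // Sum a list of counts, as the reducer does.
    static int sum(List<Integer> vals) {
        int s = 0;
        for (int v : vals) s += v;
        return s;
    }

    public static void main(String[] args) {
        // Counts for one word emitted by two different map tasks.
        List<Integer> split1 = Arrays.asList(1, 1, 1);
        List<Integer> split2 = Arrays.asList(1, 1);

        // Without a combiner: reduce sees all five 1s directly.
        List<Integer> all = new ArrayList<>(split1);
        all.addAll(split2);
        int direct = sum(all);

        // With the reducer as combiner: each map side pre-sums its counts,
        // and reduce then sums the partial results.
        int combined = sum(Arrays.asList(sum(split1), sum(split2)));

        System.out.println(direct + " " + combined);
    }
}
```

Both paths print 5, which is why the combiner can safely reduce the amount of data shuffled across the network. An average, by contrast, could not be plugged in this way without tracking counts separately.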
Hadoop commands to use:
hadoop fs -mkdir /tmp/input
hadoop fs -put /tmp/log/Word.log /tmp/input/
hadoop jar /tmp/hadoop/wordcount.jar com.cn.WordCount /tmp/input /tmp/output
Keep notes on each step of the learning process, and try to analyze how the job actually runs.