This example implements a simple word-count program with MapReduce.
1. Preparation: install the Hadoop plugin for Eclipse
Download the matching version of hadoop-eclipse-plugin-2.2.0.jar into Eclipse's plugins directory.
2. Implementation
Create a new MapReduce project.
The map step splits each line into words; the reduce step sums the counts.
package tank.demo;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * @author Tank
 * @date January 5, 2015, 10:03:43 AM
 * @description Word count
 * @version 0.1
 */
public class WordCount {

    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one); // emit (word, 1) for every token
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get(); // sum the 1s emitted for this word
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "word count");
        // main class
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        // map output format
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // output format
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Package the project as word-count.jar.
3. Prepare the input data
hadoop fs -mkdir /user/hadoop/input    // create the input directory
Write some data files.
echo "Hello my Hadoop This is my first application" > file1
echo "Hello world my deer my applicaiton" > file2
(Note the "applicaiton" typo in file2 — it will show up as a separate word in the results.)
Copy them to HDFS:
hadoop fs -put file* /user/hadoop/input
hadoop fs -ls /user/hadoop/input    // verify
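Before running the job on the cluster, the expected result can be predicted with a plain-Java sketch of the same tokenize-and-sum logic. This is only a local illustration; the class and method names here are hypothetical and not part of the Hadoop job:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// A local, Hadoop-free sketch of the job's tokenize-and-sum logic.
public class LocalWordCount {

    public static Map<String, Integer> count(String[] lines) {
        // TreeMap keeps keys sorted, mirroring the sorted reducer output
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                // the equivalent of map() emitting (word, 1) and reduce() summing
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = {
            "Hello my Hadoop This is my first application",
            "Hello world my deer my applicaiton"
        };
        for (Map.Entry<String, Integer> e : count(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Running it on the two sample lines yields the same word/count pairs the job should produce (e.g. Hello 2, my 4).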
4. Run
Upload the jar to the cluster environment and run the job:
hadoop jar word-count.jar tank.demo.WordCount input output
An excerpt of the output log:
15/01/05 11:14:36 INFO mapred.Task: Task:attempt_local1938802295_0001_r_000000_0 is done. And is in the process of committing
15/01/05 11:14:36 INFO mapred.LocalJobRunner:
15/01/05 11:14:36 INFO mapred.Task: Task attempt_local1938802295_0001_r_000000_0 is allowed to commit now
15/01/05 11:14:36 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1938802295_0001_r_000000_0' to hdfs://192.168.183.130:9000/user/hadoop/output/_temporary/0/task_local1938802295_0001_r_000000
15/01/05 11:14:36 INFO mapred.LocalJobRunner: reduce > reduce
15/01/05 11:14:36 INFO mapred.Task: Task 'attempt_local1938802295_0001_r_000000_0' done.
15/01/05 11:14:36 INFO mapreduce.Job: Job job_local1938802295_0001 running in uber mode : false
15/01/05 11:14:36 INFO mapreduce.Job:  map 100% reduce 100%
15/01/05 11:14:36 INFO mapreduce.Job: Job job_local1938802295_0001 completed successfully
15/01/05 11:14:36 INFO mapreduce.Job: Counters: 32
    File System Counters
        FILE: Number of bytes read=17706
        FILE: Number of bytes written=597506
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=205
        HDFS: Number of bytes written=85
        HDFS: Number of read operations=25
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=5
    Map-Reduce Framework
        Map input records=2
        Map output records=14
        Map output bytes=136
        Map output materialized bytes=176
        Input split bytes=232
        Combine input records=0
        Combine output records=0
        Reduce input groups=10
        Reduce shuffle bytes=0
        Reduce input records=14
        Reduce output records=10
        Spilled Records=28
        Shuffled Maps =0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=67
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=456536064
    File Input Format Counters
        Bytes Read=80
    File Output Format Counters
        Bytes Written=85
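The counters above can be sanity-checked against the sample input: "Map output records" is the total number of tokens the mapper emits, and "Reduce input groups" is the number of distinct words. A small sketch (class and method names are illustrative only) confirms the arithmetic:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;

public class CounterCheck {

    // Returns { total tokens, distinct words } for the given lines:
    // total tokens corresponds to "Map output records",
    // distinct words corresponds to "Reduce input groups".
    public static int[] tokenStats(String[] lines) {
        int total = 0;
        Set<String> distinct = new HashSet<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                distinct.add(itr.nextToken());
                total++;
            }
        }
        return new int[] { total, distinct.size() };
    }

    public static void main(String[] args) {
        int[] stats = tokenStats(new String[] {
            "Hello my Hadoop This is my first application",
            "Hello world my deer my applicaiton"
        });
        System.out.println("Map output records = " + stats[0]);   // 14 tokens
        System.out.println("Reduce input groups = " + stats[1]);  // 10 distinct words
    }
}
```

This matches the logged values: 14 map output records and 10 reduce input groups.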
View the files in the output directory
hadoop fs -cat /user/hadoop/output/part-r-00000
Hadoop	1
Hello	2
This	1
applicaiton	1
application	1
deer	1
first	1
is	1
my	4
world	1
The words have been counted correctly!