Hadoop MapReduce Basic Example: Word Count

Source: Internet
Author: User
Tags: hadoop mapreduce hadoop fs

This example implements a simple word count in MapReduce.

One, preparation: install the Hadoop plugin for Eclipse:

Download the matching version of hadoop-eclipse-plugin-2.2.0.jar and copy it into Eclipse's plugins directory.

Two, implementation:

Create a new MapReduce project.

The map phase splits each line into words; the reduce phase sums the counts.

package tank.demo;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * @author Tank
 * @date January 5, 2015, 10:03:43 AM
 * @description Word count
 * @version 0.1
 */
public class WordCount {

    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split the line into whitespace-delimited tokens and emit (word, 1) for each.
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum all the 1s emitted for this word and emit (word, total).
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "word count");
        // Main class
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        // Map output types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Final output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
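An optional tweak, not in the listing above: Hadoop can run a combiner on the map side to pre-aggregate counts before the shuffle. Because summing is associative, IntSumReducer can double as the combiner; a sketch of the one extra line you would add in main() (the "Combine input records=0" counter in the log below shows the original run did not use one):

// Pre-aggregate counts on each mapper to cut shuffle traffic.
// IntSumReducer is reusable here because its input and output types
// both match the map output types (Text, IntWritable).
job.setCombinerClass(IntSumReducer.class);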

Package it as word-count.jar.
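If you prefer the command line to Eclipse's Export wizard, the jar can be built directly; the bin directory below is an assumption (Eclipse's default output folder for compiled classes):

jar cf word-count.jar -C bin .    # bundle the compiled .class files into the jar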

Three, prepare the input data

hadoop fs -mkdir /user/hadoop/input    # create the input directory

Write some data files.

echo Hello my Hadoop This is my first application > file1

echo Hello world my deer my applicaiton > file2

(The applicaiton typo is part of the original test data, so it shows up as its own word in the results.)

Copy them to HDFS:

hadoop fs -put file* /user/hadoop/input

hadoop fs -ls /user/hadoop/input    # verify the upload

Four, run

Upload the jar to the cluster environment and run it:

hadoop jar word-count.jar tank.demo.WordCount /user/hadoop/input /user/hadoop/output
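Note that FileOutputFormat refuses to start a job whose output directory already exists, so clear it before rerunning:

hadoop fs -rm -r /user/hadoop/output    # remove output from a previous run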

An excerpt of the output:

15/01/05 11:14:36 INFO mapred.Task: Task:attempt_local1938802295_0001_r_000000_0 is done. And is in the process of committing
15/01/05 11:14:36 INFO mapred.LocalJobRunner:
15/01/05 11:14:36 INFO mapred.Task: Task attempt_local1938802295_0001_r_000000_0 is allowed to commit now
15/01/05 11:14:36 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1938802295_0001_r_000000_0' to hdfs://192.168.183.130:9000/user/hadoop/output/_temporary/0/task_local1938802295_0001_r_000000
15/01/05 11:14:36 INFO mapred.LocalJobRunner: reduce > reduce
15/01/05 11:14:36 INFO mapred.Task: Task 'attempt_local1938802295_0001_r_000000_0' done.
15/01/05 11:14:36 INFO mapreduce.Job: Job job_local1938802295_0001 running in uber mode : false
15/01/05 11:14:36 INFO mapreduce.Job:  map 100% reduce 100%
15/01/05 11:14:36 INFO mapreduce.Job: Job job_local1938802295_0001 completed successfully
15/01/05 11:14:36 INFO mapreduce.Job: Counters: 32
15/01/05 11:14:36 INFO MapReduce. Job:counters:32
	File System Counters
		FILE: Number of bytes read=17706
		FILE: Number of bytes written=597506
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=205
		HDFS: Number of bytes written=85
		HDFS: Number of read operations=25
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=5
	Map-Reduce Framework
		Map input records=2
		Map output records=14
		Map output bytes=136
		Map output materialized bytes=176
		Input split bytes=232
		Combine input records=0
		Combine output records=0
		Reduce input groups=10
		Reduce shuffle bytes=0
		Reduce input records=14
		Reduce output records=10
		Spilled Records=28
		Shuffled Maps =0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=67
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
		Total committed heap usage (bytes)=456536064
	File Input Format Counters
		Bytes Read=80
	File Output Format Counters
		Bytes Written=85

View the files in the output directory:

$ hadoop fs -cat /user/hadoop/output/part-r-00000
Hadoop	1
Hello	2
This	1
applicaiton	1
application	1
deer	1
first	1
is	1
my	4
world	1

Each word has been counted correctly!
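As a quick sanity check independent of Hadoop, the same counts can be reproduced locally with standard shell tools (assuming file1 and file2 are still in the working directory):

cat file1 file2 | tr -s ' ' '\n' | sort | uniq -c    # word counts, computed locally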

