Hadoop MapReduce Basic Example: Word Count

Source: Internet
Author: User
Tags: hadoop mapreduce hadoop fs

This example implements a simple word count in MapReduce.

One, preparation: install the Hadoop plugin for Eclipse:

Download the matching version of hadoop-eclipse-plugin-2.2.0.jar and copy it into Eclipse's plugins directory.

Two, implementation:

Create a new MapReduce project.

The map phase splits each line into words; the reduce phase sums the counts.

package tank.demo;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * @author Tank
 * @date January 5, 2015, 10:03:43 AM
 * @description Word count
 * @version 0.1
 */
public class WordCount {

    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split the line into whitespace-delimited tokens and emit (word, 1) for each.
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum all the 1s emitted for this word and emit (word, total).
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "word count");
        // Main class
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        // Map output types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Final output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
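An optional tweak, not in the listing above: Hadoop can run a combiner on the map side to pre-aggregate counts before the shuffle. Because summing is associative, IntSumReducer can double as the combiner; a sketch of the one extra line you would add in main() (the "Combine input records=0" counter in the log below shows the original run did not use one):

// Pre-aggregate counts on each mapper to cut shuffle traffic.
// IntSumReducer is reusable here because its input and output types
// both match the map output types (Text, IntWritable).
job.setCombinerClass(IntSumReducer.class);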

Package it as word-count.jar.
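If you prefer the command line to Eclipse's Export wizard, the jar can be built directly; the bin directory below is an assumption (Eclipse's default output folder for compiled classes):

jar cf word-count.jar -C bin .    # bundle the compiled .class files into the jar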

Three, prepare the input data

hadoop fs -mkdir /user/hadoop/input    # create the input directory

Write some data files.

echo Hello my Hadoop This is my first application > file1

echo Hello world my deer my applicaiton > file2

(The applicaiton typo is part of the original test data, so it shows up as its own word in the results.)

Copy them to HDFS:

hadoop fs -put file* /user/hadoop/input

hadoop fs -ls /user/hadoop/input    # verify the upload

Four, run

Upload the jar to the cluster environment and run it:

hadoop jar word-count.jar tank.demo.WordCount /user/hadoop/input /user/hadoop/output
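Note that FileOutputFormat refuses to start a job whose output directory already exists, so clear it before rerunning:

hadoop fs -rm -r /user/hadoop/output    # remove output from a previous run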

An excerpt of the output:

15/01/05 11:14:36 INFO mapred.Task: Task:attempt_local1938802295_0001_r_000000_0 is done. And is in the process of committing
15/01/05 11:14:36 INFO mapred.LocalJobRunner:
15/01/05 11:14:36 INFO mapred.Task: Task attempt_local1938802295_0001_r_000000_0 is allowed to commit now
15/01/05 11:14:36 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1938802295_0001_r_000000_0' to hdfs://192.168.183.130:9000/user/hadoop/output/_temporary/0/task_local1938802295_0001_r_000000
15/01/05 11:14:36 INFO mapred.LocalJobRunner: reduce > reduce
15/01/05 11:14:36 INFO mapred.Task: Task 'attempt_local1938802295_0001_r_000000_0' done.
15/01/05 11:14:36 INFO mapreduce.Job: Job job_local1938802295_0001 running in uber mode : false
15/01/05 11:14:36 INFO mapreduce.Job:  map 100% reduce 100%
15/01/05 11:14:36 INFO mapreduce.Job: Job job_local1938802295_0001 completed successfully
15/01/05 11:14:36 INFO mapreduce.Job: Counters: 32
15/01/05 11:14:36 INFO MapReduce. Job:counters:32
	File System Counters
		FILE: Number of bytes read=17706
		FILE: Number of bytes written=597506
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=205
		HDFS: Number of bytes written=85
		HDFS: Number of read operations=25
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=5
	Map-Reduce Framework
		Map input records=2
		Map output records=14
		Map output bytes=136
		Map output materialized bytes=176
		Input split bytes=232
		Combine input records=0
		Combine output records=0
		Reduce input groups=10
		Reduce shuffle bytes=0
		Reduce input records=14
		Reduce output records=10
		Spilled Records=28
		Shuffled Maps =0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=67
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
		Total committed heap usage (bytes)=456536064
	File Input Format Counters
		Bytes Read=80
	File Output Format Counters
		Bytes Written=85

View the files in the output directory:

$ hadoop fs -cat /user/hadoop/output/part-r-00000
Hadoop	1
Hello	2
This	1
applicaiton	1
application	1
deer	1
first	1
is	1
my	4
world	1

Each word has been counted correctly!
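As a quick sanity check independent of Hadoop, the same counts can be reproduced locally with standard shell tools (assuming file1 and file2 are still in the working directory):

cat file1 file2 | tr -s ' ' '\n' | sort | uniq -c    # word counts, computed locally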

