Use Hadoop to count the number of occurrences of each word across multiple text files

Source: Internet
Author: User
Tags: hadoop, fs

Program source code

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    public static class WordCountMap extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer token = new StringTokenizer(line);
            while (token.hasMoreTokens()) {
                word.set(token.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf);
        job.setJarByClass(WordCount.class);
        job.setJobName("WordCount");

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(WordCountMap.class);
        job.setReducerClass(WordCountReduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }
}

1 Compile the source code

javac -classpath /opt/hadoop-1.2.1/hadoop-core-1.2.1.jar:/opt/hadoop-1.2.1/lib/commons-cli-1.2.jar -d ./word_count_class WordCount.java
This compiles the source into class files and places them in the word_count_class directory under the current folder; you need to create that directory first.
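
A minimal sketch of the whole compile step, assuming the source above is saved as WordCount.java in the current folder:

mkdir word_count_class
javac -classpath /opt/hadoop-1.2.1/hadoop-core-1.2.1.jar:/opt/hadoop-1.2.1/lib/commons-cli-1.2.jar -d ./word_count_class WordCount.java
ls word_count_class

With the class names used above, ls should show WordCount.class together with the nested classes WordCount$WordCountMap.class and WordCount$WordCountReduce.class.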

2 Package the compiled class files into a jar

Enter the word_count_class directory that holds the compiled class files

jar -cvf wordcount.jar *
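
The full packaging step might look like the sketch below; jar -tf simply lists the archive contents so you can confirm the class files made it in:

cd word_count_class
jar -cvf wordcount.jar *
jar -tf wordcount.jar
cd ..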

3 Upload the input files

First, create an input directory in HDFS for this job

hadoop fs -mkdir input_wordcount

Then upload all text files from the local input directory to the input_wordcount directory in HDFS

hadoop fs -put input/* input_wordcount/
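
To confirm the upload before running the job, you can list the HDFS directory:

hadoop fs -ls input_wordcount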

4 Submit the jar and execute the job

hadoop jar word_count_class/wordcount.jar WordCount input_wordcount output_wordcount
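
Note that the job will fail if output_wordcount already exists in HDFS, so remove any leftover directory from a previous run before resubmitting (the -rmr form shown here applies to Hadoop 1.x, which this example uses):

hadoop fs -rmr output_wordcount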

5 View the calculation results

Program output directory

hadoop fs -ls output_wordcount

Program output content

hadoop fs -cat output_wordcount/part-r-00000
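
Each line of part-r-00000 is a word, a tab, and that word's total count across all input files. The words and counts below are only a hypothetical illustration of the format, not actual output:

hadoop	3
hello	2
world	5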
