Program source code
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    public static class WordCountMap extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer token = new StringTokenizer(line);
            while (token.hasMoreTokens()) {
                word.set(token.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf);
        job.setJarByClass(WordCount.class);
        job.setJobName("WordCount");

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(WordCountMap.class);
        job.setReducerClass(WordCountReduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }
}
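The mapper's splitting behavior can be checked locally without a cluster. This sketch (not part of the original post; the class name `TokenizeDemo` is made up for illustration) mirrors the mapper's loop to show that StringTokenizer splits on runs of whitespace and never yields empty tokens:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizeDemo {

    // Same loop as in WordCountMap.map(): StringTokenizer with the default
    // delimiters splits on spaces, tabs, and newlines.
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer token = new StringTokenizer(line);
        while (token.hasMoreTokens()) {
            out.add(token.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Double space between words collapses: no empty token is produced.
        System.out.println(tokens("hello world  hello"));
        // prints [hello, world, hello]
    }
}
```

Note that punctuation is not stripped, so "hello," and "hello" count as different words; the job behaves the same way.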
1 Compiling the source code
javac -classpath /opt/hadoop-1.2.1/hadoop-core-1.2.1.jar:/opt/hadoop-1.2.1/lib/commons-cli-1.2.jar -d word_count_class WordCount.java
This compiles the source into class files and places them in the word_count_class directory under the current folder; you need to create that directory first.
2 Packaging the class files into a jar
Enter the word_count_class directory (where the compiled class files are), then run:
jar -cvf wordcount.jar *
3 Uploading input files
Create an input directory for this job in HDFS first:
hadoop fs -mkdir input_wordcount
Upload all text files from the local input directory to the input_wordcount directory in HDFS:
hadoop fs -put input/* input_wordcount/
4 Upload jar and execute
hadoop jar word_count_class/wordcount.jar WordCount input_wordcount output_wordcount
5 Viewing the calculation results
List the program's output directory:
hadoop fs -ls output_wordcount
View the program's output content:
hadoop fs -cat output_wordcount/part-r-00000
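To see what the output file will look like without running a cluster, this local sketch (not from the original post; the two input lines are hypothetical) reproduces the whole map/reduce pipeline in memory and prints lines in the same `key<TAB>value` format that TextOutputFormat writes:

```java
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {
    public static void main(String[] args) {
        // Hypothetical input, standing in for the files in input_wordcount.
        String[] lines = {"hello world", "hello hadoop"};

        // TreeMap keeps keys sorted, matching the sorted order of reducer output.
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // Map phase: tokenize each line, as in WordCountMap.
            StringTokenizer token = new StringTokenizer(line);
            while (token.hasMoreTokens()) {
                // Reduce phase collapsed into a merge: sum the 1s per word.
                counts.merge(token.nextToken(), 1, Integer::sum);
            }
        }
        // TextOutputFormat writes one "key<TAB>value" line per pair.
        counts.forEach((w, c) -> System.out.println(w + "\t" + c));
    }
}
```

For this input the printed lines are `hadoop 1`, `hello 2`, `world 1` (tab-separated), which is the shape you should expect in part-r-00000.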
This completes using Hadoop to count the number of occurrences of each word across multiple text files.