1. WordCount source
Place the source file WordCount.java in the hadoop-2.6.0 folder. The file name must match the public class name, WordCount.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every whitespace-separated token in a line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as the combiner): sums the counts for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: args[0] is the input path, args[1] is the output path.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

2. Compiling the source code
$ bin/hadoop com.sun.tools.javac.Main WordCount.java   # compile WordCount.java into three .class files
$ jar cf wc.jar WordCount*.class   # package the three .class files into a jar file
3. Running the job
Create a new input folder to hold the text files whose words will be counted.
cd /opt/hadoop-2.6.0
mkdir input
Copy the .txt files in the hadoop-2.6.0 folder into the input folder.
cp *.txt /opt/hadoop-2.6.0/input
Run the job. The first path is the input folder and the second is the output folder.
bin/hadoop jar wc.jar WordCount /opt/hadoop-2.6.0/input /opt/hadoop-2.6.0/output   # the output folder is created automatically and stores the word count results
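To make it easier to see what the job computes, here is a minimal sketch in plain Java, without any Hadoop classes, that mimics the two steps on a few in-memory lines. The class name LocalWordCount and the sample sentences are made up for illustration; the real job distributes the same logic across mappers and reducers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop: a single-process imitation of WordCount.
public class LocalWordCount {

    // Splits each line on whitespace (as TokenizerMapper does) and
    // adds 1 per occurrence of a word (as IntSumReducer does).
    static Map<String, Integer> count(Iterable<String> lines) {
        // TreeMap keeps the words sorted, which makes the printout easy to scan.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                // "map" emits (word, 1); "reduce" sums the ones per word.
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample input, invented for this sketch.
        List<String> lines = Arrays.asList("hello hadoop", "hello world");
        for (Map.Entry<String, Integer> e : count(lines).entrySet()) {
            // Same word<TAB>count layout that the job writes to its output file.
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Because tokens are split on whitespace only, punctuation stays attached to words and case is preserved, so "Hello" and "hello," are counted as different words. The Hadoop job behaves the same way.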
4. View Results
bin/hdfs dfs -cat /opt/hadoop-2.6.0/output/part-r-00000
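Each line of part-r-00000 holds a word and its count separated by a tab, which is the default separator of the text output format. If you want to process the result further, a small parser along the following lines may help. This is only a sketch: the class name ParseWordCountOutput and the sample lines are hypothetical, and it assumes you have saved the output to a local file whose path is passed as the first argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for reading WordCount output lines of the form word<TAB>count.
public class ParseWordCountOutput {

    static Map<String, Integer> parse(List<String> lines) {
        // LinkedHashMap preserves the order in which the lines appear in the file.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // skip blank lines and anything not in word<TAB>count form
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // With a file argument, read that file; otherwise fall back to invented sample lines.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList("hadoop\t1", "hello\t2", "world\t1");
        parse(lines).forEach((word, count) -> System.out.println(word + " -> " + count));
    }
}
```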
At this point the WordCount job has run successfully, which confirms that the Hadoop environment has been set up correctly.