MapReduce version: before 0.2.0
Description
These notes are based on an article I found while studying earlier. Now that I have gotten started with MapReduce, I have only revised it and added my own comments.
Because of the version difference, the code was not run in a cluster environment; it serves only as a reference for understanding MapReduce.
Remember that this code targets the 0.2.0-era API. Keep it clearly distinct from later versions!
Body:
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount {

    // The Map class extends MapReduceBase and implements the Mapper interface, which is a generic type.
    // It has four type parameters, which specify the map's input key type, input value type,
    // output key type, and output value type, respectively.
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Implement the map method to process each input value.
        // Here it splits the line on whitespace and emits <word, 1> for every token.
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    /*
     * The Reduce class also extends MapReduceBase and implements the Reducer interface.
     * Reduce takes the output of the map as its input, so its input type is <Text, IntWritable>.
     * The output of reduce is a word and its count, so its output type is also <Text, IntWritable>.
     * The Reduce class implements the reduce method, which uses the input key as the output key
     * and adds up all the values received for that key to produce the output value.
     */
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        // Step 1: initialize a MapReduce job with the JobConf class.
        JobConf conf = new JobConf(WordCount.class);
        // Call setJobName() to name the job.
        conf.setJobName("WordCount");

        // Step 2: set the key and value types of the job's output <key, value>.
        // The result is <word, count>, so the key is set to Text,
        // which is the counterpart of Java's String type.
        conf.setOutputKeyClass(Text.class);
        // The value is set to IntWritable, which is the counterpart of Java's int type.
        conf.setOutputValueClass(IntWritable.class);

        // Step 3: specify the job's mapper, combiner, and reducer classes.
        // Set the mapper (split) for the job.
        conf.setMapperClass(Map.class);
        // Set the combiner (intermediate result merging). The Reduce class is reused here to merge
        // the intermediate results produced by each map locally, which reduces the amount of data
        // sent over the network. This step is optional; the job gives the same result without it.
        conf.setCombinerClass(Reduce.class);
        // Set the reducer (merge) for the job.
        conf.setReducerClass(Reduce.class);

        // TextInputFormat and TextOutputFormat are the defaults, so they are not set explicitly.

        // Specify the input and output paths. They can be configured by right-clicking the project:
        // Run As -> Run Configurations -> Arguments -> Program arguments,
        // that is, they are the values of String[] args in main(String[] args).
        // Specify the input path, e.g. hdfs://master:9000/input1/
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        // Specify the output path, e.g. hdfs://master:9000/input1/
        // (the output directory must not already exist when the job starts)
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
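Since the listing above is not run on a cluster, it can help to trace the same data flow in plain Java. The following is only a minimal sketch that imitates the map, shuffle (group by key), and reduce steps in memory. It does not use Hadoop at all, and the class name LocalWordCount and the helper methods map, shuffle, reduce, and run are hypothetical names chosen for this illustration, not part of any Hadoop API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, Hadoop-free imitation of the WordCount data flow.
public class LocalWordCount {

    // Map step: emit a (word, 1) pair for every whitespace-separated token.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> output = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            output.add(new SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return output;
    }

    // Shuffle step: group all values that share a key, with keys kept in sorted order.
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: add up every value received for one key.
    public static int reduce(String key, Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    // Run map over every line, shuffle the pairs, then reduce each group.
    public static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(mapped).entrySet()) {
            counts.put(group.getKey(), reduce(group.getKey(), group.getValue().iterator()));
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = run(Arrays.asList("hello world", "hello hadoop"));
        // Print one "word<TAB>count" line per key.
        counts.forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

For the two sample lines, the sketch prints hadoop 1, hello 2, and world 1, one tab-separated pair per line. In a real job the grouping is done by the framework between the map and reduce phases; here a TreeMap stands in for it.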