In the previous article we covered MapReduce optimization. In this one we put it into practice and optimize MapReduce performance in a simple project.
1. Project Introduction
We use a simple score data set to find the highest score among male students and among female students in each of three age groups: 0~20, 20~50, and 50~100. In the code, the group boundaries are age <= 20, age <= 50, and everything older.
2. Data set
Name Age gender score
Alice Female 45
BOB Male 89
Chris Male 97
Kristine Female 53
Connor Male 27
Daniel Male 95
James Male 79
Alex Male 69
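Each record is expected to carry four tab-separated fields in the order given by the header row (name, age, gender, score). As a minimal sketch of that layout, the snippet below parses one line and rejects lines that do not have exactly four fields or whose age or score is not a number. The class and method names (ScoreRecordSketch, ScoreRecord, parse) are hypothetical, and the age 23 in the sample line is invented for illustration.

```java
import java.util.Optional;

class ScoreRecordSketch {
    /** One parsed row, in the column order of the header: name, age, gender, score. */
    static final class ScoreRecord {
        final String name;
        final int age;
        final String gender;
        final int score;

        ScoreRecord(String name, int age, String gender, int score) {
            this.name = name;
            this.age = age;
            this.gender = gender;
            this.score = score;
        }
    }

    /** Returns the parsed record, or empty if the line is malformed. */
    static Optional<ScoreRecord> parse(String line) {
        String[] tokens = line.split("\t");
        if (tokens.length != 4) {
            return Optional.empty(); // wrong number of fields
        }
        try {
            int age = Integer.parseInt(tokens[1].trim());
            int score = Integer.parseInt(tokens[3].trim());
            return Optional.of(new ScoreRecord(tokens[0].trim(), age, tokens[2].trim(), score));
        } catch (NumberFormatException e) {
            return Optional.empty(); // age or score is not an integer
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample line; the age value is made up.
        System.out.println(parse("Alice\t23\tFemale\t45").isPresent()); // prints true
        // A line with only three fields is rejected instead of throwing.
        System.out.println(parse("Alice\tFemale\t45").isPresent());     // prints false
    }
}
```

Checking the field count up front is one way to keep a single bad line from failing a whole map task with an ArrayIndexOutOfBoundsException.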
3. Analysis
Based on the requirements, we do this in the following steps:
1. Write the Mapper class: parse each record and output it as key = gender, value = name + age + score.
2. Write the Partitioner class: assign each record to a reduce task according to the student's age.
3. Write the Reducer class: find the highest score for male and for female students respectively.
4. Write the run method to execute the MapReduce job.
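Before looking at the Hadoop version, the steps above can be traced in plain Java. This is only an in-memory sketch of the data flow (map, partition by age, reduce to the best score per gender), not the job itself. The class name BestScorePipelineSketch, its helper methods, and the sample rows with their ages are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BestScorePipelineSketch {
    // Hypothetical sample rows (name, age, gender, score); the ages are made up.
    static final List<String> SAMPLE = Arrays.asList(
            "Alice\t23\tFemale\t45",
            "Bob\t19\tMale\t89",
            "Chris\t35\tMale\t97",
            "Kristine\t61\tFemale\t53",
            "Connor\t18\tMale\t27",
            "Daniel\t44\tMale\t95");

    /** Step 2: age group index (0 for age <= 20, 1 for age <= 50, 2 otherwise). */
    static int ageBucket(int age) {
        if (age <= 20) return 0;
        if (age <= 50) return 1;
        return 2;
    }

    /** Score is the third field of a "name \t age \t score" value. */
    static int score(String value) {
        return Integer.parseInt(value.split("\t")[2]);
    }

    /** Returns one map per age group: gender -> best "name \t age \t score". */
    static List<Map<String, String>> run(List<String> lines) {
        List<Map<String, String>> partitions = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            partitions.add(new TreeMap<>());
        }
        for (String line : lines) {
            String[] t = line.split("\t");
            // Step 1 (map): key = gender, value = name + age + score
            String gender = t[2];
            String value = t[0] + "\t" + t[1] + "\t" + t[3];
            // Step 2 (partition): pick the bucket from the age
            Map<String, String> bucket = partitions.get(ageBucket(Integer.parseInt(t[1])));
            // Step 3 (reduce): keep the highest-scoring value per gender
            bucket.merge(gender, value, (best, next) -> score(best) >= score(next) ? best : next);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Map<String, String>> result = run(SAMPLE);
        for (int i = 0; i < result.size(); i++) {
            System.out.println("partition " + i + ": " + result.get(i));
        }
    }
}
```

In the real job each partition is handled by its own reduce task, so each age group ends up in a separate output file.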
4. Implementation
package com.buaa;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * @ProjectName BestScoreCount
 * @PackageName com.buaa
 * @ClassName Gender
 * @Description Counts the highest score for men and women in different age groups
 * @Author Liu Jishu
 * @Date 2016-05-09 09:49:50
 */
public class Gender extends Configured implements Tool {
    private static String TAB_SEPARATOR = "\t";

    public static class GenderMapper extends Mapper<LongWritable, Text, Text, Text> {
        /*
         * map is called once per row of input. The row is stored in the value
         * parameter and is split into name, age, gender and score on the \t delimiter.
         */
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            /*
             * Name Age gender score
             * Alice female 45
             * The delimiter for each field is the TAB key
             */
            // Use \t to split the data
            String[] tokens = value.toString().split(TAB_SEPARATOR);

            // Gender
            String gender = tokens[2];
            // Name, age, score
            String nameAgeScore = tokens[0] + TAB_SEPARATOR + tokens[1] + TAB_SEPARATOR + tokens[3];

            // Output key=gender, value=name+age+score
            context.write(new Text(gender), new Text(nameAgeScore));
        }
    }

    /*
     * Merges the Mapper output
     */
    public static class GenderCombiner extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            int maxScore = Integer.MIN_VALUE;
            int score = 0;
            String name = " ";
            String age = " ";

            for (Text val : values) {
                String[] valTokens = val.toString().split(TAB_SEPARATOR);
                score = Integer.parseInt(valTokens[2]);
                if (score > maxScore) {
                    name = valTokens[0];
                    age = valTokens[1];
                    maxScore = score;
                }
            }

            context.write(key, new Text(name + TAB_SEPARATOR + age + TAB_SEPARATOR + maxScore));
        }
    }

    /*
     * Distributes the map output across the reduce tasks according to age
     */
    public static class GenderPartitioner extends Partitioner<Text, Text> {
        @Override
        public int getPartition(Text key, Text value, int numReduceTasks) {
            String[] nameAgeScore = value.toString().split(TAB_SEPARATOR);
            // Student age
            int age = Integer.parseInt(nameAgeScore[1]);

            // Default to partition 0
            if (numReduceTasks == 0)
                return 0;

            // Age less than or equal to 20: partition 0
            if (age <= 20) {
                return 0;
            } else if (age <= 50) { // Age greater than 20 and less than or equal to 50: partition 1
                return 1 % numReduceTasks;
            } else // Remaining ages: partition 2
                return 2 % numReduceTasks;
        }
    }

    /*
     * Finds the highest score for each gender
     */
    public static class GenderReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            int maxScore = Integer.MIN_VALUE;
            int score = 0;
            String name = " ";
            String age = " ";
            String gender = " ";

            // For this key, iterate over the values collection to find the highest score
            for (Text val : values) {
                String[] valTokens = val.toString().split(TAB_SEPARATOR);
                score = Integer.parseInt(valTokens[2]);
                if (score > maxScore) {
                    name = valTokens[0];
                    age = valTokens[1];
                    gender = key.toString();
                    maxScore = score;
                }
            }

            context.write(new Text(name), new Text("Age:" + age + TAB_SEPARATOR + "Gender:" + gender + TAB_SEPARATOR + "score:" + maxScore));
        }
    }

    @SuppressWarnings("deprecation")
    @Override
    public int run(String[] args) throws Exception {
        // Read the configuration files
        Configuration conf = new Configuration();

        // Delete the output directory if it already exists
        Path myPath = new Path(args[1]);
        FileSystem hdfs = myPath.getFileSystem(conf);
        if (hdfs.isDirectory(myPath)) {
            hdfs.delete(myPath, true);
        }

        // Create a new job
        Job job = new Job(conf, "gender");
        // Main class
        job.setJarByClass(Gender.class);
        // Mapper
        job.setMapperClass(GenderMapper.class);
        // Reducer
        job.setReducerClass(GenderReducer.class);

        // Map output key type
        job.setMapOutputKeyClass(Text.class);
        // Map output value type
        job.setMapOutputValueClass(Text.class);

        // Reduce output key type
        job.setOutputKeyClass(Text.class);
        // Reduce output value type
        job.setOutputValueClass(Text.class);

        // Set the Combiner class
        job.setCombinerClass(GenderCombiner.class);

        // Set the Partitioner class
        job.setPartitionerClass(GenderPartitioner.class);

        // Set the number of reduce tasks to 3
        job.setNumReduceTasks(3);

        // Input path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Output path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submit the job
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        String[] args0 = {
                "hdfs://ljc:9000/buaa/gender/gender.txt",
                "hdfs://ljc:9000/buaa/gender/out/"
        };
        int ec = ToolRunner.run(new Configuration(), new Gender(), args0);
        System.exit(ec);
    }
}
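The modulo in the partitioner matters when the job runs with fewer than three reduce tasks. The sketch below copies only the age rule into a plain static method, so it can be tried without a cluster. AgePartitionSketch and partitionFor are hypothetical names, and the ages passed in are arbitrary examples.

```java
class AgePartitionSketch {
    /** Same rule as the partitioner in the listing, without the Hadoop types. */
    static int partitionFor(int age, int numReduceTasks) {
        if (numReduceTasks == 0) {
            return 0;
        }
        if (age <= 20) {
            return 0;
        } else if (age <= 50) {
            return 1 % numReduceTasks;
        } else {
            return 2 % numReduceTasks;
        }
    }

    public static void main(String[] args) {
        int[] ages = {18, 35, 70}; // arbitrary example ages, one per group
        for (int reducers = 3; reducers >= 1; reducers--) {
            StringBuilder line = new StringBuilder(reducers + " reduce task(s):");
            for (int age : ages) {
                line.append(" age ").append(age).append(" -> ").append(partitionFor(age, reducers));
            }
            System.out.println(line);
        }
    }
}
```

With three reduce tasks each age group gets its own partition. With two, the oldest group maps to 2 % 2 = 0 and shares a partition with the youngest group, so the "highest score per gender" in that partition would mix two age groups. This is presumably why the job sets the number of reduce tasks to 3.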
5. Results
If you are interested in the topics covered here, please keep following my later posts. I am "Liu Chao ★ljc".
The copyright of this article belongs to the author and Blog Park. Reprinting is welcome, but unless the author agrees otherwise, this notice must be kept and a link to the original article must be placed in a prominent position on the page. Otherwise the author reserves the right to pursue legal liability.