MapReduce: finding the highest scores of male and female students


In the previous article we learned about MapReduce optimization. In this article we put it into practice with a simple project that uses a combiner and a custom partitioner.

1. Project Introduction

We use a simple score data set to find the highest score of male and female students in each of three age groups: 0~20, 20~50, and 50~100.
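The three ranges share the boundary values 20 and 50, so a rule is needed for where those ages go. Here is a minimal sketch that follows the rule used by the partitioner in section 4 (20 falls in the first group, 50 in the second). The class and method names are illustrative and not part of the original project:

```java
// Illustrative helper, not from the original project: maps an age to one of
// the three age groups, using the same boundaries as the job's partitioner.
class AgeGroups {
    static int ageGroup(int age) {
        if (age <= 20) {
            return 0; // first group: up to and including 20
        } else if (age <= 50) {
            return 1; // second group: above 20, up to and including 50
        } else {
            return 2; // third group: above 50
        }
    }

    public static void main(String[] args) {
        int[] samples = {18, 20, 21, 50, 51};
        for (int age : samples) {
            System.out.println(age + " -> group " + ageGroup(age));
        }
    }
}
```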

2. Data set

Each line has four tab-separated fields: name, age, gender, score.

Alice Female 45

BOB Male 89

Chris Male 97

Kristine Female 53

Connor Male 27

Daniel Male 95

James Male 79

Alex Male 69

3. Analysis

Based on the requirements, we do this in the following steps:

1. Write the Mapper class, which parses each record into key=gender, value=name+age+score and emits it.

2. Write the Partitioner class, which assigns each record to a reduce task according to age.

3. Write the Reducer class, which finds the highest score for male and for female students.

4. Write the run method to execute the MapReduce job.
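To make the data flow concrete, the steps above can be walked through in plain Java without any Hadoop dependency. This is only a sketch of the idea, not the job itself. The class name, the helper methods, and the ages in the sample records are all made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java walk-through of map -> partition -> reduce. Hypothetical names
// and sample ages; it only mimics what the real job does on a cluster.
class PipelineSketch {
    // Step 2: age decides the partition; the modulo keeps the index valid
    // when fewer than three reduce tasks are configured.
    static int partition(int age, int numReduceTasks) {
        if (numReduceTasks == 0) return 0;
        if (age <= 20) return 0;
        if (age <= 50) return 1 % numReduceTasks;
        return 2 % numReduceTasks;
    }

    // Extracts the score from a "name\tage\tscore" value.
    static int score(String nameAgeScore) {
        return Integer.parseInt(nameAgeScore.split("\t")[2]);
    }

    // Steps 1 to 3: for each partition, keeps the best value per gender.
    // numReduceTasks must be at least 1 in this sketch.
    static List<Map<String, String>> run(List<String> lines, int numReduceTasks) {
        List<Map<String, String>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partitions.add(new TreeMap<>());
        }
        for (String line : lines) {
            String[] t = line.split("\t"); // name, age, gender, score
            String gender = t[2];                           // map output key
            String value = t[0] + "\t" + t[1] + "\t" + t[3]; // map output value
            int p = partition(Integer.parseInt(t[1]), numReduceTasks);
            Map<String, String> best = partitions.get(p);
            String current = best.get(gender);
            if (current == null || score(value) > score(current)) {
                best.put(gender, value); // reduce: keep the highest score
            }
        }
        return partitions;
    }

    // Sample records; the ages are invented for this example.
    static List<String> sampleLines() {
        return Arrays.asList(
            "Alice\t23\tFemale\t45",
            "Bob\t34\tMale\t89",
            "Chris\t67\tMale\t97",
            "Kristine\t38\tFemale\t53",
            "Connor\t15\tMale\t27");
    }

    public static void main(String[] args) {
        List<Map<String, String>> result = run(sampleLines(), 3);
        for (int i = 0; i < result.size(); i++) {
            System.out.println("partition " + i + ": " + result.get(i));
        }
    }
}
```

Each partition corresponds to one reduce task and therefore to one output file, which is how the real job ends up with a separate result per age group.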

4. Implementation

    package com.buaa;

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Partitioner;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    /**
     * @ProjectName bestscorecount
     * @PackageName com.buaa
     * @ClassName Gender
     * @Description Finds the highest score for men and women in different age groups
     * @Author Liu Jishu
     * @Date 2016-05-09 09:49:50
     */
    public class Gender extends Configured implements Tool {
        private static final String TAB_SEPARATOR = "\t";

        public static class GenderMapper extends Mapper<LongWritable, Text, Text, Text> {
            /*
             * map is called once per input line, which arrives in the value
             * parameter. The line is split on \t into name, age, gender and score.
             */
            @Override
            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                // Fields: name, age, gender, score (separated by tabs)
                String[] tokens = value.toString().split(TAB_SEPARATOR);
                // Gender
                String gender = tokens[2];
                // Name, age, score
                String nameAgeScore = tokens[0] + TAB_SEPARATOR + tokens[1] + TAB_SEPARATOR + tokens[3];
                // Output key=gender, value=name+age+score
                context.write(new Text(gender), new Text(nameAgeScore));
            }
        }

        /*
         * Merges the Mapper output: keeps only the highest-scoring record per key.
         */
        public static class GenderCombiner extends Reducer<Text, Text, Text, Text> {
            @Override
            public void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                int maxScore = Integer.MIN_VALUE;
                int score = 0;
                String name = " ";
                String age = " ";

                for (Text val : values) {
                    String[] valTokens = val.toString().split(TAB_SEPARATOR);
                    score = Integer.parseInt(valTokens[2]);
                    if (score > maxScore) {
                        name = valTokens[0];
                        age = valTokens[1];
                        maxScore = score;
                    }
                }
                // Same value layout as the Mapper output: name, age, score
                context.write(key, new Text(name + TAB_SEPARATOR + age + TAB_SEPARATOR + maxScore));
            }
        }

        /*
         * Distributes the map output across the reduce tasks according to age.
         */
        public static class GenderPartitioner extends Partitioner<Text, Text> {
            @Override
            public int getPartition(Text key, Text value, int numReduceTasks) {
                String[] nameAgeScore = value.toString().split(TAB_SEPARATOR);
                // Student age
                int age = Integer.parseInt(nameAgeScore[1]);

                // With no reduce tasks, default to partition 0
                if (numReduceTasks == 0) {
                    return 0;
                }

                if (age <= 20) {
                    // Age less than or equal to 20: partition 0
                    return 0;
                } else if (age <= 50) {
                    // Age greater than 20 and less than or equal to 50: partition 1
                    return 1 % numReduceTasks;
                } else {
                    // Remaining ages: partition 2
                    return 2 % numReduceTasks;
                }
            }
        }

        /*
         * Finds the highest score for each gender.
         */
        public static class GenderReducer extends Reducer<Text, Text, Text, Text> {
            @Override
            public void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                int maxScore = Integer.MIN_VALUE;
                int score = 0;
                String name = " ";
                String age = " ";
                String gender = " ";

                // For this key, iterate over the values to find the highest score
                for (Text val : values) {
                    String[] valTokens = val.toString().split(TAB_SEPARATOR);
                    score = Integer.parseInt(valTokens[2]);
                    if (score > maxScore) {
                        name = valTokens[0];
                        age = valTokens[1];
                        gender = key.toString();
                        maxScore = score;
                    }
                }

                context.write(new Text(name), new Text("age:" + age + TAB_SEPARATOR
                        + "gender:" + gender + TAB_SEPARATOR + "score:" + maxScore));
            }
        }

        @SuppressWarnings("deprecation")
        @Override
        public int run(String[] args) throws Exception {
            // Use the configuration that ToolRunner passed in
            Configuration conf = getConf();

            // Delete the output directory if it already exists
            Path myPath = new Path(args[1]);
            FileSystem hdfs = myPath.getFileSystem(conf);
            if (hdfs.isDirectory(myPath)) {
                hdfs.delete(myPath, true);
            }

            // Create a new job (Job.getInstance replaces the deprecated constructor)
            Job job = Job.getInstance(conf, "gender");
            // Main class
            job.setJarByClass(Gender.class);
            // Mapper
            job.setMapperClass(GenderMapper.class);
            // Reducer
            job.setReducerClass(GenderReducer.class);

            // Map output key type
            job.setMapOutputKeyClass(Text.class);
            // Map output value type
            job.setMapOutputValueClass(Text.class);

            // Reduce output key type
            job.setOutputKeyClass(Text.class);
            // Reduce output value type
            job.setOutputValueClass(Text.class);

            // Set the Combiner class
            job.setCombinerClass(GenderCombiner.class);
            // Set the Partitioner class
            job.setPartitionerClass(GenderPartitioner.class);
            // Use 3 reduce tasks, one per age group
            job.setNumReduceTasks(3);

            // Input path
            FileInputFormat.addInputPath(job, new Path(args[0]));
            // Output path
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            // Submit the job and wait for it to finish
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            String[] args0 = {
                "hdfs://ljc:9000/buaa/gender/gender.txt",
                "hdfs://ljc:9000/buaa/gender/out/"
            };
            int ec = ToolRunner.run(new Configuration(), new Gender(), args0);
            System.exit(ec);
        }
    }
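A note on why a combiner is safe in this job: taking a maximum gives the same answer whether it is applied once to all records or first to subsets and then to the partial results, and the combiner in the code above emits values in the same name, age, score layout that it receives. Hadoop may run a combiner zero, one, or several times, so both properties are needed. The following sketch checks the idea in plain Java. The class name, the method name, and the ages in the records are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone check that "max of partial maxima" equals the
// overall max. The record ages are invented for illustration.
class CombinerSketch {
    // Returns the "name\tage\tscore" record with the highest score,
    // keeping the first one seen when scores are equal.
    static String best(List<String> values) {
        int maxScore = Integer.MIN_VALUE;
        String bestRecord = null;
        for (String v : values) {
            int score = Integer.parseInt(v.split("\t")[2]);
            if (score > maxScore) {
                maxScore = score;
                bestRecord = v;
            }
        }
        return bestRecord;
    }

    public static void main(String[] args) {
        // Records for one key ("Male"), as seen by two different map tasks
        List<String> split1 = Arrays.asList("Bob\t34\t89", "Connor\t40\t27");
        List<String> split2 = Arrays.asList("Daniel\t45\t95", "James\t30\t79");

        // With a combiner: each map task pre-reduces its own records
        String combined = best(Arrays.asList(best(split1), best(split2)));

        // Without a combiner: the reducer sees every record
        List<String> all = new ArrayList<>(split1);
        all.addAll(split2);
        String direct = best(all);

        System.out.println("with combiner:    " + combined);
        System.out.println("without combiner: " + direct);
        System.out.println("same result:      " + combined.equals(direct));
    }
}
```

The benefit is that each map task sends at most one record per key and partition to the reducers instead of all of them. When two records tie on score, the name that survives can depend on processing order, although the reported maximum score does not.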

5. Results

If you are interested in the topics covered on this blog, please keep following my later posts. I am "Liu Chao ★ljc".

Copyright of this article belongs to the author and Blog Park. Reprinting is welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original must be placed in a prominent position on the article page. Otherwise the author reserves the right to pursue legal liability.

