Partitioner Components of MapReduce


Brief Introduction

The Partitioner component lets the map stage partition its output by key, so that records with different keys can be distributed to different reduce tasks.

You can customize the distribution rule for keys. For example, if the input data files contain records from several universities and the requirement is that each university gets its own output file, a custom partitioner can route each university's records to a dedicated reducer.

MapReduce provides a default implementation, HashPartitioner:

public class HashPartitioner<K, V> extends Partitioner<K, V> {

  /** Use {@link Object#hashCode()} to partition. */
  public int getPartition(K key, V value, int numReduceTasks) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}
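To see how the default rule behaves, the following standalone snippet (a minimal sketch, not part of the original article; the class name HashPartitionerDemo is ours) calls getPartition() directly for the keys used in the example further below:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class HashPartitionerDemo {
    public static void main(String[] args) {
        // Hadoop's concrete HashPartitioner lives in
        // org.apache.hadoop.mapreduce.lib.partition.
        HashPartitioner<Text, IntWritable> partitioner = new HashPartitioner<>();
        int numReduceTasks = 4;
        for (String k : new String[]{"Shoes", "Hat", "Stockings", "Clothes"}) {
            int p = partitioner.getPartition(new Text(k), new IntWritable(1), numReduceTasks);
            // The reduce task (0..3) a key is routed to depends only on its hashCode().
            System.out.println(k + " -> reducer " + p);
        }
    }
}

Note that hash-based routing gives no control over which key lands in which partition; when the output layout matters, a custom partitioner is needed.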
Custom Partitioner

1. Inherit the abstract class Partitioner and implement your own getPartition() method;
2. Register the custom partitioner on the job via job.setPartitionerClass(...).

Partitioner class

package org.apache.hadoop.mapreduce;

public abstract class Partitioner<KEY, VALUE> {

  /**
   * Get the partition number for a given key (hence record) given the total
   * number of partitions i.e. number of reduce-tasks for the job.
   *
   * <p>Typically a hash function on all or a subset of the key.</p>
   *
   * @param key the key to be partitioned.
   * @param value the entry value.
   * @param numPartitions the total number of partitions.
   * @return the partition number for the <code>key</code>.
   */
  public abstract int getPartition(KEY key, VALUE value, int numPartitions);
}
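Putting the two steps above together, a minimal skeleton looks like this (the name UniversityPartitioner and the hash rule are illustrative only; the full worked example in the next section uses an explicit per-key rule instead):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Step 1: inherit Partitioner and implement getPartition().
public class UniversityPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // Illustrative rule: hash the university name so that every record
        // of one university lands in the same partition.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// Step 2: register it in the driver:
//     job.setPartitionerClass(UniversityPartitioner.class);
//     job.setNumReduceTasks(n);   // one output file per reduce task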
Partitioner Application Scenario and Example

Requirement: compute the total weekly sales of each item separately.
Address 1 weekly sales list (input1):

Shoes 20
Hat 10
Stockings 30
Clothes 40

Address 2 weekly sales list (input2):

Shoes 15
Hat 1
Stockings 90
Clothes 80

Summary results (output):

Shoes 35
Hat 11
Stockings 120
Clothes 120

The complete program:

package mypartitioner;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyPartitioner {

    private final static String INPUT_PATH = "hdfs://liguodong:8020/input";
    private final static String OUTPUT_PATH = "hdfs://liguodong:8020/output";

    public static class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private Text word = new Text();
        private IntWritable one = new IntWritable();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Each input line looks like "Shoes 20": item name, then quantity.
            String[] str = value.toString().split("\\s+");
            word.set(str[0]);
            one.set(Integer.parseInt(str[1]));
            context.write(word, one);
        }
    }

    public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum one item's weekly sales across both input files.
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static class DefPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            // Route each item to a fixed reduce task so that every item
            // ends up in its own output file.
            if (key.toString().equals("Shoes")) {
                return 0;
            } else if (key.toString().equals("Hat")) {
                return 1;
            } else if (key.toString().equals("Stockings")) {
                return 2;
            } else {
                return 3;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // 1. Configuration; remove the output path if it already exists
        Configuration conf = new Configuration();
        final FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), conf);
        if (fileSystem.exists(new Path(OUTPUT_PATH))) {
            fileSystem.delete(new Path(OUTPUT_PATH), true);
        }
        Job job = Job.getInstance(conf, "Define Partitioner");

        // 2. Required when running from a jar
        job.setJarByClass(MyPartitioner.class);

        // 3. Input path
        FileInputFormat.addInputPath(job, new Path(INPUT_PATH));

        // 4. Mapper
        job.setMapperClass(MyMapper.class);

        // 5. Combiner (optional) and the custom partitioner
        // job.setCombinerClass(MyReducer.class);
        job.setPartitionerClass(DefPartitioner.class);

        // 6. Reducer; the number of reduce tasks (default 1) must match
        //    the number of partitions returned by getPartition()
        job.setReducerClass(MyReducer.class);
        job.setNumReduceTasks(4);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // 7. Output path
        FileOutputFormat.setOutputPath(job, new Path(OUTPUT_PATH));

        // 8. Submit the job
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
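Before submitting the job, the routing logic can be checked in isolation. This hypothetical snippet (PartitionCheck is not part of the original code) instantiates DefPartitioner directly and prints which output file each item should end up in:

import mypartitioner.MyPartitioner;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public class PartitionCheck {
    public static void main(String[] args) {
        MyPartitioner.DefPartitioner p = new MyPartitioner.DefPartitioner();
        for (String item : new String[]{"Shoes", "Hat", "Stockings", "Clothes"}) {
            // With 4 reduce tasks, expect partitions 0, 1, 2 and 3,
            // i.e. files part-r-00000 through part-r-00003.
            int partition = p.getPartition(new Text(item), new IntWritable(1), 4);
            System.out.println(item + " -> part-r-0000" + partition);
        }
    }
}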
Upload the input files:

[[email protected] file]# hdfs dfs -mkdir /input
[[email protected] file]# hdfs dfs -put input1 /input/
[[email protected] file]# hdfs dfs -put input2 /input/
[[email protected] file]# hdfs dfs -ls /input/
Found 2 items
-rw-r--r--   1 root supergroup  2015-06-14 10:22 /input/input1
-rw-r--r--   1 root supergroup  2015-06-14 10:22 /input/input2

Make a jar package, then execute it:

[[email protected] file]# jar tf partitioner.jar
META-INF/MANIFEST.MF
mypartitioner/MyPartitioner$DefPartitioner.class
mypartitioner/MyPartitioner$MyMapper.class
mypartitioner/MyPartitioner$MyReducer.class
mypartitioner/MyPartitioner.class
[[email protected] file]# yarn jar partitioner.jar

Output results:

[[email protected] file]# hdfs dfs -ls /output/
Found 5 items
-rw-r--r--   1 root supergroup   0 2015-06-14 11:08 /output/_SUCCESS
-rw-r--r--   1 root supergroup   9 2015-06-14 11:08 /output/part-r-00000
-rw-r--r--   1 root supergroup   7 2015-06-14 11:08 /output/part-r-00001
-rw-r--r--   1 root supergroup  14 2015-06-14 11:08 /output/part-r-00002
-rw-r--r--   1 root supergroup  12 2015-06-14 11:08 /output/part-r-00003
[[email protected] file]# hdfs dfs -cat /output/part-r-00000
Shoes	35
[[email protected] file]# hdfs dfs -cat /output/part-r-00001
Hat	11
[[email protected] file]# hdfs dfs -cat /output/part-r-00002
Stockings	120
[[email protected] file]# hdfs dfs -cat /output/part-r-00003
Clothes	120
