Partitioner Components of MapReduce

Brief Introduction
The Partitioner component lets the map side partition records by key, so that records with different keys can be sent to different reduce tasks. You can customize the distribution rule for keys: for example, given data files containing records from several universities, the requirement might be to write each university's records to a separate output file (a sketch of such a partitioner appears under "Custom Partitioner" below).
The Partitioner component provides a default implementation, HashPartitioner:
package org.apache.hadoop.mapreduce.lib.partition;

public class HashPartitioner<K, V> extends Partitioner<K, V> {

  /** Use {@link Object#hashCode()} to partition. */
  public int getPartition(K key, V value, int numReduceTasks) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}
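Note the bitwise AND with Integer.MAX_VALUE: hashCode() may return a negative value, and clearing the sign bit guarantees a non-negative result, so the modulo always yields a valid partition number in the range [0, numReduceTasks).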
Custom Partitioner
1. Extend the abstract class Partitioner and implement a custom getPartition() method;
2. Set the custom partitioner on the job via job.setPartitionerClass(...), as in the sketch below.
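A minimal sketch of both steps, based on the university example from the introduction (the class name UniversityPartitioner, the Text/IntWritable types, and the university names are illustrative assumptions, not from the original):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Step 1: extend Partitioner and implement getPartition().
public class UniversityPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // A fixed key-to-partition mapping: all records for a given
        // university go to the same reduce task, hence the same file.
        if (key.toString().equals("PKU")) {          // hypothetical key
            return 0;
        } else if (key.toString().equals("Tsinghua")) { // hypothetical key
            return 1;
        }
        return 2; // everything else
    }
}

Step 2, in the driver, with the number of reduce tasks matching the number of partitions:

job.setPartitionerClass(UniversityPartitioner.class);
job.setNumReduceTasks(3);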
Partitioner class
package org.apache.hadoop.mapreduce;

public abstract class Partitioner<KEY, VALUE> {

  /**
   * Get the partition number for a given key (hence record) given the total
   * number of partitions i.e. number of reduce-tasks for the job.
   *
   * <p>Typically a hash function on all or a subset of the key.</p>
   *
   * @param key the key to be partitioned.
   * @param value the entry value.
   * @param numPartitions the total number of partitions.
   * @return the partition number for the <code>key</code>.
   */
  public abstract int getPartition(KEY key, VALUE value, int numPartitions);
}
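The framework calls getPartition() once for every record emitted by the map; numPartitions equals the number of reduce tasks configured for the job, and the returned partition number must lie in the range [0, numPartitions).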
Partitioner Application Scenario and Example
Requirement: compute the weekly sales of each item separately.
Address 1 weekly sales list (input1):
Shoes 20
Hat 10
Stockings 30
Clothes 40
Address 2 weekly sales list (input2):
Shoes 15
Hat 1
Stockings 90
Clothes 80
Summary results (expected output):
Shoes 35
Hat 11
Stockings 120
Clothes 120
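The default HashPartitioner gives no control over which part file a given item ends up in, and two items may hash into the same reduce task. To guarantee one output file per item, the program below fixes the item-to-partition mapping in a custom partitioner (DefPartitioner) and runs four reduce tasks, one per item.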
package mypartitioner;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyPartitioner {

    private final static String INPUT_PATH = "hdfs://liguodong:8020/input";
    private final static String OUTPUT_PATH = "hdfs://liguodong:8020/output";

    public static class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private Text word = new Text();
        private IntWritable one = new IntWritable();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Each input line is "<item> <count>", e.g. "Shoes 20".
            String[] str = value.toString().split("\\s+");
            word.set(str[0]);
            one.set(Integer.parseInt(str[1]));
            context.write(word, one);
        }
    }

    public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum the counts for the same item across both input files.
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static class DefPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            // Route each item to a fixed reduce task (and hence output file).
            if (key.toString().equals("Shoes")) {
                return 0;
            } else if (key.toString().equals("Hat")) {
                return 1;
            } else if (key.toString().equals("Stockings")) {
                return 2;
            } else {
                return 3;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // 1. Configuration; remove any stale output directory.
        Configuration conf = new Configuration();
        final FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), conf);
        if (fileSystem.exists(new Path(OUTPUT_PATH))) {
            fileSystem.delete(new Path(OUTPUT_PATH), true);
        }
        Job job = Job.getInstance(conf, "Define Partitioner");
        // 2. Required when running from a jar.
        job.setJarByClass(MyPartitioner.class);
        // 3. Input path.
        FileInputFormat.addInputPath(job, new Path(INPUT_PATH));
        // 4. Mapper.
        job.setMapperClass(MyMapper.class);
        // 5. Combiner (optional).
        // job.setCombinerClass(MyReducer.class);
        job.setPartitionerClass(DefPartitioner.class);
        // 6. Reducer.
        job.setReducerClass(MyReducer.class);
        job.setNumReduceTasks(4); // the default number of reduce tasks is 1
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // 7. Output path.
        FileOutputFormat.setOutputPath(job, new Path(OUTPUT_PATH));
        // 8. Submit the job.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
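One caveat on this design: job.setNumReduceTasks(4) must cover every value DefPartitioner can return; if getPartition() returns a number greater than or equal to the number of reduce tasks, map tasks fail with an illegal-partition error. Conversely, if the job keeps the default single reduce task, the custom partitioner is effectively ignored, since every record goes to the one reducer.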
Upload the input files:

[root@liguodong file]# hdfs dfs -mkdir /input
[root@liguodong file]# hdfs dfs -put input1 /input/
[root@liguodong file]# hdfs dfs -put input2 /input/
[root@liguodong file]# hdfs dfs -ls /input/
Found 2 items
-rw-r--r--   1 root supergroup       2015-06-14 10:22 /input/input1
-rw-r--r--   1 root supergroup       2015-06-14 10:22 /input/input2

Make a jar package, then execute it:

[root@liguodong file]# jar tf Partitioner.jar
META-INF/MANIFEST.MF
mypartitioner/MyPartitioner$DefPartitioner.class
mypartitioner/MyPartitioner$MyMapper.class
mypartitioner/MyPartitioner$MyReducer.class
mypartitioner/MyPartitioner.class
[root@liguodong file]# yarn jar Partitioner.jar

Output results:

[root@liguodong file]# hdfs dfs -ls /output/
Found 5 items
-rw-r--r--   1 root supergroup          0 2015-06-14 11:08 /output/_SUCCESS
-rw-r--r--   1 root supergroup          9 2015-06-14 11:08 /output/part-r-00000
-rw-r--r--   1 root supergroup          7 2015-06-14 11:08 /output/part-r-00001
-rw-r--r--   1 root supergroup         14 2015-06-14 11:08 /output/part-r-00002
-rw-r--r--   1 root supergroup         12 2015-06-14 11:08 /output/part-r-00003
[root@liguodong file]# hdfs dfs -cat /output/part-r-00000
Shoes   35
[root@liguodong file]# hdfs dfs -cat /output/part-r-00001
Hat     11
[root@liguodong file]# hdfs dfs -cat /output/part-r-00002
Stockings       120
[root@liguodong file]# hdfs dfs -cat /output/part-r-00003
Clothes 120
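Each part-r-0000N file holds the output of reduce task N, matching the partition numbers returned by DefPartitioner (Shoes in part-r-00000, Hat in part-r-00001, and so on); _SUCCESS is the empty marker file MapReduce writes once the job completes successfully.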