Partitioner Components of MapReduce


Brief Introduction

The Partitioner component lets the map stage partition its output by key, so that records with different keys can be distributed to different reduce tasks.

You can customize the distribution rule for keys. For example, if the input data files contain records from several universities and the requirement is that each university gets its own output file, a custom partitioner can route each university's records to a dedicated reducer.

MapReduce provides a default implementation, HashPartitioner:

public class HashPartitioner<K, V> extends Partitioner<K, V> {

  /** Use {@link Object#hashCode()} to partition. */
  public int getPartition(K key, V value, int numReduceTasks) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}
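To see how the default rule behaves, the following standalone snippet (a minimal sketch, not part of the original article; the class name HashPartitionerDemo is ours) calls getPartition() directly for the keys used in the example further below:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class HashPartitionerDemo {
    public static void main(String[] args) {
        // Hadoop's concrete HashPartitioner lives in
        // org.apache.hadoop.mapreduce.lib.partition.
        HashPartitioner<Text, IntWritable> partitioner = new HashPartitioner<>();
        int numReduceTasks = 4;
        for (String k : new String[]{"Shoes", "Hat", "Stockings", "Clothes"}) {
            int p = partitioner.getPartition(new Text(k), new IntWritable(1), numReduceTasks);
            // The reduce task (0..3) a key is routed to depends only on its hashCode().
            System.out.println(k + " -> reducer " + p);
        }
    }
}

Note that hash-based routing gives no control over which key lands in which partition; when the output layout matters, a custom partitioner is needed.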
Custom Partitioner

1. Inherit the abstract class Partitioner and implement your own getPartition() method;
2. Register the custom partitioner on the job via job.setPartitionerClass(...).

Partitioner class

package org.apache.hadoop.mapreduce;

public abstract class Partitioner<KEY, VALUE> {

  /**
   * Get the partition number for a given key (hence record) given the total
   * number of partitions i.e. number of reduce-tasks for the job.
   *
   * <p>Typically a hash function on all or a subset of the key.</p>
   *
   * @param key the key to be partitioned.
   * @param value the entry value.
   * @param numPartitions the total number of partitions.
   * @return the partition number for the <code>key</code>.
   */
  public abstract int getPartition(KEY key, VALUE value, int numPartitions);
}
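Putting the two steps above together, a minimal skeleton looks like this (the name UniversityPartitioner and the hash rule are illustrative only; the full worked example in the next section uses an explicit per-key rule instead):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Step 1: inherit Partitioner and implement getPartition().
public class UniversityPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // Illustrative rule: hash the university name so that every record
        // of one university lands in the same partition.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// Step 2: register it in the driver:
//     job.setPartitionerClass(UniversityPartitioner.class);
//     job.setNumReduceTasks(n);   // one output file per reduce task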
Partitioner Application Scenario and Example

Requirement: compute the total weekly sales of each item separately.
Address 1 weekly sales list (input1):

Shoes 20
Hat 10
Stockings 30
Clothes 40

Address 2 weekly sales list (input2):

Shoes 15
Hat 1
Stockings 90
Clothes 80

Summary results (output):

Shoes 35
Hat 11
Stockings 120
Clothes 120

The complete program:

package mypartitioner;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyPartitioner {

    private final static String INPUT_PATH = "hdfs://liguodong:8020/input";
    private final static String OUTPUT_PATH = "hdfs://liguodong:8020/output";

    public static class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private Text word = new Text();
        private IntWritable one = new IntWritable();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Each input line looks like "Shoes 20": item name, then quantity.
            String[] str = value.toString().split("\\s+");
            word.set(str[0]);
            one.set(Integer.parseInt(str[1]));
            context.write(word, one);
        }
    }

    public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum one item's weekly sales across both input files.
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static class DefPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            // Route each item to a fixed reduce task so that every item
            // ends up in its own output file.
            if (key.toString().equals("Shoes")) {
                return 0;
            } else if (key.toString().equals("Hat")) {
                return 1;
            } else if (key.toString().equals("Stockings")) {
                return 2;
            } else {
                return 3;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // 1. Configuration; remove the output path if it already exists
        Configuration conf = new Configuration();
        final FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), conf);
        if (fileSystem.exists(new Path(OUTPUT_PATH))) {
            fileSystem.delete(new Path(OUTPUT_PATH), true);
        }
        Job job = Job.getInstance(conf, "Define Partitioner");

        // 2. Required when running from a jar
        job.setJarByClass(MyPartitioner.class);

        // 3. Input path
        FileInputFormat.addInputPath(job, new Path(INPUT_PATH));

        // 4. Mapper
        job.setMapperClass(MyMapper.class);

        // 5. Combiner (optional) and the custom partitioner
        // job.setCombinerClass(MyReducer.class);
        job.setPartitionerClass(DefPartitioner.class);

        // 6. Reducer; the number of reduce tasks (default 1) must match
        //    the number of partitions returned by getPartition()
        job.setReducerClass(MyReducer.class);
        job.setNumReduceTasks(4);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // 7. Output path
        FileOutputFormat.setOutputPath(job, new Path(OUTPUT_PATH));

        // 8. Submit the job
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
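Before submitting the job, the routing logic can be checked in isolation. This hypothetical snippet (PartitionCheck is not part of the original code) instantiates DefPartitioner directly and prints which output file each item should end up in:

import mypartitioner.MyPartitioner;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public class PartitionCheck {
    public static void main(String[] args) {
        MyPartitioner.DefPartitioner p = new MyPartitioner.DefPartitioner();
        for (String item : new String[]{"Shoes", "Hat", "Stockings", "Clothes"}) {
            // With 4 reduce tasks, expect partitions 0, 1, 2 and 3,
            // i.e. files part-r-00000 through part-r-00003.
            int partition = p.getPartition(new Text(item), new IntWritable(1), 4);
            System.out.println(item + " -> part-r-0000" + partition);
        }
    }
}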
Upload the input files:

[[email protected] file]# hdfs dfs -mkdir /input
[[email protected] file]# hdfs dfs -put input1 /input/
[[email protected] file]# hdfs dfs -put input2 /input/
[[email protected] file]# hdfs dfs -ls /input/
Found 2 items
-rw-r--r--   1 root supergroup  2015-06-14 10:22 /input/input1
-rw-r--r--   1 root supergroup  2015-06-14 10:22 /input/input2

Make a jar package, then execute it:

[[email protected] file]# jar tf partitioner.jar
META-INF/MANIFEST.MF
mypartitioner/MyPartitioner$DefPartitioner.class
mypartitioner/MyPartitioner$MyMapper.class
mypartitioner/MyPartitioner$MyReducer.class
mypartitioner/MyPartitioner.class
[[email protected] file]# yarn jar partitioner.jar

Output results:

[[email protected] file]# hdfs dfs -ls /output/
Found 5 items
-rw-r--r--   1 root supergroup   0 2015-06-14 11:08 /output/_SUCCESS
-rw-r--r--   1 root supergroup   9 2015-06-14 11:08 /output/part-r-00000
-rw-r--r--   1 root supergroup   7 2015-06-14 11:08 /output/part-r-00001
-rw-r--r--   1 root supergroup  14 2015-06-14 11:08 /output/part-r-00002
-rw-r--r--   1 root supergroup  12 2015-06-14 11:08 /output/part-r-00003
[[email protected] file]# hdfs dfs -cat /output/part-r-00000
Shoes	35
[[email protected] file]# hdfs dfs -cat /output/part-r-00001
Hat	11
[[email protected] file]# hdfs dfs -cat /output/part-r-00002
Stockings	120
[[email protected] file]# hdfs dfs -cat /output/part-r-00003
Clothes	120
