In MapReduce:
The shuffle phase sits between map and reduce, and it can be customized with a custom sort order, a custom partitioner, and custom grouping.
In MapReduce, map output is a set of key-value pairs, and by default HashPartitioner decides which reducer each pair is sent to.
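To make the default behaviour concrete, here is a minimal standalone sketch of the hash-and-modulo rule that HashPartitioner applies. It does not use Hadoop itself: the class name HashPartitionSketch and the sample keys are made up for illustration, and plain Java objects stand in for Hadoop key types such as Text (whose hashCode differs from String's).

```java
// Standalone sketch of hash partitioning; not Hadoop's own class.
final class HashPartitionSketch {

    // Clear the sign bit so the result is never negative,
    // then take the remainder by the number of reduce tasks.
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 3; // hypothetical reducer count
        for (String key : new String[] {"apple", "banana", "cherry", "apple"}) {
            // Equal keys always land in the same partition.
            System.out.println(key + " -> partition " + getPartition(key, reducers));
        }
    }
}
```

Equal keys always map to the same partition, but the partitions carry no ordering between them, which is why a globally sorted output needs a different partitioner.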
Data can also be partitioned by key range, with the boundaries chosen from a sample of the input. There are three samplers to choose from:
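    // RandomSampler(freq, numSamples, maxSplitsSampled)
    InputSampler.RandomSampler<Text, Text> sampler = new InputSampler.RandomSampler<Text, Text>(0.5, 3000, 10);
    // IntervalSampler(freq, maxSplitsSampled)
    InputSampler.IntervalSampler<Text, Text> sampler2 = new InputSampler.IntervalSampler<Text, Text>(0.333, 10);
    // SplitSampler(numSamples)
    InputSampler.SplitSampler<Text, Text> sampler3 = new InputSampler.SplitSampler<Text, Text>(reduceNumber);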
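The idea behind sampling for a total-order partition can be sketched without Hadoop: sort the sampled keys, pick one fewer split points than there are reducers, and send each key to the range it falls in. The sketch below is a simplified model under those assumptions, not Hadoop's implementation; the class name TotalOrderSketch, the helper names, and the sample keys are all hypothetical.

```java
import java.util.Arrays;

// Simplified model of range partitioning driven by sampled keys.
final class TotalOrderSketch {

    // Sort the samples and take numPartitions - 1 evenly spaced split points.
    // Assumes samples.length >= numPartitions.
    static String[] splitPoints(String[] samples, int numPartitions) {
        String[] sorted = samples.clone();
        Arrays.sort(sorted);
        String[] points = new String[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            int index = (int) ((long) i * sorted.length / numPartitions);
            points[i - 1] = sorted[index];
        }
        return points;
    }

    // The partition is the number of split points that are <= key.
    static int getPartition(String key, String[] points) {
        int pos = Arrays.binarySearch(points, key);
        // Found at index i: the key equals split point i, so it goes to partition i + 1.
        // Not found: binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        String[] samples = {"m", "c", "x", "a", "t", "h", "p", "e", "k"}; // made-up sample
        String[] points = splitPoints(samples, 3);
        System.out.println("split points: " + Arrays.toString(points));
        for (String key : new String[] {"b", "h", "n", "z"}) {
            System.out.println(key + " -> partition " + getPartition(key, points));
        }
    }
}
```

Because every key in partition 0 sorts before every key in partition 1, and so on, concatenating the sorted reducer outputs in partition order gives a globally sorted result. The quality of the sample determines how evenly the reducers are loaded.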
Implementation and details
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
    import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

    public class TotalSortMR {

        @SuppressWarnings("deprecation")
        public static int runTotalSortJob(String[] args) throws Exception {
            Path inputPath = new Path(args[0]);
            Path outputPath = new Path(args[1]);
            Path partitionFile = new Path(args[2]);
            int reduceNumber = Integer.parseInt(args[3]);

            // Three types of sampler
            InputSampler.RandomSampler<Text, Text> sampler = new InputSampler.RandomSampler<Text, Text>(1, 3000, 10);
            InputSampler.IntervalSampler<Text, Text> sampler2 = new InputSampler.IntervalSampler<Text, Text>(0.333, 10);
            InputSampler.SplitSampler<Text, Text> sampler3 = new InputSampler.SplitSampler<Text, Text>(reduceNumber);

            // Job initialization
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf);
            job.setJobName("Total-sort");
            job.setJarByClass(TotalSortMR.class);
            job.setInputFormatClass(KeyValueTextInputFormat.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            job.setNumReduceTasks(reduceNumber);

            // Input and output paths of the job. The sampler reads the input,
            // so the input path has to be set before the partition file is written.
            FileInputFormat.setInputPaths(job, inputPath);
            FileOutputFormat.setOutputPath(job, outputPath);
            outputPath.getFileSystem(conf).delete(outputPath, true);

            // Use the total-order partitioner
            job.setPartitionerClass(TotalOrderPartitioner.class);
            // Partition file the partitioner reads. Job.getInstance copies conf,
            // so the setting must go into the job's own configuration.
            TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), partitionFile);
            // Sampler used to generate the partition file
            InputSampler.writePartitionFile(job, sampler);

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            System.exit(runTotalSortJob(args));
        }
    }
The default input format of a job is TextInputFormat, which produces key-value records: the key is the byte offset of each line within the file, and the value is the content of that line. The input format can be changed with:
job.setInputFormatClass(...)
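The difference between the default input format and the KeyValueTextInputFormat used in the job above is easiest to see on a small input. The following is a rough standalone sketch, not Hadoop code: the class InputRecordSketch and its helpers are hypothetical, and it assumes "\n" line endings and the default tab separator.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Models the record shapes produced by two input formats.
final class InputRecordSketch {

    // TextInputFormat style: key = byte offset of the line start, value = the line.
    static List<String> textRecords(String content) {
        List<String> records = new ArrayList<>();
        long offset = 0;
        for (String line : content.split("\n")) {
            records.add(offset + " => " + line);
            // Advance past the line's bytes plus the '\n' terminator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    // KeyValueTextInputFormat style: split each line at the first tab.
    // A line without a tab becomes the key, with an empty value.
    static String[] keyValue(String line) {
        int tab = line.indexOf('\t');
        return tab < 0
                ? new String[] {line, ""}
                : new String[] {line.substring(0, tab), line.substring(tab + 1)};
    }

    public static void main(String[] args) {
        String content = "k1\tv1\nk2\tv2"; // made-up input
        textRecords(content).forEach(System.out::println);
        for (String line : content.split("\n")) {
            String[] kv = keyValue(line);
            System.out.println("key=" + kv[0] + ", value=" + kv[1]);
        }
    }
}
```

For a total sort the second shape matters: the sampler and the partitioner work on the map output key, so the key has to be the field to sort by rather than a file offset.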
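In general, the mapper's output key and value types should be set explicitly (job.setMapOutputKeyClass and job.setMapOutputValueClass), because later steps such as sampling and partitioning rely on them.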
Learning Log---partitioner and samplers