Learning Log: Partitioners and Samplers

Source: Internet
Author: User

In MapReduce:

The shuffle phase sits between map and reduce, and it can be customized in three ways: custom sorting, custom partitioning, and custom grouping.


In MapReduce, the map output is a set of key-value pairs, and by default HashPartitioner is used to partition it across the reducers.
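To make the default behaviour concrete, here is a minimal sketch of hash partitioning in plain Java. It uses the same formula as Hadoop's HashPartitioner (hash code with the sign bit masked off, modulo the number of reducers), but the class name HashPartitionSketch, the sample keys, and the reducer count are made up for illustration. It works on ordinary Java objects, whose hashCode differs from that of Hadoop's Text, so the reducer numbers it prints will not match a real job.

```java
// Sketch only: mirrors the hash-partitioning formula without needing Hadoop on the classpath.
class HashPartitionSketch {

    // Mask off the sign bit so the result is never negative, then take the modulus.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"apple", "banana", "cherry", "apple"}; // hypothetical map output keys
        int numReduceTasks = 3;                                  // hypothetical reducer count
        for (String key : keys) {
            System.out.println(key + " -> reducer " + partitionFor(key, numReduceTasks));
        }
    }
}
```

Equal keys always land on the same reducer, but neighbouring keys are scattered, so the concatenated reducer outputs are not globally sorted. That is the gap the sampler-based partitioning described next is meant to fill.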

There are several other ways to partition, based on sampling the input keys. Three samplers are available:

    InputSampler.RandomSampler<Text, Text> sampler = new InputSampler.RandomSampler<Text, Text>(0.5, 3000, 10);
    InputSampler.IntervalSampler<Text, Text> sampler2 = new InputSampler.IntervalSampler<Text, Text>(0.333, 10);
    InputSampler.SplitSampler<Text, Text> sampler3 = new InputSampler.SplitSampler<Text, Text>(reduceNumber);
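How sampled keys become partition boundaries can be sketched in plain Java. This is a simplified illustration, not Hadoop's implementation: the class TotalOrderSketch and its methods are hypothetical names, keys are plain Strings, duplicate split points are not handled, and the sample is assumed to contain at least as many keys as there are partitions.

```java
import java.util.Arrays;

// Sketch only: shows the idea behind sampler-driven total-order partitioning.
class TotalOrderSketch {

    // Sort the sample and pick numPartitions - 1 evenly spaced keys as split points.
    // Assumes samples.length >= numPartitions.
    static String[] splitPoints(String[] samples, int numPartitions) {
        String[] sorted = samples.clone();
        Arrays.sort(sorted);
        String[] splits = new String[numPartitions - 1];
        float step = sorted.length / (float) numPartitions;
        for (int i = 1; i < numPartitions; i++) {
            splits[i - 1] = sorted[Math.round(step * i)];
        }
        return splits;
    }

    // Keys below splits[0] go to partition 0, keys in [splits[0], splits[1]) to partition 1, and so on.
    static int partitionFor(String key, String[] splits) {
        int pos = Arrays.binarySearch(splits, key);
        // Found at index pos: the key starts partition pos + 1.
        // Not found: binarySearch returns -(insertionPoint) - 1, and the insertion point is the partition.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    public static void main(String[] args) {
        String[] samples = {"m", "c", "x", "f", "t", "a"}; // hypothetical sampled keys
        String[] splits = splitPoints(samples, 3);         // 3 reducers -> 2 split points
        System.out.println("split points: " + Arrays.toString(splits));
        for (String key : new String[] {"b", "f", "k", "z"}) {
            System.out.println(key + " -> partition " + partitionFor(key, splits));
        }
    }
}
```

Because every key in partition i sorts before every key in partition i + 1, concatenating the reducer outputs in partition order gives a globally sorted result. The quality of the sample only affects how evenly the work is balanced.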

Implementation and details

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
    import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

    public class TotalSortMR {

        @SuppressWarnings("deprecation")
        public static int runTotalSortJob(String[] args) throws Exception {
            Path inputPath = new Path(args[0]);
            Path outputPath = new Path(args[1]);
            Path partitionFile = new Path(args[2]);
            int reduceNumber = Integer.parseInt(args[3]);

            // Three types of sampler; only the first one is used below
            InputSampler.RandomSampler<Text, Text> sampler = new InputSampler.RandomSampler<Text, Text>(1, 3000, 10);
            InputSampler.IntervalSampler<Text, Text> sampler2 = new InputSampler.IntervalSampler<Text, Text>(0.333, 10);
            InputSampler.SplitSampler<Text, Text> sampler3 = new InputSampler.SplitSampler<Text, Text>(reduceNumber);

            // Task initialization
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf);

            job.setJobName("Total-Sort");
            job.setJarByClass(TotalSortMR.class);
            job.setInputFormatClass(KeyValueTextInputFormat.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            job.setNumReduceTasks(reduceNumber);

            // Input and output paths of the job; the input path must be set
            // before sampling, because the sampler reads the job's input
            FileInputFormat.setInputPaths(job, inputPath);
            FileOutputFormat.setOutputPath(job, outputPath);
            outputPath.getFileSystem(conf).delete(outputPath, true);

            // Set the total-order partitioner class
            job.setPartitionerClass(TotalOrderPartitioner.class);
            // Partition file the partitioner refers to; it is set on the job's own
            // Configuration, because Job.getInstance(conf) works on a copy of conf
            TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), partitionFile);
            // Which sampler is used to write the partition file
            InputSampler.writePartitionFile(job, sampler);

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            System.exit(runTotalSortJob(args));
        }
    }

The default input format of a job is TextInputFormat, which produces key-value records: the key is the byte offset at which each line starts in the file, and the value is the content of that line. The input format can be changed with:

    job.setInputFormatClass(...)
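The difference between the default TextInputFormat and the KeyValueTextInputFormat used in the job above can be illustrated with a small plain-Java sketch. It only imitates how the two formats shape their records and does not call Hadoop. The class InputFormatSketch and its methods are hypothetical names, lines are assumed to end with a single '\n', and the separator is assumed to be a tab, which is KeyValueTextInputFormat's default.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch only: imitates the record shapes of two Hadoop input formats.
class InputFormatSketch {

    // TextInputFormat style: key = byte offset where the line starts, value = the line itself.
    static List<String[]> asOffsetAndLine(String content) {
        List<String[]> records = new ArrayList<>();
        long offset = 0;
        for (String line : content.split("\n")) {
            records.add(new String[] {Long.toString(offset), line});
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for the '\n'
        }
        return records;
    }

    // KeyValueTextInputFormat style: split at the first tab; with no tab,
    // the whole line becomes the key and the value is empty.
    static String[] asKeyValue(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new String[] {line, ""};
        }
        return new String[] {line.substring(0, tab), line.substring(tab + 1)};
    }

    public static void main(String[] args) {
        String content = "b\t2\na\t1\n"; // hypothetical two-line input file
        for (String[] rec : asOffsetAndLine(content)) {
            String[] kv = asKeyValue(rec[1]);
            System.out.println("offset key=" + rec[0] + " | key-value key=" + kv[0] + ", value=" + kv[1]);
        }
    }
}
```

For a total sort the second shape is the useful one: the sampler and the partitioner then work on the actual Text key of each line instead of on a byte offset.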

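In general, the map output key and value classes should be set explicitly, because later stages rely on them; here, for example, the sampler uses the map output key class when it writes the partition file.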
