Hadoop MapReduce Custom Sort with WritableComparable

Last Update:2014-12-26 Source: Internet

Author: User


This article is published on my blog.

Today I continue with the exercises. Last time I got a basic understanding of partitioning. Following the order of the steps (partition, sort, group, combine), today's example should cover sorting, so let's get started!

When it comes to sorting, look at the LongWritable type used by the WordCount example in the Hadoop source code. It implements the interface WritableComparable, defined as follows:

public interface WritableComparable<T> extends Writable, Comparable<T> {}

public interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

The Writable interface defines the write and readFields methods, which write to and read from a data stream, respectively. Comparable contributes the compareTo method, which defines the comparison. So Hadoop's built-in types know how to compare themselves, but how is the map output actually sorted when such a type is used as the key? To answer that, look at the source of the map task class MapTask. It contains an inner class MapOutputBuffer, and below its spill accounting comment there is a sorter field:

private final IndexedSorter sorter;

This field is initialized as follows:

sorter = ReflectionUtils.newInstance(job.getClass("map.sort.class", QuickSort.class, IndexedSorter.class), job);

The default, QuickSort, declares:

public void sort(final IndexedSortable s, int p, int r, final Progressable rep);
private static void sortInternal(final IndexedSortable s, int p, int r, final Progressable rep, int depth);

The IndexedSortable argument passed in is the current MapOutputBuffer, because MapOutputBuffer also implements IndexedSortable. QuickSort therefore compares entries with the compare method of the MapOutputBuffer class, as the following source shows:

public int compare(int i, int j) {
    final int ii = kvoffsets[i % kvoffsets.length];
    final int ij = kvoffsets[j % kvoffsets.length];
    // sort by partition
    if (kvindices[ii + PARTITION] != kvindices[ij + PARTITION]) {
        return kvindices[ii + PARTITION] - kvindices[ij + PARTITION];
    }
    // sort by key
    return comparator.compare(kvbuffer,
            kvindices[ii + KEYSTART],
            kvindices[ii + VALSTART] - kvindices[ii + KEYSTART],
            kvbuffer,
            kvindices[ij + KEYSTART],
            kvindices[ij + VALSTART] - kvindices[ij + KEYSTART]);
}
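The two-level ordering in this compare method can be tried outside Hadoop. The sketch below is a simplification under stated assumptions, not MapTask's code: MapRecord, its fields, and SPILL_ORDER are hypothetical names, records are plain objects rather than offsets into a byte buffer, and Integer.compare replaces the subtraction. It orders records by partition number first and by key within the same partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for one map output record; not a Hadoop class.
class MapRecord {
    final int partition;
    final long key;

    MapRecord(int partition, long key) {
        this.partition = partition;
        this.key = key;
    }

    // Partition first, then key: the same two-level order the compare method applies.
    static final Comparator<MapRecord> SPILL_ORDER = (a, b) -> {
        if (a.partition != b.partition) {
            return Integer.compare(a.partition, b.partition);
        }
        return Long.compare(a.key, b.key);
    };

    // Returns a sorted copy and leaves the input list untouched.
    static List<MapRecord> sorted(List<MapRecord> input) {
        List<MapRecord> copy = new ArrayList<>(input);
        copy.sort(SPILL_ORDER);
        return copy;
    }

    @Override
    public String toString() {
        return partition + ":" + key;
    }

    public static void main(String[] args) {
        // Made-up sample values, written as partition:key.
        List<MapRecord> records = Arrays.asList(
                new MapRecord(1, 5), new MapRecord(0, 9),
                new MapRecord(1, 2), new MapRecord(0, 3));
        System.out.println(sorted(records)); // [0:3, 0:9, 1:2, 1:5]
    }
}
```

Because the partition is compared first, all records bound for the same reducer end up next to each other, and only then does the key comparator decide the order inside that group.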

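The comparator that MapOutputBuffer's compare method uses is taken from the configuration property "mapred.output.key.comparator.class" when that property is set. Otherwise it falls back to the WritableComparator for the map output key class, as the source shows: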

public RawComparator getOutputKeyComparator() {
    Class<? extends RawComparator> theClass = getClass("mapred.output.key.comparator.class",
            null, RawComparator.class);
    if (theClass != null)
        return ReflectionUtils.newInstance(theClass, this);
    return WritableComparator.get(getMapOutputKeyClass().asSubclass(WritableComparable.class));
}
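The lookup follows a common pattern: use the configured comparator class if there is one, otherwise fall back to the key's natural order. The sketch below shows that pattern in plain Java. It is an illustration only: ComparatorLookup, DescendingLong, and the Map standing in for the job configuration are hypothetical, and it works on Long objects rather than on the raw bytes a RawComparator sees.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "configured comparator if present, else natural order".
class ComparatorLookup {
    // Property name copied from the source shown above; the Map is a stand-in for the job configuration.
    static final String KEY = "mapred.output.key.comparator.class";

    // Example of a user-supplied comparator: reverses the natural order of Long keys.
    static class DescendingLong implements Comparator<Long> {
        @Override
        public int compare(Long a, Long b) {
            return b.compareTo(a);
        }
    }

    @SuppressWarnings("unchecked")
    static Comparator<Long> resolve(Map<String, Class<?>> conf) {
        Class<?> theClass = conf.get(KEY);
        if (theClass != null) {
            try {
                // Instantiate the configured class through its no-argument constructor.
                return (Comparator<Long>) theClass.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot instantiate " + theClass, e);
            }
        }
        // Nothing configured: fall back to the key type's own compareTo.
        return Comparator.naturalOrder();
    }

    public static void main(String[] args) {
        Map<String, Class<?>> conf = new HashMap<>();
        System.out.println(resolve(conf).compare(1L, 2L) < 0); // true: natural order
        conf.put(KEY, DescendingLong.class);
        System.out.println(resolve(conf).compare(1L, 2L) > 0); // true: configured comparator
    }
}
```

In a real job the comparator class is normally set through Job.setSortComparatorClass rather than by writing the property by hand. When nothing is set, the key type's own compareTo decides the order, which is the path the rest of this article relies on.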

That is how the sort step ends up calling the key's own comparison method. We can now follow LongWritable's approach and implement a custom type with its own read, write, and compare methods. Writing the code helps it stick. Since this exercise is about sorting, we first prepare the data: two columns, to be sorted by the first column in ascending order and, where the first column ties, by the second column in ascending order:

1    of    2

First, the custom type SortAPI:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class SortAPI implements WritableComparable<SortAPI> {
    /** First column of the data. */
    public Long first;
    /** Second column of the data. */
    public Long second;

    // Hadoop creates keys through the no-argument constructor and then calls readFields.
    public SortAPI() {}

    public SortAPI(long first, long second) {
        this.first = first;
        this.second = second;
    }

    /**
     * The sort order is decided here: ascending by first, then ascending by second.
     * Long.compareTo is used instead of casting a long difference to int,
     * because the cast can truncate a large difference to 0 or flip its sign.
     */
    @Override
    public int compareTo(SortAPI o) {
        int cmp = this.first.compareTo(o.first);
        if (cmp != 0) {
            return cmp;
        }
        return this.second.compareTo(o.second);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(first);
        out.writeLong(second);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Read the fields in the same order in which write emitted them.
        this.first = in.readLong();
        this.second = in.readLong();
    }

    @Override
    public int hashCode() {
        return this.first.hashCode() + this.second.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (obj instanceof SortAPI) {
            SortAPI o = (SortAPI) obj;
            // Compare the Long values with equals; == would compare object references.
            return this.first.equals(o.first) && this.second.equals(o.second);
        }
        return false;
    }

    @Override
    public String toString() {
        return "first:" + this.first + "second:" + this.second;
    }
}

This type overrides compareTo(SortAPI o), write(DataOutput out), and readFields(DataInput in). Because the type defines a comparison, hashCode() and equals(Object obj) must be overridden as well; do not forget this! Two more points need attention. First, the fields must be read in readFields in the same order in which write wrote them. Second, compareTo(SortAPI o) returns an integer greater than 0, equal to 0, or less than 0 to mean greater than, equal to, or less than. The logic that decides whether two rows are equal, and how they differ when they are not, is worth reading through slowly.
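Both points can be checked without a cluster. The sketch below is a plain-Java stand-in, not the Hadoop type: PairKey, roundTrip, and castCompare are hypothetical names, and equals and hashCode are left out to keep it short. It writes a key to a byte array and reads it back in the same field order, then shows why a comparison written as a long subtraction cast to int is fragile compared with Long.compare.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the key type: same method shapes, no Hadoop dependency.
class PairKey implements Comparable<PairKey> {
    long first;
    long second;

    PairKey() {}

    PairKey(long first, long second) {
        this.first = first;
        this.second = second;
    }

    void write(DataOutput out) throws IOException {
        out.writeLong(first);   // written first ...
        out.writeLong(second);
    }

    void readFields(DataInput in) throws IOException {
        first = in.readLong();  // ... so it is read first
        second = in.readLong();
    }

    @Override
    public int compareTo(PairKey o) {
        int c = Long.compare(first, o.first);
        return c != 0 ? c : Long.compare(second, o.second);
    }

    // Subtraction followed by an int cast; kept only to show where it goes wrong.
    static int castCompare(long a, long b) {
        return (int) (a - b);
    }

    // Serializes the key to bytes and reads it back into a fresh object.
    static PairKey roundTrip(PairKey src) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            src.write(new DataOutputStream(bytes));
            PairKey copy = new PairKey();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PairKey copy = roundTrip(new PairKey(3, 7));
        System.out.println(copy.first + " " + copy.second); // 3 7
        long big = 1L << 32;
        System.out.println(castCompare(big, 0));            // 0, although big > 0
        System.out.println(Long.compare(big, 0) > 0);       // true
    }
}
```

If readFields read second before first, the round trip would silently swap the two columns, which is why the order matters more than it might seem.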

Here are the custom Mapper and Reducer classes and the main function:

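import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, SortAPI, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input line holds two tab-separated numbers.
        String[] fields = value.toString().split("\t");
        try {
            long first = Long.parseLong(fields[0]);
            long second = Long.parseLong(fields[1]);
            // The custom type is the map output key, so the framework sorts by it.
            context.write(new SortAPI(first, second), new LongWritable(1));
        } catch (Exception e) {
            // Malformed lines are reported and skipped.
            System.out.println(e.getMessage());
        }
    }
}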

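import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReduce extends Reducer<SortAPI, LongWritable, LongWritable, LongWritable> {
    @Override
    protected void reduce(SortAPI key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        // Keys arrive already sorted; write the two columns back out.
        context.write(new LongWritable(key.first), new LongWritable(key.second));
    }
}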

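import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class Test {
    static final String OUTPUT_DIR = "hdfs://hadoop-master:9000/sort/output/";
    static final String INPUT_DIR = "hdfs://hadoop-master:9000/sort/input/test.txt";

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, Test.class.getSimpleName());
        // Remove output left over from a previous run; the job fails if the directory exists.
        deleteOutputFile(OUTPUT_DIR);
        // 1. Set the input path
        FileInputFormat.setInputPaths(job, INPUT_DIR);
        // 2. Set the input format class
        job.setInputFormatClass(TextInputFormat.class);
        // 3. Set the custom mapper and its output key/value types
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(SortAPI.class);
        job.setMapOutputValueClass(LongWritable.class);
        // 4. Partition
        job.setPartitionerClass(HashPartitioner.class);
        job.setNumReduceTasks(1);
        // 5. Sort and group (the defaults are used)
        // 6. Set the custom reducer and the output key/value types
        job.setReducerClass(MyReduce.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(LongWritable.class);
        // 7. Set the output directory
        FileOutputFormat.setOutputPath(job, new Path(OUTPUT_DIR));
        // 8. Submit the job and wait for it to finish
        job.waitForCompletion(true);
    }

    static void deleteOutputFile(String path) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(new URI(INPUT_DIR), conf);
        Path target = new Path(path);
        if (fs.exists(target)) {
            fs.delete(target, true); // true = delete recursively
        }
    }
}

You can then run the job directly from Eclipse and view the results: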

1       of       2

The result is correct. What if the first column must instead be sorted in descending order and the second column in ascending order? Only compareTo(SortAPI o) needs to change:

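@Override
public int compareTo(SortAPI o) {
    // Swapping the operands reverses the order of the first column.
    int cmp = o.first.compareTo(this.first);
    if (cmp != 0) {
        return cmp;
    }
    // The second column stays ascending.
    return this.second.compareTo(o.second);
}

Running the job again with this change gives the following result: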

3       of       2

This is also the correct result for the new requirement.
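The two orderings can also be compared side by side in plain Java before touching the job. This is a minimal sketch under assumptions: OrderingDemo is a hypothetical class, the rows are made-up values rather than the article's test data, and a java.util.Comparator stands in for compareTo.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class; each row is long[]{first, second} with made-up values.
class OrderingDemo {
    // First column ascending, ties broken by second column ascending.
    static final Comparator<long[]> ASC_ASC = (a, b) -> {
        int c = Long.compare(a[0], b[0]);
        return c != 0 ? c : Long.compare(a[1], b[1]);
    };

    // First column descending (arguments swapped), ties broken by second column ascending.
    static final Comparator<long[]> DESC_ASC = (a, b) -> {
        int c = Long.compare(b[0], a[0]);
        return c != 0 ? c : Long.compare(a[1], b[1]);
    };

    // Sorts a copy of the rows and renders them as tab-separated lines.
    static String sortAndFormat(List<long[]> rows, Comparator<long[]> order) {
        List<long[]> copy = new ArrayList<>(rows);
        copy.sort(order);
        StringBuilder sb = new StringBuilder();
        for (long[] row : copy) {
            sb.append(row[0]).append('\t').append(row[1]).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<long[]> rows = Arrays.asList(
                new long[]{3, 3}, new long[]{1, 2}, new long[]{3, 1}, new long[]{2, 5});
        System.out.print(sortAndFormat(rows, ASC_ASC));  // rows 1 2, 2 5, 3 1, 3 3
        System.out.println("--");
        System.out.print(sortAndFormat(rows, DESC_ASC)); // rows 3 1, 3 3, 2 5, 1 2
    }
}
```

Reversing a column only takes swapping the two operands of that column's comparison. The tie-breaking column keeps its own direction.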

One small question is left open: when is the compareTo(SortAPI o) method called, and how many times is it called in total?

That is all for this time. Keep recording every bit of progress!


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
