Hadoop MapReduce Custom Sort with WritableComparable

Source: Internet
Author: User
Tags: comparable, hadoop, mapreduce

This article is published on my blog.

Today I continue with the exercises. Last time I got a basic understanding of partitioning. The steps run in the order partition, sort, group, combine, so today's example should be about sorting. Let's get started!

When it comes to sorting, look at the LongWritable type used in the WordCount example in the Hadoop source code. It implements the interface WritableComparable, which is defined as follows:

public interface WritableComparable<T> extends Writable, Comparable<T> {
}

public interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}
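To see what the two Writable methods are for, here is a minimal sketch that uses only java.io and has no Hadoop dependency. The class PairRecord and its roundTrip helper are hypothetical names made up for this illustration; the class only mirrors the shape of a Writable and does not implement Hadoop's interface. It writes two long fields to a byte stream and reads them back in the same order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a Writable; it does not implement Hadoop's interface.
class PairRecord {
    long first;
    long second;

    // Serialize the fields in a fixed order.
    void write(DataOutput out) throws IOException {
        out.writeLong(first);
        out.writeLong(second);
    }

    // Deserialize in exactly the order the fields were written.
    void readFields(DataInput in) throws IOException {
        first = in.readLong();
        second = in.readLong();
    }

    // Round-trip helper: write a record to bytes, then read it into a fresh object.
    static PairRecord roundTrip(PairRecord src) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            src.write(new DataOutputStream(bytes));
            PairRecord copy = new PairRecord();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If readFields read the fields in a different order than write emitted them, the copy would come back with the two values swapped, which is why the order matters.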

The Writable interface defines the write and readFields methods, which write to and read from a data stream, respectively. Comparable contributes the compareTo method, which defines the comparison. So Hadoop's built-in types know how to compare themselves, but how is the sort actually carried out when such a type is used as the map output key? To answer this, look at the source of the map task class MapTask. It contains an inner class MapOutputBuffer, and below the "spill accounting" comment there is a sorter field:

private final IndexedSorter sorter;

This field is initialized as follows:

sorter = ReflectionUtils.newInstance(job.getClass("map.sort.class", QuickSort.class, IndexedSorter.class), job);

QuickSort:

public void sort(final IndexedSortable s, int p, int r, final Progressable rep);
private static void sortInternal(final IndexedSortable s, int p, int r, final Progressable rep, int depth);

Here the IndexedSortable argument passed in is the current MapOutputBuffer, because MapOutputBuffer also implements IndexedSortable. QuickSort therefore compares elements through the compare method of the MapOutputBuffer class, as the following source shows:

public int compare(int i, int j) {
  final int ii = kvoffsets[i % kvoffsets.length];
  final int ij = kvoffsets[j % kvoffsets.length];
  // sort by partition
  if (kvindices[ii + PARTITION] != kvindices[ij + PARTITION]) {
    return kvindices[ii + PARTITION] - kvindices[ij + PARTITION];
  }
  // sort by key
  return comparator.compare(kvbuffer,
      kvindices[ii + KEYSTART],
      kvindices[ii + VALSTART] - kvindices[ii + KEYSTART],
      kvbuffer,
      kvindices[ij + KEYSTART],
      kvindices[ij + VALSTART] - kvindices[ij + KEYSTART]);
}
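The two-level rule in this compare method (partition first, key second) can be sketched in plain Java. This is only an illustration under simplifying assumptions: the Rec class and the sorted helper are hypothetical, and records are held as objects rather than as offsets into a byte buffer as in MapOutputBuffer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PartitionThenKeySort {
    // Hypothetical in-memory record: a partition number plus a key.
    static final class Rec {
        final int partition;
        final long key;

        Rec(int partition, long key) {
            this.partition = partition;
            this.key = key;
        }

        @Override
        public String toString() {
            return partition + ":" + key;
        }
    }

    // Compare by partition first; only when partitions match, compare keys.
    static final Comparator<Rec> ORDER = (a, b) -> {
        if (a.partition != b.partition) {
            return Integer.compare(a.partition, b.partition);
        }
        return Long.compare(a.key, b.key);
    };

    // Returns a sorted copy and leaves the input list untouched.
    static List<Rec> sorted(List<Rec> input) {
        List<Rec> copy = new ArrayList<>(input);
        copy.sort(ORDER);
        return copy;
    }
}
```

With this ordering, all records of one partition end up next to each other, and within a partition they are ordered by key.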

The comparator used in this method is determined by the configuration property "mapred.output.key.comparator.class". If that property is not set, the WritableComparator for the map output key class is used, as the source shows:

public RawComparator getOutputKeyComparator() {
  Class<? extends RawComparator> theClass = getClass("mapred.output.key.comparator.class",
      null, RawComparator.class);
  if (theClass != null)
    return ReflectionUtils.newInstance(theClass, this);
  return WritableComparator.get(getMapOutputKeyClass().asSubclass(WritableComparable.class));
}
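The lookup pattern here (use the configured comparator class if one is named, otherwise fall back to the key type's own ordering) can be sketched without Hadoop. Everything in this sketch is an assumption made for illustration: the configuration is a plain Map, the key "key.comparator.class" is a made-up name, and DescendingLong is a hypothetical comparator.

```java
import java.util.Comparator;
import java.util.Map;

class ComparatorLookup {
    // Hypothetical custom comparator that reverses the natural order of Long keys.
    public static class DescendingLong implements Comparator<Long> {
        public DescendingLong() {
        }

        @Override
        public int compare(Long a, Long b) {
            return Long.compare(b, a);
        }
    }

    // Returns the comparator named under the (made-up) config key,
    // or the natural ordering of Long when the key is not set.
    @SuppressWarnings("unchecked")
    static Comparator<Long> resolve(Map<String, String> conf) {
        String className = conf.get("key.comparator.class");
        if (className == null) {
            return Comparator.naturalOrder();
        }
        try {
            return (Comparator<Long>) Class.forName(className)
                    .asSubclass(Comparator.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create comparator " + className, e);
        }
    }
}
```

The fallback branch plays the role that the key type's compareTo plays for a custom key: when nothing is configured, the type's own ordering decides.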

This is how sorting and the comparison method are tied together. We can now follow LongWritable's approach and implement a custom type with its own read, write, and compare methods. Writing the code helps it stick. Since this is about sorting, we need some data: the input below has two columns, to be sorted by the first column in ascending order and, for equal first values, by the second column in ascending order:

1    of    2

First, the custom type SortAPI:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class SortAPI implements WritableComparable<SortAPI> {
    /** First column of data */
    public Long first;

    /** Second column of data */
    public Long second;

    public SortAPI() {
    }

    public SortAPI(long first, long second) {
        this.first = first;
        this.second = second;
    }

    /**
     * The sort order is decided here: ascending by first, then ascending by second.
     * Long.compare is used instead of casting (this.first - o.first) to int,
     * because the cast can overflow for large differences.
     */
    @Override
    public int compareTo(SortAPI o) {
        int cmp = Long.compare(this.first, o.first);
        if (cmp != 0) {
            return cmp;
        }
        return Long.compare(this.second, o.second);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(first);
        out.writeLong(second);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.first = in.readLong();
        this.second = in.readLong();
    }

    @Override
    public int hashCode() {
        return this.first.hashCode() + this.second.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (obj instanceof SortAPI) {
            SortAPI o = (SortAPI) obj;
            // The fields are boxed Long values, so compare with equals(), not ==.
            return this.first.equals(o.first) && this.second.equals(o.second);
        }
        return false;
    }

    @Override
    public String toString() {
        return "first:" + this.first + "second:" + this.second;
    }
}

This type implements compareTo(SortAPI o), write(DataOutput out), and readFields(DataInput in). Because it defines a comparison, it must also override hashCode() and equals(Object obj); do not forget this! Two more points need attention. First, in write and readFields, the fields must be read in the same order in which they are written. Second, compareTo(SortAPI o) returns an integer greater than 0, equal to 0, or less than 0 to mean greater than, equal to, or less than. How two rows are judged equal or unequal is worth reading through slowly in the code.

Here are the custom Mapper and Reducer classes and the main function:
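One pitfall is worth noting when writing compareTo for long fields: computing a long difference and casting it to int keeps only the low 32 bits, so the sign of the result can be wrong for large values. The following small sketch contrasts that approach with Long.compare from the standard library. The class and method names are hypothetical and exist only for this demonstration.

```java
class CompareOverflowDemo {
    // Subtraction-and-cast style: the long difference is truncated to its low 32 bits.
    static int bySubtraction(long a, long b) {
        return (int) (a - b);
    }

    // Overflow-safe alternative from the standard library.
    static int byLongCompare(long a, long b) {
        return Long.compare(a, b);
    }
}
```

For example, bySubtraction(4294967296L, 0L) returns 0, which claims the two values are equal, and bySubtraction(2147483648L, 0L) returns a negative number although the first argument is larger. byLongCompare returns a positive number in both cases.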

// Imports used by the three classes below
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class MyMapper extends Mapper<LongWritable, Text, SortAPI, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Each input line holds two tab-separated numbers.
        String[] splied = value.toString().split("\t");
        try {
            long first = Long.parseLong(splied[0]);
            long second = Long.parseLong(splied[1]);
            // The custom type is the map output key, so the framework sorts by it.
            context.write(new SortAPI(first, second), new LongWritable(1));
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}

public class MyReduce extends Reducer<SortAPI, LongWritable, LongWritable, LongWritable> {
    @Override
    protected void reduce(SortAPI key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
        // Keys arrive already sorted; write the two columns back out.
        context.write(new LongWritable(key.first), new LongWritable(key.second));
    }
}

public class Test {
    static final String OUTPUT_DIR = "hdfs://hadoop-master:9000/sort/output/";
    static final String INPUT_DIR = "hdfs://hadoop-master:9000/sort/input/test.txt";

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, Test.class.getSimpleName());
        deleteOutputFile(OUTPUT_DIR);
        // 1. Set the input directory
        FileInputFormat.setInputPaths(job, INPUT_DIR);
        // 2. Set the input format class
        job.setInputFormatClass(TextInputFormat.class);
        // 3. Set the custom mapper and its key/value types
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(SortAPI.class);
        job.setMapOutputValueClass(LongWritable.class);
        // 4. Partition
        job.setPartitionerClass(HashPartitioner.class);
        job.setNumReduceTasks(1);
        // 5. Sort and group (defaults)
        // 6. Set the custom reducer and its key/value types
        job.setReducerClass(MyReduce.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(LongWritable.class);
        // 7. Set the output directory
        FileOutputFormat.setOutputPath(job, new Path(OUTPUT_DIR));
        // 8. Submit the job
        job.waitForCompletion(true);
    }

    static void deleteOutputFile(String path) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(new URI(INPUT_DIR), conf);
        if (fs.exists(new Path(path))) {
            // Delete recursively so that a previous output directory does not block the job.
            fs.delete(new Path(path), true);
        }
    }
}

You can then run the job directly from Eclipse and view the results:

1       of       2

The result is correct. What if the first column must be in descending order and the second column in ascending order? Only compareTo(SortAPI o) needs to change:

    @Override
    public int compareTo(SortAPI o) {
        // Swapping the operands reverses the order of the first column.
        int cmp = Long.compare(o.first, this.first);
        if (cmp != 0) {
            return cmp;
        }
        return Long.compare(this.second, o.second);
    }

Save and run again; the result is:

3       of       2

This is also correct for the new requirement.

One small question is left for the reader: when is this compareTo(SortAPI o) method called, and how many times in total?

That's all for this time. Keep recording every little bit!
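The "first column descending, second column ascending" rule can also be tried out locally without a cluster. The sketch below is plain Java and makes several assumptions: each row is a long[] of length 2, the class and method names are hypothetical, and the sample rows in the usage note are made up for illustration rather than taken from the job's input file.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class DescendingFirstDemo {
    // First column descending, second column ascending.
    static final Comparator<long[]> ORDER = (a, b) -> {
        int cmp = Long.compare(b[0], a[0]); // swapped operands reverse the order
        if (cmp != 0) {
            return cmp;
        }
        return Long.compare(a[1], b[1]);
    };

    // Sorts a copy of the rows and formats each one as "first second".
    static List<String> sortRows(long[][] rows) {
        List<long[]> copy = new ArrayList<>();
        for (long[] row : rows) {
            copy.add(row);
        }
        copy.sort(ORDER);
        List<String> out = new ArrayList<>();
        for (long[] row : copy) {
            out.add(row[0] + " " + row[1]);
        }
        return out;
    }
}
```

For the made-up rows {1,2}, {3,2}, {1,1}, {3,0}, {2,2}, sortRows returns the rows in the order "3 0", "3 2", "2 2", "1 1", "1 2".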

