Hadoop job submission: custom sorting and grouping

Source: Internet
Author: User

The existing data is as follows:

3 3
3 2
3 1
2 2
2 1
1 1

The requirement is:

Sort by the first column in ascending order; when two rows have the same first column, sort them by the second column in ascending order.


Hadoop's default sort only compares keys, which here means the first column; the value cannot take part in the sort.

A custom sort order is therefore required.

Solution idea:

Define a custom data type that wraps the original key and value together.

Use this data type as the key, so that comparing two keys takes both the first column and the second column into account.
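The idea can be tried out without Hadoop. The following is a minimal plain-Java sketch, not the article's Hadoop class: PairKey and CompositeKeySortSketch are hypothetical names introduced only for illustration. It sorts the sample rows by a key that holds both columns.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the composite key: both columns live in one
// object, so comparing keys compares the first column, then the second.
final class PairKey implements Comparable<PairKey> {
    final long first;
    final long second;

    PairKey(long first, long second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public int compareTo(PairKey o) {
        // The first column decides; the second column only breaks ties.
        int byFirst = Long.compare(first, o.first);
        return byFirst != 0 ? byFirst : Long.compare(second, o.second);
    }

    @Override
    public String toString() {
        return first + " " + second;
    }
}

final class CompositeKeySortSketch {
    // Returns the article's sample rows sorted by (first, second), ascending.
    static List<PairKey> sortedSample() {
        long[][] rows = {{3, 3}, {3, 2}, {3, 1}, {2, 2}, {2, 1}, {1, 1}};
        List<PairKey> keys = new ArrayList<>();
        for (long[] row : rows) {
            keys.add(new PairKey(row[0], row[1]));
        }
        Collections.sort(keys);
        return keys;
    }

    public static void main(String[] args) {
        for (PairKey key : sortedSample()) {
            System.out.println(key);
        }
    }
}
```

Because the ordering lives in the key itself, anything that sorts keys gets the two-column order for free, which is the property the Hadoop shuffle relies on.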


The custom data type NewK2 is as follows:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

// To define a custom sort order, the key must implement WritableComparable,
// with the class itself as the generic parameter.
public class NewK2 implements WritableComparable<NewK2> {
    // The first column and the second column of one row.
    long first;
    long second;

    public NewK2() {}

    public NewK2(long first, long second) {
        this.first = first;
        this.second = second;
    }

    // Deserialization: read the fields in the order write() emits them.
    @Override
    public void readFields(DataInput in) throws IOException {
        this.first = in.readLong();
        this.second = in.readLong();
    }

    // Serialization.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(first);
        out.writeLong(second);
    }

    // Called automatically when k2 is sorted. Ascending by the first column;
    // when the first column is equal, ascending by the second column.
    // For descending order, swap the roles of this object and o.
    @Override
    public int compareTo(NewK2 o) {
        if (this.first != o.first) {
            return Long.compare(this.first, o.first);
        }
        return Long.compare(this.second, o.second);
    }

    // hashCode and equals are overridden to agree with compareTo.
    @Override
    public int hashCode() {
        return Long.valueOf(first).hashCode() + Long.valueOf(second).hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof NewK2)) {
            return false;
        }
        NewK2 oK2 = (NewK2) obj;
        return (this.first == oK2.first) && (this.second == oK2.second);
    }
}
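One detail of compareTo is worth a closer look. A comparison written as a cast subtraction, (int) (a - b), is only correct while the difference fits in an int. The sketch below uses hypothetical helper names (they are not part of the article's classes) to contrast that form with Long.compare from the standard library.

```java
final class CompareOverflowSketch {
    // Comparison by cast subtraction: the cast keeps only the low 32 bits
    // of the difference, so large differences give a wrong sign or zero.
    static int bySubtraction(long a, long b) {
        return (int) (a - b);
    }

    // Overflow-safe comparison from the standard library.
    static int byLongCompare(long a, long b) {
        return Long.compare(a, b);
    }

    public static void main(String[] args) {
        long big = 1L << 32; // 4294967296, does not fit in an int

        System.out.println(bySubtraction(big, 0L)); // prints 0: reported as "equal"
        System.out.println(byLongCompare(big, 0L)); // prints 1: big is greater

        // 3000000000 - 0 wraps to a negative int, so the order is inverted.
        System.out.println(bySubtraction(3000000000L, 0L) < 0); // prints true
    }
}
```

For small values such as the sample data both forms agree; the difference only shows up with column values far apart, which is why Long.compare is the safer default.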


MyMapper class code:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, NewK2, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split a tab-separated line into its two columns.
        final String[] splited = value.toString().split("\t");
        // For a row such as 3,1 the two columns are assigned to the
        // first and second fields of the k2 object respectively.
        final NewK2 k2 = new NewK2(Long.parseLong(splited[0]), Long.parseLong(splited[1]));
        final LongWritable v2 = new LongWritable(Long.parseLong(splited[1]));
        // Emit k2 as the key so that sorting calls NewK2.compareTo,
        // which contains our own sort order.
        context.write(k2, v2);
    }
}


MyReducer class code:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<NewK2, LongWritable, LongWritable, LongWritable> {
    @Override
    protected void reduce(NewK2 k2, Iterable<LongWritable> v2s, Context context)
            throws IOException, InterruptedException {
        // Both output columns come from the key; the values are not used.
        context.write(new LongWritable(k2.first), new LongWritable(k2.second));
    }
}

The MySubmit class does not need to change; it stays the same as before.

Running the job gives output sorted by the first column and then by the second column, as required.



Now suppose the business requirement changes again: among the rows that share the same first column, keep only the one with the smallest value in the second column.

The result should then be:

1 1
2 1
3 1

However, the key is now the custom data type.

Hadoop's default grouping policy puts records into the same group only when their keys are equal.

Two NewK2 objects are equal only when both their first and second fields are equal, so each distinct row forms its own group.

A custom grouping policy is therefore needed.
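The effect of the grouping policy can be simulated in plain Java. The sketch below is an illustration under stated assumptions, not Hadoop's implementation: GroupingSketch and its members are hypothetical names, rows are plain long[] pairs, and each "reduce call" is modelled as keeping the first row of a run of rows that the grouping comparator treats as equal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class GroupingSketch {
    // Groups by both columns, like comparing whole keys.
    static final Comparator<long[]> BY_BOTH = (a, b) -> {
        int c = Long.compare(a[0], b[0]);
        return c != 0 ? c : Long.compare(a[1], b[1]);
    };

    // Groups by the first column only.
    static final Comparator<long[]> BY_FIRST = (a, b) -> Long.compare(a[0], b[0]);

    // The sample rows, already sorted by (first, second).
    static List<long[]> sortedSample() {
        return Arrays.asList(
                new long[] {1, 1}, new long[] {2, 1}, new long[] {2, 2},
                new long[] {3, 1}, new long[] {3, 2}, new long[] {3, 3});
    }

    // Keeps the first row of every run of consecutive rows that the
    // grouping comparator reports as equal (one row per simulated group).
    static List<long[]> firstOfEachGroup(List<long[]> sortedRows, Comparator<long[]> grouping) {
        List<long[]> out = new ArrayList<>();
        long[] previous = null;
        for (long[] row : sortedRows) {
            if (previous == null || grouping.compare(previous, row) != 0) {
                out.add(row); // a new group starts here
            }
            previous = row;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(firstOfEachGroup(sortedSample(), BY_BOTH).size());  // prints 6
        for (long[] row : firstOfEachGroup(sortedSample(), BY_FIRST)) {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```

Because the rows arrive sorted by the second column within each first-column value, the first row of every group is the one with the smallest second column, which is exactly the row the changed requirement asks for.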


The custom grouping class is as follows:

import org.apache.hadoop.io.RawComparator;
import org.apache.hadoop.io.WritableComparator;

// A custom grouping class must implement RawComparator,
// with the key type (NewK2) as the generic parameter.
public class MyGroupingComparator implements RawComparator<NewK2> {
    // Object comparison: two NewK2 objects are considered equal
    // as soon as their first fields are equal.
    @Override
    public int compare(NewK2 o1, NewK2 o2) {
        return Long.compare(o1.first, o2.first);
    }

    /**
     * Byte-level comparison of two serialized keys.
     *
     * @param arg0 the first byte array taking part in the comparison
     * @param arg1 the start position of the key in the first byte array
     * @param arg2 the length in bytes of the key in the first byte array
     * @param arg3 the second byte array taking part in the comparison
     * @param arg4 the start position of the key in the second byte array
     * @param arg5 the length in bytes of the key in the second byte array
     */
    @Override
    public int compare(byte[] arg0, int arg1, int arg2, byte[] arg3, int arg4, int arg5) {
        // Compare only the first 8 bytes, i.e. the serialized first field;
        // the second field is ignored.
        return WritableComparator.compareBytes(arg0, arg1, 8, arg3, arg4, 8);
    }
}
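Why 8 bytes? The key's write method emits two long values, each 8 bytes with the high byte first, so the first 8 bytes of a serialized key are exactly the first column. The sketch below shows this with the standard library only. RawPrefixCompareSketch, serialize and comparePrefix are hypothetical names; comparePrefix is a simple stand-in for a byte-wise comparison and is not Hadoop's implementation. ByteBuffer is used because its default big-endian layout matches what DataOutput.writeLong produces.

```java
import java.nio.ByteBuffer;

final class RawPrefixCompareSketch {
    // Lays out a key as two big-endian longs, 16 bytes in total.
    static byte[] serialize(long first, long second) {
        return ByteBuffer.allocate(16).putLong(first).putLong(second).array();
    }

    // Lexicographic comparison over unsigned bytes, looking only at
    // `length` bytes starting at s1 in b1 and at s2 in b2.
    static int comparePrefix(byte[] b1, int s1, byte[] b2, int s2, int length) {
        for (int i = 0; i < length; i++) {
            int x = b1[s1 + i] & 0xff;
            int y = b2[s2 + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        byte[] a = serialize(3, 1);
        byte[] b = serialize(3, 3);
        byte[] c = serialize(2, 1);

        System.out.println(a.length);                           // prints 16
        System.out.println(comparePrefix(a, 0, b, 0, 8) == 0);  // prints true: same first column
        System.out.println(comparePrefix(a, 0, c, 0, 8) == 0);  // prints false: different first column
        System.out.println(comparePrefix(a, 0, b, 0, 16) == 0); // prints false: second column differs
    }
}
```

For grouping, what matters is whether the comparison returns zero, so comparing the 8-byte prefix is enough to decide that two keys belong to the same group.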

Then register the grouping policy in the MySubmit code:

// 1.4 TODO sorting, grouping
job.setGroupingComparatorClass(MyGroupingComparator.class);

Run the program again; the output matches the expected result listed earlier, with one row per distinct first-column value.




