MapReduce Learning 4----Custom partitioning, custom sorting, custom components

Source: Internet
Author: User

1. Map Task Processing

1.3 Partition the output <key, value> pairs.

The purpose of partitioning is to send all <k,v> pairs that belong to the same category to the same reducer task.

// Requires: java.util.HashMap, java.util.Map,
//           org.apache.hadoop.io.Text, org.apache.hadoop.io.LongWritable,
//           org.apache.hadoop.mapreduce.Partitioner
public static class MyPartitioner extends Partitioner<Text, LongWritable> {
    // Maps each key to the index of the reducer task that should receive it.
    private static final Map<String, Integer> map = new HashMap<String, Integer>();
    static {
        map.put("gz1", 0);
        map.put("gz2", 0);
        map.put("sz1", 1);
        map.put("sz2", 1);
    }

    /**
     * Operates on the <k2,v2> output of the mapper task.
     * The number of distinct values that getPartition returns is the
     * number of reducer tasks that are needed.
     *
     * "gz1" and "gz2" both return 0, so they are sent to the same reducer task.
     * Their k2 values differ, however, so they are grouped separately as
     *   <gz1,123>
     *   <gz2,234>
     * and are handled by separate calls to the reduce function.
     */
    @Override
    public int getPartition(Text key, LongWritable value, int numPartitions) {
        // Keys missing from the map are not handled here, as in the original.
        return map.get(key.toString()).intValue();
    }
}


Set the partitioner on the job:
wcjob.setPartitionerClass(MyPartitioner.class);
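
To make the routing concrete, here is a small plain-Java simulation that runs without Hadoop on the classpath. It is only a sketch under assumptions: the class name PartitionDemo, the helpers partitionOf and route, and the lowercase sample keys are made up for illustration, and in a real job the framework performs this routing itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionDemo {
    // Same idea as the partitioner's lookup table: key -> reducer index.
    static final Map<String, Integer> TABLE = new HashMap<>();
    static {
        TABLE.put("gz1", 0);
        TABLE.put("gz2", 0);
        TABLE.put("sz1", 1);
        TABLE.put("sz2", 1);
    }

    // Hypothetical stand-in for getPartition(key, value, numPartitions).
    static int partitionOf(String key) {
        return TABLE.get(key);
    }

    // Routes each key to its partition; keys keep their arrival order inside a partition.
    static Map<Integer, List<String>> route(List<String> keys) {
        Map<Integer, List<String>> partitions = new TreeMap<>();
        for (String key : keys) {
            partitions.computeIfAbsent(partitionOf(key), p -> new ArrayList<>()).add(key);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> mapperOutputKeys = Arrays.asList("gz1", "sz1", "gz2", "sz2");
        // prints {0=[gz1, gz2], 1=[sz1, sz2]}
        System.out.println(route(mapperOutputKeys));
    }
}
```

Two partition indexes come back (0 and 1), which is why two reducer tasks are needed. Within partition 0, gz1 and gz2 remain distinct keys, so each would still get its own reduce call.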

Custom Sorting

Sorting is performed on k2, so to customize the sort order you need to define your own k2 type:

// Requires: java.io.DataInput, java.io.DataOutput, java.io.IOException,
//           org.apache.hadoop.io.WritableComparable
private static class MyNewKey implements WritableComparable<MyNewKey> {
    long firstNum;
    long secondNum;

    public MyNewKey() {
    }

    public MyNewKey(long first, long second) {
        firstNum = first;
        secondNum = second;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(firstNum);
        out.writeLong(secondNum);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        firstNum = in.readLong();
        secondNum = in.readLong();
    }

    /*
     * compareTo is called when keys are sorted.
     */
    @Override
    public int compareTo(MyNewKey anotherKey) {
        // Long.compare avoids the overflow that casting a long difference to int can cause.
        int cmp = Long.compare(firstNum, anotherKey.firstNum);
        if (cmp != 0) {
            // The first columns differ, so order by the first column.
            return cmp;
        }
        // The first columns are equal, so order by the second column.
        return Long.compare(secondNum, anotherKey.secondNum);
    }
}
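
The ordering rule can be tried out in plain Java, without Hadoop. The sketch below is an illustration only: SortDemo, Pair, sample, sorted and castCompare are hypothetical names, and Pair is a stand-in that keeps the two-column comparison but drops the Writable serialization. It also shows why comparing by subtracting two longs and casting the result to int is risky.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortDemo {
    // Hypothetical stand-in for the custom key: same ordering rule, no Hadoop interfaces.
    static class Pair implements Comparable<Pair> {
        final long first;
        final long second;

        Pair(long first, long second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public int compareTo(Pair other) {
            // Order by the first column, then by the second column.
            int cmp = Long.compare(first, other.first);
            return cmp != 0 ? cmp : Long.compare(second, other.second);
        }

        @Override
        public String toString() {
            return "(" + first + "," + second + ")";
        }
    }

    // Unsorted sample input, made up for this example.
    static List<Pair> sample() {
        return Arrays.asList(new Pair(3, 3), new Pair(3, 2), new Pair(3, 1),
                             new Pair(2, 2), new Pair(2, 1), new Pair(1, 1));
    }

    // Returns a sorted copy, using Pair.compareTo.
    static List<Pair> sorted(List<Pair> input) {
        List<Pair> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    // Subtract-and-cast comparison, shown only to illustrate the truncation risk.
    static int castCompare(long a, long b) {
        return (int) (a - b);
    }

    public static void main(String[] args) {
        // prints [(1,1), (2,1), (2,2), (3,1), (3,2), (3,3)]
        System.out.println(sorted(sample()));
        // 4294967296 is 2^32: the cast keeps only the low 32 bits, so this prints 0 ("equal")
        System.out.println(castCompare(4294967296L, 0L));
        // Long.compare reports the correct order and prints 1
        System.out.println(Long.compare(4294967296L, 0L));
    }
}
```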

Custom Grouping

To group by the new key type, we also need to define a custom grouping rule:

(1) Write a new grouping comparator class:

// Requires: org.apache.hadoop.io.RawComparator, org.apache.hadoop.io.WritableComparator
private static class MyGroupingComparator implements RawComparator<MyNewKey> {
    /*
     * Basic grouping rule: group by the first column, firstNum.
     */
    @Override
    public int compare(MyNewKey key1, MyNewKey key2) {
        return Long.compare(key1.firstNum, key2.firstNum);
    }

    /*
     * @param b1 the first byte array taking part in the comparison
     * @param s1 the starting position of the key within the first byte array
     * @param l1 the length, in bytes, of the key within the first byte array
     *
     * @param b2 the second byte array taking part in the comparison
     * @param s2 the starting position of the key within the second byte array
     * @param l2 the length, in bytes, of the key within the second byte array
     */
    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        // Compare only the first 8 bytes of each key, i.e. the serialized firstNum.
        return WritableComparator.compareBytes(b1, s1, 8, b2, s2, 8);
    }
}

From the code we can see that we have defined a grouping comparator, MyGroupingComparator, which implements the RawComparator interface; RawComparator in turn extends the Comparator interface. Here are the definitions of these two interfaces:

First, the definition of the RawComparator interface:

public interface RawComparator<T> extends Comparator<T> {
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}

Next, the definition of the Comparator interface:

public interface Comparator<T> {
    int compare(T o1, T o2);
    boolean equals(Object obj);
}

MyGroupingComparator implements the compare() methods declared in both interfaces: the compare() method in RawComparator is a byte-based comparison, and the compare() method in Comparator is an object-based comparison.

The byte-based comparison method takes six parameters, which can be confusing at first glance:

Params:

* @param arg0 the first byte array taking part in the comparison
* @param arg1 the starting position of the key within the first byte array
* @param arg2 the length, in bytes, of the key within the first byte array
*
* @param arg3 the second byte array taking part in the comparison
* @param arg4 the starting position of the key within the second byte array
* @param arg5 the length, in bytes, of the key within the second byte array

MyNewKey holds two long fields, and each long occupies 8 bytes when serialized, so a key is 16 bytes in total. Because grouping compares only the first column, only the first 8 bytes of each key are read, which is why the length passed to compareBytes is 8 rather than the full key length.
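
The 8-byte reasoning can be checked with a small plain-Java sketch. Everything named here is an assumption for illustration: RawCompareDemo and serialize are made up, and the local compareBytes is a hand-written stand-in that compares two byte ranges lexicographically with bytes treated as unsigned, not the Hadoop method itself. ByteBuffer.putLong writes 8 bytes, high byte first, which is the same layout DataOutput.writeLong produces.

```java
import java.nio.ByteBuffer;

public class RawCompareDemo {
    // Serializes the two columns as 8 bytes each, high byte first (16 bytes total).
    static byte[] serialize(long first, long second) {
        return ByteBuffer.allocate(16).putLong(first).putLong(second).array();
    }

    // Hypothetical stand-in for a byte-range comparison:
    // compares l1 bytes of b1 starting at s1 with l2 bytes of b2 starting at s2.
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int end1 = s1 + l1;
        int end2 = s2 + l2;
        for (int i = s1, j = s2; i < end1 && j < end2; i++, j++) {
            int a = b1[i] & 0xff; // treat bytes as unsigned
            int b = b2[j] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2;
    }

    public static void main(String[] args) {
        byte[] k1 = serialize(3, 1);
        byte[] k2 = serialize(3, 2);
        byte[] k3 = serialize(2, 2);

        System.out.println(k1.length);                                // 16
        // Same first column: the first 8 bytes are identical, so the keys group together.
        System.out.println(compareBytes(k1, 0, 8, k2, 0, 8) == 0);    // true
        // Comparing all 16 bytes would also look at the second column.
        System.out.println(compareBytes(k1, 0, 16, k2, 0, 16) == 0);  // false
        // Different first column: the keys fall into different groups.
        System.out.println(compareBytes(k1, 0, 8, k3, 0, 8) == 0);    // false
    }
}
```

For grouping, only the "equal or not equal" result matters, so an unsigned byte comparison of the first column is sufficient here.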

(2) Add the settings for the grouping rule:

// Set the custom grouping rule
job.setGroupingComparatorClass(MyGroupingComparator.class);
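
As a rough picture of what the grouping rule changes, the plain-Java sketch below walks over keys that are already sorted and starts a new group whenever the first column changes. It is an assumption-laden simulation rather than the framework's actual code path: GroupingDemo, sample, group and describe are hypothetical names, each key is modelled as a two-element long array, and the second column is treated as the value handed to the reduce call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GroupingDemo {
    // Keys already sorted by (first, second); each long[] is {first, second}.
    static List<long[]> sample() {
        return Arrays.asList(new long[] {1, 1}, new long[] {2, 1}, new long[] {2, 2},
                             new long[] {3, 1}, new long[] {3, 2}, new long[] {3, 3});
    }

    // Puts consecutive keys with an equal first column into the same group.
    // Each group corresponds to one simulated reduce call.
    static List<List<long[]>> group(List<long[]> sortedKeys) {
        List<List<long[]>> groups = new ArrayList<>();
        List<long[]> current = null;
        for (long[] key : sortedKeys) {
            if (current == null || Long.compare(current.get(0)[0], key[0]) != 0) {
                current = new ArrayList<>();
                groups.add(current);
            }
            current.add(key);
        }
        return groups;
    }

    // Renders each group as "first -> [second values]".
    static List<String> describe(List<List<long[]>> groups) {
        List<String> lines = new ArrayList<>();
        for (List<long[]> g : groups) {
            List<Long> seconds = new ArrayList<>();
            for (long[] key : g) {
                seconds.add(key[1]);
            }
            lines.add(g.get(0)[0] + " -> " + seconds);
        }
        return lines;
    }

    public static void main(String[] args) {
        // prints [1 -> [1], 2 -> [1, 2], 3 -> [1, 2, 3]]
        System.out.println(describe(group(sample())));
    }
}
```

Six distinct keys collapse into three groups. If grouping used the full two-column comparison instead, every key would form its own group.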
