MapReduce Learning 4: Custom partitioning, custom sorting, custom grouping

Last Update:2016-10-10 Source: Internet

Author: User

1. Map Task Processing

1.3 Partition the output <key, value> pairs.

The purpose of partitioning is to hand <k,v> pairs of the same category to the same reducer task.

// Imports needed: java.util.HashMap, java.util.Map,
// org.apache.hadoop.io.LongWritable, org.apache.hadoop.io.Text,
// org.apache.hadoop.mapreduce.Partitioner
public static class MyPartitioner extends Partitioner<Text, LongWritable> {
    // Lookup table: key -> partition index.
    private static final Map<String, Integer> map = new HashMap<String, Integer>();
    static {
        map.put("Gz1", 0);
        map.put("Gz2", 0);
        map.put("Sz1", 1);
        map.put("Sz2", 1);
    }

    /**
     * Operates on the <k2,v2> output of the mapper task.
     * The number of distinct values returned by getPartition is the
     * number of reducer tasks that will receive data.
     *
     * "Gz1" and "Gz2" both return 0, so they are sent to the same
     * reducer task. Their k2 values still differ, so they form two groups:
     *   <Gz1,123>
     *   <Gz2,234>
     * and each group is handled by a separate call to the reduce function.
     */
    @Override
    public int getPartition(Text key, LongWritable value, int numPartitions) {
        // Note: a key that is missing from the table causes a NullPointerException here.
        return map.get(key.toString());
    }
}
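
The lookup inside getPartition can be tried out without a Hadoop cluster. The following is a minimal plain-Java sketch, not the Hadoop API itself: the class name PartitionSketch and the helper partitionFor are hypothetical, and the fallback to partition 0 for keys missing from the table is an assumption added here for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class PartitionSketch {
    // Same lookup table as the partitioner above: key -> partition index.
    static final Map<String, Integer> TABLE = new HashMap<>();
    static {
        TABLE.put("Gz1", 0);
        TABLE.put("Gz2", 0);
        TABLE.put("Sz1", 1);
        TABLE.put("Sz2", 1);
    }

    // Hypothetical helper mirroring getPartition(key, value, numPartitions).
    // Assumption: unknown keys fall back to partition 0 instead of failing.
    static int partitionFor(String key, int numPartitions) {
        Integer p = TABLE.get(key);
        if (p == null) {
            return 0;
        }
        // Keep the result inside [0, numPartitions).
        return p % numPartitions;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"Gz1", "Gz2", "Sz1", "Sz2", "Bj1"}) {
            System.out.println(key + " -> partition " + partitionFor(key, 2));
        }
    }
}
```

With two partitions, both Gz keys map to partition 0 and both Sz keys to partition 1. In a real job the number of reduce tasks would need to be at least 2 for this table, because every partition index must be smaller than the number of partitions.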


Set the partitioner on the job:
wcjob.setPartitionerClass(MyPartitioner.class);

Custom sorting: records are sorted by k2, so to change the sort order you need to define your own k2 type.

// Imports needed: java.io.DataInput, java.io.DataOutput, java.io.IOException,
// org.apache.hadoop.io.WritableComparable
private static class MyNewKey implements WritableComparable<MyNewKey> {
    long firstNum;
    long secondNum;

    public MyNewKey() {
    }

    public MyNewKey(long first, long second) {
        firstNum = first;
        secondNum = second;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(firstNum);
        out.writeLong(secondNum);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        firstNum = in.readLong();
        secondNum = in.readLong();
    }

    /*
     * compareTo is called when keys are sorted.
     */
    @Override
    public int compareTo(MyNewKey anotherKey) {
        // Long.compare avoids the overflow that casting a long difference to int can cause.
        int byFirst = Long.compare(firstNum, anotherKey.firstNum);
        if (byFirst != 0) {
            // The first columns differ, so order by the first column.
            return byFirst;
        }
        // The first columns are equal, so order by the second column.
        return Long.compare(secondNum, anotherKey.secondNum);
    }
}
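
One detail worth checking in any compareTo over long fields is whether a subtraction result is narrowed to int: the cast can flip the sign or collapse a non-zero difference to zero. The following minimal plain-Java sketch contrasts the two approaches; the class name CompareSketch and both helper names are hypothetical, and no Hadoop types are involved.

```java
class CompareSketch {
    // Subtraction narrowed to int: unsafe when the difference does not fit in 32 bits.
    static int bySubtraction(long a, long b) {
        return (int) (a - b);
    }

    // Long.compare returns a negative, zero or positive value without overflow.
    static int byLongCompare(long a, long b) {
        return Long.compare(a, b);
    }

    public static void main(String[] args) {
        long big = 1L << 32; // 4294967296: the low 32 bits are all zero

        // The cast keeps only the low 32 bits, so the keys look equal.
        System.out.println(bySubtraction(big, 0L));          // 0

        // 3000000000 does not fit in an int, so the sign flips.
        System.out.println(bySubtraction(3_000_000_000L, 0L) < 0); // true

        // Long.compare reports the correct ordering in both cases.
        System.out.println(byLongCompare(big, 0L) > 0);            // true
        System.out.println(byLongCompare(3_000_000_000L, 0L) > 0); // true
    }
}
```

For small values both approaches agree, which is why the subtraction form often goes unnoticed until keys grow large.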

Custom Grouping

To group the new key types, we also need to customize the grouping rules:

(1) Write a grouping comparator class for the new key type:

// Imports needed: org.apache.hadoop.io.RawComparator,
// org.apache.hadoop.io.WritableComparator
private static class MyGroupingComparator implements RawComparator<MyNewKey> {
    /*
     * Basic grouping rule: group by the first column, firstNum.
     */
    @Override
    public int compare(MyNewKey key1, MyNewKey key2) {
        return Long.compare(key1.firstNum, key2.firstNum);
    }

    /*
     * @param b1 the first byte array in the comparison
     * @param s1 the starting position of the key within the first byte array
     * @param l1 the length, in bytes, of the key in the first byte array
     *
     * @param b2 the second byte array in the comparison
     * @param s2 the starting position of the key within the second byte array
     * @param l2 the length, in bytes, of the key in the second byte array
     *
     * Only the first 8 bytes of each key (the serialized firstNum) are compared.
     */
    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return WritableComparator.compareBytes(b1, s1, 8, b2, s2, 8);
    }
}

From the code we can see that we have defined a custom grouping comparator, MyGroupingComparator, which implements the RawComparator interface. RawComparator in turn extends the Comparator interface. Here are the definitions of these two interfaces:

The first is the definition of the RawComparator interface:

public interface RawComparator<T> extends Comparator<T> {
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}

Next is the definition of the Comparator interface:

public interface Comparator<T> {
    int compare(T o1, T o2);
    boolean equals(Object obj);
}

Both compare() methods are implemented in MyGroupingComparator: the compare() method declared in RawComparator is a byte-based comparison, while the compare() method declared in Comparator is an object-based comparison.

The byte-based comparison method takes six parameters, which can be confusing at first:

Params:

* @param arg0 the first byte array in the comparison
* @param arg1 the starting position of the key within the first byte array
* @param arg2 the length, in bytes, of the key in the first byte array
*
* @param arg3 the second byte array in the comparison
* @param arg4 the starting position of the key within the second byte array
* @param arg5 the length, in bytes, of the key in the second byte array

MyNewKey contains two long fields, and each long occupies 8 bytes when serialized. Grouping compares only the first column, so only the first 8 bytes of each serialized key are compared; this is why a length of 8 is passed instead of the full key length.
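
The effect of comparing only the first 8 bytes can be checked in plain Java. This is a sketch under stated assumptions: RawCompareSketch, serialize and compareBytes are hypothetical names, ByteBuffer stands in for the DataOutput used by write() (both store a long as 8 big-endian bytes), and compareBytes is a hand-written lexicographic comparison on unsigned bytes standing in for the Hadoop helper, not the helper itself.

```java
import java.nio.ByteBuffer;

class RawCompareSketch {
    // Lay out the key as 16 bytes: firstNum (8 bytes) followed by secondNum (8 bytes).
    static byte[] serialize(long first, long second) {
        return ByteBuffer.allocate(16).putLong(first).putLong(second).array();
    }

    // Hypothetical stand-in: lexicographic comparison of b1[s1, s1 + l1)
    // and b2[s2, s2 + l2), treating each byte as unsigned.
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int x = b1[s1 + i] & 0xff;
            int y = b2[s2 + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return l1 - l2;
    }

    public static void main(String[] args) {
        byte[] a = serialize(3L, 1L);
        byte[] b = serialize(3L, 2L);

        System.out.println(a.length);                              // 16
        // Length 8: only firstNum is compared, so the keys fall in one group.
        System.out.println(compareBytes(a, 0, 8, b, 0, 8));        // 0
        // Length 16: secondNum is included, so the keys differ.
        System.out.println(compareBytes(a, 0, 16, b, 0, 16) < 0);  // true
    }
}
```

Because grouping only needs to know whether two keys are equal on the first column, a byte-wise equality check over those 8 bytes is enough and avoids deserializing the keys.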

(2) Register the grouping comparator on the job:

// Set the custom grouping rule
job.setGroupingComparatorClass(MyGroupingComparator.class);
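
How sorting and grouping combine can be pictured with a small simulation. The sketch below is plain Java with no Hadoop types; the class name GroupingSketch, the helper sortAndGroup and the sample keys are all hypothetical. It sorts keys by both columns, as the custom key's compareTo does, and then starts a new group whenever the first column changes, as a grouping comparator on the first column would.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class GroupingSketch {
    // Each key is a {firstNum, secondNum} pair.
    static List<List<long[]>> sortAndGroup(long[][] keys) {
        long[][] sorted = keys.clone();
        // Sort phase: order by the first column, then by the second.
        Arrays.sort(sorted,
                Comparator.<long[]>comparingLong(k -> k[0]).thenComparingLong(k -> k[1]));

        // Grouping phase: adjacent keys with the same first column share a group.
        List<List<long[]>> groups = new ArrayList<>();
        for (long[] key : sorted) {
            boolean newGroup = groups.isEmpty()
                    || groups.get(groups.size() - 1).get(0)[0] != key[0];
            if (newGroup) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(key);
        }
        return groups;
    }

    public static void main(String[] args) {
        long[][] keys = { {3, 3}, {1, 2}, {3, 1}, {2, 2}, {3, 2}, {1, 1} };
        for (List<long[]> group : sortAndGroup(keys)) {
            StringBuilder line = new StringBuilder("first=" + group.get(0)[0] + " seconds:");
            for (long[] key : group) {
                line.append(' ').append(key[1]);
            }
            System.out.println(line);
        }
    }
}
```

Each printed line corresponds to one group, that is, one call to the reduce function, and within a group the second column arrives already sorted.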

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
