1. Map Task Processing
1.3 Partition the output <key, value> pairs.
The purpose of partitioning is to hand all <k,v> pairs that belong to the same category to the same reducer task.
import java.util.HashMap;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public static class MyPartitioner extends Partitioner<Text, LongWritable> {
    // Lookup table: key -> partition number
    static HashMap<String, Integer> map = null;
    static {
        map = new HashMap<String, Integer>();
        map.put("gz1", 0);
        map.put("gz2", 0);
        map.put("sz1", 1);
        map.put("sz2", 1);
    }

    /**
     * Operates on the <k2,v2> output of the mapper task.
     * The number of distinct values returned by getPartition determines
     * how many reducer tasks receive data.
     *
     * "gz1" and "gz2" both return 0, so they are sent to the same reducer task.
     * Their k2 values differ, however, so they are grouped separately as
     *   <gz1,123>
     *   <gz2,234>
     * and are handled by different calls to the reduce function.
     */
    @Override
    public int getPartition(Text key, LongWritable value, int numPartitions) {
        return map.get(key.toString()).intValue();
    }
}
Set the partitioner:
wcjob.setPartitionerClass(MyPartitioner.class);
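The partitioner above throws a NullPointerException if a key is missing from the lookup table. As a rough illustration, the same rule can be sketched in plain Java (no Hadoop dependency) with a fallback for unknown keys. The class name PartitionSketch, the method partitionFor, the lowercase key spellings and the sample key "bj1" are all assumptions made for this example; the fallback formula is the usual hash-partitioning rule.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of the lookup-table partitioning rule (hypothetical names).
class PartitionSketch {
    // Lookup table mirroring the partitioner: key -> partition number.
    static final Map<String, Integer> TABLE = new HashMap<>();
    static {
        TABLE.put("gz1", 0);
        TABLE.put("gz2", 0);
        TABLE.put("sz1", 1);
        TABLE.put("sz2", 1);
    }

    // Returns a partition index in [0, numPartitions).
    // Known keys use the table; unknown keys fall back to hash partitioning.
    static int partitionFor(String key, int numPartitions) {
        Integer p = TABLE.get(key);
        if (p != null) {
            return p % numPartitions;
        }
        // Mask the sign bit so the result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        for (String k : new String[] {"gz1", "gz2", "sz1", "sz2", "bj1"}) {
            System.out.println(k + " -> partition " + partitionFor(k, 2));
        }
    }
}
```

With two partitions, "gz1" and "gz2" land in partition 0 and "sz1" and "sz2" in partition 1, matching the table; any other key is still assigned a valid partition instead of failing.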
Custom sorting: records are sorted by k2, so k2 must be a custom type that you define yourself.
private static class MyNewKey implements WritableComparable<MyNewKey> {
    long firstNum;
    long secondNum;

    public MyNewKey() {
    }

    public MyNewKey(long first, long second) {
        firstNum = first;
        secondNum = second;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(firstNum);
        out.writeLong(secondNum);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        firstNum = in.readLong();
        secondNum = in.readLong();
    }

    /*
     * The compareTo method below is called when keys are sorted.
     */
    @Override
    public int compareTo(MyNewKey anotherKey) {
        // Long.compare avoids the overflow that casting a long difference to int can cause
        int cmp = Long.compare(firstNum, anotherKey.firstNum);
        if (cmp != 0) {
            // The first columns differ, so the first column decides the order
            return cmp;
        } else {
            return Long.compare(secondNum, anotherKey.secondNum);
        }
    }
}
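To see the ordering rule in isolation, here is a minimal plain-Java sketch without the Hadoop Writable parts. PairKey and SortSketch are hypothetical names used only for this illustration. It sorts by the first column, then by the second, and also shows why casting a long difference to int is risky as a comparison result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the custom key: two long columns.
class PairKey implements Comparable<PairKey> {
    final long first;
    final long second;

    PairKey(long first, long second) {
        this.first = first;
        this.second = second;
    }

    // Order by the first column; break ties with the second column.
    @Override
    public int compareTo(PairKey other) {
        int c = Long.compare(first, other.first);
        return c != 0 ? c : Long.compare(second, other.second);
    }

    @Override
    public String toString() {
        return "<" + first + "," + second + ">";
    }
}

class SortSketch {
    // Returns a sorted copy, leaving the input list untouched.
    static List<PairKey> sorted(List<PairKey> keys) {
        List<PairKey> copy = new ArrayList<>(keys);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<PairKey> keys = Arrays.asList(
                new PairKey(3, 3), new PairKey(3, 2), new PairKey(3, 1),
                new PairKey(2, 2), new PairKey(2, 1), new PairKey(1, 1));
        System.out.println(sorted(keys));
        // prints [<1,1>, <2,1>, <2,2>, <3,1>, <3,2>, <3,3>]

        // The difference 2^32 truncates to 0 when cast to int,
        // so two different values would wrongly look equal.
        long a = 1L << 32;
        long b = 0L;
        System.out.println((int) (a - b));            // prints 0
        System.out.println(Long.compare(a, b) > 0);   // prints true
    }
}
```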
Custom Grouping
To group records by the new key type, we also need to define a custom grouping rule:
(1) Write a grouping comparator class:
private static class MyGroupingComparator implements RawComparator<MyNewKey> {

    /*
     * Basic grouping rule: group by the first column, firstNum
     */
    @Override
    public int compare(MyNewKey key1, MyNewKey key2) {
        return Long.compare(key1.firstNum, key2.firstNum);
    }

    /*
     * @param b1 the first byte array taking part in the comparison
     * @param s1 the starting position within the first byte array
     * @param l1 the length, in bytes, of the data in the first byte array
     *
     * @param b2 the second byte array taking part in the comparison
     * @param s2 the starting position within the second byte array
     * @param l2 the length, in bytes, of the data in the second byte array
     */
    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        // Compare only the first 8 bytes, i.e. the serialized firstNum
        return WritableComparator.compareBytes(b1, s1, 8, b2, s2, 8);
    }
}
As the code shows, we have defined a custom grouping comparator, MyGroupingComparator, which implements the RawComparator interface; RawComparator in turn extends the Comparator interface. Here are the definitions of these two interfaces:
First, the definition of the RawComparator interface:
public interface RawComparator<T> extends Comparator<T> {
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}
Next, the definition of the Comparator interface:
public interface Comparator<T> {
    int compare(T o1, T o2);
    boolean equals(Object obj);
}
MyGroupingComparator implements the compare() methods declared in both interfaces: the compare() method in RawComparator is a byte-based comparison, and the compare() method in Comparator is an object-based comparison.
The byte-based comparison method takes six parameters, which can be confusing at first glance:
Params:
 * @param arg0 the first byte array taking part in the comparison
 * @param arg1 the starting position within the first byte array
 * @param arg2 the length, in bytes, of the data in the first byte array
 *
 * @param arg3 the second byte array taking part in the comparison
 * @param arg4 the starting position within the second byte array
 * @param arg5 the length, in bytes, of the data in the second byte array
MyNewKey contains two long fields, and each long occupies 8 bytes. Because only the first column is compared here, the number of bytes to compare is 8.
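The 8-byte comparison can be tried out in plain Java, without Hadoop. In the sketch below, RawCompareSketch, serialize, compareBytes and sameGroup are hypothetical names; compareBytes is a hand-written stand-in that compares two byte ranges lexicographically as unsigned bytes, written here only to illustrate the idea. It assumes the key is serialized as two big-endian longs, which is the layout writeLong produces.

```java
import java.nio.ByteBuffer;

// Plain-Java sketch of comparing only the first 8 serialized bytes of a key.
class RawCompareSketch {
    // Serializes two longs as 16 bytes. DataOutput.writeLong writes 8 bytes,
    // high byte first; ByteBuffer's default big-endian order gives the same layout.
    static byte[] serialize(long first, long second) {
        return ByteBuffer.allocate(16).putLong(first).putLong(second).array();
    }

    // Hypothetical stand-in for a byte-range comparison: compares l1 bytes of b1
    // starting at s1 with l2 bytes of b2 starting at s2, treating bytes as unsigned.
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int x = b1[s1 + i] & 0xff;
            int y = b2[s2 + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return l1 - l2;
    }

    // Two keys fall into the same group when their first 8 bytes are identical.
    static boolean sameGroup(byte[] k1, byte[] k2) {
        return compareBytes(k1, 0, 8, k2, 0, 8) == 0;
    }

    public static void main(String[] args) {
        byte[] a = serialize(3, 1);
        byte[] b = serialize(3, 2);
        byte[] c = serialize(2, 1);
        System.out.println(sameGroup(a, b)); // prints true: same first column
        System.out.println(sameGroup(a, c)); // prints false: first columns differ
        // Comparing all 16 bytes would also take the second column into account.
        System.out.println(compareBytes(a, 0, 16, b, 0, 16) == 0); // prints false
    }
}
```

For grouping, only the equal/not-equal outcome of the comparison matters, so limiting the comparison to the first 8 bytes is enough to group by the first column.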
(2) Register the grouping rule on the job:
// Set the custom grouping rule
job.setGroupingComparatorClass(MyGroupingComparator.class);
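To get a feel for what the custom sort and the grouping rule do together, the two steps can be simulated in plain Java. This is only a rough sketch: GroupingSketch and group are hypothetical names, keys are modelled as {first, second} arrays, and the simulation assumes that one reduce call receives each run of consecutive sorted keys that the grouping rule treats as equal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simulates "sort by (first, second), then group by first" on {first, second} pairs.
class GroupingSketch {
    static List<List<long[]>> group(long[][] keys) {
        // Step 1: sort by the first column, then by the second (the custom sort).
        long[][] sorted = keys.clone();
        Arrays.sort(sorted, (a, b) -> {
            int c = Long.compare(a[0], b[0]);
            return c != 0 ? c : Long.compare(a[1], b[1]);
        });

        // Step 2: start a new group whenever the first column changes (the grouping rule).
        List<List<long[]>> groups = new ArrayList<>();
        for (long[] k : sorted) {
            if (groups.isEmpty() || groups.get(groups.size() - 1).get(0)[0] != k[0]) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(k);
        }
        return groups;
    }

    public static void main(String[] args) {
        long[][] keys = {{3, 3}, {3, 2}, {3, 1}, {2, 2}, {2, 1}, {1, 1}};
        for (List<long[]> g : group(keys)) {
            StringBuilder sb = new StringBuilder("group " + g.get(0)[0] + ":");
            for (long[] k : g) {
                sb.append(" <").append(k[0]).append(",").append(k[1]).append(">");
            }
            System.out.println(sb);
        }
    }
}
```

For the six sample keys this yields three groups, one per distinct first column, and within each group the keys appear in ascending order of the second column.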
MapReduce Learning 4----Custom partitioning, custom sorting, custom components