Hadoop Auxiliary Sorting sample One


1. Sample Data
Station file (fields tab-separated):
011990-99999	sihccajavri
012650-99999	tynset-hansmoen

Weather records (station ID, observation time, temperature; tab-separated):
012650-99999	194903241200	111
012650-99999	194903241800	78
011990-99999	195005150700	0
011990-99999	195005151200	22
011990-99999	195005151800	-11


2. Requirements
For each weather record, attach the name of the weather station that produced it, joining the two data sets on the station ID.

3. Approach and Code
Station records and weather records with the same station ID are processed by the same reducer, and the station record is guaranteed to arrive first. The reduce() function therefore reads the station name from the first value, then joins it with each of the weather records that follow and writes the result.
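The effect of the tag can be seen without running Hadoop at all. The following standalone sketch (plain Java, no Hadoop classes; all names are illustrative, not part of the job below) sorts composite (stationId, tag) keys and then scans groups, showing that the "0"-tagged station record is always seen before the "1"-tagged weather records:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TagSortSketch {
    // Sorts by (stationId, tag) but groups by stationId alone, mirroring the
    // sort order and grouping comparator used by the MapReduce job.
    // Each input row is {stationId, tag, payload}.
    public static List<String> join(List<String[]> keyed) {
        keyed.sort(Comparator.comparing((String[] r) -> r[0])   // station ID
                             .thenComparing(r -> r[1]));        // tag: "0" sorts before "1"
        List<String> out = new ArrayList<>();
        String currentStation = null;
        String stationName = null;
        for (String[] r : keyed) {
            if (!r[0].equals(currentStation)) {   // a new group starts
                currentStation = r[0];
                stationName = null;
            }
            if (r[1].equals("0")) {
                stationName = r[2];               // station record is seen first
            } else if (stationName != null) {
                out.add(r[0] + "\t" + stationName + "\t" + r[2]);
            }
        }
        return out;
    }
}
```

The real job achieves the same ordering with TextPair.compareTo for sorting and FirstComparator for grouping.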
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.io.WritableUtils;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/**
 * Composite key, used here for secondary sorting; it holds the station ID and a "tag".
 * The "tag" is a virtual field whose sole purpose is to sort records so that
 * station records arrive before weather records.
 * Although you could instead leave the arrival order unspecified and buffer
 * pending records in memory, that should be avoided: the number of records in
 * any one group can be very large, far exceeding the memory available to a reducer.
 */
public class TextPair implements WritableComparable<TextPair> {

    private Text first;
    private Text second;

    public TextPair() {
        set(new Text(), new Text());
    }

    public TextPair(String first, String second) {
        set(new Text(first), new Text(second));
    }

    public TextPair(Text first, Text second) {
        set(first, second);
    }

    public void set(Text first, Text second) {
        this.first = first;
        this.second = second;
    }

    public Text getFirst() {
        return first;
    }

    public Text getSecond() {
        return second;
    }

    public void write(DataOutput out) throws IOException {
        first.write(out);
        second.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        first.readFields(in);
        second.readFields(in);
    }

    @Override
    public int hashCode() {
        return first.hashCode() * 163 + second.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (obj instanceof TextPair) {
            TextPair tp = (TextPair) obj;
            return first.equals(tp.first) && second.equals(tp.second);
        }
        return false;
    }

    @Override
    public String toString() {
        return first + "\t" + second;
    }

    public int compareTo(TextPair o) {
        int cmp = first.compareTo(o.first);
        if (cmp == 0) {
            cmp = second.compareTo(o.second);
        }
        return cmp;
    }

    // A RawComparator compares records directly in the serialized byte stream,
    // without deserializing them first, avoiding the overhead of creating new objects.
    // WritableComparator is a general-purpose RawComparator implementation for
    // classes that inherit from WritableComparable.
    public static class FirstComparator extends WritableComparator {

        private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();

        public FirstComparator() {
            super(TextPair.class);
        }

        @Override
        public int compare(byte[] b1, int s1, int l1,
                           byte[] b2, int s2, int l2) {
            try {
                // firstL1, firstL2 are the lengths of the first Text field in each byte stream
                int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
                int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
                return TEXT_COMPARATOR.compare(b1, s1, firstL1, b2, s2, firstL2);
            } catch (IOException e) {
                throw new IllegalArgumentException(e);
            }
        }

        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            if (a instanceof TextPair && b instanceof TextPair) {
                return ((TextPair) a).first.compareTo(((TextPair) b).first);
            }
            return super.compare(a, b);
        }
    }
}


import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * Mapper that tags station records.
 */
public class JoinStationMapper extends Mapper<LongWritable, Text, TextPair, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] val = value.toString().split("\\t");
        if (val.length == 2) {
            context.write(new TextPair(val[0], "0"), new Text(val[1]));
        }
    }
}


import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * Mapper that tags weather records.
 */
public class JoinRecordMapper extends Mapper<LongWritable, Text, TextPair, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] val = value.toString().split("\\t");
        if (val.length == 3) {
            context.write(new TextPair(val[0], "1"), new Text(val[1] + "\t" + val[2]));
        }
    }
}


import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.util.Iterator;

/**
 * Reducer that joins tagged station records with weather records.
 */
public class JoinReducer extends Reducer<TextPair, Text, Text, Text> {
    @Override
    protected void reduce(TextPair key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        Iterator<Text> iter = values.iterator();
        // The reducer receives the station record first. This must NOT be written
        // as Text stationName = iter.next(): the value must be copied.
        Text stationName = new Text(iter.next());
        while (iter.hasNext()) {
            Text record = iter.next();
            Text outValue = new Text(stationName.toString() + "\t" + record.toString());
            context.write(key.getFirst(), outValue);
        }
    }
}
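The warning about copying the first value deserves emphasis: Hadoop reuses a single Text instance for every value the iterator returns, so holding a reference instead of making a copy means the "saved" station name is silently overwritten by the next record. A standalone sketch of the pitfall (plain Java; MutableValue is an illustrative stand-in, not Hadoop's Text class):

```java
public class ReuseSketch {
    // Stand-in for a mutable, reused value object such as Hadoop's Text.
    static class MutableValue {
        private final StringBuilder buf = new StringBuilder();
        void set(String s) { buf.setLength(0); buf.append(s); }
        @Override public String toString() { return buf.toString(); }
    }

    public static void main(String[] args) {
        MutableValue reused = new MutableValue();

        reused.set("sihccajavri");          // first value: the station record
        MutableValue alias = reused;        // WRONG: keeps a reference only
        String copy = reused.toString();    // RIGHT: copies the contents out

        reused.set("195005150700\t0");      // the framework reuses the object
        System.out.println(alias);          // the "saved" name is gone
        System.out.println(copy);           // the copy survives
    }
}
```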


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class JoinRecordWithStationName {

    static class KeyPartitioner extends Partitioner<TextPair, Text> {
        @Override
        public int getPartition(TextPair textPair, Text text, int numPartitions) {
            return (textPair.getFirst().hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 3) {
            System.err.println("Wrong number of parameters, please supply three parameters: <ncdc input> <station input> <output>");
            System.exit(-1);
        }
        Path ncdcInputPath = new Path(otherArgs[0]);
        Path stationInputPath = new Path(otherArgs[1]);
        Path outputPath = new Path(otherArgs[2]);
        conf.set("fs.defaultFS", "hdfs://vmnode.zhch:9000");

        Job job = Job.getInstance(conf, "JoinRecordWithStationName");
        job.setJar("F:/workspace/AssistRanking/target/AssistRanking-1.0-SNAPSHOT.jar");
        MultipleInputs.addInputPath(job, ncdcInputPath, TextInputFormat.class, JoinRecordMapper.class);
        MultipleInputs.addInputPath(job, stationInputPath, TextInputFormat.class, JoinStationMapper.class);
        FileOutputFormat.setOutputPath(job, outputPath);
        // Partition and group by first (the station ID) only: records in the same
        // partition go to the same reducer, and records in the same group within
        // that partition are handled by the same reduce() call.
        job.setPartitionerClass(KeyPartitioner.class);
        job.setGroupingComparatorClass(TextPair.FirstComparator.class);
        job.setMapOutputKeyClass(TextPair.class);
        job.setReducerClass(JoinReducer.class);
        job.setOutputKeyClass(Text.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
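As an aside, the "& Integer.MAX_VALUE" in KeyPartitioner clears the sign bit of the hash code: hashCode() can be negative, and in Java a plain modulo of a negative value yields a negative (and therefore invalid) partition number. A standalone sketch of the same arithmetic (names are illustrative):

```java
public class PartitionSketch {
    // Mirrors KeyPartitioner's arithmetic: clearing the sign bit keeps the
    // result of % in the valid range [0, numPartitions).
    static int partition(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }
}
```

Because only the station ID's hash is used, both tagged key variants of a station land in the same partition.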


4. Running Results


Note: This example comes from Hadoop: The Definitive Guide, 3rd edition, section 8.3.2.
