Hadoop Auxiliary Sorting sample One


1. Sample Data
Station file (fields tab-separated):
011990-99999	sihccajavri
012650-99999	tynset-hansmoen

Weather records (station ID, observation time, temperature; tab-separated):
012650-99999	194903241200	111
012650-99999	194903241800	78
011990-99999	195005150700	0
011990-99999	195005151200	22
011990-99999	195005151800	-11


2. Requirements
For each weather record, attach the name of the weather station that produced it, joining the two data sets on the station ID.

3. Approach and Code
Station records and weather records with the same station ID are processed by the same reducer, and the station record is guaranteed to arrive first. The reduce() function therefore reads the station name from the first value, then joins it with each of the weather records that follow and writes the result.
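The effect of the tag can be seen without running Hadoop at all. The following standalone sketch (plain Java, no Hadoop classes; all names are illustrative, not part of the job below) sorts composite (stationId, tag) keys and then scans groups, showing that the "0"-tagged station record is always seen before the "1"-tagged weather records:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TagSortSketch {
    // Sorts by (stationId, tag) but groups by stationId alone, mirroring the
    // sort order and grouping comparator used by the MapReduce job.
    // Each input row is {stationId, tag, payload}.
    public static List<String> join(List<String[]> keyed) {
        keyed.sort(Comparator.comparing((String[] r) -> r[0])   // station ID
                             .thenComparing(r -> r[1]));        // tag: "0" sorts before "1"
        List<String> out = new ArrayList<>();
        String currentStation = null;
        String stationName = null;
        for (String[] r : keyed) {
            if (!r[0].equals(currentStation)) {   // a new group starts
                currentStation = r[0];
                stationName = null;
            }
            if (r[1].equals("0")) {
                stationName = r[2];               // station record is seen first
            } else if (stationName != null) {
                out.add(r[0] + "\t" + stationName + "\t" + r[2]);
            }
        }
        return out;
    }
}
```

The real job achieves the same ordering with TextPair.compareTo for sorting and FirstComparator for grouping.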
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.io.WritableUtils;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/**
 * Composite key, used here for secondary sorting; it holds the station ID and a "tag".
 * The "tag" is a virtual field whose sole purpose is to sort records so that
 * station records arrive before weather records.
 * Although you could instead leave the arrival order unspecified and buffer
 * pending records in memory, that should be avoided: the number of records in
 * any one group can be very large, far exceeding the memory available to a reducer.
 */
public class TextPair implements WritableComparable<TextPair> {

    private Text first;
    private Text second;

    public TextPair() {
        set(new Text(), new Text());
    }

    public TextPair(String first, String second) {
        set(new Text(first), new Text(second));
    }

    public TextPair(Text first, Text second) {
        set(first, second);
    }

    public void set(Text first, Text second) {
        this.first = first;
        this.second = second;
    }

    public Text getFirst() {
        return first;
    }

    public Text getSecond() {
        return second;
    }

    public void write(DataOutput out) throws IOException {
        first.write(out);
        second.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        first.readFields(in);
        second.readFields(in);
    }

    @Override
    public int hashCode() {
        return first.hashCode() * 163 + second.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (obj instanceof TextPair) {
            TextPair tp = (TextPair) obj;
            return first.equals(tp.first) && second.equals(tp.second);
        }
        return false;
    }

    @Override
    public String toString() {
        return first + "\t" + second;
    }

    public int compareTo(TextPair o) {
        int cmp = first.compareTo(o.first);
        if (cmp == 0) {
            cmp = second.compareTo(o.second);
        }
        return cmp;
    }

    // A RawComparator compares records directly in the serialized byte stream,
    // without deserializing them first, avoiding the overhead of creating new objects.
    // WritableComparator is a general-purpose RawComparator implementation for
    // classes that inherit from WritableComparable.
    public static class FirstComparator extends WritableComparator {

        private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();

        public FirstComparator() {
            super(TextPair.class);
        }

        @Override
        public int compare(byte[] b1, int s1, int l1,
                           byte[] b2, int s2, int l2) {
            try {
                // firstL1, firstL2 are the lengths of the first Text field in each byte stream
                int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
                int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
                return TEXT_COMPARATOR.compare(b1, s1, firstL1, b2, s2, firstL2);
            } catch (IOException e) {
                throw new IllegalArgumentException(e);
            }
        }

        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            if (a instanceof TextPair && b instanceof TextPair) {
                return ((TextPair) a).first.compareTo(((TextPair) b).first);
            }
            return super.compare(a, b);
        }
    }
}


import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * Mapper that tags station records.
 */
public class JoinStationMapper extends Mapper<LongWritable, Text, TextPair, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] val = value.toString().split("\\t");
        if (val.length == 2) {
            context.write(new TextPair(val[0], "0"), new Text(val[1]));
        }
    }
}


import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * Mapper that tags weather records.
 */
public class JoinRecordMapper extends Mapper<LongWritable, Text, TextPair, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] val = value.toString().split("\\t");
        if (val.length == 3) {
            context.write(new TextPair(val[0], "1"), new Text(val[1] + "\t" + val[2]));
        }
    }
}


import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.util.Iterator;

/**
 * Reducer that joins tagged station records with weather records.
 */
public class JoinReducer extends Reducer<TextPair, Text, Text, Text> {
    @Override
    protected void reduce(TextPair key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        Iterator<Text> iter = values.iterator();
        // The reducer receives the station record first. This must NOT be written
        // as Text stationName = iter.next(): the value must be copied.
        Text stationName = new Text(iter.next());
        while (iter.hasNext()) {
            Text record = iter.next();
            Text outValue = new Text(stationName.toString() + "\t" + record.toString());
            context.write(key.getFirst(), outValue);
        }
    }
}
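The warning about copying the first value deserves emphasis: Hadoop reuses a single Text instance for every value the iterator returns, so holding a reference instead of making a copy means the "saved" station name is silently overwritten by the next record. A standalone sketch of the pitfall (plain Java; MutableValue is an illustrative stand-in, not Hadoop's Text class):

```java
public class ReuseSketch {
    // Stand-in for a mutable, reused value object such as Hadoop's Text.
    static class MutableValue {
        private final StringBuilder buf = new StringBuilder();
        void set(String s) { buf.setLength(0); buf.append(s); }
        @Override public String toString() { return buf.toString(); }
    }

    public static void main(String[] args) {
        MutableValue reused = new MutableValue();

        reused.set("sihccajavri");          // first value: the station record
        MutableValue alias = reused;        // WRONG: keeps a reference only
        String copy = reused.toString();    // RIGHT: copies the contents out

        reused.set("195005150700\t0");      // the framework reuses the object
        System.out.println(alias);          // the "saved" name is gone
        System.out.println(copy);           // the copy survives
    }
}
```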


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class JoinRecordWithStationName {

    static class KeyPartitioner extends Partitioner<TextPair, Text> {
        @Override
        public int getPartition(TextPair textPair, Text text, int numPartitions) {
            return (textPair.getFirst().hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 3) {
            System.err.println("Wrong number of parameters, please supply three parameters: <ncdc input> <station input> <output>");
            System.exit(-1);
        }
        Path ncdcInputPath = new Path(otherArgs[0]);
        Path stationInputPath = new Path(otherArgs[1]);
        Path outputPath = new Path(otherArgs[2]);
        conf.set("fs.defaultFS", "hdfs://vmnode.zhch:9000");

        Job job = Job.getInstance(conf, "JoinRecordWithStationName");
        job.setJar("F:/workspace/AssistRanking/target/AssistRanking-1.0-SNAPSHOT.jar");
        MultipleInputs.addInputPath(job, ncdcInputPath, TextInputFormat.class, JoinRecordMapper.class);
        MultipleInputs.addInputPath(job, stationInputPath, TextInputFormat.class, JoinStationMapper.class);
        FileOutputFormat.setOutputPath(job, outputPath);
        // Partition and group by first (the station ID) only: records in the same
        // partition go to the same reducer, and records in the same group within
        // that partition are handled by the same reduce() call.
        job.setPartitionerClass(KeyPartitioner.class);
        job.setGroupingComparatorClass(TextPair.FirstComparator.class);
        job.setMapOutputKeyClass(TextPair.class);
        job.setReducerClass(JoinReducer.class);
        job.setOutputKeyClass(Text.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
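As an aside, the "& Integer.MAX_VALUE" in KeyPartitioner clears the sign bit of the hash code: hashCode() can be negative, and in Java a plain modulo of a negative value yields a negative (and therefore invalid) partition number. A standalone sketch of the same arithmetic (names are illustrative):

```java
public class PartitionSketch {
    // Mirrors KeyPartitioner's arithmetic: clearing the sign bit keeps the
    // result of % in the valid range [0, numPartitions).
    static int partition(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }
}
```

Because only the station ID's hash is used, both tagged key variants of a station land in the same partition.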


4. Running Results


Note: This example comes from Hadoop: The Definitive Guide, 3rd edition, section 8.3.2.
