Hadoop MapReduce: processing temperature data with custom sorting, grouping, and partitioning

Last Update:2018-08-20 Source: Internet

Author: User

Tags: serialization


Reposted from http://www.ptbird.cn/mapreduce-tempreture.html

I. Requirements

1, Data file description

Some data files are stored in HDFS as plain text, one record per line.

Each record begins with a date and a time separated by a space; taken together they give the time at which the monitoring station took the reading. The measured temperature follows, separated from the timestamp by a tab (\t).
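The record layout described above can be sketched as a small parser. This is only an illustration: the timestamp pattern `yyyy-MM-dd HH:mm:ss`, the plain integer temperature, the class name `RecordParser`, and the sample line are all assumptions, since the article's sample data is not reproduced here.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Parses one "<date> <time>\t<temperature>" record (hypothetical helper). */
class RecordParser {
    // Assumed timestamp layout; adjust it to match the real files.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Returns {year, temperature} for a single input line. */
    static int[] parse(String line) {
        // The tab separates the timestamp from the temperature.
        String[] parts = line.split("\t");
        LocalDateTime time = LocalDateTime.parse(parts[0], FORMAT);
        // Assumes the temperature is a bare integer with no unit suffix.
        int temp = Integer.parseInt(parts[1].trim());
        return new int[] {time.getYear(), temp};
    }

    public static void main(String[] args) {
        // Hypothetical sample line, not taken from the original data set.
        int[] record = parse("1949-10-01 14:21:02\t34");
        System.out.println(record[0] + "\t" + record[1]);
    }
}
```

In a real job this logic would sit inside the mapper, which would copy the year and temperature into the custom key.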

2, Requirement

For the years 1949 to 1955, sort each year's temperatures in descending order and write each year's results to a separate output file.

Custom partitioning, custom grouping, and custom sorting are required.

II. Solution

1, Approach

Sort by year in ascending order, then by temperature in descending order within each year.
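The intended ordering can be tried outside Hadoop with a plain comparator. This is a minimal sketch, not the job's actual sort class: records are modeled as `{year, temp}` arrays, and the class name `SortOrderSketch` and the sample values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SortOrderSketch {
    /** Year ascending, then temperature descending. Each element is {year, temp}. */
    static final Comparator<int[]> YEAR_ASC_TEMP_DESC = (a, b) -> {
        int byYear = Integer.compare(a[0], b[0]);
        if (byYear != 0) {
            return byYear;
        }
        // Arguments swapped so that higher temperatures come first.
        return Integer.compare(b[1], a[1]);
    };

    /** Returns a sorted copy and leaves the input list untouched. */
    static List<int[]> sorted(List<int[]> records) {
        List<int[]> copy = new ArrayList<>(records);
        copy.sort(YEAR_ASC_TEMP_DESC);
        return copy;
    }

    public static void main(String[] args) {
        // Hypothetical readings.
        List<int[]> records = Arrays.asList(
                new int[] {1950, 32}, new int[] {1949, 34},
                new int[] {1950, 41}, new int[] {1949, 38});
        for (int[] r : sorted(records)) {
            System.out.println(r[0] + "\t" + r[1]);
        }
    }
}
```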

Group by year, so that each year corresponds to one reduce task.

2, Custom mapper output type KeyPair

Each line of temperature data is one record, and each record has two parts: the time and the temperature.

The map output therefore has to use a custom type, and that type needs its own sorting and grouping rules instead of the defaults.

Define KeyPair

The custom type carries the map output into reduce, so it must implement Hadoop's WritableComparable interface with KeyPair as the type parameter. It plays the same role as LongWritable (see the definition of LongWritable for comparison).

Implementing WritableComparable means overriding three methods, write, readFields, and compareTo, which handle serialization, deserialization, and comparison respectively.

toString and hashCode should also be overridden to avoid problems with equality checks.
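The write/readFields pair can be exercised without a cluster, because both methods only depend on the standard java.io DataOutput and DataInput interfaces. The sketch below uses a stand-in class, `YearTemp` (a hypothetical name), that follows the same contract but does not implement the Hadoop interface; it is not the article's KeyPair.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class RoundTripSketch {
    /** Stand-in for the key type: same two int fields, same method shapes. */
    static class YearTemp {
        int year;
        int temp;

        // Serialization: fields are written in a fixed order.
        void write(DataOutput out) throws IOException {
            out.writeInt(year);
            out.writeInt(temp);
        }

        // Deserialization: fields must be read back in the same order.
        void readFields(DataInput in) throws IOException {
            year = in.readInt();
            temp = in.readInt();
        }
    }

    /** Serializes a value to bytes and reads it back into a fresh object. */
    static YearTemp roundTrip(int year, int temp) {
        try {
            YearTemp original = new YearTemp();
            original.year = year;
            original.temp = temp;
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));

            YearTemp copy = new YearTemp();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        YearTemp copy = roundTrip(1949, 34);
        System.out.println(copy.year + "\t" + copy.temp);
    }
}
```

The point of the exercise is the ordering: readFields has to consume the fields in exactly the order write emitted them, otherwise year and temperature would be swapped after the shuffle.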

KeyPair is defined as follows

Note how serialization works: the fields are written to DataOutput and read back from DataInput, and the time is converted from the standard format shown in the file.

import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/**
 * PROJECT: HADOOPTEST2
 * Package: com.mapreducetest.temp
 * User: postbird @ http://www.ptbird.cn
 * TIME: 2017-01-19 21:53
 */

/**
 * Wraps the year and the temperature: year is the year, temp is the temperature.
 */
public class KeyPair implements WritableComparable<KeyPair> {
    // Year
    private int year;
    // Temperature
    private int temp;

    public void setYear(int year) {
        this.year = year;
    }

    public void setTemp(int temp) {
        this.temp = temp;
    }

    public int getYear() {
        return year;
    }

    public int getTemp() {
        return temp;
    }

    @Override
    public int compareTo(KeyPair o) {
        // Compare this year with the year of the object passed in; 0 means equal.
        int result = Integer.compare(year, o.getYear());
        if (result != 0) {
            // The two years differ.
            return result;
        }
        // Same year: compare the temperatures.
        return Integer.compare(temp, o.getTemp());
    }

    // Serialization
    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeInt(year);
        dataOutput.writeInt(temp);
    }

    // Deserialization
    @Override
    public void readFields(DataInput dataInput) throws IOException {
        this.year = dataInput.readInt();
        this.temp = dataInput.readInt();
    }

    @Override
    public String toString() {
        return year + "\t" + temp;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(year + temp);
    }
}

3, Custom grouping

Temperatures from the same year have to end up together, so the grouping comparison is made on the year.

The years of the incoming keys are therefore compared. Note that the objects being compared are of type KeyPair, which is the type the map emits.

Because the grouping class extends WritableComparator, it must override the compare method. The method receives two KeyPair objects (KeyPair implements WritableComparable) and in effect compares only their years; keys from the same year compare as 0.
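The effect of comparing only the year can be sketched in plain Java. This is an illustration of the grouping rule, not the Hadoop WritableComparator subclass itself: keys are modeled as `{year, temp}` arrays, and `GroupingSketch` and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class GroupingSketch {
    /** Grouping rule: only the year is compared, so equal years yield 0. */
    static final Comparator<int[]> BY_YEAR =
            (a, b) -> Integer.compare(a[0], b[0]);

    /** Splits already-sorted keys into runs whose neighbors compare as 0. */
    static List<List<int[]>> group(List<int[]> sortedKeys) {
        List<List<int[]>> groups = new ArrayList<>();
        int[] previous = null;
        for (int[] key : sortedKeys) {
            // A non-zero result marks the boundary of a new group.
            if (previous == null || BY_YEAR.compare(previous, key) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(key);
            previous = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical keys, already sorted by year asc and temperature desc.
        List<int[]> keys = Arrays.asList(
                new int[] {1949, 38}, new int[] {1949, 34}, new int[] {1950, 41});
        for (List<int[]> g : group(keys)) {
            System.out.println(g.get(0)[0] + ": " + g.size() + " reading(s)");
        }
    }
}
```

Keys with different temperatures still fall into the same group as long as the year matches, which is why the grouping comparison ignores the temperature even though the sort order uses it.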
