"hadoop"mapreduce the temperature data by custom sorting, grouping, partitioning, etc. __hadoop

Source: Internet
Author: User
Tags: serialization

Reposted from http://www.ptbird.cn/mapreduce-tempreture.html


I. Requirements

1. Data file description

Some data files are stored in HDFS as plain text, one record per line.

Each line starts with a date and a time separated by a space; together they give the time at which the monitoring station took the reading. The temperature follows, separated from the timestamp by a tab character (\t).
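The line layout described above can be parsed with standard Java alone. The sketch below is only an illustration: the class name TemperatureLineParser, the timestamp pattern yyyy-MM-dd HH:mm:ss, the plain integer temperature, and the sample line are all assumptions, since the original example data is not reproduced here.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: splits one input line into a year and a temperature.
final class TemperatureLineParser {
    // Assumed timestamp layout; the text only says date and time are separated by a space.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Returns {year, temperature} for a line shaped like "<date> <time>\t<temperature>". */
    static int[] parse(String line) {
        String[] fields = line.split("\t");
        if (fields.length != 2) {
            throw new IllegalArgumentException("expected 2 tab-separated fields: " + line);
        }
        int year = LocalDateTime.parse(fields[0].trim(), FORMAT).getYear();
        int temp = Integer.parseInt(fields[1].trim()); // assumes a bare integer reading
        return new int[] {year, temp};
    }

    public static void main(String[] args) {
        int[] parsed = parse("1949-10-01 14:21:02\t34"); // made-up sample line
        System.out.println(parsed[0] + "\t" + parsed[1]);
    }
}
```

In a real job this logic would sit inside the mapper, which would copy the two values into the output key.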

2. Requirement

For the years 1949 to 1955, sort each year's temperatures in descending order and write each year's results to a separate output file.

This requires a custom partitioner, custom grouping, and custom sorting.

II. Solution

1. Approach

Sort the records by year in ascending order, then by temperature in descending order within each year.

Group the records by year, so that each year is handled by its own reduce task.

2. A custom mapper output type: KeyPair

Each line of the input is one record, and each record has two parts: a timestamp and a temperature.

The map output therefore has to use a custom type, and that type needs custom sorting and grouping rather than the defaults.
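The ordering the job is after (year ascending, then temperature descending within a year) can be tried out without Hadoop. This is a minimal sketch under stated assumptions: YearTempOrdering is a hypothetical name, and each key is modelled as a two-element int array {year, temp} instead of the real key class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the target sort order on {year, temp} pairs.
final class YearTempOrdering {
    /** Year ascending; within the same year, temperature descending. */
    static int compare(int[] a, int[] b) {
        int byYear = Integer.compare(a[0], b[0]);
        if (byYear != 0) {
            return byYear;
        }
        // Arguments are swapped so that the higher temperature sorts first.
        return Integer.compare(b[1], a[1]);
    }

    /** Returns a sorted copy and leaves the input list untouched. */
    static List<int[]> sorted(List<int[]> keys) {
        List<int[]> copy = new ArrayList<>(keys);
        copy.sort(YearTempOrdering::compare);
        return copy;
    }

    public static void main(String[] args) {
        List<int[]> keys = Arrays.asList(
                new int[] {1950, 10}, new int[] {1949, 5},
                new int[] {1949, 30}, new int[] {1950, 25});
        for (int[] key : sorted(keys)) {
            System.out.println(key[0] + "\t" + key[1]);
        }
    }
}
```

In the job itself, a comparison of this shape would be supplied through a custom sort comparator rather than called directly.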

Define KeyPair

Because the map output is passed on to reduce, the custom type must implement Hadoop's WritableComparable interface, with KeyPair as the interface's type parameter. It plays the same role as LongWritable (see the definition of LongWritable for comparison).

Implementing WritableComparable means overriding three methods, write, readFields, and compareTo, which handle serialization, deserialization, and comparison respectively.

You also need to override toString and hashCode to avoid problems with equality checks.

KeyPair is defined as follows

Note that the timestamp, which appears in the file in its standard text format, is converted to an integer year before the key is serialized. The write and readFields methods then pass the two int fields through DataOutput and DataInput.

import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/**
 * PROJECT: HADOOPTEST2
 * Package: com.mapreducetest.temp
 * User: postbird @ http://www.ptbird.cn
 * TIME: 2017-01-19 21:53
 */

/**
 * Wraps a year and a temperature: year is the year, temp is the temperature.
 */
public class KeyPair implements WritableComparable<KeyPair> {
    // year
    private int year;

    // temperature
    private int temp;

    public void setYear(int year) {
        this.year = year;
    }

    public void setTemp(int temp) {
        this.temp = temp;
    }

    public int getYear() {
        return year;
    }

    public int getTemp() {
        return temp;
    }

    @Override
    public int compareTo(KeyPair o) {
        // Compare the years first; the result is 0 only when they are equal.
        int result = Integer.compare(year, o.getYear());
        if (result != 0) {
            // The two years differ.
            return result;
        }
        // Same year: compare the temperatures (natural ascending order).
        return Integer.compare(temp, o.getTemp());
    }

    @Override
    // serialization
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeInt(year);
        dataOutput.writeInt(temp);
    }

    @Override
    // deserialization
    public void readFields(DataInput dataInput) throws IOException {
        this.year = dataInput.readInt();
        this.temp = dataInput.readInt();
    }

    @Override
    public String toString() {
        return year + "\t" + temp;
    }

    @Override
    public int hashCode() {
        return Integer.valueOf(year + temp).hashCode();
    }
}
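To see what the write and readFields pair does, the round trip can be reproduced with the plain java.io streams. The sketch below is an assumption-laden stand-in, not the KeyPair class itself: YearTempRecord is a hypothetical class that mirrors the two int fields and the two methods but leaves out the Hadoop interface so that it runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for KeyPair, without the Hadoop interface.
final class YearTempRecord {
    int year;
    int temp;

    YearTempRecord() { }

    YearTempRecord(int year, int temp) {
        this.year = year;
        this.temp = temp;
    }

    // Serialization: two ints, year first.
    void write(DataOutput out) throws IOException {
        out.writeInt(year);
        out.writeInt(temp);
    }

    // Deserialization: must read the fields in the order they were written.
    void readFields(DataInput in) throws IOException {
        year = in.readInt();
        temp = in.readInt();
    }

    static byte[] toBytes(YearTempRecord record) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            record.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static YearTempRecord copyViaBytes(YearTempRecord record) {
        YearTempRecord restored = new YearTempRecord();
        try (DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(toBytes(record)))) {
            restored.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return restored;
    }

    public static void main(String[] args) {
        YearTempRecord copy = copyViaBytes(new YearTempRecord(1949, 34));
        System.out.println(copy.year + "\t" + copy.temp);
    }
}
```

The point to take away is that the read order has to match the write order exactly; swapping the two readInt calls would silently exchange year and temperature.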
3. Custom grouping

Temperatures from the same year belong together, so grouping has to compare years.

The comparison is between the years in the input data. Note that the objects being compared are of type KeyPair, which is the type the map emits.

Because the grouping class extends WritableComparator, it must override the compare method. The method receives two KeyPair objects (KeyPair implements the WritableComparable interface) and compares only their years, returning 0 when the years are the same.
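The effect of a year-only comparison can be illustrated without Hadoop. The following is a sketch, not the grouping class from the original article: YearGrouping is a hypothetical name, keys are modelled as {year, temp} int arrays, and the input is assumed to be already sorted by year, as it would be when it reaches the reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of grouping by year only.
final class YearGrouping {
    /** Compares two {year, temp} keys by year alone; 0 means "same group". */
    static int compareYears(int[] a, int[] b) {
        return Integer.compare(a[0], b[0]);
    }

    /** Splits keys that are already sorted by year into one list per year. */
    static List<List<int[]>> group(List<int[]> sortedKeys) {
        List<List<int[]>> groups = new ArrayList<>();
        int[] previous = null;
        for (int[] key : sortedKeys) {
            // A new group starts whenever the year differs from the previous key.
            if (previous == null || compareYears(previous, key) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(key);
            previous = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<int[]> keys = Arrays.asList(
                new int[] {1949, 30}, new int[] {1949, 5}, new int[] {1950, 25});
        System.out.println(group(keys).size() + " groups");
    }
}
```

In a Hadoop job the same year-only comparison would live in the compare method of a WritableComparator subclass, registered on the job with setGroupingComparatorClass.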
