Transferred from http://www.ptbird.cn/mapreduce-tempreture.html
I. Description of requirements
1. Data file description
Some data files are stored in HDFS as plain text, as shown in the following example:
In each line, the date and the time are separated by a space and together give the time at which the monitoring site took the reading. The temperature follows, separated from the timestamp by a tab (\t).
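As a minimal sketch of how one such line could be split into a year and a temperature: the sample record, the `yyyy-MM-dd HH:mm:ss` timestamp pattern, the plain-integer temperature, and the class name `RecordParser` are all assumptions made for illustration, not details taken from the original data set.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class RecordParser {
    // Assumed timestamp layout; the real files may use a different pattern.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Returns {year, temperature} parsed from one tab-separated line. */
    static int[] parse(String line) {
        // The tab separates the timestamp from the temperature.
        String[] fields = line.split("\t");
        LocalDateTime time = LocalDateTime.parse(fields[0], FORMAT);
        // Assumes the temperature is a bare integer with no unit suffix.
        int temp = Integer.parseInt(fields[1].trim());
        return new int[] {time.getYear(), temp};
    }

    public static void main(String[] args) {
        // Hypothetical record, invented for this example.
        int[] parsed = parse("1949-10-01 14:21:02\t34");
        System.out.println(parsed[0] + "\t" + parsed[1]);
    }
}
```

If the real files carry a unit after the number, the suffix would have to be stripped before calling `Integer.parseInt`.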
2. Requirements
For the years 1949-1955, sort each year's temperatures in descending order and store the output for each year in a separate file.
Custom partitioning, custom grouping, and custom sorting are required.
II. Solution
1. Approach
Sort by year in ascending order, then by temperature in descending order within each year.
Group by year; each year corresponds to one reduce task.
2. A custom mapper output type, KeyPair
As you can see, each line is one record, and each record has two parts: the time and the temperature.
The map output therefore has to use a custom key type, and sorting and grouping have to be customized for that type instead of relying on the defaults.
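The target ordering (year ascending, then temperature descending within a year) can be sketched with a plain `java.util.Comparator`, independent of Hadoop. This is only an illustration of the ordering rule: the `{year, temperature}` array representation and the class name `SortOrderSketch` are assumptions, and the sample values are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SortOrderSketch {
    // Each record is {year, temperature}.
    static final Comparator<int[]> YEAR_ASC_TEMP_DESC = (a, b) -> {
        int byYear = Integer.compare(a[0], b[0]);
        if (byYear != 0) {
            return byYear; // different years: earlier year first
        }
        // Same year: swapping the arguments yields descending temperature.
        return Integer.compare(b[1], a[1]);
    };

    /** Returns a sorted copy, leaving the input list untouched. */
    static List<int[]> sorted(List<int[]> records) {
        List<int[]> copy = new ArrayList<>(records);
        copy.sort(YEAR_ASC_TEMP_DESC);
        return copy;
    }

    public static void main(String[] args) {
        // Hypothetical readings, invented for this example.
        List<int[]> records = Arrays.asList(
                new int[] {1950, 20}, new int[] {1949, 30},
                new int[] {1949, 35}, new int[] {1950, 25});
        for (int[] r : sorted(records)) {
            System.out.println(r[0] + "\t" + r[1]);
        }
    }
}
```

In a MapReduce job the same rule would live in the key's comparison logic or in a dedicated sort comparator instead of a standalone `Comparator`.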
Define KeyPair
Because the map output is passed on to reduce, the custom type must implement Hadoop's WritableComparable interface, with KeyPair as the interface's type parameter. It plays the same role as LongWritable (see the definition of LongWritable for comparison).
Implementing WritableComparable means overriding three methods, write, readFields, and compareTo, which handle serialization, deserialization, and comparison respectively.
You also need to override toString and hashCode to avoid equality problems.
Note that serialization and deserialization go through DataOutput and DataInput. The time, which appears in the file in its standard format, is converted to an integer year before it is stored in the key.
KeyPair is defined as follows:
import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/**
 * Project: HADOOPTEST2
 * Package: com.mapreducetest.temp
 * User: postbird @ http://www.ptbird.cn
 * Time: 2017-01-19 21:53
 */

/**
 * Wraps a year and a temperature: year is the year, temp is the temperature.
 */
public class KeyPair implements WritableComparable<KeyPair> {
    // year
    private int year;
    // temperature
    private int temp;

    public void setYear(int year) {
        this.year = year;
    }

    public void setTemp(int temp) {
        this.temp = temp;
    }

    public int getYear() {
        return year;
    }

    public int getTemp() {
        return temp;
    }

    @Override
    public int compareTo(KeyPair o) {
        // Compare the other object's year with this one: 0 if equal, non-zero otherwise
        int result = Integer.compare(year, o.getYear());
        if (result != 0) {
            // The two years differ
            return result;
        }
        // Same year: compare the temperatures
        return Integer.compare(temp, o.getTemp());
    }

    @Override
    // Serialization
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeInt(year);
        dataOutput.writeInt(temp);
    }

    @Override
    // Deserialization
    public void readFields(DataInput dataInput) throws IOException {
        this.year = dataInput.readInt();
        this.temp = dataInput.readInt();
    }

    @Override
    public String toString() {
        return year + "\t" + temp;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(year + temp);
    }
}
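The write/readFields pair can be checked without a cluster by round-tripping a value through an in-memory byte stream. The sketch below uses a stand-in class, `YearTemp` (a hypothetical name), that mirrors only the two serialization methods and depends on `java.io` alone. It is not the KeyPair class itself, which needs Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class YearTemp {
    int year;
    int temp;

    // Same field order as in readFields: year first, then temp.
    void write(DataOutput out) throws IOException {
        out.writeInt(year);
        out.writeInt(temp);
    }

    void readFields(DataInput in) throws IOException {
        year = in.readInt();
        temp = in.readInt();
    }

    /** Serializes the value to bytes and reads it back into a fresh object. */
    static YearTemp roundTrip(YearTemp original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            YearTemp copy = new YearTemp();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        YearTemp original = new YearTemp();
        original.year = 1949; // hypothetical values
        original.temp = 34;
        YearTemp copy = roundTrip(original);
        System.out.println(copy.year + "\t" + copy.temp);
    }
}
```

The point to watch is that readFields reads the fields in exactly the order write emitted them; swapping the two reads would silently exchange year and temperature.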
3. Custom grouping
Temperatures from the same year are put together, so the comparison has to be made on the year.
The comparison is therefore between the years of the input keys. Note that the objects being compared are of type KeyPair, which is the type that map outputs.
Because the class extends WritableComparator, it must override the compare method. It receives two KeyPair objects (KeyPair implements the WritableComparable interface) but actually compares only their years, returning 0 when the years are the same.
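The effect of a year-only comparison can be illustrated in plain Java. In Hadoop, the grouping comparator decides which consecutive keys in the sorted stream are passed to the same reduce call; the sketch below imitates that by starting a new group whenever the year-only comparison returns a non-zero value. The `{year, temperature}` arrays, the class name `GroupByYearSketch`, and the sample values are assumptions for illustration, and this is not the WritableComparator subclass the text describes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GroupByYearSketch {
    /** Compares only the year part of two {year, temperature} records. */
    static int compareYears(int[] a, int[] b) {
        return Integer.compare(a[0], b[0]);
    }

    /** Splits an already sorted list into runs of records with equal years. */
    static List<List<int[]>> group(List<int[]> sortedRecords) {
        List<List<int[]>> groups = new ArrayList<>();
        int[] previous = null;
        for (int[] record : sortedRecords) {
            // A non-zero result means the year changed, so a new group starts.
            if (previous == null || compareYears(previous, record) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(record);
            previous = record;
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical records, already sorted by year then temperature.
        List<int[]> sorted = Arrays.asList(
                new int[] {1949, 35}, new int[] {1949, 30},
                new int[] {1950, 25}, new int[] {1950, 20});
        for (List<int[]> g : group(sorted)) {
            System.out.println(g.get(0)[0] + " has " + g.size() + " records");
        }
    }
}
```

Because the temperature is ignored during grouping, records that differ only in temperature land in the same group while keeping the order the sort gave them.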