Moving averages are typically computed over time series data, so what is a time series? A time series is data that is tightly bound to time. Stock data is one example: the price of each stock changes by the second, minute, hour, and day. Monitoring data and some industrial equipment data are other examples. If the time order is scrambled, the data becomes completely invalid. This is why dedicated time series databases such as OpenTSDB now exist.
This article does not introduce the moving average concept in detail; more information is easy to look up. It only shows code that computes a moving average with MapReduce. The following is a simple moving average process.
The original data is as follows:
gold,2017-11-11,89
gold,2017-11-12,189
gold,2017-11-13,289
gold,2017-11-14,389
gold,2017-11-15,489
gold,2017-11-16,589
gold,2017-11-17,689
gold,2017-11-18,789
gold,2017-11-19,889
gold,2017-11-20,989
dog,2017-11-13,19
dog,2017-11-14,29
dog,2017-11-15,39
dog,2017-11-16,49
dog,2017-11-17,59
dog,2017-11-18,69
dog,2017-11-19,89
dog,2017-11-20,99
Assume the data above describes 2 stocks, gold and dog, and that the rightmost field is each day's closing price. We now want to calculate the average of each stock over a 3-day window. Take gold as an example:
The first day is 89, the second day is (89+189)/2, the third day is (89+189+289)/3, the fourth day is (189+289+389)/3, and so on.
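The arithmetic above can be sketched as a sliding window held in a queue. This is a minimal standalone sketch, assuming a window of 3 and the first four gold prices; the class name `SimpleMovingAverageDemo` and the method `movingAverages` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

class SimpleMovingAverageDemo {
    // Returns one average per input price. Early entries average over fewer
    // than `period` values because the window is not yet full.
    static List<Double> movingAverages(double[] prices, int period) {
        Queue<Double> window = new LinkedList<>();
        List<Double> averages = new ArrayList<>();
        double sum = 0.0;
        for (double price : prices) {
            sum += price;
            window.add(price);
            if (window.size() > period) {
                sum -= window.remove(); // drop the oldest price from the window
            }
            averages.add(sum / window.size());
        }
        return averages;
    }

    public static void main(String[] args) {
        double[] gold = {89, 189, 289, 389};
        System.out.println(movingAverages(gold, 3)); // prints [89.0, 139.0, 189.0, 289.0]
    }
}
```

Keeping a running sum means each new price costs one addition and at most one subtraction, instead of re-adding the whole window.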
As noted, time order is very important for a moving average, so all the data must first be sorted by time. To do this with MapReduce, we first need a secondary sort. As explained in the description of secondary sort, MapReduce only sorts keys. So to sort by time, the time has to become part of the key through a secondary sort, and the moving average is then computed over the sorted data.
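The effect of a secondary sort can be illustrated in plain Java, without Hadoop: sort on the composite (stock, date), then group on stock alone so that each group sees its prices in date order. This is only a sketch of the idea, not how the framework is implemented; `SecondarySortDemo`, `Row`, and `sortAndGroup` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SecondarySortDemo {
    // Hypothetical stand-in for a composite key (stock, date) plus its value.
    static final class Row {
        final String stock;
        final String date;
        final int price;

        Row(String stock, String date, int price) {
            this.stock = stock;
            this.date = date;
            this.price = price;
        }
    }

    static Map<String, List<Integer>> sortAndGroup(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.split(",");
            rows.add(new Row(f[0].trim(), f[1].trim(), Integer.parseInt(f[2].trim())));
        }
        // Role of the sort comparator: stock first, then date.
        // yyyy-MM-dd dates sort chronologically when compared as strings.
        rows.sort(Comparator.comparing((Row r) -> r.stock).thenComparing(r -> r.date));
        // Role of the grouping comparator: rows with the same stock form one
        // group, and the sorted order inside the group is preserved.
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (Row r : rows) {
            groups.computeIfAbsent(r.stock, k -> new ArrayList<>()).add(r.price);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "gold,2017-11-12,189", "dog,2017-11-14,29",
                "gold,2017-11-11,89", "dog,2017-11-13,19");
        System.out.println(sortAndGroup(lines)); // prints {dog=[19, 29], gold=[89, 189]}
    }
}
```

In a MapReduce job the same two roles are typically filled by a sort comparator and a grouping comparator registered on the `Job`.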
The complete MapReduce code is as follows:
package com.isesol.mapreduce;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.LinkedList;
import java.util.Queue;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MovingAverage {

    // Mapper: parses "stock,date,price" and emits (stock, date) -> price.
    public static class TokenizerMapper extends Mapper<Object, Text, CompositeKey, Text> {
        private final CompositeKey newKey = new CompositeKey();
        private final Text price = new Text();

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String[] val = value.toString().split(",");
            newKey.setStock(val[0].trim());
            newKey.setTime(val[1].trim());
            price.set(val[2].trim());
            context.write(newKey, price);
        }
    }

    // Partitioner: routes by stock only, so all dates of a stock reach the same reducer.
    public static class TwoPartitions extends Partitioner<CompositeKey, Text> {
        @Override
        public int getPartition(CompositeKey key, Text value, int numPartitions) {
            return (key.getStock().hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    // Sort comparator: orders keys by stock, then by time.
    public static class CompositeKeyComparator extends WritableComparator {
        public CompositeKeyComparator() {
            super(CompositeKey.class, true);
        }

        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            CompositeKey a1 = (CompositeKey) a;
            CompositeKey b1 = (CompositeKey) b;
            int compare = a1.getStock().compareTo(b1.getStock());
            if (compare != 0) {
                return compare;
            }
            return a1.getTime().compareTo(b1.getTime());
        }
    }

    // Composite key: the stock name plus the time.
    public static class CompositeKey implements WritableComparable<CompositeKey> {
        private String stock;
        private String time;

        public void setStock(String stock) {
            this.stock = stock;
        }

        public String getStock() {
            return this.stock;
        }

        public void setTime(String time) {
            this.time = time;
        }

        public String getTime() {
            return time;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(this.getStock());
            out.writeUTF(this.getTime());
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            stock = in.readUTF();
            time = in.readUTF();
        }

        @Override
        public String toString() {
            return stock + "," + time;
        }

        // Natural order matches the sort comparator: stock, then time.
        @Override
        public int compareTo(CompositeKey o) {
            int compare = stock.compareTo(o.stock);
            return compare != 0 ? compare : time.compareTo(o.time);
        }
    }

    // Grouping comparator: keys with the same stock go to one reduce() call.
    public static class DefinedGroupSort extends WritableComparator {
        public DefinedGroupSort() {
            super(CompositeKey.class, true);
        }

        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            CompositeKey a1 = (CompositeKey) a;
            CompositeKey b1 = (CompositeKey) b;
            return a1.getStock().compareTo(b1.getStock());
        }
    }

    // Reducer: receives the prices of one stock in time order.
    public static class IntSumReducer extends Reducer<CompositeKey, Text, Text, DoubleWritable> {
        @Override
        public void reduce(CompositeKey key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            SimpleMovingAverage sma = new SimpleMovingAverage();
            for (Text value : values) {
                // Add each value to the queue, then emit the current moving average.
                sma.addNewNumber(Integer.parseInt(value.toString()));
                context.write(new Text(key.getStock()), new DoubleWritable(sma.getMovingAverage()));
            }
        }
    }

    // A simple moving average implemented with a queue.
    public static class SimpleMovingAverage {
        private double sum = 0.0;
        private final int period = 3;
        private final Queue<Double> window = new LinkedList<Double>();

        public void addNewNumber(double number) {
            sum += number;
            window.add(number);
            if (window.size() > period) {
                sum -= window.remove();
            }
        }

        public double getMovingAverage() {
            if (window.isEmpty()) {
                throw new IllegalArgumentException("undefined");
            }
            return sum / window.size();
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "movingaverage");
        job.setJarByClass(MovingAverage.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setMapOutputKeyClass(CompositeKey.class);
        job.setMapOutputValueClass(Text.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        job.setPartitionerClass(TwoPartitions.class);
        job.setSortComparatorClass(CompositeKeyComparator.class);
        job.setGroupingComparatorClass(DefinedGroupSort.class);
        job.setNumReduceTasks(1);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
The results are as follows:
dog 19.0
dog 24.0
dog 29.0
dog 39.0
dog 49.0
dog 59.0
dog 72.33333333333333
dog 85.66666666666667
gold 89.0
gold 139.0
gold 189.0
gold 289.0
gold 389.0
gold 489.0
gold 589.0
gold 689.0
gold 789.0
gold 889.0
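One detail of the job worth isolating is the partitioner, which hashes only the stock name. The sketch below repeats that arithmetic outside Hadoop to show that every record of one stock maps to the same partition and that the result is never negative. The class `StockPartitionDemo`, the method `partitionFor`, and the reducer count of 4 are assumptions made for illustration.

```java
class StockPartitionDemo {
    // Same arithmetic as the job's partitioner: clear the sign bit so a
    // negative hashCode cannot produce a negative index, then take the remainder.
    static int partitionFor(String stock, int numPartitions) {
        return (stock.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int reducers = 4; // assumed reducer count, for illustration only
        for (String stock : new String[] {"gold", "dog"}) {
            System.out.println(stock + " -> partition " + partitionFor(stock, reducers));
        }
    }
}
```

With `setNumReduceTasks(1)`, as in the job above, every key lands in partition 0, so the partitioner only starts to matter once more reducers are configured.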