Requirement: sort the data in the file.
Sample input (Sort.log):
10
13
10
20
Expected output:
1 10
2 10
3 13
4 20
Analysis section:
Mapper analysis:
1. <K1,V1> — K1: the position (byte offset) of the line; V1: one line of data.
2. <K2,V2> — K2: the line's value as an integer; V2: the constant 1.
Reducer analysis:
3. <K3,V3> — K3: each distinct key, delivered in sorted order; V3: a list<int> of the 1s emitted for that key.
4. Final output <K4,V4> — K4: an incrementing line number; V4: the key value.
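The four steps above can be traced on the sample data with a small, Hadoop-free Java sketch (the class and method names here are illustrative, not part of the program): a sorted map stands in for the framework's shuffle sort, and the reduce loop assigns the incrementing line numbers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hadoop-free trace of the map/shuffle/reduce steps described above.
public class SortFlowDemo {
    public static List<String> sortLines(List<String> lines) {
        // Map + shuffle stage: each line becomes (number, 1); the TreeMap
        // keeps keys in ascending order, and the value list collects one 1
        // per occurrence (the list<int> of step 3).
        TreeMap<Integer, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            int v = Integer.parseInt(line.trim());
            grouped.computeIfAbsent(v, k -> new ArrayList<>()).add(1);
        }
        // Reduce stage: emit "lineNum key" once per collected 1, so
        // duplicate inputs (like the two 10s) each get their own line.
        List<String> out = new ArrayList<>();
        int lineNum = 1;
        for (Map.Entry<Integer, List<Integer>> e : grouped.entrySet()) {
            for (int ignored : e.getValue()) {
                out.add(lineNum + " " + e.getKey());
                lineNum++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : sortLines(Arrays.asList("10", "13", "10", "20"))) {
            System.out.println(s);
        }
    }
}
```

Running it on the sample data reproduces the expected output above, including the repeated 10.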
Program section:
SortMapper class:

package com.cn.sort;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SortMapper extends Mapper<Object, Text, IntWritable, IntWritable> {
    String line = null;

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        line = value.toString();
        int lineValue = Integer.parseInt(line);
        // Emit the parsed number as the key; the shuffle stage sorts the keys.
        context.write(new IntWritable(lineValue), new IntWritable(1));
    }
}
SortReduce class:

package com.cn.sort;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class SortReduce extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
    IntWritable lineNum = new IntWritable(1);

    @Override
    protected void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Write the key once per collected value, so duplicates each get a line.
        for (IntWritable value : values) {
            context.write(lineNum, key);
            lineNum = new IntWritable(lineNum.get() + 1);
        }
    }
}
DataSort class:

package com.cn.sort;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

/**
 * Data sorting
 * @author root
 */
public class DataSort {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: datasort <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "Data Sort");
        job.setJarByClass(DataSort.class);
        // Set the input and output file directories
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        // Set the mapper and reducer processing logic classes
        job.setMapperClass(SortMapper.class);
        job.setReducerClass(SortReduce.class);
        // Set the output key-value types
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);
        // Submit the job and wait for it to complete
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
To summarize the code: the Hadoop MapReduce program DataSort does no explicit sorting of its own. The mapper emits each number as an IntWritable key, the framework's shuffle stage sorts those keys, and the reducer simply prefixes each value with an incrementing line number.