Hadoop Reading Notes series: http://blog.csdn.net/caicongyang/article/category/2166855 (the series will be completed gradually; comments on the expected data file format will be added)
1. Description:
From the given file, find the largest 100 values. The data file format is as follows:
5331656517800292911374982668522067918224212228227533691229525338221001067312284316342740518015 ...
2. The code below relies on the TreeMap class, so let's write a demo of it first:
TreeMapDemo.java
package suanfa;

import java.util.Map.Entry;
import java.util.TreeMap;

public class TreeMapDemo {
    public static void main(String[] args) {
        TreeMap<Long, Long> tree = new TreeMap<Long, Long>();
        tree.put(1333333L, 1333333L);
        tree.put(1222222L, 1222222L);
        tree.put(1555555L, 1555555L);
        tree.put(1444444L, 1444444L);
        for (Entry<Long, Long> entry : tree.entrySet()) {
            System.out.println(entry.getKey() + ":" + entry.getValue());
        }
        System.out.println(tree.firstEntry().getValue());  // minimum value
        System.out.println(tree.lastEntry().getValue());   // maximum value
        System.out.println(tree.navigableKeySet());        // keys in ascending order
        System.out.println(tree.descendingKeySet());       // keys in descending order
    }
}
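The trick that both the mapper and reducer below rely on can be isolated in a small sketch: cap the TreeMap at K entries by evicting firstKey() (the current minimum) whenever the map grows past K, so only the K largest values survive. The class name TopKTrim, K = 3, and the sample values here are my own illustration, not from the article:

```java
import java.util.TreeMap;

public class TopKTrim {
    // Keep only the 3 largest values, for illustration (the article uses K = 100)
    static final int K = 3;

    public static void main(String[] args) {
        TreeMap<Long, Long> tree = new TreeMap<Long, Long>();
        long[] data = {5L, 1L, 9L, 7L, 3L, 8L};
        for (long v : data) {
            tree.put(v, v);                   // TreeMap keeps keys sorted ascending
            if (tree.size() > K) {
                tree.remove(tree.firstKey()); // evict the current minimum
            }
        }
        System.out.println(tree.descendingKeySet()); // prints [9, 8, 7]
    }
}
```

Because the map never holds more than K + 1 entries, memory stays constant no matter how many input values stream through, which is what makes the approach safe inside a mapper.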
3. MapReduce code
TopKAapp.java
package suanfa;

import java.io.IOException;
import java.net.URI;
import java.util.TreeMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

/**
 * <p>
 * Title: TopKAapp.java Package: suanfa
 * </p>
 * <p>
 * Description: find the largest 100 numbers out of 10,000,000 records
 * </p>
 *
 * @author Tom.cai
 * @created 2014-12-10 10:56:44
 * @version V1.0
 */
public class TopKAapp {
    private static final String INPUT_PATH = "hdfs://192.168.80.100:9000/topk_input";
    private static final String OUT_PATH = "hdfs://192.168.80.100:9000/topk_out";

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        final FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), conf);
        final Path outPath = new Path(OUT_PATH);
        if (fileSystem.exists(outPath)) {
            fileSystem.delete(outPath, true);
        }
        final Job job = new Job(conf, TopKAapp.class.getSimpleName());
        FileInputFormat.setInputPaths(job, INPUT_PATH);
        job.setMapperClass(MyMapper.class);
        job.setPartitionerClass(HashPartitioner.class);
        job.setNumReduceTasks(1);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(LongWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(OUT_PATH));
        job.setOutputFormatClass(TextOutputFormat.class);
        job.waitForCompletion(true);
    }

    static class MyMapper extends Mapper<LongWritable, Text, NullWritable, LongWritable> {
        public static final int K = 100;
        private TreeMap<Long, Long> tree = new TreeMap<Long, Long>();

        @Override
        public void map(LongWritable key, Text text, Context context) throws IOException, InterruptedException {
            long temp = Long.parseLong(text.toString());
            tree.put(temp, temp);
            if (tree.size() > K) {
                tree.remove(tree.firstKey());
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            // Emit this mapper's local top K once all its input has been seen
            for (Long text : tree.values()) {
                context.write(NullWritable.get(), new LongWritable(text));
            }
        }
    }

    static class MyReducer extends Reducer<NullWritable, LongWritable, NullWritable, LongWritable> {
        public static final int K = 100;
        private TreeMap<Long, Long> tree = new TreeMap<Long, Long>();

        @Override
        protected void reduce(NullWritable key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
            // Merge the per-mapper candidates, again keeping only the K largest
            for (LongWritable value : values) {
                tree.put(value.get(), value.get());
                if (tree.size() > K) {
                    tree.remove(tree.firstKey());
                }
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            // Output the global top K in descending order
            for (Long val : tree.descendingKeySet()) {
                context.write(NullWritable.get(), new LongWritable(val));
            }
        }
    }
}
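To try the job end to end you need an input file with one number per line, which is the format the mapper's Long.parseLong(text.toString()) expects. A minimal sketch of a generator follows; the file name topk_input.txt, the value range, and the count of 1000 are my own choices (scale the count up toward 10,000,000 and upload the file to the HDFS path above for a realistic run):

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Random;

public class GenTopKInput {
    public static void main(String[] args) throws IOException {
        int count = 1000; // illustrative; increase for a realistic test
        Random rnd = new Random();
        PrintWriter out = new PrintWriter("topk_input.txt");
        for (int i = 0; i < count; i++) {
            // One non-negative long per line, as the mapper expects
            out.println(Math.abs(rnd.nextLong() % 1000000000000L));
        }
        out.close();
    }
}
```

After generating the file, copy it into HDFS with something like `hadoop fs -put topk_input.txt /topk_input` before submitting the job.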
Everyone is welcome to discuss and study this together! If you find it useful, bookmark it!
Record and share, so that you and I grow together!
Welcome to read my other blogs:
My personal blog: http://blog.caicongyang.com
My CSDN blog: http://blog.csdn.net/caicongyang
Hadoop Reading Notes (14): the TopK algorithm in MapReduce (top 100)