Hadoop reading notes (14) TOPK algorithm in MapReduce (Top100 algorithm)


Hadoop Reading Notes series: http://blog.csdn.net/caicongyang/article/category/2166855 (the series will be completed gradually, with comments such as the expected data file format added over time)

1. Description:

From the given file, find the largest 100 values. Each line of the input file holds a single long integer (this is what the mapper's Long.parseLong call expects); the data looks like the following:

5331656517800292911374982668522067918224212228227533691229525338221001067312284316342740518015 ...
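To try this locally you need an input file in that format. A minimal generator sketch (the output file name and record count here are placeholders, not from the original post):

import java.io.FileWriter;
import java.io.IOException;
import java.util.Random;

public class GenTopKInput {
    public static void main(String[] args) throws IOException {
        Random random = new Random();
        // Write one non-negative long per line, which is what the
        // mapper's Long.parseLong(text.toString()) expects.
        FileWriter writer = new FileWriter("topk_input.txt");
        try {
            for (int i = 0; i < 1000000; i++) {
                writer.write(Long.toString(random.nextLong() >>> 1) + "\n");
            }
        } finally {
            writer.close();
        }
    }
}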

2. The code below relies on the TreeMap class, so here is a quick demo of it first

TreeMapDemo.java

package suanfa;

import java.util.Map.Entry;
import java.util.TreeMap;

public class TreeMapDemo {
    public static void main(String[] args) {
        TreeMap<Long, Long> tree = new TreeMap<Long, Long>();
        tree.put(1333333L, 1333333L);
        tree.put(1222222L, 1222222L);
        tree.put(1555555L, 1555555L);
        tree.put(1444444L, 1444444L);
        for (Entry<Long, Long> entry : tree.entrySet()) {
            System.out.println(entry.getKey() + ":" + entry.getValue());
        }
        System.out.println(tree.firstEntry().getValue()); // minimum value
        System.out.println(tree.lastEntry().getValue());  // maximum value
        System.out.println(tree.navigableKeySet());       // keys in ascending order
        System.out.println(tree.descendingKeySet());      // keys in descending order
    }
}
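Beyond ordered iteration, the job in the next section uses TreeMap as a bounded top-K buffer: put every value, and whenever the map grows past K entries evict firstKey(), the smallest. A minimal sketch of that pattern with k = 3:

import java.util.TreeMap;

public class TopKTreeMapSketch {
    public static void main(String[] args) {
        final int k = 3; // the job below uses 100
        TreeMap<Long, Long> tree = new TreeMap<Long, Long>();
        long[] data = {7L, 42L, 3L, 99L, 15L, 8L};
        for (long value : data) {
            tree.put(value, value);
            // Keys are kept sorted ascending, so firstKey() is the current
            // minimum; evicting it caps the map at the k largest values.
            if (tree.size() > k) {
                tree.remove(tree.firstKey());
            }
        }
        System.out.println(tree.descendingKeySet()); // [99, 42, 15]
    }
}

Note that TreeMap keys are unique, so duplicate values collapse into a single entry; the MapReduce job below inherits the same behavior.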

3. MapReduce code

TopKAapp.java

package suanfa;

import java.io.IOException;
import java.net.URI;
import java.util.TreeMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

/**
 * <p>
 * Title: TopKAapp.java
 * Package: suanfa
 * </p>
 * <p>
 * Description: find the largest 100 numbers from 10 million records
 * </p>
 *
 * @author Tom.cai
 * @created 2014-12-10 10:56:44
 * @version V1.0
 */
public class TopKAapp {
    private static final String INPUT_PATH = "hdfs://192.168.80.100:9000/topk_input";
    private static final String OUT_PATH = "hdfs://192.168.80.100:9000/topk_out";

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        final FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), conf);
        final Path outPath = new Path(OUT_PATH);
        // Delete the output directory if it already exists.
        if (fileSystem.exists(outPath)) {
            fileSystem.delete(outPath, true);
        }

        final Job job = new Job(conf, TopKAapp.class.getSimpleName());
        FileInputFormat.setInputPaths(job, INPUT_PATH);
        job.setMapperClass(MyMapper.class);
        job.setPartitionerClass(HashPartitioner.class);
        job.setNumReduceTasks(1);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(LongWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(OUT_PATH));
        job.setOutputFormatClass(TextOutputFormat.class);
        job.waitForCompletion(true);
    }

    static class MyMapper extends Mapper<LongWritable, Text, NullWritable, LongWritable> {
        public static final int K = 100;
        private TreeMap<Long, Long> tree = new TreeMap<Long, Long>();

        @Override
        public void map(LongWritable key, Text text, Context context) throws IOException, InterruptedException {
            long temp = Long.parseLong(text.toString());
            tree.put(temp, temp);
            // Keep only the K largest values seen by this map task.
            if (tree.size() > K) {
                tree.remove(tree.firstKey());
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            // Emit this task's local top K once all input has been consumed.
            for (Long val : tree.values()) {
                context.write(NullWritable.get(), new LongWritable(val));
            }
        }
    }

    static class MyReducer extends Reducer<NullWritable, LongWritable, NullWritable, LongWritable> {
        public static final int K = 100;
        private TreeMap<Long, Long> tree = new TreeMap<Long, Long>();

        @Override
        protected void reduce(NullWritable key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
            // Merge all mappers' candidates with the same eviction rule.
            for (LongWritable value : values) {
                tree.put(value.get(), value.get());
                if (tree.size() > K) {
                    tree.remove(tree.firstKey());
                }
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            // Write the global top K in descending order.
            for (Long val : tree.descendingKeySet()) {
                context.write(NullWritable.get(), new LongWritable(val));
            }
        }
    }
}
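How the job works: each mapper keeps only its local top 100 in an in-memory TreeMap and emits nothing until cleanup(), so at most 100 values per map task are shuffled; the single reducer (job.setNumReduceTasks(1)) merges those candidates with the same eviction rule and writes them out in descending order. To sanity-check that two-stage logic without a cluster, here is a plain-Java sketch (no Hadoop; class and variable names are my own, not from the original):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.TreeMap;

public class TopKCheck {
    static final int K = 100;

    // Same eviction rule as MyMapper/MyReducer above.
    static void offer(TreeMap<Long, Long> tree, long value) {
        tree.put(value, value);
        if (tree.size() > K) {
            tree.remove(tree.firstKey());
        }
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        List<Long> data = new ArrayList<Long>();
        for (int i = 0; i < 1000000; i++) {
            data.add(random.nextLong() >>> 1); // non-negative, effectively distinct
        }

        // "Map phase": four simulated map tasks, each keeping a local top K.
        List<Long> candidates = new ArrayList<Long>();
        int chunk = data.size() / 4;
        for (int m = 0; m < 4; m++) {
            TreeMap<Long, Long> local = new TreeMap<Long, Long>();
            for (long value : data.subList(m * chunk, (m + 1) * chunk)) {
                offer(local, value);
            }
            candidates.addAll(local.values()); // what cleanup() emits
        }

        // "Reduce phase": the single reducer merges the 4 * K candidates.
        TreeMap<Long, Long> global = new TreeMap<Long, Long>();
        for (long value : candidates) {
            offer(global, value);
        }

        // Reference answer: full descending sort, take the first K.
        List<Long> sorted = new ArrayList<Long>(data);
        Collections.sort(sorted, Collections.reverseOrder());
        List<Long> expected = sorted.subList(0, K);

        // Should print true (assumes distinct values; TreeMap collapses duplicates).
        System.out.println(new ArrayList<Long>(global.descendingKeySet()).equals(expected));
    }
}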

Everyone is welcome to discuss and learn together! If you find this useful, feel free to bookmark it!

Record and share, so that you and I can grow together!

Welcome to check out my other blogs:

My personal blog: http://blog.caicongyang.com

My CSDN blog: http://blog.csdn.net/caicongyang


