Hadoop reading notes (14) TOPK algorithm in MapReduce (Top100 algorithm)


Hadoop Reading Notes series: http://blog.csdn.net/caicongyang/article/category/2166855 (the series will be completed gradually, with comments such as the expected data file format added over time)

1. Description:

From the given file, find the largest 100 values. Each line of the input file holds a single long integer (this is what the mapper's Long.parseLong call expects); the data looks like the following:

5331656517800292911374982668522067918224212228227533691229525338221001067312284316342740518015 ...
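To try this locally you need an input file in that format. A minimal generator sketch (the output file name and record count here are placeholders, not from the original post):

import java.io.FileWriter;
import java.io.IOException;
import java.util.Random;

public class GenTopKInput {
    public static void main(String[] args) throws IOException {
        Random random = new Random();
        // Write one non-negative long per line, which is what the
        // mapper's Long.parseLong(text.toString()) expects.
        FileWriter writer = new FileWriter("topk_input.txt");
        try {
            for (int i = 0; i < 1000000; i++) {
                writer.write(Long.toString(random.nextLong() >>> 1) + "\n");
            }
        } finally {
            writer.close();
        }
    }
}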

2. The code below relies on the TreeMap class, so here is a quick demo of it first

TreeMapDemo.java

package suanfa;

import java.util.Map.Entry;
import java.util.TreeMap;

public class TreeMapDemo {
    public static void main(String[] args) {
        TreeMap<Long, Long> tree = new TreeMap<Long, Long>();
        tree.put(1333333L, 1333333L);
        tree.put(1222222L, 1222222L);
        tree.put(1555555L, 1555555L);
        tree.put(1444444L, 1444444L);
        for (Entry<Long, Long> entry : tree.entrySet()) {
            System.out.println(entry.getKey() + ":" + entry.getValue());
        }
        System.out.println(tree.firstEntry().getValue()); // minimum value
        System.out.println(tree.lastEntry().getValue());  // maximum value
        System.out.println(tree.navigableKeySet());       // keys in ascending order
        System.out.println(tree.descendingKeySet());      // keys in descending order
    }
}
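Beyond ordered iteration, the job in the next section uses TreeMap as a bounded top-K buffer: put every value, and whenever the map grows past K entries evict firstKey(), the smallest. A minimal sketch of that pattern with k = 3:

import java.util.TreeMap;

public class TopKTreeMapSketch {
    public static void main(String[] args) {
        final int k = 3; // the job below uses 100
        TreeMap<Long, Long> tree = new TreeMap<Long, Long>();
        long[] data = {7L, 42L, 3L, 99L, 15L, 8L};
        for (long value : data) {
            tree.put(value, value);
            // Keys are kept sorted ascending, so firstKey() is the current
            // minimum; evicting it caps the map at the k largest values.
            if (tree.size() > k) {
                tree.remove(tree.firstKey());
            }
        }
        System.out.println(tree.descendingKeySet()); // [99, 42, 15]
    }
}

Note that TreeMap keys are unique, so duplicate values collapse into a single entry; the MapReduce job below inherits the same behavior.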

3. MapReduce code

TopKAapp.java

package suanfa;

import java.io.IOException;
import java.net.URI;
import java.util.TreeMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

/**
 * <p>
 * Title: TopKAapp.java
 * Package: suanfa
 * </p>
 * <p>
 * Description: find the largest 100 numbers from 10 million records
 * </p>
 *
 * @author Tom.cai
 * @created 2014-12-10 10:56:44
 * @version V1.0
 */
public class TopKAapp {
    private static final String INPUT_PATH = "hdfs://192.168.80.100:9000/topk_input";
    private static final String OUT_PATH = "hdfs://192.168.80.100:9000/topk_out";

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        final FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), conf);
        final Path outPath = new Path(OUT_PATH);
        // Delete the output directory if it already exists.
        if (fileSystem.exists(outPath)) {
            fileSystem.delete(outPath, true);
        }

        final Job job = new Job(conf, TopKAapp.class.getSimpleName());
        FileInputFormat.setInputPaths(job, INPUT_PATH);
        job.setMapperClass(MyMapper.class);
        job.setPartitionerClass(HashPartitioner.class);
        job.setNumReduceTasks(1);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(LongWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(OUT_PATH));
        job.setOutputFormatClass(TextOutputFormat.class);
        job.waitForCompletion(true);
    }

    static class MyMapper extends Mapper<LongWritable, Text, NullWritable, LongWritable> {
        public static final int K = 100;
        private TreeMap<Long, Long> tree = new TreeMap<Long, Long>();

        @Override
        public void map(LongWritable key, Text text, Context context) throws IOException, InterruptedException {
            long temp = Long.parseLong(text.toString());
            tree.put(temp, temp);
            // Keep only the K largest values seen by this map task.
            if (tree.size() > K) {
                tree.remove(tree.firstKey());
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            // Emit this task's local top K once all input has been consumed.
            for (Long val : tree.values()) {
                context.write(NullWritable.get(), new LongWritable(val));
            }
        }
    }

    static class MyReducer extends Reducer<NullWritable, LongWritable, NullWritable, LongWritable> {
        public static final int K = 100;
        private TreeMap<Long, Long> tree = new TreeMap<Long, Long>();

        @Override
        protected void reduce(NullWritable key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
            // Merge all mappers' candidates with the same eviction rule.
            for (LongWritable value : values) {
                tree.put(value.get(), value.get());
                if (tree.size() > K) {
                    tree.remove(tree.firstKey());
                }
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            // Write the global top K in descending order.
            for (Long val : tree.descendingKeySet()) {
                context.write(NullWritable.get(), new LongWritable(val));
            }
        }
    }
}
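How the job works: each mapper keeps only its local top 100 in an in-memory TreeMap and emits nothing until cleanup(), so at most 100 values per map task are shuffled; the single reducer (job.setNumReduceTasks(1)) merges those candidates with the same eviction rule and writes them out in descending order. To sanity-check that two-stage logic without a cluster, here is a plain-Java sketch (no Hadoop; class and variable names are my own, not from the original):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.TreeMap;

public class TopKCheck {
    static final int K = 100;

    // Same eviction rule as MyMapper/MyReducer above.
    static void offer(TreeMap<Long, Long> tree, long value) {
        tree.put(value, value);
        if (tree.size() > K) {
            tree.remove(tree.firstKey());
        }
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        List<Long> data = new ArrayList<Long>();
        for (int i = 0; i < 1000000; i++) {
            data.add(random.nextLong() >>> 1); // non-negative, effectively distinct
        }

        // "Map phase": four simulated map tasks, each keeping a local top K.
        List<Long> candidates = new ArrayList<Long>();
        int chunk = data.size() / 4;
        for (int m = 0; m < 4; m++) {
            TreeMap<Long, Long> local = new TreeMap<Long, Long>();
            for (long value : data.subList(m * chunk, (m + 1) * chunk)) {
                offer(local, value);
            }
            candidates.addAll(local.values()); // what cleanup() emits
        }

        // "Reduce phase": the single reducer merges the 4 * K candidates.
        TreeMap<Long, Long> global = new TreeMap<Long, Long>();
        for (long value : candidates) {
            offer(global, value);
        }

        // Reference answer: full descending sort, take the first K.
        List<Long> sorted = new ArrayList<Long>(data);
        Collections.sort(sorted, Collections.reverseOrder());
        List<Long> expected = sorted.subList(0, K);

        // Should print true (assumes distinct values; TreeMap collapses duplicates).
        System.out.println(new ArrayList<Long>(global.descendingKeySet()).equals(expected));
    }
}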

Everyone is welcome to discuss and learn together! If you find this useful, feel free to bookmark it!

Record and share, so that you and I can grow together!

Welcome to check out my other blogs:

My personal blog: http://blog.caicongyang.com

My CSDN blog: http://blog.csdn.net/caicongyang


