The Internet era has made celebrities' public images more vivid and narrowed the distance between stars and fans. Singers, movie stars, athletes, and writers can now interact with their fans online with ease, making money easier than ever before. At the same time, the rapid development of the Internet has produced a crop of Internet-native stars, who use these new channels to squeeze the most out of the fan economy.
Against this background, this post walks through a small project that analyzes celebrity Weibo data.
1. Project Requirements
Write a custom input format that reads the celebrity Weibo data, sorts it by fan count, following count, and post count, and writes each of the three metrics to a separate output file.
2. Data set
Each line of the raw file holds the star's name twice (once in Chinese, once romanized), followed by the fan count, following count, and post count:

Name (Weibo name)            Fans        Following   Posts
Yu Yu                        10591367    206         558
Lee Min Ho                   22898071    11          268
Lin Xinru                    57488649    214         5940
Huang Xiaoming               22616497    506         2011
Jane Zhang                   27878708    238         3846
Li Na                        23309493    81          631
Xu Xiaoping                  11659926    1929        13795
RMB                          24301532    200         2391
There is a Netherfield king  8779383     577         4251
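Since each record is one tab-separated line with five fields (two name columns plus three counts), the per-line parsing the input format must perform can be sketched with the JDK alone. This is an illustrative helper, not part of the project's code; the sample line reuses values from the data set above.

```java
public class RecordParseDemo {
    // Parse one tab-separated record into its three numeric fields:
    // fans, following, posts (fields 2..4; fields 0..1 are the two name columns)
    public static int[] parse(String line) {
        String[] pieces = line.split("\t");
        if (pieces.length != 5) {
            throw new IllegalArgumentException("Invalid record received");
        }
        return new int[] {
            Integer.parseInt(pieces[2].trim()),
            Integer.parseInt(pieces[3].trim()),
            Integer.parseInt(pieces[4].trim())
        };
    }

    public static void main(String[] args) {
        int[] counts = parse("Xu Xiaoping\tXu Xiaoping\t11659926\t1929\t13795");
        System.out.println("fans=" + counts[0]
                + " following=" + counts[1] + " posts=" + counts[2]);
    }
}
```

The same split-and-validate pattern appears later in the custom record reader, which throws an IOException on malformed lines instead.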
3. Analysis
A custom InputFormat reads the celebrity Weibo data; a getSortedHashtableByValue method sorts each star's fan, following, and post counts; and MultipleOutputs then writes each metric to its own file.
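The sort at the heart of getSortedHashtableByValue needs nothing from Hadoop, so it can be shown in isolation: copy the map's entries into an array and order them by descending value. The names and counts below are made-up placeholders.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;

public class SortByValueDemo {
    // Sort map entries by descending value; suitable only for small maps,
    // since all entries are materialized into one in-memory array
    @SuppressWarnings("unchecked")
    public static Entry<String, Integer>[] sortedByValue(Map<String, Integer> h) {
        Entry<String, Integer>[] entries =
                (Entry<String, Integer>[]) h.entrySet().toArray(new Entry[0]);
        Arrays.sort(entries, new Comparator<Entry<String, Integer>>() {
            public int compare(Entry<String, Integer> e1, Entry<String, Integer> e2) {
                return e2.getValue().compareTo(e1.getValue());
            }
        });
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> fans = new HashMap<String, Integer>();
        fans.put("A", 100);
        fans.put("B", 300);
        fans.put("C", 200);
        for (Entry<String, Integer> e : sortedByValue(fans)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```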
4. Implementation
1. Define the WeiBo entity class, implementing the WritableComparable interface
package com.buaa;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

/**
 * @ProjectName MicroblogStar
 * @PackageName com.buaa
 * @ClassName WeiBo
 * @Description TODO
 * @Author Liu Jishu
 * @Date 2016-05-07 14:54:29
 */
public class WeiBo implements WritableComparable<Object> {
    // fan count
    private int fan;
    // following count
    private int followers;
    // number of posts
    private int microblogs;

    public WeiBo() {}

    public WeiBo(int fan, int followers, int microblogs) {
        this.fan = fan;
        this.followers = followers;
        this.microblogs = microblogs;
    }

    public void set(int fan, int followers, int microblogs) {
        this.fan = fan;
        this.followers = followers;
        this.microblogs = microblogs;
    }

    // Implement WritableComparable's readFields() so the object can be
    // deserialized after network transfer or file input
    @Override
    public void readFields(DataInput in) throws IOException {
        fan = in.readInt();
        followers = in.readInt();
        microblogs = in.readInt();
    }

    // Implement WritableComparable's write() so the object can be
    // serialized for network transfer or file output
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(fan);
        out.writeInt(followers);
        out.writeInt(microblogs);
    }

    @Override
    public int compareTo(Object o) {
        // Not used by this job; sorting happens in the reducer
        return 0;
    }

    public int getFan() { return fan; }
    public void setFan(int fan) { this.fan = fan; }
    public int getFollowers() { return followers; }
    public void setFollowers(int followers) { this.followers = followers; }
    public int getMicroblogs() { return microblogs; }
    public void setMicroblogs(int microblogs) { this.microblogs = microblogs; }
}
2. Define a custom WeiboInputFormat, extending the FileInputFormat abstract class
package com.buaa;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.util.LineReader;

/**
 * @ProjectName MicroblogStar
 * @PackageName com.buaa
 * @ClassName WeiboInputFormat
 * @Description TODO
 * @Author Liu Jishu
 * @Date 2016-05-07 10:23:28
 */
public class WeiboInputFormat extends FileInputFormat<Text, WeiBo> {
    @Override
    public RecordReader<Text, WeiBo> createRecordReader(InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        // Custom WeiboRecordReader, reads the file line by line
        return new WeiboRecordReader();
    }

    public class WeiboRecordReader extends RecordReader<Text, WeiBo> {
        public LineReader in;
        // key type
        public Text lineKey = new Text();
        // value type
        public WeiBo lineValue = new WeiBo();

        @Override
        public void initialize(InputSplit input, TaskAttemptContext context)
                throws IOException, InterruptedException {
            // Get the split
            FileSplit split = (FileSplit) input;
            // Get the configuration
            Configuration job = context.getConfiguration();
            // Path of the split
            Path file = split.getPath();
            FileSystem fs = file.getFileSystem(job);
            // Open the file
            FSDataInputStream fileIn = fs.open(file);
            in = new LineReader(fileIn, job);
        }

        @Override
        public boolean nextKeyValue() throws IOException, InterruptedException {
            // Read one line
            Text line = new Text();
            int lineSize = in.readLine(line);
            if (lineSize == 0)
                return false;
            // Split the line on the tab delimiter
            String[] pieces = line.toString().split("\t");
            if (pieces.length != 5) {
                throw new IOException("Invalid record received");
            }

            int a, b, c;
            try {
                // fan count
                a = Integer.parseInt(pieces[2].trim());
                // following count
                b = Integer.parseInt(pieces[3].trim());
                // number of posts
                c = Integer.parseInt(pieces[4].trim());
            } catch (NumberFormatException nfe) {
                throw new IOException("Error parsing integer value in record");
            }

            // Set the custom key and value
            lineKey.set(pieces[0]);
            lineValue.set(a, b, c);
            return true;
        }

        @Override
        public void close() throws IOException {
            if (in != null) {
                in.close();
            }
        }

        @Override
        public Text getCurrentKey() throws IOException, InterruptedException {
            return lineKey;
        }

        @Override
        public WeiBo getCurrentValue() throws IOException, InterruptedException {
            return lineValue;
        }

        @Override
        public float getProgress() throws IOException, InterruptedException {
            return 0;
        }
    }
}
3. Write the MapReduce program
package com.buaa;

import java.io.IOException;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * @ProjectName MicroblogStar
 * @PackageName com.buaa
 * @ClassName WeiboCount
 * @Description TODO
 * @Author Liu Jishu
 * @Date 2016-05-07 09:07:36
 */
public class WeiboCount extends Configured implements Tool {
    // tab separator
    private static String TAB_SEPARATOR = "\t";
    // fan count
    private static String FAN = "fan";
    // following count
    private static String FOLLOWERS = "followers";
    // number of posts
    private static String MICROBLOGS = "microblogs";

    public static class WeiboMapper extends Mapper<Text, WeiBo, Text, Text> {
        @Override
        protected void map(Text key, WeiBo value, Context context)
                throws IOException, InterruptedException {
            // fan count
            context.write(new Text(FAN),
                    new Text(key.toString() + TAB_SEPARATOR + value.getFan()));
            // following count
            context.write(new Text(FOLLOWERS),
                    new Text(key.toString() + TAB_SEPARATOR + value.getFollowers()));
            // number of posts
            context.write(new Text(MICROBLOGS),
                    new Text(key.toString() + TAB_SEPARATOR + value.getMicroblogs()));
        }
    }

    public static class WeiboReducer extends Reducer<Text, Text, Text, IntWritable> {
        private MultipleOutputs<Text, IntWritable> mos;

        protected void setup(Context context) throws IOException, InterruptedException {
            mos = new MultipleOutputs<Text, IntWritable>(context);
        }

        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            Map<String, Integer> map = new HashMap<String, Integer>();
            for (Text value : values) {
                // value = name + (fan count, following count, or post count)
                String[] records = value.toString().split(TAB_SEPARATOR);
                map.put(records[0], Integer.parseInt(records[1].trim()));
            }

            // Sort the entries of the map by value
            Map.Entry<String, Integer>[] entries = getSortedHashtableByValue(map);
            for (int i = 0; i < entries.length; i++) {
                mos.write(key.toString(),
                        new Text(entries[i].getKey()),
                        new IntWritable(entries[i].getValue()));
            }
        }

        protected void cleanup(Context context) throws IOException, InterruptedException {
            mos.close();
        }
    }

    @SuppressWarnings("deprecation")
    @Override
    public int run(String[] args) throws Exception {
        // configuration object
        Configuration conf = new Configuration();

        // Delete the output path if it already exists
        Path myPath = new Path(args[1]);
        FileSystem hdfs = myPath.getFileSystem(conf);
        if (hdfs.isDirectory(myPath)) {
            hdfs.delete(myPath, true);
        }

        // Construct the job
        Job job = new Job(conf, "weibo");
        // main class
        job.setJarByClass(WeiboCount.class);

        // Mapper
        job.setMapperClass(WeiboMapper.class);
        // Mapper key output type
        job.setMapOutputKeyClass(Text.class);
        // Mapper value output type
        job.setMapOutputValueClass(Text.class);

        // Reducer
        job.setReducerClass(WeiboReducer.class);
        // Reducer key output type
        job.setOutputKeyClass(Text.class);
        // Reducer value output type
        job.setOutputValueClass(IntWritable.class);

        // input path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // output path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // custom input format
        job.setInputFormatClass(WeiboInputFormat.class);

        // custom named outputs, one per metric
        MultipleOutputs.addNamedOutput(job, FAN, TextOutputFormat.class, Text.class, IntWritable.class);
        MultipleOutputs.addNamedOutput(job, FOLLOWERS, TextOutputFormat.class, Text.class, IntWritable.class);
        MultipleOutputs.addNamedOutput(job, MICROBLOGS, TextOutputFormat.class, Text.class, IntWritable.class);

        // Instead of setting an OutputFormatClass on the job, use LazyOutputFormat
        // so that empty default part files are not created
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);

        // Submit the job
        return job.waitForCompletion(true) ? 0 : 1;
    }

    // Sort the entries of the map by value (suitable only for small data volumes)
    @SuppressWarnings("unchecked")
    public static Entry<String, Integer>[] getSortedHashtableByValue(Map<String, Integer> h) {
        Entry<String, Integer>[] entries =
                (Entry<String, Integer>[]) h.entrySet().toArray(new Entry[0]);
        // Sort in descending order of value
        Arrays.sort(entries, new Comparator<Entry<String, Integer>>() {
            public int compare(Entry<String, Integer> entry1, Entry<String, Integer> entry2) {
                return entry2.getValue().compareTo(entry1.getValue());
            }
        });
        return entries;
    }

    public static void main(String[] args) throws Exception {
        String[] args0 = { "hdfs://ljc:9000/buaa/microblog/weibo.txt",
                "hdfs://ljc:9000/buaa/microblog/out/" };
        int ec = ToolRunner.run(new Configuration(), new WeiboCount(), args0);
        System.exit(ec);
    }
}
5. Results
This article is copyrighted by the author and CSDN. Reprinting is welcome, but this notice must be retained without the author's consent being otherwise obtained, and a prominent link to the original must appear on the page; otherwise the author reserves the right to pursue legal liability.
Implementation code and data: download