Analyzing Celebrity Weibo Data with MapReduce

Source: Internet
Author: User

The Internet era has made celebrities' public images more vivid and has narrowed the distance between stars and their fans. Singers, movie stars, sports stars, and writers can easily interact with fans online, and turning that attention into income is easier than ever. At the same time, the rapid growth of the Internet has produced its own Internet celebrities, who use new channels to get the most out of the fan economy and have profited handsomely from it.

Against this background, today we build a small project that analyzes celebrity Weibo data.

1. Project Requirements

Define a custom input format, sort the celebrity Weibo data by number of fans, number of accounts followed, and number of microblogs, and write each of the three rankings to a different file.

2. Data set

Columns (tab-separated): star name, Weibo name, number of fans, number of accounts followed, number of microblogs

Yu Yu 10591367 206 558

Lee Min Ho Lee min Ho 22898071 11 268

The forest heart is like a forest heart 57488649 214 5940

Huang Xiaoming Huang xiaoming 22616497 506 2011

Jane Zhang Jane Zhang 27878708 238 3846

Li Na 23309493 81 631

Xu Xiaoping Xu Xiaoping 11659926 1929 13795

RMB RMB 24301532 200 2391

There is a Netherfield king 8779383 577 4251

3. Analysis

A custom InputFormat reads the celebrity Weibo data. The fan, following, and microblog counts are each sorted with a custom getSortedHashtableByValue method, and MultipleOutputs then writes each category to a different file.

4. Implementation

  1. Define the WeiBo entity class, which implements the WritableComparable interface
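The sort-by-value step described above can be sketched in plain Java, without any Hadoop dependency. This is only an illustration: the class name SortByValueSketch and the method sortByValueDescending are hypothetical and stand in for the article's getSortedHashtableByValue, and the sample values are the fan counts from the data set.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SortByValueSketch {
    // Returns the map's entries ordered by value, largest first.
    // Like the article's method, this sorts in memory and only suits small data volumes.
    static List<Map.Entry<String, Integer>> sortByValueDescending(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> b.getValue().compareTo(a.getValue()));
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> fans = new HashMap<>();
        fans.put("Lee Min Ho", 22898071);
        fans.put("Jane Zhang", 27878708);
        fans.put("Xu Xiaoping", 11659926);
        // Print one "name<TAB>count" line per entry, highest count first.
        for (Map.Entry<String, Integer> entry : sortByValueDescending(fans)) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

With these three entries the order is Jane Zhang, Lee Min Ho, Xu Xiaoping.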

package com.buaa;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

/**
 * @ProjectName MicroblogStar
 * @PackageName com.buaa
 * @ClassName WeiBo
 * @Description TODO
 * @Author Liu Jishu
 * @Date 2016-05-07 14:54:29
 */
public class WeiBo implements WritableComparable<Object> {
    // Number of fans
    private int fan;
    // Number of accounts followed
    private int followers;
    // Number of microblogs
    private int microblogs;

    public WeiBo() {
    }

    public WeiBo(int fan, int followers, int microblogs) {
        this.fan = fan;
        this.followers = followers;
        this.microblogs = microblogs;
    }

    public void set(int fan, int followers, int microblogs) {
        this.fan = fan;
        this.followers = followers;
        this.microblogs = microblogs;
    }

    // Implements readFields() of WritableComparable so the object can be
    // deserialized after network transfer or file input
    @Override
    public void readFields(DataInput in) throws IOException {
        fan = in.readInt();
        followers = in.readInt();
        microblogs = in.readInt();
    }

    // Implements write() of WritableComparable so the object can be
    // serialized for network transfer or file output
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(fan);
        out.writeInt(followers);
        out.writeInt(microblogs);
    }

    @Override
    public int compareTo(Object o) {
        // TODO Auto-generated method stub
        return 0;
    }

    public int getFan() {
        return fan;
    }

    public void setFan(int fan) {
        this.fan = fan;
    }

    public int getFollowers() {
        return followers;
    }

    public void setFollowers(int followers) {
        this.followers = followers;
    }

    public int getMicroblogs() {
        return microblogs;
    }

    public void setMicroblogs(int microblogs) {
        this.microblogs = microblogs;
    }
}

  2. Define a custom WeiboInputFormat that extends the FileInputFormat abstract class
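To see why write() and readFields() must handle the three fields in the same order, here is a minimal round-trip sketch using only java.io. The Counts class is a hypothetical stand-in for WeiBo that does not depend on Hadoop, and roundTrip is an illustrative helper, not part of the project.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WritableRoundTripSketch {
    // Hypothetical stand-in for WeiBo: same three int fields, no Hadoop types.
    static class Counts {
        int fan;
        int followers;
        int microblogs;

        // Writes the fields in a fixed order: fan, followers, microblogs.
        void write(DataOutput out) throws IOException {
            out.writeInt(fan);
            out.writeInt(followers);
            out.writeInt(microblogs);
        }

        // Reads the fields back in exactly the order write() emitted them.
        void readFields(DataInput in) throws IOException {
            fan = in.readInt();
            followers = in.readInt();
            microblogs = in.readInt();
        }
    }

    // Serializes the object to a byte array and deserializes it into a fresh instance.
    static Counts roundTrip(Counts original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));
            Counts copy = new Counts();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Counts counts = new Counts();
        counts.fan = 22616497;
        counts.followers = 506;
        counts.microblogs = 2011;
        Counts copy = roundTrip(counts);
        System.out.println(copy.fan + " " + copy.followers + " " + copy.microblogs);
    }
}
```

If readFields() read the fields in a different order than write() produced them, the copy would silently carry swapped values, because the byte stream stores no field names.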

package com.buaa;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.util.LineReader;

/**
 * @ProjectName MicroblogStar
 * @PackageName com.buaa
 * @ClassName WeiboInputFormat
 * @Description TODO
 * @Author Liu Jishu
 * @Date 2016-05-07 10:23:28
 */
public class WeiboInputFormat extends FileInputFormat<Text, WeiBo> {

    @Override
    public RecordReader<Text, WeiBo> createRecordReader(InputSplit arg0, TaskAttemptContext arg1)
            throws IOException, InterruptedException {
        // Custom WeiboRecordReader class that reads the input line by line
        return new WeiboRecordReader();
    }

    // The record reader below always reads a file from its beginning,
    // so each file is kept in a single split
    @Override
    protected boolean isSplitable(JobContext context, Path filename) {
        return false;
    }

    public class WeiboRecordReader extends RecordReader<Text, WeiBo> {
        public LineReader in;
        // Key type
        public Text lineKey = new Text();
        // Value type
        public WeiBo lineValue = new WeiBo();

        @Override
        public void initialize(InputSplit input, TaskAttemptContext context)
                throws IOException, InterruptedException {
            // Get the split
            FileSplit split = (FileSplit) input;
            // Get the configuration
            Configuration job = context.getConfiguration();
            // Path of the split
            Path file = split.getPath();
            FileSystem fs = file.getFileSystem(job);
            // Open the file
            FSDataInputStream fileIn = fs.open(file);
            in = new LineReader(fileIn, job);
        }

        @Override
        public boolean nextKeyValue() throws IOException, InterruptedException {
            // One line of data
            Text line = new Text();
            int lineSize = in.readLine(line);
            if (lineSize == 0) {
                return false;
            }
            // Split the line on the '\t' delimiter
            String[] pieces = line.toString().split("\t");
            if (pieces.length != 5) {
                throw new IOException("Invalid record received");
            }
            int a, b, c;
            try {
                // Number of fans
                a = Integer.parseInt(pieces[2].trim());
                // Number of accounts followed
                b = Integer.parseInt(pieces[3].trim());
                // Number of microblogs
                c = Integer.parseInt(pieces[4].trim());
            } catch (NumberFormatException nfe) {
                throw new IOException("Error parsing integer value in record");
            }
            // Set the custom key and value
            lineKey.set(pieces[0]);
            lineValue.set(a, b, c);
            return true;
        }

        @Override
        public void close() throws IOException {
            if (in != null) {
                in.close();
            }
        }

        @Override
        public Text getCurrentKey() throws IOException, InterruptedException {
            return lineKey;
        }

        @Override
        public WeiBo getCurrentValue() throws IOException, InterruptedException {
            return lineValue;
        }

        @Override
        public float getProgress() throws IOException, InterruptedException {
            return 0;
        }
    }
}

  3. Write the MapReduce program
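The parsing done inside nextKeyValue() can be tried in isolation. The following is a minimal sketch, assuming each record has exactly five tab-separated fields as in the data set above; RecordParseSketch and parseCounts are hypothetical names, and the sketch throws IllegalArgumentException where the record reader throws IOException.

```java
class RecordParseSketch {
    // Parses one record: star name, Weibo name, fans, following, microblogs.
    // Returns {fans, following, microblogs}.
    static int[] parseCounts(String line) {
        String[] pieces = line.split("\t");
        if (pieces.length != 5) {
            throw new IllegalArgumentException("Expected 5 fields but found " + pieces.length);
        }
        try {
            return new int[] {
                Integer.parseInt(pieces[2].trim()),
                Integer.parseInt(pieces[3].trim()),
                Integer.parseInt(pieces[4].trim())
            };
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Error parsing integer value in record: " + line, e);
        }
    }

    public static void main(String[] args) {
        int[] counts = parseCounts("Huang Xiaoming\tHuang Xiaoming\t22616497\t506\t2011");
        System.out.println(counts[0] + " " + counts[1] + " " + counts[2]);
    }
}
```

Rejecting malformed lines early, as the record reader does, keeps bad rows from reaching the mapper with half-filled values.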

package com.buaa;

import java.io.IOException;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * @ProjectName MicroblogStar
 * @PackageName com.buaa
 * @ClassName WeiboCount
 * @Description TODO
 * @Author Liu Jishu
 * @Date 2016-05-07 09:07:36
 */
public class WeiboCount extends Configured implements Tool {
    // Tab delimiter
    private static String TAB_SEPARATOR = "\t";
    // Fans
    private static String FAN = "fan";
    // Accounts followed
    private static String FOLLOWERS = "followers";
    // Microblogs
    private static String MICROBLOGS = "microblogs";

    public static class WeiBoMapper extends Mapper<Text, WeiBo, Text, Text> {
        @Override
        protected void map(Text key, WeiBo value, Context context) throws IOException, InterruptedException {
            // Fans
            context.write(new Text(FAN), new Text(key.toString() + TAB_SEPARATOR + value.getFan()));
            // Accounts followed
            context.write(new Text(FOLLOWERS), new Text(key.toString() + TAB_SEPARATOR + value.getFollowers()));
            // Microblogs
            context.write(new Text(MICROBLOGS), new Text(key.toString() + TAB_SEPARATOR + value.getMicroblogs()));
        }
    }

    public static class WeiBoReducer extends Reducer<Text, Text, Text, IntWritable> {
        private MultipleOutputs<Text, IntWritable> mos;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            mos = new MultipleOutputs<Text, IntWritable>(context);
        }

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            Map<String, Integer> map = new HashMap<String, Integer>();
            for (Text value : values) {
                // value = name + (number of fans, accounts followed, or microblogs)
                String[] records = value.toString().split(TAB_SEPARATOR);
                map.put(records[0], Integer.parseInt(records[1]));
            }
            // Sort the data in the map
            Map.Entry<String, Integer>[] entries = getSortedHashtableByValue(map);
            for (int i = 0; i < entries.length; i++) {
                // The reduce key (fan, followers, or microblogs) selects the named output
                mos.write(key.toString(), new Text(entries[i].getKey()), new IntWritable(entries[i].getValue()));
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            mos.close();
        }
    }

    @SuppressWarnings("deprecation")
    @Override
    public int run(String[] args) throws Exception {
        // Configuration object
        Configuration conf = new Configuration();
        // If the output path already exists, delete it
        Path myPath = new Path(args[1]);
        FileSystem hdfs = myPath.getFileSystem(conf);
        if (hdfs.isDirectory(myPath)) {
            hdfs.delete(myPath, true);
        }
        // Construct the job
        Job job = Job.getInstance(conf, "weibo");
        // Main class
        job.setJarByClass(WeiboCount.class);
        // Mapper
        job.setMapperClass(WeiBoMapper.class);
        // Mapper key output type
        job.setMapOutputKeyClass(Text.class);
        // Mapper value output type
        job.setMapOutputValueClass(Text.class);
        // Reducer
        job.setReducerClass(WeiBoReducer.class);
        // Reducer key output type
        job.setOutputKeyClass(Text.class);
        // Reducer value output type
        job.setOutputValueClass(IntWritable.class);
        // Input path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Output path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Custom input format
        job.setInputFormatClass(WeiboInputFormat.class);
        // Named outputs, one per category
        MultipleOutputs.addNamedOutput(job, FAN, TextOutputFormat.class, Text.class, IntWritable.class);
        MultipleOutputs.addNamedOutput(job, FOLLOWERS, TextOutputFormat.class, Text.class, IntWritable.class);
        MultipleOutputs.addNamedOutput(job, MICROBLOGS, TextOutputFormat.class, Text.class, IntWritable.class);
        // Instead of setting the job's OutputFormatClass directly, set it through LazyOutputFormat,
        // which creates an output file only when a record is actually written
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
        // Submit the job
        return job.waitForCompletion(true) ? 0 : 1;
    }

    // Sort the data in the map by value, descending (only suitable for small data volumes)
    @SuppressWarnings("unchecked")
    public static Entry<String, Integer>[] getSortedHashtableByValue(Map<String, Integer> h) {
        Entry<String, Integer>[] entries = (Entry<String, Integer>[]) h.entrySet().toArray(new Entry[0]);
        // Sort
        Arrays.sort(entries, new Comparator<Entry<String, Integer>>() {
            public int compare(Entry<String, Integer> entry1, Entry<String, Integer> entry2) {
                return entry2.getValue().compareTo(entry1.getValue());
            }
        });
        return entries;
    }

    public static void main(String[] args) throws Exception {
        String[] args0 = {
            "hdfs://ljc:9000/buaa/microblog/weibo.txt",
            "hdfs://ljc:9000/buaa/microblog/out/"
        };
        int ec = ToolRunner.run(new Configuration(), new WeiboCount(), args0);
        System.exit(ec);
    }
}

5. Results
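The data flow of the job can be simulated in memory to check the logic before running it on a cluster. The sketch below is not Hadoop code: WeiboFlowSketch, mapRecord, and reduceAll are hypothetical names, and a plain list and map stand in for the shuffle. It also keeps every emitted value, whereas the reducer above stores them in a HashMap keyed by name, so duplicate names would collapse there.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WeiboFlowSketch {
    // Map phase stand-in: emit one (category, "name<TAB>count") pair per metric.
    static List<String[]> mapRecord(String name, int fan, int followers, int microblogs) {
        List<String[]> out = new ArrayList<>();
        out.add(new String[] {"fan", name + "\t" + fan});
        out.add(new String[] {"followers", name + "\t" + followers});
        out.add(new String[] {"microblogs", name + "\t" + microblogs});
        return out;
    }

    // Shuffle and reduce stand-in: group values by category,
    // then sort each group by its count, largest first.
    static Map<String, List<String>> reduceAll(List<String[]> pairs) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        for (List<String> group : grouped.values()) {
            group.sort(Comparator.comparingInt((String v) -> Integer.parseInt(v.split("\t")[1])).reversed());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String[]> pairs = new ArrayList<>();
        pairs.addAll(mapRecord("Lee Min Ho", 22898071, 11, 268));
        pairs.addAll(mapRecord("Jane Zhang", 27878708, 238, 3846));
        pairs.addAll(mapRecord("Xu Xiaoping", 11659926, 1929, 13795));
        // Each category corresponds to one named output file in the real job.
        for (Map.Entry<String, List<String>> category : reduceAll(pairs).entrySet()) {
            System.out.println("[" + category.getKey() + "]");
            for (String line : category.getValue()) {
                System.out.println(line);
            }
        }
    }
}
```

Because the category name is the map output key, all values for one metric reach the same reduce call, which is what lets the reducer sort them together and route them to the matching named output.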

This article is copyrighted jointly by the author and CSDN. Reprinting is welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original article must be placed in a prominent position on the page; otherwise the author reserves the right to pursue legal liability.
