Reprint please indicate the source: http://blog.csdn.net/xiaojimanman/article/details/40184581
I've been reading Hadoop-related books over the last few days, and while the material is fresh I wrote a small statistics program of my own, modeled on the WordCount example.
Requirements Description:
From a supermarket's sales lists, calculate the degree of association between products, that is, the number of times products A and B are bought together.
Data format:
The supermarket sales data is simplified to the following format: each line represents one sales list, with the items separated by ",".
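As a hypothetical illustration (the sample data here is made up for this write-up), an input file might contain these three sales lists:

milk,bread,eggs
milk,bread
bread,eggs,beer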
Requirements Analysis:
This requirement is implemented with MapReduce in Hadoop.
The map function splits each sales list into associated product pairs; the output key is product A and the value is product B. Because we want to find the associated products for both A and B, each pair is output twice, once as (A, B) and once as (B, A), as in the records shown below.
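Using that made-up sample, the first list "milk,bread,eggs" is split into three pairs, and each pair is written in both directions, so the map emits six records for it:

milk	bread
bread	milk
milk	eggs
eggs	milk
bread	eggs
eggs	bread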
The reduce function derives the products associated with each product A, that is, it counts how many times each product appears among A's values; the output key is "product A|product B" and the value is the number of times that combination appears. The framework first groups the map output values by key; reduce then tallies the occurrences of each product B in the grouped values and writes "product A|product B" together with the combination count, as in the worked example below.
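Continuing the made-up sample: the grouped map output for the key "milk" is the value list (bread, eggs, bread), since milk and bread appear together in two lists while milk and eggs appear together in one. With the minimum count set to 1, the reducer therefore emits:

milk|bread	2
milk|eggs	1

The keys bread, eggs, and beer are tallied the same way.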
That completes the analysis of the requirement; the concrete code implementation follows.
Code implementation:
The code is not explained in detail here; refer to the comments in it.
package com;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map.Entry;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Test extends Configured implements Tool {

    /**
     * Map class, implements the data preprocessing.
     * The output key is product A, the value is an associated product B.
     * @author lulei
     */
    public static class MapT extends Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            if (!(line == null || "".equals(line))) {
                // split the sales list into individual products
                String[] vs = line.split(",");
                // pairwise combinations, each forming one record
                for (int i = 0; i < (vs.length - 1); i++) {
                    if ("".equals(vs[i])) { // exclude empty records
                        continue;
                    }
                    for (int j = i + 1; j < vs.length; j++) {
                        if ("".equals(vs[j])) {
                            continue;
                        }
                        // output the pair in both directions
                        context.write(new Text(vs[i]), new Text(vs[j]));
                        context.write(new Text(vs[j]), new Text(vs[i]));
                    }
                }
            }
        }
    }

    /**
     * Reduce class, implements the counting.
     * The output key is "product A|product B", the value is the association count.
     * @author lulei
     */
    public static class ReduceT extends Reducer<Text, Text, Text, IntWritable> {
        private int count;

        /**
         * Initialization: read the minimum record count from the job parameters.
         */
        public void setup(Context context) {
            String countStr = context.getConfiguration().get("count");
            try {
                this.count = Integer.parseInt(countStr);
            } catch (Exception e) {
                this.count = 0;
            }
        }

        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String keyStr = key.toString();
            HashMap<String, Integer> hashMap = new HashMap<String, Integer>();
            // count the occurrences of each product B with a hash map
            for (Text value : values) {
                String valueStr = value.toString();
                if (hashMap.containsKey(valueStr)) {
                    hashMap.put(valueStr, hashMap.get(valueStr) + 1);
                } else {
                    hashMap.put(valueStr, 1);
                }
            }
            // output the result
            for (Entry<String, Integer> entry : hashMap.entrySet()) {
                if (entry.getValue() >= this.count) { // only output counts not below the minimum
                    context.write(new Text(keyStr + "|" + entry.getKey()),
                            new IntWritable(entry.getValue()));
                }
            }
        }
    }

    @Override
    public int run(String[] arg0) throws Exception {
        Configuration conf = getConf();
        conf.set("count", arg0[2]);

        Job job = new Job(conf);
        job.setJobName("jobtest");

        job.setOutputFormatClass(TextOutputFormat.class);
        // the map output types differ from the final reduce output types,
        // so both sets are declared explicitly
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(MapT.class);
        job.setReducerClass(ReduceT.class);

        FileInputFormat.addInputPath(job, new Path(arg0[0]));
        FileOutputFormat.setOutputPath(job, new Path(arg0[1]));

        job.waitForCompletion(true);

        return job.isSuccessful() ? 0 : 1;
    }

    /**
     * @param args input path, output path, minimum count
     */
    public static void main(String[] args) {
        if (args.length != 3) {
            System.exit(-1);
        }
        try {
            int res = ToolRunner.run(new Configuration(), new Test(), args);
            System.exit(res);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Upload and run:
Package the program into a jar file and upload it to the cluster; the test data is likewise uploaded to the HDFS distributed file system.
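A minimal sketch of the upload step, assuming the data file is named shopping.txt and the target directory is /input (both names are assumptions):

hadoop fs -mkdir /input
hadoop fs -put shopping.txt /input/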
The job is then launched from the command line.
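The launch command might look like the following, assuming the jar is named test.jar; the three arguments are the input path, the output path, and the minimum co-occurrence count passed to the reducer via the "count" configuration key:

hadoop jar test.jar com.Test /input /output 1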
After the run completes, view the corresponding output in the HDFS file system.
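For example, with the output path assumed above (part-r-00000 is the default name of the first reducer's output file):

hadoop fs -ls /output
hadoop fs -cat /output/part-r-00000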
With this, a complete MapReduce program is finished. I'll keep on learning Hadoop~