Using Hadoop to implement associated commodity statistics

Source: Internet
Author: User

Reprint please indicate the source: http://blog.csdn.net/xiaojimanman/article/details/40184581

I've been reading Hadoop-related books for the last few days and wanted to try something hands-on, so I wrote a product-association statistics program of my own, modelled on the WordCount example.

Requirements Description:

Given a supermarket's sales lists, calculate the degree of association between products, that is, the number of times products A and B are bought at the same time.

Data format:

The supermarket sales data is simplified to the following format: each line represents one sales list, with the items on it separated by ",", as in the example below.
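The original figure with the sample data is not reproduced here; a made-up stand-in of five records in the same format (the item names are purely illustrative) could look like this:

    milk,bread,eggs
    bread,butter
    milk,bread,butter
    eggs,milk
    bread,eggs,milk

The walkthrough of the map and reduce output below refers back to these five records.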

Requirements Analysis:

The requirement is implemented with MapReduce in Hadoop.

The map function splits each sales list into pairs of associated products; the output key is product A and the value is an associated product B. The pairs produced from the first record are shown in the example below.

Note that, in order to collect the associated products of both A and B, each pair is emitted twice, in both directions: once as (A, B) and once as (B, A).
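As a concrete illustration using the sample records above, the first list milk,bread,eggs is split into three pairs, and each pair is written out in both directions, giving six key/value records:

    (milk, bread)   (bread, milk)
    (milk, eggs)    (eggs, milk)
    (bread, eggs)   (eggs, bread)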

The reduce function collects, for each product A, all of its associated products and counts how many times each product B occurs among the values; the output key is "product A|product B" and the value is the number of times that combination appears. For the five records mentioned above, consider the key/value pairs in the map output.

After the map phase, the pairs are shuffled so that all records sharing the same key (product A) are grouped together.

In reduce, the grouped values for each key are counted, and each combination is written out with "product A|product B" as the key and the co-occurrence count as the value, as in the example below:
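Continuing the illustrative sample from above (the original figures are unavailable), the reducer for the key milk would receive the grouped values [bread, eggs, bread, butter, eggs, bread, eggs], count them with the hash map, and emit:

    milk|bread    3
    milk|eggs     3
    milk|butter   1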

This completes the analysis of the requirement and of the processing flow; next, let's look at the concrete code implementation.

Code implementation:

The code is not introduced in detail here; for specifics, refer to the comments in the code.

package com;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map.Entry;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Test extends Configured implements Tool {

    /**
     * Map class: preprocesses the data.
     * Output key is product A, output value is an associated product B.
     * @author lulei
     */
    public static class MapT extends Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            if (line == null || "".equals(line)) { // skip empty lines
                return;
            }
            // Split the sales list into individual products
            String[] vs = line.split(",");
            // Combine the products pairwise; each pair yields two records
            for (int i = 0; i < vs.length - 1; i++) {
                if ("".equals(vs[i])) { // exclude empty entries
                    continue;
                }
                for (int j = i + 1; j < vs.length; j++) {
                    if ("".equals(vs[j])) {
                        continue;
                    }
                    // Emit the pair in both directions: (A, B) and (B, A)
                    context.write(new Text(vs[i]), new Text(vs[j]));
                    context.write(new Text(vs[j]), new Text(vs[i]));
                }
            }
        }
    }

    /**
     * Reduce class: counts the combinations.
     * Output key is "product A|product B", output value is the association count.
     * @author lulei
     */
    public static class ReduceT extends Reducer<Text, Text, Text, IntWritable> {
        private int count;

        /**
         * Initialization: read the minimum count from the job configuration.
         */
        public void setup(Context context) {
            String countStr = context.getConfiguration().get("count");
            try {
                this.count = Integer.parseInt(countStr);
            } catch (Exception e) {
                this.count = 0;
            }
        }

        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String keyStr = key.toString();
            HashMap<String, Integer> hashMap = new HashMap<String, Integer>();
            // Count the occurrences of each product B with a hash map
            for (Text value : values) {
                String valueStr = value.toString();
                if (hashMap.containsKey(valueStr)) {
                    hashMap.put(valueStr, hashMap.get(valueStr) + 1);
                } else {
                    hashMap.put(valueStr, 1);
                }
            }
            // Output only the combinations that occur at least 'count' times
            for (Entry<String, Integer> entry : hashMap.entrySet()) {
                if (entry.getValue() >= this.count) {
                    context.write(new Text(keyStr + "|" + entry.getKey()),
                            new IntWritable(entry.getValue()));
                }
            }
        }
    }

    @Override
    public int run(String[] arg0) throws Exception {
        Configuration conf = getConf();
        // Pass the minimum count to the reducers through the configuration
        conf.set("count", arg0[2]);

        Job job = new Job(conf);
        job.setJobName("jobtest");
        job.setJarByClass(Test.class); // so the cluster can locate the jar

        job.setOutputFormatClass(TextOutputFormat.class);
        // These also serve as the map output types, since
        // setMapOutputKeyClass/setMapOutputValueClass are not called
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.setMapperClass(MapT.class);
        job.setReducerClass(ReduceT.class);

        FileInputFormat.addInputPath(job, new Path(arg0[0]));
        FileOutputFormat.setOutputPath(job, new Path(arg0[1]));

        job.waitForCompletion(true);

        return job.isSuccessful() ? 0 : 1;
    }

    /**
     * @param args input path, output path, minimum count
     */
    public static void main(String[] args) {
        if (args.length != 3) {
            System.exit(-1);
        }
        try {
            int res = ToolRunner.run(new Configuration(), new Test(), args);
            System.exit(res);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}


Upload and run:

Package the program into a jar file and upload it to the cluster; also upload the test data to the HDFS distributed file system. Then launch the job from the command line, as sketched below.
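The original screenshot of the run is not reproduced here. A minimal sketch of the commands, assuming the jar is named test.jar, the data file is data.txt, the input and output directories are /input and /output, and the minimum association count is 2 (all of these names are illustrative):

    hadoop fs -put data.txt /input/
    hadoop jar test.jar com.Test /input /output 2

The three trailing arguments correspond to arg0[0] (input path), arg0[1] (output path), and arg0[2] (the minimum count read in setup()).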

After the run completes, check the corresponding output directory on the HDFS file system, for example:
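A sketch of inspecting the results, assuming the /output directory from the command above (reducer output lands in the standard part-r-* files):

    hadoop fs -ls /output
    hadoop fs -cat /output/part-r-00000

With the illustrative records and a minimum count of 2, the pair milk|butter (count 1) would be filtered out, and the output would contain tab-separated lines such as milk|bread 3 and milk|eggs 3.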

With this, a complete MapReduce program is done; I'll keep on learning Hadoop and writing more ~
