Implementing associated commodity statistics with Hadoop in Java


I've been reading Hadoop-related books for the last few days and am starting to get a feel for it, so I wrote a commodity-association statistics program of my own, modeled on the WordCount example.

Requirements Description:

From the supermarket's sales records, calculate the degree of association between commodities, that is, the number of times commodities A and B were bought at the same time.

Data format:

The supermarket sales records are simplified to the following format: each line represents one receipt, and the items on a line are separated by ",".
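The original article showed the sample data in a figure that is not reproduced here; as a stand-in, five hypothetical receipts in this format (invented product names, reused in the examples below) could look like:

    milk,bread,eggs
    milk,bread
    bread,eggs,beer
    milk,eggs
    bread,milk,eggs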

Requirements Analysis:

The requirement is implemented with MapReduce in Hadoop.

The map function splits each line into individual commodities and emits key-value pairs in which the key is a commodity A and the value is an associated commodity B.

So that the association can be counted from the side of both A and B, each pair of commodities on a receipt is output twice, once as A-B and once as B-A, as sketched below.
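For the hypothetical first receipt "milk,bread,eggs" from the sample data above (the product names are the invented stand-ins, not the original figure's), the map would emit these key-value pairs:

    milk    bread
    bread   milk
    milk    eggs
    eggs    milk
    bread   eggs
    eggs    bread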

The reduce function groups the associated commodities by commodity A and counts the number of occurrences of each commodity B among A's values; the output key is "commodity A|commodity B" and the value is the number of times that combination occurs.

For the five records above, the map function turns each receipt into a set of A-to-B pairs as described. Before the reduce phase runs, the framework groups together all values that share the same key, so each reduce call receives one commodity A along with every commodity that ever appeared on a receipt with it. The reducer then tallies those values and writes each "commodity A|commodity B" combination with its count as the final output.
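As a worked example under the hypothetical receipts above: the reduce call for the key "milk" receives the values bread, eggs, bread, eggs, bread, eggs ("bread" from receipts 1, 2, and 5; "eggs" from receipts 1, 4, and 5), so with the minimum count parameter set to, say, 2, it would emit:

    milk|bread    3
    milk|eggs     3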

That completes the analysis of the requirement; the specific code implementation follows.

Code implementation:

The code is not introduced in detail here; refer to the comments in the code for the specifics.

package com;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map.Entry;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Test extends Configured implements Tool {

    /**
     * Map class, implements the data preprocessing.
     * Output key is commodity A, output value is an associated commodity B.
     * @author lulei
     */
    public static class MapT extends Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            if (!(line == null || "".equals(line))) {
                // Split the line into individual commodities
                String[] vs = line.split(",");
                // Combine the commodities pairwise; each pair yields two records
                for (int i = 0; i < vs.length - 1; i++) {
                    if ("".equals(vs[i])) { // skip blank entries
                        continue;
                    }
                    for (int j = i + 1; j < vs.length; j++) {
                        if ("".equals(vs[j])) {
                            continue;
                        }
                        // Output both directions: A -> B and B -> A
                        context.write(new Text(vs[i]), new Text(vs[j]));
                        context.write(new Text(vs[j]), new Text(vs[i]));
                    }
                }
            }
        }
    }

    /**
     * Reduce class, implements the counting.
     * Output key is "commodity A|commodity B", output value is the association count.
     * @author lulei
     */
    public static class ReduceT extends Reducer<Text, Text, Text, IntWritable> {
        private int count;

        /**
         * Initialization: read the minimum record count from the configuration.
         */
        public void setup(Context context) {
            String countStr = context.getConfiguration().get("count");
            try {
                this.count = Integer.parseInt(countStr);
            } catch (Exception e) {
                this.count = 0;
            }
        }

        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String keyStr = key.toString();
            HashMap<String, Integer> hashMap = new HashMap<String, Integer>();
            // Use a hash map to count the occurrences of each commodity B
            for (Text value : values) {
                String valueStr = value.toString();
                if (hashMap.containsKey(valueStr)) {
                    hashMap.put(valueStr, hashMap.get(valueStr) + 1);
                } else {
                    hashMap.put(valueStr, 1);
                }
            }
            // Output the combinations whose count is not less than the minimum
            for (Entry<String, Integer> entry : hashMap.entrySet()) {
                if (entry.getValue() >= this.count) {
                    context.write(new Text(keyStr + "|" + entry.getKey()),
                            new IntWritable(entry.getValue()));
                }
            }
        }
    }

    @Override
    public int run(String[] arg0) throws Exception {
        Configuration conf = getConf();
        // Pass the minimum count to the reducers through the configuration
        conf.set("count", arg0[2]);

        Job job = new Job(conf);
        job.setJobName("jobtest");

        job.setOutputFormatClass(TextOutputFormat.class);
        // The map emits (Text, Text); the reduce emits (Text, IntWritable)
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(MapT.class);
        job.setReducerClass(ReduceT.class);

        FileInputFormat.addInputPath(job, new Path(arg0[0]));
        FileOutputFormat.setOutputPath(job, new Path(arg0[1]));

        job.waitForCompletion(true);
        return job.isSuccessful() ? 0 : 1;
    }

    /**
     * @param args input path, output path, minimum count
     */
    public static void main(String[] args) {
        if (args.length != 3) {
            System.exit(-1);
        }
        try {
            int res = ToolRunner.run(new Configuration(), new Test(), args);
            System.exit(res);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Upload and run:

Package the program into a JAR file and upload it to the cluster; also upload the test data to the HDFS distributed file system.
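As a rough sketch of the commands involved (the JAR name and the HDFS paths here are illustrative; the three program arguments are the input path, the output path, and the minimum association count read by the reducer):

    hadoop fs -put sales.txt /input/sales.txt
    hadoop jar test.jar com.Test /input /output 2
    hadoop fs -cat /output/part-r-00000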

After the run completes, check the corresponding output files on the HDFS file system to confirm the results.

With that, a complete MapReduce program is finished. I will keep studying Hadoop. Thank you for reading; I hope this helps you, and thank you for supporting this site!
