Implementing associated commodity statistics with Hadoop in Java


I've been reading Hadoop-related books for the last few days and am starting to get a feel for it, so I wrote a commodity-association statistics program of my own, modeled on the WordCount example.

Requirements Description:

From the supermarket's sales records, calculate the degree of association between commodities, that is, the number of times commodities A and B were bought at the same time.

Data format:

The supermarket sales records are simplified to the following format: each line represents one receipt, and the items on a line are separated by ",".
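The original article showed the sample data in a figure that is not reproduced here; as a stand-in, five hypothetical receipts in this format (invented product names, reused in the examples below) could look like:

    milk,bread,eggs
    milk,bread
    bread,eggs,beer
    milk,eggs
    bread,milk,eggs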

Requirements Analysis:

The requirement is implemented with MapReduce in Hadoop.

The map function splits each line into individual commodities and emits key-value pairs in which the key is a commodity A and the value is an associated commodity B.

So that the association can be counted from the side of both A and B, each pair of commodities on a receipt is output twice, once as A-B and once as B-A, as sketched below.
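For the hypothetical first receipt "milk,bread,eggs" from the sample data above (the product names are the invented stand-ins, not the original figure's), the map would emit these key-value pairs:

    milk    bread
    bread   milk
    milk    eggs
    eggs    milk
    bread   eggs
    eggs    bread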

The reduce function groups the associated commodities by commodity A and counts the number of occurrences of each commodity B among A's values; the output key is "commodity A|commodity B" and the value is the number of times that combination occurs.

For the five records above, the map function turns each receipt into a set of A-to-B pairs as described. Before the reduce phase runs, the framework groups together all values that share the same key, so each reduce call receives one commodity A along with every commodity that ever appeared on a receipt with it. The reducer then tallies those values and writes each "commodity A|commodity B" combination with its count as the final output.
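As a worked example under the hypothetical receipts above: the reduce call for the key "milk" receives the values bread, eggs, bread, eggs, bread, eggs ("bread" from receipts 1, 2, and 5; "eggs" from receipts 1, 4, and 5), so with the minimum count parameter set to, say, 2, it would emit:

    milk|bread    3
    milk|eggs     3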

That completes the analysis of the requirement; the specific code implementation follows.

Code implementation:

The code is not introduced in detail here; refer to the comments in the code for the specifics.

package com;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map.Entry;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Test extends Configured implements Tool {

    /**
     * Map class, implements the data preprocessing.
     * Output key is commodity A, output value is an associated commodity B.
     * @author lulei
     */
    public static class MapT extends Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            if (!(line == null || "".equals(line))) {
                // Split the line into individual commodities
                String[] vs = line.split(",");
                // Combine the commodities pairwise; each pair yields two records
                for (int i = 0; i < vs.length - 1; i++) {
                    if ("".equals(vs[i])) { // skip blank entries
                        continue;
                    }
                    for (int j = i + 1; j < vs.length; j++) {
                        if ("".equals(vs[j])) {
                            continue;
                        }
                        // Output both directions: A -> B and B -> A
                        context.write(new Text(vs[i]), new Text(vs[j]));
                        context.write(new Text(vs[j]), new Text(vs[i]));
                    }
                }
            }
        }
    }

    /**
     * Reduce class, implements the counting.
     * Output key is "commodity A|commodity B", output value is the association count.
     * @author lulei
     */
    public static class ReduceT extends Reducer<Text, Text, Text, IntWritable> {
        private int count;

        /**
         * Initialization: read the minimum record count from the configuration.
         */
        public void setup(Context context) {
            String countStr = context.getConfiguration().get("count");
            try {
                this.count = Integer.parseInt(countStr);
            } catch (Exception e) {
                this.count = 0;
            }
        }

        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String keyStr = key.toString();
            HashMap<String, Integer> hashMap = new HashMap<String, Integer>();
            // Use a hash map to count the occurrences of each commodity B
            for (Text value : values) {
                String valueStr = value.toString();
                if (hashMap.containsKey(valueStr)) {
                    hashMap.put(valueStr, hashMap.get(valueStr) + 1);
                } else {
                    hashMap.put(valueStr, 1);
                }
            }
            // Output the combinations whose count is not less than the minimum
            for (Entry<String, Integer> entry : hashMap.entrySet()) {
                if (entry.getValue() >= this.count) {
                    context.write(new Text(keyStr + "|" + entry.getKey()),
                            new IntWritable(entry.getValue()));
                }
            }
        }
    }

    @Override
    public int run(String[] arg0) throws Exception {
        Configuration conf = getConf();
        // Pass the minimum count to the reducers through the configuration
        conf.set("count", arg0[2]);

        Job job = new Job(conf);
        job.setJobName("jobtest");

        job.setOutputFormatClass(TextOutputFormat.class);
        // The map emits (Text, Text); the reduce emits (Text, IntWritable)
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(MapT.class);
        job.setReducerClass(ReduceT.class);

        FileInputFormat.addInputPath(job, new Path(arg0[0]));
        FileOutputFormat.setOutputPath(job, new Path(arg0[1]));

        job.waitForCompletion(true);
        return job.isSuccessful() ? 0 : 1;
    }

    /**
     * @param args input path, output path, minimum count
     */
    public static void main(String[] args) {
        if (args.length != 3) {
            System.exit(-1);
        }
        try {
            int res = ToolRunner.run(new Configuration(), new Test(), args);
            System.exit(res);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Upload and run:

Package the program into a JAR file and upload it to the cluster; also upload the test data to the HDFS distributed file system.
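As a rough sketch of the commands involved (the JAR name and the HDFS paths here are illustrative; the three program arguments are the input path, the output path, and the minimum association count read by the reducer):

    hadoop fs -put sales.txt /input/sales.txt
    hadoop jar test.jar com.Test /input /output 2
    hadoop fs -cat /output/part-r-00000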

After the run completes, check the corresponding output files on the HDFS file system to confirm the results.

With that, a complete MapReduce program is finished. I will keep studying Hadoop. Thank you for reading; I hope this helps you, and thank you for supporting this site!
