Implementing item-based collaborative filtering recommendations with MapReduce

Requirements Introduction
There are many kinds of collaborative filtering recommendation on the Web; here I mainly introduce the item-based variety. Item-based collaborative filtering rests on the assumption that a user will like items similar to the ones he liked before; for example, if most users who rated item 101 also rated item 104, items 101 and 104 are considered similar. The key to item-based collaborative filtering is therefore calculating the similarity between items.
Data Introduction
The data set I chose is as follows:
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.0
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
The data set has three columns: the user ID, the item ID, and the user's rating of that item; for example, "1,101,5.0" means user 1 rated item 101 with 5.0. I want to calculate recommendation results from this data.
Environment Introduction
My system is Windows 7, with the following software:
Eclipse Java EE IDE for Web Developers
Version: Mars.2 Release (4.5.2)
Build id: 20160218-0600
JDK 1.7
Maven 3.3.9
Hadoop 2.6
I will not describe setting up the development environment here; if you are interested, the configuration steps are easy to find online.
Implementation Steps Introduction
1. First, I build a Maven project and configure the Hadoop jar dependencies. Open Eclipse, right-click in the project explorer, select New > Project, type "Maven" in the wizard's filter box, select Maven Project, and click Next. I will not detail building a Maven project here either; there are many guides online. The main thing is the pom.xml configuration. I mainly configure the following dependencies:
hadoop-common
hadoop-hdfs
hadoop-mapreduce-client-core
hadoop-mapreduce-client-jobclient
hadoop-mapreduce-client-common
The results are as follows:

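Since the original screenshot is not available, here is a minimal sketch of the dependencies element, assuming version 2.6.0 to match the Hadoop 2.6 environment above:

<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.6.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.6.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
    <version>2.6.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-common</artifactId>
    <version>2.6.0</version>
  </dependency>
</dependencies>
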
2. Upload uitemtable.csv to the server's downloads folder with WinSCP, then copy it into HDFS with the command:
./hadoop fs -put ~/downloads/uitemtable.csv hdfs://master:9000/data
3. Read the corresponding data from HDFS, group it by user, and compute the combination list of all items each user rated, obtaining the user-to-item rating matrix.
3.1. Build the recommendation entry class. Recommend.java is as follows:

package recommand.hadoop.phl;

import java.io.IOException;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Recommendation entry point
 * @author root
 */
public class Recommend {
    public static final String HDFS = "hdfs://10.10.44.92:9000";
    public static final Pattern DELIMITER = Pattern.compile("[\t,]"); // split on TAB or comma

    public static void main(String[] args) throws ClassNotFoundException, IOException, URISyntaxException, InterruptedException {
        Map<String, String> path = new HashMap<String, String>();
        path.put("Step1input", HDFS + "/data/uitemtable.csv");
        path.put("Step1output", HDFS + "/output/step1");
        path.put("Step2input", path.get("Step1output"));
        path.put("Step2output", HDFS + "/output/step2");
        path.put("Step3input1", path.get("Step1output"));
        path.put("Step3output1", HDFS + "/output/step3_1");
        path.put("Step3input2", path.get("Step2output"));
        path.put("Step3output2", HDFS + "/output/step3_2");
        path.put("Step4input1", path.get("Step3output1"));
        path.put("Step4input2", path.get("Step3output2"));
        path.put("Step4output", HDFS + "/output/step4");

        Step1.runStep1(path);
        Step2.runStep2(path);
        Step3.runStep3_1(path);
        Step3.runStep3_2(path);
        Step4.runStep4(path);
        System.exit(0);
    }
}

3.2. The HDFS management class, HdfsDao.java, is as follows:

package recommand.hadoop.phl;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class HdfsDao {
    // HDFS access address
    private static final String HDFS = "hdfs://10.10.44.92:9000/";
    // HDFS path
    private String hdfsPath;
    // Hadoop system configuration
    private Configuration conf;
    // name of the running job
    private String name;

    public HdfsDao(Configuration conf) {
        this.conf = conf;
    }

    public HdfsDao(Configuration conf, String name) {
        this.conf = conf;
        this.name = name;
    }

    public HdfsDao(String hdfs, Configuration conf, String name) {
        this.hdfsPath = hdfs;
        this.conf = conf;
        this.name = name;
    }

    public Job conf() throws IOException {
        Job job = Job.getInstance(this.conf, this.name);
        return job;
    }

    // recursively delete the given path if it exists ("rm -r")
    public void rmr(String outUrl) throws IOException, URISyntaxException {
        FileSystem fileSystem = FileSystem.get(new URI(outUrl), this.conf);
        if (fileSystem.exists(new Path(outUrl))) {
            fileSystem.delete(new Path(outUrl), true);
            System.out.println("outUrl: " + outUrl);
        }
    }
}
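
As a quick usage illustration (a hypothetical snippet, not from the source): each step constructs an HdfsDao and calls rmr() to clear the previous output before rerunning, so repeated runs do not fail with an "output directory already exists" error.

Configuration conf = new Configuration();
HdfsDao hdfsDao = new HdfsDao(Recommend.HDFS, conf, "step1");
hdfsDao.rmr(Recommend.HDFS + "/output/step1"); // delete any stale output first
Job job = hdfsDao.conf();                      // then build the Job from conf and name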

3.3. Group by user and compute the combination list of all items each user rated, obtaining the user-to-item rating matrix. Step1.java is as follows:

package recommand.hadoop.phl;

import java.io.IOException;
import java.net.URISyntaxException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

/**
 * Group by user and compute the list of all items each user rated,
 * producing the user-to-item rating matrix.
 * @author root
 */
public class Step1 {
    public static class MyMapper extends Mapper<Object, Text, IntWritable, Text> {
        private final static IntWritable k = new IntWritable();
        private final static Text v = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // input line, e.g.: 1,101,5.0
            String[] tokens = Recommend.DELIMITER.split(value.toString());
            int userID = Integer.parseInt(tokens[0]);
            String itemID = tokens[1];
            String pref = tokens[2];
            k.set(userID);
            v.set(itemID + ":" + pref);
            context.write(k, v);
        }
    }

    public static class MyReducer extends Reducer<IntWritable, Text, IntWritable, Text> {
        private final static Text v = new Text();

        @Override
        protected void reduce(IntWritable k2, Iterable<Text> v2s, Context context)
                throws IOException, InterruptedException {
            // concatenate all item:pref pairs of one user into a single line
            StringBuilder sb = new StringBuilder();
            for (Text v2 : v2s) {
                sb.append("," + v2.toString());
            }
            v.set(sb.toString().replaceFirst(",", ""));
            context.write(k2, v);
        }
    }

    public static void runStep1(Map<String, String> path) throws IOException, URISyntaxException, ClassNotFoundException, InterruptedException {
        String input = path.get("Step1input");
        String output = path.get("Step1output");
        // 1.1 read the HDFS file
        Configuration conf = new Configuration();
        HdfsDao hdfsDao = new HdfsDao(input, conf, Step1.class.getSimpleName());
        Job job = hdfsDao.conf();
        // set the jar by class for packaged runs
        job.setJarByClass(Step1.class);
        // 1.2 set the input path and input format
        FileInputFormat.addInputPath(job, new Path(input));
        job.setInputFormatClass(TextInputFormat.class);
        // specify the custom mapper class
        job.setMapperClass(MyMapper.class);
        // specify the key/value types of the mapper output
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(Text.class);
        // 1.3 partitioning
        /* job.setPartitionerClass(HashPartitioner.class);
        job.setNumReduceTasks(1); */
        // 1.4 sorting and grouping
        // 1.5 combining
        job.setCombinerClass(MyReducer.class);
        // 2.2 specify the custom reducer class
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);
        // 2.3 specify the output path and output format
        FileOutputFormat.setOutputPath(job, new Path(output));
        job.setOutputFormatClass(TextOutputFormat.class);
        // delete the output path if it already exists
        hdfsDao.rmr(output);
        job.waitForCompletion(true);
    }
}

The results are as follows:

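Since the result screenshot is not available, the Step1 output below is reconstructed from the input data (the order of the item:pref pairs within a line may differ on a real run):

1	101:5.0,102:3.0,103:2.5
2	101:2.0,102:2.5,103:5.0,104:2.0
3	101:2.0,104:4.0,105:4.5,107:5.0
4	101:5.0,103:3.0,104:4.5,106:4.0
5	101:4.0,102:3.0,103:2.0,104:4.0,105:3.5,106:4.0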
Taking the first record as an example: user 1 rated the three items 101, 102, and 103, with ratings 5.0, 3.0, and 2.5 respectively.
3.4. Count the item pairs from each user's item list to build the item co-occurrence matrix. Step2.java is as follows:

package recommand.hadoop.phl;

import java.io.IOException;
import java.net.URISyntaxException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class Step2 {
    public static class MyMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static Text k = new Text();
        private final static IntWritable v = new IntWritable(1);

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // input line, e.g.: 1    101:5.0,102:3.0,103:2.5
            String[] tokens = Recommend.DELIMITER.split(value.toString());
            // emit every ordered pair of items that appear together in one user's list
            for (int i = 1; i < tokens.length; i++) {
                String itemID = tokens[i].split(":")[0]; // e.g. 101
                for (int j = 1; j < tokens.length; j++) {
                    String itemID2 = tokens[j].split(":")[0]; // e.g. 101
                    k.set(itemID + ":" + itemID2);
                    context.write(k, v);
                }
            }
        }
    }

    public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final static IntWritable v = new IntWritable();

        @Override
        protected void reduce(Text k2, Iterable<IntWritable> v2s, Context context)
                throws IOException, InterruptedException {
            // sum the occurrences of each item pair
            int count = 0;
            for (IntWritable v2 : v2s) {
                count += v2.get();
            }
            v.set(count);
            context.write(k2, v);
        }
    }

    public static void runStep2(Map<String, String> path) throws IOException, URISyntaxException, ClassNotFoundException, InterruptedException {
        String input = path.get("Step2input");
        String output = path.get("Step2output");
        // 1.1 read the HDFS file
        Configuration conf = new Configuration();
        HdfsDao hdfsDao = new HdfsDao(input, conf, Step2.class.getSimpleName());
        Job job = hdfsDao.conf();
        // set the jar by class for packaged runs
        job.setJarByClass(Step2.class);
        // 1.2 set the input path and input format
        FileInputFormat.addInputPath(job, new Path(input));
        job.setInputFormatClass(TextInputFormat.class);
        // specify the custom mapper class
        job.setMapperClass(MyMapper.class);
        // specify the key/value types of the mapper output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // 1.3 partitioning
        /* job.setPartitionerClass(HashPartitioner.class);
        job.setNumReduceTasks(1); */
        // 1.4 sorting and grouping
        // 1.5 combining
        job.setCombinerClass(MyReducer.class);
        // 2.2 specify the custom reducer class
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // 2.3 specify the output path and output format
        FileOutputFormat.setOutputPath(job, new Path(output));
        job.setOutputFormatClass(TextOutputFormat.class);
        // delete the output path if it already exists
        hdfsDao.rmr(output);
        job.waitForCompletion(true);
    }
}
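
To make the pair counting concrete, here is a short trace, worked out from the mapper logic above (not taken from the source), of what the mapper emits for user 1's line:

input:  1    101:5.0,102:3.0,103:2.5
emits:  101:101 1, 101:102 1, 101:103 1,
        102:101 1, 102:102 1, 102:103 1,
        103:101 1, 103:102 1, 103:103 1

The reducer then sums these 1s across all users; that is how, for example, the pair 101:102 accumulates to 3 in the output below.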

The results of the operation are as follows:
101:101 5
101:102 3
101:103 4
101:104 4
101:105 2
101:106 2
101:107 1
102:101 3
102:102 3
102:103 3
102:104 2
102:105 1
102:106 1
103:101 4
103:102 3
103:103 4
103:104 3
103:105 1
103:106 2
104:101 4
104:102 2
104:103 3
104:104 4
104:105 2
104:106 2
104:107 1
105:101 2
105:102 1
105:103 1
105:104 2
105:105 2
105:106 1
105:107 1
106:101 2
106:102 1
106:103 2
106:104 2
106:105 1
106:106 2
107:101 1
107:104 1
107:105 1
107:107 1
The co-occurrence matrix we built is keyed by pairs of items. Taking the second record as an example, "101:102 3" means that 3 users rated both item 101 and item 102. To make my blog easier to read, I translate the results above into a mathematical matrix, as follows:
[101] [102] [103] [104] [105] [106] [107]
[101] 5 3 4 4 2 2 1
[102] 3 3 3 2 1 1 0
[103] 4 3 4 3 1 2 0
[104] 4 2 3 4 2 2 1
[105] 2 1 1 2 2 1 1
[106] 2 1 2 2 1 2 0
[107] 1 0 0 1 1 0 1
3.5. Prepare the co-occurrence matrix and the rating matrix for merging. Step3.java is as follows:
package recommand.hadoop.phl;

import java.io.IOException;
import java.net.URISyntaxException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class Step3 {
    // transposes the rating matrix: one (itemID, "userID:pref") pair per rating
    public static class MyMapper1 extends Mapper<LongWritable, Text, IntWritable, Text> {
        private final static IntWritable k = new IntWritable();
        private final static Text v = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // input line, e.g.: 1    101:5.0,102:3.0,103:2.5
            String[] tokens = Recommend.DELIMITER.split(value.toString());
            for (int i = 1; i < tokens.length; i++) {
                String[] vector = tokens[i].split(":");
                int itemID = Integer.parseInt(vector[0]);
                String pref = vector[1];
                k.set(itemID);
                v.set(tokens[0] + ":" + pref);
                context.write(k, v);
            }
        }
    }

    // reads the co-occurrence pairs: one ("itemA:itemB", count) pair per line
    public static class MyMapper2 extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static Text k = new Text();
        private final static IntWritable v = new IntWritable();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // input line, e.g.: 101:102    3
            String[] tokens = Recommend.DELIMITER.split(value.toString());
            k.set(tokens[0]);
            v.set(Integer.parseInt(tokens[1]));
            context.write(k, v);
        }
    }

    public static void runStep3_1(Map<String, String> path) throws IOException, URISyntaxException, ClassNotFoundException, InterruptedException {
        String input = path.get("Step3input1");
        String output = path.get("Step3output1");
        // 1.1 read the HDFS file
        Configuration conf = new Configuration();
        HdfsDao hdfsDao = new HdfsDao(input, conf, Step3.class.getSimpleName());
        Job job = hdfsDao.conf();
        // set the jar by class for packaged runs
        job.setJarByClass(Step3.class);
        // 1.2 set the input path and input format
        FileInputFormat.addInputPath(job, new Path(input));
        job.setInputFormatClass(TextInputFormat.class);
        // specify the custom mapper class
        job.setMapperClass(MyMapper1.class);
        // specify the key/value types of the mapper output
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(Text.class);
        // 2.3 specify the output path and output format
        FileOutputFormat.setOutputPath(job, new Path(output));
        job.setOutputFormatClass(TextOutputFormat.class);
        // delete the output path if it already exists
        hdfsDao.rmr(output);
        job.waitForCompletion(true);
    }

    public static void runStep3_2(Map<String, String> path) throws IOException, URISyntaxException, ClassNotFoundException, InterruptedException {
        String input = path.get("Step3input2");
        String output = path.get("Step3output2");
        // 1.1 read the HDFS file
        Configuration conf = new Configuration();
        HdfsDao hdfsDao = new HdfsDao(input, conf, Step3.class.getSimpleName());
        Job job = hdfsDao.conf();
        // set the jar by class for packaged runs
        job.setJarByClass(Step3.class);
        // 1.2 set the input path and input format
        FileInputFormat.addInputPath(job, new Path(input));
        job.setInputFormatClass(TextInputFormat.class);
        // specify the custom mapper class
        job.setMapperClass(MyMapper2.class);
        // specify the key/value types of the mapper output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // 2.3 specify the output path and output format
        FileOutputFormat.setOutputPath(job, new Path(output));
        job.setOutputFormatClass(TextOutputFormat.class);
        // delete the output path if it already exists
        hdfsDao.rmr(output);
        job.waitForCompletion(true);
    }
}

Running Step3_2 over the co-occurrence data reproduces the results already given in step 3.4. The item-keyed user rating matrix produced by Step3_1 is as follows:

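Since the screenshot is not available, here are the first lines of the Step3_1 output, reconstructed from the input data (one user:pref pair per line, grouped by item key; the order of values within one item may differ on a real run):

101	1:5.0
101	2:2.0
101	3:2.0
101	4:5.0
101	5:4.0
102	1:3.0
102	2:2.5
102	5:3.0
...and so on for items 103 through 107.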
Taking the first record as an example: item 101 was rated 5.0 by user 1. Converted into a matrix, the data is as follows:
[User1] [user2] [User3] [user4] [User5]
[101] 5 2 2 5 4
[102] 3 2.5 0 0 3
[103] 2.5 5 0 3 2
[104] 0 2 4 4 4
[105] 0 0 4.5 0 3.5
[106] 0 0 0 4 4
[107] 0 0 5 0 0
3.6. For convenience, I introduce a helper class. Cooccurrence.java is as follows:

package recommand.hadoop.phl;

public class Cooccurrence {
    private int itemID1;
    private int itemID2;
    private int num;

    public Cooccurrence(int itemID1, int itemID2, int num) {
        super();
        this.itemID1 = itemID1;
        this.itemID2 = itemID2;
        this.num = num;
    }

    public int getItemID1() {
        return itemID1;
    }

    public void setItemID1(int itemID1) {
        this.itemID1 = itemID1;
    }

    public int getItemID2() {
        return itemID2;
    }

    public void setItemID2(int itemID2) {
        this.itemID2 = itemID2;
    }

    public int getNum() {
        return num;
    }

    public void setNum(int num) {
        this.num = num;
    }
}
Since our co-occurrence matrix is keyed by item pairs, itemID1 and itemID2 identify the pair, and num is the number of users who rated both itemID1 and itemID2.
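
As an illustration (a hypothetical parsing example, not code from the source), one Cooccurrence instance would be built from each line of the step 3.4 output:

// parse a co-occurrence line like "101:102<TAB>3"
String[] tokens = Recommend.DELIMITER.split("101:102\t3");
String[] ids = tokens[0].split(":");
Cooccurrence co = new Cooccurrence(
        Integer.parseInt(ids[0]),     // itemID1 = 101
        Integer.parseInt(ids[1]),     // itemID2 = 102
        Integer.parseInt(tokens[1])); // num = 3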

3.7. Calculate the recommendation list. Step4.java is as follows:

package recommand.hadoop.phl;

import java.io.IOException;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
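
The rest of Step4.java is cut off in the source. The idea of the step, though, is a matrix multiplication: for each user, the predicted score of an item is the sum, over all items the user rated, of cooccurrence(item, ratedItem) × pref(user, ratedItem). The following is a minimal, self-contained sketch of that computation (my own illustration under that assumption, not the original Step4.java), using the numbers from the matrices above:

import java.util.HashMap;
import java.util.Map;

public class Step4Sketch {
    public static void main(String[] args) {
        // row of the co-occurrence matrix for item 106 (from the step 3.4 output)
        Map<String, Integer> cooccurrenceRow106 = new HashMap<String, Integer>();
        cooccurrenceRow106.put("101", 2);
        cooccurrenceRow106.put("102", 1);
        cooccurrenceRow106.put("103", 2);
        cooccurrenceRow106.put("104", 2);
        cooccurrenceRow106.put("105", 1);
        cooccurrenceRow106.put("106", 2);

        // user 1's ratings (from the input data)
        Map<String, Double> prefsUser1 = new HashMap<String, Double>();
        prefsUser1.put("101", 5.0);
        prefsUser1.put("102", 3.0);
        prefsUser1.put("103", 2.5);

        // score(user1, 106) = 2*5.0 + 1*3.0 + 2*2.5 = 18.0
        double score = 0.0;
        for (Map.Entry<String, Double> e : prefsUser1.entrySet()) {
            Integer count = cooccurrenceRow106.get(e.getKey());
            if (count != null) {
                score += count * e.getValue();
            }
        }
        System.out.println("score(user1, item106) = " + score);
    }
}

Items the user has already rated (101, 102, and 103 for user 1) would then be filtered out, and the remaining items sorted by score to produce the recommendation list.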
