Mahout in Action (Chinese version), Chapter 2: Introducing Recommenders, Sections 2.1-2.2

Last Update:2015-04-09 Source: Internet

Author: User


2. Introducing Recommenders

This chapter covers:

Recommenders in Mahout

A first look at a recommender in action

Evaluating the accuracy and quality of a recommender engine

Testing on a real data set: GroupLens

Every day we form opinions about things we like, dislike, or don't even care about. This often happens unconsciously. You hear a song on the radio and either notice it, because it is wonderful or awful, or ignore it altogether. The same thing happens with T-shirts, salad dressings, ski slopes, hairstyles, faces, and TV shows.

Although people's tastes vary, they follow certain patterns. People tend to like things that are similar to other things they like. For example, because I love bacon, lettuce, and tomato sandwiches, you can guess that I would like turkey sandwiches too, since the two are similar. Likewise, people tend to like what similar people like.

These patterns can be used to predict a person's likes and dislikes. Recommendation is about predicting these patterns of taste and using them to discover new and valuable things.

Having introduced the idea of recommendation, this chapter uses Mahout to build a simple recommender engine and then looks at how it works, to give you an intuitive feel for it.

2.1 What is recommendation?

You picked up this book for a reason. Maybe it sat on the shelf next to other books you know and find useful, so you took a look. Maybe you saw it on the bookshelf of a colleague who shares your interest in machine learning, or maybe that colleague recommended it to you directly.

Although the reasons differ, the end result is the same: you discovered something new. In one case you found new things through people who are similar to you (for example, a colleague's recommendation). In the other, you found things that are similar to what you already like (for example, books shelved next to books you like). These two scenarios correspond to the two basic families of recommender engine algorithms: "user-based" and "item-based".

2.1.1 Collaborative filtering, not content-based recommendation

Strictly speaking, the scenarios above are examples of collaborative filtering: producing recommendations based only on known relationships between users and items. The technique does not need to know anything about the properties of the items themselves, which is in some ways an advantage. The recommender does not care what the items actually are.

Other recommendation techniques are based on the attributes of the items and are usually called "content-based". For example, if a friend recommends a book to you because it was written by Chanjong, that is closer to content-based recommendation, because the recommendation rests on a property of the book: its author. Although Mahout provides some building blocks that can be used for content-based recommendation, the framework does not implement this kind of recommendation directly.

There is nothing wrong with content-based recommendation; on the contrary, it can be very effective in specialized domains. But it is tied to its domain, which makes it hard to capture in a general-purpose framework. To build a content-based recommender for books, you first have to decide which attributes of a book matter, such as page count, author, publisher, color, or font, and how important each attribute is. None of this carries over to other domains: it would not help you recommend a pizza, because a pizza has no "page count" attribute.

For this reason, Mahout has little to say about content-based recommendation. Such recommenders can still be built on top of Mahout, however, and the next chapter shows a related example for a dating site.

It's time to experience the power of collaborative filtering with Mahout!

2.2 Building the first collaborative filtering engine

Mahout includes several recommender engines, beginning with the conventional user-based and item-based recommenders, and it implements several other algorithms as well. For now, we start by exploring a simple user-based recommender.

2.2.1 Establishing the input

A good way to start exploring is with a small, trivial example. A recommender needs input data on which to base its recommendations. In Mahout's terms this data takes the form of "preferences": because recommender systems deal in associations between users and items, a preference expresses how strongly a user is associated with an item. Users and items are therefore the two central entities in the data. A preference consists of a user ID, an item ID, and a value expressing the strength of the preference. In Mahout, IDs are integers, and the preference value can be any number, as long as a larger value means a stronger preference. For example, on a five-point scale from 1 to 5, a value of 1 could mean the user strongly dislikes the item and 5 could mean the user likes it very much.

Create a text file to hold the input data. We use the integers 1 to 5 for five users and 101 to 107 for seven books; these integers are the user IDs and the book (item) IDs. Each preference is written on its own line in comma-separated form: user ID, item ID, preference value. Below is the sample data, in a file named Intro.csv.

Listing 2.1 The contents of the input data file Intro.csv
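As a minimal sketch of how a file in this format can be read, the plain-Java example below (it does not use Mahout) parses comma-separated preference rows into a per-user map. The rows are an assumption, taken from the sample data set that accompanies the book; the class name PreferenceFileSketch and its parse method are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PreferenceFileSketch {

    // Assumed rows in the form userID,itemID,preference value (one preference per line).
    public static final List<String> ROWS = List.of(
        "1,101,5.0", "1,102,3.0", "1,103,2.5",
        "2,101,2.0", "2,102,2.5", "2,103,5.0", "2,104,2.0",
        "3,101,2.5", "3,104,4.0", "3,105,4.5", "3,107,5.0",
        "4,101,5.0", "4,103,3.0", "4,104,4.5", "4,106,4.0",
        "5,101,4.0", "5,102,3.0", "5,103,2.0", "5,104,4.0", "5,105,3.5", "5,106,4.0");

    // Groups the rows by user: userID -> (itemID -> preference value).
    public static Map<Long, Map<Long, Double>> parse(List<String> rows) {
        Map<Long, Map<Long, Double>> byUser = new LinkedHashMap<>();
        for (String row : rows) {
            String[] fields = row.split(",");
            long userId = Long.parseLong(fields[0].trim());
            long itemId = Long.parseLong(fields[1].trim());
            double value = Double.parseDouble(fields[2].trim());
            byUser.computeIfAbsent(userId, id -> new LinkedHashMap<>()).put(itemId, value);
        }
        return byUser;
    }

    public static void main(String[] args) {
        // Print each user's preferences on one line.
        parse(ROWS).forEach((user, prefs) -> System.out.println(user + " -> " + prefs));
    }
}
```

Grouping by user is a design choice made for this sketch: a user-based recommender mostly asks "what has this user rated?", so a per-user map keeps that lookup cheap.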

Looking at the data, we can see that users 1 and 5 have similar tastes: both like book 101, like 102 somewhat less, and like 103 less still. Users 1 and 4 likewise rate 101 and 103 in a similar way. Users 1 and 2, by contrast, have nearly opposite tastes, and so on. The following figure illustrates the relationships in the data:

Figure 2.1 Relationships between users and items. Dashed lines represent negative relationships (dislike); solid lines represent positive relationships (like).
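The similarities noted above can be quantified. As a hedged sketch in plain Java (not Mahout's own implementation), the example below computes the Pearson correlation between two users over the books both have rated. The preference values for users 1, 2, 4, and 5 are assumed from the book's sample data set, and the class and method names are hypothetical.

```java
import java.util.Map;

public class PearsonSketch {

    // Assumed preference values (itemID -> rating) for four of the five users.
    public static final Map<Long, Double> USER_1 = Map.of(101L, 5.0, 102L, 3.0, 103L, 2.5);
    public static final Map<Long, Double> USER_2 = Map.of(101L, 2.0, 102L, 2.5, 103L, 5.0, 104L, 2.0);
    public static final Map<Long, Double> USER_4 = Map.of(101L, 5.0, 103L, 3.0, 104L, 4.5, 106L, 4.0);
    public static final Map<Long, Double> USER_5 =
        Map.of(101L, 4.0, 102L, 3.0, 103L, 2.0, 104L, 4.0, 105L, 3.5, 106L, 4.0);

    // Pearson correlation over the items both users have rated.
    // Returns NaN when fewer than two items overlap or one side has no variance.
    public static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        int n = 0;
        double sumA = 0, sumB = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                n++;
                sumA += e.getValue();
                sumB += other;
            }
        }
        if (n < 2) {
            return Double.NaN;
        }
        double meanA = sumA / n, meanB = sumB / n;
        double cov = 0, varA = 0, varB = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other == null) {
                continue;
            }
            double da = e.getValue() - meanA;
            double db = other - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        double denom = Math.sqrt(varA * varB);
        return denom == 0 ? Double.NaN : cov / denom;
    }

    public static void main(String[] args) {
        System.out.println("1 vs 5: " + pearson(USER_1, USER_5));
        System.out.println("1 vs 4: " + pearson(USER_1, USER_4));
        System.out.println("1 vs 2: " + pearson(USER_1, USER_2));
    }
}
```

On these assumed values the correlation is 1.0 for users 1 and 4, about 0.94 for users 1 and 5, and about -0.76 for users 1 and 2, which agrees with the observations made from the raw data.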

2.2.2 Setting up a recommender

Which book should we recommend to user 1? Not 101, 102, or 103: user 1 already knows these books, and a recommendation should be something new. Intuitively, users 4 and 5 seem similar to user 1, so it makes sense to recommend something that users 4 and 5 like. That leaves books 104, 105, and 106 as candidates. Of these, 104 looks the most promising, because its preference values from users 4 and 5 are 4.5 and 4.0. Seeing is believing, so let's run the program:

Listing 2.2 A simple user-based recommender program with Mahout

package mia.recommender.ch02;

import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.*;
import java.io.*;
import java.util.*;

class RecommenderIntro {

  public static void main(String[] args) throws Exception {

    // A: load the preference data from the CSV file
    DataModel model = new FileDataModel(new File("Intro.csv"));

    // Similarity between users, and a neighborhood of the 2 most similar users
    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
    UserNeighborhood neighborhood =
        new NearestNUserNeighborhood(2, similarity, model);

    // B: build the recommender engine
    Recommender recommender = new GenericUserBasedRecommender(
        model, neighborhood, similarity);

    // C: ask for 1 recommendation for user 1
    List<RecommendedItem> recommendations =
        recommender.recommend(1, 1);

    for (RecommendedItem recommendation : recommendations) {
      System.out.println(recommendation);
    }

  }
}

A Load the data file

B Build the recommender engine

C Recommend 1 item for user 1

For brevity, later listings omit the imports, class declaration, and method declaration, and show only the core code. To help visualize how the basic components relate to one another, see Figure 2.2. Not every Mahout recommender looks like this, because some use different components with different relationships, but the figure gives a sense of what happens in this example.

Figure 2.2 Schematic diagram of the components of a user-based recommender

Later sections describe each of these components in detail, but their roles can be summarized here. A DataModel stores and provides access to all the preference, user, and item data needed for the computation. A UserSimilarity provides a measure of how similar two users' tastes are; it can be based on any of several metrics or calculations. A UserNeighborhood defines the group of users that are most similar to a given user. Finally, a Recommender ties these components together to produce recommendations for a user, and it offers a number of related methods as well.
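To make the role of a neighborhood concrete, here is a small plain-Java sketch of the idea: keep the n users with the highest similarity to the target user. It is not Mahout's NearestNUserNeighborhood; the class and method names are hypothetical, and the similarity scores are assumed values (rounded Pearson correlations with user 1 on the book's sample data, with NaN standing for a pair of users with too little overlap to compare).

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class NeighborhoodSketch {

    // Returns the IDs of the n users with the highest similarity,
    // most similar first, skipping undefined (NaN) scores.
    public static List<Long> nearestN(Map<Long, Double> similarityToTarget, int n) {
        return similarityToTarget.entrySet().stream()
            .filter(e -> !Double.isNaN(e.getValue()))
            .sorted(Map.Entry.<Long, Double>comparingByValue(Comparator.reverseOrder()))
            .limit(n)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Assumed similarities of users 2 to 5 to user 1.
        Map<Long, Double> toUser1 = Map.of(2L, -0.764, 3L, Double.NaN, 4L, 1.0, 5L, 0.945);
        System.out.println(nearestN(toUser1, 2));
    }
}
```

With a neighborhood size of 2, as in the listing above, this selects users 4 and 5 for user 1, so only their preferences feed into the recommendation.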

2.2.3 Analyzing the output

Run the program from your favorite IDE. The output should look like this:

RecommendedItem [item:104, value:4.257081]

The program asked for the single top recommendation and got exactly one. The recommender engine recommended book 104 to user 1. It also reports an estimated preference value of about 4.3 for book 104; because this was the highest value among all candidate items, 104 is the item that was output.

That's not bad. Book 107 was not recommended, because it is associated only with a user whose tastes differ from user 1's. Choosing 104 over 106 makes sense, because 104 is rated a bit more highly than 106. The estimate of user 1's preference for 104, somewhere between 4.0 and 4.5, is also reasonable, because users 4 and 5 gave 104 preference values of 4.5 and 4.0 respectively.
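One way to see where a value between 4.0 and 4.5 comes from is a similarity-weighted average of the neighbors' ratings. The plain-Java sketch below rests on two assumptions that the text does not state: that the estimate is such a weighted average, and that users 4 and 5 have Pearson similarities of 1.0 and about 0.945 to user 1 (computed from the book's sample data). The class and method names are hypothetical.

```java
public class EstimateSketch {

    // Similarity-weighted average of the neighbors' ratings for one item.
    public static double estimate(double[] similarities, double[] ratings) {
        double weighted = 0, total = 0;
        for (int i = 0; i < ratings.length; i++) {
            weighted += similarities[i] * ratings[i];
            total += similarities[i];
        }
        return weighted / total;
    }

    public static void main(String[] args) {
        // Assumed similarities of users 4 and 5 to user 1; 2.5 / sqrt(7) is about 0.945.
        double[] sims = {1.0, 2.5 / Math.sqrt(7.0)};
        // Ratings given by users 4 and 5 to books 104 and 106.
        System.out.println("104: " + estimate(sims, new double[] {4.5, 4.0}));
        System.out.println("106: " + estimate(sims, new double[] {4.0, 4.0}));
    }
}
```

Under these assumptions the estimate for book 104 comes out at about 4.2571, matching the value printed by the program, and the estimate for book 106 is 4.0 (up to rounding), which is why 104 ranks first.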

It is hard to tell the right answer just by looking at the data, but the recommender engine made sense of it and returned a defensible result. If seeing this small program produce a useful, non-obvious result from a cluttered pile of data gave you a pleasant tingle, then the world of machine learning is for you!

Building a recommender for a small data set like the one above is trivial. In real life, data sets are very large and full of noise. Consider a news site that recommends articles to its readers. Preferences are inferred from clicks, but many of these "preferences" may be false: perhaps a reader clicked on an article and then found he did not like it, or clicked on the wrong one. Many clicks may also occur before the reader logs in, so they cannot be associated with a user. And imagine the volume of data: quite possibly hundreds of millions of clicks in a single month.

Deriving accurate recommendations from such data, and doing so efficiently, is not trivial. Later, case studies will show how Mahout tackles these problems. They will show why some standard approaches produce poor results or consume a great deal of memory and CPU time, and how to configure and customize Mahout to improve performance.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
