The idea behind the item-based recommendation algorithm in Mahout's MapReduce implementation
I recently wanted to write a MapReduce version of the user-based algorithm, so I first studied Mahout's implementation of the item-based algorithm. Item-based looks simple, but the implementation details are fairly involved, and the MapReduce implementation even more so.
The essence of item-based:
To predict a user's rating for an item,
look at the user's ratings for other items; the more similar another item is to the target item, the higher its weight.
Finally, take the weighted average.
Core steps of item-based:
1. Compute the item similarity matrix (a product of two matrices).
2. Multiply the user rating matrix by the item similarity matrix to get the user rating prediction matrix.
Of course, the so-called matrix multiplication here is not multiplication in the strict mathematical sense, where each entry is the inner product of a row vector of the first matrix and a column vector of the second. Here it is often more than an inner product: there may be a normalization step, a downsampling step, and so on.
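As a baseline, the pure mathematical version of step 2 can be sketched as a plain matrix product. This is a toy sketch with made-up values, not Mahout code; the real job adds normalization and downsampling as noted above.

```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both as nested lists."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

# Rows: users, columns: items. 0.0 means "not rated". Values are illustrative.
ratings = [[5.0, 0.0],
           [4.0, 3.0],
           [2.0, 0.0]]

# Item-item similarity matrix (symmetric, 1.0 on the diagonal).
similarity = [[1.0, 0.8],
              [0.8, 1.0]]

# User rating matrix x item similarity matrix = raw prediction scores.
predictions = matmul(ratings, similarity)
for row in predictions:
    print([round(x, 2) for x in row])
```

Note how user1's unrated item2 already picks up a score (5.0 * 0.8) from the rated, similar item1; the real algorithm then divides by the similarity sum to turn this into a weighted average.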
Input file data format: userid,itemid,pref
user1,item1,pref
user2,item1,pref
user2,item2,pref
user3,item1,pref
user_vectors: userid,vector[(itemid,pref)]
user1,vector[(item1,pref)]
user2,vector[(item1,pref),(item2,pref)]
user3,vector[(item1,pref)]
rating_matrix: itemid,vector[(userid,pref)]
item1,vector[(user1,pref),(user2,pref),(user3,pref)]
item2,vector[(user2,pref)]
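A minimal sketch (not Mahout code; the preference values are made up) of how the input triples above are grouped into user_vectors, and re-grouped by item into rating_matrix, which is effectively the transpose:

```python
from collections import defaultdict

# Input triples: (userid, itemid, pref), matching the format above.
triples = [("user1", "item1", 5.0),
           ("user2", "item1", 4.0),
           ("user2", "item2", 3.0),
           ("user3", "item1", 2.0)]

user_vectors = defaultdict(list)   # userid -> [(itemid, pref)]
rating_matrix = defaultdict(list)  # itemid -> [(userid, pref)]
for user, item, pref in triples:
    user_vectors[user].append((item, pref))
    rating_matrix[item].append((user, pref))

print(dict(user_vectors))
print(dict(rating_matrix))
```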
rating_matrix -> similarity_matrix
The item similarity matrix is obtained by computing the similarity between the rows of rating_matrix.
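As a sketch, the row-by-row similarity could look like this, using cosine similarity on sparse rows. Mahout supports several similarity measures; the measure and the values here are illustrative, not taken from the real job.

```python
import math

# rating_matrix rows as sparse vectors: itemid -> {userid: pref}.
rating_matrix = {"item1": {"user1": 5.0, "user2": 4.0, "user3": 2.0},
                 "item2": {"user2": 3.0}}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Pairwise similarity between every pair of distinct item rows.
items = list(rating_matrix)
similarity = {i: {j: cosine(rating_matrix[i], rating_matrix[j])
                  for j in items if j != i}
              for i in items}
print(similarity)
```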
Mapper:
similarity_matrix -> itemid,vector[(itemid,sim)]
item1,vector[(item2,sim)]
item2,vector[(item1,sim)]
user_vectors -> itemid,(userid,pref)
item1,(user1,pref)
item1,(user2,pref)
item2,(user2,pref)
item1,(user3,pref)
(The format is the same as the input file, but the data structure is stored differently.)
Reducer: itemid,(vector[(itemid,sim)],(vector[userid],vector[pref]))
For the current item: the list of items similar to it (take the top K), the list of users who have rated it, and their ratings.
item1,(vector[(item2,sim)],(vector[user1,user2,user3],vector[pref,pref,pref]))
item2,(vector[(item1,sim)],(vector[user2],vector[pref]))
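The mapper/reducer join above can be simulated in memory like this. This is a toy sketch, not Mahout's actual classes; the "SIM"/"PREF" tags are my own labels for telling the two input kinds apart, and the values are made up.

```python
from collections import defaultdict

similarity = {"item1": [("item2", 0.6)], "item2": [("item1", 0.6)]}
user_vectors = {"user1": [("item1", 5.0)],
                "user2": [("item1", 4.0), ("item2", 3.0)],
                "user3": [("item1", 2.0)]}

# Mapper: re-key both inputs by itemid, tagging each record with its origin.
emitted = []
for item, sims in similarity.items():
    emitted.append((item, ("SIM", sims)))
for user, prefs in user_vectors.items():
    for item, pref in prefs:
        emitted.append((item, ("PREF", (user, pref))))

# Shuffle + reducer: records with the same itemid are grouped together,
# yielding the (similar items, rating users + prefs) pair per item.
grouped = defaultdict(lambda: {"sims": [], "prefs": []})
for item, (tag, payload) in emitted:
    if tag == "SIM":
        grouped[item]["sims"] = payload
    else:
        grouped[item]["prefs"].append(payload)

print(dict(grouped))
```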
Mapper: userid,(pref(cur_item),vector[(itemid,sim)])
This means the user's rating for cur_item is pref, and vector holds the items similar to cur_item together with their similarities.
user1,(pref(item1),vector[(item2,sim)])
user2,(pref(item1),vector[(item2,sim)])
user3,(pref(item1),vector[(item2,sim)])
user2,(pref(item2),vector[(item1,sim)])
For example, the first line means: to predict user1's rating for the unrated item2, since item1 is similar to item2, take user1's rating for item1 into account.
Reducer: userid,itemid,pref
With the mapper above, all of one user's data lands in the same reducer,
so you get the items the user has rated and each item's similarity to the other items.
userid | item1 | item2
item1,pref | null | sim
item2,pref | sim | null

user1 | item1 | item2
item1,pref | null | sim
item2,unknownpref | sim | null

user2 | item1 | item2
item1,pref | null | sim
item2,pref | sim | null

user3 | item1 | item2
item1,pref | null | sim
item2,unknownpref | sim | null
Based on this per-user (item x item) matrix, you can predict the user's rating for any item the user has not yet rated.
pred(u,n) = sum_i(pref(u,i) * sim(n,i)) / sum_i(sim(n,i))
To predict user u's preference for item n:
take the items i that u has already rated,
u's preferences for those items,
and the similarity between n and each i,
then compute the weighted average.
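The formula can be checked numerically with a small sketch (toy preferences and similarities, not Mahout output):

```python
def predict(prefs, sims):
    """Weighted-average prediction.
    prefs: {itemid: pref(u, i)} for items i the user has rated.
    sims:  {itemid: sim(n, i)} for the target item n."""
    num = sum(p * sims[i] for i, p in prefs.items() if i in sims)
    den = sum(sims[i] for i in prefs if i in sims)
    return num / den if den else 0.0

# With a single rated item, the weighted average collapses to that rating.
print(predict({"item1": 5.0}, {"item1": 0.6}))

# With two rated items: (5.0*0.6 + 1.0*0.2) / (0.6 + 0.2) = 4.0.
print(predict({"item1": 5.0, "item3": 1.0}, {"item1": 0.6, "item3": 0.2}))
```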
It is worth mentioning that, for performance, Mahout does not actually do a complete matrix multiplication.
For example, for item similarity it keeps only the top K entries per item and discards the rest (whose similarities are too small to matter anyway).
Therefore, for a given user, the set of items to predict is not the full item set minus the user's rated items. Instead, it is the union of the top-K most similar items of the items the user has rated. An item outside this set has such a small similarity to every item the current user has rated that we can simply say the user is not interested: its predicted score is 0, so no computation is needed.
As for how the top K most similar items are chosen, I have not looked into it carefully; K may be a fixed constant, or there may be a threshold below which similarities are dropped. Of the two, the latter is more reliable and symmetrical.
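The constant-K variant of this downsampling can be sketched as follows; a threshold variant would filter by value instead. K and the similarity values here are illustrative, not Mahout's defaults.

```python
import heapq

K = 2  # illustrative; Mahout exposes this as a job parameter
sims = {"item1": [("item2", 0.9), ("item3", 0.1), ("item4", 0.5)]}

# Keep only the K most similar neighbours per item row.
topk = {item: heapq.nlargest(K, neighbours, key=lambda t: t[1])
        for item, neighbours in sims.items()}

# Threshold variant: keep neighbours whose similarity exceeds a cutoff.
threshold = 0.3
above = {item: [n for n in neighbours if n[1] >= threshold]
         for item, neighbours in sims.items()}

print(topk)
print(above)
```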
Original article: http://blog.csdn.net/lingerlanlan/article/details/42656161 (author: linger)