Slope One: A Simple and Efficient Recommendation Algorithm

Source: Internet
Author: User
Tags: PHP, MySQL

Recommendation systems were first applied on Amazon's website: based on users' previous buying behavior, the site recommends other products that are often purchased together with the one being bought. In China, Dangdang does this well; when I buy books there, it can usually recommend other books that I am interested in. It is a technology that greatly promotes sales.

A general collaborative filtering algorithm first collects users' scores for items (products). There are two kinds of score. One is an explicit score, where the user rates a book or a song directly. The other is an implicit score; for example, in a shopping system a purchase counts as 2 points, browsing as 1 point, and anything else as 0 points. I prefer implicit scores, because explicit scores require a high degree of user participation. Many websites put a rating button on the content page and ask the user to pick a value from 1 to 5. I may like the article, but how do I know how much I like it? I would have to think about it, and a very important principle in website design is "don't make me think!" So I may or may not bother to rate it. Implicit scores are different: you only buy the books you like, and you only listen repeatedly to the songs you like.
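The implicit scoring scheme described above can be sketched in a few lines of Python. This is only an illustration: the event log format, the action names, and the rule of keeping the highest score per user and item are assumptions, not something the scheme itself prescribes.

```python
# Points per action, as in the text: purchase = 2, browse = 1, anything else = 0.
EVENT_SCORES = {"purchase": 2, "browse": 1}


def implicit_scores(events):
    """Turn (user, item, action) events into {user: {item: score}}.

    Assumption: when a user has several events for one item,
    the highest score wins.
    """
    scores = {}
    for user, item, action in events:
        value = EVENT_SCORES.get(action, 0)  # unknown actions count as 0
        user_scores = scores.setdefault(user, {})
        user_scores[item] = max(user_scores.get(item, 0), value)
    return scores


# Hypothetical event log, for illustration only.
events = [
    ("alice", "book-1", "browse"),
    ("alice", "book-1", "purchase"),
    ("bob", "book-1", "browse"),
    ("bob", "book-2", "share"),
]
ratings = implicit_scores(events)
print(ratings)
```

No user ever has to click a rating button here; the scores fall out of behavior the site already records.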

After collecting users' scores, you can use nearest neighbor search to find other items or people with similar features or interests. The similarity measures generally used for nearest neighbor search are the Pearson correlation coefficient, cosine-based similarity, and adjusted cosine similarity. The application of the cosine theorem in data mining has been introduced on the Google China blog; see part 12 of the "Beauty of Mathematics" series, on the cosine theorem and the classification of news.
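As a rough sketch of two of these measures, the functions below compute cosine similarity and the Pearson correlation coefficient for two equal-length rating vectors. The function names and the choice to return 0.0 for a zero-length vector are assumptions made for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: treat an all-zero vector as "no similarity"
    return dot / (norm_u * norm_v)


def pearson(u, v):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    centered_u = [a - mean_u for a in u]
    centered_v = [b - mean_v for b in v]
    return cosine_similarity(centered_u, centered_v)


# Two users who rate the same three items in opposite order.
print(cosine_similarity([1, 2, 3], [3, 2, 1]))
print(pearson([1, 2, 3], [3, 2, 1]))
```

Pearson differs from plain cosine similarity only in that each vector is centered on its own mean first, which removes the effect of one user rating everything higher than another.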

The rest of the work is to make recommendations based on the nearest neighbor set.

Computing the nearest neighbor set is relatively costly, especially when there is a large amount of data. Today I will share a simple and efficient collaborative filtering algorithm: Slope One.

Basic Principles

User    Score for item A    Score for item B
X       3                   4
Y       2                   4
Z       4                   ?

What score is user Z likely to give item B? There is a saying in the stock market that an average smooths out all abnormal fluctuations, which is why the various technical indicators for stocks are built on moving-average lines or bar charts over different time periods. Similarly, the Slope One algorithm assumes that an average can stand in for the difference between an unknown individual's scores for two items. The average difference between item A and item B is ((3 - 4) + (2 - 4)) / 2 = -1.5; that is, people generally score item B 1.5 higher than item A. Slope One therefore predicts that Z scores item B as 4 + 1.5 = 5.5.
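The calculation above can be reproduced with a minimal Python sketch. The nested-dictionary layout and the function name `average_difference` are assumptions chosen for this example.

```python
# The scores from the table; Z has not scored item B yet.
ratings = {
    "X": {"A": 3, "B": 4},
    "Y": {"A": 2, "B": 4},
    "Z": {"A": 4},
}


def average_difference(ratings, item_from, item_to):
    """Mean of (item_from - item_to) over users who scored both items.

    Raises ZeroDivisionError if no user scored both items.
    """
    diffs = [
        scores[item_from] - scores[item_to]
        for scores in ratings.values()
        if item_from in scores and item_to in scores
    ]
    return sum(diffs) / len(diffs)


dev = average_difference(ratings, "A", "B")  # ((3 - 4) + (2 - 4)) / 2
predicted = ratings["Z"]["A"] - dev          # Z's score for A, shifted by the difference
print(dev, predicted)  # -1.5 5.5
```

Only X and Y contribute to the average, because Z has no score for item B.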

Simple, isn't it?

Weighting Algorithm

Suppose n people scored both item A and item B, and R(A->B) denotes the average difference (A - B) among those n people. Suppose m people scored both item B and item C, and R(C->B) denotes the average difference (C - B) among those m people. Note that this is the mean difference, not the squared difference. If a user scores A as rA and C as rC, then the user's predicted score for B is:

rB = (n * (rA - R(A->B)) + m * (rC - R(C->B))) / (m + n)
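A minimal Python sketch of this weighted prediction follows. The sample scores, the user names, and the function names `deviation` and `predict_weighted` are hypothetical; the sketch also generalizes the two-item formula to every item the user has scored, which is an assumption about how the formula extends.

```python
def deviation(ratings, item_from, item_to):
    """Return (mean of item_from - item_to, number of users who scored both)."""
    diffs = [
        scores[item_from] - scores[item_to]
        for scores in ratings.values()
        if item_from in scores and item_to in scores
    ]
    if not diffs:
        return 0.0, 0
    return sum(diffs) / len(diffs), len(diffs)


def predict_weighted(ratings, user, target):
    """Weighted Slope One: each known score votes with weight = co-rating count."""
    numerator = 0.0
    denominator = 0
    for item, score in ratings[user].items():
        if item == target:
            continue
        dev, count = deviation(ratings, item, target)
        numerator += count * (score - dev)
        denominator += count
    if denominator == 0:
        return None  # no overlapping scores, so no prediction
    return numerator / denominator


# Hypothetical data: 3 users scored both A and B, 2 users scored both C and B.
ratings = {
    "u1": {"A": 5, "B": 3, "C": 2},
    "u2": {"A": 3, "B": 4},
    "u3": {"B": 2, "C": 5},
    "u4": {"A": 4, "B": 5},
    "new": {"A": 4, "C": 3},
}
prediction = predict_weighted(ratings, "new", "B")
print(prediction)
```

With this data R(A->B) = 0 over n = 3 users and R(C->B) = 1 over m = 2 users, so the prediction is (3 * (4 - 0) + 2 * (3 - 1)) / 5 = 3.2. The item pair backed by more users pulls the result harder.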

Open-source Slope One packages

  • Python

    http://www.serpentine.com/blog/2006/12/12/collaborative-filtering-made-easy/

  • Java

    http://taste.sourceforge.net/

    http://www.daniel-lemire.com/fr/documents/publications/SlopeOne.java

    http://www.nongnu.org/cofi/

  • PHP

    http://sourceforge.net/projects/vogoo

    http://www.drupal.org/project/cre

    http://www.daniel-lemire.com/fr/documents/publications/webpaper.txt (a Slope One implementation written by the algorithm's author; simple and clear, strongly recommended)

  • Erlang

    http://chlorophil.blogspot.com/2007/06/collaborative-filtering-weighted-slope.html

  • C#

    http://www.cnblogs.com/kuber/articles/SlopeOne_CSharp.html (a C# version written by a Chinese developer)

  • T-SQL

    http://blog.charliezhu.com/2008/07/21/implementing-slope-one-in-t-sql/

Versions in other languages are listed at http://en.wikipedia.org/wiki/slope_one, and my Slope One implementation for PHP and MySQL will be available at http://code.google.com/p/openslopeone/
The code is optimized mainly for massive data and distributed processing. On my laptop (with GB memory and GB memory), I have tested it on 4.4 million rating records; with a single thread, processing completed in 3 hours and 47 minutes, which is quite good. I have been too busy with work recently, so I will publish the source code at the address above in a few days, together with a detailed introduction to my algorithm. I welcome your criticism and corrections, so that we can learn and improve together.
