Slope One, Part 2: A C# Implementation

Last updated: 2018-12-07. Source: Internet

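The previous post briefly introduced the concept of the Slope One algorithm; this one walks through a C# implementation.
A recommender based on Slope One needs the following data:
1. A set of users
2. A set of Items (articles, products, and so on)
3. Ratings that users give to some of those Items to express their preferences
The problem Slope One solves is this: given a user's known Ratings for some Items, recommend Items the user has not yet rated, in order to increase sales opportunities.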

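Implementing a recommender system takes three steps:
1. Compute the Rating difference between every pair of Items
2. Given one user's Rating records, estimate that user's likely Ratings for the other Items
3. Sort by the estimated Rating and return the Top Items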

Step 1: suppose we have three users and four Items, with the Ratings shown in the table below.

Ratings	User1	User2	User3
Item1	5	4	4
Item2	4	5	4
Item3	4	3	N/A
Item4	N/A	5	5

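Our job in step 1 is to compute the pairwise Rating differences between Items, that is, the following matrix. Each cell shows the summed difference (row Item minus column Item) over the number of users who rated both Items: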

	Item1	Item2	Item3	Item4
Item1	N/A	0/3	2/2	-2/2
Item2	0/3	N/A	2/2	-1/2
Item3	-2/2	-2/2	N/A	-2/1
Item4	2/2	1/2	2/1	N/A
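
To make a single cell of the matrix concrete, here is a minimal sketch that derives one entry from the Ratings table. The names `PairDiffDemo` and `PairDiff` are hypothetical and exist only for this illustration; an N/A rating is modeled by leaving the Item out of that user's dictionary.

```csharp
using System;
using System.Collections.Generic;

public static class PairDiffDemo
{
    // Sums (rating of itemA - rating of itemB) over the users who rated both items.
    public static void PairDiff(IEnumerable<IDictionary<int, float>> users,
                                int itemA, int itemB, out float sum, out int freq)
    {
        sum = 0;
        freq = 0;
        foreach (var ratings in users)
        {
            float a, b;
            if (ratings.TryGetValue(itemA, out a) && ratings.TryGetValue(itemB, out b))
            {
                sum += a - b;   // this user's contribution to the cell
                freq += 1;      // one more user who rated both items
            }
        }
    }

    public static void Main()
    {
        var users = new List<IDictionary<int, float>>
        {
            new Dictionary<int, float> { { 1, 5 }, { 2, 4 }, { 3, 4 } },           // User1
            new Dictionary<int, float> { { 1, 4 }, { 2, 5 }, { 3, 3 }, { 4, 5 } }, // User2
            new Dictionary<int, float> { { 1, 4 }, { 2, 4 }, { 4, 5 } }            // User3
        };

        float sum;
        int freq;
        PairDiff(users, 1, 3, out sum, out freq);
        Console.WriteLine(sum + "/" + freq);   // 2/2, the Item1 row, Item3 column
        PairDiff(users, 3, 4, out sum, out freq);
        Console.WriteLine(sum + "/" + freq);   // -2/1, the Item3 row, Item4 column
    }
}
```

Only users who rated both Items contribute, which is why the Item3/Item4 cell has a frequency of 1 (User2 alone).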

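To support the weighted variant of the algorithm, we also need to record how many users rated both Items of a pair (Freq). First we define a class to hold a Rating: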
    public class Rating
    {
        public float Value { get; set; }
        public int Freq { get; set; }

        public float AverageValue
        {
            get {return Value / Freq;}
        }
    }
I decided to store the resulting matrix in a Dictionary:
    public class RatingDifferenceCollection : Dictionary<string, Rating>
    {
        private string GetKey(int Item1Id, int Item2Id)
        {
            return Item1Id + "/" + Item2Id;
        }

        public bool Contains(int Item1Id, int Item2Id)
        {
            // ContainsKey is a hash lookup; Keys.Contains would scan every key via LINQ.
            return this.ContainsKey(GetKey(Item1Id, Item2Id));
        }

        public Rating this[int Item1Id, int Item2Id]
        {
            get {
                    return this[this.GetKey(Item1Id, Item2Id)];
            }
            set { this[this.GetKey(Item1Id, Item2Id)] = value; }
        }
    }
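
As a quick illustration of how the two classes fit together, the sketch below stores one cell of the example matrix and reads it back. It repeats compact versions of `Rating` and `RatingDifferenceCollection` so that it compiles on its own; the `CollectionDemo` class is hypothetical and not part of the original code.

```csharp
using System;
using System.Collections.Generic;

public class Rating
{
    public float Value { get; set; }   // summed difference
    public int Freq { get; set; }      // number of users who rated both items
    public float AverageValue { get { return Value / Freq; } }
}

public class RatingDifferenceCollection : Dictionary<string, Rating>
{
    private static string GetKey(int item1Id, int item2Id)
    {
        return item1Id + "/" + item2Id;
    }

    public bool Contains(int item1Id, int item2Id)
    {
        return ContainsKey(GetKey(item1Id, item2Id));
    }

    public Rating this[int item1Id, int item2Id]
    {
        get { return this[GetKey(item1Id, item2Id)]; }
        set { this[GetKey(item1Id, item2Id)] = value; }
    }
}

public static class CollectionDemo
{
    public static void Main()
    {
        var diffs = new RatingDifferenceCollection();

        // Item1 vs Item3 from the example: two users, each with a difference of 1.
        diffs[1, 3] = new Rating { Value = 2, Freq = 2 };

        Console.WriteLine(diffs.Contains(1, 3));      // True
        Console.WriteLine(diffs.Contains(3, 1));      // False: (3, 1) is a different key
        Console.WriteLine(diffs[1, 3].AverageValue);  // 1
    }
}
```

Because the key is the ordered string "1/3", the pair (3, 1) has to be stored separately; reading a pair that was never stored throws KeyNotFoundException, so callers should check Contains first.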

Next we implement the SlopeOne class. First create a RatingDifferenceCollection to hold the matrix, plus a HashSet that tracks which Items exist in the system:
    public class SlopeOne
    {
        public RatingDifferenceCollection _DiffMarix = new RatingDifferenceCollection(); // Dictionary holding the difference matrix
        public HashSet<int> _Items = new HashSet<int>(); // Every item ID seen so far

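The AddUserRatings method takes one user's ratings (Item-Rating pairs):
    public void AddUserRatings(IDictionary<int, float> userRatings)
AddUserRatings contains two nested loops. The outer loop iterates over every Item in the input and the inner loop iterates over them again, computing the Rating difference for each pair of Items and adding it to _DiffMarix. Remember to increment Freq by 1 to record that we have seen this pair of Items once more: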
    Rating ratingDiff = _DiffMarix[item1Id, item2Id];
    ratingDiff.Value += item1Rating - item2Rating;
    ratingDiff.Freq += 1;
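
The fragment above omits the loops and the creation of new entries. One way the whole method might look is sketched below; this is an assumption about the omitted parts, not the author's actual code. The class name `SlopeOneSketch` is hypothetical, and creating a `Rating` the first time a pair is seen is my own choice so that the indexer never throws.

```csharp
using System;
using System.Collections.Generic;

public class Rating
{
    public float Value { get; set; }
    public int Freq { get; set; }
    public float AverageValue { get { return Value / Freq; } }
}

public class RatingDifferenceCollection : Dictionary<string, Rating>
{
    private static string GetKey(int item1Id, int item2Id) { return item1Id + "/" + item2Id; }
    public bool Contains(int item1Id, int item2Id) { return ContainsKey(GetKey(item1Id, item2Id)); }
    public Rating this[int item1Id, int item2Id]
    {
        get { return this[GetKey(item1Id, item2Id)]; }
        set { this[GetKey(item1Id, item2Id)] = value; }
    }
}

public class SlopeOneSketch
{
    public RatingDifferenceCollection _DiffMarix = new RatingDifferenceCollection();
    public HashSet<int> _Items = new HashSet<int>();

    public void AddUserRatings(IDictionary<int, float> userRatings)
    {
        foreach (var item1 in userRatings)
        {
            _Items.Add(item1.Key); // remember every item that has been rated
            foreach (var item2 in userRatings)
            {
                if (item1.Key == item2.Key) continue; // the diagonal is never stored

                // Assumption: create the entry the first time this ordered pair appears.
                if (!_DiffMarix.Contains(item1.Key, item2.Key))
                    _DiffMarix[item1.Key, item2.Key] = new Rating();

                Rating ratingDiff = _DiffMarix[item1.Key, item2.Key];
                ratingDiff.Value += item1.Value - item2.Value;
                ratingDiff.Freq += 1;
            }
        }
    }

    public static void Main()
    {
        var model = new SlopeOneSketch();
        model.AddUserRatings(new Dictionary<int, float> { { 1, 5 }, { 2, 4 }, { 3, 4 } });           // User1
        model.AddUserRatings(new Dictionary<int, float> { { 1, 4 }, { 2, 5 }, { 3, 3 }, { 4, 5 } }); // User2
        model.AddUserRatings(new Dictionary<int, float> { { 1, 4 }, { 2, 4 }, { 4, 5 } });           // User3

        Console.WriteLine(model._DiffMarix.Count);       // 12 ordered pairs
        Console.WriteLine(model._DiffMarix[1, 3].Value); // 2 (summed difference)
        Console.WriteLine(model._DiffMarix[1, 3].Freq);  // 2
    }
}
```

Feeding in the three example users yields 12 ordered pairs, one per off-diagonal cell of the 4x4 matrix.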

After calling AddUserRatings for every user, the matrix is complete. Our matrix, however, is stored in table form:

	Rating Diff (average)	Freq
Item1-2	0	3
Item1-3	1	2
Item2-1	0	3
Item2-3	1	2
Item3-1	-1	2
Item3-2	-1	2
Item1-4	-1	2
Item2-4	-0.5	2
Item3-4	-2	1
Item4-1	1	2
Item4-2	0.5	2
Item4-3	2	1

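Step 2: given one user's Rating records, estimate that user's likely Ratings for the other Items:
    public IDictionary<int, float> Predict(IDictionary<int, float> userRatings)
This is again two nested loops. The outer loop iterates over all Items in _Items; the inner loop iterates over userRatings, combining this user's ratings with the matrix from step 1 to estimate the user's Rating for every Item in the system: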
    Rating itemRating = new Rating(); // Prediction of this user's rating
    ...
    Rating diff = _DiffMarix[itemId, inputItemId];
    itemRating.Value += diff.Freq * (diff.AverageValue + userRating.Value);
    itemRating.Freq += diff.Freq;
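
The weighted prediction can be sketched end to end as follows. This is an illustration under stated assumptions rather than the original Predict method: `Diff` and `PredictSketch` are hypothetical names, the difference table is passed in directly (filled with the averages and frequencies from the table above), and skipping Items the user has already rated is my assumption, chosen because the sample output lists only unrated Items.

```csharp
using System;
using System.Collections.Generic;

public class Diff
{
    public float Average; // mean of (target item rating - rated item rating)
    public int Freq;      // number of users behind that mean
}

public static class PredictSketch
{
    // diffs is keyed by "targetItemId/ratedItemId".
    public static IDictionary<int, float> Predict(
        IDictionary<string, Diff> diffs,
        IEnumerable<int> items,
        IDictionary<int, float> userRatings)
    {
        var predictions = new Dictionary<int, float>();
        foreach (int itemId in items)
        {
            if (userRatings.ContainsKey(itemId)) continue; // assumption: skip rated items

            float weightedSum = 0;
            int totalFreq = 0;
            foreach (var userRating in userRatings)
            {
                Diff diff;
                if (!diffs.TryGetValue(itemId + "/" + userRating.Key, out diff)) continue;

                // Each rated item votes for (its rating + average difference),
                // weighted by how many users support that difference.
                weightedSum += diff.Freq * (diff.Average + userRating.Value);
                totalFreq += diff.Freq;
            }
            if (totalFreq > 0) predictions[itemId] = weightedSum / totalFreq;
        }
        return predictions;
    }

    public static void Main()
    {
        var diffs = new Dictionary<string, Diff>
        {
            { "2/1", new Diff { Average = 0f, Freq = 3 } },
            { "2/3", new Diff { Average = 1f, Freq = 2 } },
            { "4/1", new Diff { Average = 1f, Freq = 2 } },
            { "4/3", new Diff { Average = 2f, Freq = 1 } }
        };
        var userRatings = new Dictionary<int, float> { { 1, 5 }, { 3, 4 } };

        var predictions = Predict(diffs, new[] { 1, 2, 3, 4 }, userRatings);
        foreach (var p in predictions)
            Console.WriteLine("Item " + p.Key + " Rating: " + p.Value);
    }
}
```

For a user who rated Item1 as 5 and Item3 as 4, Item2 works out to (3 * (0 + 5) + 2 * (1 + 4)) / (3 + 2) = 5 and Item4 to (2 * (1 + 5) + 1 * (2 + 4)) / (2 + 1) = 6.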

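Step 3: once we have the predicted Ratings, we can sort them by rating and recommend Items to the user. A quick test:
    // test is a SlopeOne instance that has already received the three users' ratings
    Dictionary<int, float> userRating = new Dictionary<int, float>();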
    userRating.Add(1, 5);
    userRating.Add(3, 4);
    IDictionary<int, float> Predictions = test.Predict(userRating);
    foreach (var rating in Predictions)
    {
        Console.WriteLine("Item " + rating.Key + " Rating: " + rating.Value);
    }
Output:
Item 2 Rating: 5
Item 4 Rating: 6

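Improvements:
Looking at the matrix generated earlier, a lot of space is wasted. For example, the diagonal never holds a value. Because we store the matrix entries in a linear table, we have already avoided that problem.
The values below the diagonal mirror those above it: each lower value equals the corresponding upper value multiplied by -1. With a large data set this is a significant waste. We can improve on it by changing RatingDifferenceCollection: modify GetKey so that the unordered Item pair serves as the Key (code that reads or writes an entry must then negate the difference whenever Item1Id is greater than Item2Id):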
    private string GetKey(int Item1Id, int Item2Id) {
        return (Item1Id < Item2Id) ? Item1Id + "/" + Item2Id : Item2Id + "/" + Item1Id;
    }
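
Sharing one key per unordered pair only works if the sign is handled consistently on both write and read. The following is a minimal sketch of that bookkeeping; `SymmetricDiffStore`, `Add`, and `TryGetAverage` are hypothetical names that do not appear in the original code.

```csharp
using System;
using System.Collections.Generic;

public class SymmetricDiffStore
{
    private class Cell { public float Sum; public int Freq; }

    private readonly Dictionary<string, Cell> _cells = new Dictionary<string, Cell>();

    public int Count { get { return _cells.Count; } }

    private static string GetKey(int item1Id, int item2Id)
    {
        return (item1Id < item2Id) ? item1Id + "/" + item2Id : item2Id + "/" + item1Id;
    }

    // Records one observation of (rating of item1 - rating of item2).
    public void Add(int item1Id, int item2Id, float difference)
    {
        string key = GetKey(item1Id, item2Id);
        Cell cell;
        if (!_cells.TryGetValue(key, out cell))
        {
            cell = new Cell();
            _cells[key] = cell;
        }
        // Always store the difference as (smaller ID - larger ID).
        cell.Sum += (item1Id < item2Id) ? difference : -difference;
        cell.Freq += 1;
    }

    // Returns the average of (rating of item1 - rating of item2), flipping the sign if needed.
    public bool TryGetAverage(int item1Id, int item2Id, out float average)
    {
        Cell cell;
        if (!_cells.TryGetValue(GetKey(item1Id, item2Id), out cell))
        {
            average = 0;
            return false;
        }
        float stored = cell.Sum / cell.Freq;
        average = (item1Id < item2Id) ? stored : -stored;
        return true;
    }

    public static void Main()
    {
        var store = new SymmetricDiffStore();
        store.Add(1, 3, 1f); // User1: Item1 - Item3 = 5 - 4
        store.Add(1, 3, 1f); // User2: Item1 - Item3 = 4 - 3

        float average;
        store.TryGetAverage(1, 3, out average);
        Console.WriteLine(average);     // 1
        store.TryGetAverage(3, 1, out average);
        Console.WriteLine(average);     // -1
        Console.WriteLine(store.Count); // 1 entry covers both directions
    }
}
```

With this layout each unordered pair should be added once per user (for example, only when item1Id is less than item2Id); adding both orderings would double Freq for every pair.
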
The complete code is available here; it was debugged on .NET 3.5.
