Machine Learning in Python: SVD Decomposition

Last updated: 2015-03-17  Source: Internet

Uploader: User


This post follows *Machine Learning in Action* and combines a recommendation algorithm with SVD.

Every matrix, square or not, can be factored into SVD form.
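A quick numerical check of this claim with NumPy's `np.linalg.svd`, using an arbitrary non-square 4×3 matrix chosen only for illustration. The sketch rebuilds the m×n diagonal matrix Σ from the returned vector of singular values and confirms that $U\Sigma V^{T}$ reproduces the input and that U and V are orthogonal.

```python
import numpy as np

# An arbitrary non-square 4 x 3 matrix (illustrative values only)
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [4.0, 0.0, 1.0],
              [2.0, 2.0, 2.0]])

# np.linalg.svd returns U (4 x 4), the singular values as a 1-D array sorted
# in descending order, and V^T (3 x 3)
U, s, Vt = np.linalg.svd(A)

# Rebuild the 4 x 3 "diagonal" matrix Sigma from the vector s
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(U @ Sigma @ Vt, A))     # True: A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(4)))    # True: U is orthogonal
print(np.allclose(Vt @ Vt.T, np.eye(3)))  # True: V is orthogonal
```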

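In essence, SVD maps the data into a new feature space through a change of basis. The basic concepts of SVD will get their own post later; for now, here is the Python code, starting with a simple matrix that represents users' ratings of items.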
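To make the "new feature space" concrete, here is a sketch that decomposes the rating matrix returned by `loadExData()` in the code below (7 users × 5 items). Because items 0 to 2 are always rated together, as are items 3 and 4, the matrix has rank 2: exactly two singular values are non-zero (√93 ≈ 9.64 and √28 ≈ 5.29), so two coordinates per user and per item reproduce every rating. The names `user_coords` and `item_coords`, and the choice to fold Σ into the user side, are illustrative assumptions.

```python
import numpy as np

# The rating matrix returned by loadExData(): rows are users, columns are items
data = np.array([[0, 0, 0, 2, 2],
                 [0, 0, 0, 3, 3],
                 [0, 0, 0, 1, 1],
                 [1, 1, 1, 0, 0],
                 [2, 2, 2, 0, 0],
                 [5, 5, 5, 0, 0],
                 [1, 1, 1, 0, 0]], dtype=float)

U, s, Vt = np.linalg.svd(data)
print(np.round(s, 4))  # two clearly non-zero values; the other three are ~0

# Keep k = 2 latent factors: each user and each item becomes a 2-D point
k = 2
user_coords = U[:, :k] * s[:k]  # rows of U_k, scaled by the singular values
item_coords = Vt[:k, :].T       # rows of V_k

# Every rating is the dot product of a user point and an item point
print(np.allclose(user_coords @ item_coords.T, data))  # True
# Items 0 to 2 collapse onto one point, items 3 and 4 onto another
print(np.allclose(item_coords[0], item_coords[2]),
      np.allclose(item_coords[3], item_coords[4]))     # True True
```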

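One thing puzzles me here.

Given the decomposition $\mathrm{Data} = U \Sigma V^{T}$,

what is the real geometric meaning of U and V? The book's interpretation is that U maps items into the new feature space, while $V^{T}$ maps users into the new feature space.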
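One way to read the book's statement, assuming rows are users and columns are items as in the code below: each item is a column vector with one entry per user, and the columns of U form an orthonormal basis for that user-indexed space. Projecting an item onto the first k columns of U and dividing by the singular values (exactly what `svdEst` does) yields the item's k-dimensional coordinates, which turn out to be the rows of $V_k$. Symmetrically, users live in the item-indexed space spanned by the columns of V, and projecting them yields the rows of $U_k$. The random matrix is only a stand-in for real ratings.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((7, 5))  # hypothetical ratings: 7 users (rows) x 5 items (columns)

U, s, Vt = np.linalg.svd(data)
k = 3
Sk_inv = np.diag(1.0 / s[:k])

# The projection svdEst uses: each item is a length-7 column (one entry per
# user), projected onto the first k columns of U and rescaled by 1/sigma
items_k = data.T @ U[:, :k] @ Sk_inv   # shape (5, k)

# The symmetric projection for users: each user is a length-5 row (one entry
# per item), projected onto the first k columns of V
users_k = data @ Vt[:k, :].T @ Sk_inv  # shape (7, k)

print(np.allclose(items_k, Vt[:k, :].T))  # True: item coordinates are the rows of V_k
print(np.allclose(users_k, U[:, :k]))     # True: user coordinates are the rows of U_k
```

In other words, U supplies the basis that items are projected onto, while the resulting item coordinates are the rows of V (and vice versa for users).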

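The code follows. SVD can also be used for dimensionality reduction, which is done by keeping only the largest singular values and discarding the rest.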
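Which singular values count as "large" is a judgment call. One common heuristic, sketched here, keeps the smallest k whose squared singular values account for a chosen fraction of the total (90% is assumed as the default). `choose_k` is a hypothetical helper, not part of the original code.

```python
import numpy as np

def choose_k(singular_values, energy=0.9):
    """Return the smallest k such that the first k singular values hold at
    least `energy` of the total sum of squared singular values.
    (Hypothetical helper; the 0.9 default is an assumed threshold.)"""
    sq = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(sq) / sq.sum()  # energy share kept by the first k values
    # searchsorted finds the first index whose cumulative share reaches `energy`
    return int(np.searchsorted(cumulative, energy) + 1)

# 10^2 + 5^2 = 125 of the total 130 (about 96%), so two values suffice for 90%
print(choose_k([10.0, 5.0, 2.0, 1.0]))  # 2
```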

```python
'''
Created on Mar 8, 2011
@author: Peter
'''
import numpy as np
from numpy import linalg as la  # alias for the linear-algebra module

# SVD is introduced here through recommender systems, so each data matrix can be
# read as users' ratings of items (rows = users, columns = items, 0 = unrated).
def loadExData():
    return [[0, 0, 0, 2, 2],
            [0, 0, 0, 3, 3],
            [0, 0, 0, 1, 1],
            [1, 1, 1, 0, 0],
            [2, 2, 2, 0, 0],
            [5, 5, 5, 0, 0],
            [1, 1, 1, 0, 0]]

def loadExData2():
    return [[0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 5],
            [0, 0, 0, 3, 0, 4, 0, 0, 0, 0, 3],
            [0, 0, 0, 0, 4, 0, 0, 1, 0, 4, 0],
            [3, 3, 4, 0, 0, 0, 0, 2, 2, 0, 0],
            [5, 4, 5, 0, 0, 0, 0, 5, 5, 0, 0],
            [0, 0, 0, 0, 5, 0, 1, 0, 0, 5, 0],
            [4, 3, 4, 0, 0, 0, 0, 5, 5, 0, 1],
            [0, 0, 0, 4, 0, 4, 0, 0, 0, 0, 4],
            [0, 0, 0, 2, 0, 2, 5, 0, 0, 1, 2],
            [0, 0, 0, 0, 5, 0, 0, 0, 0, 4, 0],
            [1, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0]]

# All three similarity measures are rescaled to the range [0, 1].
# inA and inB are column vectors of type np.matrix.
def ecludSim(inA, inB):
    # la.norm computes the L2 norm, i.e. the Euclidean distance between the vectors
    return 1.0 / (1.0 + la.norm(inA - inB))

def pearsSim(inA, inB):
    if len(inA) < 3:
        return 1.0
    # corrcoef returns the Pearson correlation matrix; [0][1] is the coefficient
    return 0.5 + 0.5 * np.corrcoef(inA, inB, rowvar=False)[0][1]

def cosSim(inA, inB):
    num = (inA.T * inB).item()  # dot product of the two column vectors
    denom = la.norm(inA) * la.norm(inB)
    return 0.5 + 0.5 * (num / denom)  # cosine similarity

# Item-based collaborative filtering: estimate user's rating of item.
# dataMat: rating matrix (np.matrix), user: user index,
# simMeas: similarity function, item: item index
def standEst(dataMat, user, simMeas, item):
    n = np.shape(dataMat)[1]  # number of columns, i.e. number of items
    simTotal = 0.0
    ratSimTotal = 0.0
    for j in range(n):
        userRating = dataMat[user, j]
        if userRating == 0:
            continue  # the user has not rated item j, so skip it
        # users who have rated both item and item j
        overLap = np.nonzero(np.logical_and(dataMat[:, item].A > 0,
                                            dataMat[:, j].A > 0))[0]
        if len(overLap) == 0:
            similarity = 0
        else:  # similarity between the two items over the overlapping users
            similarity = simMeas(dataMat[overLap, item], dataMat[overLap, j])
        print('the %d and %d similarity is: %f' % (item, j, similarity))
        simTotal += similarity
        # similarity to an item the user rated, weighted by the user's rating of it
        ratSimTotal += similarity * userRating
    if simTotal == 0:
        return 0
    return ratSimTotal / simTotal

# SVD-based estimate; the decomposition comes straight from the library.
# Implementing SVD by hand would follow the matrix-theory derivation,
# but computing the eigenvalues would probably be the painful part.
def svdEst(dataMat, user, simMeas, item):
    n = np.shape(dataMat)[1]
    simTotal = 0.0
    ratSimTotal = 0.0
    U, Sigma, VT = la.svd(dataMat)  # full decomposition
    Sig4 = np.asmatrix(np.diag(Sigma[:4]))  # 4 x 4 diagonal matrix of the top singular values
    xformedItems = dataMat.T * U[:, :4] * Sig4.I  # items mapped into the 4-D space
    for j in range(n):
        userRating = dataMat[user, j]
        if userRating == 0 or j == item:
            continue
        similarity = simMeas(xformedItems[item, :].T, xformedItems[j, :].T)
        print('the %d and %d similarity is: %f' % (item, j, similarity))
        simTotal += similarity
        ratSimTotal += similarity * userRating
    if simTotal == 0:
        return 0
    return ratSimTotal / simTotal

# The actual recommendation function; the last two parameters select the
# similarity measure and the rating-estimation method.
# dataMat must be an np.matrix, e.g. np.asmatrix(loadExData2()).
def recommend(dataMat, user, N=3, simMeas=cosSim, estMethod=standEst):
    # dataMat[user, :].A is a 1 x n array, so nonzero(...)[1] holds the
    # column indices, i.e. the items this user has not rated
    unratedItems = np.nonzero(dataMat[user, :].A == 0)[1]
    if len(unratedItems) == 0:
        return 'you rated everything'
    itemScores = []
    for item in unratedItems:
        estimatedScore = estMethod(dataMat, user, simMeas, item)
        itemScores.append((item, estimatedScore))
    return sorted(itemScores, key=lambda jj: jj[1], reverse=True)[:N]

# Extended example: image compression with SVD.
# printMat prints a 32 x 32 image as 0/1 characters.
def printMat(inMat, thresh=0.8):
    for i in range(32):
        for k in range(32):
            if float(inMat[i, k]) > thresh:
                print(1, end=' ')
            else:
                print(0, end=' ')
        print('')

# The reconstructed image turns out to be nearly identical to the original.
def imgCompress(numSV=3, thresh=0.8):
    myl = []
    with open('0_5.txt') as f:
        for line in f:
            myl.append([int(line[i]) for i in range(32)])
    myMat = np.asmatrix(myl)  # the image is now in myMat
    print("****original matrix******")
    printMat(myMat, thresh)
    U, Sigma, VT = la.svd(myMat)
    SigRecon = np.asmatrix(np.zeros((numSV, numSV)))  # numSV x numSV zero matrix (3 x 3 by default)
    for k in range(numSV):  # put the singular values on the diagonal
        SigRecon[k, k] = Sigma[k]
    reconMat = U[:, :numSV] * SigRecon * VT[:numSV, :]
    print("****reconstructed matrix using %d singular values******" % numSV)
    printMat(reconMat, thresh)
```
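The code above relies on `np.matrix` (through `.A`, `.T *` and `.I`), which NumPy's documentation no longer recommends in favour of plain arrays. As a hedged alternative, here is a minimal ndarray sketch of the same SVD-based weighted-average estimate. `svd_recommend` and `cos_sim` are hypothetical names; the item vectors are computed once instead of once per unrated item; `k` (hard-coded to 4 in `svdEst`) is assumed not to exceed the rank of the rating matrix, since the code divides by the top k singular values; and the zero-vector guard in `cos_sim` is an added assumption.

```python
import numpy as np

def cos_sim(a, b):
    """Cosine similarity rescaled to [0, 1], as in cosSim above."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.5  # assumption: treat a zero vector as neutral
    return 0.5 + 0.5 * float(a @ b) / denom

def svd_recommend(data, user, k=2, n=3):
    """Rank `user`'s unrated items by a similarity-weighted average of the
    user's own ratings, with similarities measured in a k-dim SVD space."""
    U, s, Vt = np.linalg.svd(data, full_matrices=False)
    items = data.T @ U[:, :k] / s[:k]  # item coordinates, shape (n_items, k)
    rated = np.flatnonzero(data[user] > 0)
    scores = []
    for item in np.flatnonzero(data[user] == 0):
        sims = np.array([cos_sim(items[item], items[j]) for j in rated])
        total = sims.sum()
        est = sims @ data[user, rated] / total if total > 0 else 0.0
        scores.append((int(item), float(est)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]
```

For example, `svd_recommend(np.array(loadExData2(), dtype=float), 1, k=4)` ranks user 1's unrated items in the same spirit as `recommend(..., estMethod=svdEst)`, without the per-pair printing.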

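The output shows that the image reconstructed from only a few singular values (3 by default) is essentially the same as the original.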
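Because `0_5.txt` (the 32×32 digit image the code reads) is not reproduced here, this sketch uses a hypothetical ring-shaped stand-in image to show what the reduction costs and saves. A rank-k reconstruction needs only k(m + n + 1) stored numbers instead of m·n (195 instead of 1024 for k = 3 on a 32×32 image), and by the Eckart-Young theorem its Frobenius-norm error equals the square root of the sum of the discarded squared singular values. The pixel-mismatch count uses the same 0.8 threshold as `printMat`.

```python
import numpy as np

# Hypothetical 32 x 32 binary image: a ring that loosely resembles a digit 0
y, x = np.mgrid[0:32, 0:32]
r = np.hypot(x - 15.5, y - 15.5)
img = ((r > 8) & (r < 13)).astype(float)

U, s, Vt = np.linalg.svd(img)
m, n = img.shape
errors = {}
for k in (1, 3, 5, 10, 32):
    approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]           # rank-k reconstruction
    errors[k] = np.linalg.norm(img - approx)                 # Frobenius norm of the error
    mismatched = int(np.sum((approx > 0.8) != (img > 0.8)))  # pixels that flip at thresh=0.8
    stored = k * (m + n + 1)                                 # U_k, V_k and k singular values
    print(f"k={k:2d}  error={errors[k]:.3f}  "
          f"mismatched pixels={mismatched}  stored={stored}/{m * n}")
```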


This article was originally written in Chinese and published on aliyun.com.
