I. The Factorization Machine (FM) Model

The factorization machine (FM) is a machine learning algorithm based on matrix factorization, proposed by Steffen Rendle.

1. Advantages of the factorization machine (FM)
The most important feature of the factorization machine (FM) is its ability to learn well from sparse data. Sparse data is abundant in the real world; the recommendation-system example given by the author is a very intuitive case of sparse features.

2. The factorization machine (FM) model

For a factorization machine of degree 2, the model is:

$$\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle x_i x_j$$
where the parameters are $w_0 \in \mathbb{R}$, $\mathbf{w} \in \mathbb{R}^n$, and $\mathbf{V} \in \mathbb{R}^{n \times k}$. $\langle \mathbf{v}_i, \mathbf{v}_j \rangle$ denotes the dot product of the two vectors $\mathbf{v}_i$ and $\mathbf{v}_j$, each of size $k$:

$$\langle \mathbf{v}_i, \mathbf{v}_j \rangle := \sum_{f=1}^{k} v_{i,f} \cdot v_{j,f}$$
where $\mathbf{v}_i$ represents the $i$-th row of the coefficient matrix $\mathbf{V}$, a vector of dimension $k$, and $k$ is called a hyper-parameter. In the factorization machine (FM) model, the first two parts form the traditional linear model, while the last part captures the pairwise interactions between different feature components. The factorization machine (FM) can also be generalized to higher-order forms that account for interactions among more feature components.
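To make the degree-2 model concrete, here is a minimal sketch (not from the original article) that evaluates $\hat{y}(\mathbf{x})$ directly from the definition; the parameter values `w0`, `w`, and `V` are made-up illustrations:

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Degree-2 FM prediction: w0 + <w, x> + sum_{i<j} <V[i], V[j]> x_i x_j."""
    n = len(x)
    pairwise = 0.0
    for i in range(n - 1):          # naive O(k * n^2) double loop over feature pairs
        for j in range(i + 1, n):
            pairwise += np.dot(V[i], V[j]) * x[i] * x[j]
    return w0 + np.dot(w, x) + pairwise

# made-up parameters: n = 4 features, k = 2 latent factors
rng = np.random.RandomState(0)
x = np.array([1.0, 0.0, 0.5, 2.0])   # a sparse-ish feature vector
w0, w, V = 0.1, rng.randn(4), 0.2 * rng.randn(4, 2)
print(fm_predict(x, w0, w, V))
```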
II. The Factorization Machine (FM) Algorithm

The factorization machine (FM) algorithm can handle the following three types of problems:

- Regression
- Binary classification
- Ranking
Here we mainly introduce the regression problem and the binary classification problem.

1. Regression

In the regression problem, $\hat{y}(\mathbf{x})$ is used directly as the final prediction. The least square error is used as the optimization criterion, that is,

$$loss = \sum_{i=1}^{m} \left(\hat{y}(\mathbf{x}_i) - y_i\right)^2$$
where $m$ represents the number of samples.

2. Binary classification

Similar to logistic regression, $\hat{y}(\mathbf{x})$ is passed through a step function, such as the sigmoid function, to map it into the different categories. The logit loss is used as the optimization criterion in binary classification, that is,

$$loss = \sum_{i=1}^{m} -\ln \sigma\left(\hat{y}(\mathbf{x}_i)\, y_i\right)$$
where $\sigma$ represents the sigmoid step function. Its specific form is:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
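As a quick illustration (not from the original article), the sketch below evaluates both criteria on made-up predictions and labels; note that the logit loss requires the labels to be encoded as $\pm 1$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def square_loss(y_hat, y):
    return np.sum((y_hat - y) ** 2)              # regression criterion

def logit_loss(y_hat, y):
    return np.sum(-np.log(sigmoid(y_hat * y)))   # binary criterion, y in {-1, +1}

y_hat = np.array([0.8, -1.2, 2.5])   # made-up FM outputs
y = np.array([1.0, -1.0, -1.0])      # labels in {-1, +1}
print(square_loss(y_hat, y), logit_loss(y_hat, y))
```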
III. The Solution Process of the Factorization Machine (FM) Algorithm

1. Cross-term coefficients

Cross terms are introduced on the basis of the basic linear regression model, as follows:

$$y = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} w_{ij} x_i x_j$$
If the cross-term coefficients $w_{ij}$ are fitted directly, there is a serious flaw in the case of sparse data: for pairs of feature components whose interaction never appears in the observed samples, the corresponding parameters cannot be estimated. Instead, for each feature component $x_i$ an auxiliary vector $\mathbf{v}_i = (v_{i,1}, v_{i,2}, \ldots, v_{i,k})^T$ is introduced, and the cross-term coefficient is estimated as

$$\hat{w}_{ij} = \mathbf{v}_i^T \mathbf{v}_j = \langle \mathbf{v}_i, \mathbf{v}_j \rangle$$
Let

$$\mathbf{V} = \begin{pmatrix} \mathbf{v}_1^T \\ \mathbf{v}_2^T \\ \vdots \\ \mathbf{v}_n^T \end{pmatrix} \in \mathbb{R}^{n \times k}$$

Then

$$\hat{\mathbf{W}} = \mathbf{V}\mathbf{V}^T$$

This corresponds to a low-rank decomposition of the cross-term coefficient matrix $\mathbf{W}$. Restricting the value of $k$ limits the expressive power of the factorization machine (FM) to a certain extent.
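The following sketch (illustrative, not from the original article) shows this decomposition numerically: a random $\mathbf{V}$ with $k = 2$ implies a cross-term coefficient matrix of rank at most 2, in which every $\hat{w}_{ij}$ is the dot product of two rows of $\mathbf{V}$:

```python
import numpy as np

n, k = 5, 2
rng = np.random.RandomState(1)
V = rng.randn(n, k)        # one k-dimensional auxiliary vector per feature
W_hat = V @ V.T            # implied cross-term coefficient matrix, rank <= k

# w_ij is recovered as the dot product <v_i, v_j>
i, j = 1, 3
print(np.allclose(W_hat[i, j], V[i] @ V[j]))   # True
print(np.linalg.matrix_rank(W_hat))            # at most k = 2
```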
2. Solving the model

The main step is to rewrite the cross term so that it can be computed in linear time. The process is as follows:

$$\begin{aligned}
\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle x_i x_j
&= \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle x_i x_j - \frac{1}{2} \sum_{i=1}^{n} \langle \mathbf{v}_i, \mathbf{v}_i \rangle x_i x_i \\
&= \frac{1}{2} \left( \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{f=1}^{k} v_{i,f} v_{j,f} x_i x_j - \sum_{i=1}^{n} \sum_{f=1}^{k} v_{i,f}^2 x_i^2 \right) \\
&= \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f} x_i \right) \left( \sum_{j=1}^{n} v_{j,f} x_j \right) - \sum_{i=1}^{n} v_{i,f}^2 x_i^2 \right] \\
&= \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f} x_i \right)^2 - \sum_{i=1}^{n} v_{i,f}^2 x_i^2 \right]
\end{aligned}$$

This reduces the cost of evaluating the cross term from $O(kn^2)$ to $O(kn)$.
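A quick numerical check of this identity (illustrative, not from the original article), comparing the naive pairwise sum with the reformulated linear-time expression:

```python
import numpy as np

rng = np.random.RandomState(2)
n, k = 6, 3
x, V = rng.randn(n), rng.randn(n, k)

# naive O(k * n^2) pairwise sum
naive = sum(V[i] @ V[j] * x[i] * x[j]
            for i in range(n - 1) for j in range(i + 1, n))

# reformulated O(k * n) expression
fast = 0.5 * np.sum((V.T @ x) ** 2 - (V ** 2).T @ (x ** 2))

print(np.allclose(naive, fast))  # True
```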
3. Solving by stochastic gradient descent

For regression problems:

$$\frac{\partial loss}{\partial \theta} = 2\left(\hat{y}(\mathbf{x}) - y\right) \frac{\partial \hat{y}(\mathbf{x})}{\partial \theta}$$

For binary classification problems:

$$\frac{\partial loss}{\partial \theta} = \left[\sigma\left(\hat{y}(\mathbf{x})\, y\right) - 1\right] y \, \frac{\partial \hat{y}(\mathbf{x})}{\partial \theta}$$

and

$$\frac{\partial \hat{y}(\mathbf{x})}{\partial \theta} = \begin{cases} 1, & \text{if } \theta \text{ is } w_0 \\ x_i, & \text{if } \theta \text{ is } w_i \\ x_i \sum_{j=1}^{n} v_{j,f} x_j - v_{i,f} x_i^2, & \text{if } \theta \text{ is } v_{i,f} \end{cases}$$
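As a sanity check (illustrative, not part of the original article), the sketch below implements the $\theta = v_{i,f}$ gradient case and compares it against a finite-difference approximation of $\partial \hat{y} / \partial v_{i,f}$; the full training loop follows in Section IV:

```python
import numpy as np

def fm_predict(x, w0, w, V):
    inter1 = V.T @ x   # per-factor sums: sum_i v_{i,f} x_i
    return w0 + w @ x + 0.5 * np.sum(inter1 ** 2 - (V ** 2).T @ (x ** 2))

rng = np.random.RandomState(3)
n, k = 4, 2
x, w0, w, V = rng.randn(n), 0.3, rng.randn(n), rng.randn(n, k)

# analytic gradient for theta = v_{i,f}
i, f = 2, 1
grad = x[i] * (V[:, f] @ x) - V[i, f] * x[i] ** 2

# finite-difference check
eps = 1e-6
Vp = V.copy(); Vp[i, f] += eps
num = (fm_predict(x, w0, w, Vp) - fm_predict(x, w0, w, V)) / eps
print(np.allclose(grad, num, atol=1e-4))  # True
```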
IV. Experiments (Solving a Binary Classification Problem)

1. Experiment code (the number of latent factors k and the iteration count passed in the main routine are illustrative choices):
```python
# coding: utf-8
from __future__ import division
from math import exp
from numpy import *
from random import normalvariate  # normal distribution
from datetime import datetime

trainData = 'e://data//diabetes_train.txt'
testData = 'e://data//diabetes_test.txt'
featureNum = 8


def loadDataSet(data):
    dataMat = []
    labelMat = []
    fr = open(data)  # open the file
    for line in fr.readlines():
        currLine = line.strip().split()
        lineArr = []
        for i in xrange(featureNum):
            lineArr.append(float(currLine[i + 1]))
        dataMat.append(lineArr)
        labelMat.append(float(currLine[0]) * 2 - 1)  # map {0, 1} labels to {-1, +1}
    return dataMat, labelMat


def sigmoid(inx):
    return 1.0 / (1 + exp(-inx))


def stocGradAscent(dataMatrix, classLabels, k, iter):
    # dataMatrix is a numpy mat, classLabels is a list
    m, n = shape(dataMatrix)
    alpha = 0.01
    # initialize the parameters; n is the number of features
    w = zeros((n, 1))
    w_0 = 0.
    # initialize V with independent N(0, 0.2) entries
    v = mat([[normalvariate(0, 0.2) for _ in xrange(k)] for _ in xrange(n)])

    for it in xrange(iter):
        for x in xrange(m):  # stochastic optimization: update on each sample
            inter_1 = dataMatrix[x] * v
            inter_2 = multiply(dataMatrix[x], dataMatrix[x]) * multiply(v, v)  # element-wise products
            # the cross term, computed with the O(kn) reformulation
            interaction = sum(multiply(inter_1, inter_1) - inter_2) / 2.

            p = w_0 + dataMatrix[x] * w + interaction  # the predicted output
            loss = sigmoid(classLabels[x] * p[0, 0]) - 1

            w_0 = w_0 - alpha * loss * classLabels[x]
            for i in xrange(n):
                if dataMatrix[x, i] != 0:
                    w[i, 0] = w[i, 0] - alpha * loss * classLabels[x] * dataMatrix[x, i]
                    for j in xrange(k):
                        v[i, j] = v[i, j] - alpha * loss * classLabels[x] * \
                            (dataMatrix[x, i] * inter_1[0, j] - v[i, j] * dataMatrix[x, i] * dataMatrix[x, i])
    return w_0, w, v


def getAccuracy(dataMatrix, classLabels, w_0, w, v):
    m, n = shape(dataMatrix)
    allItem = 0
    error = 0
    result = []
    for x in xrange(m):
        allItem += 1
        inter_1 = dataMatrix[x] * v
        inter_2 = multiply(dataMatrix[x], dataMatrix[x]) * multiply(v, v)  # element-wise products
        # the cross term
        interaction = sum(multiply(inter_1, inter_1) - inter_2) / 2.
        p = w_0 + dataMatrix[x] * w + interaction  # the predicted output
        pre = sigmoid(p[0, 0])
        result.append(pre)
        if pre < 0.5 and classLabels[x] == 1.0:
            error += 1
        elif pre >= 0.5 and classLabels[x] == -1.0:
            error += 1
    return float(error) / allItem


if __name__ == '__main__':
    dataTrain, labelTrain = loadDataSet(trainData)
    dataTest, labelTest = loadDataSet(testData)
    date_startTrain = datetime.now()
    print "Start training"
    # illustrative hyperparameters: k = 20 latent factors, 200 training passes
    w_0, w, v = stocGradAscent(mat(dataTrain), labelTrain, 20, 200)
    print "Training accuracy: %f" % (1 - getAccuracy(mat(dataTrain), labelTrain, w_0, w, v))
    date_endTrain = datetime.now()
    print "Training time: %s" % (date_endTrain - date_startTrain)
    print "Start testing"
    print "Test accuracy: %f" % (1 - getAccuracy(mat(dataTest), labelTest, w_0, w, v))
```
2. Experimental results:
V. Some Issues

On traditional non-sparse datasets, the effect is sometimes not very good. In the experiment I did a little extra processing: when evaluating the sigmoid function, a thresholded version is used on some datasets:
```python
def sigmoid(inx):
    # return 1.0 / (1 + exp(-inx))
    # clamp the argument to avoid overflow in exp(); the bound 15 is an illustrative choice
    return 1.0 / (1 + exp(-max(min(inx, 15.), -15.)))
```
Everyone is welcome to discuss this algorithm together.
References

1. Rendle, S. Factorization Machines. ICDM, 2010.
2. Rendle, S. Factorization Machines with libFM. ACM TIST, 2012.