Predicting Numeric Data: Regression

Source: Internet
Author: User

=====================================================================

"Machine Learning Combat" series blog is Bo master read "machine learning Combat" This book's note also contains some other Python implementation of machine learning algorithmsThe algorithm is implemented using Python

GitHub source sync: https://github.com/Thinkgamer/Machine-Learning-With-Python

=====================================================================


1: Finding the best-fit line with linear regression

The goal of regression is to predict a numeric target value. The most straightforward approach is to write out a formula for the target value in terms of the inputs. For example, if you wanted to predict the horsepower of your sister's boyfriend's car, the calculation might look like:

HorsePower = 0.0015 * annualSalary - 0.99 * hoursListeningToPublicRadio
This is called the regression equation; 0.0015 and 0.99 are called regression coefficients, and the process of finding these regression coefficients is regression.
The general approach to regression:
(1) Collect data: collect the data in any way.
(2) Prepare the data: regression requires numeric data; nominal data is converted to binary values.
(3) Analyze the data: plotting a 2-D visualization of the data helps in understanding and analyzing it; after a shrinkage method produces new regression coefficients, the new fitted line can be drawn on the same plot for comparison.
(4) Train the algorithm: find the regression coefficients.
(5) Test the algorithm: use R2, or the fit between predicted and actual values, to analyze how well the model works.
(6) Use the algorithm: with regression you can predict a numeric value for any given input. This goes a step beyond classification, because it predicts continuous values rather than just discrete category labels.
Assume the input data is stored in the matrix X and the regression coefficients in the vector w. Then for a given data point x1, the prediction is y1 = x1^T * w. But how do we find w from the given X and y? A common approach is to minimize the error, where the error is the difference between the true y and the predicted y. Simply summing these errors would let positive and negative errors cancel each other out, so we use the squared error instead.
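For reference, the squared-error objective and its closed-form minimizer can be written out explicitly (a standard least-squares derivation; it is implied but not shown in the original post):

% Sum of squared errors over the m training points
\sum_{i=1}^{m} \left( y_i - \mathbf{x}_i^{\top}\mathbf{w} \right)^2
  = (\mathbf{y} - \mathbf{X}\mathbf{w})^{\top}(\mathbf{y} - \mathbf{X}\mathbf{w})

% Setting the derivative with respect to w to zero gives the normal equations;
% when X^T X is invertible, the minimizer is the OLS estimate
\hat{\mathbf{w}} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y}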

Minimizing the squared error yields w in closed form; this approach is called ordinary least squares (OLS), and we can compute it by calling the matrix methods of the NumPy library.
Code implementation
#-*-coding:utf-8-*-"Created on May 14, 2016 @author:gamer Think" from numpy import *#==================== Finding the best fit curve with linear regression =========== #加载数据集def loaddataset (filename): Numfeat = len (open (filename). ReadLine (). Split ("\ T"))-1 dat AMat = []; Labelmat = [] fr = open (filename) for line in Fr.readlines (): Linearr = [] CurLine = Line.strip (). spli T ("\ T") for I in Range (numfeat): Linearr.append (float (curline[i))) Datamat.append (Linearr ) Labelmat.append (float (curline[-1))) return datamat,labelmat# calculates the best fit curve def standregress (Xarr,yarr): XM at = Mat (Xarr); Ymat = Mat (Yarr). T #.  T represents transpose Matrix XTx = xmat.t * Xmat if Linalg.det (xTx) ==0.0: #linalg. Det (xTx) calculates the value of the determinant print "This matrix is singular , cannot do inverse "return ws = XTX.I * (XMAT.T * ymat) return ws# test the upper function Xarr,yarr = Loaddataset (" Ex0.txt ") WS = Standregress (Xarr, Yarr) print "WS (Correlation Factor):", WS #ws is the regression factor # paint show Def show (): Import Matplotlib.pyplot as PLT Xmat = Mat (Xarr);     Ymat = Mat (yarr) Yhat = Xmat*ws Fig = plt.figure () #创建绘图对象 ax = fig.add_subplot (111) #111表示将画布划分为1行2列选择使用从上到下第一块 #scatter绘制散点图 Ax.scatter (Xmat[:,1].flatten (). A[0],ymat.t[:,0].flatten (). A[0]) #复制, sort xCopy =xmat.copy () xcopy.sort (0) yhat = xCopy * ws #plot画线 Ax.plot (xcopy[:,1],yhat) plt. Show () show () #利用numpy库提供的corrcoef来计算预测值和真实值得相关性yHat = Mat (xarr) * WS #yHat = Xmat * wsprint "dependency:", Corrcoef (Yhat.t,mat (YA RR) #==================== to find the best fit curve with linear regression ===========



Result:

(In the correlation matrix, the diagonal entry shows that yHat is perfectly correlated with itself; its correlation with yMat is 0.98.)

2: Locally weighted linear regression
The purpose of weighting: to reduce the mean squared error of the predictions and to alleviate underfitting. The method: locally weighted linear regression (LWLR). In this algorithm, we give each point near the point to be predicted a certain weight, and then perform ordinary regression on this subset using the method of the previous section, minimizing the weighted squared error. The regression coefficients then become:

w_hat = (X^T * W * X)^-1 * X^T * W * y

where W is a matrix used to assign a weight to each data point. LWLR uses a kernel to give higher weight to nearby points; the most commonly used is the Gaussian kernel, whose weights are:

w(i,i) = exp( |x^(i) - x|^2 / (-2 * k^2) )

This builds a weight matrix W with nonzero entries only on the diagonal, and the closer the point x^(i) is to x, the larger w(i,i) will be. The formula above contains a parameter k that must be specified by the user; it determines how much weight is given to nearby points, and it is the only parameter to consider when using LWLR. The figure (omitted here) shows the relationship between the parameter k and the weights.
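To get a feel for how k controls the weights, the sketch below (illustrative only; the distances and k values are made up) evaluates the Gaussian kernel weight exp(-d^2 / (2*k^2)) at a few distances:

# Illustrative only: Gaussian kernel weight vs. distance for several k values
import numpy as np

distances = np.array([0.0, 0.1, 0.2, 0.5, 1.0])   # hypothetical distances |x^(i) - x|
for k in [1.0, 0.1, 0.01]:                        # hypothetical kernel widths
    weights = np.exp(-(distances**2) / (2.0 * k**2))
    print(k)
    print(weights)   # with a small k, only the closest points keep non-negligible weight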


Add the following code to the regression.py file
#================== Locally weighted linear regression ==================
def lwlr(testPoint, xArr, yArr, k=1.0):
    xMat = mat(xArr); yMat = mat(yArr).T
    m = shape(xMat)[0]
    weights = mat(eye(m))                  # create a diagonal weight matrix
    for j in range(m):
        diffMat = testPoint - xMat[j,:]
        # weights decay exponentially with distance from testPoint
        weights[j,j] = exp(diffMat * diffMat.T / (-2.0 * k**2))
    xTx = xMat.T * (weights * xMat)
    if linalg.det(xTx) == 0.0:
        print "This matrix is singular, cannot do inverse"
        return
    ws = xTx.I * (xMat.T * (weights * yMat))
    return testPoint * ws

def lwlrTest(testArr, xArr, yArr, k=1.0):
    m = shape(testArr)[0]
    yHat = zeros(m)
    for i in range(m):
        yHat[i] = lwlr(testArr[i], xArr, yArr, k)
    return yHat

xArr, yArr = loadDataSet('ex0.txt')
print "k=1.0:", lwlr(xArr[0], xArr, yArr, 1.0)
print "k=0.001:", lwlr(xArr[0], xArr, yArr, 0.001)
print "k=0.003:", lwlr(xArr[0], xArr, yArr, 0.003)

# Plot the LWLR fit
def showLwlr():
    yHat = lwlrTest(xArr, xArr, yArr, 0.003)
    xMat = mat(xArr)
    srtInd = xMat[:,1].argsort(0)
    xSort = xMat[srtInd][:,0,:]
    import matplotlib.pyplot as plt
    fig = plt.figure()                     # create the figure
    ax = fig.add_subplot(111)              # 111: 1 row, 1 column, first subplot
    ax.plot(xSort[:,1], yHat[srtInd])      # fitted curve
    # scatter plot of the raw data
    ax.scatter(xMat[:,1].flatten().A[0], mat(yArr).T[:,0].flatten().A[0], s=2, c='red')
    plt.show()

showLwlr()
Running results, and a comparison of the output plots for different values of k:

The output plot for k=0.003:



The output plot for k=0.01:


The output plot for k=1.0:



As the plots show, k=0.01 gives a good fit.
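One way to compare the three values of k numerically rather than by eye is to look at the squared error of the fit on the training data. A minimal sketch, assuming loadDataSet and lwlrTest from regression.py are available and ex0.txt is present (keep in mind that a very small k will always look best on training data, because it can overfit):

# Illustrative: training-set squared error of LWLR for several k values
from numpy import array

xArr, yArr = loadDataSet('ex0.txt')
for k in [1.0, 0.01, 0.003]:
    yHat = lwlrTest(xArr, xArr, yArr, k)
    err = ((array(yArr) - yHat)**2).sum()
    print(k)
    print(err)       # smaller k gives lower training error but may overfit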


3: Shrinking coefficients to "understand" the data


If there are more features than sample points, the matrix X^T X is not full rank and an error occurs when computing (X^T X)^-1. To solve this problem, statisticians introduced the concept of ridge regression, as well as the lasso method; the lasso works well but is computationally complex. There is another shrinkage method, called forward stagewise regression, that achieves the same effect as the lasso and is much easier to implement.

(1): Ridge regression

Ridge regression adds a matrix lambda*I to X^T X so that the resulting matrix is non-singular and can be inverted. Here I is an m*m identity matrix, with 1s on the diagonal and 0s elsewhere, and lambda is a user-defined value that will be introduced later. The formula for the regression coefficients becomes:

w_hat = (X^T * X + lambda*I)^-1 * X^T * y
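The effect of adding lambda*I can be seen on a deliberately rank-deficient example. The sketch below uses toy data (not from the book): X has a duplicated column, so X^T X is singular, but X^T X + lambda*I has a non-zero determinant and can be inverted:

# Toy illustration: a singular X^T X becomes invertible after adding lambda * I
import numpy as np

X = np.array([[1.0, 2.0, 2.0],
              [2.0, 4.0, 4.0],
              [3.0, 1.0, 1.0]])      # the last two columns are identical -> rank deficient
xTx = X.T.dot(X)
print(np.linalg.det(xTx))            # (close to) 0: cannot invert

lam = 0.2                            # hypothetical lambda
ridged = xTx + np.eye(X.shape[1]) * lam
print(np.linalg.det(ridged))         # non-zero: the inverse now exists
w = np.linalg.inv(ridged).dot(X.T).dot(np.array([1.0, 2.0, 3.0]))
print(w)                             # ridge coefficients for a toy target vector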
Shrinkage methods can discard unimportant parameters, so they help us understand the data better. In addition, compared with plain linear regression, shrinkage can give better prediction performance. Here lambda is chosen by minimizing the prediction error: after the data is collected, part of it is set aside for testing and the rest is used as the training set to fit the coefficients w. Add the following code to regression.py:
#========================= Ridge regression =========================
# Compute the ridge-regression coefficients for a given lambda
def ridgeRegres(xMat, yMat, lam=0.2):
    xTx = xMat.T * xMat
    denom = xTx + eye(shape(xMat)[1]) * lam
    if linalg.det(denom) == 0.0:
        print "This matrix is singular, cannot do inverse"
        return
    ws = denom.I * (xMat.T * yMat)
    return ws

# Test ridge regression over a set of lambda values
def ridgeTest(xArr, yArr):
    xMat = mat(xArr); yMat = mat(yArr).T
    yMean = mean(yMat, 0)                  # standardize the data
    yMat = yMat - yMean
    xMeans = mean(xMat, 0)
    xVar = var(xMat, 0)
    xMat = (xMat - xMeans) / xVar
    numTestPts = 30
    wMat = zeros((numTestPts, shape(xMat)[1]))
    for i in range(numTestPts):
        ws = ridgeRegres(xMat, yMat, exp(i - 10))
        wMat[i,:] = ws.T
    return wMat

abX, abY = loadDataSet('abalone.txt')
ridgeWeights = ridgeTest(abX, abY)
# print ridgeWeights

def showRidge():
    import matplotlib.pyplot as plt
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.plot(ridgeWeights)
    plt.show()

showRidge()
#========================= Ridge regression =========================

Running result:



Note: when lambda is very small, the coefficients are the same as in ordinary regression, and when lambda is very large, all of the regression coefficients shrink to 0. The best value of lambda can therefore be found somewhere in between.
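A quick numeric companion to the plot: the sketch below (assuming ridgeWeights was produced by ridgeTest above, with 30 values of lambda = exp(i - 10)) prints the overall size of the coefficient vector for each lambda, which shrinks toward 0 as lambda grows:

# Illustrative: overall coefficient magnitude for each tested lambda
import numpy as np

for i in range(ridgeWeights.shape[0]):
    lam = np.exp(i - 10)
    norm = np.sqrt((ridgeWeights[i, :]**2).sum())
    print(lam)
    print(norm)      # near the OLS values for tiny lambda, close to 0 for huge lambda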


(2): Lasso

It is not hard to show that, when the following constraint is added, ordinary least squares regression produces the same formula as ridge regression:

sum_k (w_k^2) <= lambda

that is, the sum of the squares of all regression coefficients must not be greater than lambda. When two or more features are correlated, ordinary least squares may produce a very large positive coefficient for one and a very large negative coefficient for another; it is precisely this constraint that lets ridge regression avoid the problem. Similar to ridge regression, another shrinkage method, the lasso, also restricts the regression coefficients, with the corresponding constraint:

sum_k |w_k| <= lambda

The only difference is that this constraint uses absolute values in place of squares. Although the constraint is only slightly changed, the results are very different: when lambda is small enough, some coefficients are forced all the way to 0, a property that helps us understand the data better. These two constraints look similar, but the subtle change greatly increases the computational complexity. Below is a much simpler method that obtains similar results, called forward stagewise regression.
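The book implements forward stagewise regression rather than the lasso itself. For readers who just want lasso coefficients, scikit-learn (not used anywhere in this post, and only if it is installed) provides an implementation; a minimal sketch on toy data, where alpha plays the role of the constraint strength:

# Optional aside (not from the book): lasso via scikit-learn, on toy data
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.randn(100) * 0.1   # only two features matter

model = Lasso(alpha=0.1)
model.fit(X, y)
print(model.coef_)   # coefficients of the irrelevant features are driven to (or near) 0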
(3): Forward stagewise regression

Forward stagewise regression achieves the same effect as the lasso but is much simpler. It is a greedy algorithm: at each step it reduces the error as much as possible. At the beginning all the weights are set to 1, and each step then decides whether to increase or decrease some weight by a small amount. The pseudocode is as follows:

Standardize the data so that it has zero mean and unit variance
Repeat for each iteration:
    Set the current lowest error, lowestError, to +infinity
    For each feature:
        For increasing and decreasing:
            Change the coefficient to get a new w
            Compute the error under the new w
            If the error is less than lowestError: set wBest to the current w
    Set w to the new wBest
The code is implemented as follows
#=================== Forward stagewise regression ===================
# Compute the squared error
def rssError(yArr, yHatArr):               # yArr and yHatArr both need to be arrays
    return ((yArr - yHatArr)**2).sum()

# Standardize the data
def regularize(xMat):                      # regularize by columns
    inMat = xMat.copy()
    inMeans = mean(inMat, 0)               # calc mean then subtract it off
    inVar = var(inMat, 0)                  # calc variance of Xi then divide by it
    inMat = (inMat - inMeans) / inVar
    return inMat

def stageWise(xArr, yArr, eps=0.01, numIt=100):
    xMat = mat(xArr); yMat = mat(yArr).T
    yMean = mean(yMat, 0)
    yMat = yMat - yMean                    # can also regularize ys but will get smaller coef
    xMat = regularize(xMat)
    m, n = shape(xMat)
    returnMat = zeros((numIt, n))          # testing code remove
    ws = zeros((n, 1)); wsTest = ws.copy(); wsMax = ws.copy()
    for i in range(numIt):                 # could change this to while loop
        # print ws.T
        lowestError = inf
        for j in range(n):
            for sign in [-1, 1]:
                wsTest = ws.copy()
                wsTest[j] += eps * sign
                yTest = xMat * wsTest
                rssE = rssError(yMat.A, yTest.A)
                if rssE < lowestError:
                    lowestError = rssE
                    wsMax = wsTest
        ws = wsMax.copy()
        returnMat[i,:] = ws.T
    return returnMat

xArr, yArr = loadDataSet('abalone.txt')
print stageWise(xArr, yArr, 0.01, 200)

Running result:

It is noteworthy in these results that both w1 and w6 are 0, which indicates that they contribute nothing to the target value; in other words, those features are probably not needed. In addition, with the parameter eps set to 0.01, after a while the coefficients become saturated and bounce back and forth between certain values because the step size is too large. Here, for example, the first weight oscillates between 0.04 and 0.05.
To try a smaller step size: print stageWise(xArr, yArr, 0.001, 200)

The results can then be compared with those of the least-squares method, which can be obtained with the following code:
xMat = mat(xArr)
yMat = mat(yArr).T
xMat = regularize(xMat)
yM = mean(yMat, 0)
yMat = yMat - yM
weights = standRegres(xMat, yMat.T)
print weights.T



It can be seen that after 5000 iterations, the stagewise regression algorithm gives results similar to ordinary least squares. Results using an epsilon value of 0.005 after 1000 iterations are shown in the figure below.

The real benefit of the stagewise regression algorithm is not that it can draw a nice plot like Figure 8-7; its main advantage is that it helps people understand the existing model and improve it. Once a model is built, the algorithm can be run to find the important features, so that collection of the unimportant ones can be stopped in time. Finally, when used for testing, the algorithm can build a model every 100 iterations, compare those models using something like 10-fold cross-validation, and finally choose the model that minimizes the error. When a shrinkage method such as stagewise regression or ridge regression is applied, the model takes on additional bias while its variance is reduced.
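The model-comparison idea in the last paragraph can be sketched very simply: hold out part of the data, fit with both stageWise and standRegres on the rest, and compare the held-out squared error. A minimal sketch, assuming abalone.txt and the functions defined earlier in regression.py (loadDataSet, regularize, rssError, stageWise, standRegres) are available; the single 90/10 split is illustrative and is a simplification of the 10-fold procedure mentioned above:

# Illustrative 90/10 holdout comparison (a simplification of 10-fold cross-validation)
from numpy import mat, mean, shape

xArr, yArr = loadDataSet('abalone.txt')
xMat = regularize(mat(xArr))           # note: standardizing before splitting is a simplification
yMat = mat(yArr).T
yMat = yMat - mean(yMat, 0)

m = shape(xMat)[0]
cut = int(m * 0.9)                     # first 90% for training, last 10% for testing
trainX, testX = xMat[:cut], xMat[cut:]
trainY, testY = yMat[:cut], yMat[cut:]

# the last row of stageWise's output holds the final stagewise weights
swWeights = mat(stageWise(trainX.A, trainY.T.A[0], 0.005, 1000)[-1]).T
olsWeights = standRegres(trainX.A, trainY.T.A[0])

print(rssError(testY.A, (testX * swWeights).A))    # held-out error, stagewise model
print(rssError(testY.A, (testX * olsWeights).A))   # held-out error, ordinary least squares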
