I. Outline
Normal equation method for linear regression
Locally weighted linear regression
II. Details of the contents
1. Normal equation solution of linear regression
Linear regression predicts a continuous target value. Here we work through a plain linear regression example; nonlinear regression is not discussed yet. This part uses the normal equation solution, whose theory was explained earlier: θ = (XᵀX)⁻¹Xᵀy. Note that this requires XᵀX to be invertible, so the equation only applies when the inverse matrix exists, and the code needs to check for that.
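Before looking at the full code, here is a minimal sketch of the normal equation itself, cross-checked against NumPy's built-in least-squares solver. The synthetic data and variable names are illustrative assumptions, not part of the dataset used in this post:

```python
import numpy as np

# Hypothetical synthetic data: y = 2 + 1.5*x plus small noise,
# with a bias column of ones in the design matrix.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=50)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 1.5 * x + rng.normal(0, 0.05, size=50)

xTx = X.T @ X
if np.linalg.det(xTx) == 0.0:          # guard: the inverse must exist
    raise ValueError('xTx is singular, cannot invert')
theta = np.linalg.inv(xTx) @ X.T @ y   # normal equation: (X'X)^-1 X'y

theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(theta, theta_lstsq))  # prints True: both solvers agree
```

The two solutions coincide because `np.linalg.lstsq` minimizes the same squared error that the normal equation solves in closed form.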
```python
from numpy import *
import matplotlib.pyplot as plt

def loadDataSet(fileName):
    # Number of features = columns minus the trailing label column
    numFeat = len(open(fileName).readline().split('\t')) - 1
    dataMat = []; labelsVec = []
    for line in open(fileName).readlines():
        curLine = line.strip().split('\t')
        lineArr = [float(curLine[i]) for i in range(numFeat)]
        dataMat.append(lineArr)
        labelsVec.append(float(curLine[-1]))
    return dataMat, labelsVec

def standRegression(xArr, yArr):
    xMat = mat(xArr); yMat = mat(yArr)
    xTx = xMat.T * xMat
    # The normal equation requires xTx to be invertible
    if linalg.det(xTx) == 0.0:
        print('This matrix is singular, cannot do inverse')
        return
    sigma = xTx.I * (xMat.T * yMat.T)
    return sigma
```
The loadDataSet() function splits the text data into features and labels. standRegression() uses the normal equation to find the regression coefficients sigma; of course, before applying the normal equation we must check that the inverse matrix exists. This solution is very simple, but its shortcomings were mentioned in the earlier theory part. Let's look at the fitted result, plotted with the plotLine() function. Note that the parameters xMat and yMat passed to this function need to be in matrix form.
```python
def plotLine(xMat, yMat, sigma):
    fig = plt.figure()
    ax = fig.add_subplot(111)
    # Scatter the raw data points
    ax.scatter(xMat[:, 1].flatten().A[0], yMat.T[:, 0].flatten().A[0])
    xCopy = xMat.copy()
    xCopy.sort(0)            # sort so the line is drawn left to right
    yHat = xCopy * sigma
    ax.plot(xCopy[:, 1], yHat)
    plt.show()
```
We get the fitted line, which looks somewhat underfit. On any dataset, plain linear regression can only produce this same kind of straight-line fit, which is not what we want. Therefore the method is improved by weighting the regression locally around each prediction point; this approach is called locally weighted linear regression (LWLR).
2. Locally weighted linear regression
In this algorithm, we give a weight to each data point near the prediction point, and then carry out ordinary linear regression on the weighted data, again minimizing the squared error. The normal equation becomes θ = (XᵀWX)⁻¹XᵀWy, where W is the weight matrix. LWLR uses a "kernel" to give higher weight to nearby points; the most common is the Gaussian kernel, which assigns w(i, i) = exp(−|x⁽ⁱ⁾ − x|² / (2k²)). This constructs a weight matrix W with nonzero entries only on the diagonal, and the closer a point x⁽ⁱ⁾ is to the query point x, the greater its weight.
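To make the weighting scheme concrete, here is a small standalone illustration of the Gaussian kernel (the function name and the three sample points are made up for this example):

```python
import numpy as np

def gaussian_weights(test_point, X, k):
    """Diagonal weight matrix: w[i,i] = exp(-|x_i - test_point|^2 / (2 k^2))."""
    diffs = X - test_point                 # (m, n) differences to the query point
    sq_dist = np.sum(diffs ** 2, axis=1)   # squared Euclidean distances
    return np.diag(np.exp(-sq_dist / (2.0 * k ** 2)))

# Three sample points (bias column + one feature); query the middle one.
X = np.array([[1.0, 0.1], [1.0, 0.5], [1.0, 0.9]])
W = gaussian_weights(np.array([1.0, 0.5]), X, k=0.2)
print(np.round(np.diag(W), 3))  # middle point (distance 0) gets weight 1.0
```

The query point itself always receives weight 1, and the weights fall off symmetrically with distance; smaller k makes the falloff sharper.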
```python
def lwlr(testPoint, xArr, yArr, k=1.0):
    xMat = mat(xArr); yMat = mat(yArr).T
    m = shape(xMat)[0]
    weights = mat(eye(m))        # start from the identity matrix
    for i in range(m):
        diffMat = testPoint - xMat[i, :]
        # Gaussian kernel: closer points get weights nearer to 1
        weights[i, i] = exp(diffMat * diffMat.T / (-2.0 * k**2))
    xTwx = xMat.T * (weights * xMat)
    if linalg.det(xTwx) == 0.0:
        print('This matrix is singular, cannot do inverse')
        return
    sigma = xTwx.I * (xMat.T * (weights * yMat))
    return testPoint * sigma

def lwlrTest(testArr, xArr, yArr, k=1.0):
    m = shape(testArr)[0]
    yHat = zeros(m)
    for i in range(m):
        yHat[i] = lwlr(testArr[i], xArr, yArr, k)
    return yHat
```
The lwlr() function is the code for locally weighted linear regression, and lwlrTest() simply runs lwlr() over the entire dataset. Again we draw a plot to see how well the result fits.
```python
def plotLine1(testArr, xArr, yArr, k=1.0):
    xMat = mat(xArr); yMat = mat(yArr)
    yHat = lwlrTest(testArr, xArr, yArr, k)
    srtInd = xMat[:, 1].argsort(0)     # sort by feature for a smooth curve
    xSort = xMat[srtInd][:, 0, :]
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xMat[:, 1].flatten().A[0], yMat.T[:, 0].flatten().A[0],
               s=2, c='red')
    ax.plot(xSort[:, 1], yHat[srtInd])
    plt.show()
```
(Fitted curves for k = 1.0, k = 0.01, and k = 0.003.)
k = 1.0 gives the underfitting state seen earlier, and k = 0.003 is overfitted, so k = 0.01 gives the better fit.
Datasets and code: HTTP://PAN.BAIDU.COM/S/1I5AAYXN
Machine Learning in Action with Python ---- Linear Regression