Machine Learning in Python: Linear Regression


I. Outline

Normal equation method for linear regression

Locally weighted linear regression

II. Details

  1. Normal equation solution of linear regression

Linear regression predicts continuous-valued data. Only linear regression is discussed here; nonlinear regression is set aside for now. This part uses the normal equation solution, whose theory was explained earlier: θ = (XᵀX)⁻¹Xᵀy. Note that this equation requires the inverse of XᵀX, so it only applies when that inverse exists, which the code needs to check.

from numpy import *
import matplotlib.pyplot as plt

def loadDataSet(fileName):
    # Split each tab-separated line into features and a trailing label
    numFeat = len(open(fileName).readline().split('\t')) - 1
    dataMat = []; labelsVec = []
    file = open(fileName)
    for line in file.readlines():
        lineArr = []
        curLine = line.strip().split('\t')
        for i in range(numFeat):
            lineArr.append(float(curLine[i]))
        dataMat.append(lineArr)
        labelsVec.append(float(curLine[-1]))
    return dataMat, labelsVec

def standRegression(xArr, yArr):
    # Solve the normal equation: sigma = (X^T X)^-1 X^T y
    xMat = mat(xArr); yMat = mat(yArr)
    xTx = xMat.T * xMat
    if linalg.det(xTx) == 0.0:
        # X^T X has no inverse, so the normal equation cannot be applied
        print('This matrix is singular, cannot do inverse')
        return
    sigma = xTx.I * (xMat.T * yMat.T)
    return sigma
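As a quick check, the two functions can be exercised as follows. This is a minimal usage sketch: the file name ex0.txt is a placeholder for whatever tab-separated data file you use, with a constant 1.0 in the first feature column.

# Usage sketch -- 'ex0.txt' is a placeholder file name, not part of the code above
xArr, yArr = loadDataSet('ex0.txt')
sigma = standRegression(xArr, yArr)   # 2x1 matrix of coefficients for 2-feature data
print(sigma)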

The loadDataSet() function splits the text data into features and labels. standRegression() uses the normal equation to find the regression coefficients sigma; of course, before using the normal equation we need to check that the inverse matrix exists. This solution is very simple, but it has the shortcomings I mentioned in the theory part. Let's look at the fitted result, plotted with the plotLine() function. Note that the parameters xMat and yMat passed to this function must be in matrix form.

def plotLine(xMat, yMat, sigma):
    # Scatter the raw data, then draw the fitted line on top
    ax = plt.subplot(111)
    ax.scatter(xMat[:,1].flatten().A[0], yMat.T[:,0].flatten().A[0])
    xCopy = xMat.copy()
    xCopy.sort(0)             # sort the points so the line is drawn left to right
    yHat = xCopy * sigma      # predicted values along the sorted x axis
    ax.plot(xCopy[:,1], yHat)
    plt.show()

We obtain the fitted line, but it looks somewhat underfit. Whatever the local structure of the data, standard regression always yields the same single straight best-fit line, and that is not what we want.
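One way to put a number on "how well it fits" is the correlation between the predicted and actual values. A small sketch using NumPy's corrcoef(), which is not part of the original code:

# Correlation of predictions with labels; closer to 1.0 means a tighter linear fit
yHat = mat(xArr) * sigma
print(corrcoef(yHat.T, mat(yArr)))    # the off-diagonal entries are the correlation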

Therefore, we improve the method by applying weights locally to the regression. This method is called locally weighted linear regression (LWLR).

  2. Locally weighted linear regression

In this algorithm, we give a certain weight to each point near the point to be predicted, and then run ordinary linear regression on that weighted data, minimizing the mean squared error. The normal equation becomes θ = (XᵀWX)⁻¹XᵀWy, where W is the weight matrix. LWLR uses a kernel to give higher weight to nearby points; the most common is the Gaussian kernel, w(i,i) = exp(−‖x⁽ⁱ⁾ − x‖² / (2k²)). This builds a weight matrix W with nonzero entries only on the diagonal, and the closer a point x⁽ⁱ⁾ is to the query point x, the larger w(i,i) becomes.
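To get a feel for what the bandwidth k controls, this illustrative sketch (not from the original article) prints the Gaussian weight for a few distances and k values; note how quickly a small k pushes the weight of all but the nearest points toward zero:

# Illustrative only: Gaussian weight exp(-d^2 / (2 k^2)) for a few distances d
for k in (1.0, 0.01, 0.003):
    for d in (0.0, 0.01, 0.1):
        print('k=%g  d=%g  weight=%.3g' % (k, d, exp(-d**2 / (2.0 * k**2))))
# With k=0.003, a point at distance 0.1 already has essentially zero weight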

def lwlr(testPoint, xArr, yArr, k=1.0):
    # Locally weighted linear regression: fit a regression around testPoint
    xMat = mat(xArr); yMat = mat(yArr).T
    m = shape(xMat)[0]
    weights = mat(eye(m))
    for i in range(m):
        # Gaussian kernel: weight decays with distance from testPoint
        diffMat = testPoint - xMat[i,:]
        weights[i,i] = exp(diffMat * diffMat.T / (-2.0 * k**2))
    xTwx = xMat.T * (weights * xMat)
    if linalg.det(xTwx) == 0.0:
        print('This matrix is singular, cannot do inverse')
        return
    sigma = xTwx.I * (xMat.T * (weights * yMat))
    return testPoint * sigma

def lwlrTest(testArr, xArr, yArr, k=1.0):
    # Run lwlr() for every point in testArr
    m = shape(testArr)[0]
    yHat = zeros(m)
    for i in range(m):
        yHat[i] = lwlr(testArr[i], xArr, yArr, k)
    return yHat
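For a single query point, lwlr() returns the prediction directly. A minimal sketch, reusing xArr and yArr from the earlier sketch:

# Predict the label of the first training point with a fairly local kernel
print(yArr[0])                          # actual label
print(lwlr(xArr[0], xArr, yArr, 0.01))  # LWLR prediction at the same point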

The lwlr() function is the code for locally weighted linear regression, and lwlrTest() simply runs lwlr() over the entire dataset. Again, we draw a plot to see how well the result fits.

def plotLine1(testArr, xArr, yArr, k=1.0):
    # Scatter the data and draw the LWLR curve, sorted by the x value
    xMat = mat(xArr); yMat = mat(yArr)
    yHat = lwlrTest(testArr, xArr, yArr, k)
    srtInd = xMat[:,1].argsort(0)     # indices that sort the points by x
    xSort = xMat[srtInd][:,0,:]
    ax = plt.subplot(111)
    ax.scatter(xMat[:,1].flatten().A[0], yMat.T[:,0].flatten().A[0], s=2, c='red')
    ax.plot(xSort[:,1], yHat[srtInd])
    plt.show()
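The plots below were produced by calling plotLine1() with the training inputs themselves as the query points and varying k; a sketch of the calls, assuming the same xArr and yArr as above:

# One plot per bandwidth; smaller k lets the curve follow the data more closely
for k in (1.0, 0.01, 0.003):
    plotLine1(xArr, xArr, yArr, k)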

[Figures: LWLR fit when k = 1.0, k = 0.01, and k = 0.003]
k = 1.0 reproduces the underfitted state seen earlier, while k = 0.003 is overfitted, so k = 0.01 gives the better regression.
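To back this up with a number rather than a picture, one can compare the residual sum of squares (RSS) for each k. The rssError() helper below is an assumption for illustration, not part of the code above:

def rssError(yArr, yHatArr):
    # Residual sum of squares between actual and predicted values (assumed helper)
    return ((yArr - yHatArr)**2).sum()

for k in (1.0, 0.01, 0.003):
    yHat = lwlrTest(xArr, xArr, yArr, k)
    print('k=%g  training RSS=%.3f' % (k, rssError(array(yArr), yHat)))
# Training RSS alone keeps dropping as k shrinks -- that is the overfitting;
# comparing errors on held-out points is what shows k=0.01 generalizing best.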

Datasets and code: HTTP://PAN.BAIDU.COM/S/1I5AAYXN
