A systematic discussion of linear regression in supervised learning

Objective

This article gives a systematic introduction to the regression part of supervised learning in machine learning, and explains how regression theory is used to predict a continuous target value.

Compared with classification, regression has a distinct characteristic: the output is a continuous value, not just a nominal class label.

Basic linear regression: the least squares method

"Given a set of scattered points, find the regression equation." This problem comes up in many areas, and the most classic and common approach is:

1. Express the sum of squared distances from the scattered points to the regression line as a formula:

\sum_{i=1}^{m} (y_i - x_i^T w)^2

Here m is the number of scattered points, y_i is the target value of point i, x_i is its feature (coordinate) vector, and w is the regression coefficient vector.

2. Take the derivative of this expression with respect to the vector w and set it to zero to solve for the regression coefficients (the derivation uses the rules of vector/matrix calculus and is omitted here):

\hat{w} = (X^T X)^{-1} X^T y

This method is called the least squares method.
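
Before looking at the full implementation, here is a minimal sanity check (not from the original article) showing that the closed-form solution above agrees with NumPy's built-in least-squares solver; the synthetic data and coefficients below are made up purely for illustration:

import numpy as np

# Synthetic data: y = 3 + 2*x plus a little noise.
# Column 0 is the constant feature 1.0, column 1 is x.
rng = np.random.RandomState(0)
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])
y = 3.0 + 2.0 * x + 0.05 * rng.randn(20)

# Closed-form least squares: w = (X^T X)^-1 X^T y
w_closed = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

# NumPy's own solver gives (almost) the same coefficients
w_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

print(w_closed)   # roughly [3.0, 2.0]
print(w_lstsq)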

Implementation of the least squares method

The following small program reads the scattered points from a text file, fits the regression line, and displays the result with Matplotlib (note: for clarity, feature 0 is not plotted):

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Created on 2015-01-04

@author: Fangmeng
"""

from numpy import *

def loadDataSet(fileName):
    'Load test data'
    numFeat = len(open(fileName).readline().split('\t')) - 1
    dataMat = []; labelMat = []
    fr = open(fileName)
    for line in fr.readlines():
        lineArr = []
        curLine = line.strip().split('\t')
        for i in range(numFeat):
            lineArr.append(float(curLine[i]))
        dataMat.append(lineArr)
        labelMat.append(float(curLine[-1]))
    return dataMat, labelMat

#===================================
# Input:
#   xArr: feature matrix
#   yArr: target values
# Output:
#   ws: regression coefficient vector
#===================================
def standRegres(xArr, yArr):
    'Find the fitting coefficients with the least squares method'
    xMat = mat(xArr); yMat = mat(yArr).T
    xTx = xMat.T * xMat
    if linalg.det(xTx) == 0.0:
        print("The matrix cannot be inverted")
        return
    ws = xTx.I * (xMat.T * yMat)
    return ws

def test():
    'Show results'

    # Get the regression coefficients by least squares and predict
    # the target value for every sample point.
    xArr, yArr = loadDataSet('/home/fangmeng/ex0.txt')
    ws = standRegres(xArr, yArr)
    xMat = mat(xArr)
    yMat = mat(yArr)
    yHat = xMat * ws

    import matplotlib.pyplot as plt

    # Draw all sample points
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xMat[:, 1].flatten().A[0], yMat.T[:, 0].flatten().A[0])

    # Draw the regression line
    xCopy = xMat.copy()
    xCopy.sort(0)
    yHat = xCopy * ws
    ax.plot(xCopy[:, 1], yHat)
    plt.show()

if __name__ == '__main__':
    test()

Test result:

[Figure: the scattered sample points with the fitted regression line]

Observe the correlation coefficient between prediction and truth:

print(corrcoef(yHat.T, yMat))

Test result:

[Output: a 2x2 correlation matrix whose off-diagonal entries are about 0.98]

With a correlation coefficient above 0.98, the quality of the fit is good.
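
In the script above yHat is local to test(), so for completeness here is a minimal standalone snippet (assuming the same loadDataSet and standRegres functions and the same file path) that reproduces the correlation check:

xArr, yArr = loadDataSet('/home/fangmeng/ex0.txt')
ws = standRegres(xArr, yArr)

xMat = mat(xArr); yMat = mat(yArr)
yHat = xMat * ws

# corrcoef compares row vectors; the off-diagonal entry is the
# correlation between the predictions and the true values.
print(corrcoef(yHat.T, yMat))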

Locally weighted linear regression

Basic linear regression often runs into problems.

For example, linear regression itself is prone to underfitting. Even in the most basic single-feature case, if the scatter plot has a clearly nonlinear shape, the model is still forced to fit a straight line:

[Figure: a straight line fitted to clearly nonlinear scattered points]

Obviously the fit at the two ends is poor; the line deviates far from the data.

To solve this problem, locally weighted linear regression (LWLR) was introduced. It can produce a much more sensible fitted curve, like this:

[Figure: locally weighted fit that follows the nonlinear shape of the data]

"Locally" means that the points near the prediction point are given the most consideration; "weighted" means that the closer a sample point is to the prediction point, the larger its coefficient (weight) in the fit.

Therefore a diagonal matrix W that holds these weights is added to the original least squares solution, and the formula for the regression coefficients becomes:

\hat{w} = (X^T W X)^{-1} X^T W y

The weight matrix W is also called the "kernel"; the typical Gaussian kernel is computed as follows:

W(i, i) = \exp( -\|x^{(i)} - x\|^2 / (2 k^2) )

where x is the point being predicted, x^{(i)} is the i-th sample point, and k controls how quickly the weights decay with distance.
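
To get a feel for the decay parameter k, the following small check (not from the original article) prints the Gaussian weight assigned to a training point at a few distances from the prediction point; the smaller k is, the faster the weights fall off, so fewer neighbours effectively take part in each local fit:

from numpy import exp

for k in (1.0, 0.1, 0.01):
    # Weight of a training point at distance d from the prediction point.
    weights = ['%.2e' % exp(-(d ** 2) / (2.0 * k ** 2)) for d in (0.0, 0.1, 0.5)]
    print('k = %-4s  weights at d = 0, 0.1, 0.5: %s' % (k, weights))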

The following functions compute predictions using locally weighted linear regression:

#===================================
# Input:
#   testPoint: the point to predict
#   xArr: feature matrix
#   yArr: target values
#   k: decay parameter of the Gaussian kernel
# Output:
#   testPoint * ws: prediction for the test point
#===================================
def lwlr(testPoint, xArr, yArr, k=1.0):
    'Locally weighted linear regression for one point'
    xMat = mat(xArr); yMat = mat(yArr).T
    m = shape(xMat)[0]

    # Compute the Gaussian kernel weights for every sample point
    weights = mat(eye(m))
    for j in range(m):
        diffMat = testPoint - xMat[j, :]
        weights[j, j] = exp(diffMat * diffMat.T / (-2.0 * k ** 2))

    xTx = xMat.T * (weights * xMat)
    if linalg.det(xTx) == 0.0:
        print("Error: the coefficient matrix cannot be inverted")
        return

    ws = xTx.I * (xMat.T * (weights * yMat))
    return testPoint * ws

#===================================
# Input:
#   testArr: set of points to predict
#   xArr: feature matrix
#   yArr: target values
# Output:
#   yHat: predictions for the test points
#===================================
def lwlrTest(testArr, xArr, yArr, k=1.0):
    'Locally weighted regression for a set of points'
    m = shape(testArr)[0]
    yHat = zeros(m)

    # Predict every test point
    for i in range(m):
        yHat[i] = lwlr(testArr[i], xArr, yArr, k)
    return yHat

The following code shows the regression results:

def test():
    'Show results'

    # Load data
    xArr, yArr = loadDataSet('/home/fangmeng/ex0.txt')

    # Get the locally weighted prediction for every sample point
    yHat = lwlrTest(xArr, xArr, yArr, 0.01)

    xMat = mat(xArr)
    srtInd = xMat[:, 1].argsort(0)
    xSort = xMat[srtInd][:, 0, :]

    # Show all sample points and the locally weighted fit
    import matplotlib.pyplot as plt
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.plot(xSort[:, 1], yHat[srtInd])
    ax.scatter(xMat[:, 1].flatten().A[0], mat(yArr).T.flatten().A[0], s=2, c='red')
    plt.show()

When k (the decay parameter) = 1, the test result:

[Figure: LWLR fit with k = 1, essentially the same straight line as basic linear regression]

When k = 0.003, the test result:

[Figure: LWLR fit with k = 0.003, a jagged curve that chases the noise in the data]

When k = 0.01, the test result:

[Figure: LWLR fit with k = 0.01, a smooth curve that follows the shape of the data]

Observation shows that k = 1 gives the same result as basic linear regression, that is, underfitting, while k = 0.003 overfits; k = 0.01 is just right and is the best choice.
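
The visual judgement can be backed up with a number. A minimal sketch (not in the original article; rssError and the interleaved train/test split are my own additions, everything else reuses loadDataSet and lwlrTest from above) computes the squared prediction error on held-out points for each k; a very small k typically looks excellent on the points it was fitted to but does worse on points it has not seen:

def rssError(yArr, yHatArr):
    'Sum of squared errors between true values and predictions'
    return ((array(yArr) - array(yHatArr)) ** 2).sum()

xArr, yArr = loadDataSet('/home/fangmeng/ex0.txt')

# Interleaved split: even-numbered points for fitting, odd-numbered held out.
xTrain, yTrain = xArr[0::2], yArr[0::2]
xTest, yTest = xArr[1::2], yArr[1::2]

for k in (1.0, 0.01, 0.003):
    yHatTrain = lwlrTest(xTrain, xTrain, yTrain, k)
    yHatTest = lwlrTest(xTest, xTrain, yTrain, k)
    print('k = %-5s  train error: %.3f  held-out error: %.3f' % (
        k, rssError(yTrain, yHatTrain), rssError(yTest, yHatTest)))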

Ridge regression

Suppose you run into the following situation: the number of data points is smaller than the number of features.

What goes wrong? The inverse (X^T X)^{-1} can no longer be computed, because X^T X is singular. One way out is the ridge regression technique.

Ridge regression adds a term λI to X^T X in the solution for the regression coefficients, so that the matrix to be inverted is always nonsingular. The modified solution is as follows:

\hat{w} = (X^T X + \lambda I)^{-1} X^T y

Here I is the identity matrix, whose diagonal of ones looks a bit like a ridge; that is where the name "ridge regression" comes from.

This article does not give a full implementation (a rough sketch follows the two points below), but there are two places that need special attention:

1. All data needs to be standardized (zero mean and unit variance for each feature).

2. Different values of λ produce different sets of regression coefficients, and these sets still have to be compared against each other to pick the best one. A closely related and commonly used method is the lasso (its difference from ridge regression lies in the form of the constraint placed on w).
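
Although the article does not give the implementation, a minimal sketch in the same NumPy style as the code above could look like the following; ridgeRegres, ridgeTest, and the choice of lambda values exp(i - 10) are illustrative assumptions of mine rather than the author's code:

def ridgeRegres(xMat, yMat, lam=0.2):
    'Solve w = (X^T X + lambda * I)^-1 X^T y'
    xTx = xMat.T * xMat
    denom = xTx + eye(shape(xMat)[1]) * lam
    if linalg.det(denom) == 0.0:
        print('Error: the matrix cannot be inverted (is lambda 0?)')
        return
    return denom.I * (xMat.T * yMat)

def ridgeTest(xArr, yArr):
    'Standardize the data, then solve for a range of lambda values'
    xMat = mat(xArr); yMat = mat(yArr).T

    # Point 1 above: standardization -- center the target and give every
    # feature zero mean and unit variance (assumes no constant column).
    yMat = yMat - mean(yMat, 0)
    xMat = (xMat - mean(xMat, 0)) / std(xMat, 0)

    # Point 2 above: one coefficient vector per candidate lambda; these
    # still have to be compared against each other to pick the best one.
    numLam = 30
    wMat = zeros((numLam, shape(xMat)[1]))
    for i in range(numLam):
        ws = ridgeRegres(xMat, yMat, exp(i - 10))
        wMat[i, :] = ws.A1   # flatten the (n, 1) coefficient matrix into a row
    return wMat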

Choosing a specific scheme

With such a variety of regression schemes available, how should you decide which one to use?

First, you have to choose the right solution based on the characteristics of the problem.

When several schemes fit the problem equally well, the "tradeoff between bias and variance" is a good rule of thumb for choosing:

[Figure: error versus model complexity; bias decreases and variance increases, and the total error is smallest at the red dot]

The scheme corresponding to the red dot, where the total error is smallest, is the best choice.

Summary

Both regression and classification provide different algorithms for different kinds of problems. The key is to grasp the overall idea and choose according to the need.

However, everything in this article is linear regression, and linear regression has inherent drawbacks, because many practical problems are inherently nonlinear.

Even the locally weighted linear regression described in this article has the painful property that a new local fit has to be computed for every single prediction.

Therefore, the next article will introduce a more advanced nonlinear regression method, tree regression, in detail.

 
