Linear regression Python sample

Source: Internet
Author: User

Linear regression
Pros: Results are easy to interpret and the computation is inexpensive
Cons: Fits non-linear data poorly
Applicable data types: Numeric and nominal values
HorsePower = 0.0015*annualSalary - 0.99*hoursListeningToPublicRadio
This is called the regression equation, where 0.0015 and -0.99 are called regression coefficients.
The process of finding these regression coefficients is regression. Once you have the coefficients, making a prediction for a given input is easy:
multiply each input value by its regression coefficient, then add all the results together to get the predicted value.
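As a quick illustration of that prediction step, here is a minimal sketch; the coefficient values come from the equation above, while the input values (a $50,000 salary and 2 hours of radio) are made up for the example:

```python
import numpy as np

# Coefficients from the regression equation above
# (feature order: annual salary, hours listening to public radio)
w = np.array([0.0015, -0.99])
x = np.array([50000.0, 2.0])   # assumed inputs: $50,000 salary, 2 hours of radio

# Multiply each input by its coefficient and sum the results
y_hat = x @ w
print(y_hat)   # 0.0015*50000 - 0.99*2 = 73.02
```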
General methods of regression
(1) Collect data: gather data by any means.
(2) Prepare the data: regression requires numeric values, so nominal data must be converted to binary data.
(3) Analyze the data: visualizing the data in a two-dimensional plot helps in understanding and analyzing it; after shrinkage produces new regression coefficients,
the new fitted line can be plotted on the same graph for comparison.
(4) Train the algorithm: find the regression coefficients.
(5) Test the algorithm: use R^2, or the fit between predicted and actual values, to analyze how well the model works.
(6) Use the algorithm: with regression you can predict a numeric value for any given input. This is a step beyond classification,
because it predicts continuous values, not just discrete category labels.
How should we find the regression equation from a large pile of data? Suppose the input data are stored in the matrix X, and the regression coefficients are stored in the vector w. Then for
given input x1, the prediction is y1 = x1^T * w. The question now is: given X and the corresponding y values, how do you find w?
A common method is to find the w that minimizes the error, where the error is the difference between the predicted y value and the true y value. Simply summing the errors
would let positive and negative values cancel each other out, so we use the squared error instead.

from numpy import *

def loadDataSet(fileName):      # general function to parse tab-delimited floats
    numFeat = len(open(fileName).readline().split('\t')) - 1    # get number of fields
    dataMat = []; labelMat = []
    fr = open(fileName)
    for line in fr.readlines():
        lineArr = []
        curLine = line.strip().split('\t')
        for i in range(numFeat):
            lineArr.append(float(curLine[i]))
        dataMat.append(lineArr)
        labelMat.append(float(curLine[-1]))
    return dataMat, labelMat

def standRegres(xArr, yArr):    # ordinary least squares: w = (X^T X)^-1 X^T y
    xMat = mat(xArr); yMat = mat(yArr).T
    xTx = xMat.T * xMat
    if linalg.det(xTx) == 0.0:
        print("This matrix is singular, cannot do inverse")
        return
    ws = xTx.I * (xMat.T * yMat)
    return ws

One problem with linear regression is that it can underfit, because it seeks the unbiased estimate with the minimum mean squared error.
Clearly, if the model underfits, it cannot achieve good predictions. Some methods therefore allow a little bias to be introduced into the estimate,
thereby reducing the mean squared error of the predictions.
One such method is locally weighted linear regression (LWLR). In this algorithm, we give a weight to each point near the point we want to predict;

def lwlr(testPoint, xArr, yArr, k=1.0):
    xMat = mat(xArr); yMat = mat(yArr).T
    m = shape(xMat)[0]
    weights = mat(eye(m))
    for j in range(m):                      # next 2 lines create the weights matrix
        diffMat = testPoint - xMat[j, :]    # Gaussian kernel: weight decays with distance
        weights[j, j] = exp(diffMat * diffMat.T / (-2.0 * k**2))
    xTx = xMat.T * (weights * xMat)
    if linalg.det(xTx) == 0.0:
        print("This matrix is singular, cannot do inverse")
        return
    ws = xTx.I * (xMat.T * (weights * yMat))
    return testPoint * ws
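The kernel inside lwlr gives each training point a weight that decays with its distance from the query point, and k controls how quickly. A small NumPy sketch of just that weighting (the data points and the function name gaussian_weights are made up for illustration):

```python
import numpy as np

def gaussian_weights(test_point, x, k):
    """LWLR-style Gaussian kernel: w_j = exp(-||x_j - p||^2 / (2*k^2))."""
    d2 = ((x - test_point) ** 2).sum(axis=1)   # squared distance to each row
    return np.exp(-d2 / (2.0 * k ** 2))

x = np.array([[0.0], [1.0], [2.0]])                      # three 1-D training points
w_wide = gaussian_weights(np.array([0.0]), x, k=10.0)    # large k: nearly uniform weights
w_narrow = gaussian_weights(np.array([0.0]), x, k=0.1)   # small k: only the nearest point counts
print(w_wide)
print(w_narrow)
```

With a large k every point contributes almost equally and the fit approaches plain least squares; with a very small k the fit becomes extremely local and can overfit.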

What if the data has more features than sample points? Can linear regression and the previous methods still be used to make predictions?
The answer is no. The method described earlier can no longer be used, because errors occur when computing (X^T X)^-1:
if there are more features than sample points (n > m), the input data matrix X is not full rank, and a non-full-rank matrix causes problems when taking the inverse.
To solve this problem, statisticians introduced the concept of ridge regression. In simple terms, ridge regression adds
λI to X^T X so that the matrix becomes non-singular, and then X^T X + λI can be inverted. Here I is the identity matrix, and λ is a user-defined numeric value.
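The resulting formula, w = (X^T X + λI)^-1 X^T y, is straightforward to sketch. Below is a minimal NumPy version; the dataset is synthetic and the function name ridge_regres is my own, chosen for illustration:

```python
import numpy as np

def ridge_regres(x_mat, y_mat, lam=0.2):
    """Ridge regression: w = (X^T X + lam*I)^-1 X^T y."""
    xTx = x_mat.T @ x_mat
    denom = xTx + lam * np.eye(x_mat.shape[1])
    if np.linalg.det(denom) == 0.0:     # only possible if lam is 0 and X is rank-deficient
        print("This matrix is singular, cannot do inverse")
        return None
    return np.linalg.solve(denom, x_mat.T @ y_mat)

# Synthetic case with more features than samples (n > m),
# where X^T X is singular and plain least squares would fail
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 10))     # 5 samples, 10 features
y = X @ rng.standard_normal((10, 1))
w = ridge_regres(X, y, lam=0.1)
print(w.shape)                       # one coefficient per feature
```

Using np.linalg.solve instead of an explicit matrix inverse is the usual, numerically safer way to evaluate this formula.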

Ridge regression is one of the shrinkage methods; it is equivalent to limiting the size of the regression coefficients. Another good shrinkage method is the lasso. The lasso is difficult to solve directly, but an approximate result can be computed using forward stagewise linear regression.
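Forward stagewise regression is easy to sketch: start all coefficients at zero and, on each iteration, nudge the single coefficient (up or down by a small step eps) that most reduces the squared error. The toy dataset below is made up for illustration:

```python
import numpy as np

def stage_wise(x_mat, y_mat, eps=0.01, num_it=100):
    """Greedy forward stagewise regression, an approximation to the lasso."""
    m, n = x_mat.shape
    ws = np.zeros((n, 1))
    for _ in range(num_it):
        lowest_err = np.inf
        ws_best = ws
        for j in range(n):                 # try nudging each coefficient...
            for sign in (-1, 1):           # ...in both directions
                ws_test = ws.copy()
                ws_test[j] += eps * sign
                err = float(((y_mat - x_mat @ ws_test) ** 2).sum())
                if err < lowest_err:
                    lowest_err = err
                    ws_best = ws_test
        ws = ws_best                       # keep the single best nudge
    return ws

# Toy data where y depends only on the first feature,
# so the second coefficient should stay near zero
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([[1.0], [0.0], [1.0], [2.0]])
w = stage_wise(X, y, eps=0.05, num_it=200)
print(w.ravel())
```

Unlike the lasso's exact penalty, this greedy procedure only approximates the solution, but it naturally keeps irrelevant coefficients small.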

Shrinkage methods can also be seen as adding bias to a model while reducing its variance. The bias-variance tradeoff is an important concept that can help us understand existing models and improve them to obtain better ones.
