"Linear Regression" — Hsuan-Tien Lin's Machine Learning Foundations

Source: Internet
Author: User

This section begins with the basic linear regression algorithm.

(1) The hypothesis space of linear regression produces real-valued outputs (the output space is the real field ℝ).

(2) The goal of linear regression is to find the line (hyperplane) that makes the residuals as small as possible.

The core task: linear regression is solved by minimizing Ein(w), the in-sample squared error.

For ease of manipulation, the first step is to convert the Σ (summation) into matrix form, as follows:

First: a sum of squares of many terms can be written as the squared norm of a vector.

Second: stacking each input vector x_n as a row forms a new matrix X, so that Ein(w) = (1/N)||Xw − y||².

Finally, the problem becomes minimizing a function that is continuous, differentiable, and convex, so the minimum is attained where the gradient is 0. The process of setting the gradient to 0 is as follows:

That is: find the w at which the gradient ∇Ein(w) equals 0.

The first step involves some matrix-differentiation formulas; see the following blog (http://blog.sciencenet.cn/blog-849193-653656.html) for a review.

This involves the derivative of a quadratic form (the w'X'Xw term): note that since A = X'X here is a symmetric matrix, A = A', so the derivative is 2Aw.

It also involves the derivative of a linear form (the w'X'y term).
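Putting the two pieces together (my restatement of the lecture's derivation, writing Xᵀ for the transpose denoted X' above):

```latex
E_{\mathrm{in}}(w) = \frac{1}{N}\lVert Xw - y \rVert^{2}
  = \frac{1}{N}\bigl(w^{\top}X^{\top}Xw \;-\; 2\,w^{\top}X^{\top}y \;+\; y^{\top}y\bigr)

\nabla E_{\mathrm{in}}(w) = \frac{2}{N}\bigl(X^{\top}Xw - X^{\top}y\bigr) = 0
  \quad\Longrightarrow\quad X^{\top}Xw = X^{\top}y
```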

How to solve this equation for the final result:

There are two cases here.

(1) If X'X is invertible, w can obviously be obtained directly via the inverse matrix: w = (X'X)⁻¹X'y.

(2) If X'X is singular, a generalized inverse (the Moore-Penrose pseudo-inverse X†) should be used instead, giving w = X†y.

What is the generalized inverse? I learned a bit about it before when using ELM. Rereading the courseware to review:

Lin also recommends that in actual use, a well-implemented routine for the generalized inverse is good enough.

Therefore, once the generalized inverse is computed, everything is done.
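A minimal sketch of this analytic solution, assuming toy data that I generate here (the variable names are my own); NumPy's pseudo-inverse plays the role of X†:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 100, 3
# Design matrix with a leading bias column of ones.
X = np.column_stack([np.ones(N), rng.standard_normal((N, d))])
w_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(N)

# Analytic solution: w_lin = X† y, with X† the Moore-Penrose pseudo-inverse.
w_pinv = np.linalg.pinv(X) @ y

# In practice a least-squares routine is preferred over forming X† explicitly.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(w_pinv, w_lstsq))  # the two solutions agree
```

When X has full column rank the two routes give the same w; when X'X is singular, `pinv`/`lstsq` still return the minimum-norm least-squares solution.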

Substituting the obtained w back, the fitted vector is ŷ = Xw = XX†y (with X† the generalized inverse).

From this, the concept of the hat matrix H = XX† is extracted:

The course focuses on the geometric meaning of this hat matrix H.

At first I did not understand this geometric meaning well, until I read this blog (http://beader.me/mlnotebook/section3/linear-regression.html), after which it made more sense.

Roughly speaking, ŷ is the projection of y onto the span of the columns of X; this choice makes ||y − ŷ||² minimal (the perpendicular distance is shortest).

Viewed this way, the linear transformation I − H turns y into y − ŷ.

From another angle, y can be regarded as an ideal f(x) plus noise; so, in fact, applying the linear transformation I − H to the noise yields y − ŷ.
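The projection picture can be checked numerically; this sketch (toy data of my own) verifies the standard properties of H — symmetric, idempotent, residual orthogonal to span(X), and trace(H) = d+1, which is exactly where the (d+1)/N factor below comes from:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 50, 4
X = np.column_stack([np.ones(N), rng.standard_normal((N, d))])
y = rng.standard_normal(N)

H = X @ np.linalg.pinv(X)          # hat matrix H = X X†
y_hat = H @ y                      # projection of y onto span(X)
residual = (np.eye(N) - H) @ y     # I - H maps y to y - y_hat

# Projection properties: H is symmetric and idempotent (H @ H = H),
# the residual is orthogonal to every column of X,
# and trace(H) equals the number of parameters d + 1.
print(np.allclose(H, H.T), np.allclose(H @ H, H))
print(np.allclose(X.T @ residual, 0), round(np.trace(H)))
```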

What is the point of this view? Mainly, to obtain the relationship between Ein and the noise.

On average, linear regression reduces the in-sample error below the noise level by a factor of (d+1)/N: E[Ein] = noise level × (1 − (d+1)/N).

Eout, on the other hand, is larger than the noise level by the same amount on average: E[Eout] ≈ noise level × (1 + (d+1)/N). To summarize, see:

When N is large, both the Ein and Eout curves converge to the noise level, and the expected gap between them is on the order of 2(d+1)/N times the noise.

The VC bound controls the error probability in the worst case, whereas the learning curve above describes the average case; still, the two express a similar idea: the larger N is, the closer Ein and Eout become.
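A small simulation (my own setup: Gaussian inputs, noise standard deviation σ = 1, averaged over many trials) illustrates the learning-curve claim — the average Ein undershoots the noise level σ² while the average Eout overshoots it:

```python
import numpy as np

rng = np.random.default_rng(2)
N, d, sigma, trials = 20, 5, 1.0, 2000
w_true = rng.standard_normal(d + 1)

ein, eout = [], []
for _ in range(trials):
    # Fresh training set of size N with additive Gaussian noise.
    X = np.column_stack([np.ones(N), rng.standard_normal((N, d))])
    y = X @ w_true + sigma * rng.standard_normal(N)
    w = np.linalg.pinv(X) @ y
    ein.append(np.mean((X @ w - y) ** 2))
    # Large fresh test set to estimate Eout.
    Xt = np.column_stack([np.ones(1000), rng.standard_normal((1000, d))])
    yt = Xt @ w_true + sigma * rng.standard_normal(1000)
    eout.append(np.mean((Xt @ w - yt) ** 2))

# Average Ein sits below sigma^2, average Eout above it.
print(np.mean(ein) < sigma**2 < np.mean(eout))
```

Here E[Ein] should come out near σ²(1 − (d+1)/N) = 0.7, matching the (d+1)/N reduction described above.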

Finally, the relationship between linear classification and linear regression is also discussed.

As I understand it, there are two points:

(1) Intuitively, regression seems able to replace classification (after all, one just takes the sign of the output at the end), and minimizing the 0/1 classification error is NP-hard (slow), whereas linear regression has an analytic solution (fast).

(2) For the classification problem, y is only +1 or −1; plotting the error curves for the two cases shows that the 0/1 classification error curve in fact always lies below the squared-error curve.

Therefore, the conclusion is that the linear-regression (squared) error can be used as a slightly looser upper bound on the binary classification error.
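The upper bound can be checked numerically; this sketch sweeps the raw score s = wᵀx over a grid (my own setup) and confirms [sign(s) ≠ y] ≤ (s − y)² for both labels:

```python
import numpy as np

s = np.linspace(-3, 3, 601)            # raw scores w^T x
for y in (-1.0, 1.0):
    err01 = (np.sign(s) != y).astype(float)   # 0/1 classification error
    errsq = (s - y) ** 2                      # squared (regression) error
    # The 0/1 error curve never rises above the squared-error curve.
    print(np.all(err01 <= errsq))
```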

The trade-off here is between the efficiency of the algorithm and the tightness of the error bound.

Here Lin gives a practical approach: in practice, one can even run a regression first to obtain an initial parameter value, and then apply algorithms such as PLA/pocket to further reduce the error.
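A sketch of that practical tip, on linearly separable toy data of my own (data generation and names are assumptions, not from the lecture): the regression weights serve as the starting point, and a plain PLA correction loop finishes the job:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50
X = np.column_stack([np.ones(N), rng.standard_normal((N, 2))])
w_target = np.array([0.2, 1.0, -1.5])
y = np.sign(X @ w_target)          # separable labels by construction

w = np.linalg.pinv(X) @ y          # regression solution as initialization
for _ in range(50_000):            # standard PLA: fix one mistake at a time
    wrong = np.nonzero(np.sign(X @ w) != y)[0]
    if wrong.size == 0:
        break
    i = wrong[0]
    w = w + y[i] * X[i]

print(np.all(np.sign(X @ w) == y))
```

On separable data PLA is guaranteed to converge; starting from the regression weights typically leaves only a handful of corrections to make.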
