Machine Learning - Linear Regression


Linear Regression

Linear regression is a supervised learning method, so it follows the usual supervised learning workflow. First, a training set is given and a linear function is learned from it; then we test how well the function is trained (that is, whether the function fits the training data well enough) and select the best function (the one with the minimum cost function).

Purpose of the cost function: to evaluate the hypothesis function. The smaller the cost function, the better the fit to the training data.

1. Least Squares:

The least square method is actually very simple. We have many fixed data points and need to find a line that fits them. First we assume an equation for the line, then substitute the data points into this hypothesized equation to get predicted values, and finally find the parameters that minimize the sum of squared differences between the actual values and the predicted values. Setting the derivatives with respect to the parameters to zero yields those parameters.

For example, consider determining the wear rate of a tool: the more times the tool is used, the thinner it becomes, so the tool thickness and the usage time are roughly linearly related. Assume f(t) = at + b, where t is the usage time, f(t) is the thickness of the tool, and a and b are constants to be determined. How can a and b be determined? Ideally we would choose a and b so that the line y = at + b matches the measured tool thickness exactly, but this is impossible in practice, because error is always inevitable; due to error, the theoretical values deviate from the actual values. To make the deviation as small as possible, we use the sum of squared deviations to determine the coefficients a and b, and thereby the functional relationship f(t) = at + b between the two variables. This method of determining the constants a and b by minimizing the sum of squared deviations is the least square method.
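As a sketch of this example in Python (the measurements are made up for illustration; np.linalg.lstsq solves the least square problem for the line f(t) = at + b directly):

import numpy as np

# Hypothetical measurements: usage time t (hours) and tool thickness (mm).
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
thickness = np.array([10.0, 9.6, 9.1, 8.7, 8.2, 7.8])

# Least squares: find a, b minimizing the sum of squared deviations
# between the observed thickness and the line f(t) = a*t + b.
A = np.column_stack([t, np.ones_like(t)])
(a, b), *rest = np.linalg.lstsq(A, thickness, rcond=None)

print(f"f(t) = {a:.3f} * t + {b:.3f}")  # a < 0: the tool wears down over time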

2. Linear Regression:

In mathematics, regression means fitting a curve to a given set of points. If the curve is a straight line, it is called linear regression; if the curve is a quadratic curve, it is called quadratic regression.

Linear regression is a special linear model for studying the relationship between several variables, in particular the case where the dependent variable and the independent variables are linearly related. The simplest case is one independent variable and one dependent variable whose relationship is roughly linear. This is called simple linear regression, and the model is y = a + bx + ε, where x is the independent variable, y is the dependent variable, and ε is a random error. The random error is generally assumed to follow a normal distribution with mean 0.

Therefore, we can think of linear regression as fitting the line h(x) = θ0 + θ1x to a given series of points (linear and nonlinear regression really come down to the same thing: finding appropriate parameters that fit the existing data, for which we can use the least square method). The same holds in multiple dimensions; there are simply a few more parameters.
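Written out for the multi-dimensional case (using the standard convention of adding a constant feature x_0 = 1 so that the intercept folds into θ):

h_\theta(x) = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n = \sum_{j=0}^{n} \theta_j x_j = \theta^T x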

3. Analysis Process for Linear Regression Problems:

A function model is given. The model has a number of unknown parameters, and we can substitute many observed data points into it. However, the resulting system of equations is hard to solve exactly, so we settle for an approximate solution by converting the task into a problem of minimizing the error. After writing out the error terms, we use gradient descent or Newton's method to find the minimum and thereby determine the unknown parameters.

(1) Give the hypothesis (hypothesis: h, parameters: θ).

(2) Learn θ based on the given training set.

For this we provide a cost function, which is in fact just the least square criterion:
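In the usual notation, with m training examples (x^(i), y^(i)) and hypothesis h_θ (the factor 1/2m is a common convention that simplifies the derivative):

J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2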

(3) Find the θ that minimizes J(θ).

We want the minimum of J(θ), but we do not know the parameters in advance, and we cannot try candidate values one by one. Therefore, gradient descent is used to find the θ that minimizes J(θ).

4. Gradient Descent:

The gradient descent algorithm is a method for finding a local optimum. For a function f(x), the gradient at a point is the direction in which f(x) increases fastest; the opposite direction is the direction in which f(x) decreases fastest at that point. For more information, see Wikipedia.

Principle: think of the function as a mountain. We stand on a hillside and look around for the direction in which a small step takes us downhill fastest, then take that step, and repeat.

Note: when the scales of the variables differ greatly, you should first normalize them so that their values lie in a similar range; this makes gradient descent work more accurately.
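As a small sketch of such preprocessing in Python (standardization; the two-column matrix below is made-up data with very different scales):

import numpy as np

# Made-up feature matrix: the second column is on a much larger scale.
X = np.array([[1.0, 2000.0],
              [2.0, 3500.0],
              [3.0, 1200.0]])

# Standardize each column to mean 0 and standard deviation 1.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_scaled)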

1) First assign an initial value to θ. This value can be random, or θ can simply be set to the zero vector.

2) Change the value of θ so that J(θ) decreases along the direction of gradient descent.

 

The gradient descent process works as follows: take the partial derivative of the cost function J(θ) with respect to each parameter θ_j, then repeat the following update until convergence:

Repeat until convergence {
    θ_j := θ_j - α * ∂J(θ)/∂θ_j    (update all θ_j simultaneously)
}

This is the update process: each θ_j decreases along the gradient toward the minimum. θ_j denotes the value before the update; the subtracted term is the amount moved in the direction of the negative gradient; α is the step size (learning rate), that is, how far each step moves in the descent direction.
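As a minimal sketch (assuming the squared-error cost J(θ) given above, one feature plus an intercept, and made-up data), batch gradient descent could look like this in Python:

import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=5000):
    """Batch gradient descent for linear regression.

    X: (m, n) design matrix whose first column is all ones (the intercept).
    y: (m,) target values.  alpha: step size (learning rate).
    """
    m, n = X.shape
    theta = np.zeros(n)            # step 1): start from the zero vector
    for _ in range(iters):
        error = X @ theta - y      # h_theta(x^(i)) - y^(i) for every example
        grad = (X.T @ error) / m   # partial derivatives of J(theta)
        theta -= alpha * grad      # update every theta_j simultaneously
    return theta

# Tiny made-up example: data generated from y = 1 + 2x, so theta ≈ [1, 2].
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
print(gradient_descent(X, 1.0 + 2.0 * x))

Stopping after a fixed number of iterations keeps the sketch simple; in practice one would also stop once the change in J(θ) falls below a tolerance.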

5. Locally Weighted Linear Regression:

For linear regression, Newton's method can also be used in addition to gradient descent to solve the least square problem. However, the least square method only gives what is optimal in the least-squares sense, and the confidence level is not known; sometimes linear regression leaves the fitted curve "underfitted".

We can use locally weighted linear regression to fit a better curve; all that changes is that a weight is added for each training example.

We construct a new cost function J(θ) in which every training example is weighted according to its distance from the query point x:

J(θ) = Σ_i w^(i) (y^(i) - θ^T x^(i))^2,   with weight   w^(i) = exp(-(x^(i) - x)^2 / (2τ^2)).

From this form of the weight function, we can see that the farther a point is from the query point x, the smaller its weight, until far away it is essentially 0. That is to say, only the points around the query point x really contribute. The solution procedure is otherwise the same as above.
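A minimal sketch in Python, assuming the Gaussian weight above (tau is the bandwidth that controls how local the fit is; the data are made up):

import numpy as np

def lwr_predict(X, y, x_query, tau=0.5):
    """Locally weighted linear regression prediction at one query point.

    X: (m, n) design matrix with an intercept column of ones.
    y: (m,) targets.  x_query: (n,) query point.  tau: bandwidth.
    """
    # Gaussian weights: points far from the query contribute almost nothing.
    d2 = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * tau ** 2))
    # Solve the weighted normal equations: (X^T W X) theta = X^T W y.
    W = np.diag(w)
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return x_query @ theta

# Made-up nonlinear data that a single global line would underfit.
x = np.linspace(0.0, 6.0, 50)
X = np.column_stack([np.ones_like(x), x])
y = np.sin(x)
print(lwr_predict(X, y, np.array([1.0, 3.0])))  # close to sin(3.0) ≈ 0.141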

 

6. Logistic/Sigmoid Regression Model:

A specific function can be used to convert a linear regression problem into a classification problem, that is, a function that constrains y to the range 0 to 1. This function is the logistic (sigmoid) function.

In effect, linear regression is reused for classification: the original output range is mapped into the interval from 0 to 1.
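A small sketch using the standard sigmoid g(z) = 1 / (1 + e^(-z)):

import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(theta, x):
    """Logistic regression hypothesis: h_theta(x) = g(theta^T x)."""
    return sigmoid(np.dot(theta, x))

print(sigmoid(0.0))    # 0.5: the decision boundary
print(sigmoid(8.0))    # close to 1
print(sigmoid(-8.0))   # close to 0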

 

Reference: http://blog.csdn.net/xiazdong/article/details/7950084
