Notes on Machine Learning (Andrew Ng): Linear Regression


① Hypothesis function

Given some sample data (the training set), a learning algorithm is used to train on it, yielding a model, also called a hypothesis function.

When a result must be predicted for new data, the new data is used as input to the hypothesis function; the function computes a result, and that result is the predicted value.

The hypothesis function is generally written as

h_θ(x) = θ₀x₀ + θ₁x₁ + … + θₙxₙ = θᵀx   (with x₀ = 1)

where θ denotes the parameters of the model (also called the weights) and x is the input variable (the feature variables).

As can be seen, the hypothesis function h_θ(x) is a function of x: once θ (which can be regarded as a vector) is determined, the hypothesis function is fixed. Then, for a new input sample x, we can predict its result y.

The sum above runs from 0 to n, that is, each input sample x is treated as a vector with n+1 features (x₀ being the constant intercept feature). For example, when estimating a house price, an input sample x might consist of the size of the house, the city of the house, the number of bathrooms, the number of balconies, and so on: a series of features.
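The hypothesis function above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the course; the parameter values and the house features are made up for the example:

```python
import numpy as np

# A minimal sketch of the hypothesis function h_theta(x) = theta^T x,
# assuming x already includes the intercept feature x0 = 1.
def hypothesis(theta, x):
    """Predict y for a single sample x (both 1-D NumPy arrays)."""
    return float(np.dot(theta, x))

# Hypothetical house-price example: features are (x0 = 1, size, bedrooms).
theta = np.array([50.0, 0.2, 10.0])   # made-up parameter values
x = np.array([1.0, 1000.0, 3.0])      # made-up sample
print(hypothesis(theta, x))           # 50 + 0.2*1000 + 10*3 = 280.0
```

Once θ is fixed, every prediction is just this dot product with the new sample's feature vector.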

On classification versus regression problems: the output y of the hypothesis function (the predicted value) can be either continuous or discrete. For example, predicting profit yields a continuous value, while predicting tomorrow's weather (rain or no rain) from historical conditions yields a discrete value.

Therefore, if the hypothesis function outputs a continuous value, the learning problem is called a regression problem; if the output is discrete, it is called a classification problem.

② Cost function

The learning process is the process of determining the hypothesis function, in other words, the process of finding θ.

Now suppose θ has already been worked out; we need to judge how good the resulting hypothesis function is, that is, how much it deviates from the actual values. The cost function is used for this evaluation.

The cost function is defined as follows:

J(θ) = (1 / 2m) · Σ_{i=1..m} (h_θ(x^(i)) − y^(i))²

Generally, m denotes the number of training samples (the size of the training set), x^(i) denotes the i-th sample, and y^(i) denotes the actual (target) value of the i-th sample.

As can be seen, the cost function is essentially the familiar mean squared error (halved, which is convenient when differentiating). J(θ) is a function of θ.

Obviously, the smaller the cost function, the better the model. The goal, therefore, is to find a set of θ for which the cost function attains its minimum value.
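The cost function can be sketched as follows (a minimal illustration; the tiny data set, which fits y = 2x exactly, is made up for the example):

```python
import numpy as np

# A sketch of the cost function J(theta) = 1/(2m) * sum((h(x_i) - y_i)^2),
# with X an (m, n+1) design matrix whose first column is all ones.
def cost(theta, X, y):
    m = len(y)
    residuals = X @ theta - y          # h_theta(x_i) - y_i for every sample
    return residuals @ residuals / (2 * m)

# Tiny made-up data set: y = 2x fits perfectly with theta = (0, 2).
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
print(cost(np.array([0.0, 2.0]), X, y))   # 0.0, a perfect fit
print(cost(np.array([0.0, 1.0]), X, y))   # (1+4+9)/(2*3) ≈ 2.333
```

A smaller J(θ) means the hypothesis deviates less from the targets, which is exactly the criterion used to choose θ.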

Once such a θ is found, the hypothesis function is determined, and a model (linear regression) is obtained.

So how is θ found? By the gradient descent algorithm described below.

③ Gradient descent algorithm

The essence of the gradient descent algorithm is to follow the partial derivatives of J(θ) downhill, moving θ step by step until the partial derivatives are (approximately) 0, at which point θ solves the minimization. Each parameter is updated as

θ_j := θ_j − α · ∂J(θ)/∂θ_j

where α is the learning rate.

Start with an initial θ, then execute the update formula above in a loop; when the partial derivatives reach 0, θ_j is no longer updated, and the final value of θ_j is obtained.

The partial derivative is calculated as follows:

∂J(θ)/∂θ_j = (1/m) · Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) · x_j^(i)
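The loop described above can be sketched as follows. This is an illustration, not the course's code; the learning rate, iteration count, and data set are made-up choices, and in practice one would stop when the updates become negligibly small rather than after a fixed count:

```python
import numpy as np

# A sketch of batch gradient descent: theta_j := theta_j - alpha * dJ/dtheta_j,
# where dJ/dtheta_j = (1/m) * sum_i (h(x_i) - y_i) * x_ij.
def gradient_descent(X, y, alpha=0.1, iters=2000):
    m, n = X.shape
    theta = np.zeros(n)                          # start from an initial theta
    for _ in range(iters):
        errors = X @ theta - y                   # h_theta(x_i) - y_i
        new_theta = theta.copy()
        for j in range(n):                       # update every theta_j ...
            new_theta[j] -= alpha / m * (errors @ X[:, j])
        theta = new_theta                        # ... simultaneously
    return theta

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
print(gradient_descent(X, y))    # approaches [0, 2] for data on y = 2x
```

Note the simultaneous update: all θ_j are computed from the same old θ before any of them is overwritten.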

④ Vector representation of the hypothesis function, cost function, and gradient descent algorithm

The hypothesis function in vector form, with the whole training set stacked as the m×(n+1) design matrix X:

h_θ(X) = Xθ

The cost function in vector form:

J(θ) = (1 / 2m) · (Xθ − y)ᵀ(Xθ − y)

The vectorized gradient descent update of θ:

θ := θ − (α/m) · Xᵀ(Xθ − y)

(There is an error in the original post's formula: the expression after the first equals sign should not be divided by m; it is corrected here.)
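The vectorized update replaces the per-parameter loop with a single matrix expression. A minimal sketch, with illustrative learning rate, iteration count, and data:

```python
import numpy as np

# One vectorized gradient descent step:
#   theta := theta - (alpha/m) * X^T (X*theta - y)
def gradient_step(theta, X, y, alpha):
    m = len(y)
    return theta - alpha / m * (X.T @ (X @ theta - y))

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
theta = np.zeros(2)
for _ in range(2000):
    theta = gradient_step(theta, X, y, alpha=0.1)
print(theta)   # approaches [0, 2] for data on y = 2x
```

Besides being shorter, this form lets the linear algebra library handle the inner sums, which is markedly faster than explicit Python loops for large m and n.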

The derivation proceeds as follows:

Addendum:

Closed-form solution of θ

The closed-form solution of θ, i.e., its analytic solution, is the value that minimizes the cost function J(θ). Setting the gradient of J(θ) to zero yields the normal equation:

θ = (XᵀX)⁻¹ Xᵀ y

The advantage of the closed-form solution is that the exact solution is obtained in one step, avoiding the "loop until convergence" of gradient descent.

The disadvantage is that when the number of features n is large, the dimensions of X are large and the complexity of solving is O(n³), so the time cost is high. (A feature count of roughly 10⁴ is generally taken as the dividing line; above this value, gradient descent is usually preferred.)
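The normal equation can be sketched as follows (an illustration on the same kind of tiny made-up data set; `np.linalg.solve` is used instead of forming the explicit inverse, which is the numerically preferable way to evaluate (XᵀX)⁻¹Xᵀy):

```python
import numpy as np

# A sketch of the closed-form (normal-equation) solution
#   theta = (X^T X)^{-1} X^T y,
# computed by solving the linear system (X^T X) theta = X^T y.
def normal_equation(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
print(normal_equation(X, y))   # [0, 2] up to float error, in one step
```

Unlike gradient descent, there is no learning rate to tune and no iteration, but the O(n³) solve is what becomes prohibitive as the feature count grows.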

Original article: http://www.cnblogs.com/hapjin/p/6079012.html

