[Stanford open courses] Machine Learning: Linear Regression with one variable (Week 1)


Chapters II through IV cover linear regression. Chapter II describes simple linear regression (SLR, a single variable), Chapter III reviews the underlying linear algebra basics, and Chapter IV describes multivariate regression (more than one independent variable).

The purpose of this article is to implement some of the algorithms that appear in Chapter II. It is aimed at readers who have already completed this chapter of the Stanford course. I am just a beginner, trying to explain the problem in plain language; for more depth, refer to the course materials.

Before you start to discuss the specific steps, first give a brief thought route:

1. Given a point set, we want to find the line that best fits it;

2. Use the least squares method to measure the goodness of fit, which yields a cost function;

3. Use the gradient descent algorithm to find the minimum of the cost function.


First, we will introduce several concepts:

In mathematics, given a point set, regression fits it with a curve. If the curve is a straight line, it is called linear regression; if the curve is a quadratic curve, it is called quadratic regression. Regression has many variants, such as locally weighted regression and logistic regression.

The hypothesis h obtained in the course is a linear regression equation:

h_θ(x) = θ_0 + θ_1·x

Next, we will first introduce the linear regression of a single variable:

The problem: given a point set, fit a straight line to it so that the fitting effect is as good as possible (the best fit).

Since it is a straight line, assume its equation is:

h_θ(x) = θ_0 + θ_1·x

The point set is given and the form of the linear equation is fixed. Next, we need to solve for θ_0 and θ_1 so that the fitting effect is optimal (the best fit).

So what are the criteria for judging fit results? In other words, we need to know a measurement of the fitting effect.

Here, we propose"Least Squares": (The following is taken from wiki)

Least Squares(Also knownLeast flat MethodIs a mathematical optimization technology. It minimizes the sum of squares of errors to find the optimal function matching for data.

ExploitationLeast SquaresIt is easy to obtain unknown data and minimize the sum of squares between the obtained data and the actual data.
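In symbols: for data points (x^(i), y^(i)), i = 1, …, m, the least squares criterion chooses the fitting function f that minimizes the sum of squared errors:

S(f) = Σ_{i=1}^{m} (y^(i) − f(x^(i)))²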

We will not discuss the "Least Square Method". As long as we know that it is a measurement standard, we can use it to judge whether the calculated linear equation has reached the optimal fit.

Back to our problem: in single-variable linear regression, this measure of the fitting effect uses the least squares method, i.e., we minimize the sum of squared residuals. Combined with the course, the cost function is defined as:

J(θ_0, θ_1) = (1/2m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²

where m is the number of points in the point set.

In fact, once we substitute the concrete values of the point set into the cost function, we are left with a purely calculus problem: finding the minimum of a function of two variables. Expanding the squares, J takes the quadratic form

J(θ_0, θ_1) = A·θ_0² + B·θ_1² + C·θ_0·θ_1 + D·θ_0 + E·θ_1 + F

where A, B, C, D, E, F are all known constants determined by the data.
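As a concrete illustration, here is a minimal Java sketch of the cost function (the method name costJ is my own, not from the course):

// J(theta0, theta1) = (1/2m) * sum over i of (theta0 + theta1*x[i] - y[i])^2
static double costJ(double theta0, double theta1, double[] x, double[] y) {
    int m = x.length;
    double sum = 0.0;
    for (int i = 0; i < m; i++) {
        double residual = theta0 + theta1 * x[i] - y[i]; // h(x) - y for point i
        sum += residual * residual;
    }
    return sum / (2.0 * m);
}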

The course introduces a method for doing exactly that: the gradient descent algorithm (Gradient Descent).

(Two figures in the original illustrate the basic idea of the algorithm.)

The gradient descent algorithm is a method for finding a local optimum. By analogy: you are standing on a hillside and want to reach the bottom of the hill (the minimum value) as quickly as possible. This is a process of descending, and it involves two questions:

1) How big a step do you take going downhill? (Bigger is certainly not always better: if the step is too large, you may step right over the lowest point.)

2) In which direction do you move? (Note that this direction changes constantly: each time you reach a new position, you must judge afresh which direction is best for the next step. One thing is certain, though: to reach the lowest point as quickly as possible, you should descend where the slope is steepest.)

Then, when can you say you have reached a minimum point? Obviously, when the change in your position keeps shrinking until it converges to a fixed position, that position is a minimum point.

 

So, to track the change, we take the derivative; the derivative represents the rate of change. Descending where the hill is steepest (clearly the fastest way down) means moving against the gradient, which gives the update rule:

θ_j := θ_j − α · ∂J(θ_0, θ_1)/∂θ_j   (for j = 0, 1, where α is the step size)

Simplified, by working out the partial derivatives:

θ_0 := θ_0 − α · (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))

θ_1 := θ_1 − α · (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x^(i)

 

The step size α should be neither too large nor too small: too large and you may overshoot the minimum; too small and convergence becomes very slow.

The gradient descent method proceeds as follows (from: http://blog.sina.com.cn/s/blog_62339a2401015jyq.html):

1) First assign θ an initial value; it can be random, or θ can simply be the all-zero vector.

2) Change the value of θ so that J(θ) decreases in the direction of gradient descent.


To aid understanding, here is a single-variable example:

E.g.: find the minimum of f(x) = 2x² + 3x + 1. (Note: the derivative is f′(x) = 4x + 3.)

The Java code is as follows:


package onevariable;

public class OneVariable {
    public static void main(String[] args) {
        double e = 0.00001;  // iteration precision: stop once f(x) changes by less than this
        double alpha = 0.1;  // step size (see the note below on why 0.5 fails here)
        double x = 0;        // initial value of x
        double y0 = 2 * x * x + 3 * x + 1; // f(x) = 2x^2 + 3x + 1 at the initial x
        double y1;           // value of f(x) after each update
        while (true) {
            x = x - alpha * (4.0 * x + 3.0); // gradient step: f'(x) = 4x + 3
            y1 = 2 * x * x + 3 * x + 1;
            if (Math.abs(y1 - y0) < e) { // two successive results barely differ: end the iteration
                break;
            }
            y0 = y1; // update the iteration result
        }
        System.out.println("min(f(x)) = " + y1);
        System.out.println("minX = " + x);
    }
}
// Output: min(f(x)) ≈ -0.125, minX ≈ -0.75
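A note on the step size: with this code's original α = 0.5, the update becomes x := x − 0.5·(4x + 3) = −x − 1.5, so x oscillates forever between 0 and −1.5. Since f(0) = f(−1.5) = 1.0, the convergence test on f(x) fires immediately and reports a "minimum" of 1.0 at x = −1.5, which is not a minimum at all. With α = 0.1 the iteration contracts toward the true minimum, f(−0.75) = −0.125.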


For the sake of clarity, the following figure is given:

This figure shows the relationship between the parameter vector θ and the error function J(θ). The red regions indicate where J(θ) is relatively high; what we want is to make the value of J(θ) as low as possible, i.e., the dark blue region. θ_0 and θ_1 represent the two dimensions of the vector θ.

The first step of the gradient descent method mentioned above is to give θ an initial value; assume the randomly chosen initial value is the cross mark on the figure.

Then we adjust θ in the direction of gradient descent, so that J(θ) moves toward lower values; the algorithm ends when θ has descended to a point from which it cannot descend any further.

Of course, the final point reached by gradient descent may not be the global minimum; it may be only a local minimum, as in the following figure:

The figure above shows a local minimum point, reached after re-selecting a different initial point. It appears our algorithm is heavily influenced by the choice of the initial point and can fall into a local minimum.

It is worth noting that the gradient has a direction: for a vector θ, each one-dimensional component θ_i has its own gradient direction, and together they determine an overall direction. At each update we move in the direction of steepest descent, until we reach a minimum point, whether local or global.
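One detail worth making explicit is that the components of θ are updated simultaneously: both partial derivatives are computed from the old (θ_0, θ_1) before either parameter is overwritten. A minimal Java sketch of one such update step (the method name step and its signature are my own, not from the course):

// One simultaneous gradient descent update for single-variable linear regression.
// Both gradient components are accumulated from the OLD (theta0, theta1)
// before either parameter is overwritten.
static double[] step(double theta0, double theta1, double alpha, double[] x, double[] y) {
    int m = x.length;
    double grad0 = 0.0, grad1 = 0.0;
    for (int i = 0; i < m; i++) {
        double residual = theta0 + theta1 * x[i] - y[i]; // h(x) - y
        grad0 += residual;        // contributes to dJ/dtheta0
        grad1 += residual * x[i]; // contributes to dJ/dtheta1
    }
    return new double[] { theta0 - alpha * grad0 / m,
                          theta1 - alpha * grad1 / m };
}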

 

This is the theoretical knowledge. Next, we will use Java to implement this algorithm:

There are two types of gradient descent: batch gradient descent and stochastic gradient descent. See: http://blog.csdn.net/lilyth_lilyth/article/details/8973972

The test data comes from the course exercise (ex1data1.txt); plotted with MATLAB, it looks as follows:
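The original post's implementation did not survive on this page, so the following is a minimal batch gradient descent sketch of my own. It assumes each line of ex1data1.txt holds one comma-separated x,y pair; the class name, α = 0.01, and the fixed 1500 iterations are my assumptions, not necessarily what the author used.

package onevariable;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class BatchGradientDescent {
    public static void main(String[] args) throws IOException {
        // Read the point set: one "x,y" pair per line of ex1data1.txt.
        List<double[]> points = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader("ex1data1.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.trim().split(",");
                points.add(new double[] {Double.parseDouble(parts[0]),
                                         Double.parseDouble(parts[1])});
            }
        }
        int m = points.size();

        double theta0 = 0.0, theta1 = 0.0; // start from the zero vector
        double alpha = 0.01;               // step size
        int iterations = 1500;             // fixed number of passes

        for (int iter = 0; iter < iterations; iter++) {
            // Batch: each update sums the residuals over ALL m points.
            double grad0 = 0.0, grad1 = 0.0;
            for (double[] p : points) {
                double residual = theta0 + theta1 * p[0] - p[1]; // h(x) - y
                grad0 += residual;
                grad1 += residual * p[0];
            }
            // Simultaneous update of both parameters.
            theta0 -= alpha * grad0 / m;
            theta1 -= alpha * grad1 / m;
        }

        System.out.println("h(x) = " + theta0 + " + " + theta1 + " * x");
    }
}

A stochastic variant would instead update θ_0 and θ_1 after each individual point rather than after a full pass over all m points; see the link above for the trade-offs.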
