Stanford Machine Learning - Lecture 1: Linear Regression with One Variable


This Machine Learning series covers single-variable linear regression, multivariate linear regression, an Octave tutorial, logistic regression, regularization, neural networks, machine learning system design, support vector machines (SVM), clustering, dimensionality reduction, anomaly detection, and large-scale machine learning. All content comes from the Stanford open course on Machine Learning: https://class.coursera.org/ml/class/index


Chapter 1 ------- Linear Regression with One Variable (single-parameter linear regression)


(1) Cost Function

Linear regression fits a straight line to a set of data points. Assume the fitted line (hypothesis) is h(x) = θ0 + θ1·x; the cost function is the squared-error cost

J(θ0, θ1) = 1/(2m) · Σ_{i=1}^{m} (h(x^(i)) − y^(i))²

where m is the number of training examples.
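To make the definition concrete, here is a minimal NumPy sketch of this cost function; the data values are made up purely for illustration:

```python
import numpy as np

def compute_cost(theta0, theta1, x, y):
    """Squared-error cost J(theta0, theta1) = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(y)                           # number of training examples
    predictions = theta0 + theta1 * x    # h(x) for every example at once
    return np.sum((predictions - y) ** 2) / (2 * m)

# Made-up data that roughly follows y = 2 + 3x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([5.1, 7.9, 11.2, 13.8])
print(compute_cost(2.0, 3.0, x, y))      # small cost near the true line
```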

In the single-variable case, the input x has only one dimension (one attribute), and the regression has just two parameters, θ0 and θ1.


To simplify, first keep only θ1 and drop θ0, so the fitted line is h(x) = θ1·x.

The left figure shows the data points and the fitted straight lines for several values of θ1.

The right figure shows the cost function J(θ1) at those values of θ1.
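The right figure can be reproduced numerically. A small sketch, assuming toy data points that lie exactly on the line y = x (so the minimum should be at θ1 = 1):

```python
import numpy as np

# Simplified model h(x) = theta1 * x; the data lies exactly on y = x,
# so J(theta1) should be minimized (and equal to 0) at theta1 = 1.
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
m = len(y)

for theta1 in [0.0, 0.5, 1.0, 1.5, 2.0]:
    cost = np.sum((theta1 * x - y) ** 2) / (2 * m)
    print(f"theta1 = {theta1:.1f}  ->  J(theta1) = {cost:.3f}")
```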



Cost Function plot:



With both parameters θ0 and θ1, the cost function is a surface over the two-dimensional parameter space. A plot of this kind has the familiar bowl shape (J is convex).


The cost function can also be shown in two dimensions as a contour plot (right side of the figure below); each colored contour line joins (θ0, θ1) pairs with equal cost. Given a point (θ0, θ1) on the contour plot, the left graph shows the corresponding fitted line.
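A contour plot of this kind can be generated directly from the cost function. A sketch using matplotlib, with made-up data:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.5, 4.2, 6.1, 7.8])
m = len(y)

# Evaluate J over a grid of (theta0, theta1) values
t0 = np.linspace(-2, 4, 100)
t1 = np.linspace(0, 4, 100)
T0, T1 = np.meshgrid(t0, t1)
J = np.zeros_like(T0)
for i in range(len(x)):
    J += (T0 + T1 * x[i] - y[i]) ** 2
J /= 2 * m

plt.contour(T0, T1, J, levels=30)   # each line joins points of equal cost
plt.xlabel("theta0")
plt.ylabel("theta1")
plt.title("Contours of J(theta0, theta1)")
plt.show()
```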



Our goal is to minimize the cost function; in the last figure the minimum lies near θ0 = 450, θ1 = 0.12.




(2) Gradient Descent

Gradient descent is the algorithm used to minimize the cost function: the parameters repeatedly move in the direction of steepest descent, iteratively reducing J(θ0, θ1) until it converges.


Each step descends along the direction of the gradient:


Parameter update rule, with the gradient (inside the blue box) and the learning rate (α) marked. In plain notation:

θ_j := θ_j − α · ∂J(θ0, θ1)/∂θ_j   (for j = 0, 1)


The gradient is the slope of the tangent to J at the current point (tan β). When the slope is positive the update decreases θ; when it is negative the update increases θ:



θ0 and θ1 must be updated simultaneously: compute both new values from the old parameters before assigning either. The left side of the figure shows the correct simultaneous update:
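A sketch of this difference in code, using the univariate linear-regression gradients derived later in this lecture (the data and α are made up for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
m, alpha = len(y), 0.1
theta0, theta1 = 0.0, 0.0

# Partial derivatives of J for univariate linear regression
def dJ_dtheta0(t0, t1):
    return np.sum(t0 + t1 * x - y) / m

def dJ_dtheta1(t0, t1):
    return np.sum((t0 + t1 * x - y) * x) / m

# Correct: both gradients are evaluated at the OLD (theta0, theta1),
# then the parameters are assigned together.
temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
theta0, theta1 = temp0, temp1

# Incorrect (do NOT do this): theta0 is overwritten first, so the
# theta1 update mixes old and new parameters:
#   theta0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
#   theta1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
```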



Learning rate:


If α is too small, learning is slow; if α is too large, gradient descent can overshoot the minimum, fail to converge, or even diverge.

If the parameters reach a local minimum, the slope there is 0, so the update leaves them unchanged and gradient descent stays at the minimum.

This figure shows that the step size shrinks on its own as the minimum is approached, even without gradually decreasing α, because the gradient itself gradually shrinks:
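A tiny numerical illustration of this, using the hypothetical one-dimensional cost J(θ) = θ², whose gradient 2θ shrinks as θ approaches the minimum at 0:

```python
# Fixed alpha, yet the step alpha * 2 * theta shrinks automatically
# because the gradient 2 * theta shrinks near the minimum.
alpha, theta = 0.1, 4.0
for i in range(6):
    step = alpha * 2 * theta
    print(f"iter {i}: theta = {theta:.4f}, step = {step:.4f}")
    theta -= step
```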


After taking the derivatives:

∂J/∂θ0 = (1/m) · Σ_{i=1}^{m} (h(x^(i)) − y^(i))
∂J/∂θ1 = (1/m) · Σ_{i=1}^{m} (h(x^(i)) − y^(i)) · x^(i)



From this we get the update rules for linear regression (repeat until convergence):

θ0 := θ0 − α · (1/m) · Σ_{i=1}^{m} (h(x^(i)) − y^(i))
θ1 := θ1 − α · (1/m) · Σ_{i=1}^{m} (h(x^(i)) − y^(i)) · x^(i)


Here x^(i) denotes the i-th training example in the input data X.
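Putting the pieces together, here is a minimal batch-gradient-descent sketch for univariate linear regression; the data is made up to lie near y = 2 + 3x, so the result should approach θ0 ≈ 2, θ1 ≈ 3:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.01, iters=1000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(y)
    theta0 = theta1 = 0.0
    for _ in range(iters):
        error = theta0 + theta1 * x - y   # h(x^(i)) - y^(i) for all i
        grad0 = np.sum(error) / m         # dJ/dtheta0
        grad1 = np.sum(error * x) / m     # dJ/dtheta1
        # Simultaneous update of both parameters
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Made-up data close to y = 2 + 3x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.2, 7.9, 11.1, 13.8, 17.2])
print(gradient_descent(x, y, alpha=0.05, iters=5000))  # approx (2.0, 3.0)
```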




More machine learning materials will be posted; stay tuned to this blog and to Sina Weibo: sophia_qing.

