This topic (Machine Learning) covers single-parameter linear regression, multi-parameter linear regression, an Octave tutorial, logistic regression, regularization, neural networks, machine learning system design, SVM (Support Vector Machines), clustering, dimensionality reduction, anomaly detection, and large-scale machine learning. All content comes from the Stanford open course Machine Learning (https://class.coursera.org/ml/class/index).
Chapter 1 ------- Single-Parameter Linear Regression (Linear Regression with One Variable)
(1) Cost Function
Linear regression fits a straight line to a series of data points. Assume the fitted line is h(x) = theta0 + theta1 * x; the cost function is J(theta0, theta1) = 1/(2m) * Σ (h(x^(i)) - y^(i))^2, the average squared error over the m training examples.
In the single-parameter case there is only one input variable x, i.e., the input has a single one-dimensional attribute; the parameters that determine the regression are theta0 and theta1.
In the simplified model there is only theta1 and no theta0, so the fitted line is h(x) = theta1 * x.
The left figure shows the fitted lines and data points for given values of theta1; the right figure shows the cost function J(theta1) for those different values of theta1.
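The shape of J(theta1) in the right figure can be reproduced numerically. Below is a minimal sketch; the dataset is hypothetical (the course's actual data is not given here), chosen so that the minimum lands at theta1 = 1.

```python
# Simplified single-parameter model h(x) = theta1 * x with squared-error cost.
def cost(theta1, xs, ys):
    """J(theta1) = 1/(2m) * sum over i of (theta1 * x_i - y_i)^2."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Illustrative data lying exactly on y = x, so J(1.0) = 0
# and the cost grows symmetrically on either side of theta1 = 1.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]
for t in [0.0, 0.5, 1.0, 1.5, 2.0]:
    print(f"theta1 = {t}: J = {cost(t, xs, ys):.4f}")
```

Evaluating the cost over a grid of theta1 values like this is exactly how the bowl-shaped curve in the figure is drawn.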
Cost Function plot:
When both parameters theta0 and theta1 are present, the cost function is a surface in three dimensions. A function with this shape is called a bowl-shaped (convex) function.
Mapping the cost function into two dimensions gives the contour lines of different colors on the right of the picture below. Given a (theta0, theta1) pair, the left graph shows the corresponding fitted line and the right graph marks its position on the contour plot.
Our goal is to minimize the cost function; in the last graph, the minimum is reached near theta0 = 450, theta1 = 0.12.
(2) Gradient Descent
Gradient descent is used to minimize the cost function: the parameters move along the direction of steepest descent, iteratively reducing J(theta0, theta1) until a steady state (convergence) is reached.
Each step moves in the direction of the negative gradient:
Parameter update formula, with the gradient (inside the blue box) and the learning rate (α) marked: theta_j := theta_j - α * ∂J(theta0, theta1)/∂theta_j.
The gradient is the slope of the tangent to J at the current point, tan β. The two cases below show a positive slope and a negative slope, respectively: a positive slope moves theta left (decreases it), a negative slope moves theta right (increases it), so both move toward the minimum.
Theta0 and theta1 must be updated simultaneously: compute both gradients from the current parameters before changing either one. The left side shows the correct (simultaneous) update:
Learning rate:
If α is too small, learning is slow; if α is too large, gradient descent can overshoot the minimum and may fail to converge, or even diverge.
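The effect of α can be seen on a toy one-dimensional cost J(theta) = theta^2, whose gradient is 2 * theta. This is a sketch with illustrative α values, not values from the course:

```python
# Gradient descent on J(theta) = theta^2; each step is
# theta := theta - alpha * dJ/dtheta, with dJ/dtheta = 2 * theta.
def descend(theta, alpha, steps):
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta

small = descend(1.0, 0.01, 10)  # small alpha: creeps slowly toward 0
good = descend(1.0, 0.1, 10)    # moderate alpha: converges much faster
big = descend(1.0, 1.5, 10)     # large alpha: overshoots, |theta| grows
print(small, good, big)
```

With α = 1.5 each step multiplies theta by (1 - 2α) = -2, so the iterates alternate sign and explode, which is exactly the divergence the lecture warns about.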
If the parameters fall into a local minimum, the slope there is 0, so they will no longer move left or right.
This figure shows that the step size shrinks gradually even without decreasing α, because the gradient itself becomes smaller as the minimum is approached:
Taking the partial derivatives of J(theta0, theta1) = 1/(2m) * Σ (h(x^(i)) - y^(i))^2 gives:

∂J/∂theta0 = (1/m) * Σ (h(x^(i)) - y^(i))
∂J/∂theta1 = (1/m) * Σ (h(x^(i)) - y^(i)) * x^(i)

From this we get the update rules:

theta0 := theta0 - α * (1/m) * Σ (h(x^(i)) - y^(i))
theta1 := theta1 - α * (1/m) * Σ (h(x^(i)) - y^(i)) * x^(i)

Here x^(i) denotes the i-th example in the input data X.
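The full algorithm for the two-parameter model can be sketched as follows. The dataset and hyperparameters are illustrative (not from the course); the data lies on y = 2x + 1, so the fit should approach theta0 = 1, theta1 = 2.

```python
# Batch gradient descent for h(x) = theta0 + theta1 * x,
# with simultaneous parameter updates as the lecture requires.
def gradient_descent(xs, ys, alpha=0.1, iterations=1000):
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Errors h(x^(i)) - y^(i) computed from the *current* parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients are fixed before either
        # parameter changes.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

t0, t1 = gradient_descent([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(round(t0, 3), round(t1, 3))
```

The tuple assignment on the update line is what makes the update simultaneous; updating theta0 first and then using it inside theta1's gradient would be the incorrect sequential variant shown on the right side of the earlier figure.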
More machine learning materials will be posted; stay tuned to this blog and to Sina Weibo: sophia_qing.