Supervised Learning
Learn a function h: X → Y.
h is called a hypothesis.
1. Linear Regression
In this example, x is a two-dimensional vector: x1 represents the living area and x2 the number of bedrooms.
We approximate y as a linear function of x, choosing functions/hypotheses h of the form h(x) = θ0 + θ1x1 + θ2x2.
Setting x0 = 1 (the intercept term), this can be written compactly as h(x) = Σ(i=0..n) θi xi = θᵀx.
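For concreteness, here is a minimal NumPy sketch of this hypothesis; the parameter values and the input below are purely illustrative, not taken from the notes.

```python
import numpy as np

def h(theta, x):
    """Hypothesis h(x) = theta^T x, with the intercept term x0 = 1 prepended."""
    x = np.concatenate(([1.0], x))  # set x0 = 1
    return theta @ x

# Illustrative parameters [theta0, theta1, theta2] and a house with
# living area x1 = 2104 and x2 = 3 bedrooms.
theta = np.array([50.0, 0.1, 20.0])
print(h(theta, np.array([2104.0, 3.0])))  # predicted value
```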
Now, given a training set, how do we pick, or learn, the parameters θ?
One reasonable method is to make h(x) close to y, at least for the training examples we have.
We define the cost function: J(θ) = (1/2) Σ(i=1..m) (h(x^(i)) − y^(i))².
Our goal is to minimize the value of this function.
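A sketch of this cost function in NumPy, assuming X is an m×(n+1) design matrix whose first column is all ones (the x0 = 1 convention above) and y is the length-m vector of targets:

```python
import numpy as np

def J(theta, X, y):
    """Cost J(theta) = (1/2) * sum over i of (h(x^(i)) - y^(i))^2."""
    residuals = X @ theta - y      # h(x^(i)) - y^(i) for every i
    return 0.5 * residuals @ residuals
```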
1. LMS Algorithm (Least Mean Squares)
We want to choose θ so as to minimize J(θ).
Gradient Descent Algorithm
Start with an initial guess for θ, then repeatedly perform the update θj := θj − α ∂J(θ)/∂θj (simultaneously for every j).
α is called the learning rate.
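As a sketch, a single gradient descent step can be written as follows; grad_J, a function returning the gradient of J at theta, is an assumed helper, not something defined in the notes.

```python
def gradient_descent_step(theta, grad_J, alpha):
    """One update: theta_j := theta_j - alpha * dJ/dtheta_j, for all j at once."""
    return theta - alpha * grad_J(theta)
```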
LMS Update Rule
For a single training example (x^(i), y^(i)), the update works out to θj := θj + α(y^(i) − h(x^(i)))xj^(i).
When each update uses every example in the entire training set, this is called batch gradient descent.
Algorithm:
Repeat until convergence: θj := θj + α Σ(i=1..m) (y^(i) − h(x^(i)))xj^(i), for every j.
On each loop, every θj is updated using all m training examples i = 1, 2, ..., m, where m is the number of examples in the training set.
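A minimal NumPy sketch of batch gradient descent under the same design-matrix assumptions as above; the function name, step size, and iteration count are illustrative and would need tuning in practice.

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=1e-7, num_iters=1000):
    """Batch gradient descent for linear regression.

    Each iteration updates every theta_j using all m examples:
        theta_j := theta_j + alpha * sum_i (y^(i) - h(x^(i))) * xj^(i)
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        errors = y - X @ theta            # y^(i) - h(x^(i)) for all i
        theta = theta + alpha * (X.T @ errors)
    return theta
```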
If m is large, every step of this algorithm is expensive and training is slow. An alternative is stochastic gradient descent, which updates the parameters using one training example at a time.
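A corresponding sketch of stochastic gradient descent, which applies the LMS update to one example at a time; shuffling the example order each pass is a common choice, assumed here rather than prescribed by the notes.

```python
import numpy as np

def stochastic_gradient_descent(X, y, alpha=1e-7, num_epochs=10):
    """Stochastic gradient descent: one LMS update per training example."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_epochs):
        for i in np.random.permutation(m):
            error = y[i] - X[i] @ theta           # scalar residual for example i
            theta = theta + alpha * error * X[i]  # theta_j += alpha * error * xj^(i)
    return theta
```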