As discussed in the previous article, the central task in supervised learning is to determine the hypothesis function $h_\theta(x)$, which we do by finding the parameters $\theta$ that minimize the cost function $J(\theta)$. The previous article minimized $J(\theta)$ with gradient descent; in this article we derive the solution using matrices.
1. Ordinary Least Squares
Using matrices, the $m$ training examples $(x^{(i)}, y^{(i)})$ can be stacked into a design matrix $X$ and a target vector $\vec{y}$:

$$X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}, \qquad \vec{y} = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$

Since $h_\theta(x^{(i)}) = (x^{(i)})^T \theta$, we have

$$X\theta - \vec{y} = \begin{bmatrix} h_\theta(x^{(1)}) - y^{(1)} \\ \vdots \\ h_\theta(x^{(m)}) - y^{(m)} \end{bmatrix}$$

so the cost function can be written as

$$J(\theta) = \frac{1}{2}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)^2 = \frac{1}{2}(X\theta - \vec{y})^T(X\theta - \vec{y})$$

To minimize $J(\theta)$, take the gradient with respect to $\theta$ and set it to zero:

$$\nabla_\theta J(\theta) = X^T X\theta - X^T\vec{y} = 0 \quad\Longrightarrow\quad \theta = (X^T X)^{-1} X^T \vec{y} \qquad \text{(Formula 1)}$$

As (Formula 1) shows, the solution requires inverting the matrix $X^T X$, so it applies only when that inverse exists. This is ordinary least squares.
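A minimal sketch of (Formula 1) in code, using synthetic data (the variable names and data are illustrative, not from the article):

```python
import numpy as np

# Synthetic training set: m examples, 2 features plus an intercept column.
rng = np.random.default_rng(0)
m = 100
x = rng.uniform(-1, 1, size=(m, 2))
X = np.hstack([np.ones((m, 1)), x])          # design matrix with x_0 = 1
true_theta = np.array([2.0, -3.0, 0.5])
y = X @ true_theta + rng.normal(0, 0.01, m)  # targets with a little noise

# Normal equation (Formula 1): theta = (X^T X)^{-1} X^T y.
# np.linalg.solve is used instead of forming the inverse explicitly,
# which is numerically safer; it still requires X^T X to be invertible,
# as the article notes.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # close to [2.0, -3.0, 0.5]
```

Solving the linear system rather than computing `inv(X.T @ X)` is the standard practice; both fail in the same way when $X^T X$ is singular.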
2. Locally Weighted Linear Regression (LWLR)
Ordinary least-squares linear regression is prone to underfitting, because it seeks the minimum-variance unbiased estimate over the whole data set. One remedy is the locally weighted linear regression algorithm. Here we assign a weight to each training point near the point to be predicted, and then perform an ordinary least-squares regression on this weighted subset. The closer a training point lies to the prediction point, the larger its weight; in other words, points near the query point receive higher weights. The most common weighting scheme is the Gaussian kernel, whose weights are:

$$w^{(i)} = \exp\!\left(-\frac{(x^{(i)} - x)^2}{2\tau^2}\right) \qquad \text{(Formula 2)}$$

In (Formula 2), the only quantity we need to choose is the bandwidth $\tau$, a user-specified parameter that determines how quickly the weight falls off with distance, i.e. how much weight is given to nearby points.
The fitted parameters then minimize the weighted cost

$$\sum_{i=1}^{m} w^{(i)}\big(y^{(i)} - \theta^T x^{(i)}\big)^2,$$

whose closed-form solution is

$$\theta = (X^T W X)^{-1} X^T W \vec{y} \qquad \text{(Formula 3)}$$

where $W$ is the diagonal matrix with $W_{ii} = w^{(i)}$. As (Formula 3) shows, a fresh fit must be computed for every query point, using the entire training set each time, so locally weighted linear regression is a non-parametric algorithm.
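A minimal sketch of LWLR with the Gaussian kernel of (Formula 2), on synthetic one-feature data (the function names, data, and bandwidth value are my choices, not from the article):

```python
import numpy as np

def lwlr_predict(x_query, X, y, tau=0.3):
    """Predict at x_query by fitting a weighted regression around it.

    X is the design matrix (first column all ones), y the targets,
    tau the user-chosen bandwidth from (Formula 2).
    """
    # Gaussian kernel weights: points close to x_query get weights near 1.
    diff = X[:, 1] - x_query
    w = np.exp(-diff**2 / (2 * tau**2))
    W = np.diag(w)
    # Weighted fit: theta = (X^T W X)^{-1} X^T W y, the closed form of
    # the weighted least-squares problem.
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return np.array([1.0, x_query]) @ theta

# Synthetic data: y = sin(x) + noise, which a single global line underfits.
rng = np.random.default_rng(0)
xs = np.sort(rng.uniform(0, 6, 200))
ys = np.sin(xs) + rng.normal(0, 0.05, xs.size)
X = np.column_stack([np.ones_like(xs), xs])

print(lwlr_predict(1.5, X, ys))  # close to sin(1.5)
```

Note that every call to `lwlr_predict` re-solves a regression over the full training set, which is exactly why the method is non-parametric: the training data must be kept around at prediction time.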
Copyright notice: this is an original article by the blog author; please do not reproduce it without the author's permission.
Stanford "Machine Learning" Lessons 1-3 Notes: 3. Linear Regression (Part 2)