1. Iterative Solution to Linear Fitting
The gradient descent method moves the parameters a small step in the direction of the negative gradient at each iteration.
There are two specific implementations. The first is batch gradient descent: every update considers all of the training points, which has a large per-step overhead (all m training points are scanned on each update).
The other is stochastic gradient descent. As each training point is scanned, the parameters are adjusted according to the gradient at that single point; that is, $\theta_j := \theta_j + \alpha\,(y^{(i)} - h_\theta(x^{(i)}))\,x_j^{(i)}$. Each parameter update takes only the current training point into account. This converges faster, but it is not guaranteed to converge to the optimum; however, if the learning rate $\alpha$ is gradually reduced, it can converge to the optimum.
Personally, I think gradient descent depends on the starting position and in general ends at a local optimum (for linear least squares, though, the cost function is convex, so any local optimum is also global).
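As a concrete illustration, here is a minimal NumPy sketch of both variants on a toy 1-D problem; the data, learning rate, and decay schedule are my own illustrative choices, not from the course materials:

```python
import numpy as np

# Toy data: y = 2x + 1 plus noise; X has a bias column.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=50)
y = 2 * x + 1 + 0.1 * rng.standard_normal(50)
X = np.column_stack([np.ones_like(x), x])  # m x 2 design matrix

alpha = 0.1  # learning rate

# Batch gradient descent: every update scans all m training points.
theta = np.zeros(2)
for _ in range(1000):
    grad = X.T @ (X @ theta - y) / len(y)  # gradient of (1/2m)||X theta - y||^2
    theta -= alpha * grad

# Stochastic gradient descent: each update uses one training point,
# with a gradually decreasing learning rate so it can still converge.
theta_sgd = np.zeros(2)
t = 0
for epoch in range(100):
    for i in rng.permutation(len(y)):
        t += 1
        lr = alpha / (1 + 0.01 * t)  # decaying step size
        theta_sgd += lr * (y[i] - X[i] @ theta_sgd) * X[i]

print(theta, theta_sgd)  # both should be close to [1, 2]
```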
2. Matrix Derivatives and the Trace
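For reference, these are the standard trace identities that the derivation in the next section relies on (a reconstruction of the usual list, e.g. as in the CS229 notes):

```latex
\operatorname{tr} a = a \;\;(a \in \mathbb{R}), \qquad
\operatorname{tr} AB = \operatorname{tr} BA, \qquad
\operatorname{tr} A = \operatorname{tr} A^{T}

\nabla_A \operatorname{tr} AB = B^{T}, \qquad
\nabla_{A^{T}} f(A) = \bigl(\nabla_A f(A)\bigr)^{T}, \qquad
\nabla_A \operatorname{tr} ABA^{T}C = CAB + C^{T}AB^{T}
```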
3. Least Squares
Here the courseware uses the definition of the matrix derivative (applied to a vector in this case) and the trace properties of matrices to set the gradient of the cost to zero. The resulting closed-form solution, $\theta = (X^{T}X)^{-1}X^{T}y$, shows that the least squares answer is consistent with the vector-projection view.
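As a quick check, a minimal NumPy sketch of the closed-form solution (the toy data is my own; solving the linear system is preferred over forming the inverse explicitly):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=50)
y = 2 * x + 1 + 0.1 * rng.standard_normal(50)
X = np.column_stack([np.ones_like(x), x])

# Normal equations: theta = (X^T X)^{-1} X^T y.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # close to [1, 2]
```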
4. Probability View
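A sketch of the standard probabilistic argument: assuming Gaussian noise on a linear model, maximizing the likelihood reduces to least squares:

```latex
y^{(i)} = \theta^{T}x^{(i)} + \epsilon^{(i)}, \quad \epsilon^{(i)} \sim \mathcal{N}(0, \sigma^{2})
\;\Rightarrow\;
\ell(\theta) = \sum_{i=1}^{m} \log p(y^{(i)} \mid x^{(i)}; \theta)
             = m \log\frac{1}{\sqrt{2\pi}\,\sigma}
               - \frac{1}{2\sigma^{2}} \sum_{i=1}^{m} \bigl(y^{(i)} - \theta^{T}x^{(i)}\bigr)^{2}
```

so maximizing $\ell(\theta)$ is exactly minimizing the least-squares cost $J(\theta) = \frac{1}{2}\sum_i (y^{(i)} - \theta^{T}x^{(i)})^2$.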
5. Locally Weighted Linear Regression
That is to say, when predicting y = f(x), we should give priority to the training points near the query point x: they receive a higher weight, while the influence coefficient of training points far from x is smaller.
The steps of the optimization algorithm mentioned above are as follows: fit $\theta$ to minimize $\sum_i (y^{(i)} - \theta^{T}x^{(i)})^2$, then output $\theta^{T}x$.

For locally weighted linear regression, instead fit $\theta$ to minimize $\sum_i w^{(i)} (y^{(i)} - \theta^{T}x^{(i)})^2$, with weights such as $w^{(i)} = \exp\!\left(-\frac{(x^{(i)} - x)^2}{2\tau^2}\right)$. In this way, the points closest to x have the largest influence factor.
Local weighting should fit better, but for ordinary linear regression we can compute the parameters offline once, so no training data needs to be loaded at prediction time. Local weighting, however, requires the training data to be loaded every time y = f(x) is computed, for each different query x. Locally weighted linear regression is therefore a non-parametric method.
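A minimal NumPy sketch of a single locally weighted prediction; the Gaussian weights follow the standard formulation above, while the toy data and bandwidth $\tau$ are my own illustrative choices:

```python
import numpy as np

def lwr_predict(x_query, X, y, tau=0.1):
    """Locally weighted linear regression prediction at one query point."""
    # Gaussian weights based on distance from the query point.
    d = X[:, 1] - x_query
    w = np.exp(-d ** 2 / (2 * tau ** 2))
    W = np.diag(w)
    # Weighted normal equations: theta = (X^T W X)^{-1} X^T W y,
    # solved fresh for every query (hence "non-parametric").
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return np.array([1.0, x_query]) @ theta

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, size=100))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(100)
X = np.column_stack([np.ones_like(x), x])
print(lwr_predict(0.25, X, y))  # near sin(pi/2) = 1
```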
6. Classification and Logistic Regression
Assume that the classification target is $y \in \{0, 1\}$. Classification is predicted with the logistic hypothesis $h_\theta(x) = g(\theta^{T}x) = \frac{1}{1 + e^{-\theta^{T}x}}$.
The gradient ascent update rule: note that the form is the same as for linear regression, $\theta_j := \theta_j + \alpha\,(y^{(i)} - h_\theta(x^{(i)}))\,x_j^{(i)}$, but the hypothesis is not a linear function here because of $g(z)$.
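A minimal NumPy sketch of logistic regression trained by batch gradient ascent on toy data (the data and hyperparameters are my own choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Toy binary data: label is 1 when x > 0.5 (with some noise).
x = rng.uniform(0, 1, size=200)
y = (x + 0.1 * rng.standard_normal(200) > 0.5).astype(float)
X = np.column_stack([np.ones_like(x), x])

theta = np.zeros(2)
alpha = 0.5
# Gradient ascent on the log-likelihood; the update has the same
# form as linear regression, but h = g(theta^T x) is nonlinear.
for _ in range(2000):
    h = sigmoid(X @ theta)
    theta += alpha * X.T @ (y - h) / len(y)

print(theta)  # decision boundary near x = -theta[0] / theta[1], about 0.5
```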
7. Perceptron Learning Method
Unlike the logistic g(z) above, whose output takes values between 0 and 1, here we let g(z) output only 0 or 1.
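Reconstructing the missing formulas from the standard notes, the perceptron uses a threshold function and the same-shaped update rule:

```latex
g(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ 0 & \text{if } z < 0 \end{cases},
\qquad
\theta_j := \theta_j + \alpha\,\bigl(y^{(i)} - h_\theta(x^{(i)})\bigr)\,x_j^{(i)}
```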
8. Newton's Method
First we demonstrate how Newton's method solves $f(\theta) = 0$: iterate $\theta := \theta - \frac{f(\theta)}{f'(\theta)}$.
This then corresponds to our optimization problem: maximizing the objective $\ell(\theta)$ means solving $\ell'(\theta) = 0$, so the update becomes $\theta := \theta - \frac{\ell'(\theta)}{\ell''(\theta)}$.
In the multidimensional Newton-Raphson case, $\theta$ is a vector and the update is $\theta := \theta - H^{-1}\nabla_\theta \ell(\theta)$, where H is the Hessian matrix of second partial derivatives.
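A minimal scalar sketch of the iteration; to maximize $\ell$ one applies it to $f = \ell'$, and in the vector case the division by $f'$ becomes multiplication by $H^{-1}$:

```python
def newton(f, fprime, x0, tol=1e-10, max_iter=50):
    """Scalar Newton's method for f(x) = 0: x := x - f(x) / f'(x)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:  # stop once the update is negligible
            break
    return x

# Example: solve x^2 - 2 = 0, i.e. compute sqrt(2).
print(newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0))  # 1.4142135623...
```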
9. Generalized Linear Models and the Exponential Family
A member of the exponential family can be written in the following form: $p(y; \eta) = b(y)\,\exp\bigl(\eta^{T}T(y) - a(\eta)\bigr)$.
Consider the Bernoulli model corresponding to the logistic regression above.
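Writing the Bernoulli distribution in exponential-family form (the standard manipulation):

```latex
p(y; \phi) = \phi^{y}(1 - \phi)^{1-y}
           = \exp\!\Bigl( y \log\tfrac{\phi}{1-\phi} + \log(1-\phi) \Bigr)
```

so $\eta = \log\frac{\phi}{1-\phi}$, $T(y) = y$, $a(\eta) = \log(1 + e^{\eta})$, $b(y) = 1$; inverting $\eta$ gives $\phi = \frac{1}{1 + e^{-\eta}}$, exactly the sigmoid used by logistic regression.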
Note: for more information, see Speech and Language Processing, p. 231.
If we use a linear model to fit the probability directly, prediction is clearly unsuitable, because the value on the right-hand side can be any real number, while the value on the left-hand side lies in [0, 1].
We can instead consider using the linear model to predict the odds, $\frac{p}{1-p}$.
But the left-hand side still lies only in $[0, \infty)$, so we take the log of the left-hand side, giving the log odds (logit), whose range is $(-\infty, \infty)$; this matches the linear model and is consistent with the logistic regression above.
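In summary, the chain of transformations and its inversion:

```latex
p \in [0, 1]
\;\xrightarrow{\;p/(1-p)\;}\;
[0, \infty)
\;\xrightarrow{\;\log\;}\;
(-\infty, \infty),
\qquad
\log\frac{p}{1-p} = \theta^{T}x
\;\Longleftrightarrow\;
p = \frac{1}{1 + e^{-\theta^{T}x}}
```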