First we look at a linear regression problem. In the following example, we choose different numbers of features (polynomial terms of increasing degree) to fit our data.
The three figures above can be explained as follows:
In the first (leftmost) figure, a single feature is used to fit the data. The fit is not very good, and the error on some data points is still fairly large.
In the second figure, an extra feature is added on top of the first, and the fit is noticeably better.
At this point one might wonder: is it always better to select more features, and the higher the degree, the better? Not necessarily. In the rightmost figure, a 5th-order polynomial is used so that the curve passes through every data point. It fits the training set very well, but we do not consider it a good hypothesis, because it does not make better predictions on new data.
Based on the analysis above, the second fit is a good hypothesis; the first (leftmost) case is called underfitting, and the rightmost case is called overfitting.
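As a rough sketch of these three situations (assuming a small one-dimensional toy dataset; the data and the use of np.polyfit here are illustrative, not the original example), fitting polynomials of degree 1, 2, and 5 shows the training error shrinking while the highest-degree fit simply interpolates the points:

```python
import numpy as np

# Hypothetical toy data with a roughly quadratic trend plus noise (illustration only).
rng = np.random.default_rng(0)
x = np.linspace(0, 3, 6)
y = 1.0 + 2.0 * x - 0.8 * x ** 2 + rng.normal(scale=0.1, size=x.shape)

for degree in (1, 2, 5):
    coeffs = np.polyfit(x, y, degree)              # least-squares polynomial fit
    sse = np.sum((y - np.polyval(coeffs, x)) ** 2)  # training error
    print(degree, sse)
# Degree 1 underfits, degree 2 captures the trend, and degree 5 (6 coefficients
# for 6 points) passes through every training point yet generalizes poorly.
```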
So the choice of features matters a great deal for the performance of the learning algorithm. We now introduce locally weighted linear regression (LWR), which makes the choice of features less critical for the algorithm.
In ordinary linear regression, to make a prediction at a query point $x$, we usually do the following:
(1) Fit $\theta$ to minimize $\sum_i \big(y^{(i)} - \theta^T x^{(i)}\big)^2$.
(2) Output $\theta^T x$.
For locally weighted linear regression, we instead do:
(1) Fit $\theta$ to minimize $\sum_i w^{(i)} \big(y^{(i)} - \theta^T x^{(i)}\big)^2$.
(2) Output $\theta^T x$.
Here $w^{(i)}$ acts as a weight. From the above we can see that if $w^{(i)}$ is very large, we will try hard to make $\big(y^{(i)} - \theta^T x^{(i)}\big)^2$ small; if $w^{(i)}$ is very small, that error term has little influence on the fit.
A common choice for the weight is:

$$w^{(i)} = \exp\!\left(-\frac{\big(x^{(i)} - x\big)^2}{2\tau^2}\right)$$
In the formula above, $x$ is the feature vector of the new point to be predicted, and the parameter $\tau$ controls how quickly the weight changes with distance. As a function of $x^{(i)}$, the weight is a bell-shaped curve centered at the query point $x$.
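A minimal sketch of this weight function (the names x_train, x_query, and tau, and the numbers, are mine and purely illustrative):

```python
import numpy as np

def lwr_weight(x_train, x_query, tau):
    """w^(i) = exp(-(x^(i) - x)^2 / (2 * tau^2)): bell-shaped, centered at x_query."""
    return np.exp(-(x_train - x_query) ** 2 / (2.0 * tau ** 2))

x_train = np.array([0.9, 1.1, 2.0, 4.0])
for tau in (0.3, 1.0):
    print(tau, lwr_weight(x_train, x_query=1.0, tau=tau))
# Samples near x_query get weights close to 1, distant samples close to 0;
# a smaller tau makes the weights fall off faster with distance.
```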
We can see that:
(1) if $\big|x^{(i)} - x\big|$ is small, then $w^{(i)} \approx 1$;
(2) if $\big|x^{(i)} - x\big|$ is large, then $w^{(i)} \approx 0$.
That is, samples close to the query point $x$ get a weight close to 1, and samples very far away get a weight close to 0. In this way, a linear regression is formed locally: the fit relies only on the points near $x$.
In the figure, the red line is the result of ordinary linear regression and the black line is the result of LWR; the locally weighted fit is visibly better.
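One way such a comparison could be produced is sketched below, solving the weighted least-squares problem in closed form, $\theta = (X^T W X)^{-1} X^T W y$, for each query point (the toy data and the value of tau are assumptions for illustration):

```python
import numpy as np

def lwr_predict(X, y, x_query, tau=0.5):
    """Fit theta for this query by weighted least squares, then return theta^T x_query."""
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2.0 * tau ** 2))
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])          # add intercept term
    W = np.diag(w)
    theta = np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ y)   # (X^T W X)^-1 X^T W y
    return np.hstack([1.0, x_query]) @ theta

# Toy nonlinear data: a single global line (ordinary linear regression) misses the
# curvature, while LWR re-fits a local line around every query point.
rng = np.random.default_rng(1)
X = np.linspace(0, 6, 40).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.normal(size=40)
lwr_curve = np.array([lwr_predict(X, y, xq, tau=0.5) for xq in X])
```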
Note:
The weight $w^{(i)}$ has the same shape as a Gaussian function, but it has nothing to do with the Gaussian distribution ($w^{(i)}$ is not a density). The parameter $\tau$ is the bandwidth (wavelength) parameter; it controls how quickly the weight of a training sample falls off with its distance from the query point, and the smaller $\tau$ is, the faster the weight drops.
Locally weighted regression re-estimates the parameters every time a new sample is predicted, which generally gives better predictions; however, when the data set is large the computational cost is very high and learning is very inefficient. Also, locally weighted regression does not necessarily avoid underfitting.
For the linear regression algorithm, once the parameters $\theta_i$'s are fitted to the training data, we save them; subsequent predictions no longer need the original training set, so it is a parametric learning algorithm.
For the locally weighted linear regression algorithm, all of the training data is required for every prediction (each prediction is made with a different set of parameters $\theta_i$'s); there is no fixed set of parameters, so it is a non-parametric algorithm.
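To make the contrast concrete, here is a small sketch (toy data and variable names are illustrative, not from the original text): after an ordinary least-squares fit, only theta needs to be kept, whereas every LWR prediction must go back to the full training set.

```python
import numpy as np

X = np.linspace(0, 6, 40).reshape(-1, 1)
y = np.sin(X).ravel()
Xb = np.hstack([np.ones((40, 1)), X])

# Parametric: fit once, then predict from theta alone (training data can be discarded).
theta = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
print(np.array([1.0, 3.2]) @ theta)

# Non-parametric: every query re-weights all of (X, y) and re-solves for a new theta.
def lwr_predict(x_query, tau=0.5):
    w = np.exp(-((X.ravel() - x_query) ** 2) / (2.0 * tau ** 2))
    W = np.diag(w)
    th = np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ y)
    return np.array([1.0, x_query]) @ th

print(lwr_predict(3.2))
```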