2.3 Two simple prediction methods: least squares and nearest neighbors
In this section we discuss two simple but powerful prediction methods: the linear model fit by least squares, and the k-nearest-neighbor prediction rule. The linear model makes strong assumptions about structure and yields stable but possibly inaccurate predictions; k-nearest neighbors makes very mild structural assumptions, so its predictions are often accurate but can be unstable.
2.3.1 Linear models and least squares
The linear model has been a mainstay of statistics for many years, and it remains one of our most important tools. Given a vector of inputs $X = (X_1, X_2, \ldots, X_p)^T$, we predict the output $Y$ via the model

$$\hat{Y} = \hat\beta_0 + \sum_{j=1}^{p} X_j \hat\beta_j,$$
where $\hat\beta_0$ is the intercept. It is often convenient to include the constant variable 1 in $X$ and to include $\hat\beta_0$ in the vector of coefficients $\hat\beta$; the linear model can then be written in vector form as an inner product:

$$\hat{Y} = X^T\hat\beta,$$
where $X^T$ denotes the transpose of $X$. Here we model a single output, so $\hat{Y}$ is a scalar; in general $\hat{Y}$ can be a $K$-vector, in which case $\beta$ is a $p \times K$ matrix of coefficients. In the $(p+1)$-dimensional input-output space, $(X, \hat{Y})$ represents a hyperplane. If the constant is included in $X$, the hyperplane passes through the origin and is a subspace; if not, it is an affine set cutting the $Y$-axis at the point $(0, \hat\beta_0)$. From now on we assume that the intercept is included in $\hat\beta$.
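As a small illustration (with made-up numbers, not from the text), the inner-product form of the prediction can be computed directly once the constant 1 is prepended to the input vector:

```python
import numpy as np

# Made-up example: p = 2 inputs plus the constant 1 prepended,
# so the intercept beta_0 is absorbed into the coefficient vector.
beta_hat = np.array([0.5, 2.0, -1.0])   # (beta_0, beta_1, beta_2)
x = np.array([1.0, 3.0, 4.0])           # constant 1 followed by X_1, X_2

# Inner-product form: Y_hat = x^T beta_hat
y_hat = x @ beta_hat
print(y_hat)  # 0.5 + 2.0*3.0 - 1.0*4.0 = 2.5
```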
Viewed as a function over the $p$-dimensional input space, $f(X) = X^T\beta$ is linear, and the gradient $f'(X) = \beta$ is a vector in input space that points in the direction of steepest ascent.
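A quick numerical check of this claim (a sketch with made-up numbers): for a linear function the finite-difference gradient at any point equals the constant vector $\beta$.

```python
import numpy as np

# Central-difference gradient of f(x) = x^T beta at an arbitrary point.
# For a linear function this should recover beta itself.
beta = np.array([2.0, -1.0, 0.5])
f = lambda x: x @ beta

x0 = np.array([1.0, 2.0, 3.0])
eps = 1e-6
grad = np.array([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
                 for e in np.eye(3)])
print(np.allclose(grad, beta))  # the numerical gradient matches beta
```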
How do we fit the linear model to a set of training data? There are many different methods, but by far the most popular is the method of least squares. In this approach, we pick the coefficients $\beta$ to minimize the residual sum of squares:

$$\mathrm{RSS}(\beta) = \sum_{i=1}^{N} (y_i - x_i^T\beta)^2.$$
$\mathrm{RSS}(\beta)$ is a quadratic function of the parameters, so its minimum always exists, but it may not be unique. The solution is easiest to characterize in matrix notation; we can write

$$\mathrm{RSS}(\beta) = (\mathbf{y} - \mathbf{X}\beta)^T(\mathbf{y} - \mathbf{X}\beta),$$
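The two ways of writing the residual sum of squares, as a sum over observations and in matrix form, are of course equal; a sketch on made-up data verifies this:

```python
import numpy as np

# Illustrative check (random made-up data) that the elementwise and
# matrix forms of RSS(beta) agree.
rng = np.random.default_rng(0)
N, p = 10, 3
X = rng.normal(size=(N, p))
y = rng.normal(size=N)
beta = rng.normal(size=p)

rss_sum = sum((y[i] - X[i] @ beta) ** 2 for i in range(N))
rss_mat = (y - X @ beta) @ (y - X @ beta)
print(np.isclose(rss_sum, rss_mat))  # True
```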
where $\mathbf{X}$ is an $N \times p$ matrix with each row an input vector, and $\mathbf{y}$ is the $N$-vector of outputs from the training set. Differentiating with respect to $\beta$, we obtain the normal equations:

$$\mathbf{X}^T(\mathbf{y} - \mathbf{X}\beta) = 0.$$
If $\mathbf{X}^T\mathbf{X}$ is nonsingular, then the unique solution is given by

$$\hat\beta = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y},$$
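As a minimal sketch on made-up data, the solution can be obtained by solving the normal equations $\mathbf{X}^T\mathbf{X}\beta = \mathbf{X}^T\mathbf{y}$ directly; in practice one solves the linear system rather than forming the inverse explicitly, which is the numerically preferred route:

```python
import numpy as np

# Least-squares fit on random made-up data by solving the normal
# equations X^T X beta = X^T y (no explicit matrix inverse).
rng = np.random.default_rng(1)
N, p = 50, 4
X = rng.normal(size=(N, p))
y = rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Agrees with NumPy's dedicated least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # True
```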
and the fitted value at the $i$th input $x_i$ is $\hat{y}_i = \hat{y}(x_i) = x_i^T\hat\beta$. At an arbitrary input $x_0$, the prediction is $\hat{y}(x_0) = x_0^T\hat\beta$. The entire fitted surface is characterized by just the $p$ parameters $\hat\beta$. Intuitively, it seems that we do not need a very large data set to fit such a model.
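Putting the pieces together on made-up data: the fitted values at the training inputs, a check that the residuals satisfy the normal equations (they are orthogonal to every column of $\mathbf{X}$), and a prediction at an arbitrary new input:

```python
import numpy as np

# Fit, fitted values, and prediction at a new point (random made-up data).
rng = np.random.default_rng(2)
N, p = 30, 3
X = rng.normal(size=(N, p))
y = rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_fit = X @ beta_hat      # fitted value x_i^T beta_hat at each training input

# The normal equations say the residuals are orthogonal to each column of X.
print(np.allclose(X.T @ (y - y_fit), 0.0))  # True

x0 = rng.normal(size=p)   # an arbitrary new input
print(x0 @ beta_hat)      # the prediction y_hat(x0) = x0^T beta_hat
```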
The Elements of Statistical Learning (Second Edition), 2.3 Two simple prediction methods: least squares and nearest neighbors