The least squares loss function and its related background were introduced in linear regression. Readers unclear on that material can refer to the previous article, "Linear regression, gradient descent." This article focuses on using the least squares method to construct the loss function and on minimizing that loss function.
Constructing the loss function with the least squares method
The least squares method is an optimization approach used to find the parameter values that optimize an objective function. Put simply: minimize the total fitting error, i.e. the sum of squared residuals, between the predicted values and the true values.
In linear regression, the loss function constructed by the least squares method is:

J(θ) = (1/2) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2

where h_θ(x) is the model's prediction, m is the number of training samples, and (x^(i), y^(i)) is the i-th sample.
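As a sketch of this loss, here is a small NumPy implementation (the housing data below is made up for illustration):

```python
import numpy as np

def least_squares_loss(theta, X, y):
    """J(theta) = 1/2 * sum((X @ theta - y)^2), the least squares loss."""
    residuals = X @ theta - y  # h_theta(x^(i)) - y^(i) for each sample
    return 0.5 * np.sum(residuals ** 2)

# Toy data: 3 samples, 2 features (intercept term and housing area).
X = np.array([[1.0, 50.0],
              [1.0, 80.0],
              [1.0, 120.0]])
y = np.array([150.0, 230.0, 350.0])
theta = np.array([10.0, 2.8])

print(least_squares_loss(theta, X, y))  # → 16.0
```

The residuals here are 0, 4, and −4, so J(θ) = (1/2)(0 + 16 + 16) = 16.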
The previous article, "Linear regression, gradient descent," mentioned two methods for finding the θ that minimizes the loss function J(θ): gradient descent and the normal equations. The following mainly covers the normal equations; for minimizing the loss function with gradient descent, see "Linear regression, gradient descent."
The normal equations
The training features are collected into a matrix X and the targets into a vector y; the linear regression model and the loss function are unchanged. Then θ can be obtained directly from the following formula:

θ = (X^T X)^{-1} X^T y
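A minimal sketch of this closed-form solution in NumPy (the data is made up; `np.linalg.solve` is used on X^T X θ = X^T y rather than forming the inverse explicitly, which is the numerically preferred route):

```python
import numpy as np

def normal_equation(X, y):
    """Solve X^T X theta = X^T y for theta (the closed-form least squares fit)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Toy data: a column of ones for the intercept plus one feature.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])

theta = normal_equation(X, y)
print(theta)  # the data lies on y = 1 + x, so theta ≈ [1, 1]
```

Note this assumes X^T X is invertible, which holds when the feature columns are linearly independent.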
The derivation involves some linear algebra, which is not expanded in full detail here.
Let m be the number of training samples. x^(i) is the feature vector of the i-th sample (the housing area in the house-price example), an n-dimensional vector; y^(i) is the corresponding house price in the training data, and y is an m-dimensional vector. The training data can then be expressed in matrix form as:

X = [ (x^(1))^T ]        y = [ y^(1) ]
    [ (x^(2))^T ]            [ y^(2) ]
    [    ...    ]            [  ...  ]
    [ (x^(m))^T ]            [ y^(m) ]
Because h_θ(x^(i)) = (x^(i))^T θ, the residual vector can be expressed as:

Xθ − y = [ h_θ(x^(1)) − y^(1) ]
         [         ...        ]
         [ h_θ(x^(m)) − y^(m) ]
Using the fact that z^T z = Σ_i z_i^2 for any vector z, the loss function is transformed into:

J(θ) = (1/2)(Xθ − y)^T (Xθ − y) = (1/2) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2
There are two formulas in linear algebra (tr denotes the trace of a matrix):

∇_A tr(AB) = B^T
∇_{A^T} f(A) = (∇_A f(A))^T

Here the symbol ∇_A f(A) denotes the gradient of f with respect to an m×n matrix A: it is itself an m×n matrix whose (i, j) element is ∂f/∂A_ij. The two formulas above can be combined into:

∇_{A^T} tr(ABA^T C) = B^T A^T C^T + BA^T C
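These matrix-gradient identities can be checked numerically. Below is a sketch (random matrices with assumed shapes) verifying ∇_A tr(AB) = B^T by comparing against a finite-difference gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, m))  # AB is m x m, so tr(AB) is defined

def f(A):
    return np.trace(A @ B)

# Finite-difference estimate of the gradient of f w.r.t. each entry A[i, j].
eps = 1e-6
grad = np.zeros_like(A)
for i in range(m):
    for j in range(n):
        A_plus = A.copy();  A_plus[i, j] += eps
        A_minus = A.copy(); A_minus[i, j] -= eps
        grad[i, j] = (f(A_plus) - f(A_minus)) / (2 * eps)

# The identity says the gradient equals B^T.
print(np.allclose(grad, B.T, atol=1e-5))  # → True
```

Because tr(AB) is linear in A, the finite-difference estimate matches B^T up to rounding error.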
Based on these formulas, the gradient of J(θ) is derived:

∇_θ J(θ) = ∇_θ (1/2)(Xθ − y)^T (Xθ − y)
         = (1/2) ∇_θ (θ^T X^T X θ − θ^T X^T y − y^T X θ + y^T y)
         = (1/2) (2 X^T X θ − 2 X^T y)
         = X^T X θ − X^T y
To minimize J(θ): since J(θ) is a sum of squares, it is non-negative and convex in θ, so its minimum is attained where the gradient vanishes. Setting ∇_θ J(θ) = X^T X θ − X^T y = 0 gives the normal equations X^T X θ = X^T y, and thus the θ value:

θ = (X^T X)^{-1} X^T y
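Putting the derivation together, here is a sketch (synthetic data, made-up coefficients) that solves the normal equations and cross-checks the result against NumPy's built-in least squares solver and against the zero-gradient condition:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 3
X = rng.standard_normal((m, n))
true_theta = np.array([2.0, -1.0, 0.5])
y = X @ true_theta + 0.01 * rng.standard_normal(m)  # targets with slight noise

# theta from the normal equations X^T X theta = X^T y
theta = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least squares routine.
theta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(theta, theta_ref))          # both minimize J(theta) → True
print(np.allclose(X.T @ X @ theta, X.T @ y))  # gradient is zero at theta → True
```

Both checks pass: the closed-form θ agrees with the iterative solver, and the gradient X^T X θ − X^T y vanishes at the solution, as the derivation requires.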
Loss function – Andrew Ng machine learning public lesson note 1.2