Matrix Methods in Machine Learning 01: Linear Systems and Least Squares
Description: reading notes on Matrix Methods in Data Mining and Pattern Recognition.
A very handy online matrix calculator: http://www.bluebit.gr/matrix-calculator/.
1. LU decomposition
Suppose we want to solve a linear system:
Ax = b,
where A is an n×n non-singular square matrix; such a system has a unique solution for any vector b.
Recall how we solve such a system by hand: we add and subtract multiples of the rows of A to transform it into an upper triangular matrix, then solve for the unknowns one by one from the bottom up. This is Gaussian elimination.
In fact, each row operation is equivalent to left-multiplying A by an elementary matrix, obtained by applying the same operation to the identity matrix. For example, subtracting the first row of A from its second row is equivalent to left-multiplying A by the identity matrix whose second row has had the first row subtracted from it. The full sequence of operations that reduces A to an upper triangular matrix U is of course more complicated than this single step; collecting all the elementary matrices we left-multiply by into a single matrix M (which is clearly lower triangular), we get
MA = U,
A = LU,
where L = M⁻¹ (M and L are inverses of each other, and the inverse of a lower triangular matrix is again lower triangular).
(figure: a worked example of the factorization A = LU)
Gaussian elimination works only if every pivot is nonzero: after the first elimination step the (1,1) entry must be nonzero, then the (2,2) entry of the reduced matrix, and so on down the diagonal. When a pivot is zero, rows must be swapped first, and row swaps correspond to left-multiplying A by a permutation matrix P. The LU decomposition can therefore also be written as:
PA = LU
Here P is a permutation matrix, L is a lower triangular matrix, and U is an upper triangular matrix. Some books place the permutation matrix P on the other side of the equals sign; for example, the matrix online calculator linked above defaults to placing P in front of L, i.e. A = PLU. You can use that tool to check the correctness of an LU decomposition.
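As a concrete check, here is a minimal NumPy/SciPy sketch (the matrix values are made up for illustration). Note that scipy.linalg.lu follows the A = PLU convention mentioned above:

```python
import numpy as np
from scipy.linalg import lu

# A small non-singular matrix (made-up example values)
A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])

# scipy.linalg.lu returns P, L, U such that A = P @ L @ U
P, L, U = lu(A)

print("L =\n", L)  # unit lower triangular
print("U =\n", U)  # upper triangular
print("max reconstruction error:", np.abs(A - P @ L @ U).max())
```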
If A is a symmetric positive definite matrix, L and U are essentially transposes of each other: each row of U is a multiple of the corresponding row of L^T. This multiple relationship can be expressed with a diagonal matrix D, so the LU decomposition can be written as:
A = L D L^T.
Splitting the diagonal matrix evenly between the two factors (writing D = D^{1/2} D^{1/2}), the formula becomes:
A = U^T U,
where U = D^{1/2} L^T is an upper triangular matrix. This is the Cholesky decomposition, which is a bit like taking the square root of a real number. Its cost is also only about half that of a full LU decomposition.
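A minimal NumPy sketch of the Cholesky factorization (the symmetric positive definite matrix is constructed artificially for illustration). np.linalg.cholesky returns the lower triangular factor; its transpose is the U above:

```python
import numpy as np

# Build a symmetric positive definite matrix (B^T B + I is always SPD)
rng = np.random.default_rng(0)
B = rng.random((3, 3))
A = B.T @ B + np.eye(3)

# numpy returns lower triangular C with A = C @ C.T; U = C.T is upper triangular
C = np.linalg.cholesky(A)
U = C.T

print("U =\n", U)
print("max reconstruction error:", np.abs(A - U.T @ U).max())
```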
2. Condition number
The condition number measures how sensitive the solution of a linear system Ax = b is to errors or uncertainty in b. Mathematically, the condition number of a matrix A is defined as the product of the norm of A and the norm of its inverse:
cond(A) = ‖A‖·‖A⁻¹‖.
Each of the common matrix norms (1-norm, 2-norm, ∞-norm) gives a corresponding condition number.
In Matlab the corresponding function is cond(A,2), or simply cond(A), for the 2-norm condition number.
An extreme example: when A is singular, the condition number is infinite, and x can change even when b does not change at all. The essence of singularity is that the matrix has a zero eigenvalue, so x can move along the corresponding eigenvector without changing the value of Ax. If one eigenvalue is orders of magnitude smaller than the others, then x can move a long way along the corresponding eigenvector while producing only a slight change in b, which explains why such a matrix has a large condition number. In fact, for a normal matrix the condition number under the 2-norm can be expressed as |largest eigenvalue| / |smallest eigenvalue|. (Excerpted from Baidu Encyclopedia.)
In a computing environment, data is stored in floating-point form with limited precision, so perturbations are always present, and solving a linear system therefore incurs some error.
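To see this numerically, here is a minimal NumPy sketch with a nearly singular matrix (the values and the perturbation are made up). np.linalg.cond defaults to the 2-norm, just like Matlab's cond(A):

```python
import numpy as np

# Nearly singular: the second row is almost a multiple of the first,
# so one eigenvalue is close to zero and the condition number is huge
A = np.array([[1.0, 2.0],
              [1.0, 2.0001]])
b = np.array([3.0, 3.0001])

print("cond(A) =", np.linalg.cond(A))  # about 1e5

x1 = np.linalg.solve(A, b)
x2 = np.linalg.solve(A, b + np.array([0.0, 1e-4]))  # tiny perturbation of b
print("x1 =", x1)  # [1, 1]
print("x2 =", x2)  # far from x1: a tiny change in b moved x a lot
```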
3. The least squares problem
In machine learning, the least squares problem is frequently encountered when using linear classifiers or linear regression (classification can be seen as a special kind of regression whose target values are 1 and −1, or 0 and 1). Consider the following problem: on the left are the two attributes of each sample point, and on the right is its regression value; how do we determine the parameters k and e of the linear model?
This linear system can be expressed as:
Ax = b
where A has more rows than columns; such a system is called overdetermined. Suppose A is a 3×2 matrix, A = (a1, a2). Then Ax ranges over all linear combinations of the two basis vectors, which geometrically is a plane in three-dimensional space, span{a1, a2}. The vector b is a point in three-dimensional space; if it does not lie on span{a1, a2}, the equation has no exact solution. Our task is then to make the residual vector r = b − Ax as small as possible:
min_x ‖b − Ax‖₂.
In the LMS algorithm this objective is minimized by gradient descent, but what is the real meaning of the objective function?
The geometric intuition is that the optimum is reached exactly when the residual vector is perpendicular to the plane spanned by a1 and a2.
If the residual vector r = b − Ax is perpendicular to the plane spanned by a1 and a2, then it is perpendicular to a1 and to a2 individually:
a1^T (b − Ax) = 0,
a2^T (b − Ax) = 0,
or in matrix form, A^T (b − Ax) = 0. Expanding this formula gives the normal equations:
A^T A x = A^T b.
If the column vectors of A are linearly independent, then A^T A is invertible and the equation has a unique solution:
x = (A^T A)⁻¹ A^T b.
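Here is a minimal NumPy sketch for a made-up 3×2 overdetermined system: it solves the normal equations, verifies that the residual is orthogonal to the columns of A, and cross-checks against np.linalg.lstsq:

```python
import numpy as np

# Overdetermined 3x2 system (made-up values); b is not in span{a1, a2}
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Normal equations: A^T A x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)

r = b - A @ x              # residual vector
print("x =", x)            # [2/3, 1/2]
print("A^T r =", A.T @ r)  # ~ [0, 0]: r is perpendicular to a1 and a2
print("lstsq:", np.linalg.lstsq(A, b, rcond=None)[0])  # same solution
```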
Solving the least squares problem this way has two defects (both are demonstrated in the sketch after this list):
- Forming A^T A can cause loss of information. An entry of A may be a floating-point number the computer can represent, while its square falls below machine precision when A is multiplied by itself; as a result, A^T A can become a singular matrix that cannot be solved. The remedy, to be covered later, is the singular value decomposition.
- The condition number of A^T A is the square of that of A, so the system becomes more unstable. One remedy is to center the data as a preprocessing step, which improves the orthogonality of the basis vectors.
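Both defects can be seen in a minimal NumPy sketch built on the Läuchli matrix, a standard textbook example (not taken from this book): eps is representable in float64, but eps² vanishes next to 1.0 when A^T A is formed:

```python
import numpy as np

eps = 1e-8  # representable in float64, but 1.0 + eps**2 rounds back to 1.0

# Lauchli matrix: a standard example of normal-equation breakdown
A = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])
b = np.array([2.0, eps, eps])  # the exact least squares solution is x = [1, 1]

AtA = A.T @ A
print(AtA)                     # [[1, 1], [1, 1]]: the eps information is gone
print("cond(A)   =", np.linalg.cond(A))    # about 1.4e8
print("cond(AtA) =", np.linalg.cond(AtA))  # infinite: AtA is singular

# The SVD-based solver still recovers x = [1, 1] ...
print("lstsq:", np.linalg.lstsq(A, b, rcond=None)[0])
# ... while np.linalg.solve(AtA, A.T @ b) would raise LinAlgError here
```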