
Mathematics is the foundation of computer technology, and linear algebra is the basis of machine learning and deep learning. In my view, the best way to understand data is to understand the concepts behind it. Mathematics is not just for school exams; it is essential working knowledge. School actually offers many interesting math courses that encourage divergent thinking, but mastering the most basic material is the prerequisite. This article warms up the various terms of linear algebra; if you don't recognize one, remember to search Baidu for it.

Please respect the original; when reprinting, credit the source site www.shareditor.com and link to the original article.

Matrices and equations

Remember how n×n systems of equations are solved? The term to recall is "back substitution": reduce the system to triangular form, then substitute back from the last equation upward.
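A minimal NumPy sketch of back substitution (the triangular system here is just an illustrative example):

```python
import numpy as np

def back_substitute(U, b):
    """Solve Ux = b for an upper triangular U by back substitution."""
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # subtract the already-known terms, then divide by the pivot
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

U = np.array([[2.0, 1.0, 1.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 4.0]])
b = np.array([9.0, 10.0, 8.0])
x = back_substitute(U, b)   # agrees with np.linalg.solve(U, b)
```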

I had never understood what the "substitute" in "algebra" (代数) actually meant; now I finally do. In English it is substitution, i.e. replacement. Ever since middle school I had assumed "algebra" simply meant "plugging things in."

Coefficient matrix: the English name is coefficient matrix. No wonder open source code so often has variables named coef; that's where it comes from.

"Derivative", "can Guide" remember. Do not know what the meaning of "guide" has wood there. English derivative (meaning is derived, derivative), it does not seem to divert the meaning, but transliteration over the

A matrix is a rectangular array of numbers. That one is easy.

An n×n matrix is called a square matrix. Obvious enough.

The coefficient matrix with the right-hand side appended as an extra column is called the augmented matrix, written (A | b). Scientists casually name a thing and we have to chew through a whole book on it; if I append two columns b after A, can I call it "Augmented Matrix II"?

Row echelon form: this one is slightly harder. A matrix is in row echelon form if the first nonzero entry of each nonzero row is 1, each such leading 1 sits strictly to the right of the leading 1 in the row above, and any all-zero rows are at the bottom.

Gaussian elimination: converting the augmented matrix into row echelon form.

Overdetermined system: more equations than unknowns.

Reduced row echelon form: row echelon form in which the leading 1 of each row is the only nonzero entry in its column.

Gauss-Jordan elimination: the method of reducing a matrix to reduced row echelon form.

Homogeneous system (homogeneous): the right-hand sides are all zero. A homogeneous system always has a solution.

The trivial solution is the zero solution (0, 0, ..., 0); it could hardly be more ordinary, hence the name.

Nontrivial solution: any solution other than the zero solution.

Some books write x with a horizontal arrow above it for a row vector and without one for a column vector; notation differs from book to book, so just keep that in mind.

Property of a symmetric matrix: its transpose equals itself, A^T = A.

If A is the 2×2 all-ones matrix, then A^n = 2^(n-1) A.

If AB = BA = I, A is said to be invertible, or nonsingular; B is called the inverse of A, written A^-1.

A matrix without a multiplicative inverse is called singular.

(AB)^-1 = B^-1 A^-1

(AB)^T = B^T A^T

The adjacency matrix of an undirected graph (1 if connected, 0 otherwise) is symmetric.

Elementary matrix: obtained by applying a single row operation to the identity; multiplying on the left performs that row operation, which is how both sides of the system are reduced to row echelon form. Elementary matrices are nonsingular, i.e. they have inverses.

If B = Ek ··· E1 A for elementary matrices Ei, then A and B are row equivalent.

If A is row equivalent to I, then Ax = 0 has only the trivial solution 0, and A has an inverse A^-1, i.e. A is nonsingular; in that case Ax = b has a unique solution.

Method for computing the inverse: form the augmented matrix (A | I) and apply row operations; when A becomes I, the right half has become A^-1.

Diagonal matrix: all elements off the diagonal are 0.

If A can be reduced to strictly triangular form using only row operation III (adding a multiple of one row to another), then A has an LU decomposition: L is unit lower triangular, and its entries below the diagonal are the multipliers used in the elimination.
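A toy sketch of the idea (Doolittle-style LU without pivoting, valid only when no zero pivots appear; the matrix is an arbitrary example):

```python
import numpy as np

def lu_nopivot(A):
    """LU factorization without pivoting: A = L U, L unit lower triangular."""
    A = A.astype(float)
    n = A.shape[0]
    L = np.eye(n)
    U = A.copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]      # the multiplier used in elimination
            U[i, k:] -= L[i, k] * U[k, k:]   # eliminate entry (i, k)
    return L, U

A = np.array([[4.0, 3.0],
              [6.0, 3.0]])
L, U = lu_nopivot(A)
# L stores the elimination multipliers; L @ U reconstructs A
```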

After partitioning matrices into blocks, multiplication follows the same rule as ordinary matrix multiplication.

The inner product, also called the scalar product: the product of a row vector and a column vector, yielding a number.

Outer product: the product of a column vector and a row vector, yielding a matrix.

Outer product expansion: writing two matrices in terms of their column and row vectors, their product can be expressed as a sum of outer products.

Determinant

Determinant: written as an array between two vertical bars.

Every square matrix has a corresponding determinant; the determinant is a scalar.

Computing a determinant: expand along a row, multiplying each entry by its cofactor and summing.
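The row-expansion rule can be written directly as a (deliberately naive, O(n!)) recursion; this is for illustration only, since real code should use np.linalg.det:

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row (teaching only)."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # minor: delete row 0 and column j
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
# det = 1*4 - 2*3 = -2
```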

If the determinant is nonzero, the square matrix is nonsingular.

det(A) can be expressed as a cofactor expansion along any row or column of A.

The determinant of a triangular matrix equals the product of its diagonal elements.

Interchanging two rows flips the sign of the determinant: det(EA) = -det(A).

Multiplying a row by α multiplies the determinant by α: det(EA) = α·det(A).

Adding a multiple of one row to another leaves the determinant unchanged.

If one row is a multiple of another row, the determinant is zero.

det(AB) = det(A)·det(B)

adj A: the adjugate (adjoint) of a matrix, formed by replacing each element with its cofactor and then transposing.

Inversion formula: A^-1 = (1/det(A)) adj A, derived from A(adj A) = det(A)·I, so A·((1/det(A)) adj A) = I.

Cramer's rule: the unique solution of Ax = b is xi = det(Ai)/det(A), where Ai is A with column i replaced by b; a convenient determinant-based way to solve linear systems.
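A small sketch of Cramer's rule in NumPy (the 2×2 system is an arbitrary example; for real work np.linalg.solve is both faster and more stable):

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b via Cramer's rule: x_i = det(A_i) / det(A)."""
    d = np.linalg.det(A)
    n = len(b)
    x = np.empty(n)
    for i in range(n):
        Ai = A.copy()
        Ai[:, i] = b          # replace column i with the right-hand side
        x[i] = np.linalg.det(Ai) / d
    return x

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = cramer(A, b)
```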

An information encryption method: find an integer matrix A with determinant ±1; then A^-1 = ±adj A is also an integer matrix and easy to compute. Multiply by A to encrypt and by A^-1 to decrypt. To construct such an A, apply elementary row operations to the identity matrix.

The vector (cross) product of two vectors is itself a vector.

In calculus x is treated as a row vector; in linear algebra x is a column vector.

If x and y are vectors, then x × y = (x2y3 - y2x3)i - (x1y3 - y1x3)j + (x1y2 - y1x2)k, where i, j, k are the rows of the identity matrix.

The cross product can be used to define the binormal direction.

x^T(x × y) = y^T(x × y) = 0: the cross product is orthogonal to each of its factors (the angle is 90°, not 0).

Vector space

Vector space: a set closed under addition and scalar multiplication; the scalars are usually the real numbers.

Subspace: a subset of a vector space that is itself a vector space.

Apart from {0} and the vector space itself, the other subspaces are called proper subspaces (analogous to proper subsets); {0} is called the zero subspace.

The solution space N(A) of Ax = 0 is called the null space of A: the set of solutions of the homogeneous system Ax = 0 forms a subspace.

The set of all linear combinations of vectors v1, v2, ..., vn in a vector space V is called their span, written Span(v1, v2, ..., vn).

Span(e1, e2) is a subspace of R^3: geometrically, all vectors of 3-space that lie in the x1x2 plane.

Span(e1, e2, e3) = R^3

If Span(v1, v2, v3) = R^3, we say the vectors v1, v2, v3 span R^3, and {v1, v2, v3} is a spanning set.

A minimal spanning set is one with no superfluous vectors.

Test for a minimal spanning set: the only linear combination of the vectors equal to 0 is the zero combination; in that case the vectors are linearly independent. If some nonzero combination gives 0, they are linearly dependent.

Geometrically, two 2-D vectors are linearly dependent exactly when they are parallel; three 3-D vectors are linearly dependent exactly when they lie in the same plane.

If the determinant of the matrix formed from the vectors satisfies det(A) = 0, they are linearly dependent; otherwise they are linearly independent.

Linearly independent vectors that represent every vector of the space as a unique linear combination form a basis of the vector space.

A basis is a minimal spanning set; {e1, e2, ..., en} is called the standard basis, and the number of basis vectors is the dimension of the vector space.

Transition matrix: the matrix that converts coordinates relative to one basis into coordinates relative to another.

The subspace of R^(1×n) spanned by the row vectors of A is called the row space of A; the subspace of R^m spanned by the column vectors of A is called the column space of A.

The rank of A is the dimension of its row space. To compute it, reduce A to row echelon form; the number of nonzero rows is the rank.

The dimension of the null space of a matrix is called its nullity; in general, rank + nullity = number of columns of the matrix.

For an m×n matrix, the dimension of the row space equals the dimension of the column space.
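The rank-nullity relation is easy to check numerically; here is a sketch with an example matrix whose second row is twice the first:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],   # = 2 * row 1, so it adds nothing to the rank
              [1.0, 0.0, 1.0]])

rank = np.linalg.matrix_rank(A)
nullity = A.shape[1] - rank      # rank + nullity = number of columns
```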

Linear transformation

Linear transformation: L(αv1 + βv2) = αL(v1) + βL(v2)

Linear operator: a linear transformation from a vector space to itself.

Typical linear operators: αx (stretch or compress by a factor α), x1e1 (projection onto the x1 axis), (x1, -x2)^T (reflection about the x1 axis), (-x2, x1)^T (rotation by 90° counterclockwise).

To judge whether a map is a linear transformation, check whether it can be written as multiplication by an m×n matrix.

The kernel of a linear transformation L, written ker(L), is the set of vectors that L maps to the zero vector.

The image of a subspace S, written L(S), is the set of values L takes on the vectors of S.

The image L(V) of the whole vector space is called the range of L.

ker(L) is a subspace of V, and L(S) is a subspace of W, where L is a linear transformation from V to W and S is a subspace of V.

The matrix A representing a linear transformation from a vector space V with ordered basis E to a vector space W with ordered basis F is called its representation matrix.

If B is the representation matrix of L relative to [u1, u2], A the representation matrix relative to [e1, e2], and U the transition matrix from [u1, u2] to [e1, e2], then B = U^-1 A U.

If B = S^-1 A S, we say B is similar to A.

If A and B are representation matrices of the same linear operator L, then A and B are similar

Orthogonality

If the scalar product of two vectors is zero, they are called orthogonal.

The distance between vectors x and y in R^2 or R^3 is ||x - y||.

x^T y = ||x||·||y||·cosθ, i.e. cosθ = x^T y / (||x||·||y||)

Let the direction vectors be u = (1/||x||)x and v = (1/||y||)y; then cosθ = u^T v, i.e. the cosine of the angle equals the scalar product of the unit vectors.

Cauchy-Schwarz inequality: |x^T y| <= ||x||·||y||, with equality if and only if one vector is zero or one is a scalar multiple of the other.

Scalar projection: the signed length of the vector projection, α = x^T y / ||y||

Vector projection: p = (x^T y / ||y||^2) y = (x^T y / y^T y) y
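A quick sketch of scalar and vector projection (the vectors are arbitrary examples); note that the residual x - p is orthogonal to y:

```python
import numpy as np

def scalar_proj(x, y):
    """Signed length of the projection of x onto y: x.y / ||y||."""
    return x @ y / np.linalg.norm(y)

def vector_proj(x, y):
    """Projection vector of x onto y: (x.y / y.y) y."""
    return (x @ y / (y @ y)) * y

x = np.array([3.0, 4.0])
y = np.array([1.0, 0.0])
p = vector_proj(x, y)     # lands on the x1 axis
r = x - p                 # residual, orthogonal to y
```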

For R^3: ||x × y|| = ||x||·||y||·sinθ

When x and y are orthogonal, ||x + y||^2 = ||x||^2 + ||y||^2, the Pythagorean law.

c^2 = a^2 + b^2 is the Pythagorean theorem, known in Chinese as the gougu ("hook and string") theorem.

The cosine can be used to judge the degree of similarity between vectors (cosine similarity).

If the columns of U are unit deviation vectors, C = U^T U contains the pairwise scalar products; this matrix expresses correlations and is called the correlation matrix: a positive entry means positive correlation, a negative entry negative correlation, and 0 no correlation.

Covariance: if x1 and x2 are deviation vectors of two data sets about their means, cov(x1, x2) = (x1^T x2)/(n - 1)

The covariance matrix S = (1/(n - 1)) X^T X: its diagonal elements are the variances of the data sets, and its off-diagonal elements are the covariances.

Orthogonal subspaces: two subspaces such that every vector of one is orthogonal to every vector of the other. For example, the z-axis subspace and the xy-plane subspace are orthogonal.

Orthogonal complement of a subspace Y: the set of all vectors orthogonal to every vector of Y.

An orthogonal complement is necessarily a subspace as well.

The column space R(A) of A is the range of A: as x runs over R^n, b = Ax runs over the column space.

The orthogonal complement of R(A^T) is the null space N(A); that is, the row space of A and the null space of A are orthogonal.

If S is a subspace of R^n, then dim S + dim of the orthogonal complement of S = n.

If S is a subspace of R^n, then the orthogonal complement of the orthogonal complement of S is S itself.

Least squares: used, for example, to fit a line to a set of points in the plane.

The least squares solution x̂ is the vector making p = Ax̂ closest to b; the vector p is the projection of b onto R(A).

The residual r(x̂) of the least squares solution x̂ must lie in the orthogonal complement of R(A).

Residual: r(x) = b - Ax

A^T A x = A^T b is called the normal equations; it has the unique solution x̂ = (A^T A)^-1 A^T b, the least squares solution, and the projection vector p = A(A^T A)^-1 A^T b is an element of R(A).
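A sketch of least squares via the normal equations, fitting a line to four example points (solving A^T A x = A^T b directly is fine for small well-conditioned problems; np.linalg.lstsq is the robust route):

```python
import numpy as np

# Fit a line c0 + c1*t through four example points.
t = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([1.0, 2.1, 2.9, 4.2])
A = np.column_stack([np.ones_like(t), t])   # design matrix

x_hat = np.linalg.solve(A.T @ A, A.T @ b)   # normal equations: A^T A x = A^T b
p = A @ x_hat                               # projection of b onto R(A)
r = b - p                                   # residual, orthogonal to R(A)
```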

Interpolating polynomial: a polynomial of degree at most n through n + 1 points in the plane.

A vector space on which an inner product is defined is called an inner product space.

The scalar product is the standard inner product on R^n; a weighted sum is also an inner product.

The inner product is written <u, v> and must satisfy: <v, v> >= 0, with equality only for v = 0; <u, v> = <v, u>; <αu + βv, w> = α<u, w> + β<v, w>.

α = <u, v> / ||v|| is the scalar projection of u onto v

p = (<u, v> / <v, v>) v is the vector projection of u onto v

Cauchy-Schwarz inequality: |<u, v>| <= ||u||·||v||

Norm: a real number ||v|| associated with each vector v, satisfying ||v|| >= 0 with equality only for v = 0; ||αv|| = |α|·||v||; ||v + w|| <= ||v|| + ||w||.

||v|| = <v, v>^(1/2) is a norm

||x||_1 = Σ|xi| is a norm

||x||_∞ = max|xi| is a norm

Generally, a norm gives a way to measure the distance between two vectors.

If <vi, vj> = 0 whenever i ≠ j, then {v1, v2, ..., vn} is called an orthogonal set of vectors.

The nonzero vectors of an orthogonal set are all linearly independent.

An orthonormal set is an orthogonal set of unit vectors (<vi, vi> = 1); its vectors form an orthonormal basis.

Orthogonal matrix: a matrix whose column vectors form an orthonormal basis.

A matrix Q is orthogonal exactly when Q^T Q = I, i.e. Q^-1 = Q^T.

Multiplying by an orthogonal matrix preserves inner products: <Qx, Qy> = <x, y>

Multiplying by an orthogonal matrix preserves vector length: ||Qx|| = ||x||

Permutation matrix: the identity matrix with its columns rearranged.

If the column vectors of A form an orthonormal set, the least squares problem has the solution x̂ = A^T b.

The projection of a vector b onto a nonzero subspace S is p = U U^T b, where the columns of U are an orthonormal basis of S; U U^T is the projection matrix onto S.

Approximating a continuous function by a polynomial of degree at most n can be done with least squares approximation.

For spaces of functions on an interval, an inner product can be defined as the integral of the product of two functions over that interval.

Computing the discrete Fourier coefficients d by multiplying Fn by the vector z is called the DFT (discrete Fourier transform).

The FFT (fast Fourier transform) uses matrix blocking and is vastly faster than the direct DFT; the book cites a speedup on the order of 80,000 times.

Gram-Schmidt orthogonalization: u1 = (1/||x1||) x1, u2 = (1/||x2 - p1||)(x2 - p1), ...; a direct way to produce an orthonormal basis.
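A sketch of classical Gram-Schmidt on the columns of an example matrix (classical GS can lose orthogonality in floating point; modified GS or np.linalg.qr is preferred in practice):

```python
import numpy as np

def gram_schmidt(X):
    """Orthonormalize the columns of X (classical Gram-Schmidt)."""
    Q = np.zeros_like(X, dtype=float)
    for k in range(X.shape[1]):
        v = X[:, k].astype(float)
        for j in range(k):
            v -= (Q[:, j] @ X[:, k]) * Q[:, j]   # remove the projection onto q_j
        Q[:, k] = v / np.linalg.norm(v)          # normalize
    return Q

X = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])
Q = gram_schmidt(X)
# Q has orthonormal columns spanning the same space as the columns of X
```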

Gram-Schmidt QR decomposition: if an m×n matrix A has rank n, then A can be factored as QR, where the columns of Q are orthonormal and R is upper triangular with positive diagonal elements. Concretely:

r11 = ||a1|| and a1 = r11 q1, where a1 is the first column of A;

rkk = ||ak - p(k-1)||, rik = qi^T ak.

The least squares solution of Ax = b is x̂ = R^-1 Q^T b, where A = QR is the factorization; in practice, solve Rx = Q^T b by back substitution.

Fitting data with polynomials and approximating continuous functions can both be simplified by choosing an orthogonal basis for the approximating functions.

For a polynomial sequence p0(x), p1(x), ... (the subscript is the degree): if <pi, pj> = 0 for i ≠ j, {pn(x)} is an orthogonal polynomial sequence; if in addition <pi, pi> = 1, it is an orthonormal polynomial sequence.

Classical orthogonal polynomials: Legendre polynomials, Chebyshev polynomials, Jacobi polynomials, Hermite polynomials, Laguerre polynomials, and so on.

Legendre polynomials: orthogonal with respect to the inner product <p, q> = integral from -1 to 1 of p(x)q(x)dx; recurrence: (n+1)P(n+1)(x) = (2n+1)x Pn(x) - n P(n-1)(x)

Chebyshev polynomials: orthogonal with respect to the inner product <p, q> = integral from -1 to 1 of p(x)q(x)(1 - x^2)^(-1/2)dx; T1(x) = x T0(x), T(n+1)(x) = 2x Tn(x) - T(n-1)(x)

Lagrange interpolation formula: P(x) = Σ f(xi) Li(x)

Lagrange basis functions: Li(x) = product over j ≠ i of (x - xj) / (xi - xj)

The integral of f(x)w(x) over [a, b] can be reduced to Σ f(xi) times the integral of Li(x)w(x) over [a, b].
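The Lagrange formula translates almost verbatim into code; a sketch, with sample points taken from f(x) = x^2 + x + 1 so the degree-2 interpolant reproduces f exactly:

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial through (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        Li = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                Li *= (x - xj) / (xi - xj)   # L_i(x) = prod (x - x_j)/(x_i - x_j)
        total += yi * Li
    return total

xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 7.0]   # samples of f(x) = x^2 + x + 1
```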


Eigenvalues

A vector left unchanged by the matrix transformation is stable; such a vector is called a steady-state vector of the process.

If there is a nonzero x with Ax = λx, then λ is called an eigenvalue and x an eigenvector belonging to λ. An eigenvalue is a scaling factor, representing a natural frequency of the linear transformation's operator.

The subspace N(A - λI) is called the eigenspace corresponding to the eigenvalue λ.

det(A - λI) = 0 is called the characteristic equation of matrix A; solving it yields the eigenvalues λ.

λ1·λ2···λn = det(A): the product of all eigenvalues equals the determinant of A.

Σλi = Σaii: the sum of all eigenvalues equals the sum of the diagonal elements of the matrix.
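Both identities are easy to verify numerically; a sketch with an arbitrary symmetric example matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals = np.linalg.eigvals(A)    # roots of det(A - lambda*I) = 0

prod_check = np.prod(eigvals)     # should equal det(A)
sum_check = np.sum(eigvals)       # should equal trace(A)
```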

The sum of the diagonal elements of A is called its trace, written tr(A).

Similar matrices: B = S^-1 A S

Similar matrices have the same characteristic polynomial and hence the same eigenvalues.

Solutions of linear differential equations can be obtained from eigenvalues and eigenvectors: for a system Y' = AY with Y(0) = Y0 (an initial value problem), solutions have the form c·e^(λt)x, where (λ, x) is an eigenpair; with several eigenvalues, the solution is a linear combination of such terms.

Any higher-order differential equation can be transformed into a first-order system, which can then be solved by eigenvalues and eigenvectors.

Eigenvectors belonging to distinct eigenvalues of matrix A are linearly independent.

If there is an X with X^-1 A X = D, D a diagonal matrix, then A is diagonalizable, X is said to diagonalize A, and X is called a diagonalizing matrix.

If A has n linearly independent eigenvectors, A is diagonalizable.

The column vectors of the diagonalizing matrix X are eigenvectors of A, and the diagonal elements of D are the eigenvalues of A; X and D are not unique: multiply by a scalar or reorder them and you get new ones.

A^n = X D^n X^-1, so with the factorization A = X D X^-1 it is easy to compute powers.
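A sketch of computing a matrix power through the eigendecomposition (the symmetric example matrix is guaranteed diagonalizable):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])           # symmetric, hence diagonalizable

w, X = np.linalg.eig(A)              # A = X D X^{-1}, D = diag(w)

n = 5
A_pow = X @ np.diag(w ** n) @ np.linalg.inv(X)   # A^n = X D^n X^{-1}
```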

If A has fewer than n linearly independent eigenvectors, it is called defective; a defective matrix is not diagonalizable.

Geometric view of eigenvalues and eigenvectors: in the book's example, matrix A has an eigenvalue 2 whose eigenspace is spanned by e3, so its geometric multiplicity is 1.

Matrix B has an eigenvalue 2 with two independent eigenvectors, x = (2, 1, 0)^T and e3, so its geometric multiplicity is 2.

Stochastic process: a sequence of trials whose outcome at each step is governed by probability.

Markov process: the set of possible outcomes (states) is finite; the next outcome depends only on the current one; and the probabilities are constant over time.

If 1 is the dominant eigenvalue of the transition matrix A, the Markov chain converges to the steady-state vector.

A Markov process with transition matrix A is called regular if some power of A has all entries positive.

The PageRank algorithm views browsing the web as a Markov process; the steady-state vector gives the PageRank value of each page.
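A toy sketch of the idea on a hypothetical 3-page web with a column-stochastic transition matrix: repeated multiplication (the power method) drives any starting distribution toward the steady-state vector:

```python
import numpy as np

# Hypothetical 3-page web; column j holds the link probabilities out of page j,
# so each column sums to 1 (column-stochastic).
A = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])

x = np.array([1.0, 0.0, 0.0])   # start with all probability on page 0
for _ in range(100):
    x = A @ x                   # one step of the Markov chain
# x now approximates the steady-state vector: A @ x == x (eigenvalue 1)
```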

Singular value decomposition (SVD) of A: factor A into a product UΣV^T, where U and V are orthogonal matrices and Σ is zero off the diagonal, with its diagonal elements in decreasing order; the values on the diagonal are called singular values.

The rank of A equals the number of nonzero singular values.

The singular values of A are the square roots of the eigenvalues of A^T A.

If A = UΣV^T, then A^T uj = σj vj for j <= r and A^T uj = 0 for j > r; the vj are called right singular vectors and the uj left singular vectors.

Compact form of the SVD: U1 = (u1, u2, ..., ur), V1 = (v1, v2, ..., vr), A = U1 Σ1 V1^T

Procedure for computing an SVD: first compute the eigenvalues of A^T A, which give the singular values; the corresponding eigenvectors give the orthogonal matrix V; convert a basis of N(A^T) into an orthonormal set to complete U; finally assemble A = UΣV^T.

Numerical rank is the rank computed in finite-precision arithmetic, not the exact rank: pick a very small epsilon, treat any singular value below it as 0, and count what remains.

Store an image as a matrix, take its singular value decomposition, and drop the smaller singular values to obtain a lower-rank matrix: compressed storage.
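A sketch of low-rank approximation via the SVD on a random example matrix; by the Eckart-Young theorem, the 2-norm error of the best rank-k approximation equals the first discarded singular value:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))               # stand-in for an image matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                                         # keep only the 2 largest singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation

err = np.linalg.norm(A - A_k, 2)              # equals s[k], the first dropped value
```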

In information retrieval, using a low-rank approximation that discards the small singular values can greatly improve retrieval efficiency and reduce error.

Quadratic form: the vector function f(x) = x^T A x associated with a quadratic equation, i.e. the ax^2 + 2bxy + cy^2 part of the equation.

The graph of ax^2 + 2bxy + cy^2 + dx + ey + f = 0 is a conic section; if it has no solutions it is an imaginary conic, and if it is only a point, a line, or two lines it is a degenerate conic. The nondegenerate conic sections are the circle, ellipse, parabola, and hyperbola.

A quadratic equation in x and y can be written x^T A x + Bx + f = 0, where A is 2×2 symmetric and B is a 1×2 matrix. If A is nonsingular, rotating and translating the axes simplifies it to λ1(x')^2 + λ2(y')^2 + f'' = 0, where λ1 and λ2 are the eigenvalues of A. If A is singular with exactly one zero eigenvalue, it simplifies to λ1(x')^2 + e'y' + f' = 0 or λ2(y')^2 + d'x' + f' = 0.

If the quadratic form f(x) = x^T A x has one sign for all x ≠ 0, it is called definite: positive definite if positive, correspondingly negative definite if negative; if it takes both signs it is indefinite; if it can equal 0 but not change sign, it is positive semidefinite or negative semidefinite.

If its quadratic form is positive definite, A is called positive definite.

A point where the first partial derivatives exist and equal 0 is called a stationary point; whether it is a minimum, a maximum, or a saddle point depends on whether A is positive definite, negative definite, or indefinite.

A symmetric matrix is positive definite if and only if all of its eigenvalues are positive

Leading principal submatrix of order r: the matrix left after deleting the last n - r rows and columns.

If A is a symmetric positive definite matrix, then A can be factored as LDL^T, where L is lower triangular with 1s on the diagonal and D is a diagonal matrix with positive diagonal elements.

If A is a symmetric positive definite matrix, then A can be factored as LL^T (the Cholesky factorization), where L is lower triangular with positive diagonal elements.

Equivalent conditions for a symmetric matrix: A is positive definite; all leading principal submatrices are positive definite; A can be reduced to upper triangular form using only row operation III, with all pivots positive; A has a Cholesky factorization LL^T (L lower triangular with positive diagonal elements); A can be factored as a product B^T B with B nonsingular.

Nonnegative matrix: All elements are greater than or equal to 0

A nonnegative matrix A is reducible if the index set {1, 2, ..., n} can be partitioned into I1 and I2 such that aij = 0 whenever i belongs to I1 and j belongs to I2; otherwise it is irreducible.

Numerical linear algebra

Round-off error: the difference between the rounded floating-point number x' and the original number x.

Absolute error: x' - x

Relative error: (x' - x)/x, usually denoted δ; |δ| can be bounded by a positive constant ε called the machine precision (machine epsilon).

Gaussian elimination involves the fewest arithmetic operations, so it is considered the most efficient computational method.

Steps to solve Ax = b: multiply A by n elementary matrices to obtain the upper triangular matrix U; multiplying the inverses of those elementary matrices gives L, so A = LU with L lower triangular. Once the reduction to triangular form is done, the LU factorization is fixed, and the system is solved as follows: LUx = b; set y = Ux, so Ly = b; solve the lower triangular system for y, then solve Ux = y by back substitution to obtain x.

The Frobenius norm of a matrix, ||A||_F: the square root of the sum of the squares of all its elements.

If A has the singular value decomposition A = UΣV^T, then ||A||_2 = σ1 (the largest singular value).

Matrix norms can be used to estimate the sensitivity of linear systems to small changes in the coefficient matrix.

To check the accuracy of a computed solution x', examine how close Ax' is to b: r = b - Ax' is called the residual, and ||r||/||b|| the relative residual.

Singular values measure how close a matrix is to being singular: the closer to singular, the more ill-conditioned the matrix.

The Householder transformation matrix H can be determined by a vector v and a scalar β, so storing v and β saves space.

The dominant eigenvalue is the eigenvalue of largest magnitude.

Method for finding the dominant eigenvalue: the power method.

Method for finding all eigenvalues: the QR algorithm. Factor A = Q1R1, with Q1 orthogonal and R1 upper triangular; set A2 = Q1^T A Q1 = R1Q1; factor A2 = Q2R2, define A3 = Q2^T A2 Q2 = R2Q2, and continue, obtaining a sequence of similar matrices Ak = QkRk. It eventually converges to an essentially triangular matrix whose diagonal blocks are 1×1 or 2×2; the eigenvalues of those diagonal blocks are the eigenvalues of A.
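A sketch of the unshifted QR iteration on a small symmetric example (production eigensolvers add shifts and Hessenberg reduction, but the loop below shows the core idea):

```python
import numpy as np

def qr_algorithm(A, iters=200):
    """Unshifted QR iteration: A_{k+1} = R_k Q_k is similar to A_k."""
    Ak = A.astype(float)
    for _ in range(iters):
        Q, R = np.linalg.qr(Ak)   # A_k = Q R
        Ak = R @ Q                # = Q^T A_k Q, so eigenvalues are preserved
    return Ak

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
Ak = qr_algorithm(A)
# the diagonal of Ak now approximates the eigenvalues of A
```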

The final summary

Singular value decomposition is a complete decomposition of a linear transformation: A = UΣV^T, where U and V are two sets of orthonormal vectors and Σ is a diagonal matrix whose entries are the singular values. It says that the action of a matrix is to rotate a vector from the space of one set of orthonormal basis vectors into the space of another set, scaling each direction by some factor; the scaling factors are the singular values. If the dimensions differ, a projection is involved as well. Singular value decomposition can thus be said to describe the complete functionality of a matrix.

Eigenvalue decomposition, in contrast, describes only part of a matrix's functionality. Eigenvalues and eigenvectors are obtained from Ax = λx, which says that if a vector v lies along the direction of an eigenvector, the linear transformation Av is merely a scaling of v. In other words, in finding eigenvectors and eigenvalues we find the directions in which matrix A's rotation and scaling of vectors (eigendecomposition applies only to square matrices, so there is no projection) cancel to some extent, leaving a pure scaling (whose ratio need not match the one in the singular value decomposition).

To sum up, eigenvalue decomposition only tells us that along the eigenvector directions the linear action of the matrix amounts to a simple scaling, and says nothing about other directions, so it represents only some of the matrix's properties. Singular value decomposition resolves the rotation, scaling, and projection implicit in a matrix clearly and explicitly; it is a complete analysis of the matrix's features.
