Contents
- Linear Algebra
- I. Basic Knowledge
- II. Vector Operations
- III. Matrix Operations
Linear Algebra
I. Basic Knowledge
- All vectors in this book are in the form of column vectors:
\[\mathbf{\vec x} = (x_1, x_2, \cdots, x_n)^T = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}\]
- All matrices in this book, \(\mathbf X \in \mathbb R^{m \times n}\), are written as:
\[\mathbf X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,n} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m,1} & x_{m,2} & \cdots & x_{m,n} \end{bmatrix}\]
This is abbreviated as \((x_{i,j})_{m \times n}\) or \([x_{i,j}]_{m \times n}\).
- Frobenius norm of a matrix: let \(\mathbf A = (a_{i,j})_{m \times n}\). Its Frobenius norm is \(\|\mathbf A\|_F = \sqrt{\sum_{i,j} a_{i,j}^2}\).
It extends the vector \(L_2\) norm to matrices.
- Trace of a matrix: let \(\mathbf A = (a_{i,j})_{n \times n}\) be a square matrix. The trace of \(\mathbf A\) is \(tr(\mathbf A) = \sum_{i} a_{i,i}\).
Trace properties:
- For any matrix \(\mathbf A\), the Frobenius norm is the square root of the trace of \(\mathbf A \mathbf A^T\): \(\|\mathbf A\|_F = \sqrt{tr(\mathbf A \mathbf A^T)}\).
- The trace of \(\mathbf A\) equals the trace of \(\mathbf A^T\): \(tr(\mathbf A) = tr(\mathbf A^T)\).
- Commutative property: if \(\mathbf A \in \mathbb R^{m \times n}, \mathbf B \in \mathbb R^{n \times m}\), then \(tr(\mathbf A \mathbf B) = tr(\mathbf B \mathbf A)\).
- Cyclic property: \(tr(\mathbf A \mathbf B \mathbf C) = tr(\mathbf C \mathbf A \mathbf B) = tr(\mathbf B \mathbf C \mathbf A)\).
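The norm and trace identities above can be checked numerically. The following is a minimal sketch in plain Python with no third-party libraries; the helper names (`transpose`, `matmul`, `trace`, `frobenius_norm`) and the sample matrices are illustrative assumptions, not part of the original text.

```python
import math

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def trace(m):
    """Sum of the diagonal entries of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))

def frobenius_norm(m):
    """Square root of the sum of all squared entries."""
    return math.sqrt(sum(x * x for row in m for x in row))

# Hypothetical sample data: A is 2 x 3, B is 3 x 2.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
B = [[1.0, 0.0],
     [2.0, -1.0],
     [0.0, 3.0]]

norm_direct = frobenius_norm(A)                              # from the definition
norm_via_trace = math.sqrt(trace(matmul(A, transpose(A))))   # sqrt(tr(A A^T))
trace_ab = trace(matmul(A, B))                               # AB is 2 x 2
trace_ba = trace(matmul(B, A))                               # BA is 3 x 3

print(norm_direct, norm_via_trace)
print(trace_ab, trace_ba)
```

Note that \(\mathbf A \mathbf B\) and \(\mathbf B \mathbf A\) have different shapes here, yet their traces agree.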
II. Vector Operations
- A set of vectors \(\mathbf{\vec v}_1, \mathbf{\vec v}_2, \cdots, \mathbf{\vec v}_n\) is linearly dependent if there exist real numbers \(a_1, a_2, \cdots, a_n\), not all zero, such that \(\sum_{i=1}^{n} a_i \mathbf{\vec v}_i = \mathbf{\vec 0}\).
A set of vectors \(\mathbf{\vec v}_1, \mathbf{\vec v}_2, \cdots, \mathbf{\vec v}_n\) is linearly independent if and only if \(\sum_{i=1}^{n} a_i \mathbf{\vec v}_i = \mathbf{\vec 0}\) holds only when \(a_i = 0, i = 1, 2, \cdots, n\).
- The maximum number of linearly independent vectors contained in a vector space is called the dimension of the vector space.
- Dot product of 3D vectors: \(\mathbf{\vec u} \cdot \mathbf{\vec v} = u_x v_x + u_y v_y + u_z v_z = |\mathbf{\vec u}| |\mathbf{\vec v}| \cos(\mathbf{\vec u}, \mathbf{\vec v})\).
- Cross product of 3D vectors:
\[\mathbf{\vec w} = \mathbf{\vec u} \times \mathbf{\vec v} = \begin{vmatrix} \mathbf{\vec i} & \mathbf{\vec j} & \mathbf{\vec k} \\ u_x & u_y & u_z \\ v_x & v_y & v_z \end{vmatrix}\]
where \(\mathbf{\vec i}, \mathbf{\vec j}, \mathbf{\vec k}\) are the unit vectors along the \(x, y, z\) axes respectively, and
\[\mathbf{\vec u} = u_x \mathbf{\vec i} + u_y \mathbf{\vec j} + u_z \mathbf{\vec k}, \quad \mathbf{\vec v} = v_x \mathbf{\vec i} + v_y \mathbf{\vec j} + v_z \mathbf{\vec k}\]
- The cross product of \(\mathbf{\vec u}\) and \(\mathbf{\vec v}\) is perpendicular to both \(\mathbf{\vec u}\) and \(\mathbf{\vec v}\).
- The modulus of the cross product equals the area of the parallelogram spanned by \(\mathbf{\vec u}, \mathbf{\vec v}\).
- \(\mathbf{\vec u} \times \mathbf{\vec v} = -\mathbf{\vec v} \times \mathbf{\vec u}\)
- \(\mathbf{\vec u} \times (\mathbf{\vec v} \times \mathbf{\vec w}) = (\mathbf{\vec u} \cdot \mathbf{\vec w}) \mathbf{\vec v} - (\mathbf{\vec u} \cdot \mathbf{\vec v}) \mathbf{\vec w}\)
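The cross-product properties listed above can be verified on concrete numbers. This is a small sketch in plain Python; the function names `dot` and `cross` and the three sample vectors are assumptions chosen for illustration.

```python
import math

def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    """Cross product of two 3D vectors, expanded from the formal determinant."""
    ux, uy, uz = u
    vx, vy, vz = v
    return (uy * vz - uz * vy,
            uz * vx - ux * vz,
            ux * vy - uy * vx)

# Hypothetical sample vectors.
u = (1.0, 2.0, 3.0)
v = (4.0, 5.0, 6.0)
w = (7.0, 8.0, 10.0)

uxv = cross(u, v)

# Perpendicularity: both dot products should vanish.
perp_u = dot(u, uxv)
perp_v = dot(v, uxv)

# Area of the parallelogram: |u x v| compared with sqrt(|u|^2 |v|^2 - (u.v)^2).
area = math.sqrt(dot(uxv, uxv))
area_check = math.sqrt(dot(u, u) * dot(v, v) - dot(u, v) ** 2)

# Anticommutativity: v x u is the negation of u x v.
vxu = cross(v, u)

# Triple cross product: u x (v x w) = (u.w) v - (u.v) w.
lhs = cross(u, cross(v, w))
rhs = tuple(dot(u, w) * vi - dot(u, v) * wi for vi, wi in zip(v, w))

print(uxv, perp_u, perp_v, area, lhs, rhs)
```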
- Mixed product (scalar triple product) of 3D vectors:
\[[\mathbf{\vec u} \; \mathbf{\vec v} \; \mathbf{\vec w}] = (\mathbf{\vec u} \times \mathbf{\vec v}) \cdot \mathbf{\vec w} = \mathbf{\vec u} \cdot (\mathbf{\vec v} \times \mathbf{\vec w}) = \begin{vmatrix} u_x & u_y & u_z \\ v_x & v_y & v_z \\ w_x & w_y & w_z \end{vmatrix} = \begin{vmatrix} u_x & v_x & w_x \\ u_y & v_y & w_y \\ u_z & v_z & w_z \end{vmatrix}\]
Its geometric meaning is the volume of the parallelepiped whose edges are \(\mathbf{\vec u}, \mathbf{\vec v}, \mathbf{\vec w}\). When \(\mathbf{\vec u}, \mathbf{\vec v}, \mathbf{\vec w}\) form a right-handed system, the volume is positive.
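As a sketch of how the mixed product behaves, the plain-Python snippet below computes it both as \((\mathbf{\vec u} \times \mathbf{\vec v}) \cdot \mathbf{\vec w}\) and as a 3 x 3 determinant. The names `triple` and `det3` and the sample vectors are hypothetical. It also illustrates a standard fact: three 3D vectors are linearly dependent exactly when their mixed product is zero.

```python
def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    """Cross product of two 3D vectors."""
    ux, uy, uz = u
    vx, vy, vz = v
    return (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)

def triple(u, v, w):
    """Mixed product [u v w] = (u x v) . w."""
    return dot(cross(u, v), w)

def det3(r1, r2, r3):
    """Determinant of the 3 x 3 matrix with rows r1, r2, r3 (cofactor expansion)."""
    return (r1[0] * (r2[1] * r3[2] - r2[2] * r3[1])
            - r1[1] * (r2[0] * r3[2] - r2[2] * r3[0])
            + r1[2] * (r2[0] * r3[1] - r2[1] * r3[0]))

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
right_handed = triple(i, j, k)    # unit cube, right-handed order
left_handed = triple(j, i, k)     # swapping two vectors flips the sign

u, v, w = (1, 2, 3), (4, 5, 6), (7, 8, 10)
via_cross = triple(u, v, w)
via_dot_first = dot(u, cross(v, w))
via_det = det3(u, v, w)

# u + v lies in the plane of u and v, so the parallelepiped is flat.
coplanar = triple(u, v, tuple(a + b for a, b in zip(u, v)))

print(right_handed, left_handed, via_cross, via_dot_first, via_det, coplanar)
```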
- Outer product (dyadic product) of two vectors: given two vectors \(\mathbf{\vec x} = (x_1, x_2, \cdots, x_n)^T, \mathbf{\vec y} = (y_1, y_2, \cdots, y_m)^T\), their outer product is written as:
\[\mathbf{\vec x} \mathbf{\vec y} = \begin{bmatrix} x_1 y_1 & x_1 y_2 & \cdots & x_1 y_m \\ x_2 y_1 & x_2 y_2 & \cdots & x_2 y_m \\ \vdots & \vdots & \ddots & \vdots \\ x_n y_1 & x_n y_2 & \cdots & x_n y_m \end{bmatrix}\]
It is also written \(\mathbf{\vec x} \otimes \mathbf{\vec y}\) or \(\mathbf{\vec x} \mathbf{\vec y}^T\).
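A minimal sketch of the outer product in plain Python follows; the function name `outer` and the sample vectors are assumptions for illustration. Entry \((i, j)\) of the result is \(x_i y_j\), so an \(n\)-dimensional and an \(m\)-dimensional vector give an \(n \times m\) matrix.

```python
def outer(x, y):
    """Outer product x y^T: row i is x[i] times the vector y."""
    return [[xi * yj for yj in y] for xi in x]

x = [1, 2, 3]   # n = 3
y = [4, 5]      # m = 2

M = outer(x, y)
shape = (len(M), len(M[0]))

print(M)
print(shape)
```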
III. Matrix Operations
- Given two matrices \(\mathbf A = (a_{i,j}) \in \mathbb R^{m \times n}, \mathbf B = (b_{i,j}) \in \mathbb R^{m \times n}\), define:
- Hadamard product (also called the element-wise product):
\[\mathbf A \circ \mathbf B = \begin{bmatrix} a_{1,1} b_{1,1} & a_{1,2} b_{1,2} & \cdots & a_{1,n} b_{1,n} \\ a_{2,1} b_{2,1} & a_{2,2} b_{2,2} & \cdots & a_{2,n} b_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} b_{m,1} & a_{m,2} b_{m,2} & \cdots & a_{m,n} b_{m,n} \end{bmatrix}\]
- Kronecker product:
\[\mathbf A \otimes \mathbf B = \begin{bmatrix} a_{1,1} \mathbf B & a_{1,2} \mathbf B & \cdots & a_{1,n} \mathbf B \\ a_{2,1} \mathbf B & a_{2,2} \mathbf B & \cdots & a_{2,n} \mathbf B \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} \mathbf B & a_{m,2} \mathbf B & \cdots & a_{m,n} \mathbf B \end{bmatrix}\]
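The two products can be contrasted with a short plain-Python sketch. The function names `hadamard` and `kronecker` and the 2 x 2 sample matrices are illustrative assumptions.

```python
def hadamard(a, b):
    """Element-wise product of two matrices of the same shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("Hadamard product needs matrices of equal shape")
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def kronecker(a, b):
    """Kronecker product: every entry a[i][j] is replaced by the block a[i][j] * b."""
    return [[aij * bkl for aij in row_a for bkl in row_b]
            for row_a in a for row_b in b]

A = [[1, 2],
     [3, 4]]
B = [[0, 5],
     [6, 7]]

H = hadamard(A, B)    # same shape as A and B
K = kronecker(A, B)   # shape (2*2) x (2*2)

print(H)
for row in K:
    print(row)
```

The Hadamard product keeps the \(m \times n\) shape, while the Kronecker product of two \(m \times n\) matrices has shape \(m^2 \times n^2\).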
- Let \(\mathbf{\vec x}, \mathbf{\vec a}, \mathbf{\vec b}, \mathbf{\vec c}\) be \(n\)-dimensional vectors and \(\mathbf A, \mathbf B, \mathbf C, \mathbf X\) be \(n \times n\) square matrices. Then:
\[\frac{\partial (\mathbf{\vec a}^T \mathbf{\vec x})}{\partial \mathbf{\vec x}} = \frac{\partial (\mathbf{\vec x}^T \mathbf{\vec a})}{\partial \mathbf{\vec x}} = \mathbf{\vec a}\]
\[\frac{\partial (\mathbf{\vec a}^T \mathbf X \mathbf{\vec b})}{\partial \mathbf X} = \mathbf{\vec a} \mathbf{\vec b}^T = \mathbf{\vec a} \otimes \mathbf{\vec b} \in \mathbb R^{n \times n}\]
\[\frac{\partial (\mathbf{\vec a}^T \mathbf X^T \mathbf{\vec b})}{\partial \mathbf X} = \mathbf{\vec b} \mathbf{\vec a}^T = \mathbf{\vec b} \otimes \mathbf{\vec a} \in \mathbb R^{n \times n}\]
\[\frac{\partial (\mathbf{\vec a}^T \mathbf X \mathbf{\vec a})}{\partial \mathbf X} = \frac{\partial (\mathbf{\vec a}^T \mathbf X^T \mathbf{\vec a})}{\partial \mathbf X} = \mathbf{\vec a} \otimes \mathbf{\vec a}\]
\[\frac{\partial (\mathbf{\vec a}^T \mathbf X^T \mathbf X \mathbf{\vec b})}{\partial \mathbf X} = \mathbf X (\mathbf{\vec a} \otimes \mathbf{\vec b} + \mathbf{\vec b} \otimes \mathbf{\vec a})\]
\[\frac{\partial [(\mathbf A \mathbf{\vec x} + \mathbf{\vec a})^T \mathbf C (\mathbf B \mathbf{\vec x} + \mathbf{\vec b})]}{\partial \mathbf{\vec x}} = \mathbf A^T \mathbf C (\mathbf B \mathbf{\vec x} + \mathbf{\vec b}) + \mathbf B^T \mathbf C^T (\mathbf A \mathbf{\vec x} + \mathbf{\vec a})\]
\[\frac{\partial (\mathbf{\vec x}^T \mathbf A \mathbf{\vec x})}{\partial \mathbf{\vec x}} = (\mathbf A + \mathbf A^T) \mathbf{\vec x}\]
\[\frac{\partial [(\mathbf X \mathbf{\vec b} + \mathbf{\vec c})^T \mathbf A (\mathbf X \mathbf{\vec b} + \mathbf{\vec c})]}{\partial \mathbf X} = (\mathbf A + \mathbf A^T)(\mathbf X \mathbf{\vec b} + \mathbf{\vec c}) \mathbf{\vec b}^T\]
\[\frac{\partial (\mathbf{\vec b}^T \mathbf X^T \mathbf A \mathbf X \mathbf{\vec c})}{\partial \mathbf X} = \mathbf A^T \mathbf X \mathbf{\vec b} \mathbf{\vec c}^T + \mathbf A \mathbf X \mathbf{\vec c} \mathbf{\vec b}^T\]
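Identities of this kind can be sanity-checked with finite differences. The sketch below does so for the quadratic form, comparing \((\mathbf A + \mathbf A^T) \mathbf{\vec x}\) with a central-difference gradient. It is plain Python; the helper names, the step size `eps`, and the deliberately non-symmetric sample matrix are assumptions made for illustration.

```python
def matvec(a, x):
    """Matrix-vector product A x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def quad_form(a, x):
    """Scalar x^T A x."""
    return sum(xi * yi for xi, yi in zip(x, matvec(a, x)))

def numerical_gradient(func, x, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar function."""
    grad = []
    for i in range(len(x)):
        plus, minus = list(x), list(x)
        plus[i] += eps
        minus[i] -= eps
        grad.append((func(plus) - func(minus)) / (2 * eps))
    return grad

A = [[1.0, 2.0],
     [0.0, 3.0]]      # not symmetric, so A + A^T differs from 2A
x = [1.0, -2.0]

# Analytic gradient (A + A^T) x.
A_plus_At = [[A[i][j] + A[j][i] for j in range(2)] for i in range(2)]
analytic = matvec(A_plus_At, x)

numeric = numerical_gradient(lambda z: quad_form(A, z), x)
max_error = max(abs(n - a) for n, a in zip(numeric, analytic))

print(analytic, numeric, max_error)
```

For a symmetric \(\mathbf A\) the result reduces to the familiar \(2 \mathbf A \mathbf{\vec x}\).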
- If \(f\) is a univariate function, then:
- Its element-wise vector function is \(f(\mathbf{\vec x}) = (f(x_1), f(x_2), \cdots, f(x_n))^T\).
- Its element-wise matrix function is:
\[f(\mathbf X) = \begin{bmatrix} f(x_{1,1}) & f(x_{1,2}) & \cdots & f(x_{1,n}) \\ f(x_{2,1}) & f(x_{2,2}) & \cdots & f(x_{2,n}) \\ \vdots & \vdots & \ddots & \vdots \\ f(x_{m,1}) & f(x_{m,2}) & \cdots & f(x_{m,n}) \end{bmatrix}\]
- Its element-wise derivatives are:
\[f^{\prime}(\mathbf{\vec x}) = (f^{\prime}(x_1), f^{\prime}(x_2), \cdots, f^{\prime}(x_n))^T\]
\[f^{\prime}(\mathbf X) = \begin{bmatrix} f^{\prime}(x_{1,1}) & f^{\prime}(x_{1,2}) & \cdots & f^{\prime}(x_{1,n}) \\ f^{\prime}(x_{2,1}) & f^{\prime}(x_{2,2}) & \cdots & f^{\prime}(x_{2,n}) \\ \vdots & \vdots & \ddots & \vdots \\ f^{\prime}(x_{m,1}) & f^{\prime}(x_{m,2}) & \cdots & f^{\prime}(x_{m,n}) \end{bmatrix}\]
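In code, an element-wise function is simply a map over the entries. The sketch below uses \(f(t) = t^2\) with \(f^{\prime}(t) = 2t\) as a hypothetical example; the helper names `apply_vector` and `apply_matrix` are likewise assumptions.

```python
def square(t):
    """Example univariate function f(t) = t^2."""
    return t * t

def square_prime(t):
    """Its derivative f'(t) = 2t."""
    return 2 * t

def apply_vector(func, x):
    """Apply a univariate function to every component of a vector."""
    return [func(xi) for xi in x]

def apply_matrix(func, X):
    """Apply a univariate function to every entry of a matrix."""
    return [[func(xij) for xij in row] for row in X]

x = [1, 2, 3]
X = [[1, 2],
     [3, 4]]

f_x = apply_vector(square, x)
f_X = apply_matrix(square, X)
fprime_x = apply_vector(square_prime, x)
fprime_X = apply_matrix(square_prime, X)

print(f_x, f_X, fprime_x, fprime_X)
```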
- Partial derivatives of various types:
- Partial derivative of a scalar with respect to a scalar: \(\frac{\partial u}{\partial v}\).
- Partial derivative of a scalar with respect to a vector (an \(n\)-dimensional vector): \(\frac{\partial u}{\partial \mathbf{\vec v}} = (\frac{\partial u}{\partial v_1}, \frac{\partial u}{\partial v_2}, \cdots, \frac{\partial u}{\partial v_n})^T\).
- Partial derivative of a scalar with respect to a matrix (an \(m \times n\) matrix):
\[\frac{\partial u}{\partial \mathbf V} = \begin{bmatrix} \frac{\partial u}{\partial v_{1,1}} & \frac{\partial u}{\partial v_{1,2}} & \cdots & \frac{\partial u}{\partial v_{1,n}} \\ \frac{\partial u}{\partial v_{2,1}} & \frac{\partial u}{\partial v_{2,2}} & \cdots & \frac{\partial u}{\partial v_{2,n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial u}{\partial v_{m,1}} & \frac{\partial u}{\partial v_{m,2}} & \cdots & \frac{\partial u}{\partial v_{m,n}} \end{bmatrix}\]
- Partial derivative of a vector (an \(m\)-dimensional vector) with respect to a scalar: \(\frac{\partial \mathbf{\vec u}}{\partial v} = (\frac{\partial u_1}{\partial v}, \frac{\partial u_2}{\partial v}, \cdots, \frac{\partial u_m}{\partial v})^T\).
- Partial derivative of a vector (an \(m\)-dimensional vector) with respect to a vector (an \(n\)-dimensional vector), i.e. the Jacobian matrix, in the row-first convention:
\[\frac{\partial \mathbf{\vec u}}{\partial \mathbf{\vec v}} = \begin{bmatrix} \frac{\partial u_1}{\partial v_1} & \frac{\partial u_1}{\partial v_2} & \cdots & \frac{\partial u_1}{\partial v_n} \\ \frac{\partial u_2}{\partial v_1} & \frac{\partial u_2}{\partial v_2} & \cdots & \frac{\partial u_2}{\partial v_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial u_m}{\partial v_1} & \frac{\partial u_m}{\partial v_2} & \cdots & \frac{\partial u_m}{\partial v_n} \end{bmatrix}\]
In the column-first convention, it is the transpose of the matrix above.
- Partial derivative of a matrix (an \(m \times n\) matrix) with respect to a scalar:
\[\frac{\partial \mathbf U}{\partial v} = \begin{bmatrix} \frac{\partial u_{1,1}}{\partial v} & \frac{\partial u_{1,2}}{\partial v} & \cdots & \frac{\partial u_{1,n}}{\partial v} \\ \frac{\partial u_{2,1}}{\partial v} & \frac{\partial u_{2,2}}{\partial v} & \cdots & \frac{\partial u_{2,n}}{\partial v} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial u_{m,1}}{\partial v} & \frac{\partial u_{m,2}}{\partial v} & \cdots & \frac{\partial u_{m,n}}{\partial v} \end{bmatrix}\]
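To make the row-first layout of the vector-by-vector derivative concrete, the sketch below builds a Jacobian by central differences for a hypothetical map \(\mathbf{\vec u}: \mathbb R^2 \rightarrow \mathbb R^3\) and compares it with the hand-derived matrix. The function `u`, the helper `numerical_jacobian`, and the evaluation point are all assumptions for illustration.

```python
def u(v):
    """Hypothetical map from R^2 to R^3."""
    v1, v2 = v
    return [v1 * v2, v1 ** 2, v1 + 3.0 * v2]

def numerical_jacobian(func, v, eps=1e-6):
    """Row-first Jacobian: entry (i, j) approximates d u_i / d v_j."""
    m, n = len(func(v)), len(v)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus, minus = list(v), list(v)
        plus[j] += eps
        minus[j] -= eps
        up, um = func(plus), func(minus)
        for i in range(m):
            jac[i][j] = (up[i] - um[i]) / (2 * eps)
    return jac

v0 = [2.0, 5.0]
numeric = numerical_jacobian(u, v0)

# Hand-derived Jacobian at v0: rows index u_1..u_3, columns index v_1, v_2.
analytic = [[v0[1], v0[0]],
            [2 * v0[0], 0.0],
            [1.0, 3.0]]

max_error = max(abs(numeric[i][j] - analytic[i][j])
                for i in range(3) for j in range(2))

# The column-first convention is the transpose (shape n x m).
column_first = [list(col) for col in zip(*analytic)]

print(numeric, max_error, column_first)
```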
- For the trace of a matrix, the following partial derivatives hold:
\[\frac{\partial [tr(f(\mathbf X))]}{\partial \mathbf X} = (f^{\prime}(\mathbf X))^T\]
\[\frac{\partial [tr(\mathbf A \mathbf X \mathbf B)]}{\partial \mathbf X} = \mathbf A^T \mathbf B^T\]
\[\frac{\partial [tr(\mathbf A \mathbf X^T \mathbf B)]}{\partial \mathbf X} = \mathbf B \mathbf A\]
\[\frac{\partial [tr(\mathbf A \otimes \mathbf X)]}{\partial \mathbf X} = tr(\mathbf A) \mathbf I\]
\[\frac{\partial [tr(\mathbf A \mathbf X \mathbf B \mathbf X)]}{\partial \mathbf X} = \mathbf A^T \mathbf X^T \mathbf B^T + \mathbf B^T \mathbf X^T \mathbf A^T\]
\[\frac{\partial [tr(\mathbf X^T \mathbf B \mathbf X \mathbf C)]}{\partial \mathbf X} = \mathbf B \mathbf X \mathbf C + \mathbf B^T \mathbf X \mathbf C^T\]
\[\frac{\partial [tr(\mathbf C^T \mathbf X^T \mathbf B \mathbf X \mathbf C)]}{\partial \mathbf X} = (\mathbf B + \mathbf B^T) \mathbf X \mathbf C \mathbf C^T\]
\[\frac{\partial [tr(\mathbf A \mathbf X \mathbf B \mathbf X^T \mathbf C)]}{\partial \mathbf X} = \mathbf A^T \mathbf C^T \mathbf X \mathbf B^T + \mathbf C \mathbf A \mathbf X \mathbf B\]
\[\frac{\partial [tr((\mathbf A \mathbf X \mathbf B + \mathbf C)(\mathbf A \mathbf X \mathbf B + \mathbf C)^T)]}{\partial \mathbf X} = 2 \mathbf A^T (\mathbf A \mathbf X \mathbf B + \mathbf C) \mathbf B^T\]
In the first identity, \(f(\mathbf X)\) is a matrix function in the power-series sense (for example \(\mathbf X^k\)), with \(f^{\prime}\) its scalar derivative applied in the same sense; it is not the element-wise function.
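A trace derivative can be checked entry by entry with finite differences. The sketch below does this for \(tr(\mathbf A \mathbf X \mathbf B)\), whose gradient with respect to \(\mathbf X\) is \(\mathbf A^T \mathbf B^T\). It is plain Python; the helper names, the step size `eps`, and the 2 x 2 sample matrices are illustrative assumptions.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def trace(m):
    """Sum of the diagonal entries of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))

def numerical_matrix_gradient(func, X, eps=1e-6):
    """Central-difference estimate of d func / d x_{i,j} for every entry of X."""
    grad = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            plus = [row[:] for row in X]
            minus = [row[:] for row in X]
            plus[i][j] += eps
            minus[i][j] -= eps
            grad[i][j] = (func(plus) - func(minus)) / (2 * eps)
    return grad

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.0, 1.0], [5.0, -2.0]]
X = [[1.0, 0.0], [2.0, 3.0]]

def f(X):
    """Scalar function tr(A X B)."""
    return trace(matmul(matmul(A, X), B))

numeric = numerical_matrix_gradient(f, X)
analytic = matmul(transpose(A), transpose(B))    # A^T B^T
max_error = max(abs(numeric[i][j] - analytic[i][j])
                for i in range(2) for j in range(2))

print(analytic, numeric, max_error)
```

Because \(tr(\mathbf A \mathbf X \mathbf B)\) is linear in \(\mathbf X\), the gradient does not depend on the point \(\mathbf X\) at which it is evaluated.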
- Suppose \(\mathbf U = f(\mathbf X)\) is a matrix-valued function \(f: \mathbb R^{m \times n} \rightarrow \mathbb R^{m \times n}\), and \(g(\mathbf U)\) is a real-valued function \(g: \mathbb R^{m \times n} \rightarrow \mathbb R\) of \(\mathbf U\). Then the following chain rule holds:
\[\frac{\partial g(\mathbf U)}{\partial \mathbf X} = \left(\frac{\partial g(\mathbf U)}{\partial x_{i,j}}\right)_{m \times n} = \begin{bmatrix} \frac{\partial g(\mathbf U)}{\partial x_{1,1}} & \frac{\partial g(\mathbf U)}{\partial x_{1,2}} & \cdots & \frac{\partial g(\mathbf U)}{\partial x_{1,n}} \\ \frac{\partial g(\mathbf U)}{\partial x_{2,1}} & \frac{\partial g(\mathbf U)}{\partial x_{2,2}} & \cdots & \frac{\partial g(\mathbf U)}{\partial x_{2,n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial g(\mathbf U)}{\partial x_{m,1}} & \frac{\partial g(\mathbf U)}{\partial x_{m,2}} & \cdots & \frac{\partial g(\mathbf U)}{\partial x_{m,n}} \end{bmatrix}\]
\[= \left(\sum_{k} \sum_{l} \frac{\partial g(\mathbf U)}{\partial u_{k,l}} \frac{\partial u_{k,l}}{\partial x_{i,j}}\right)_{m \times n} = \left(tr\left[\left(\frac{\partial g(\mathbf U)}{\partial \mathbf U}\right)^T \frac{\partial \mathbf U}{\partial x_{i,j}}\right]\right)_{m \times n}\]
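As a worked special case of the chain rule (an illustration added here, under the assumption that \(f\) acts element-wise, so \(u_{k,l} = f(x_{k,l})\)), each \(u_{k,l}\) depends only on the entry \(x_{k,l}\):
\[\frac{\partial u_{k,l}}{\partial x_{i,j}} = \begin{cases} f^{\prime}(x_{i,j}), & (k,l) = (i,j) \\ 0, & \text{otherwise} \end{cases}\]
The double sum then collapses to a single term:
\[\frac{\partial g(\mathbf U)}{\partial x_{i,j}} = \sum_{k} \sum_{l} \frac{\partial g(\mathbf U)}{\partial u_{k,l}} \frac{\partial u_{k,l}}{\partial x_{i,j}} = \frac{\partial g(\mathbf U)}{\partial u_{i,j}} f^{\prime}(x_{i,j})\]
In matrix form this is a Hadamard product:
\[\frac{\partial g(\mathbf U)}{\partial \mathbf X} = \frac{\partial g(\mathbf U)}{\partial \mathbf U} \circ f^{\prime}(\mathbf X)\]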
This article is reposted from the huaxiaozhuan blog. Blog address: http://www.huaxiaozhuan.com/
[Mathematical basics of machine learning] basics of Linear Algebra