The books I read these days are filled with matrices and vectors, so matrix analysis clearly matters. With that in mind, I took the "Matrix Analysis" course. Yet the way teachers and textbooks present matrices has always left me vaguely uncomfortable, in a way that is hard to articulate. After finishing the course and combing through the material along my own lines of thought, I finally feel somewhat enlightened. This article follows my personal train of thought to clarify the relationships, and the cause and effect, between the concepts of matrix analysis.
1. Why introduce matrices?
The short answer: a matrix makes formulas far more convenient to express, and that convenience alone makes the concept worth introducing. Take a system of linear equations:

$$\begin{cases} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2 \\ \quad\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m \end{cases}$$
After introducing basic matrix multiplication, the system above can be written as $A\mathbf{x} = \mathbf{b}$, where $A \in \mathbb{C}^{m\times n}$, $\mathbf{b} \in \mathbb{C}^{m}$, $\mathbf{x} \in \mathbb{C}^{n}$. Of course, expressing a problem well is the key to solving it, but our ultimate aim is still the solution. Take the linear system: if we want to solve for $\mathbf{x}$, and for $A$ there exists a $B$ such that $BA = I$, then obviously $\mathbf{x} = B\mathbf{b}$. This condition, however, is too strict: it essentially requires $A$ to be a full-rank square matrix.
It is easy to prove that when $A\mathbf{x} = \mathbf{b}$ is a consistent system (that is, a solution exists), the condition $B$ must satisfy for $\mathbf{x} = B\mathbf{b}$ to be a solution is $ABA = A$. Following this line of thought about solving linear systems, the inverse, generalized inverse, pseudo-inverse, and rank of a matrix all appear naturally, and tools such as the full-rank decomposition and matrix norms are needed along the way.
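As a quick numerical check (a sketch of my own, not from any textbook), NumPy's `numpy.linalg.pinv` computes the Moore-Penrose pseudo-inverse, which in particular satisfies $ABA = A$, so it can play the role of $B$ above:

```python
import numpy as np

# A rectangular system A x = b, consistent by construction.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [1.0, 0.0]])          # shape (3, 2), rank 2
x_true = np.array([1.0, -1.0])
b = A @ x_true                      # guarantees consistency

B = np.linalg.pinv(A)               # Moore-Penrose pseudo-inverse

# B satisfies the generalized-inverse condition ABA = A ...
assert np.allclose(A @ B @ A, A)

# ... so x = B b solves the consistent system.
x = B @ b
assert np.allclose(A @ x, b)
print(x)                            # [ 1. -1.]
```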
2. What is the essence of matrix computation?
Of course, in practice it is not enough to be able to solve linear systems. Take optimization problems: the basic tool for solving them is differentiation (the gradient). If we merely use matrices to represent the problem, the notation is indeed concise, but what use is a representation that we cannot differentiate and solve? And since the matrix form is nothing more than another way of writing the same problem, matrix operations should not be a brand-new calculus; they must stay consistent with ordinary scalar computation.
First consider one of the simplest cases, a scalar function such as

$$f(x_1, \dots, x_n) = x_1^2 + x_2^2 + \cdots + x_n^2.$$

The matrix (vector) representation is

$$f(\mathbf{x}) = \mathbf{x}^T \mathbf{x},$$

and the derivative is expressed as

$$\frac{\partial f}{\partial \mathbf{x}} = 2\mathbf{x}.$$
The first question is: what, in the end, is differentiation? So far we have mostly differentiated functions of a single variable, where the derivative is the slope of the tangent line; even in the multivariable case we just take partial derivatives. Another way to understand the derivative, though, is as a first-order approximation, i.e.

$$f(x + \Delta x) \approx f(x) + f'(x)\,\Delta x.$$
From this point of view, the derivative of a scalar function with respect to a vector (matrix) is in fact defined so that the same approximation holds (the matrix derivative has its own strict definition, which I will not spell out here):

$$f(\mathbf{x} + \Delta\mathbf{x}) \approx f(\mathbf{x}) + \left(\frac{\partial f}{\partial \mathbf{x}}\right)^T \Delta\mathbf{x}.$$
So the derivative in this simplest example is $2\mathbf{x}$, and the example illustrates two characteristics of matrix differentiation: the definition is not arbitrary, because it stays consistent with the derivative of a scalar function; and computing it amounts to taking partial derivatives with respect to each element of the matrix (vector). Of course, this raises a new question: if in the end we just take partial derivatives element by element, why bother writing things as matrices at all?
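To make the consistency requirement concrete, here is a small sketch of my own comparing the matrix-form gradient $2\mathbf{x}$ with element-wise finite differences:

```python
import numpy as np

def f(x):
    # Scalar function of a vector: f(x) = x^T x
    return x @ x

def grad_matrix_form(x):
    # Gradient from the matrix-calculus formula d(x^T x)/dx = 2x
    return 2.0 * x

def grad_elementwise(x, h=1e-6):
    # Partial derivative with respect to each element, one at a time
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

x = np.array([1.0, -2.0, 3.0])
print(grad_matrix_form(x))          # [ 2. -4.  6.]
print(grad_elementwise(x))          # numerically the same
```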
The answer has two parts. First, for the derivative of a scalar function with respect to a matrix, there are basic operational formulas ready to be applied, and some elementary derivatives are immediate. Second, sometimes we can work directly from the definition and the properties of matrices, as with a problem I ran into a while ago that could be solved along exactly these lines.
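I will not reproduce that problem here, but a representative example in the same spirit (my own choice of illustration) is the least-squares objective, whose gradient drops straight out of the basic formulas:

$$\|A\mathbf{x} - \mathbf{b}\|_2^2 = (A\mathbf{x} - \mathbf{b})^T (A\mathbf{x} - \mathbf{b}), \qquad \frac{\partial}{\partial \mathbf{x}} \|A\mathbf{x} - \mathbf{b}\|_2^2 = 2A^T(A\mathbf{x} - \mathbf{b}).$$

Setting the gradient to zero gives the normal equations $A^T A\mathbf{x} = A^T\mathbf{b}$, with no element-by-element differentiation in sight.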
So, back to the title of this section: what is the essence of matrix computation? Matrices provide a more concise way to describe problems; when a problem is represented in matrix form, the matrix has its own corresponding rules of operation for carrying out the computation, and that is matrix computation. The result of a matrix computation must be identical to the result obtained without matrices: this is the criterion that every matrix computation and derivation has to follow.
With this in mind, it is not hard to understand matrix operations in general, including derivatives, integrals, and differential equations involving matrices. Extending further, matrix sequences, series, and matrix functions follow the same idea.
3. Space: a matrix is more than just a matrix
Does a matrix exist merely to give formulas a friendlier representation and computations a simpler method? Of course not; otherwise matrix textbooks would not spend so much effort starting from linear spaces. Looking back at the first section of this article, the solution of the system was $\mathbf{x} = B\mathbf{b}$: the solution is itself expressed by a matrix acting on a vector. This suggests a new idea: a matrix is not only a concise notation for formulas, it can also represent the act of solving, a mapping that carries $\mathbf{b}$ to $\mathbf{x}$.
One thing is very easy to accept: if $\mathbf{x} = (x_1, x_2, x_3)^T \in \mathbb{R}^3$, then

$$\mathbf{x} = x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + x_3\mathbf{e}_3.$$

We have grown so used to this expression that we rarely ask why we are allowed to write it. At a minimum, it presupposes that addition and scalar multiplication are defined on the set and stay inside it; in other words, the set is actually a linear space, and precisely because it is a linear space, we can use a set of basis vectors to represent every element in it. Of course, the basis is clearly not unique: in the same space, $\{\alpha_1, \dots, \alpha_n\}$ may be one basis and $\{\beta_1, \dots, \beta_n\}$ another, and any element $\mathbf{x}$ of the space can be expressed as

$$\mathbf{x} = \sum_{i=1}^{n} k_{\alpha,i}\,\alpha_i = \sum_{i=1}^{n} k_{\beta,i}\,\beta_i.$$
This leads to two questions: what is the correspondence between different bases, and which basis is better? The conclusion for the first question is simple: if

$$(\beta_1, \dots, \beta_n) = (\alpha_1, \dots, \alpha_n)\,P,$$
then the relationship between the coordinates is $\mathbf{k}_\alpha = P\,\mathbf{k}_\beta$. This relationship between coordinates under different bases can itself be understood as a linear transformation, namely $f(\mathbf{x}) = A\mathbf{x}$. For the second question, we have to put forward a suitable criterion. With linear transformations introduced, the problem converts into another one: the matrix of a linear transformation takes different forms under different bases, so we only need to care about that matrix. For linear transformations, Wikipedia offers the following diagram.
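A small numerical sketch of my own to check the coordinate relation $\mathbf{k}_\alpha = P\,\mathbf{k}_\beta$:

```python
import numpy as np

# Columns of ALPHA and BETA are two bases of R^2.
ALPHA = np.array([[1.0, 0.0],
                  [0.0, 1.0]])      # the standard basis
BETA = np.array([[1.0, 1.0],
                 [0.0, 1.0]])       # another basis

# (beta_1, beta_2) = (alpha_1, alpha_2) P  =>  P = ALPHA^{-1} BETA
P = np.linalg.solve(ALPHA, BETA)

# Pick coordinates in the beta basis and build the vector itself.
k_beta = np.array([2.0, -1.0])
x = BETA @ k_beta

# Coordinates of the same x in the alpha basis.
k_alpha = np.linalg.solve(ALPHA, x)

# The claimed relation: k_alpha = P k_beta
assert np.allclose(k_alpha, P @ k_beta)
print(k_alpha)                      # [ 1. -1.]
```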
It shows how different transformation matrices act, with blue as the original space and green as its image under the mapping. Explaining "which basis is better" is then up to you; it depends on what geometric properties you want your space to satisfy.
4. What is characteristic about eigenvectors
Mappings (here, linear transformations) tend to change values, but amid the change we always want to find some invariant quantity, which we call a characteristic, or feature. If a nonzero vector keeps its direction unchanged under a linear transformation, that is, if $A\mathbf{x} = \lambda\mathbf{x}$, then $\mathbf{x}$ is called an eigenvector and the corresponding $\lambda$ an eigenvalue. For a discussion, built on concrete examples, of what counts as a "characteristic" and what does not, see "The Mathematics of Machine Learning (5): the powerful singular value decomposition (SVD) of matrices and its applications"; its explanation is very detailed and vivid. A minor flaw is that it does not clarify the relationship between the SVD and eigenvalues, and there is a small error in its example. The rest of this section will focus on clarifying how the various kinds of matrix diagonalization relate to one another.
Solving for eigenvalues means solving $\det(\lambda I - A) = 0$; if $A$ happens to be a diagonal matrix, the eigenvalues are obviously just its diagonal entries. It is important to note that eigenvalues are inseparable from the underlying linear transformation. I had planned to write this part out properly, but it never came together; I cannot quite explain it clearly even to myself...
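Still, a quick numerical sketch (my own) shows what "direction unchanged" means in practice:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # symmetric, so the eigenvalues are real

eigvals, eigvecs = np.linalg.eig(A)

for lam, v in zip(eigvals, eigvecs.T):
    # A v points in the same direction as v, merely scaled by lambda.
    assert np.allclose(A @ v, lam * v)
    print("lambda =", lam, " v =", v)
```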
Since my original idea never got written down, let me at least talk briefly about how the various kinds of diagonalization relate to one another.
For the same linear transformation, the matrices representing it under different bases satisfy (the similarity relation)

$$B = P^{-1} A P.$$
The first point is that a diagonal matrix is extremely convenient for working with a linear transformation, so we want our matrix to be similar to a diagonal matrix. Unfortunately, not every matrix is similar to a diagonal matrix, which is why the Jordan canonical form was proposed. (I have not yet run into any problem that involves the Jordan form.)
Going further, we do not just want similarity to a diagonal matrix; we also place requirements on the basis, since after all the column vectors of $P$ are eigenvectors. We want the basis to be orthogonal and normalized, and that is the concept of unitary diagonalization. Unitary diagonalization is more demanding: the matrix needs to satisfy

$$A^H A = A A^H,$$

that is, $A$ must be normal.
A Hermitian matrix ($A^H = A$) certainly satisfies this condition; and since for an arbitrary matrix $A$ the products $A^H A$ and $A A^H$ are always Hermitian, we can obtain the singular value decomposition of any matrix by unitarily diagonalizing them.
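A sketch of that last claim (my own illustration, real case for simplicity): diagonalize $A^T A$ orthogonally, and the square roots of its eigenvalues are the singular values of $A$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# A^T A is symmetric (Hermitian in the real case), hence
# orthogonally diagonalizable with real, nonnegative eigenvalues.
gram = A.T @ A
eigvals, V = np.linalg.eigh(gram)   # eigenvalues in ascending order

# Singular values are the square roots, conventionally descending.
sigma = np.sqrt(eigvals[::-1])

# Compare with the library SVD; the singular values must match.
_, s, _ = np.linalg.svd(A)
assert np.allclose(sigma, s)
print(sigma)
```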
In other words, we add constraints, or make compromises, on top of similarity diagonalization, trying as far as possible to expose the characteristics of the matrix. So much of matrix analysis revolves around diagonalization, and this is probably the purpose.
5. Summary
Sections 3 and 4 are still a bit muddled, and my grasp of matrix theory needs further work. Perhaps once I have worked through enough practical problems, the experience gained in solving them will bring a deeper understanding.
Asking a few whys along the way, why something is done this way and what the benefit is, should also be good for one's study.
If there are any errors or deficiencies in this article, please point them out. I cannot even sum it up properly...