The geometric significance of eigenvectors
For a long time I never understood the significance of matrix eigenvalues and eigenvectors (I suspect many people feel the same way). I knew the mathematical formula, but I could not see its geometric meaning. Textbooks never really instantiate the concept from multiple perspectives to explain it; they just give the definition and leave it at that.
By the definition Ax = λx, multiplying a matrix by a vector yields another vector of the same dimension. Matrix multiplication therefore corresponds to a transformation that converts one vector into another vector of the same dimension. What is the effect of that transformation? It is, of course, closely tied to the structure of the square matrix. For example, we can choose a suitable 2x2 matrix whose effect is to rotate every two-dimensional vector in the plane counterclockwise by 30 degrees. We can then ask: is there a vector whose direction does not change under this transformation? Think about it: apart from the zero vector, no vector in the plane can be rotated by 30 degrees without changing direction. Hence the matrix corresponding to this transformation (or the transformation itself) has no real eigenvector (note: an eigenvector cannot be the zero vector). An eigenvector of a given transformation is therefore a vector whose direction remains unchanged under that transformation; only its length is scaled. (Look again at the defining equation Ax = λx: λx is the result of transforming x by the square matrix A, and clearly λx lies along the same direction as x.)
Here is a simple example. Consider the transformation of the plane that reflects every vector across the horizontal axis: the horizontal coordinate stays the same and the vertical coordinate changes sign. This transformation is represented by the matrix [1 0; 0 -1] (the semicolon indicates a new row), and indeed [1 0; 0 -1] * [a b]' = [a -b]'. What are the eigenvectors of this matrix? Think about which vectors keep their direction under the transformation. Clearly the vectors on the horizontal axis do (remember this is a mirror reflection, and the mirror itself, the horizontal axis, does not move), so we can guess directly that [a 0]' (a not 0) is an eigenvector. Is there anything else? Yes: the vectors on the vertical axis. After the transformation their direction is reversed, but they still lie on the same line, which counts as an unchanged direction (scaled by -1), so [0 b]' (b not 0) is also an eigenvector.
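As a quick numerical check of this reflection example, here is a minimal NumPy sketch (the matrix and the eigenvectors are exactly the ones discussed above):

    import numpy as np

    # Reflection across the horizontal axis, as described above.
    A = np.array([[1.0, 0.0],
                  [0.0, -1.0]])

    # Eigenvalues and eigenvectors computed numerically.
    vals, vecs = np.linalg.eig(A)
    print(vals)   # [ 1. -1.]
    print(vecs)   # columns are [1, 0] and [0, 1], the two axes

    # Verify A x = lambda x for each eigenpair.
    for lam, x in zip(vals, vecs.T):
        assert np.allclose(A @ x, lam * x)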
In summary, the eigenvalue only reflects the factor by which the corresponding eigenvector is scaled during the transformation. For a transformation, the directions picked out by the eigenvectors seem to be what matters, and the eigenvalues do not seem so important. However, once we bring in the spectral theorem, the situation changes.
The core content of the spectral theorem is this: a linear transformation (represented by matrix multiplication) can be expressed as a linear combination built from its eigenvectors, with the coefficients given by the corresponding eigenvalues. For a real symmetric matrix A with orthonormal eigenvectors v1, ..., vn and eigenvalues λ1, ..., λn, the formula is A = λ1 v1 v1' + λ2 v2 v2' + ... + λn vn vn'.
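A minimal NumPy sketch of this decomposition for a small symmetric matrix (the matrix here is an arbitrary example chosen for illustration, not one from the text):

    import numpy as np

    # An arbitrary symmetric matrix, used only for illustration.
    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])

    vals, vecs = np.linalg.eigh(A)   # eigh is the solver for symmetric matrices

    # Rebuild A as the sum of lambda_i * v_i v_i' over all eigenpairs
    # (the spectral decomposition stated above).
    A_rebuilt = sum(lam * np.outer(v, v) for lam, v in zip(vals, vecs.T))
    assert np.allclose(A, A_rebuilt)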
From this we can see that a transformation (matrix) can be fully represented by its eigenvectors together with the eigenvalue attached to each of them; the eigenvalue expresses how much the matrix contributes along that direction, in a word its weight or energy. At this point the eigenvalue takes over as the master and firmly grasps the initiative from the eigenvector: say what you like, the energy level of this matrix along your direction is in my hands.
We know that a transformation can be represented by matrix multiplication, so a spatial coordinate system can also be regarded as a matrix, and that coordinate system can be represented by the eigenvectors of the matrix. Pictured as a graph, you can imagine them as the axes spanning the space. This group of vectors completely captures the "features" of the space the matrix represents, and their eigenvalues express the energy along each axis (as you can imagine, the longer an axis, the more of the space it accounts for and the stronger or more explicit that "feature" is; a short axis is correspondingly a hidden feature). Eigenvectors and eigenvalues can therefore fully describe the characteristics of a geometric space, which is why they are used throughout geometry (especially spatial geometry) and its applications.
There are too many applications of eigenvectors (and especially eigenvalues) to list. For example, the PCA method I mentioned previously selects the k eigenvectors with the largest eigenvalues to represent a matrix, which gives both dimensionality reduction and a way to display the dominant features. Closely related is Google's PageRank, which computes the principal eigenvector of a matrix built from the web graph (the matrix represents the links between page "nodes") and uses its components to score each node. There are also many applications in face recognition and in mining and analyzing patterns in data streams; if you are interested, see the papers by Spiros (IBM) at VLDB '05 and SIGMOD '06.
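To illustrate the PageRank idea, here is a toy power-iteration sketch that finds the principal eigenvector of a small link matrix (not Google's actual implementation; the 3-page graph and the damping factor 0.85 are assumptions made only for the example):

    import numpy as np

    # Column-stochastic link matrix of a tiny made-up 3-page web graph.
    L = np.array([[0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.5],
                  [0.5, 0.5, 0.0]])
    d = 0.85                                  # damping factor (assumed)
    n = L.shape[0]
    G = d * L + (1 - d) / n * np.ones((n, n))

    # Power iteration: repeated multiplication converges to the principal eigenvector.
    r = np.full(n, 1.0 / n)
    for _ in range(100):
        r = G @ r
        r /= r.sum()
    print(r)   # the scores of the three pages, i.e. the dominant eigenvector of G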
Eigenvectors show up in mathematics, physics, materials science, and mechanics (stress and strain tensors). An American author once wrote in a linear algebra book that "eigenvalues and eigenvectors exist wherever there is vibration", which is truly awesome and a little chilling ......
The physical meaning of the eigen-transformation:
[1. Mathematical significance of eigenvectors]
First, let us examine a linear change of coordinates. For example, the equation of an ellipse in the x, y coordinate system can be written as x^2/a^2 + y^2/b^2 = 1; after the coordinate system is rotated about the origin, the equation of the ellipse changes accordingly. We can multiply the (x, y) of the original coordinate system by a matrix to obtain the new representation (x', y'), written in operator form as (x, y) * M = (x', y'). The matrix M here represents a linear transformation: stretching, rotation, and so on.

So we can ask: is there a vector b such that the transformation amounts to nothing more than scaling, i.e. such that applying the matrix, M * b, is the same as multiplying b by a single number λ, M * b = λ * b? If so, b is an eigenvector of M and λ the corresponding eigenvalue. A matrix can have many eigenvectors. The eigenvalues are obtained from the characteristic equation, and the eigenvectors are then obtained from the linear system belonging to each eigenvalue; the process also works the other way around.

For example, let A be a 3x3 real symmetric matrix, let a1 = (a, -a, 1)^T be a solution of Ax = 0 and a2 = (a, 1, -a)^T a solution of (A + E)x = 0, with the constant a < 2. What is a? Since a1 solves Ax = 0, a1 is an eigenvector of A for the eigenvalue 0; since a2 solves (A + E)x = 0, a2 is an eigenvector of A for the eigenvalue -1. For a real symmetric matrix, eigenvectors belonging to different eigenvalues are orthogonal, so a1^T a2 = a^2 - a - a = 0, giving a = 0 or a = 2; since a < 2, we get a = 0.
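Here is a minimal NumPy sketch of this definition and of the orthogonality property used in the exercise (the 2x2 symmetric matrix is an arbitrary example, not the matrix of the exercise):

    import numpy as np

    # An arbitrary real symmetric example matrix.
    A = np.array([[4.0, 1.0],
                  [1.0, 3.0]])

    vals, vecs = np.linalg.eig(A)

    # Each column b of `vecs` satisfies A @ b = m * b for its eigenvalue m.
    for m, b in zip(vals, vecs.T):
        assert np.allclose(A @ b, m * b)

    # Eigenvectors of a real symmetric matrix for distinct eigenvalues are orthogonal.
    assert abs(vecs[:, 0] @ vecs[:, 1]) < 1e-10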
That is still too abstract. Concretely, finding the eigenvectors amounts to an orthogonal decomposition of the space represented by the matrix, so that the matrix can be described by how far each of its vectors projects onto each eigenvector. For example, if A is an m x n matrix with n > m, then there are at most m eigen-directions (because the rank is at most m); projecting the n row vectors onto each eigen-direction e, with the eigenvalue acting as the weight, each row vector can be written as vn = (e1*v1n, e2*v2n, ..., em*vmn), and the matrix effectively becomes square. If the rank of the matrix is even smaller, the matrix can be stored in compressed form. Furthermore, because these projections tell us how much of A lies along each direction of the eigen-space, we can use least squares to keep the components with the largest projection energy and discard the rest; this preserves the information in the matrix as far as possible while greatly reducing the number of dimensions that must be stored. This, in short, is the PCA method.
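A minimal PCA sketch along these lines (NumPy; the random data and the choice of k = 2 retained components are assumptions made just for the example):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))        # 100 samples with 5 features (made-up data)

    # Center the data and form the covariance matrix.
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / (len(Xc) - 1)

    # Eigen-decompose and keep the k eigenvectors with the largest eigenvalues.
    vals, vecs = np.linalg.eigh(C)
    order = np.argsort(vals)[::-1]
    k = 2
    W = vecs[:, order[:k]]               # 5 x k projection matrix

    X_reduced = Xc @ W                   # each sample is now described by k numbers
    print(X_reduced.shape)               # (100, 2)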
For example, take a point (x, y) in the x, y plane and apply the linear transformation (x, y) * [1, 0; 0, -1] (the semicolon indicates a new row of the matrix). The result is (x, -y); this linear transformation is a mirror reflection across the x axis. For the matrix [1, 0; 0, -1] we can find two eigenvectors, [1, 0] and [0, 1], i.e. the x axis and the y axis. What does that mean? The projection onto the x axis is left unchanged by this transformation, while the projection onto the y axis is multiplied by the amplitude coefficient -1 without being rotated. The two eigenvectors say that this transformation acts independently, by pure scaling, along the orthogonal x and y basis directions. For other linear transformation matrices we can likewise find n such "symmetry axes" along which the transformation acts only by scaling; these n axes are the n eigenvectors of the transformation. That is the physical meaning of eigenvectors. In this sense, the matrix A simply is the linear transformation.
In practical matrix computations we often need the inverse of a matrix; when the matrix is not square (or not invertible), no inverse exists. This is where singular value decomposition comes in: A = PSQ, where P and Q are orthogonal matrices and S is a diagonal matrix of singular values, from which the pseudo-inverse can be computed. At the same time, A = PSQ can be used to reduce the storage needed for A: keep P as a tall, thin matrix and Q as a short, wide matrix, and for a very large A the storage can shrink by several orders of magnitude.
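A minimal sketch of both uses of the SVD mentioned above, the pseudo-inverse and low-rank storage (NumPy; the random 200x50 matrix and the retained rank r = 10 are assumptions for the example):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(200, 50))            # a non-square example matrix

    # SVD: A = P @ diag(s) @ Q, with P and Q orthogonal (reduced form here).
    P, s, Q = np.linalg.svd(A, full_matrices=False)

    # Pseudo-inverse built from the SVD; should match np.linalg.pinv(A).
    A_pinv = Q.T @ np.diag(1.0 / s) @ P.T
    print(np.max(np.abs(A_pinv - np.linalg.pinv(A))))   # ~0, up to rounding

    # Low-rank storage: keep only the r largest singular values.
    r = 10
    A_r = P[:, :r] @ np.diag(s[:r]) @ Q[:r, :]           # tall-thin times wide-flat
    print(A.size, P[:, :r].size + r + Q[:r, :].size)     # original vs compressed size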
[2. physical significance]
What is the physical meaning of an eigenvector? Take a standing wave on a rope: the displacements of all the points on the rope form an (infinite-dimensional) vector, and the eigenvector of the transformation acting on it is the eigenfunction sin(t); because it varies with time it is an eigenfunction rather than a finite vector, and the value at each point at a given moment is sin(x + t). Another example: seen from a fixed viewpoint in space, the coordinates of everything on Earth are constantly changing as the planet rotates, yet the transformation is symmetric about the Earth's rotation axis; coordinates along that axis are insensitive to the rotation. The Earth's rotation axis is therefore an eigenvector of the spatial transformation given by the Earth's rotation. Google's PageRank works on a corrected version of the adjacency matrix of WWW links; the components of its principal eigenvector give the ranking of the pages. One useful property: AB and BA have the same eigenvalues. If x is an eigenvector of AB with eigenvalue λ, then (AB)x = λx; multiplying both sides on the left by B gives (BA)(Bx) = λ(Bx), so λ is also an eigenvalue of BA, with eigenvector Bx (provided Bx is not zero). And vice versa.
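A quick numerical check of the AB / BA property (NumPy; the random 4x4 matrices are used only for illustration):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.normal(size=(4, 4))
    B = rng.normal(size=(4, 4))

    ev_AB = np.sort_complex(np.linalg.eigvals(A @ B))
    ev_BA = np.sort_complex(np.linalg.eigvals(B @ A))
    print(np.allclose(ev_AB, ev_BA))   # True: AB and BA share their eigenvalues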
What about functions of a matrix and their eigenvalues? Speaking loosely, if the eigenvalues of A are (1, 2, 3), then the eigenvalues of A^2 are (1^2, 2^2, 3^2); more generally, for a polynomial p, the eigenvalues of p(A) are p applied to the eigenvalues of A. (Such operator identities do need to be proved case by case, but once proved they can be used wholesale.)

What do the eigenvalues mean? They mean the matrix can be decomposed into projections onto n eigen-directions, and the n eigenvalues give the length (scaling) along each projection direction. Since an n x n matrix A can be projected onto an orthogonal vector space, a matrix whose columns are n linearly independent eigenvectors can serve as the projection (change-of-basis) matrix; the identity I is the trivial example of such a matrix. For a number m to be an eigenvalue, there must be a non-zero vector a with Aa = ma, i.e. (A - mI)a = 0 must have a non-zero solution, which forces |A - mI| = 0 (argue by contradiction: if the determinant were non-zero, the n columns of A - mI would be linearly independent, and in n-dimensional space the only solution would be the origin, so no non-zero solution could exist). The meaning of |A - mI| = 0 is that subtracting mI knocks one dimension out of the n independent directions, so at least two columns become linearly dependent and the determinant vanishes.

Some useful properties follow. For a diagonal (or triangular) matrix such as A = diag(1/2, 1/3, 1/5), solving |A - mI| = 0 immediately gives the eigenvalues (1/2, 1/3, 1/5). If an n x n matrix has rank 1, its null space has dimension n - 1, so the eigenvalue 0 alone already has n - 1 linearly independent eigenvectors. Eigenvectors themselves are not rigidly fixed, just as a coordinate system can be rotated; but once the directions of the eigenvectors are chosen, the vector of eigenvalues is determined. The eigenvalues are found from the characteristic equation |A - mI| = 0, and one can also show that the eigenvalues of A^-1 are the reciprocals 1/m of the eigenvalues of A. Finally, what is the eigenvector matrix good for? It diagonalizes the matrix: with P built from the eigenvectors, P^-1 A P is a diagonal matrix (a similarity transformation).
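A small NumPy sketch of the properties just listed: the eigenvalues of A^2, the eigenvalues of the inverse, and diagonalization (the random 3x3 matrix is an arbitrary example):

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.normal(size=(3, 3))

    vals, P = np.linalg.eig(A)

    # The eigenvalues of A^2 are the squares of the eigenvalues of A.
    print(np.sort_complex(np.linalg.eigvals(A @ A)))
    print(np.sort_complex(vals ** 2))

    # The eigenvalues of A^-1 are the reciprocals of the eigenvalues of A.
    print(np.sort_complex(np.linalg.eigvals(np.linalg.inv(A))))
    print(np.sort_complex(1.0 / vals))

    # Diagonalization: P^-1 A P is (numerically) a diagonal matrix.
    D = np.linalg.inv(P) @ A @ P
    print(np.round(D, 10))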
The way linear algebra is studied is to treat vectors and matrices as whole objects: start from the properties of the parts, move to the properties of the whole, and then derive the various applications and physical concepts from those global properties. When the matrix A is handled as a single symbol, its behavior is very similar to that of a real number. Scientific theories always seem to build recursively like this. For another example, the basic concepts of calculus are the differential, the integral, and the derivative; the three mean value theorems are built on them.
[3. application scenarios]
Uses and drawbacks of linear transformations: PCA, a linear transformation, can be used to process images, for example for 2D face recognition:
1. Treat an image A as a matrix, and further as a linear transformation matrix. From the training images we can find an eigenvector matrix (say, the n eigenvectors with the largest energy). Multiplying A by these n eigenvectors gives an n-dimensional vector a, which is the projection of A into the eigen-space.
2. Later, an image of the same class (for example, another face photo of the same person) is treated as a linearly correlated image of A; multiplying it by the same eigenvectors gives a vector b of n numbers, the projection of B into the eigen-space. The distance between a and b is then the criterion for deciding whether B belongs to the same class as A (see the sketch after this list).
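Here is a minimal sketch of this project-and-compare idea (NumPy; the random "images", the number of components n = 8, and the notion of a small distance are all assumptions made for illustration, not a real face-recognition pipeline):

    import numpy as np

    rng = np.random.default_rng(3)
    train = rng.normal(size=(20, 64))        # 20 made-up training images, flattened to 64-vectors

    # Build the eigen-space from the training set (the n largest-energy eigenvectors).
    mean = train.mean(axis=0)
    C = (train - mean).T @ (train - mean)
    vals, vecs = np.linalg.eigh(C)
    n = 8
    W = vecs[:, np.argsort(vals)[::-1][:n]]  # 64 x n eigenvector matrix

    def project(img):
        """Step 1: project an image into the n-dimensional eigen-space."""
        return (img - mean) @ W

    # Step 2: compare a new image with a known one by their distance in the eigen-space.
    a = project(train[0])
    b = project(train[0] + 0.05 * rng.normal(size=64))   # a slightly perturbed copy
    print(np.linalg.norm(a - b))                         # small distance -> same class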
However, PCA has an inherent weakness: flattening images into vectors and testing for linear correlation throws away the two-dimensional structure entirely. In a 2-D image the order of the components is meaningful; different orderings can represent completely different information. In addition, image B must be scaled to match A (as determined by the eigenvector space) before it projects well into A's eigen-space, and if B is a rotated version of A, PCA can fail completely. In practice, therefore, the recognition rate obtained with plain PCA is not high; it requires the images to be strictly aligned in orientation and normalized. For this reason PCA is generally used to reduce the dimensionality of the feature matrix rather than to extract features directly.

Of course, the result of dimensionality reduction alone is not ideal for classification, so one can go further and apply the Fisher transform, which maximizes the between-class distance in the least-squares sense. But the Fisher transform introduces a new weakness: it becomes more sensitive to the training data of each class. The price of better classification is reduced generality, and when the number of classes grows sharply, classification performance still drops steeply, although it remains much better than classifying with PCA directly.

PCA "subjectively" assumes that the (n+1)-th matrix of a class, once pulled into a vector, can be expressed linearly by the known matrices 1 through n. That is clearly just a hopeful assumption: even if the new input matrix is merely the original matrix with some elementary row and column operations applied, such as swaps, the straightened-out linear representation may not exist at all (two-dimensional PCA cannot overcome this either). So in real applications one can only optimize as best one can, using a least-squares distance to decide which class a matrix is "thought" to belong to. And since the eigen-matrices trained by PCA, one per class, span subspaces that are not guaranteed to be orthogonal to each other, the projection results carry no fundamental classification property. The algorithm is practical, but in theory the problem it poses has no solution at all.
The K-L transform is one application form of PCA. Suppose image class C has n images. Flatten each image into a vector, let the n vectors form a matrix, and compute the eigenvectors (column vectors) of that matrix. Then multiply the original n images by these column vectors and average the results; this average is the class's feature image. The feature image resembles the originals, but with some of the deformation due to stretching and translation removed. The robustness gained comes at the cost of considerable accuracy, so the method is suited to verification within a restricted setting, i.e. deciding whether an image P belongs to class C (a sketch follows below).

Comparison with neural networks: bluntly put, a neural network changes the function mapping y = f(x) into the vector mapping [y] = [f(x)], with the input and output entries fixed. A real nervous system does not draw such a clear line between internal processing and external interfaces, so although these models are all called neural networks, they are in essence very far from the real thing.
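A rough sketch of the K-L procedure as described above (NumPy; the made-up image vectors and my reading of the "average feature image" as the mean of the eigen-space reconstructions are assumptions, not a faithful reproduction of any particular implementation):

    import numpy as np

    rng = np.random.default_rng(4)
    images = rng.normal(size=(10, 64))         # 10 made-up images of class C, flattened

    # Eigenvectors of the matrix formed by the image vectors (via the covariance).
    mean = images.mean(axis=0)
    C = (images - mean).T @ (images - mean)
    vals, vecs = np.linalg.eigh(C)
    top = vecs[:, np.argsort(vals)[::-1][:5]]  # keep a few dominant column vectors

    # Project each image onto the eigenvectors, reconstruct, and average:
    # the average is taken here as the "feature image" of the class.
    recon = (images - mean) @ top @ top.T + mean
    feature_image = recon.mean(axis=0)

    # Verification: is a new image P close to the class feature image?
    P = images[0] + 0.1 * rng.normal(size=64)
    P_recon = (P - mean) @ top @ top.T + mean
    print(np.linalg.norm(P_recon - feature_image))   # compare against a threshold to decide membership in C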
[4. Spectrum]
What is a spectrum? We know that music is a dynamic process, yet a musical score sits on paper, static. As a tool of mathematical analysis, a time-varying function can be studied through its frequency spectrum, given by the Fourier transform. For probabilistic problems, even though the outcome of each individual trial differs, one can still obtain the power spectral density of the probability distribution. Mathematics, as a metaphysical tool, concerns itself with the unchanging laws inside a changing world.
[5. Can it be used for classification?]
The so-called eigen-decomposition describes how the original matrix is similar to a diagonal matrix in an n-dimensional space. λ(i) describes the scaling ratio along the i-th axis of the n-dimensional linear space onto which A projects. The order of the λ(i) does not matter, because swapping coordinate axes is an elementary linear transformation and does not affect the algebraic or topological properties. The eigenvector x(i) shows how A is projected, as a linear combination, onto that coordinate axis. The eigenvectors form a set of orthogonal basis vectors.
When, in the problem domain of image processing, an image is treated as a matrix, classifying images amounts to assuming that "similar" matrices share some identical, or algebraically approximate, invariant. Obviously, "similar" here is a class defined by subjective assumption, not a class determined by computation. This leads to a problem: the so-called different classes are a prior given by human subjective understanding, not a posterior obtained through computation, and they carry no deterministic information in the sense of mathematical logic. If the eigenvectors or eigen-matrix of a matrix are used as classification information, there is no evidence that matrices from the same "class" have closer eigenvalues than matrices from different classes.

The matrix-decomposition methods, and the within-class least-distance method (Fisher), rest on an awkward premise: the Euclidean distance between matrices within a class must be small enough, and Euclidean distance often has little to do with the geometric topology humans perceive. Because the matrix itself carries no predefined topological information, once the Euclidean distance between images of the same class grows, they can no longer be classified well. At the same time, the more classes the images are divided into, the more severely these subspaces overlap; even if we look for linearly invariant subspaces or factors within each class's subspace, the overlap cannot be eliminated. The Fisher algorithm tries to sidestep this, but at the price of depending heavily on the initial data and of losing generality. The PCA algorithm tries to obtain the best classification in the statistical sense, but when the number of classes grows, the previously trained parameters are invalidated and no usable computational flow remains; since the overlapping subspaces cannot be resolved, classification keeps degrading.

Why? Because the classes themselves are not derived from the algebraic characteristics of the linear transformations; they come from a prior, non-linear, "intelligent" human judgment. Binary computation, on the other hand, is classification over discrete sets and must proceed by orthogonal division of a linear space. This produces a logically irreconcilable paradox: the non-linear judgment is continuous, topological, involves infinitely many points and non-separable variables, and cannot be modeled at all. It is therefore, strictly speaking, an unsolvable recognition problem.
So, setting aside the ideas of higher algebra, can practical signal-processing methods that extract local features be used for classification? That still does not answer the question of the a priori classes; it is still an attempt to make do on a shaky premise. How do we know that a local region of one matrix actually corresponds to the same local region of another matrix? That, too, is a subjective, intuitive judgment. A computer is just a variant of paper and pencil; it cannot understand meaning. Even for an operation like 1 + 1 = 2 it cannot decide whether the result is right or wrong, and if it asks other computers to check, how do those computers prove themselves right or wrong? They cannot. You have to wait for a human subject to observe the result before the result becomes meaningful. Just like Schrödinger's cat, lazily smiling at me in the sun. However subtle the metaphysical theory, it does not escape the cage of empiricism.
Therefore, I no longer need algorithms or philosophy.
Http://blog.csdn.net/xiaojiang0805/article/details/7606222
Foundations of Image Processing: the geometric significance of eigenvectors