Physical significance of eigenvectors in advanced mathematics

Source: Internet
Author: User

[1. mathematical significance of eigenvectors]
First, consider a linear transformation. For example, in the x-y coordinate system an ellipse can be written as x^2/a^2 + y^2/b^2 = 1; after the coordinate system is rotated about the origin, the equation of the ellipse changes. We can multiply the original coordinates (x, y) by a matrix to obtain a new representation (x', y'), written in operator form as (x, y) * M = (x', y'). The matrix M here represents a linear transformation such as stretching, rotation, or reflection. So, is there a vector b such that the transformed result A * b looks like a plain number m multiplying b, i.e. A * b = m * b? In other words, is there a vector b for which applying the matrix A is equivalent to simply scaling b by m along its own direction? If so, b is an eigenvector of A and m is the corresponding eigenvalue. A matrix can have many eigenvectors. Eigenvalues are obtained from the characteristic equation, and the eigenvector for each eigenvalue is obtained from the corresponding linear system, and the other way around as well. For example: let A be a 3x3 real symmetric matrix, let a1 = (a, -a, 1)^T be a solution of Ax = 0 and a2 = (a, 1, -a)^T be a solution of (A + E)x = 0, and suppose a < 2; what is the constant a? Since a1 = (a, -a, 1)^T solves Ax = 0, it is an eigenvector of A with eigenvalue 0; since a2 = (a, 1, -a)^T solves (A + E)x = 0, it is an eigenvector of A with eigenvalue -1. Eigenvectors of a real symmetric matrix belonging to different eigenvalues are orthogonal, so a^2 - a - a = a^2 - 2a = 0, giving a = 0 or a = 2; since a < 2, we get a = 0.
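A quick numerical check of this exercise (a minimal sketch: the matrix A itself is never given in the text, so only the orthogonality condition is verified, and all variable names below are illustrative):

```python
import numpy as np

# alpha1 = (a, -a, 1)^T is an eigenvector for eigenvalue 0,
# alpha2 = (a, 1, -a)^T is an eigenvector for eigenvalue -1.
# For a real symmetric matrix, eigenvectors of distinct eigenvalues are orthogonal:
#   alpha1 . alpha2 = a*a + (-a)*1 + 1*(-a) = a^2 - 2a = 0  ->  a = 0 or a = 2.
for a in (0.0, 2.0):
    alpha1 = np.array([a, -a, 1.0])
    alpha2 = np.array([a, 1.0, -a])
    print(a, np.dot(alpha1, alpha2))   # both dot products are 0; with a < 2 we take a = 0
```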
This is still too abstract. Concretely, finding the eigenvectors amounts to performing an orthogonal decomposition of the space represented by the matrix, so that every row vector of A can be represented by its projection lengths onto the eigenvectors. For example, if A is an m x n matrix with n > m, there are at most m such directions (because the rank is at most m); projecting the n row vectors onto each direction e, with the corresponding value v acting as the weight, each row vector can be written as vn = (e1 * v1n, e2 * v2n, ..., em * vmn), and the matrix becomes square. If the rank of the matrix is small, its storage can be compressed. Furthermore, because the projection sizes describe each component of A in this space, we can use least squares to keep the components with the largest projection energy and discard the rest, preserving as much of the information in the matrix as possible while greatly reducing the dimensions that must be stored; this is the PCA method (principal component analysis).
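As a concrete illustration of the idea, here is a minimal PCA sketch with NumPy; the data, the number of kept components, and all variable names are illustrative assumptions, not anything prescribed by the text:

```python
import numpy as np

# Project the rows of a data matrix onto the eigenvectors of its covariance matrix
# and keep only the components that carry the most energy.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))          # 200 samples, 10 dimensions (made-up data)

Xc = X - X.mean(axis=0)                 # center the data
cov = Xc.T @ Xc / (len(Xc) - 1)         # 10 x 10 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric matrix -> eigh

order = np.argsort(eigvals)[::-1]       # sort by decreasing eigenvalue (energy)
k = 3
top = eigvecs[:, order[:k]]             # the k principal eigenvectors

X_reduced = Xc @ top                    # 200 x 3: the compressed representation
X_approx = X_reduced @ top.T + X.mean(axis=0)   # least-squares reconstruction
print(X_reduced.shape, np.linalg.norm(X - X_approx))
```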
For example, take a point (x, y) in the x-y plane and apply the linear transformation (x, y) * [1, 0; 0, -1], where the semicolon indicates a new row of the matrix. The result is (x, -y): this linear transformation is a mirror reflection about the x-axis. The matrix [1, 0; 0, -1] has two eigenvectors, [1, 0] and [0, 1], that is, the x-axis and the y-axis. What does this mean? The projection onto the x-axis is unchanged by the transformation, while the projection onto the y-axis is multiplied by the amplitude coefficient -1 and is not rotated. The two eigenvectors show that this transformation acts by simple scaling along the orthogonal x-axis and y-axis basis. For other linear transformation matrices we can likewise find n such "symmetry axes": the transformation only rescales vectors along these n axes, and these n axes are the n eigenvectors of the transformation. This is the physical meaning of eigenvectors. In this sense, a matrix A is equivalent to a linear transformation.
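The same reflection example can be checked numerically; np.linalg.eig is used here simply to recover the two eigenvectors and eigenvalues named above:

```python
import numpy as np

# The reflection about the x-axis. Its eigenvectors are the x-axis
# (eigenvalue +1: projections on it are unchanged) and the y-axis
# (eigenvalue -1: projections on it are flipped).
M = np.array([[1.0, 0.0],
              [0.0, -1.0]])
eigvals, eigvecs = np.linalg.eig(M)
print(eigvals)                      # [ 1. -1.]
print(eigvecs)                      # columns are (1, 0) and (0, 1)
print(np.array([3.0, 4.0]) @ M)     # row-vector convention from the text: [ 3. -4.]
```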
In practical matrix algorithms, the inverse of a matrix is often required. When the matrix is not square, an ordinary inverse does not exist. This calls for the singular value decomposition, A = PSQ, where P and Q are orthogonal matrices and S is a diagonal matrix of singular values; from it the pseudo-inverse can be obtained. At the same time, A = PSQ can be used to reduce the storage needed for A: P only needs to be a tall, thin matrix and Q a short, wide one. For very large matrices the storage can be reduced by several orders of magnitude.
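A hedged sketch of the same idea with NumPy's SVD; NumPy names the factors U, S, Vt rather than P, S, Q, and the matrix sizes below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))                        # non-square: no ordinary inverse

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U @ diag(s) @ Vt
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T             # Moore-Penrose pseudo-inverse
print(np.allclose(A_pinv, np.linalg.pinv(A)))      # True

# Low-rank storage: keep only the k largest singular values.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # "thin" factor times "flat" factor
print(np.linalg.norm(A - A_k))                     # small truncation error
```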

[2. physical significance]
What is the physical meaning of eigenvectors? Take a standing wave on a rope: the displacements of the points on the rope form an infinite-dimensional vector. The eigenvector here is the eigenfunction sin(t); because it is a continuously varying quantity it is called an eigenfunction instead of an eigenvector, and the value at each point at a specific time is given by sin(x + t). Another example: viewed from a fixed direction in space, the coordinates of every object on Earth keep changing, but this transformation is symmetric about the Earth's rotation axis, that is, it is insensitive to translations and stretches along that axis. Therefore the Earth's rotation axis is an eigenvector of the spatial transformation given by the Earth's rotation. Google's PageRank works on a corrected adjacency matrix of the WWW's links; the components of its principal eigenvector give each page's score. What other properties do eigenvalues have? AB and BA have the same eigenvalues: if x is an eigenvector of AB with eigenvalue m, then (AB)x = mx; left-multiplying both sides by B gives B(AB)x = (BA)(Bx) = m(Bx), so m is also an eigenvalue of BA, with corresponding eigenvector Bx. And vice versa.
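The AB-versus-BA claim is easy to verify numerically; the random matrices below are just an illustration:

```python
import numpy as np

# AB and BA share their eigenvalues, and if x is an eigenvector of AB
# then Bx is an eigenvector of BA for the same eigenvalue.
rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4))

lam, X = np.linalg.eig(A @ B)
print(np.sort_complex(lam))
print(np.sort_complex(np.linalg.eig(B @ A)[0]))            # same spectrum

x = X[:, 0]                                                # eigenvector of AB for lam[0]
print(np.allclose((B @ A) @ (B @ x), lam[0] * (B @ x)))    # True
```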
What are the eigen-matrix and the eigenvalues? Thinking in general terms: if the spectrum of A is P(A) = (1, 2, 3), i.e. A has the three eigenvalues 1, 2, 3, then P(A^2) = (1^2, 2^2, 3^2). P can be regarded as an operator; of course its properties need to be proved case by case, but once proved they can be used wholesale. What do the eigenvalues express? They mean that the matrix can be decomposed into projections onto n eigen-directions, and the n eigenvalues give the scaling along each projection direction. Since an n x n matrix A can be projected onto an orthogonal vector space, any matrix made up of n linearly independent eigenvectors can act as a linear projection (change-of-basis) matrix, and the identity I is one such matrix. Therefore, for an eigenvalue m we need a nonzero vector a with Aa = ma, i.e. Aa = mIa, so (A - mI)a = 0 must have a nonzero solution, and hence |A - mI| = 0 (by contradiction: if this determinant were not 0, the n column vectors would be linearly independent and in n-dimensional space could only meet at the origin, leaving no nonzero solution). Some useful properties follow: for example, if A is the diagonal matrix [1/2, 0, 0; 0, 1/3, 0; 0, 0, 1/5], then solving |A - mI| = 0 immediately gives the eigenvalues (1/2, 1/3, 1/5). If an n x n matrix has rank 1, its largest linearly independent group contains a single vector; it has at most one nonzero eigenvalue, and every nonzero vector in the corresponding one-dimensional column space is an eigenvector for it. The eigenvectors themselves are not rigidly fixed, just as a coordinate system can be rotated; but once the eigen-directions are chosen, the vector of eigenvalues is determined. The process of finding the eigenvalues is to use the characteristic equation |A - mE| = 0. One can also prove that P(A^-1) = 1/P(A), i.e. the eigenvalues of the inverse are the reciprocals of the eigenvalues of A. What is the intuitive meaning of the characteristic equation? A set of n vectors that loses one dimension of independence contains at least two linearly dependent vectors, so the determinant is 0. What is the use of the eigenvector matrix? It diagonalizes the matrix: D = P^-1 A P, and after this transformation D is a diagonal matrix.
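Some of the claims above (eigenvalues of A^2, eigenvalues of A^-1, diagonalization) can be checked with a short NumPy script; the symmetric test matrix is an arbitrary choice made only to guarantee diagonalizability:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(3, 3))
A = A + A.T                                # symmetric, hence diagonalizable

lam, P = np.linalg.eigh(A)                 # columns of P are eigenvectors
# Eigenvalues of A^2 are the squares of the eigenvalues of A.
print(np.allclose(np.sort(np.linalg.eigvalsh(A @ A)), np.sort(lam ** 2)))  # True

# Similarity transform to diagonal form: P^-1 A P = D.
D = np.linalg.inv(P) @ A @ P
print(np.allclose(D, np.diag(lam)))        # True

# Eigenvalues of A^-1 are the reciprocals 1/lambda of the eigenvalues of A.
print(np.allclose(np.sort(np.linalg.eigvalsh(np.linalg.inv(A))),
                  np.sort(1.0 / lam)))     # True
```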
The study of linear algebra is to take vectors and matrices as wholes, moving from the properties of the parts to the properties of the whole, and then deriving various applications and physical concepts from those overall properties. When matrix A is treated as a single symbol, its behavior is very similar to that of a real number. Scientific theorems always seem to be recursive in this way. For another example, the basic concepts of advanced mathematics include the derivative, the differential, and the integral, and correspondingly one can name mean value theorems for them.

[3. application scenarios]
Strengths and weaknesses of linear transformations: the linear transformation PCA can be used to process images, for example in 2D face recognition:
1. We treat image A as a matrix, and further as a linear transformation matrix, and find the eigen-matrix of the training images (say, the n eigenvectors with the largest energy). Multiplying A by these n eigenvectors gives an n-dimensional vector a, the projection of A in the eigen-space.
2. Later, an image of the same class (for example, a face photo of the same person) is expected to be approximately linearly correlated with A; multiplying it by the same eigenvectors gives a vector b of n numbers, the projection of B in the eigen-space. The distance between a and b is the criterion for deciding whether B belongs to the same class as A (a rough sketch of these two steps follows the list).
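A rough eigenface-style sketch of these two steps, under the assumption that images are already flattened into vectors; the gallery data, helper names (train_eigenspace, project), and sizes are all made up for illustration:

```python
import numpy as np

def train_eigenspace(images, n_components):
    """images: array of shape (num_images, height*width), already flattened."""
    mean = images.mean(axis=0)
    centered = images - mean
    cov = centered.T @ centered
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    return mean, top

def project(image, mean, top):
    return (image - mean) @ top                 # n_components-dimensional coordinates

rng = np.random.default_rng(4)
gallery = rng.normal(size=(20, 64))             # stand-in for 20 flattened face images
mean, top = train_eigenspace(gallery, n_components=5)
query = gallery[7] + 0.05 * rng.normal(size=64) # a slightly perturbed copy of image 7

coords = project(gallery, mean, top)            # projections of all gallery images
q = project(query, mean, top)
print(np.argmin(np.linalg.norm(coords - q, axis=1)))   # 7: nearest match by distance
```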
However, PCA has an inherent drawback: the linear-correlation test has the advantage of "translation invariance" while completely ignoring spatial structure. In a two-dimensional image the order of the vector components is meaningful, and different orderings can represent entirely different information. Moreover, image B must be scaled consistently with A (as determined by the eigenvector space) to project well into A's eigen-space; if B contains some rotation relative to A, PCA can fail outright. Therefore, when PCA is used directly for image recognition, the recognition rate is not high; it requires the images to be strictly aligned in orientation and normalized. For this reason, PCA is generally used to reduce the dimensionality of the feature matrix rather than to extract features directly. Of course, the dimension-reduced result is still not ideal for classification, and one can further apply a Fisher transform that maximizes between-class distance in the least-squares sense. But the Fisher transform introduces a new weakness: it becomes more sensitive to the training data within each class. The cost of the improved classification is reduced generality, and when the number of classes grows sharply the classification performance still drops steeply, though it remains much better than direct PCA classification. PCA "subjectively" assumes that the (n+1)-th matrix of a class, once pulled into a vector, can be expressed linearly in terms of the known matrices 1 through n. Clearly this is just wishful thinking: even if the new input matrix is the original matrix after some elementary row or column operations, such as swaps, this linear representation of the straightened vectors may not exist at all (two-dimensional PCA cannot overcome this objective non-existence either). So in practice one can only approximate, using the least-squares distance to decide which class a matrix is "considered" to belong to. Since the eigen-matrices trained by PCA are one matrix per class, the subspaces formed by these matrices are not guaranteed to be orthogonal, and the projection results therefore carry no fundamentally discriminative structure. The algorithm is practical, but in theory the problem has no exact solution. (A hedged sketch of the PCA-plus-Fisher pipeline follows.)
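A hedged sketch of the PCA-then-Fisher pipeline mentioned above, using scikit-learn's PCA and LinearDiscriminantAnalysis on synthetic stand-in data; the class means, dimensions, and component counts are arbitrary assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# PCA is used only for dimensionality reduction; the Fisher (LDA) step then
# maximizes between-class versus within-class scatter.
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(loc=c, size=(30, 50)) for c in (0.0, 1.0, 2.0)])
y = np.repeat([0, 1, 2], 30)

X_pca = PCA(n_components=10).fit_transform(X)       # reduce dimensions first
lda = LinearDiscriminantAnalysis().fit(X_pca, y)    # then the Fisher step
print(lda.score(X_pca, y))                          # training accuracy (optimistic)
```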
The K-L transform is one application form of PCA. Suppose image class C has n images; each image is pulled into a vector, the n vectors form a matrix, and we take the eigenvectors (column vectors) of that matrix. Multiplying the original n images by these column vectors and averaging gives the characteristic image of the class. The characteristic image resembles the original images, but with some of the deformation information related to stretching and translation removed; the gain in robustness is paid for by a loss of accuracy. It is therefore suited to verification within a specific range, i.e. deciding whether an image P belongs to class C. Comparison with neural networks: bluntly put, a neural network replaces the mapping of a function y = f(x) with the vector mapping [Y] = [F(X)], with fixed input and output entries. A real nervous system does not draw a clear line between internal processing and external interfaces, so although these models are all called neural networks, they are in essence far from the real thing.
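One possible reading of this K-L / characteristic-image procedure, sketched with NumPy; the "images" are synthetic vectors, and the acceptance test by reconstruction error is an interpretation for illustration, not necessarily the author's exact recipe:

```python
import numpy as np

def class_subspace(images, k):
    """Build a mean image and k principal directions from the images of class C."""
    mean = images.mean(axis=0)
    _, _, Vt = np.linalg.svd(images - mean, full_matrices=False)
    return mean, Vt[:k]

def reconstruction_error(p, mean, basis):
    coeffs = (p - mean) @ basis.T
    return np.linalg.norm((p - mean) - coeffs @ basis)

rng = np.random.default_rng(6)
class_c = rng.normal(size=(15, 100)) + np.linspace(0, 3, 100)   # stand-in class images
mean, basis = class_subspace(class_c, k=4)

inside = class_c[0] + 0.1 * rng.normal(size=100)    # perturbed member of class C
outside = 5.0 * rng.normal(size=100)                # unrelated image
print(reconstruction_error(inside, mean, basis) <
      reconstruction_error(outside, mean, basis))   # True: member fits the subspace better
```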

[4. Spectrum]
What is a spectrum? We know that music is a dynamic process, yet a musical score sits on paper, static. Among mathematical analysis tools, a time-varying function can be studied through its frequency spectrum via the Fourier transform. For probabilistic problems, although each individual realization differs, the power spectral density of the probability distribution can still be obtained. As a metaphysical tool, mathematics focuses on the unchanging laws within a changing world.
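A small NumPy example of moving from a time-varying signal to its static spectrum; the two test frequencies are arbitrary:

```python
import numpy as np

# Take a dynamic signal and describe it by its (static) frequency content.
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(t), d=t[1] - t[0])
power = np.abs(spectrum) ** 2                      # power spectrum (up to scaling)
print(np.sort(freqs[np.argsort(power)[-2:]]))      # the dominant frequencies: [ 50. 120.]
```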
 
[5. Can it be used for classification?]
The so-called eigen-decomposition expresses how the original matrix is similar to a diagonal matrix. Each lambda(i) describes the i-th axis of the similar n-dimensional linear space onto which A projects; lambda(i) is a scaling ratio. The order of the lambda(i) is not important, because interchanging coordinate axes is an elementary linear transformation and does not affect the algebraic or topological properties. The eigenvector x_i shows how A projects, as a linear combination, onto each coordinate axis; the eigenvectors form a set of orthogonal basis vectors.
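The claim that reordering the axes does not change the eigenvalues can be checked directly (the permutation below is arbitrary):

```python
import numpy as np

# Permuting the coordinate axes is a similarity transform, so it leaves
# the set of eigenvalues lambda_i unchanged.
rng = np.random.default_rng(7)
A = rng.normal(size=(4, 4))
P = np.eye(4)[[2, 0, 3, 1]]                 # a permutation matrix

B = np.linalg.inv(P) @ A @ P                # A expressed in the permuted basis
print(np.allclose(np.sort_complex(np.linalg.eigvals(A)),
                  np.sort_complex(np.linalg.eigvals(B))))   # True
```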
When an image is treated as a matrix in the problem domain of image processing, image classification amounts to assuming that matrices of the same class share some identical or algebraically approximate "invariant". Obviously, "same class" is a class defined by subjective assumption, not a class "determined" by computation. This causes a problem: the so-called different classes are a prior of human subjective comprehension, not a posterior obtained through computation, and they carry no deterministic information in mathematical logic. If the eigenvectors or eigen-matrix of a matrix are used as the classification information, there is no evidence that matrices of different "classes" have more distant eigenvalues, or that matrices of the same class have more similar ones. Matrix-decomposition methods and the within-class least-distance method (Fisher) carry an awkward premise: the within-class matrices must already be close enough in Euclidean distance, and Euclidean distance often disagrees with the geometric topology humans perceive. Because the matrix itself carries no predefined topological information, when the Euclidean distance between images of the same class grows, the method can no longer classify them well. At the same time, the more classes the images are divided into, the more severely these subspaces overlap; even if we look for linearly invariant subspaces or factors within each class's subspace, the overlap cannot be eliminated. The Fisher algorithm tries to sidestep this, but at the price of heavy dependence on the initial data and a loss of generality. The PCA algorithm tries to obtain the best classification in the statistical sense, but when the number of classes grows the previously trained parameters are invalidated and no stable computational procedure remains; as the overlapping subspaces cannot be resolved, the classification keeps degrading. Why? Because the classification itself is not derived from the algebraic characteristics of the linear transformation; it is a prior, nonlinear, "intelligent" human judgment. Binary computation, by contrast, is classification over discrete sets and must proceed by orthogonal division of a linear space. This leads to a logically irreconcilable paradox: nonlinear judgment is continuous, geometric-topological, involves infinitely many points and inseparable variables, and cannot be modeled at all. It is therefore an undecidable problem.
So, setting aside the ideas of higher algebra, can practical signal-processing methods extract local features for classification? This still does not answer the question of the "a priori" classification; it is still a way of making do on a bad premise. How does one know that a local region of one matrix actually corresponds to a local region of another matrix? That, too, is a subjective, intuitive judgment. A computer is just a variation on paper and pen; it cannot understand meaning. Even for the result of an operation like 1 + 1 = 2, it cannot decide whether it is right or wrong, and if it asks other computers to decide, how can those computers prove themselves right or wrong? They cannot. Only when the "person" of a subject observes the result does the result become meaningful. So, like Schrödinger's cat, she smiles at me lazily in the sun. Metaphysical theory is subtle, yet it does not escape the cage of empiricism.
Therefore, I no longer need algorithms or philosophy.
