This article implements principal component analysis (PCA) using singular value decomposition (SVD) and applies it to face image recognition. It explains how SVD realizes PCA, how SVD reduces the dimensionality of image features, and how SVD applies to text clustering, for example by weakening the effects of synonymy and polysemy, a problem the traditional text vector space model cannot solve.
Requirements Analysis
The requirement for this experiment is simple: implement face image recognition, as a simple imitation of the image search features of Google and Baidu.
Feature Representation
Features are used to decide which category a thing belongs to; for example, "does it have wings" distinguishes sparrows from huskies. To determine whether two pictures show the same person, we can use an image's pixel matrix as its feature vector. For example, the pixel matrix of a 180 * 200 JPG image yields a feature vector vector[180 * 200]. Given such feature vectors, we can use Euclidean distance or cosine similarity to measure how similar two images are and decide whether they show the same person.
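As a minimal sketch of this representation, the following Java snippet flattens a small pixel matrix into a feature vector and compares two such vectors with Euclidean distance and cosine similarity. The class name FeatureDemo and its method names are illustrative assumptions, not part of the original project.

```java
import java.util.Arrays;

// Illustrative helper; all names here are assumptions made for this sketch.
class FeatureDemo {

    // Flatten an h x w pixel matrix row by row into a vector of length h * w.
    static double[] flatten(double[][] pixels) {
        int h = pixels.length;
        int w = pixels[0].length;
        double[] vec = new double[h * w];
        for (int i = 0; i < h; i++) {
            System.arraycopy(pixels[i], 0, vec, i * w, w);
        }
        return vec;
    }

    // Euclidean distance between two equal-length vectors.
    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity between two equal-length, non-zero vectors.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        double[][] img1 = {{1, 2}, {3, 4}};
        double[][] img2 = {{2, 4}, {6, 8}}; // same pattern, twice as bright
        double[] v1 = flatten(img1);
        double[] v2 = flatten(img2);
        System.out.println(Arrays.toString(v1));
        System.out.println("euclidean = " + euclidean(v1, v2));
        System.out.println("cosine    = " + cosine(v1, v2));
    }
}
```

Scaling every pixel by the same factor changes the Euclidean distance but leaves the cosine similarity essentially unchanged, so the two measures can rank candidates differently.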
Feature Selection
Using the pixel matrix directly as the feature vector becomes expensive when the image is large. We call the pixels the original features. To shrink the original feature vector while preserving its main characteristics, we turn to principal component analysis (PCA). PCA can be carried out with singular value decomposition (SVD), which factors the original matrix as M = U x S x V'. Keeping only the first k most important components gives a new feature vector and completes the dimensionality reduction. In the example above, k = 20 yields a new feature vector vec'[180 * 20 + 200 * 20], that is, 7,600 entries instead of 36,000, which greatly reduces the amount of computation without materially affecting the comparison results. Next, let's look at what SVD actually is.
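The truncation can be written out in standard SVD notation. This is a sketch for orientation only; the symbols \sigma_i, u_i, v_i and M_k are introduced here for illustration, with M an m x n matrix:

```latex
M = U S V^{\top} = \sum_{i=1}^{\min(m,n)} \sigma_i \, u_i v_i^{\top},
\qquad \sigma_1 \ge \sigma_2 \ge \dots \ge 0,
\qquad
M_k = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{\top}.
```

Keeping the first k columns of U and of V stores k(m + n) numbers, which is far fewer than the m n original pixels whenever k is much smaller than min(m, n).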
Three introductory readings on SVD:
(1) Mathematics in Machine Learning (5): The Powerful Singular Value Decomposition (SVD) and Its Applications
(2) [Translation] SVD from a Geometric Perspective
(3) Singular Value Decomposition and Image Compression
After reading the three articles above, we have a rough idea of how SVD works. The matrix U consists of the left singular vectors (describing the trend along the y-axis), the diagonal matrix S holds the singular values, which measure the importance of each pair of left and right singular vectors, and the matrix V' consists of the right singular vectors (describing the trend along the x-axis).
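To make the roles of U, S and V' more concrete, here is a minimal sketch that approximates only the leading singular value and its left and right singular vectors by power iteration. This is a swapped-in teaching technique, not the decomposition routine used in the code later in this article, and the class and method names (TopSingular, leading) are hypothetical.

```java
import java.util.Arrays;

// Hypothetical demo class: power iteration on M^T M to approximate the
// leading singular triple (u1, sigma1, v1) of a non-zero matrix M.
class TopSingular {

    // Computes y = M * x.
    static double[] times(double[][] m, double[] x) {
        double[] y = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < x.length; j++) {
                y[i] += m[i][j] * x[j];
            }
        }
        return y;
    }

    // Computes x = M^T * y.
    static double[] transposeTimes(double[][] m, double[] y) {
        double[] x = new double[m[0].length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < x.length; j++) {
                x[j] += m[i][j] * y[i];
            }
        }
        return x;
    }

    static double norm(double[] x) {
        double s = 0;
        for (double e : x) {
            s += e * e;
        }
        return Math.sqrt(s);
    }

    // Returns sigma1 and fills u (length = rows) and v (length = columns).
    // Assumes the all-ones starting guess is not orthogonal to v1.
    static double leading(double[][] m, double[] u, double[] v, int iterations) {
        Arrays.fill(v, 1.0);
        for (int it = 0; it < iterations; it++) {
            double[] w = transposeTimes(m, times(m, v)); // w = M^T M v
            double nw = norm(w);
            for (int j = 0; j < v.length; j++) {
                v[j] = w[j] / nw;
            }
        }
        double[] mv = times(m, v);
        double sigma = norm(mv); // sigma1 = |M v1|
        for (int i = 0; i < u.length; i++) {
            u[i] = mv[i] / sigma; // u1 = M v1 / sigma1
        }
        return sigma;
    }

    public static void main(String[] args) {
        double[][] m = {{2, 0}, {0, 1}, {0, 0}};
        double[] u = new double[3];
        double[] v = new double[2];
        double sigma = leading(m, u, v, 100);
        System.out.println("sigma1 = " + sigma);
        System.out.println("u1 = " + Arrays.toString(u));
        System.out.println("v1 = " + Arrays.toString(v));
    }
}
```

The largest singular value marks the single most important direction in the matrix; a full SVD yields the remaining pairs in decreasing order of importance.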
It follows from the SVD principle that we can select the first k left singular vectors and the first k right singular vectors to form a new feature vector vec[m * k + n * k]. This completes the dimensionality reduction, reduces the amount of computation, and preserves the main characteristics of the original face image.
Model Building
Once the feature vectors have been selected, we can compare the similarity of two face images. This article uses cosine similarity for the comparison; the implementation is as follows:
Pca.java
package key;

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

import javax.imageio.ImageIO;

import Jama.Matrix;
import Jama.SingularValueDecomposition;

public class Pca {

    // Cosine similarity of two equal-length vectors.
    public double cosine(double[] v1, double[] v2) {
        int n = v1.length;
        double a = 0; // dot product of v1 and v2
        for (int i = 0; i < n; i++) {
            a += v1[i] * v2[i];
        }
        double b = 0; // norm of v1
        for (int i = 0; i < n; i++) {
            b += v1[i] * v1[i];
        }
        b = Math.sqrt(b);
        double c = 0; // norm of v2
        for (int i = 0; i < n; i++) {
            c += v2[i] * v2[i];
        }
        c = Math.sqrt(c);
        return a / b / c;
    }

    // Build a feature vector from the first k left and right singular vectors.
    public double[] pcaVector(double[][] pixels, int k) {
        Matrix ps = Matrix.constructWithCopy(pixels);
        // JAMA's SVD is defined for matrices with at least as many rows as columns.
        if (ps.getRowDimension() < ps.getColumnDimension()) {
            ps = ps.transpose();
        }
        SingularValueDecomposition svd = ps.svd();
        Matrix u = svd.getU();
        Matrix s = svd.getS(); // singular values; not used in the feature vector
        Matrix vt = svd.getV().transpose();
        double[] vec = new double[u.getRowDimension() * k + vt.getColumnDimension() * k];
        int cur = 0;
        // First k columns of U.
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < u.getRowDimension(); j++) {
                vec[cur + i * u.getRowDimension() + j] = u.get(j, i);
            }
        }
        cur += u.getRowDimension() * k;
        // First k rows of V'.
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < vt.getColumnDimension(); j++) {
                vec[cur + i * vt.getColumnDimension() + j] = vt.get(i, j);
            }
        }
        return vec;
    }

    // Read an image file into a w x h array of packed RGB values.
    public double[][] pixels(File file) throws IOException {
        BufferedImage bi = ImageIO.read(file);
        int h = bi.getHeight();
        int w = bi.getWidth();
        double[][] arr = new double[w][h];
        for (int i = 0; i < w; i++) {
            for (int j = 0; j < h; j++) {
                arr[i][j] = bi.getRGB(i, j);
            }
        }
        return arr;
    }
}
Keyven.java
package key;

import java.io.File;
import java.io.IOException;

public class Keyven {

    public static void main(String[] args) throws IOException {
        Pca pk = new Pca();
        int k = 20;
        // Feature vector of the query image.
        File key = new File("data/key.jpg");
        double[] vec_key = pk.pcaVector(pk.pixels(key), k);
        // Compare the query image with every image in the candidate folder.
        File folder = new File("data/faces");
        if (folder.isDirectory()) {
            for (File f : folder.listFiles()) {
                double[] vec_f = pk.pcaVector(pk.pixels(f), k);
                System.out.println(f.getName() + ":\t"
                        + pk.cosine(vec_key, vec_f));
            }
        }
    }
}
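The pixels method above stores the packed integer returned by getRGB directly. One possible variation, not used in the original experiment, is to convert each pixel to a grayscale intensity before running the SVD. The sketch below assumes the common ITU-R BT.601 luma weights; the class GrayDemo and the method gray are hypothetical names.

```java
// Hypothetical helper, not part of the original code: convert a packed ARGB
// value (as returned by BufferedImage.getRGB) into a gray level in [0, 255].
class GrayDemo {

    static double gray(int argb) {
        int r = (argb >> 16) & 0xFF; // red channel
        int g = (argb >> 8) & 0xFF;  // green channel
        int b = argb & 0xFF;         // blue channel
        // Weighted sum using ITU-R BT.601 luma coefficients (an assumption).
        return 0.299 * r + 0.587 * g + 0.114 * b;
    }

    public static void main(String[] args) {
        System.out.println("white -> " + gray(0xFFFFFFFF));
        System.out.println("red   -> " + gray(0xFFFF0000));
        System.out.println("black -> " + gray(0xFF000000));
    }
}
```

With such a helper, the assignment inside pixels could store gray(bi.getRGB(i, j)) instead of the raw packed value; whether this improves the results on the face data set is untested here.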
Experimental Results
The original image set and candidate search images are as follows:
Let's take a look at the experimental results:
The results are quite good ^_^
Test Data Set Download
Extending the Experiment
The left singular vectors capture the trend along the y-axis, and the right singular vectors capture the trend along the x-axis. Text clustering in natural language processing works with a document-term co-occurrence matrix. Applying SVD to this matrix as a principal component analysis turns the non-orthogonal vectors of the original co-occurrence matrix into orthogonal ones, which weakens the effects of synonymy and polysemy, rather than building features purely from whether a word appears in a document ...
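As a small illustration of the limitation mentioned above, the following sketch builds a document-term count matrix and shows that two documents using only synonyms have zero cosine similarity in the raw term space. The documents, the vocabulary, and the class TermMatrixDemo are made up for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo: raw document-term counts treat synonyms as unrelated terms.
class TermMatrixDemo {

    // Rows are documents, columns are vocabulary terms, entries are term counts.
    static double[][] build(List<String> docs, List<String> vocab) {
        double[][] m = new double[docs.size()][vocab.size()];
        for (int d = 0; d < docs.size(); d++) {
            for (String token : docs.get(d).toLowerCase().split("\\s+")) {
                int t = vocab.indexOf(token);
                if (t >= 0) {
                    m[d][t] += 1;
                }
            }
        }
        return m;
    }

    // Cosine similarity between two equal-length, non-zero vectors.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        List<String> vocab = Arrays.asList("fast", "quick", "car", "automobile");
        List<String> docs = Arrays.asList("fast car", "quick automobile");
        double[][] m = build(docs, vocab);
        System.out.println(Arrays.deepToString(m));
        System.out.println("cosine = " + cosine(m[0], m[1]));
    }
}
```

The two documents mean nearly the same thing, yet they share no terms, so the raw vectors do not overlap. A truncated SVD of a larger matrix of this kind, given enough co-occurrence data, is the step the paragraph above relies on to bring such documents closer together.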
Realization of PCA based on SVD image recognition