On the implementation of the Ng algorithm for spectral clustering

Broadly speaking, any method that applies matrix eigenvalue decomposition in the learning process is called a spectral learning method, for example Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), the spectral embedding methods in manifold learning, spectral clustering, and so on.
Because the description of Ng's spectral clustering algorithm in teacher Shiming's courseware differs from the algorithm written in Ng's paper, an evening of implementation work did not produce satisfactory results. Today I found Ng's original paper online and read it through again; although there are still many places I do not understand, I now have my own view of it. The steps of the Ng algorithm are as follows.
The first step of the algorithm computes the affinity between each point and every other point with a Gaussian function, setting each point's affinity with itself to 0. For each point, only the affinities of its k_nearest nearest neighbors (that is, the k_nearest points with the highest affinity) are kept, and the remaining affinities are set to 0. Here k_nearest is passed in as a parameter of the algorithm.
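In formula form, the affinity between points x_i and x_j is the Gaussian kernel (this matches the code further below):

A(i,j) = exp( -||x_i - x_j||^2 / (2*sigma^2) ) for i ~= j, and A(i,i) = 0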
The second step of the algorithm computes the degree matrix D. D is a diagonal matrix: each diagonal element is the sum of the corresponding row of the affinity matrix. Once D is computed, the normalized Laplacian matrix is constructed. Because D is diagonal, D^(-0.5) can be understood simply as taking the reciprocal square root of each diagonal element, which then gives the Laplacian matrix.
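Written out, with A the affinity matrix from the first step:

D(i,i) = sum over j of A(i,j), and L = D^(-1/2) * A * D^(-1/2)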
The third step of the algorithm solves for the eigenvalues and eigenvectors of the Laplacian matrix and, ordering by eigenvalue size, selects the eigenvectors corresponding to the k largest eigenvalues (k is the number of classes the data points are grouped into). The selected eigenvectors are arranged as columns, forming n vectors in k-dimensional space, i.e. a matrix X belonging to R^(n*k).
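This step can be sketched in a few lines of MATLAB (the full version is the get_special_vector function in the code below):

[V, E] = eig(laplas);                          % eigenvectors as columns of V, eigenvalues on the diagonal of E
[sorted_eig, idx] = sort(diag(E), 'descend');  % order the eigenvalues from largest to smallest
X = V(:, idx(1:k));                            % n-by-k matrix whose columns are the top-k eigenvectors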
The fourth step of the algorithm normalizes each row of the X matrix to unit length, giving the matrix Y belonging to R^(n*k).
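That is, each element of X is divided by the length of its row, so that every row of Y has unit length:

Y(i,j) = X(i,j) / ( sum over j of X(i,j)^2 )^(1/2)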
The fifth step of the algorithm feeds the row-normalized matrix into the k_means algorithm for clustering; the resulting class label of each row is the class label of the corresponding original data point.
My own understanding of the algorithm is that it maps the raw data points into a k-dimensional space in which the data are easier to separate.
Below is the MATLAB implementation of the algorithm; I will write the Python version and post it when I have time.
function [after_class_data, class_label, acc] = spectral_clustering(dataset, class_num, sigma, k_near)
affinity_mat = creat_d(dataset, k_near, sigma);              % create the affinity matrix
laplas = get_norm_laplas(affinity_mat);                      % get the normalized Laplacian matrix
[eig_val, eig_vec] = get_special_vector(laplas, class_num);  % eigenvectors of the k largest eigenvalues of the Laplacian
[label, after_center] = k_means(eig_vec, class_num);         % feed the eigenvectors into k_means for clustering
after_class_data = after_center;
class_label = label;
acc = 0;  % accuracy is only computed by the commented-out block below (it assumes two classes of 100 points each)
%{
n11 = size(find(label(1:100) == 1), 2);
n12 = size(find(label(1:100) == 2), 2);
n21 = size(find(label(101:200) == 1), 2);
n22 = size(find(label(101:200) == 2), 2);
n_1 = max(n11, n12);
n_2 = max(n21, n22);
acc = (n_1 + n_2) / size(label, 2);  % compute the accuracy
%}

% compute the affinity matrix: keep the k nearest neighbors of each point,
% set the other affinities to 0, and set each point's affinity with itself to 0
function affinity_mat = creat_d(dataset, k, sigma)
[row, col] = size(dataset);
dis_mat = zeros(row, row);
index_all = zeros(row, row);
for i = 1:row
    for j = 1:row
        if i ~= j
            dis_mat(i, j) = exp(-sum((dataset(i,:) - dataset(j,:)).^2) / (2*sigma^2));
        end
    end
end
affinity = dis_mat;
for t = 1:row
    [sort_dis, index] = sort(affinity(t,:), 'descend');  % sort neighbors by affinity
    index_all(t,:) = index;
end
for ii = 1:row
    temp_index = index_all(ii,:);
    temp_clear = temp_index(k+1:row);  % indices of everything beyond the k nearest neighbors
    affinity(ii, temp_clear) = 0;
    affinity(ii, ii) = 0;
end
affinity_mat = affinity;

% build the normalized Laplacian L = D^(-1/2) * A * D^(-1/2)
function laplas = get_norm_laplas(affinity_mat)
row = size(affinity_mat, 1);
du = zeros(row, row);
for i = 1:row
    du(i, i) = sum(affinity_mat(i,:));  % degree matrix: row sums of the affinity matrix
end
dn = du^(-0.5);
laplas = dn * affinity_mat * dn;

% take the eigenvectors of the k largest eigenvalues and normalize each row to unit length
function [eig_val, special_vector] = get_special_vector(laplas, k)
[vector, eig_diag] = eig(laplas);
eig_con = diag(eig_diag);
[sort_vec, index] = sort(eig_con, 'descend');
eig_val = eig_con(index(1:k));
temp_vector = vector(:, index(1:k));
[row, col] = size(temp_vector);
y = zeros(row, col);
for i = 1:row
    s = sqrt(sum(temp_vector(i,:).^2));  % length of row i
    for j = 1:col
        y(i, j) = temp_vector(i, j) / s;  % row normalization of the eigenvector matrix
    end
end
special_vector = y;
The k_means clustering algorithm code is as follows:
function [class_type, after_center] = k_means(dataset, class_number)
[data_row, data_col] = size(dataset);
label = ones(1, data_row);
ini_center = randn(class_number, data_col);  % random initial centers
new_center = zeros(class_number, data_col);
while sum(sum(abs(ini_center - new_center) > 1e-5)) > 0
    new_center = ini_center;
    % assign each point to the class of its nearest center
    for i = 1:data_row
        min_dis = inf;
        belong_class = 1;  % stores the class the point belongs to, e.g. 1 for the first class
        for j = 1:class_number
            cur_dis = sum((ini_center(j,:) - dataset(i,:)).^2);
            if cur_dis < min_dis
                min_dis = cur_dis;
                belong_class = j;
            end
        end
        label(i) = belong_class;
    end
    % recompute the center of each class
    for k = 1:class_number
        class_index = find(label == k);
        n = size(class_index, 2);
        ini_center(k,:) = sum(dataset(class_index,:), 1) ./ n;  % if a class is empty, n = 0 and the center becomes NaN
    end
end
class_type = label;
after_center = new_center;
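As a minimal usage sketch (the variable data here is a hypothetical n-by-2 matrix of points, for example the crescent data used below; it is not part of the code above):

% hypothetical example call; data, sigma, and k_near are assumed
sigma = 35;
k_near = 10;
class_num = 2;
[centers, labels, acc] = spectral_clustering(data, class_num, sigma, k_near);
scatter(data(:,1), data(:,2), 20, labels);  % color each point by its cluster label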
The classification was tested on the crescent (two-moon) data with the sigma parameter set to 35; for k_nearest = 10, 20, 30 the corresponding classification plots are as follows.
[Figure: classification result with k_nearest = 10]
[Figure: classification result with k_nearest = 20]
[Figure: classification result with k_nearest = 30]
As the nearest-neighbor parameter k_nearest increases, the classification accuracy gradually decreases. Because the initial centers of this k_means implementation are generated with randn, the result is affected by the choice of initial points, so running the program can occasionally give poor results or NaN values; this is normal, and running it a few more times yields a more stable clustering. I will implement the algorithm in Python later and upload the Python code afterwards.
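One workaround for the unstable randn initialization (my own suggestion, not from the paper) is to pick class_number distinct data points as the initial centers, which avoids empty classes and the resulting NaN centers:

% sketch of a more stable initialization: sample rows of the data set as centers
function ini_center = init_centers(dataset, class_number)
data_row = size(dataset, 1);
perm = randperm(data_row);                      % random permutation of the row indices
ini_center = dataset(perm(1:class_number), :);  % use class_number distinct points as centers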
Final thoughts: using eigenvalue decomposition and the eigenvectors to map the original points into a k-dimensional space is remarkable, and how the authors thought of it amazes me. My own mathematical thinking and literacy still need further improvement; although I do not yet fully understand some of the mathematical principles inside the algorithm, I will keep working at it.