On the implementation of the Ng algorithm for spectral clustering

Broadly speaking, any method that applies matrix eigenvalue decomposition in the learning process is called a spectral learning method, for example Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), the spectral embedding methods in manifold learning, spectral clustering, and so on.
Because the description of Ng's spectral clustering algorithm in teacher Shiming's courseware differs from the algorithm written in Ng's paper, an evening of implementation work did not produce satisfactory results. Today I found Ng's original paper online and read it through again; although there are still many places I do not understand, I now have my own view of it. The steps of the Ng algorithm are as follows.
The first step of the algorithm computes the affinity between each point and every other point with a Gaussian function, setting each point's affinity with itself to 0. For each point, only the affinities of its k_nearest nearest neighbors (that is, the k_nearest points with the highest affinity) are kept, and the remaining affinities are set to 0. Here k_nearest is passed in as a parameter of the algorithm.
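In formula form, the affinity between points x_i and x_j is the Gaussian kernel (this matches the code further below):

A(i,j) = exp( -||x_i - x_j||^2 / (2*sigma^2) ) for i ~= j, and A(i,i) = 0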
The second step of the algorithm computes the degree matrix D. D is a diagonal matrix: each diagonal element is the sum of the corresponding row of the affinity matrix. Once D is computed, the normalized Laplacian matrix is constructed. Because D is diagonal, D^(-0.5) can be understood simply as taking the reciprocal square root of each diagonal element, which then gives the Laplacian matrix.
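Written out, with A the affinity matrix from the first step:

D(i,i) = sum over j of A(i,j), and L = D^(-1/2) * A * D^(-1/2)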
The third step of the algorithm solves for the eigenvalues and eigenvectors of the Laplacian matrix and, ordering by eigenvalue size, selects the eigenvectors corresponding to the k largest eigenvalues (k is the number of classes the data points are grouped into). The selected eigenvectors are arranged as columns, forming n vectors in k-dimensional space, i.e. a matrix X belonging to R^(n*k).
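This step can be sketched in a few lines of MATLAB (the full version is the get_special_vector function in the code below):

[V, E] = eig(laplas);                          % eigenvectors as columns of V, eigenvalues on the diagonal of E
[sorted_eig, idx] = sort(diag(E), 'descend');  % order the eigenvalues from largest to smallest
X = V(:, idx(1:k));                            % n-by-k matrix whose columns are the top-k eigenvectors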
The fourth step of the algorithm normalizes each row of the X matrix to unit length, giving the matrix Y belonging to R^(n*k).
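That is, each element of X is divided by the length of its row, so that every row of Y has unit length:

Y(i,j) = X(i,j) / ( sum over j of X(i,j)^2 )^(1/2)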
The fifth step of the algorithm feeds the row-normalized matrix into the k_means algorithm for clustering; the resulting class label of each row is the class label of the corresponding original data point.
My own understanding of the algorithm is that it maps the raw data points into a k-dimensional space in which the data are easier to separate.
Below is the MATLAB implementation of the algorithm; I will write the Python version and post it when I have time.
function [after_class_data, class_label, acc] = spectral_clustering(dataset, class_num, sigma, k_near)
affinity_mat = creat_d(dataset, k_near, sigma);              % create the affinity matrix
laplas = get_norm_laplas(affinity_mat);                      % get the normalized Laplacian matrix
[eig_val, eig_vec] = get_special_vector(laplas, class_num);  % eigenvectors of the k largest eigenvalues of the Laplacian
[label, after_center] = k_means(eig_vec, class_num);         % feed the eigenvectors into k_means for clustering
after_class_data = after_center;
class_label = label;
acc = 0;  % accuracy is only computed by the commented-out block below (it assumes two classes of 100 points each)
%{
n11 = size(find(label(1:100) == 1), 2);
n12 = size(find(label(1:100) == 2), 2);
n21 = size(find(label(101:200) == 1), 2);
n22 = size(find(label(101:200) == 2), 2);
n_1 = max(n11, n12);
n_2 = max(n21, n22);
acc = (n_1 + n_2) / size(label, 2);  % compute the accuracy
%}

% compute the affinity matrix: keep the k nearest neighbors of each point,
% set the other affinities to 0, and set each point's affinity with itself to 0
function affinity_mat = creat_d(dataset, k, sigma)
[row, col] = size(dataset);
dis_mat = zeros(row, row);
index_all = zeros(row, row);
for i = 1:row
    for j = 1:row
        if i ~= j
            dis_mat(i, j) = exp(-sum((dataset(i,:) - dataset(j,:)).^2) / (2*sigma^2));
        end
    end
end
affinity = dis_mat;
for t = 1:row
    [sort_dis, index] = sort(affinity(t,:), 'descend');  % sort neighbors by affinity
    index_all(t,:) = index;
end
for ii = 1:row
    temp_index = index_all(ii,:);
    temp_clear = temp_index(k+1:row);  % indices of everything beyond the k nearest neighbors
    affinity(ii, temp_clear) = 0;
    affinity(ii, ii) = 0;
end
affinity_mat = affinity;

% build the normalized Laplacian L = D^(-1/2) * A * D^(-1/2)
function laplas = get_norm_laplas(affinity_mat)
row = size(affinity_mat, 1);
du = zeros(row, row);
for i = 1:row
    du(i, i) = sum(affinity_mat(i,:));  % degree matrix: row sums of the affinity matrix
end
dn = du^(-0.5);
laplas = dn * affinity_mat * dn;

% take the eigenvectors of the k largest eigenvalues and normalize each row to unit length
function [eig_val, special_vector] = get_special_vector(laplas, k)
[vector, eig_diag] = eig(laplas);
eig_con = diag(eig_diag);
[sort_vec, index] = sort(eig_con, 'descend');
eig_val = eig_con(index(1:k));
temp_vector = vector(:, index(1:k));
[row, col] = size(temp_vector);
y = zeros(row, col);
for i = 1:row
    s = sqrt(sum(temp_vector(i,:).^2));  % length of row i
    for j = 1:col
        y(i, j) = temp_vector(i, j) / s;  % row normalization of the eigenvector matrix
    end
end
special_vector = y;
The k_means clustering algorithm code is as follows:
function [class_type, after_center] = k_means(dataset, class_number)
[data_row, data_col] = size(dataset);
label = ones(1, data_row);
ini_center = randn(class_number, data_col);  % random initial centers
new_center = zeros(class_number, data_col);
while sum(sum(abs(ini_center - new_center) > 1e-5)) > 0
    new_center = ini_center;
    % assign each point to the class of its nearest center
    for i = 1:data_row
        min_dis = inf;
        belong_class = 1;  % stores the class the point belongs to, e.g. 1 for the first class
        for j = 1:class_number
            cur_dis = sum((ini_center(j,:) - dataset(i,:)).^2);
            if cur_dis < min_dis
                min_dis = cur_dis;
                belong_class = j;
            end
        end
        label(i) = belong_class;
    end
    % recompute the center of each class
    for k = 1:class_number
        class_index = find(label == k);
        n = size(class_index, 2);
        ini_center(k,:) = sum(dataset(class_index,:), 1) ./ n;  % if a class is empty, n = 0 and the center becomes NaN
    end
end
class_type = label;
after_center = new_center;
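As a minimal usage sketch (the variable data here is a hypothetical n-by-2 matrix of points, for example the crescent data used below; it is not part of the code above):

% hypothetical example call; data, sigma, and k_near are assumed
sigma = 35;
k_near = 10;
class_num = 2;
[centers, labels, acc] = spectral_clustering(data, class_num, sigma, k_near);
scatter(data(:,1), data(:,2), 20, labels);  % color each point by its cluster label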
The classification was tested on the crescent (two-moon) data with the sigma parameter set to 35; for k_nearest = 10, 20, 30 the corresponding classification plots are as follows.
[Figure: classification result with k_nearest = 10]
[Figure: classification result with k_nearest = 20]
[Figure: classification result with k_nearest = 30]
As the nearest-neighbor parameter k_nearest increases, the classification accuracy gradually decreases. Because the initial centers of this k_means implementation are generated with randn, the result is affected by the choice of initial points, so running the program can occasionally give poor results or NaN values; this is normal, and running it a few more times yields a more stable clustering. I will implement the algorithm in Python later and upload the Python code afterwards.
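One workaround for the unstable randn initialization (my own suggestion, not from the paper) is to pick class_number distinct data points as the initial centers, which avoids empty classes and the resulting NaN centers:

% sketch of a more stable initialization: sample rows of the data set as centers
function ini_center = init_centers(dataset, class_number)
data_row = size(dataset, 1);
perm = randperm(data_row);                      % random permutation of the row indices
ini_center = dataset(perm(1:class_number), :);  % use class_number distinct points as centers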
Final thoughts: using eigenvalue decomposition and the eigenvectors to map the original points into a k-dimensional space is remarkable, and how the authors thought of it amazes me. My own mathematical thinking and literacy still need further improvement; although I do not yet fully understand some of the mathematical principles inside the algorithm, I will keep working at it.