Brief introduction:
This section covers the algorithms from episodes 12 and 13 of the Stanford Machine Learning public course: the k-means algorithm and the Gaussian mixture model (GMM). (Episodes 9, 10, and 11 are not covered here and are skipped.)
First, the k-means algorithm
K-means is an unsupervised clustering algorithm: given a set of unlabeled data (input samples) that we assume can be divided into K classes, it groups the samples into those classes. Because the algorithm is quite intuitive, the steps and the MATLAB code are given directly below. (The algorithm also has a well-defined mathematical derivation, which is not repeated here.)
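The steps, restated briefly since the original step figure is not reproduced here: initialize K centers mu_1, ..., mu_K, then repeat until the centers stop moving:
(1) Assignment step: c^(i) := argmin_j || x^(i) - mu_j ||^2 (assign each sample x^(i) to its nearest center).
(2) Update step: mu_j := (sum of all x^(i) with c^(i) = j) / (number of samples with c^(i) = j) (move each center to the mean of the samples assigned to it).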
MATLAB code:
%%% k-means clustering
clear all; close all;
%%
n = 2;                              % feature dimension
m = 200;                            % number of samples
v0 = randn(m/2, n) - 1;             % samples of class 1
v1 = randn(m/2, n) + 1;             % samples of class 2
figure;
subplot(221); hold on;
plot(v0(:,1), v0(:,2), 'r.');
plot(v1(:,1), v1(:,2), 'b.');
% axis([-5 5 -5 5]);
title('Classified data');
hold off;
data = [v0; v1];
data = sortrows(data, 1);
subplot(222);
plot(data(:,1), data(:,2), 'g.');
title('Unclassified data');
% axis([-5 5 -5 5]);
%%
[a, b] = size(data);
m1 = data(randi(a), :);             % random initial centroid of cluster 1
m2 = data(randi(a), :);             % random initial centroid of cluster 2
k1 = zeros(1, n);                   % running sum of samples assigned to cluster 1
k2 = zeros(1, n);                   % running sum of samples assigned to cluster 2
n1 = 0; n2 = 0;                     % counts of samples in each cluster
subplot(223); hold on;
% axis([-5 5 -5 5]);
for t = 1:10                        % fixed number of iterations
    for i = 1:a
        d1 = pdist2(m1, data(i,:)); % distance to centroid 1 (Statistics Toolbox)
        d2 = pdist2(m2, data(i,:)); % distance to centroid 2
        if (d1 < d2)                % assign the sample to the nearer centroid
            k1 = k1 + data(i,:);
            n1 = n1 + 1;
            plot(data(i,1), data(i,2), 'r.');
        else
            k2 = k2 + data(i,:);
            n2 = n2 + 1;
            plot(data(i,1), data(i,2), 'b.');
        end
    end
    m1 = k1/n1;                     % recompute centroids as cluster means
    m2 = k2/n2;
    % plot(m1(1), m1(2), 'g.');
    % plot(m2(1), m2(2), 'g.');
    k1 = zeros(1, n);
    k2 = zeros(1, n);
    n1 = 0; n2 = 0;
end
plot(m1(1), m1(2), 'k*');
plot(m2(1), m2(2), 'k*');
title('k-means cluster');
hold off
Output (the "unclassified data" panel is the classified data with its labels removed; the black * markers indicate the cluster centers):
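As a cross-check on the hand-written loop above (which already relies on pdist2 from the Statistics and Machine Learning Toolbox), the same toolbox provides a built-in k-means; a minimal sketch, assuming the same data matrix as in the code above:

idx = kmeans(data, 2);                    % idx(i) is the cluster (1 or 2) assigned to sample i
gscatter(data(:,1), data(:,2), idx);      % scatter plot of the two clusters found

The built-in version also supports multiple restarts (the 'Replicates' option) and stops on convergence instead of a fixed 10 iterations.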
Second, the Gaussian mixture model (GMM)
Recall that Gaussian discriminant analysis (GDA) classifies a sample by computing its posterior probability, which is obtained by assuming a multivariate Gaussian model for each class. The parameters of those Gaussians, the means and covariances, are estimated from labeled (classified) samples, so GDA is a supervised learning method.
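Concretely, with the class-conditional densities p(x | y = j) modeled as multivariate Gaussians and the class prior p(y) estimated from the labeled data, GDA classifies a new sample x through Bayes' rule:

p(y = 1 | x) = p(x | y = 1) p(y = 1) / ( p(x | y = 0) p(y = 0) + p(x | y = 1) p(y = 1) )

and predicts the class with the larger posterior.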
In the GMM setting (which belongs to unsupervised learning), we are given m unlabeled samples (each with n-dimensional features) and assume they can be divided into K classes; the task is to classify them. If we knew the Gaussian parameters of each class, we could compute the posterior probabilities just as in GDA. Unfortunately, the input samples are not labeled, which means we cannot estimate the Gaussian parameters (means and covariances) directly. This is where the EM (Expectation-Maximization) algorithm comes in.
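Written out, the mixture model assumes each sample x^(i) is generated by first drawing a latent class z^(i) ~ Multinomial(phi) and then drawing x^(i) | z^(i) = j from N(mu_j, Sigma_j), so that

p(x^(i)) = sum over j = 1..K of phi_j * N(x^(i); mu_j, Sigma_j),

where the mixing weights phi_j, the means mu_j, and the covariances Sigma_j are exactly the parameters EM has to estimate from the unlabeled data.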
The idea of EM is a bit like k-means: iterate toward the best parameters, and those parameters then let us classify the samples as GDA does. The specific steps of GMM with EM are as follows:
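Since the original step listing was a figure, the standard EM updates for a Gaussian mixture are restated here in outline. Repeat until the parameters stop changing:

E-step: for every sample i and class j, compute the responsibility w_j^(i) := p(z^(i) = j | x^(i); phi, mu, Sigma).
M-step: re-estimate the parameters from the responsibilities:
phi_j := (1/m) * sum_i w_j^(i)
mu_j := sum_i w_j^(i) x^(i) / sum_i w_j^(i)
Sigma_j := sum_i w_j^(i) (x^(i) - mu_j)(x^(i) - mu_j)' / sum_i w_j^(i)

(The code below is a slight simplification: it drops the mixing weights phi_j and simply normalizes the two Gaussian densities per sample, which works for this balanced two-cluster example.)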
The MATLAB code is as follows:
%%% GMM (Gaussian mixture model), soft assignment
clear all; close all;
%%
k = 2;                                   % number of clusters
n = 2;                                   % feature dimension
m = 200;                                 % number of samples
% v0 = randn(m/2,2) - 1;
% v1 = randn(m/2,2) + 1;
v0 = mvnrnd([1 1], [1 0; 0 1], m/2);     % generate positive samples (class 1)
v1 = mvnrnd([4 4], [1 0; 0 1], m/2);     % generate negative samples (class 0)
figure;
subplot(221); hold on;
plot(v0(:,1), v0(:,2), 'r.');
plot(v1(:,1), v1(:,2), 'b.');
title('Classified data');
hold off;
%%
data = [v0; v1];
data = sortrows(data, 1);
subplot(222);
plot(data(:,1), data(:,2), 'g.');
title('Unclassified data');
%%
mu1 = mean(data(1:50,:));                % initial means and covariances taken from
mu2 = mean(data(100:180,:));             % two slices of the sorted data
sigma1 = cov(data(1:50,:));
sigma2 = cov(data(100:180,:));
p = zeros(m, k);                         % responsibilities (posterior probabilities)
thresh = 0.05;                           % iteration termination condition
iter = 0;                                % iteration counter
while (1)
    iter = iter + 1;
    a1 = 1/(((2*pi)^(n/2))*(det(sigma1)^(1/2)));   % Gaussian normalization constants
    a2 = 1/(((2*pi)^(n/2))*(det(sigma2)^(1/2)));
    % E-step: responsibility of each Gaussian for each sample
    for i = 1:m
        p(i,1) = a1*exp((-1/2)*(data(i,:)-mu1)*inv(sigma1)*(data(i,:)-mu1)');
        p(i,2) = a2*exp((-1/2)*(data(i,:)-mu2)*inv(sigma2)*(data(i,:)-mu2)');
        pp = sum(p(i,:));
        p(i,1) = p(i,1)/pp;              % normalize: the responsibilities of a sample sum to 1
        p(i,2) = p(i,2)/pp;
    end
    % M-step: re-estimate covariances and means from the responsibilities
    sum1 = zeros(n, n);
    sum2 = zeros(n, n);
    for i = 1:m
        sum1 = sum1 + p(i,1)*(data(i,:)-mu1)'*(data(i,:)-mu1);
        sum2 = sum2 + p(i,2)*(data(i,:)-mu2)'*(data(i,:)-mu2);
    end
    sigma1 = sum1/sum(p(:,1));
    sigma2 = sum2/sum(p(:,2));
    mu1_pre = mu1;
    mu2_pre = mu2;
    mu1 = (p(:,1)'*data)/sum(p(:,1));
    mu2 = (p(:,2)'*data)/sum(p(:,2));
    % stop when either mean has (nearly) stopped moving
    if ((pdist2(mu1_pre, mu1) <= thresh) || (pdist2(mu2_pre, mu2) <= thresh))
        break;
    end
end
%%
subplot(223); hold on;
a1 = 1/(((2*pi)^(n/2))*(det(sigma1)^(1/2)));
a2 = 1/(((2*pi)^(n/2))*(det(sigma2)^(1/2)));
for i = 1:m
    p(i,1) = a1*exp((-1/2)*(data(i,:)-mu1)*inv(sigma1)*(data(i,:)-mu1)');
    p(i,2) = a2*exp((-1/2)*(data(i,:)-mu2)*inv(sigma2)*(data(i,:)-mu2)');
    if p(i,1) >= p(i,2)                  % assign each sample to the more likely Gaussian
        plot(data(i,1), data(i,2), 'r.');
    else
        plot(data(i,1), data(i,2), 'b.');
    end
end
title('GMM category');
hold off;                                % finish
Output Result:
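For comparison, the Statistics and Machine Learning Toolbox also ships fitgmdist, which fits a Gaussian mixture by EM (including the mixing weights) and can be used to cross-check the hand-written loop; a minimal sketch, assuming the same data matrix:

gm  = fitgmdist(data, 2);                 % fit a 2-component Gaussian mixture with EM
idx = cluster(gm, data);                  % hard-assign each sample to its most likely component
gscatter(data(:,1), data(:,2), idx);      % plot the resulting clusters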