Machine Learning (12, 13): K-means algorithm, Gaussian mixture model

Source: Internet
Author: User

Brief introduction:

This section covers the algorithms from the 12th and 13th episodes of the Stanford machine learning public course: the K-means algorithm and the Gaussian mixture model (GMM). (Episodes 9–11 are skipped here.)

1. K-means algorithm

K-means is an unsupervised clustering algorithm: given a set of unlabeled data points (input samples), it partitions them into K classes. The procedure is intuitive: initialize K cluster centers, assign each sample to its nearest center, recompute each center as the mean of the samples assigned to it, and repeat until the centers stop moving. Because the algorithm is so direct, the steps and MATLAB code are given without derivation. (The mathematical derivation behind K-means is still worth studying.)
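The steps above can also be sketched in Python with NumPy (a minimal illustration mirroring the MATLAB demo below: two Gaussian blobs, k = 2, a fixed number of iterations; the `kmeans` helper is ours, not from the course):

```python
import numpy as np

def kmeans(data, k=2, iters=10, seed=0):
    """Minimal k-means: assign each point to its nearest centroid,
    then recompute each centroid as the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    # pick k distinct samples as the initial centroids
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(iters):
        # distances of every point to every centroid, shape (m, k)
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = data[labels == j].mean(axis=0)
    return centroids, labels

# two Gaussian blobs around (-1,-1) and (+1,+1), as in the MATLAB demo
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(-1.0, 1.0, (100, 2)),
                  rng.normal(+1.0, 1.0, (100, 2))])
centroids, labels = kmeans(data)
```

After a few iterations the two centroids settle near the two blob centers.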

Matlab code:

%% k-means clustering
clear all; close all;

%% generate data: two Gaussian blobs
n = 2; m = 200;
v0 = randn(m/2, 2) - 1;
v1 = randn(m/2, 2) + 1;
figure;
subplot(221); hold on;
plot(v0(:,1), v0(:,2), 'r.');
plot(v1(:,1), v1(:,2), 'b.');
title('Classified data'); hold off;

data = [v0; v1];
data = sortrows(data, 1);
subplot(222);
plot(data(:,1), data(:,2), 'g.');
title('Unclassified data');

%% k-means iterations (k = 2)
[a, b] = size(data);
m1 = data(randi(a), :);               % random samples as initial centroids
m2 = data(randi(a), :);
k1 = zeros(1, n); k2 = zeros(1, n);   % running coordinate sums per cluster
n1 = 0; n2 = 0;                       % running counts per cluster
subplot(223); hold on;
for t = 1:10
    for i = 1:a
        d1 = pdist2(m1, data(i,:));   % distance to each centroid
        d2 = pdist2(m2, data(i,:));
        if d1 < d2
            k1 = k1 + data(i,:); n1 = n1 + 1;
            plot(data(i,1), data(i,2), 'r.');
        else
            k2 = k2 + data(i,:); n2 = n2 + 1;
            plot(data(i,1), data(i,2), 'b.');
        end
    end
    m1 = k1 / n1;                     % update centroids to cluster means
    m2 = k2 / n2;
    k1 = zeros(1, n); k2 = zeros(1, n);
    n1 = 0; n2 = 0;
end
plot(m1(1), m1(2), 'k*');
plot(m2(1), m2(2), 'k*');
title('K-means cluster'); hold off;

Output (the "unclassified data" plot is the labeled data with its labels removed; a black * marks each cluster center):

2. Gaussian mixture model (GMM)

Recall that Gaussian discriminant analysis (GDA) classifies a sample by computing its posterior probability, assuming a multivariate Gaussian model for each class. The Gaussian parameters, the mean and covariance, are estimated from labeled (classified) samples, so GDA is a supervised learning method.
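As a reminder of the posterior computation GDA performs, here is a hypothetical 1-D, two-class sketch (the class means, variances, and priors below are made up for illustration):

```python
import math

def gaussian_pdf(x, mu, var):
    """Density of a 1-D Gaussian N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gda_posterior(x, mu0, var0, mu1, var1, phi=0.5):
    """P(y=1 | x) by Bayes' rule, with class priors phi and 1 - phi."""
    p0 = (1 - phi) * gaussian_pdf(x, mu0, var0)
    p1 = phi * gaussian_pdf(x, mu1, var1)
    return p1 / (p0 + p1)

# with class means -1 and +1 (unit variance, equal priors),
# a point at x = 0 is equally likely under either class
print(gda_posterior(0.0, -1.0, 1.0, 1.0, 1.0))  # → 0.5
```

The key point is that mu0, var0, mu1, var1 come directly from the labeled samples; GMM below has no labels, so these parameters must be estimated iteratively.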

In the GMM setting (which belongs to unsupervised learning), we are given m unclassified samples (each with n-dimensional features) and assume they fall into K classes; the GMM algorithm must classify them. If we knew the Gaussian parameters of each class, we could compute posterior probabilities just as in GDA. Unfortunately, the input samples are unlabeled, so we cannot directly estimate those Gaussian parameters (mean and covariance). This leads to the EM (expectation-maximization) algorithm.

The idea of the EM algorithm resembles K-means: it iterates toward the best parameters, and with those parameters samples can be classified as in GDA. Each iteration has two steps: the E-step computes, for each sample, the posterior probability that it belongs to each Gaussian component under the current parameters; the M-step re-estimates each component's mean and covariance as weighted averages using those posteriors. The steps are repeated until the parameters converge. The specific steps of GMM with EM are as follows:
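These E- and M-steps can also be sketched in Python with NumPy (a minimal 1-D, two-component mixture with equal mixing weights; the initial means and variances are rough guesses, and the `em_gmm_1d` helper is ours, not from the course):

```python
import numpy as np

def em_gmm_1d(x, mu, var, iters=50):
    """EM for a two-component 1-D Gaussian mixture with equal weights.
    mu, var: length-2 sequences of initial means and variances."""
    mu, var = np.asarray(mu, float), np.asarray(var, float)
    for _ in range(iters):
        # E-step: posterior p(z=j | x_i), normalized over the two components
        dens = np.stack(
            [np.exp(-(x - mu[j]) ** 2 / (2 * var[j])) / np.sqrt(2 * np.pi * var[j])
             for j in range(2)], axis=1)
        w = dens / dens.sum(axis=1, keepdims=True)
        # M-step: posterior-weighted mean and variance for each component
        for j in range(2):
            mu[j] = (w[:, j] * x).sum() / w[:, j].sum()
            var[j] = (w[:, j] * (x - mu[j]) ** 2).sum() / w[:, j].sum()
    return mu, var, w

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(2, 1, 200)])
mu, var, w = em_gmm_1d(x, mu=[-1.0, 1.0], var=[1.0, 1.0])
```

After convergence the two estimated means sit near the true component means (-2 and +2), and each row of w gives a soft assignment over the two components.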

The MATLAB code is as follows:

%% GMM (Gaussian mixture model), soft assignment
clear all; close all;

%% generate data
k = 2;      % number of clusters
n = 2;      % dimension
m = 200;    % number of samples
v0 = mvnrnd([1 1], [1 0; 0 1], m/2);   % positive samples (class 1)
v1 = mvnrnd([4 4], [1 0; 0 1], m/2);   % negative samples (class 0)
figure;
subplot(221); hold on;
plot(v0(:,1), v0(:,2), 'r.');
plot(v1(:,1), v1(:,2), 'b.');
title('Classified data'); hold off;

data = [v0; v1];
data = sortrows(data, 1);
subplot(222);
plot(data(:,1), data(:,2), 'g.');
title('Unclassified data');

%% initial parameter guesses from two rough subsets
mu1 = mean(data(1:50,:));    mu2 = mean(data(100:180,:));
sigma1 = cov(data(1:50,:));  sigma2 = cov(data(100:180,:));
p = zeros(m, k);    % posterior probabilities
thresh = 0.05;      % termination threshold on mean movement
iter = 0;           % iteration counter
while true
    iter = iter + 1;
    % E-step: posterior probability of each sample under each Gaussian
    a1 = 1 / (((2*pi)^(n/2)) * (det(sigma1)^(1/2)));
    a2 = 1 / (((2*pi)^(n/2)) * (det(sigma2)^(1/2)));
    for i = 1:m
        p(i,1) = a1 * exp((-1/2) * (data(i,:)-mu1) / sigma1 * (data(i,:)-mu1)');
        p(i,2) = a2 * exp((-1/2) * (data(i,:)-mu2) / sigma2 * (data(i,:)-mu2)');
        pp = sum(p(i,:));
        p(i,1) = p(i,1) / pp;   % normalize: each sample's class probabilities sum to 1
        p(i,2) = p(i,2) / pp;
    end
    % M-step: re-estimate covariances and means with posterior weights
    sum1 = zeros(n, n); sum2 = zeros(n, n);
    for i = 1:m
        sum1 = sum1 + p(i,1) * (data(i,:)-mu1)' * (data(i,:)-mu1);
        sum2 = sum2 + p(i,2) * (data(i,:)-mu2)' * (data(i,:)-mu2);
    end
    sigma1 = sum1 / sum(p(:,1));
    sigma2 = sum2 / sum(p(:,2));
    mu1_pre = mu1; mu2_pre = mu2;
    mu1 = (p(:,1)' * data) / sum(p(:,1));
    mu2 = (p(:,2)' * data) / sum(p(:,2));
    if (pdist2(mu1_pre, mu1) <= thresh) || (pdist2(mu2_pre, mu2) <= thresh)
        break;
    end
end

%% final hard assignment from the soft posteriors
subplot(223); hold on;
for i = 1:m
    if p(i,1) >= p(i,2)
        plot(data(i,1), data(i,2), 'r.');
    else
        plot(data(i,1), data(i,2), 'b.');
    end
end
title('GMM category'); hold off;
Output Result:

