Machine Learning (12, 13): K-means algorithm, Gaussian mixture model

Last Update:2015-06-20 Source: Internet

Author: User

Brief introduction:

This section covers the algorithms from lectures 12 and 13 of the Stanford machine learning open course: the K-means algorithm and the Gaussian mixture model (GMM). (Lectures 9, 10, and 11 are skipped here.)

First, K-means algorithm

K-means is an unsupervised clustering algorithm: given a set of unlabeled input samples, it partitions them into K clusters, where K is assumed in advance. Because the algorithm is intuitive, the steps and MATLAB code are given directly. (K-means is also well founded mathematically.)
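The steps themselves do not appear in the text at this point. As a sketch, the standard formulation from the course notes is reproduced below. The notation is an assumption made here, not taken from this article: c^{(i)} is the cluster assigned to sample x^{(i)}, and \mu_j is the centroid of cluster j.

```latex
% Input: samples x^{(1)}, ..., x^{(m)} \in \mathbb{R}^n and the number of clusters K.
% 1. Initialize the centroids \mu_1, ..., \mu_K \in \mathbb{R}^n (for example, K random samples).
% 2. Repeat until the assignments stop changing:
\begin{align*}
c^{(i)} &:= \arg\min_{j} \left\lVert x^{(i)} - \mu_j \right\rVert^2
  && \text{for every } i \text{ (assign each sample to its nearest centroid)} \\
\mu_j &:= \frac{\sum_{i=1}^{m} 1\{c^{(i)} = j\}\, x^{(i)}}{\sum_{i=1}^{m} 1\{c^{(i)} = j\}}
  && \text{for every } j \text{ (move each centroid to the mean of its samples)}
\end{align*}
% Distortion function, which neither step can increase:
\[
J(c, \mu) = \sum_{i=1}^{m} \left\lVert x^{(i)} - \mu_{c^{(i)}} \right\rVert^2
\]
```

Each of the two updates minimizes J with one set of variables held fixed, so J never increases from one iteration to the next. This is the sense in which the algorithm is well founded. The MATLAB code uses K = 2 and runs a fixed 10 iterations instead of testing for convergence.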

MATLAB code:

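%%% K-means clustering (pdist2 requires the Statistics and Machine Learning Toolbox)
clear all; close all;
%%
n = 2;                         % feature dimension
m = 200;                       % number of samples
v0 = randn(m/2, 2) - 1;        % samples of the first class
v1 = randn(m/2, 2) + 1;        % samples of the second class
figure;
subplot(221); hold on;
plot(v0(:,1), v0(:,2), 'r.');
plot(v1(:,1), v1(:,2), 'b.');
% axis([-5 5 -5 5]);
title('Classified data');
hold off;

data = [v0; v1];
data = sortrows(data, 1);      % discard the class ordering
subplot(222);
plot(data(:,1), data(:,2), 'g.');
title('Unclassified data');
% axis([-5 5 -5 5]);
%%
[a, b] = size(data);
% The original index expressions were lost in extraction; two distinct
% random rows are assumed here as the initial centroids.
idx = randperm(a, 2);
m1 = data(idx(1), :);          % random centroid 1
m2 = data(idx(2), :);          % random centroid 2
k1 = zeros(1, 2);              % running sum of the points in cluster 1
k2 = zeros(1, 2);              % running sum of the points in cluster 2
n1 = 0; n2 = 0;                % number of points in each cluster
subplot(223); hold on;
% axis([-5 5 -5 5]);
for t = 1:10
    for i = 1:a
        d1 = pdist2(m1, data(i,:));
        d2 = pdist2(m2, data(i,:));
        if (d1 < d2)           % assign the point to the nearer centroid
            k1 = k1 + data(i,:);
            n1 = n1 + 1;
            plot(data(i,1), data(i,2), 'r.');
        else
            k2 = k2 + data(i,:);
            n2 = n2 + 1;
            plot(data(i,1), data(i,2), 'b.');
        end
    end
    m1 = k1 / n1;              % move each centroid to the mean of its cluster
    m2 = k2 / n2;
    % plot(m1(1), m1(2), 'g.');
    % plot(m2(1), m2(2), 'g.');
    k1 = zeros(1, 2);
    k2 = zeros(1, 2);
    n1 = 0; n2 = 0;
end
plot(m1(1), m1(2), 'k*');
plot(m2(1), m2(2), 'k*');
title('K-means cluster');
hold off;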

Output (the unclassified data is the classified data with its labels removed; the black * markers indicate the cluster centers):

Second, Gaussian mixture model (GMM)

Recall that Gaussian discriminant analysis (GDA) classifies a sample by computing its posterior probability, which is obtained by assuming a multivariate Gaussian model for each class. The parameters of the Gaussian model, the mean and covariance, are estimated from labeled (classified) samples, so GDA is a supervised learning method.

In the GMM (which is an unsupervised learning method), we are given m unlabeled samples with n-dimensional features, assume that they fall into K classes, and ask the algorithm to classify them. If we knew the Gaussian parameters of each class, we could compute the posterior probability just as in GDA. Unfortunately, the input samples are not labeled, so the Gaussian parameters (mean and covariance) cannot be estimated directly. This leads to the EM (expectation-maximization) algorithm.

The idea of the EM algorithm is somewhat like K-means: it iterates toward the best parameters, and with those parameters the samples can be classified as in GDA. The specific steps of GMM and EM are as follows:
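The steps are missing from the text at this point. As a sketch, the standard EM updates for a mixture of Gaussians from the course notes are reproduced below. The notation is an assumption made here, not taken from this article: z^{(i)} is the unobserved class of sample x^{(i)}, w_j^{(i)} is the posterior probability that sample i belongs to class j, and \phi_j, \mu_j, \Sigma_j are the mixing weight, mean, and covariance of class j.

```latex
% Gaussian density used in both steps (n is the feature dimension):
\[
\mathcal{N}(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} \lvert \Sigma \rvert^{1/2}}
  \exp\!\left( -\tfrac{1}{2} (x - \mu)^{T} \Sigma^{-1} (x - \mu) \right)
\]
% Repeat until the parameters stop changing:
% E-step: for every sample i and class j, compute the posterior weight
\[
w_j^{(i)} := p\big(z^{(i)} = j \mid x^{(i)}; \phi, \mu, \Sigma\big)
  = \frac{\phi_j \, \mathcal{N}(x^{(i)}; \mu_j, \Sigma_j)}
         {\sum_{l=1}^{K} \phi_l \, \mathcal{N}(x^{(i)}; \mu_l, \Sigma_l)}
\]
% M-step: for every class j, re-estimate the parameters from the weighted samples
\begin{align*}
\phi_j &:= \frac{1}{m} \sum_{i=1}^{m} w_j^{(i)} \\
\mu_j &:= \frac{\sum_{i=1}^{m} w_j^{(i)} x^{(i)}}{\sum_{i=1}^{m} w_j^{(i)}} \\
\Sigma_j &:= \frac{\sum_{i=1}^{m} w_j^{(i)} (x^{(i)} - \mu_j)(x^{(i)} - \mu_j)^{T}}{\sum_{i=1}^{m} w_j^{(i)}}
\end{align*}
```

Where K-means assigns each sample to exactly one cluster, the E-step assigns it to every class with weight w_j^{(i)}, which is why the code comment calls this a soft assignment. The MATLAB code does not track \phi_j, which amounts to assuming equal mixing weights for the two classes.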

The MATLAB code is as follows:

%%% GMM algorithm (Gaussian mixture model), soft assignment
%%% (mvnrnd and pdist2 require the Statistics and Machine Learning Toolbox)
clear all; close all;
%%
k = 2;                         % number of clusters
n = 2;                         % feature dimension
m = 200;                       % number of samples
% v0 = randn(m/2, 2) - 1;
% v1 = randn(m/2, 2) + 1;
v0 = mvnrnd([1 1], [1 0; 0 1], m/2);   % samples of class 1
v1 = mvnrnd([4 4], [1 0; 0 1], m/2);   % samples of class 0
figure; subplot(221); hold on;
plot(v0(:,1), v0(:,2), 'r.');
plot(v1(:,1), v1(:,2), 'b.');
title('Classified data');
hold off;
%%
data = [v0; v1];
data = sortrows(data, 1);      % discard the class ordering
subplot(222);
plot(data(:,1), data(:,2), 'g.');
title('Unclassified data');
%% Initial parameters, estimated from two slices of the sorted data
mu1 = mean(data(1:50,:));
mu2 = mean(data(100:180,:));
sigma1 = cov(data(1:50,:));
sigma2 = cov(data(100:180,:));
p = zeros(m, k);               % posterior probability of each class per sample
thresh = 0.05;                 % iteration termination threshold
iter = 0;                      % iteration count
while (1)
    iter = iter + 1;
    % E-step: Gaussian densities, with equal mixing weights assumed
    A1 = 1 / (((2*pi)^(n/2)) * (det(sigma1)^(1/2)));
    A2 = 1 / (((2*pi)^(n/2)) * (det(sigma2)^(1/2)));
    for i = 1:m
        % x/sigma is x*inv(sigma): the quadratic form uses the inverse covariance
        p(i,1) = A1 * exp((-1/2) * (data(i,:)-mu1) / sigma1 * (data(i,:)-mu1)');
        p(i,2) = A2 * exp((-1/2) * (data(i,:)-mu2) / sigma2 * (data(i,:)-mu2)');
        pp = sum(p(i,:));
        p(i,1) = p(i,1) / pp;  % normalize so the class probabilities sum to 1
        p(i,2) = p(i,2) / pp;
    end
    % M-step: weighted means first, then covariances around the new means
    mu1_pre = mu1;
    mu2_pre = mu2;
    mu1 = (p(:,1)' * data) / sum(p(:,1));
    mu2 = (p(:,2)' * data) / sum(p(:,2));
    sum1 = zeros(n, n);
    sum2 = zeros(n, n);
    for i = 1:m
        sum1 = sum1 + p(i,1) * (data(i,:)-mu1)' * (data(i,:)-mu1);
        sum2 = sum2 + p(i,2) * (data(i,:)-mu2)' * (data(i,:)-mu2);
    end
    sigma1 = sum1 / sum(p(:,1));
    sigma2 = sum2 / sum(p(:,2));
    % Stop once both means have moved by no more than thresh
    if ((pdist2(mu1_pre, mu1) <= thresh) && (pdist2(mu2_pre, mu2) <= thresh))
        break
    end
end
%% Classify each sample by its larger posterior probability
subplot(223); hold on;
A1 = 1 / (((2*pi)^(n/2)) * (det(sigma1)^(1/2)));
A2 = 1 / (((2*pi)^(n/2)) * (det(sigma2)^(1/2)));
for i = 1:m
    p(i,1) = A1 * exp((-1/2) * (data(i,:)-mu1) / sigma1 * (data(i,:)-mu1)');
    p(i,2) = A2 * exp((-1/2) * (data(i,:)-mu2) / sigma2 * (data(i,:)-mu2)');
    if p(i,1) >= p(i,2)
        plot(data(i,1), data(i,2), 'r.');
    else
        plot(data(i,1), data(i,2), 'b.');
    end
end
title('GMM category');
hold off;
% finish

Output Result:

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
