The implementation of GMM's EM algorithm


In the earlier article on clustering algorithms (K-means, K-medoids, GMM, spectral clustering, Ncut), the basic model and likelihood function of the GMM were given, and the article on the principle of the EM algorithm explained its derivation and convergence proof. This article focuses on how to use the EM algorithm to cluster with a Gaussian Mixture Model, analyzed through the code.


1. GMM Model:

A GMM is composed of K Gaussian distributions; each Gaussian is called a "Component", and these Components are added together linearly to form the probability density function of the GMM:

$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$

According to the formula above, randomly drawing a point from the GMM distribution can be split into two steps: first, randomly choose one of the K Gaussian Components, where the probability of Component k being chosen is its coefficient pi(k); then, draw a single point from that Component's distribution, which is just sampling from an ordinary Gaussian, so the problem reduces to a familiar one.
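
As a rough illustration (not part of the original article), a minimal Matlab sketch of this two-step sampling procedure might look like the following; the mixture parameters pi_k, mu and sigma are made-up values for a one-dimensional mixture with K = 3:

% Minimal sketch: draw n samples from a 1-D GMM by the two-step procedure.
% The parameters below are illustrative, not taken from the article.
pi_k  = [0.3 0.5 0.2];      % mixing coefficients, must sum to 1
mu    = [-2  0   3];        % Component means
sigma = [0.5 1   0.8];      % Component standard deviations

n = 1000;
x = zeros(n, 1);
for i = 1:n
    k = find(rand < cumsum(pi_k), 1);   % step 1: pick a Component with probability pi_k
    x(i) = mu(k) + sigma(k)*randn;      % step 2: sample from that Gaussian
end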

So how do we use a GMM for clustering? It is actually quite simple: given the data, we assume they were generated by a GMM; then we only need to infer the GMM's probability distribution from the data, and the GMM's K Components then correspond to K clusters. Estimating a probability density from data is usually called density estimation; in particular, when the form of the probability density function is known (or assumed) and only its parameters need to be estimated, the process is called parameter estimation.


2. Parameters and likelihood functions:

Now suppose we have N data points and assume that they follow some distribution (denoted P(x)). We need to determine the values of its parameters; in a GMM, these are the mixing coefficients pi(k), the means pMiu(k), and the covariances pSigma(k) of the Components. Our idea is to find the set of parameters under which the resulting distribution is most likely to have generated the given data points; this probability is the product of the individual point probabilities, and we call that product the likelihood function. The probability of a single point is usually very small, and multiplying many tiny numbers easily causes floating-point underflow on a computer, so we normally take the logarithm, turning the product into a sum and obtaining the log-likelihood function. The next thing to do is maximize this function (usually by taking the derivative, setting it to zero, and solving the equation), that is, find the set of parameters that maximizes the likelihood; we regard these as the most appropriate parameters, which completes the parameter estimation.
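
For example, a minimal sketch of evaluating this log-likelihood for one-dimensional data in Matlab (x, pi_k, mu, sigma are assumed to be given, e.g. as in the sampling sketch above; names are illustrative):

% Sketch: log-likelihood of data x (n*1) under a 1-D GMM with K Components.
% pi_k (1*K), mu (1*K), sigma (1*K) are assumed given.
weighted = zeros(numel(x), numel(pi_k));
for k = 1:numel(pi_k)
    % pi(k) * N(x | mu(k), sigma(k)^2), written out explicitly
    weighted(:, k) = pi_k(k) * exp(-(x - mu(k)).^2 ./ (2*sigma(k)^2)) ./ (sqrt(2*pi)*sigma(k));
end
loglik = sum(log(sum(weighted, 2)));   % log of the per-point mixture density, summed over all points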

Let's look at the log-likelihood function of the GMM:

$$\sum_{i=1}^{N} \log \left( \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k) \right)$$

Because there is a summation inside the logarithm, we cannot obtain the maximum directly by setting the derivative to zero and solving the equation. To get around this, we follow the same two-step procedure used for randomly sampling from a GMM; the two steps are in fact quite similar to the two steps of K-means.



3. Algorithm Flow:

1. Estimate the probability that each data point was generated by each Component (not the probability of each Component being selected): for each data point $x_i$, the probability that it was generated by the k-th Component is

$$\gamma(i, k) = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}$$

where $\mathcal{N}(x_i \mid \mu_k, \Sigma_k)$ is the density of the k-th Component at $x_i$, and $\gamma(i, k)$ is the posterior probability that $x_i$ was generated by Component k.
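
In Matlab, this E-step can be sketched with the same variable names used in the gmm.m listing of section 4 (Px, pPi, N, K are assumed to already exist; this is only an illustrative fragment):

% Sketch of the E-step: pGamma(i,k) is the posterior probability that x_i
% was generated by Component k; Px(i,k) = N(x_i | mu_k, Sigma_k) is assumed computed.
pGamma = Px .* repmat(pPi, N, 1);                  % numerator: pi(k) * N(xi | mu_k, Sigma_k)
pGamma = pGamma ./ repmat(sum(pGamma, 2), 1, K);   % normalize by the sum over all Components j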


2. Estimate the parameters of each Component by maximum likelihood: setting the derivative of the log-likelihood with respect to the parameters to zero gives the updates for pMiu ($\mu_k$) and pSigma ($\Sigma_k$); see the third part of this article for details:

$$\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(i, k)\, x_i, \qquad \Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(i, k)\,(x_i - \mu_k)(x_i - \mu_k)^{T}$$

where $N_k = \sum_{i=1}^{N} \gamma(i, k)$, and the mixing coefficient $\pi_k = N_k / N$ follows naturally as well.
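
These M-step updates can be sketched in Matlab as follows (variable names follow the gmm.m listing in section 4; an illustrative fragment assuming pGamma, X, N, K are available):

% Sketch of the M-step, given responsibilities pGamma (N*K) and data X (N*D).
Nk   = sum(pGamma, 1);              % effective number of points per Component (1*K)
pMiu = diag(1./Nk) * pGamma' * X;   % new means: responsibility-weighted average of the data (K*D)
pPi  = Nk / N;                      % new mixing coefficients (1*K)
for k = 1:K
    Xshift = X - repmat(pMiu(k, :), N, 1);
    pSigma(:, :, k) = (Xshift' * (diag(pGamma(:, k)) * Xshift)) / Nk(k);  % new covariance (D*D)
end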


3. Iterate the first two steps repeatedly until the value of the likelihood function converges.



4. Matlab code for GMM clustering, with explanation:


Description: fea is the training sample data and gnd contains the sample labels. The idea in the code is exactly the same as described above. For the final accuracy evaluation, note that clustering differs from classification: we only obtain clusters, without knowing which label each cluster should carry. Since our goal is to measure the performance of the clustering algorithm, we simply assume that the optimal correspondence between clusters and classes can be found, i.e., each cluster is matched to one class. One way is to enumerate all possible correspondences and select the best one; alternatively, the Hungarian algorithm can solve this assignment problem. The Hungarian-algorithm code is provided in the resources, and its calling method is already written into the function below.
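
For small K, the brute-force enumeration mentioned above could be sketched like this (enum_accuracy is a hypothetical helper written only for illustration, not part of the downloaded resources; for larger K use bestMap.m, which relies on the Hungarian algorithm):

% Sketch: map cluster indices to class labels by trying every permutation.
% gnd and cls_ind are N*1 vectors with values in 1..K; only practical for small K.
function acc = enum_accuracy(gnd, cls_ind, K)
    P = perms(1:K);                      % all K! possible cluster-to-class mappings
    acc = 0;
    for p = 1:size(P, 1)
        mapped = P(p, cls_ind);          % relabel every cluster index under this mapping
        acc = max(acc, mean(mapped(:) == gnd(:)));
    end
end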


Note: the resources also include the K-means code; when downloading, only bestMap.m and a few related files are needed.


1. gmm.m, the core function, which determines the model and estimates its parameters.

function varargout = gmm(X, K_or_centroids)
% ============================================================
% Expectation-Maximization iteration implementation of
% Gaussian Mixture Model.
%
% PX = GMM(X, K_OR_CENTROIDS)
% [PX MODEL] = GMM(X, K_OR_CENTROIDS)
%
%  - X: N-by-D data matrix.
%  - K_OR_CENTROIDS: either K indicating the number of
%       components or a K-by-D matrix indicating the
%       choosing of the initial K centroids.
%
%  - PX: N-by-K matrix indicating the probability of each
%       component generating each point.
%  - MODEL: a structure containing the parameters for a GMM:
%       MODEL.Miu: a K-by-D matrix.
%       MODEL.Sigma: a D-by-D-by-K matrix.
%       MODEL.Pi: a 1-by-K vector.
% ============================================================
% @SourceCode Author: pluskid (http://blog.pluskid.org)
% @Appended by: sophia_qing (http://blog.csdn.net/abcjennifer)

    %% Generate initial centroids
    threshold = 1e-15;
    [N, D] = size(X);

    if isscalar(K_or_centroids)  % K_or_centroids is a 1*1 number
        K = K_or_centroids;
        Rn_index = randperm(N);             % random index of the N samples
        centroids = X(Rn_index(1:K), :);    % pick K random centroids
    else                         % K_or_centroids is a matrix of K initial centroids
        K = size(K_or_centroids, 1);
        centroids = K_or_centroids;
    end

    %% Initial values
    [pMiu, pPi, pSigma] = init_params();
    Lprev = -inf;   % log-likelihood of the previous iteration

    %% EM algorithm
    while true
        %% Estimation step
        Px = calc_prob();

        % pGamma (N*K): pGamma(i,k) = probability that x_i was generated by
        % the k-th Gaussian, i.e. the responsibility of Component k for x_i
        pGamma = Px .* repmat(pPi, N, 1);                 % numerator: pi(k)*N(xi | pMiu(k), pSigma(k))
        pGamma = pGamma ./ repmat(sum(pGamma, 2), 1, K);  % denominator: sum over j of pi(j)*N(xi | pMiu(j), pSigma(j))

        %% Maximization step - maximize the likelihood
        Nk = sum(pGamma, 1);   % Nk (1*K): sum of the responsibilities of the k-th Gaussian; all Nk sum to N

        % update pMiu through MLE (obtained by setting the derivative to 0)
        pMiu = diag(1./Nk) * pGamma' * X;
        pPi = Nk/N;

        % update the K covariance matrices pSigma
        for kk = 1:K
            Xshift = X - repmat(pMiu(kk, :), N, 1);
            pSigma(:, :, kk) = (Xshift' * ...
                (diag(pGamma(:, kk)) * Xshift)) / Nk(kk);
        end

        % check for convergence
        L = sum(log(Px*pPi'));
        if L - Lprev < threshold
            break;
        end
        Lprev = L;
    end

    if nargout == 1
        varargout = {Px};
    else
        model = [];
        model.Miu = pMiu;
        model.Sigma = pSigma;
        model.Pi = pPi;
        varargout = {Px, model};
    end

    %% Function definitions

    function [pMiu, pPi, pSigma] = init_params()
        pMiu = centroids;            % K*D, the centers of the K clusters
        pPi = zeros(1, K);           % weights (influence factors) of the K Gaussians
        pSigma = zeros(D, D, K);     % covariance matrices of the K Gaussians, each D*D

        % distance matrix: compute the N*K matrix (x-pMiu)^2 = x^2 + pMiu^2 - 2*x*pMiu'
        distmat = repmat(sum(X.*X, 2), 1, K) + ...       % x^2, an N*1 matrix replicated into K columns
            repmat(sum(pMiu.*pMiu, 2)', N, 1) - ...      % pMiu^2, a 1*K matrix replicated into N rows
            2*X*pMiu';
        [~, labels] = min(distmat, [], 2);   % index of the minimum in each row

        for k = 1:K
            Xk = X(labels == k, :);
            pPi(k) = size(Xk, 1)/N;
            pSigma(:, :, k) = cov(Xk);
        end
    end

    function Px = calc_prob()
        % Gaussian density:
        % N(x|pMiu,pSigma) = 1/((2pi)^(D/2)*abs(Sigma)^0.5) * exp(-1/2*(x-pMiu)'*pSigma^(-1)*(x-pMiu))
        Px = zeros(N, K);
        for k = 1:K
            Xshift = X - repmat(pMiu(k, :), N, 1);   % x - pMiu
            inv_pSigma = inv(pSigma(:, :, k));
            tmp = sum((Xshift*inv_pSigma) .* Xshift, 2);
            coef = (2*pi)^(-D/2) * sqrt(det(inv_pSigma));
            Px(:, k) = coef * exp(-0.5*tmp);
        end
    end
end
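
A minimal usage sketch of gmm.m (the toy data below is made up; fea would normally be your N-by-D feature matrix):

% Sketch: cluster toy 2-D data with gmm.m and read off hard assignments.
fea = [randn(100, 2); randn(100, 2) + 5];   % two well-separated blobs (toy data)
[px, model] = gmm(fea, 2);                  % fit a 2-Component GMM
[~, labels] = max(px, [], 2);               % assign each point to its most likely Component
disp(model.Miu);                            % estimated 2-by-2 matrix of Component means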


2. gmm_accuracy.m calls gmm.m and computes the clustering accuracy:

function [Accuracy] = gmm_accuracy(data_fea, gnd_label, K)
%Calculate the accuracy of clustering by the GMM model
px = gmm(data_fea, K);
[~, cls_ind] = max(px, [], 2);   % cls_ind = cluster label of each sample (max over dim 2, one label per row)
Accuracy = cal_accuracy(cls_ind, gnd_label);

    function [acc] = cal_accuracy(gnd, estimate_label)
        res = bestMap(gnd, estimate_label);
        acc = length(find(gnd == res))/length(gnd);
    end
end


3. Main function call

GMM_ACC = gmm_accuracy(fea, gnd, n_classes);
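
To try the whole pipeline without a real dataset, a toy run might look like this (synthetic fea and gnd; gmm.m, gmm_accuracy.m and bestMap.m are assumed to be on the Matlab path):

% Sketch: end-to-end test of gmm_accuracy.m on synthetic 2-class data.
n_classes = 2;
fea = [randn(100, 2); randn(100, 2) + 5];        % toy features: two Gaussian blobs
gnd = [ones(100, 1); 2*ones(100, 1)];            % ground-truth class labels
GMM_ACC = gmm_accuracy(fea, gnd, n_classes);     % should be close to 1 for well-separated blobs
fprintf('clustering accuracy: %.3f\n', GMM_ACC);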








Writing this article has been very rewarding for me. I also hope everyone will study Pluskid's gmm.m above carefully: not only the algorithm, but also the matrix-manipulation code is written very concisely and is well worth learning from.

I also found two other write-ups very helpful: one is Pluskid's "Random Talk on Clustering (3): Gaussian Mixture Model", and the other is JerryLead's detailed explanation of the EM algorithm; both are very well written and worth reading if you are interested.



More machine-learning materials and related discussions will continue to be posted; please follow this blog and Sina Weibo: Sophia_qing.



