GMM's EM algorithm


In the earlier article on clustering algorithms (K-means, K-medoids, GMM, spectral clustering, Ncut) we introduced the basic model and likelihood function of the GMM, and in the article on the principle of the EM algorithm we explained its derivation and convergence proof.

This article focuses on how to use the EM algorithm to fit a Gaussian Mixture Model for clustering, and walks through the code.


1. GMM Model:

A GMM is a mixture of K Gaussian distributions. Each Gaussian is called a "Component", and these Components are combined linearly to form the probability density function of the GMM:
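Written out, with mixing coefficients \pi_k that are non-negative and sum to 1:

p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)

where \mathcal{N}(x \mid \mu_k, \Sigma_k) is the density of the k-th Gaussian with mean \mu_k and covariance \Sigma_k.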


According to the above formula, if we want to draw a random point from the GMM distribution, we can do it in two steps: first, randomly select one of the K Gaussian Components, where the probability of each Component being selected is its mixing coefficient \pi_k; then, having selected a Component, draw a point from that Component's distribution, which is just the familiar single-Gaussian case and therefore a solved problem.
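A minimal Matlab sketch of this two-step sampling, assuming the parameters pPi (1-by-K weights), pMiu (K-by-D means) and pSigma (D-by-D-by-K covariances) are already given (the names follow the code later in this article, and mvnrnd requires the Statistics Toolbox):

k = find(rand() <= cumsum(pPi), 1);       % step 1: pick a Component with probability pPi(k)
x = mvnrnd(pMiu(k, :), pSigma(:, :, k));  % step 2: draw a point from that Component's Gaussian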

So how do we use a GMM for clustering? It is actually quite simple: now that we have the data, and assuming it was generated by a GMM, we only need to infer the probability distribution of that GMM from the data; the K Components of the GMM then correspond to K clusters.

Estimating the probability density from data is usually called density estimation; in particular, when the functional form of the density is known (or assumed) and only its parameters need to be determined, the process is called parameter estimation.


2. Parameters and likelihood functions:

Now suppose we have N data points and assume they follow some distribution p(x); we need to determine the values of its parameters. For a GMM, these are the mixing coefficients \pi_k, the means \mu_k, and the covariances \Sigma_k of the Components. Our idea is to find the set of parameters whose distribution is most likely to have generated the given data points. This probability is the product of p(x_i) over all points, and we call it the likelihood function. The probability of a single point is usually very small, and multiplying many very small numbers easily causes floating-point underflow on a computer, so we generally take the logarithm and turn the product into a sum, obtaining the log-likelihood function. The next step is to maximize this function (the usual approach is to take the derivative, set it equal to zero, and solve the resulting equations), that is, to find the set of parameters for which the likelihood is largest. We regard these as the most appropriate parameters, and the parameter estimation is then complete.
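In symbols, writing \theta for the full set of parameters:

\mathcal{L}(\theta) = \prod_{i=1}^{N} p(x_i \mid \theta), \qquad \log \mathcal{L}(\theta) = \sum_{i=1}^{N} \log p(x_i \mid \theta)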

Let's look at the Log-likelihood function of GMM:
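Substituting the GMM density for each point x_i gives

\sum_{i=1}^{N} \log \left( \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k) \right)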


Because there is a sum inside the logarithm, we cannot obtain the maximum directly by setting the derivative to zero and solving the equation. To work around this, we mimic the two-step way a point is drawn from a GMM and estimate the parameters iteratively in two steps, which turns out to be quite similar to the two-step procedure of K-means.



3. Algorithm Flow:

1. Estimate the probability that each data point was generated by each Component (note that this is not the probability of each Component being selected, but the posterior probability given the data point). For each data point x_i, the probability that it was generated by the k-th Component is
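\gamma(i, k) = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}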


Here \gamma(i, k) is the posterior probability that x_i was generated by the k-th Component, and \mathcal{N}(x_i \mid \mu_k, \Sigma_k) is the density of that Component evaluated with the current parameter estimates.


2. With the \gamma(i, k) fixed, the parameters can be obtained by maximum likelihood estimation, i.e. by setting the derivative of the log-likelihood to zero; this gives the values of \mu_k (pMiu in the code) and \Sigma_k (pSigma):
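\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(i, k) \, x_i , \qquad
\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(i, k) \, (x_i - \mu_k)(x_i - \mu_k)^T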

Please see the third part of this article for details.


where N_k = \sum_{i=1}^{N} \gamma(i, k), and the mixing coefficient can likewise be estimated quite naturally as \pi_k = N_k / N.


3. Iterate over the previous two steps until the value of the likelihood function converges.



4. Matlab implementation of GMM clustering, with code and commentary:


Description: fea is the training sample data and gnd contains the sample labels. The idea in the algorithm is exactly as written above. When finally evaluating accuracy, note that clustering differs from classification: it only produces a set of clusters, and we do not know which class label each cluster should carry.

Because our goal is to measure the performance of the clustering algorithm, we simply assume that this labeling step achieves the optimal correspondence, matching each cluster to one class. One way is to enumerate all possible assignments and pick the best one; alternatively, this assignment problem can be solved with the Hungarian algorithm. The Hungarian-algorithm code is included in the resource, and the call is already written into the function below.



Note: I also put a k-means code in the resource; when you download it, also grab bestMap.m and the few other helper files it needs.


1. gmm.m, the core function, which determines the model and estimates its parameters.

function varargout = gmm(X, K_or_centroids)
% ============================================================
% Expectation-Maximization iteration implementation of
% Gaussian Mixture Model.
%
% PX = GMM(X, K_OR_CENTROIDS)
% [PX MODEL] = GMM(X, K_OR_CENTROIDS)
%
%  - X: N-by-D data matrix.
%  - K_OR_CENTROIDS: either K, indicating the number of
%       components, or a K-by-D matrix giving the initial
%       K centroids.
%
%  - PX: N-by-K matrix indicating the probability of each
%       component generating each point.
%  - MODEL: a structure containing the parameters for a GMM:
%       MODEL.Miu: a K-by-D matrix.
%       MODEL.Sigma: a D-by-D-by-K matrix.
%       MODEL.Pi: a 1-by-K vector.
% ============================================================
% @SourceCode Author: Pluskid (http://blog.pluskid.org)
% @Appended by: Sophia_qing (http://blog.csdn.net/abcjennifer)

    %% Generate initial centroids
    threshold = 1e-15;
    [N, D] = size(X);

    if isscalar(K_or_centroids)  % K_or_centroids is a 1*1 number
        K = K_or_centroids;
        Rn_index = randperm(N);              % random permutation of the N samples
        centroids = X(Rn_index(1:K), :);     % pick K random points as initial centroids
    else                          % K_or_centroids is a matrix of initial centroids
        K = size(K_or_centroids, 1);
        centroids = K_or_centroids;
    end

    %% Initial values
    [pMiu pPi pSigma] = init_params();
    Lprev = -inf;   % log-likelihood of the previous iteration

    %% EM algorithm
    while true
        % Estimation step
        Px = calc_prob();   % N*K matrix, Px(i,k) = N(xi | pMiu(k), pSigma(k))

        % pGamma (N*K): pGamma(i,k) = posterior probability that xi was generated by the k-th Gaussian
        pGamma = Px .* repmat(pPi, N, 1);                  % numerator = pi(k) * N(xi | pMiu(k), pSigma(k))
        pGamma = pGamma ./ repmat(sum(pGamma, 2), 1, K);   % denominator = sum over j of pi(j) * N(xi | pMiu(j), pSigma(j))

        % Maximization step - maximum likelihood estimation
        Nk = sum(pGamma, 1);   % Nk (1*K) = effective number of points assigned to each Gaussian; the Nk sum to N

        % update pMiu
        pMiu = diag(1./Nk) * pGamma' * X;   % update pMiu through MLE (obtained by setting the derivative = 0)
        pPi = Nk/N;

        % update the K pSigma
        for kk = 1:K
            Xshift = X - repmat(pMiu(kk, :), N, 1);
            pSigma(:, :, kk) = (Xshift' * ...
                (diag(pGamma(:, kk)) * Xshift)) / Nk(kk);
        end

        % check for convergence
        L = sum(log(Px*pPi'));
        if L-Lprev < threshold
            break;
        end
        Lprev = L;
    end

    if nargout == 1
        varargout = {Px};
    else
        model = [];
        model.Miu = pMiu;
        model.Sigma = pSigma;
        model.Pi = pPi;
        varargout = {Px, model};
    end

    %% Function definitions
    function [pMiu pPi pSigma] = init_params()
        pMiu = centroids;          % K*D, the center points of the K clusters
        pPi = zeros(1, K);         % the weights (mixing coefficients) of the K Gaussians
        pSigma = zeros(D, D, K);   % the covariance matrices of the K Gaussians, each D*D

        % hard-assign each x to its nearest centroid: compute the N*K distance matrix
        % (x-pMiu)^2 = x^2 + pMiu^2 - 2*x*pMiu'
        distmat = repmat(sum(X.*X, 2), 1, K) + ...       % x^2, N*1 vector replicated to K columns
            repmat(sum(pMiu.*pMiu, 2)', N, 1) - ...      % pMiu^2, 1*K vector replicated to N rows
            2*X*pMiu';
        [~, labels] = min(distmat, [], 2);   % index of the minimum in each row

        for k=1:K
            Xk = X(labels == k, :);
            pPi(k) = size(Xk, 1)/N;
            pSigma(:, :, k) = cov(Xk);
        end
    end

    function Px = calc_prob()
        % Gaussian density:
        % N(x|pMiu,pSigma) = (2*pi)^(-D/2) * |Sigma|^(-1/2) * exp(-1/2*(x-pMiu)'*pSigma^(-1)*(x-pMiu))
        Px = zeros(N, K);
        for k = 1:K
            Xshift = X - repmat(pMiu(k, :), N, 1);   % x - pMiu
            inv_pSigma = inv(pSigma(:, :, k));
            tmp = sum((Xshift*inv_pSigma) .* Xshift, 2);
            coef = (2*pi)^(-D/2) * sqrt(det(inv_pSigma));
            Px(:, k) = coef * exp(-0.5*tmp);
        end
    end
end



2. gmm_accuracy.m calls gmm.m and computes the clustering accuracy:

function [Accuracy] = gmm_accuracy(Data_fea, gnd_label, K)
% Calculate the accuracy of the clustering produced by the GMM model
px = gmm(Data_fea, K);
[~, cls_ind] = max(px, [], 2);   % cls_ind = cluster label of each sample (argmax over the K components)
Accuracy = cal_accuracy(cls_ind, gnd_label);

    function [acc] = cal_accuracy(gnd, estimate_label)
        res = bestMap(gnd, estimate_label);   % optimal label matching via the Hungarian algorithm (bestMap.m)
        acc = length(find(gnd == res))/length(gnd);
    end
end


3. Main function call

gmm_acc = gmm_accuracy(fea, gnd, n_classes);
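As a quick end-to-end check, a synthetic smoke test along the following lines should work; this is only a sketch (the two-blob data and label values are made up for illustration, mvnrnd requires the Statistics Toolbox, and gmm.m, gmm_accuracy.m and bestMap.m must all be on the Matlab path):

rng(0);                                        % fix the random seed for reproducibility
fea = [mvnrnd([0 0], eye(2), 200);             % 200 points around (0, 0) -> class 1
       mvnrnd([5 5], eye(2), 200)];            % 200 points around (5, 5) -> class 2
gnd = [ones(200, 1); 2*ones(200, 1)];          % ground-truth labels
n_classes = 2;
gmm_acc = gmm_accuracy(fea, gnd, n_classes)    % should be close to 1 on such well-separated data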








Writing this article as a summary has been very beneficial for me. I also hope readers will study Pluskid's gmm.m above carefully: not only the algorithm, but also the matrix-manipulation code is very concise and well worth learning from.

In addition, I found two other articles very helpful: one is Pluskid's "Random Talk on Clustering (3): Gaussian Mixture Model", and the other is JerryLead's detailed explanation of the EM algorithm. Both are very well written and worth reading for anyone interested.




More machine learning materials and related discussions will continue to be posted; please follow this blog and Sina Weibo Sophia_qing.




