GMM's EM algorithm


In the earlier post on clustering algorithms (K-means, K-medoids, GMM, spectral clustering, Ncut) we gave the basic model and likelihood function of the GMM, and in the post on the principle of the EM algorithm we explained the implementation and convergence proof of EM. This article focuses on how to use the EM algorithm to cluster data with a Gaussian mixture model, and walks through the code.


1. GMM Model:

A GMM is made up of K Gaussian distributions; each Gaussian is called a "Component". A linear combination of these Components forms the probability density function of the GMM:
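In standard notation, matching the pi(k), pmiu(k), and psigma(k) used in the code below, the mixture density is

p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k), \qquad \sum_{k=1}^{K} \pi_k = 1.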


According to this formula, drawing a random point from a GMM can be split into two steps: first, choose one of the K Gaussian Components, where the probability of Component k being chosen is its coefficient pi(k); then sample a single point from that Component's distribution, which reduces to sampling from an ordinary Gaussian, a problem we already know how to solve.
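As a minimal sketch of this two-step generative process (not part of the original post; the variables ppi, mu, and sigma are made up for illustration), drawing one sample from a 1-D GMM in MATLAB could look like:

% Hypothetical illustration: draw one sample from a 1-D GMM.
ppi   = [0.3 0.5 0.2];            % mixing coefficients pi(k), summing to 1
mu    = [-2  0   3];              % component means
sigma = [0.5 1   0.8];            % component standard deviations

k = find(rand <= cumsum(ppi), 1); % step 1: pick a Component with probability pi(k)
x = mu(k) + sigma(k) * randn;     % step 2: sample from that Gaussian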

So how do we use a GMM for clustering? It is actually very simple. Given the data, we assume it was generated by a GMM, and we only need to infer the GMM's probability distribution from the data. The K Components of the GMM then correspond exactly to K clusters.

Estimating a probability density from data is generally called density estimation. In particular, when the form of the probability density function is known (or assumed) and only its parameters need to be inferred, the process is called "parameter estimation".


2. Parameters and likelihood functions:

Now suppose we have N data points and assume they follow some distribution (denoted p(x)). We want to determine the values of its parameters. For example, in a GMM we need to determine the mixing coefficients (influence factors) pi(k), the means pmiu(k), and the covariances psigma(k).

Our idea is to find a set of parameters such that the probability distribution they determine is the most likely to have generated the given data points. This probability is the product of the probabilities of the individual points, and that product is what we call the likelihood function.

Usually the probability of a single point is very small, and multiplying many very small numbers easily causes floating point underflow on a computer, so we generally take the logarithm, turning the product into a sum and obtaining the log-likelihood function. The next thing we want to do is maximize this function (usually by taking the derivative, setting it to zero, and solving the resulting equation), i.e. find the set of parameters that makes the likelihood as large as possible. We take this as the most appropriate set of parameters, and the estimation is complete.
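To make this concrete (a standard restatement, not taken from the original figures): for data points x_1, ..., x_N and parameters \theta,

\mathcal{L}(\theta) = \prod_{i=1}^{N} p(x_i \mid \theta), \qquad \log \mathcal{L}(\theta) = \sum_{i=1}^{N} \log p(x_i \mid \theta).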

Let's look at the Log-likelihood function of GMM:
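Substituting the GMM density into the log-likelihood gives the standard form

\log \mathcal{L} = \sum_{i=1}^{N} \log \left( \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k) \right).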


Because there is a sum inside the logarithm, we cannot obtain the maximum directly by taking the derivative and solving the equation. To get around this, we reuse the two-step view of sampling from a GMM described above, which leads to a two-step iteration that is in fact very similar to the two steps of K-means.



3. Algorithm Flow:

1. Estimate the probability that each data point was generated by each Component (note that this is not the prior probability of each Component being selected). For each data point xi, the probability that it was generated by the k-th Component is
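In standard notation, and matching the pGamma computation in the code below, this responsibility is

\gamma(i,k) = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}.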


where N(xi | μk, Σk) denotes the density of the k-th Gaussian at xi; the whole quantity is the posterior probability that xi was generated by the k-th Component.


2. The values of the parameters pmiu and psigma can be obtained by maximum likelihood estimation, i.e. by setting the derivative of the log-likelihood with respect to each parameter to 0 and solving. Please see the third part of this article for details.


where Nk is the effective number of points assigned to Component k, and the coefficient pi(k) can likewise be estimated from it in the obvious way.
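Written out, the standard M-step updates, which match the pMiu, pSigma, and pPi updates in the code below, are

N_k = \sum_{i=1}^{N} \gamma(i,k), \qquad
\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(i,k)\, x_i, \qquad
\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(i,k)\, (x_i - \mu_k)(x_i - \mu_k)^{\top}, \qquad
\pi_k = \frac{N_k}{N}.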


3. Iterate the first two steps until the value of the likelihood function converges.



4. Matlab implementation of GMM clustering: code and explanation


Description: fea is the training sample data and gnd is the ground-truth label of each sample. The idea in the algorithm is exactly the same as written above. Regarding the final accuracy evaluation: clustering differs from classification in that it only produces clusters and does not know which class label each cluster should carry. Since our goal is to measure the performance of the clustering algorithm, in this step we simply assume the best possible correspondence, matching each cluster to one class.

One way is to enumerate all possible assignments and select the optimal one (a rough sketch of this is given below). For this particular problem we can also use the Hungarian algorithm. The Hungarian-algorithm code itself is in the resources, and the way to call it is already written in the function further down.
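As a sketch of the enumeration approach (this is not the bestMap/Hungarian code from the resources; the function name and the assumption that class labels are 1..K are purely illustrative):

% Hypothetical illustration: brute-force the best mapping between
% cluster indices and class labels (only feasible for small K,
% and assuming the ground-truth labels are the integers 1..K).
function acc = best_perm_accuracy(gnd, cluster_label, K)
    perms_all = perms(1:K);   % all K! possible cluster-to-class assignments
    acc = 0;
    for p = 1:size(perms_all, 1)
        mapped = perms_all(p, cluster_label);          % relabel every cluster index
        acc = max(acc, mean(mapped(:) == gnd(:)));     % keep the best accuracy found
    end
end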



Note: I put the K-means code in the resources as well; when downloading, you only need bestMap.m and a few related files.


1. gmm.m, the core function, which determines the model and estimates the parameters.

function varargout = gmm(X, K_or_centroids)
% ============================================================
% Expectation-Maximization iteration implementation of
% Gaussian Mixture Model.
%
% PX = GMM(X, K_OR_CENTROIDS)
% [PX MODEL] = GMM(X, K_OR_CENTROIDS)
%
%  - X: N-by-D data matrix.
%  - K_OR_CENTROIDS: either K, indicating the number of
%       components, or a K-by-D matrix giving the initial
%       K centroids.
%
%  - PX: N-by-K matrix indicating the probability of each
%       component generating each point.
%  - MODEL: a structure containing the parameters of the GMM:
%       MODEL.Miu: a K-by-D matrix.
%       MODEL.Sigma: a D-by-D-by-K matrix.
%       MODEL.Pi: a 1-by-K vector.
% ============================================================
% @SourceCode Author: pluskid (http://blog.pluskid.org)
% @Appended by: Sophia_qing (http://blog.csdn.net/abcjennifer)

    %% Generate initial centroids
    threshold = 1e-15;
    [N, D] = size(X);

    if isscalar(K_or_centroids)   % K_or_centroids is a 1*1 number
        K = K_or_centroids;
        rn_index = randperm(N);                % random index of the N samples
        centroids = X(rn_index(1:K), :);       % generate K random centroids
    else                          % K_or_centroids is a matrix of initial centroids
        K = size(K_or_centroids, 1);
        centroids = K_or_centroids;
    end

    %% Initial values
    [pMiu pPi pSigma] = init_params();
    Lprev = -inf;   % log-likelihood of the previous iteration

    %% EM algorithm
    while true
        %% Estimation step
        Px = calc_prob();

        % pGamma (N*K): pGamma(i,k) = probability that x_i was generated
        % by the k-th Gaussian
        pGamma = Px .* repmat(pPi, N, 1);                 % numerator = pi(k) * N(x_i | pMiu(k), pSigma(k))
        pGamma = pGamma ./ repmat(sum(pGamma, 2), 1, K);  % denominator = sum over j of pi(j) * N(x_i | pMiu(j), pSigma(j))

        %% Maximization step - maximum likelihood estimation
        Nk = sum(pGamma, 1);   % Nk (1*K): total responsibility of each Gaussian; all Nk sum to N

        % update pMiu and pPi (obtained by setting the derivative = 0)
        pMiu = diag(1./Nk) * pGamma' * X;
        pPi = Nk / N;

        % update the K covariance matrices pSigma
        for kk = 1:K
            Xshift = X - repmat(pMiu(kk, :), N, 1);
            pSigma(:, :, kk) = (Xshift' * (diag(pGamma(:, kk)) * Xshift)) / Nk(kk);
        end

        % check for convergence
        L = sum(log(Px * pPi'));
        if L - Lprev < threshold
            break;
        end
        Lprev = L;
    end

    if nargout == 1
        varargout = {Px};
    else
        model = [];
        model.Miu = pMiu;
        model.Sigma = pSigma;
        model.Pi = pPi;
        varargout = {Px, model};
    end

    %% Function definitions
    function [pMiu pPi pSigma] = init_params()
        pMiu = centroids;            % K*D, the center points of the K classes
        pPi = zeros(1, K);           % the weights (influence factors) of the K Gaussians
        pSigma = zeros(D, D, K);     % the covariance matrices of the K classes, each D*D

        % distance matrix: compute the N*K matrix (x - pMiu)^2 = x^2 + pMiu^2 - 2*x*pMiu'
        distmat = repmat(sum(X.*X, 2), 1, K) + ...          % x^2, N*1 column replicated into K columns
            repmat(sum(pMiu.*pMiu, 2)', N, 1) - ...         % pMiu^2, 1*K row replicated into N rows
            2 * X * pMiu';
        [~, labels] = min(distmat, [], 2);   % assign each point to its nearest centroid

        for k = 1:K
            Xk = X(labels == k, :);
            pPi(k) = size(Xk, 1) / N;
            pSigma(:, :, k) = cov(Xk);
        end
    end

    function Px = calc_prob()
        % Gaussian density:
        % N(x|pMiu,pSigma) = 1/((2pi)^(D/2)) * (1/(abs(sigma))^0.5) * exp(-1/2*(x-pMiu)'*pSigma^(-1)*(x-pMiu))
        Px = zeros(N, K);
        for k = 1:K
            Xshift = X - repmat(pMiu(k, :), N, 1);   % x - pMiu
            inv_pSigma = inv(pSigma(:, :, k));
            tmp = sum((Xshift * inv_pSigma) .* Xshift, 2);
            coef = (2*pi)^(-D/2) * sqrt(det(inv_pSigma));
            Px(:, k) = coef * exp(-0.5 * tmp);
        end
    end
end
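A minimal usage sketch (fea here stands for any N-by-D data matrix, and the choice of 3 components is just for illustration):

% Illustrative call, assuming fea is an N-by-D data matrix
[px, model] = gmm(fea, 3);              % px: N-by-3 component densities, model: Miu/Sigma/Pi
[~, cluster_label] = max(px, [], 2);    % assign each sample to the Component with the largest density,
                                        % as gmm_accuracy.m below also does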


2. gmm_accuracy.m calls gmm.m and computes the clustering accuracy:

function [Accuracy] = gmm_accuracy(data_fea, gnd_label, K)
% Calculate the accuracy of clustering by the GMM model
px = gmm(data_fea, K);
[~, cls_ind] = max(px, [], 2);   % cluster label per sample (take the max along dim 2, one label per row)
Accuracy = cal_accuracy(cls_ind, gnd_label);

    function [acc] = cal_accuracy(gnd, estimate_label)
        res = bestMap(gnd, estimate_label);
        acc = length(find(gnd == res)) / length(gnd);
    end
end


3. Main function call

gmm_acc = gmm_accuracy(fea, gnd, n_classes);
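For context, a hedged sketch of a complete driver script (the data file name and the variables fea/gnd are assumptions, not part of the original post):

% Hypothetical driver script
load('dataset.mat');               % assumed to provide fea (N-by-D features) and gnd (N-by-1 labels)
n_classes = length(unique(gnd));   % use the number of ground-truth classes as K
gmm_acc = gmm_accuracy(fea, gnd, n_classes);
fprintf('GMM clustering accuracy: %.4f\n', gmm_acc);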








Writing this article as a summary was very beneficial to me. I also hope everyone will study Pluskid's gmm.m above carefully; beyond the algorithm itself, the matrix-handling code is very concise and well worth learning from.

I also found two other things very beneficial: one is Pluskid's post "Random Talk on Clustering (3): Gaussian Mixture Model", and the other is JerryLead's detailed explanation of the EM algorithm. If you are interested, have a look; both are written very well.



More machine learning materials and related discussions will continue to be updated; please follow this blog and the Sina Weibo account Sophia_qing.





