EM Algorithm Implementation of GMM


In earlier posts on K-means, K-medoids, GMM, spectral clustering, and Ncut clustering, the basic model and likelihood function of the GMM were introduced, and the derivation and convergence proof of the EM algorithm were covered in the post on the principle of the EM algorithm. This article focuses on the code for clustering with a Gaussian mixture model fitted by the EM algorithm.


1. GMM model:

A GMM is composed of K Gaussian distributions, each of which is called a "component". A linear combination of these components forms the probability density function of the GMM:
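p(x) = sum_{k=1}^{K} π_k · N(x | μ_k, Σ_k),   with π_k ≥ 0 and sum_{k} π_k = 1.

Here N(x | μ_k, Σ_k) denotes a Gaussian density with mean μ_k and covariance Σ_k, and π_k is the mixing coefficient (weight) of the k-th component.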


Based on the formula above, drawing a random point from the GMM can be split into two steps: first, randomly select one of the K Gaussian components, where the probability of the k-th component being chosen is exactly its mixing coefficient π_k; then draw a point from that component's distribution alone. At this point we are back to an ordinary single Gaussian, which is a solved problem.
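As a small illustration of these two steps, here is a minimal sampling sketch. The parameter names (pi_k, mu, sigma) are made up for the example, and mvnrnd is assumed to be available from the Statistics and Machine Learning Toolbox.

% Sketch: draw n points from a GMM with weights pi_k (1-by-K),
% means mu (K-by-D) and covariances sigma (D-by-D-by-K).
function X = gmm_sample(n, pi_k, mu, sigma)
    D = size(mu, 2);
    X = zeros(n, D);
    cdf = cumsum(pi_k);                              % cumulative mixing weights
    for i = 1:n
        k = find(rand() <= cdf, 1);                  % step 1: pick a component with probability pi_k(k)
        X(i, :) = mvnrnd(mu(k, :), sigma(:, :, k));  % step 2: draw from that single Gaussian
    end
end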

So how do we use a GMM for clustering? It is actually quite simple. Suppose we have data and assume it was generated by a GMM; we only need to infer the GMM probability distribution from the data, and the K components of the GMM then correspond to K clusters. Estimating a probability density from data is generally called density estimation; in particular, when we know (or assume) the form of the probability density function and only estimate its parameters, the process is called parameter estimation.


2. Likelihood and the likelihood function:

Now suppose we have N data points and assume they follow some distribution p(x); we want to determine the values of certain parameters in it. For example, in a GMM we need to determine the mixing coefficients π_k, the means μ_k, and the covariances Σ_k of each Gaussian component. Our idea is to find the set of parameters under which the probability of generating these given data points is highest; that probability is the product of the densities of all the points, and we call this product the likelihood function. The density at a single point is usually very small, and multiplying many very small numbers together easily causes floating-point underflow on a computer, so we normally take the logarithm and turn the product into a sum, obtaining the log-likelihood function. We then only need to maximize this function (the usual approach is to take derivatives, set them to zero, and solve the resulting equations), that is, to find the set of parameter values at which the likelihood function attains its maximum. We regard these as the most appropriate parameters, and the parameter estimation is then complete.
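In symbols, writing the data points as x_1, ..., x_N and the parameters collectively as θ:

likelihood: L(θ) = prod_{i=1}^{N} p(x_i | θ),    log-likelihood: log L(θ) = sum_{i=1}^{N} log p(x_i | θ).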

Let's take a look at GMM's log-likelihood function:
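log L = sum_{i=1}^{N} log ( sum_{k=1}^{K} π_k · N(x_i | μ_k, Σ_k) )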


Because there is a sum inside the logarithm, we cannot obtain the maximum directly by taking derivatives and solving the resulting equations. To get around this, we mimic the two-step procedure used above for randomly drawing points from a GMM and split the optimization into two alternating steps, which is quite similar to the two steps of K-means.



3. Algorithm flow:

1. Estimate the probability that each data point was generated by each component (not the prior probability that each component is selected): for data point x_i, the probability that it was generated by the k-th component is
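γ(i, k) = π_k · N(x_i | μ_k, Σ_k) / sum_{j=1}^{K} π_j · N(x_i | μ_j, Σ_j)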


Here N(x_i | μ_k, Σ_k) is the density of the k-th Gaussian component at x_i, and γ(i, k) is the posterior probability (responsibility) of component k for point x_i.


2. Maximization step: maximizing the expected likelihood by setting the derivatives with respect to the parameters to zero gives the new values of pMiu (μ_k) and pSigma (Σ_k); for the detailed derivation, see the references at the end of this article.
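The resulting updates (the standard EM results for a GMM) are:

N_k = sum_{i=1}^{N} γ(i, k)
μ_k = (1 / N_k) · sum_{i=1}^{N} γ(i, k) · x_i
Σ_k = (1 / N_k) · sum_{i=1}^{N} γ(i, k) · (x_i − μ_k)(x_i − μ_k)^T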


The mixing coefficients can also be estimated along the way: π_k = N_k / N.


3. Repeat the previous two steps until the value of the likelihood function converges.
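To connect the three steps with the variable names used in the code below, here is a simplified sketch of a single iteration (not the author's code; it assumes the Statistics Toolbox function mvnpdf is available, with X of size N-by-D, pPi of size 1-by-K, pMiu of size K-by-D and pSigma of size D-by-D-by-K):

% One EM iteration for a GMM (simplified sketch; the full gmm.m follows in part 4).
Px = zeros(N, K);
for k = 1:K
    Px(:, k) = mvnpdf(X, pMiu(k, :), pSigma(:, :, k));   % N(x_i | mu_k, Sigma_k) for every sample
end

% Step 1 (E-step): responsibilities pGamma(i, k)
pGamma = Px .* repmat(pPi, N, 1);
pGamma = pGamma ./ repmat(sum(pGamma, 2), 1, K);

% Step 2 (M-step): re-estimate the parameters from the responsibilities
Nk = sum(pGamma, 1);
pMiu = diag(1 ./ Nk) * pGamma' * X;
pPi = Nk / N;
for k = 1:K
    Xshift = X - repmat(pMiu(k, :), N, 1);
    pSigma(:, :, k) = (Xshift' * (diag(pGamma(:, k)) * Xshift)) / Nk(k);
end

% Step 3: log-likelihood used for the convergence test
L = sum(log(Px * pPi'));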



4. GMM clustering code and interpretation using MATLAB:


Note: fea denotes the training sample data and gnd denotes the ground-truth labels. The algorithm follows exactly the idea written above. When computing accuracy at the end, keep in mind that clustering differs from classification: we only obtain clusters, but we do not know which class label each cluster should carry. Since our purpose is to measure the performance of the clustering algorithm, we assume this assignment step can be done optimally, mapping each cluster to one class. One way is to enumerate all possible assignments and pick the best one; alternatively, the Hungarian algorithm solves this matching problem efficiently. I have put the Hungarian algorithm code in the resources, and the calling method is written into the functions below.


Note: I have also put the k-means code in the resources; from that package you only need a few helper files such as bestMap.m, so please download it.


1. gmm.m, the core function, which estimates the model and its parameters.

function varargout = gmm(X, K_or_centroids)
% ============================================================
% Expectation-Maximization iteration implementation of
% Gaussian Mixture Model.
%
% PX = GMM(X, K_OR_CENTROIDS)
% [PX MODEL] = GMM(X, K_OR_CENTROIDS)
%
%  - X: N-by-D data matrix.
%  - K_or_centroids: either K, indicating the number of
%       components, or a K-by-D matrix giving the initial
%       K centroids.
%
%  - PX: N-by-K matrix indicating the probability of each
%       component generating each point.
%  - MODEL: a structure containing the parameters of the GMM:
%       model.Miu:   a K-by-D matrix.
%       model.Sigma: a D-by-D-by-K array.
%       model.Pi:    a 1-by-K vector.
% ============================================================
% @SourceCode Author: pluskid (http://blog.pluskid.org)
% @Appended by: sophia_qing (http://blog.csdn.net/abcjennifer)

    %% Generate initial centroids
    threshold = 1e-15;
    [N, D] = size(X);

    if isscalar(K_or_centroids)              % K_or_centroids is a 1-by-1 number
        K = K_or_centroids;
        Rn_index = randperm(N);              % random permutation of the N sample indices
        centroids = X(Rn_index(1:K), :);     % pick K random samples as initial centroids
    else                                     % K_or_centroids is an initial K-by-D centroid matrix
        K = size(K_or_centroids, 1);
        centroids = K_or_centroids;
    end

    %% Initial values
    [pMiu pPi pSigma] = init_params();
    Lprev = -inf;                            % log-likelihood from the previous iteration

    %% EM algorithm
    while true
        %% Estimation step
        Px = calc_prob();

        % new value for pGamma (N-by-K): pGamma(i,k) is the probability
        % that x_i was generated by the k-th Gaussian
        pGamma = Px .* repmat(pPi, N, 1);                 % numerator = pi(k) * N(x_i | pMiu(k), pSigma(k))
        pGamma = pGamma ./ repmat(sum(pGamma, 2), 1, K);  % denominator = sum over j of pi(j) * N(x_i | pMiu(j), pSigma(j))

        %% Maximization step - via maximum likelihood estimation
        Nk = sum(pGamma, 1);   % Nk (1-by-K) = effective number of samples generated by each Gaussian; sum(Nk) = N
        % update pMiu through MLE (obtained by setting the derivative to 0)
        pMiu = diag(1./Nk) * pGamma' * X;
        pPi = Nk / N;
        % update the K covariance matrices
        for kk = 1:K
            Xshift = X - repmat(pMiu(kk, :), N, 1);
            pSigma(:, :, kk) = (Xshift' * ...
                (diag(pGamma(:, kk)) * Xshift)) / Nk(kk);
        end

        %% Check for convergence
        L = sum(log(Px * pPi'));
        if L - Lprev < threshold
            break;
        end
        Lprev = L;
    end

    if nargout == 1
        varargout = {Px};
    else
        model = [];
        model.Miu = pMiu;
        model.Sigma = pSigma;
        model.Pi = pPi;
        varargout = {Px, model};
    end

    %% Function definitions

    function [pMiu pPi pSigma] = init_params()
        pMiu = centroids;            % K-by-D: the K class centers
        pPi = zeros(1, K);           % weights (influence factors) of the K components
        pSigma = zeros(D, D, K);     % covariance matrices of the K components, each D-by-D

        % hard-assign each x to its nearest centroid; the N-by-K distance
        % matrix uses (x - pMiu)^2 = x^2 + pMiu^2 - 2*x*pMiu
        distmat = repmat(sum(X.*X, 2), 1, K) + ...        % x^2: an N-by-1 column replicated into K columns
            repmat(sum(pMiu.*pMiu, 2)', N, 1) - ...       % pMiu^2: a 1-by-K row replicated into N rows
            2*X*pMiu';
        [~, labels] = min(distmat, [], 2);                % index of the minimum in each row

        for k = 1:K
            Xk = X(labels == k, :);
            pPi(k) = size(Xk, 1) / N;
            pSigma(:, :, k) = cov(Xk);
        end
    end

    function Px = calc_prob()
        % Gaussian density:
        % N(x | pMiu, pSigma) = 1/((2*pi)^(D/2) * abs(det(pSigma))^0.5)
        %                       * exp(-1/2 * (x-pMiu)' * pSigma^(-1) * (x-pMiu))
        Px = zeros(N, K);
        for k = 1:K
            Xshift = X - repmat(pMiu(k, :), N, 1);        % x - pMiu
            inv_pSigma = inv(pSigma(:, :, k));
            tmp = sum((Xshift * inv_pSigma) .* Xshift, 2);
            coef = (2*pi)^(-D/2) * sqrt(det(inv_pSigma));
            Px(:, k) = coef * exp(-0.5 * tmp);
        end
    end
end
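A minimal usage example (fea here stands for any N-by-D data matrix and n_classes for the desired number of components; both names are just placeholders):

[px, model] = gmm(fea, n_classes);   % px: N-by-K probabilities of each component generating each point
[~, labels] = max(px, [], 2);        % assign each point to its most probable component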


2. gmm_accuracy.m calls gmm.m to calculate the clustering accuracy:

function [ Accuracy ] = gmm_accuracy( Data_fea, gnd_label, K )
%Calculate the accuracy of clustering by the GMM model
px = gmm(Data_fea, K);
[~, cls_ind] = max(px, [], 2);   % cls_ind = cluster label of each sample (its most probable component)
Accuracy = cal_accuracy(cls_ind, gnd_label);

    function [acc] = cal_accuracy(gnd, estimate_label)
        res = bestMap(gnd, estimate_label);
        acc = length(find(gnd == res)) / length(gnd);
    end
end


3. Main function call

gmm_acc = gmm_accuracy(fea, gnd, n_classes);
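For readers who do not have fea and gnd at hand, a self-contained toy driver could look like the following (synthetic data only; mvnrnd is from the Statistics Toolbox, and bestMap.m must be on the path):

% Toy driver with three synthetic Gaussian clusters (illustrative only).
n_classes = 3;
fea = [mvnrnd([0 0],     eye(2), 100);    % cluster 1
       mvnrnd([5 5],     eye(2), 100);    % cluster 2
       mvnrnd([0 8], 0.5*eye(2), 100)];   % cluster 3
gnd = [ones(100, 1); 2*ones(100, 1); 3*ones(100, 1)];   % ground-truth labels

gmm_acc = gmm_accuracy(fea, gnd, n_classes);
fprintf('GMM clustering accuracy: %.4f\n', gmm_acc);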








Writing this summary benefited me a lot, and I hope you make good use of pluskid's gmm.m: it is not just the algorithm, the matrix-manipulation code in it is also very concise and well worth learning from.

I also benefited a lot from reading two articles: one is "Talking about clustering (3): Gaussian Mixture Model" by pluskid, and the other is JerryLead's detailed explanation of the EM algorithm. If you are interested, take a look; both are very well written.



More machine learning materials and discussions will be posted later; please stay tuned to this blog and to the Sina Weibo account sophia_qing.



