Machine learning lesson three (EM algorithm and Gaussian mixture model)

The EM algorithm is a well-known algorithm in the computer vision world. I had heard of it long ago, but only in the last few days did I really dig into it, while reading the Stanford public course lecture notes. The reason EM and MoG are put together here is that we need EM to solve the MoG model, so the EM algorithm is introduced first.

Before introducing the EM algorithm, let's first go over Jensen's inequality. We start with its definition:

The theorem is very simple and can be summarized in a few points. If f is a convex function (its second derivative is greater than or equal to zero, i.e., the curve bends upward), then the expectation of f is at least f of the expectation. Further, if the second derivative is strictly greater than zero, equality holds if and only if X = E[X] with probability 1, that is, X is a constant. If f is concave, the direction of the inequality is reversed. Concretely:
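
Written out (this is the standard statement of the inequality, reproduced here for reference), for a random variable X:

    E[f(X)] \ge f(E[X]) \quad \text{if } f \text{ is convex}, \qquad E[f(X)] \le f(E[X]) \quad \text{if } f \text{ is concave},

with equality in the strictly convex (or strictly concave) case if and only if X = E[X] with probability 1.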

Now that we know Jensen's inequality, let's discuss the general form of the EM algorithm.

Suppose we have a training set consisting of m independent examples, and assume that the class z of each example follows some unknown distribution. For this latent-variable model we can write down the likelihood function. (The likelihood function is what we maximize to solve for the parameters of the model we assume: when solving a classification or regression problem we usually pick a model such as Naive Bayes, GDA, or logistic regression, and then use maximum likelihood to estimate its parameters.)
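
In the usual notation, with parameters \theta and a latent class variable z^{(i)} attached to each example x^{(i)}, the log-likelihood takes the standard form:

    \ell(\theta) = \sum_{i=1}^{m} \log p(x^{(i)}; \theta) = \sum_{i=1}^{m} \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta)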

Here we do not know which distribution z^{(i)} follows; it is enough to know that it follows some probability distribution. Next we need to find the parameters \theta that maximize the expression above. Because the summation over z sits inside the logarithm, maximizing this directly is very difficult, so we transform it as follows:

We introduce a distribution Q_i over the values of z^{(i)} (how to choose it is discussed below), and continue the derivation:
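
Spelled out (with Q_i(z) \ge 0 and \sum_{z} Q_i(z) = 1; this is the standard chain of steps), we multiply and divide by Q_i and then apply Jensen's inequality:

    \ell(\theta) = \sum_{i} \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta)
                 = \sum_{i} \log \sum_{z^{(i)}} Q_i(z^{(i)}) \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}
                 \ge \sum_{i} \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}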

We used Jensen's inequality in the last step. Note that the log function is concave, so the direction of the inequality is the reverse of the convex case. In short: for any choice of Q_i, the right-hand side is a lower bound on the log-likelihood \ell(\theta).

In other words, \ell(\theta) has a lower bound, and in that lower bound the logarithm has been pushed inside the summation, which makes taking partial derivatives much easier. So can we replace maximizing \ell(\theta) with maximizing the lower bound? With this idea, we only need a proof. Suppose the current parameters are \theta^{(t)}, and the new parameters \theta^{(t+1)} are obtained by maximizing the likelihood on the lower bound. If we can guarantee \ell(\theta^{(t+1)}) \ge \ell(\theta^{(t)}), then it is enough to perform maximum likelihood estimation on the lower bound. The proof is as follows:
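
A sketch of that proof in the usual notation (Q_i^{(t)} is the distribution chosen at step t, so the bound is tight at \theta^{(t)}):

    \ell(\theta^{(t+1)}) \ge \sum_{i} \sum_{z^{(i)}} Q_i^{(t)}(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta^{(t+1)})}{Q_i^{(t)}(z^{(i)})}
                         \ge \sum_{i} \sum_{z^{(i)}} Q_i^{(t)}(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta^{(t)})}{Q_i^{(t)}(z^{(i)})}
                         = \ell(\theta^{(t)})

The first inequality is the lower bound evaluated at \theta^{(t+1)}, the second holds because \theta^{(t+1)} maximizes the lower bound, and the final equality is the tightness of the bound at \theta^{(t)}.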

The steps before the last line are not hard to understand; the key is the final equality: how can we guarantee that the lower bound is tight at the current parameters? Recall the condition for equality in Jensen's inequality, X = E[X], i.e., the random variable is a constant. In the EM algorithm this corresponds to requiring:
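
In symbols, the ratio inside the expectation must not depend on z^{(i)}:

    \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} = c \quad \text{for some constant } c \text{ independent of } z^{(i)}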

Coupled with the condition that Q_i(z) must sum to 1 over z, further derivation of this formula tells us how to choose Q_i(z):
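
Summing the constant-ratio condition over z shows that c = \sum_{z} p(x^{(i)}, z; \theta), which gives the standard choice:

    Q_i(z^{(i)}) = \frac{p(x^{(i)}, z^{(i)}; \theta)}{\sum_{z} p(x^{(i)}, z; \theta)} = \frac{p(x^{(i)}, z^{(i)}; \theta)}{p(x^{(i)}; \theta)} = p(z^{(i)} \mid x^{(i)}; \theta)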

So we simply choose Q_i to be the posterior probability of z given x and the current parameters. With this, the general form of the EM algorithm falls into place, as follows:
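
In its standard form, the algorithm repeats the following two steps until convergence:

    \text{E-step: for each } i, \quad Q_i(z^{(i)}) := p(z^{(i)} \mid x^{(i)}; \theta)
    \text{M-step:} \quad \theta := \arg\max_{\theta} \sum_{i} \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}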

To aid understanding, picture the process like this: at the current parameters the lower bound touches \ell(\theta); maximizing the lower bound moves the parameters to a new value, and a new, tighter bound is then built at that value, and so on.

To sum up, the EM algorithm constructs a lower bound on the log-likelihood and maximizes the likelihood on that lower bound to update the parameters; iterating this solves the parameter estimation problem.

Mixtures of Gaussians (MoG)

The mixture of Gaussians (MoG) is an unsupervised learning model that is often used for clustering. MoG is often the more appropriate choice when the clusters have different sizes along different dimensions and there are correlations within the clusters. For each sample, MoG gives the probability that it belongs to each class (by computing the posterior probability) rather than assigning it outright to a single class; this kind of clustering is called soft clustering. Generally speaking, a probability distribution of arbitrary shape can be approximated by a combination of several Gaussian distributions, so MoG is very widely applicable.

Let's look at an example to help with understanding:

This is a two-dimensional Gaussian mixture, where the data points are generated by two Gaussian distributions with means (-1, -2) and (.). The clustering result is obtained by assigning each data point to whichever of the two Gaussians gives it the larger posterior probability:

In MoG, because we do not know the distribution of the data in advance, we first make two assumptions:

Assumption 1: z follows a multinomial distribution, namely:
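
In symbols, with k mixture components and mixing weights \phi:

    z^{(i)} \sim \text{Multinomial}(\phi), \qquad p(z^{(i)} = j) = \phi_j, \quad \phi_j \ge 0, \quad \sum_{j=1}^{k} \phi_j = 1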

Assumption 2: given z, x follows a normal distribution, that is, the conditional probability p(x|z) is Gaussian, i.e.:
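
That is, each component j has its own mean \mu_j and covariance \Sigma_j:

    x^{(i)} \mid z^{(i)} = j \;\sim\; \mathcal{N}(\mu_j, \Sigma_j)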

The joint distribution of x and z is therefore:
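
By the two assumptions above, the joint density factors as:

    p(x^{(i)}, z^{(i)}) = p(x^{(i)} \mid z^{(i)}) \, p(z^{(i)})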

Next, we write down the log-likelihood function:
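
In the usual notation, the log-likelihood of the parameters \phi, \mu, \Sigma is:

    \ell(\phi, \mu, \Sigma) = \sum_{i=1}^{m} \log p(x^{(i)}; \phi, \mu, \Sigma) = \sum_{i=1}^{m} \log \sum_{z^{(i)}=1}^{k} p(x^{(i)} \mid z^{(i)}; \mu, \Sigma) \, p(z^{(i)}; \phi)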

We would like to maximize this likelihood to solve for the values of the parameters.

But the problem is that the z in our assumptions is never actually observed: we do not know beforehand which class each sample belongs to, so how do we solve for the parameters? As it happens, the EM algorithm we just discussed was built to solve exactly this kind of problem, so turning to EM here is only natural. Since the EM algorithm has already been explained, we can apply it directly to MoG; the steps are as follows:

Specifically, in the E-step, the posterior probability of z is updated as follows:
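
In the standard notation, writing w_j^{(i)} for the responsibility of component j for example i, the E-step computes, by Bayes' rule:

    w_j^{(i)} := p(z^{(i)} = j \mid x^{(i)}; \phi, \mu, \Sigma) = \frac{p(x^{(i)} \mid z^{(i)} = j; \mu, \Sigma) \, p(z^{(i)} = j; \phi)}{\sum_{l=1}^{k} p(x^{(i)} \mid z^{(i)} = l; \mu, \Sigma) \, p(z^{(i)} = l; \phi)}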

As the assumptions state, p(x^{(i)} | z^{(i)} = j) is a Gaussian density and p(z^{(i)} = j) comes from the multinomial distribution, so every term in the posterior is easy to evaluate.

In the M-step, the parameters are re-estimated according to the distribution of z obtained in the E-step:
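
The resulting updates are the familiar responsibility-weighted maximum-likelihood estimates:

    \phi_j := \frac{1}{m} \sum_{i=1}^{m} w_j^{(i)}
    \mu_j := \frac{\sum_{i=1}^{m} w_j^{(i)} x^{(i)}}{\sum_{i=1}^{m} w_j^{(i)}}
    \Sigma_j := \frac{\sum_{i=1}^{m} w_j^{(i)} (x^{(i)} - \mu_j)(x^{(i)} - \mu_j)^{T}}{\sum_{i=1}^{m} w_j^{(i)}}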

By iterating these two steps and updating the parameters until convergence, we obtain the parameters of the MoG model. From the analysis above, MoG handles samples whose underlying distribution is uncertain quite well.
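
To make the two steps concrete, here is a minimal NumPy sketch of EM for a mixture of Gaussians, written directly from the update formulas above. The function and variable names (em_mog, gaussian_pdf, and so on) are my own for illustration, and the second cluster center in the demo is made up; a practical implementation would also monitor the log-likelihood for convergence and guard against singular covariances more carefully.

```python
import numpy as np

def gaussian_pdf(X, mu, sigma):
    """Multivariate normal density N(mu, sigma) evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mu                                       # (m, d)
    inv = np.linalg.inv(sigma)
    norm = 1.0 / np.sqrt(((2 * np.pi) ** d) * np.linalg.det(sigma))
    quad = np.sum(diff @ inv * diff, axis=1)            # quadratic form per sample
    return norm * np.exp(-0.5 * quad)                   # (m,)

def em_mog(X, k, n_iters=100, seed=0):
    """EM for a mixture of k Gaussians; returns (phi, mu, sigma, w)."""
    m, d = X.shape
    rng = np.random.default_rng(seed)
    phi = np.full(k, 1.0 / k)                           # mixing weights
    mu = X[rng.choice(m, k, replace=False)]             # init means from random samples
    sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(k)])

    for _ in range(n_iters):
        # E-step: responsibilities w[i, j] = p(z = j | x_i; phi, mu, sigma)
        dens = np.stack([phi[j] * gaussian_pdf(X, mu[j], sigma[j]) for j in range(k)], axis=1)
        w = dens / dens.sum(axis=1, keepdims=True)      # (m, k)

        # M-step: re-estimate parameters from the responsibilities
        nj = w.sum(axis=0)                              # effective count per component
        phi = nj / m
        mu = (w.T @ X) / nj[:, None]
        for j in range(k):
            diff = X - mu[j]
            sigma[j] = (w[:, j, None] * diff).T @ diff / nj[j] + 1e-6 * np.eye(d)

    return phi, mu, sigma, w

# Example usage: two well-separated 2-D Gaussians
# (the first center matches the example in the text; the second is arbitrary for the demo)
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal((-1, -2), 0.5, size=(200, 2)),
                   rng.normal((3, 3), 0.8, size=(200, 2))])
    phi, mu, sigma, w = em_mog(X, k=2)
    print("mixing weights:", phi)
    print("means:\n", mu)
```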
