From MLE to EM algorithm


Maximum likelihood estimation (MLE) is a method for estimating model parameters from observed data: given a data set $\left\{x_1, x_2, \dots, x_n\right\}$ drawn from a random variable $X$ with probability density function $f(x|\theta)$, where $\theta$ is an unknown parameter of the density, MLE estimates $\theta$ from the data.

In fact, MLE is a form of empirical risk minimization (ERM). In machine learning, ERM minimizes the loss of the model over a given finite data set, written as a formula:

\[\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} L(y_i, f(x_i))\]

where $\mathcal{F}$ is the hypothesis space, $L(y_i, f(x_i))$ is a manually defined loss function, and $f(x)$ is the hypothesis function, also known as the model. When the sample size is large enough, ERM is guaranteed to find a good solution, but when the sample size $n$ is small, ERM may overfit. For MLE, when the model is a conditional probability and the loss function is the logarithmic loss, MLE is equivalent to ERM. Proof: for a single sample $(x_i, y_i)$, with model $f(x_i) = P(x_i|\theta)$, the logarithmic loss is $L(y_i, f(x_i)) = -\log f(x_i) = -\log P(x_i|\theta)$; summing over all sample data $\left\{x_1, x_2, \dots, x_n\right\}$ gives:

\[\min_{\theta} -\frac{1}{n} \sum_{i=1}^{n} \log P(x_i|\theta) \;\Longleftrightarrow\; \max_{\theta} \frac{1}{n} \sum_{i=1}^{n} \log P(x_i|\theta)\]

The right-hand side is exactly the log-likelihood of MLE; the constant factor $\frac{1}{n}$ has no effect on the maximizer. Next, the general form of MLE: for data $\left\{x_1, x_2, \dots, x_n\right\}$ with density function $f(x|\theta)$, the joint density of the data set is $f(x_1, x_2, \dots, x_n|\theta) = f(x_1|\theta) f(x_2|\theta) \cdots f(x_n|\theta) = \prod_{i=1}^{n} f(x_i|\theta)$. To find its maximum, take the logarithm of both sides (the logarithm is monotonic, so the maximizer is unchanged) and maximize the log function, that is

\[\max_{\theta} l(\theta) = \max_{\theta} \log\left(\prod_{i=1}^{n} f(x_i|\theta)\right) = \max_{\theta} \sum_{i=1}^{n} \log f(x_i|\theta)\]
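To make this concrete, here is a minimal sketch (the Bernoulli model, the simulated data, and the grid search are illustrative assumptions, not from the original text) that maximizes $\sum_{i=1}^{n} \log f(x_i|\theta)$ over candidate values of $\theta$:

```python
# Minimal MLE sketch: estimate a Bernoulli parameter theta by
# maximizing the log-likelihood sum_i log f(x_i | theta).
import numpy as np

rng = np.random.default_rng(0)
data = rng.binomial(1, 0.7, size=1000)  # simulated coin flips, true theta = 0.7

def log_likelihood(theta, x):
    # sum_i log f(x_i | theta) for the Bernoulli density
    return np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))

# Grid search for the maximizer of l(theta).
grid = np.linspace(0.01, 0.99, 99)
theta_hat = grid[np.argmax([log_likelihood(t, data) for t in grid])]

print(theta_hat)    # close to the true value 0.7
print(data.mean())  # the closed-form Bernoulli MLE: the sample mean
```

As the output suggests, the grid maximizer agrees with the closed-form MLE (the sample mean), which is what setting the derivative of the log-likelihood to zero yields.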

Obviously, the more data ERM has, the better its solution, since the empirical risk approaches the expected risk as the sample size grows.

In the case of a known data probability density with fully observed data, MLE can be applied directly as above; when the density involves unobserved (latent) variables, the likelihood can no longer be maximized in closed form, which is what motivates the EM algorithm below.

Jensen's Inequality
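A standard statement, in the form used when deriving EM: for a convex function $f$ and a random variable $X$,

\[f(E[X]) \leq E[f(X)]\]

For a concave function such as $\log$, the inequality reverses, i.e. $\log E[X] \geq E[\log X]$, with equality if and only if $X$ is constant almost surely.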

Expectation of random variables
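For reference, the standard definitions: if $X$ is discrete with $P(X = x_i) = p_i$, and if $X$ is continuous with density $f(x)$, then respectively

\[E[X] = \sum_{i} x_i\, p_i, \qquad E[X] = \int_{-\infty}^{\infty} x\, f(x)\, dx\]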

Expectation of a function of a random variable
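For a function $g$ of the random variable $X$, the expectation can be computed directly from the distribution of $X$:

\[E[g(X)] = \sum_{i} g(x_i)\, p_i, \qquad E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f(x)\, dx\]

Taking $g = \log$ and applying Jensen's inequality to such an expectation is the step that produces the lower bound on the log-likelihood that EM maximizes.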

The expectation-maximization algorithm (EM algorithm) is an iterative algorithm for maximum likelihood estimation, or maximum a posteriori probability estimation, of the parameters of probabilistic models that contain latent variables.
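As an illustration (the two-component Gaussian mixture, the simulated data, and all variable names are assumptions, not from the original text), here is a minimal EM sketch in which each point's component assignment is the latent variable: the E-step computes the expected assignments under the current parameters, and the M-step re-estimates the parameters from those expectations.

```python
# Minimal EM sketch for a 1-D mixture of two Gaussians; the latent
# variable is which component generated each observation.
import numpy as np

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])

# Initial guesses: mixing weight of component 1, means, variances.
pi, mu, var = 0.5, np.array([-1.0, 1.0]), np.array([1.0, 1.0])

def normal_pdf(x, m, v):
    return np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

for _ in range(50):
    # E-step: responsibility of component 1 for each data point.
    p0 = (1 - pi) * normal_pdf(data, mu[0], var[0])
    p1 = pi * normal_pdf(data, mu[1], var[1])
    r = p1 / (p0 + p1)

    # M-step: re-estimate parameters from the expected assignments.
    pi = r.mean()
    mu = np.array([np.average(data, weights=1 - r),
                   np.average(data, weights=r)])
    var = np.array([np.average((data - mu[0]) ** 2, weights=1 - r),
                    np.average((data - mu[1]) ** 2, weights=r)])

print(pi, mu, var)  # roughly 0.7, (-2, 3), (1, 1)
```

Each iteration is guaranteed not to decrease the observed-data log-likelihood; that monotonicity is exactly what the Jensen-inequality lower bound establishes.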
