Maximum likelihood estimation (MLE) is a method for estimating model parameters from observed data: given a dataset $\left\{x_1, x_2, \dots, x_n\right\}$ drawn from a random variable $X$ with probability density function $f(x|\theta)$, where $\theta$ is an unknown parameter of the density, MLE estimates $\theta$ as the value that makes the observed data most likely.
In fact, MLE is an instance of empirical risk minimization (ERM). In machine learning, ERM chooses the model that minimizes the loss over a given finite dataset, written as the formula:
\[\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} L(y_i, f(x_i))\]
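To make ERM concrete, here is a minimal Python sketch (the linear hypothesis, squared loss, and grid search are illustrative choices, not from the text above): it minimizes the average loss over a finite sample.

```python
import numpy as np

# Toy dataset: y is roughly 2*x plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=50)
y = 2.0 * x + rng.normal(0, 0.1, size=50)

def empirical_risk(w):
    """Average squared loss L(y_i, f(x_i)) = (y_i - w*x_i)^2 over the sample."""
    return np.mean((y - w * x) ** 2)

# ERM over the hypothesis space {f(x) = w*x : w on a grid}.
ws = np.linspace(-5, 5, 1001)
w_hat = ws[np.argmin([empirical_risk(w) for w in ws])]
print(f"ERM estimate: w = {w_hat:.2f}")  # close to the true value 2.0
```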
where $\mathcal{F}$ is the hypothesis space, $L(y_i, f(x_i))$ is a manually specified loss function, and $f(x)$ is the hypothesis function, also called the model. It can be seen that when the sample size is large enough, ERM is guaranteed to give a good solution, but when the sample size $n$ is very small, ERM may overfit. For MLE, when the model is a conditional probability distribution and the loss function is the log loss, MLE is equivalent to ERM. The proof is as follows: for a single sample $(x_i, y_i)$, when the model is $f(x_i) = P(x_i|\theta)$, the log loss is $L(y_i, f(x_i)) = -\log f(x_i) = -\log P(x_i|\theta)$, so over all the sample data $\left\{x_1, x_2, \dots, x_n\right\}$ we have:
\[\min_{\theta} -\frac{1}{n} \sum_{i=1}^{n} \log P(x_i|\theta) \;\Leftrightarrow\; \max_{\theta} \frac{1}{n} \sum_{i=1}^{n} \log P(x_i|\theta)\]
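The equivalence can also be checked numerically. The following sketch assumes a toy Bernoulli coin-flip dataset (an illustrative choice) and shows that minimizing the average negative log loss and maximizing the average log-likelihood select the same $\theta$.

```python
import numpy as np

data = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0])  # 7 heads, 3 tails
thetas = np.linspace(0.01, 0.99, 99)

def log_lik(theta):
    """sum_i log P(x_i | theta) for a Bernoulli model."""
    return np.sum(data * np.log(theta) + (1 - data) * np.log(1 - theta))

nll = np.array([-log_lik(t) / len(data) for t in thetas])  # ERM objective
ll = np.array([log_lik(t) / len(data) for t in thetas])    # MLE objective

# Both objectives pick the same parameter: the sample mean.
assert thetas[np.argmin(nll)] == thetas[np.argmax(ll)]
print(thetas[np.argmin(nll)])  # approximately 0.7
```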
The right-hand side above is exactly the log-likelihood used by MLE; the factor $\frac{1}{n}$ has no effect on the optimum. Next, the general form of MLE is given: for data $\left\{x_1, x_2, \dots, x_n\right\}$ with density function $f(x|\theta)$, the joint density function of the dataset (assuming the samples are drawn independently) is $f(x_1, x_2, \dots, x_n|\theta) = f(x_1|\theta) f(x_2|\theta) \cdots f(x_n|\theta) = \prod_{i=1}^{n} f(x_i|\theta)$. To maximize this, take the logarithm of both sides and maximize the log function instead, that is
\[\max_{\theta} \ell(\theta) = \max_{\theta} \log\left(\prod_{i=1}^{n} f(x_i|\theta)\right) = \max_{\theta} \sum_{i=1}^{n} \log f(x_i|\theta)\]
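A minimal sketch of this recipe, assuming i.i.d. Gaussian data and using scipy's general-purpose optimizer for the maximization (both are illustrative choices, not prescribed by the text):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
data = rng.normal(loc=3.0, scale=2.0, size=500)  # true theta: mu=3, sigma=2

def neg_log_likelihood(theta):
    """-sum_i log f(x_i | theta) for a Gaussian density."""
    mu, log_sigma = theta            # optimize log(sigma) to keep sigma > 0
    sigma = np.exp(log_sigma)
    return -np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                   - (data - mu) ** 2 / (2 * sigma**2))

res = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]))
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)  # close to the closed-form MLE: data.mean(), data.std()
```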
Obviously, the more data ERM has, the better the model it yields; and when the probability density of the data is known, MLE can be used directly to estimate its parameters. When the model involves latent variables, however, the likelihood can no longer be maximized directly, which motivates the EM algorithm introduced below.
Jensen's inequality
Expectation of a random variable
Expectation of a function of a random variable
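For reference, the standard statements behind these three topics are as follows (the concave case of Jensen's inequality, e.g. $\varphi = \log$, is the one used in deriving EM):
\[\varphi(E[X]) \le E[\varphi(X)] \ \text{for convex } \varphi, \qquad \varphi(E[X]) \ge E[\varphi(X)] \ \text{for concave } \varphi\]
\[E[X] = \sum_i x_i\, p_i \ \text{(discrete)}, \qquad E[X] = \int x\, f(x)\, dx \ \text{(continuous)}\]
\[E[g(X)] = \sum_i g(x_i)\, p_i, \qquad E[g(X)] = \int g(x)\, f(x)\, dx\]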
The expectation-maximization (EM) algorithm is an iterative algorithm for maximum likelihood estimation or maximum a posteriori estimation of probabilistic parametric models that contain latent variables.
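As a concrete illustration, below is a minimal EM sketch for a two-component Gaussian mixture, a standard instance of a latent-variable model (the toy data and all names are assumptions for illustration): the E-step computes each component's posterior responsibility for each point, and the M-step re-estimates the parameters by weighted maximum likelihood.

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy data from two Gaussians; the component labels are the latent variables.
data = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])

# Initial parameter guesses: mixing weights, means, standard deviations.
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])

def gaussian(x, m, s):
    return np.exp(-(x - m) ** 2 / (2 * s**2)) / (np.sqrt(2 * np.pi) * s)

for _ in range(100):
    # E-step: posterior responsibility of each component for each point.
    w = pi * gaussian(data[:, None], mu, sigma)   # shape (n, 2)
    w /= w.sum(axis=1, keepdims=True)
    # M-step: weighted maximum-likelihood re-estimates of the parameters.
    nk = w.sum(axis=0)
    pi = nk / len(data)
    mu = (w * data[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((w * (data[:, None] - mu) ** 2).sum(axis=0) / nk)

print(pi, mu, sigma)  # should approach (0.3, 0.7), (-2, 3), (1, 1)
```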
From MLE to the EM algorithm