1. Maximum likelihood estimation rests on an important assumption: the samples are independent and identically distributed (i.i.d.).
2. Maximum likelihood estimation estimates the specific parameter values of a model whose form is fixed but whose parameters are unknown.
3. The core idea of maximum likelihood estimation is to maximize the probability of generating the observed sample. That is, it uses the known sample results to infer the parameter values of the model that are most likely to have produced those results.
Since the outcome has already been observed, why not choose the parameters under which this outcome is the most probable one? This is the core of maximum likelihood estimation.
General steps for finding the maximum likelihood estimate:
(1) Write out the likelihood function. The value of the likelihood function is the probability that this set of sample values occurs, so it is a probability value.
(2) Take the natural logarithm $\ln$ of the likelihood function and simplify. The logarithm is a monotonically increasing function, so the log-likelihood attains its maximum at the same parameter values as the original function. (The logarithmic function $y=\log_a{x}$ is monotonically increasing when $a>1$ and monotonically decreasing when $0<a<1$.)
(3) Take the derivative and set it to 0 to obtain the likelihood equation.
(4) Solve the likelihood equation; the resulting parameter values are the desired estimates.
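The four steps above can be sketched on a simple case. The example below is only an illustration, not part of the original text: it assumes a Bernoulli model with unknown success probability `p`, and the sample values and the function names `bernoulli_log_likelihood` and `bernoulli_mle` are hypothetical. For this model, setting the derivative of the log-likelihood to 0 gives the closed form $\hat{p} = \frac{1}{n}\sum_i y_i$, which the code compares against a coarse grid search.

```python
import math

def bernoulli_log_likelihood(p, samples):
    """Steps 1-2: log of the likelihood prod_i p^y_i * (1 - p)^(1 - y_i)."""
    return sum(y * math.log(p) + (1 - y) * math.log(1 - p) for y in samples)

def bernoulli_mle(samples):
    """Steps 3-4: solving d/dp log L = 0 gives p = (number of ones) / n."""
    return sum(samples) / len(samples)

# Hypothetical i.i.d. 0/1 outcomes, chosen only for illustration.
samples = [1, 0, 1, 0, 0]
p_hat = bernoulli_mle(samples)

# Sanity check: a grid search over p in (0, 1) should land on the same value.
grid = [i / 1000 for i in range(1, 1000)]
p_grid = max(grid, key=lambda p: bernoulli_log_likelihood(p, samples))

print("closed-form estimate:", p_hat)
print("grid-search estimate:", p_grid)
```

The grid search is there only to confirm that the closed-form solution is where the log-likelihood peaks; in practice the likelihood equation is solved analytically or with a numerical optimizer.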
The following uses the maximum likelihood method to estimate the parameters in logistic regression:
1. Assume that the current sample is $\{(x_1, y_1=1), (x_2, y_2=0), (x_3, y_3=1), (x_4, y_4=0), (x_5, y_5=0)\}$, and that the samples are independent and identically distributed.
2. The model is $P(y=1 \mid x) = \frac{1}{1+\exp(-w \cdot x)}$, which gives the probability that the model predicts the value 1 for a sample $x$. This satisfies the requirement that the model form is fixed while the parameters $w$ are unknown.
3. Because the samples are independent and identically distributed, the joint probability of observing this sample is the product shown next, and we look for the parameters that make this probability as large as possible.
$$P = P(y=1 \mid x=x_1)\, P(y=0 \mid x=x_2)\, P(y=1 \mid x=x_3)\, P(y=0 \mid x=x_4)\, P(y=0 \mid x=x_5)$$
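As a rough numerical illustration of this product, the sketch below evaluates it for the five labels above. The text does not give the feature values, so the scalar features in `xs` are hypothetical, and using a single scalar weight `w` with no bias term is a simplifying assumption. The function names are also illustrative.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def likelihood(w, xs, ys):
    """Product over samples of P(y_i | x_i), with P(y=1|x) = sigmoid(w * x)."""
    prob = 1.0
    for x, y in zip(xs, ys):
        p1 = sigmoid(w * x)
        prob *= p1 if y == 1 else 1.0 - p1
    return prob

def log_likelihood(w, xs, ys):
    """Sum of log P(y_i | x_i); maximized by the same w as the product."""
    total = 0.0
    for x, y in zip(xs, ys):
        p1 = sigmoid(w * x)
        total += math.log(p1 if y == 1 else 1.0 - p1)
    return total

# Hypothetical scalar features (not from the text); labels follow the sample above.
xs = [2.0, -1.0, -0.5, 0.5, -2.0]
ys = [1, 0, 1, 0, 0]

for w in (0.0, 0.5, 1.0):
    print(w, likelihood(w, xs, ys), log_likelihood(w, xs, ys))
```

At `w = 0` every factor equals 0.5, so the product is $0.5^5$; other values of `w` raise or lower it, and the maximum likelihood estimate is the `w` for which it is largest.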
Set