Independent identically distributed random events
How do we calculate the probability of an event from n independent and identically distributed trials? For example, suppose we toss the same coin 100 times and observe 52 heads and 48 tails. What is the probability that the coin lands heads?
Frequentist thought
The frequentist school holds that the probability of event A (such as the probability of heads in the example) is a fixed value that we simply do not know. After a large number of repeated experiments, the frequency of event A in the experiment is roughly equal to its probability, which is the idea behind the law of large numbers. As follows, where μ is the number of occurrences of event A in n trials, p is the probability of A, and ε is any positive number:

$$\lim_{n \to \infty} P\left(\left|\frac{\mu}{n} - p\right| < \varepsilon\right) = 1$$
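The convergence of frequency to probability can be illustrated with a small simulation. This is only a sketch: the "true" probability `TRUE_P`, the seed, and the helper name `heads_frequency` are assumptions made for the illustration, since in a real experiment the true probability is unknown.

```python
import random

def heads_frequency(n_tosses, p_heads, rng):
    """Simulate n_tosses coin flips and return the observed frequency of heads."""
    heads = sum(1 for _ in range(n_tosses) if rng.random() < p_heads)
    return heads / n_tosses

# Assumed "true" probability of heads; in a real experiment this is unknown.
TRUE_P = 0.5
rng = random.Random(0)  # fixed seed so that the run is reproducible

for n in (100, 10_000, 100_000):
    freq = heads_frequency(n, TRUE_P, rng)
    print(f"n = {n:>7}: frequency of heads = {freq:.4f}")
```

With only 100 tosses the frequency can easily land a few percent away from 0.5, which is why an observation like 52 heads out of 100 is unremarkable for a fair coin.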
In practice, it is difficult to repeat an experiment a very large number of times. The frequentist school argues instead that we have reason to believe the observed results are the most likely results under the true probability. The likelihood function gives the probability of the observed results as a function of the unknown probability θ:

$$L(\theta) = P(X \mid \theta)$$

Here X is the known result of the experiment. The value of θ at the extreme point (the maximum) of L(θ) is the maximum likelihood estimate, and the frequentist school uses it as the probability of event A.
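Frequentist thinking is natural, and we use it unconsciously in everyday life. For the coin example above, writing θ for the probability of heads, the likelihood of 52 heads and 48 tails is (up to a constant factor that does not depend on θ)

$$L(\theta) = \theta^{52}(1-\theta)^{48}$$

Taking the logarithm of both sides:

$$\ln L(\theta) = 52\ln\theta + 48\ln(1-\theta)$$

At the extreme point, the derivative with respect to θ is equal to 0:

$$\frac{d\ln L(\theta)}{d\theta} = \frac{52}{\theta} - \frac{48}{1-\theta} = 0$$

which gives θ = 0.52. The probability obtained by maximizing the likelihood function is therefore consistent with the frequency obtained by applying the law of large numbers directly.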
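As a numerical check of the 0.52 result, the log-likelihood for 52 heads and 48 tails can be maximized over a grid of candidate probabilities. This is a minimal sketch; the function names and the grid resolution are choices made for the illustration.

```python
import math

def log_likelihood(theta, heads, tails):
    """Log-likelihood of `heads` heads and `tails` tails when P(heads) = theta."""
    return heads * math.log(theta) + tails * math.log(1.0 - theta)

def grid_mle(heads, tails, steps=10_000):
    """Return the grid point in (0, 1) that maximizes the log-likelihood."""
    # Skip 0 and 1, where the logarithm diverges.
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda theta: log_likelihood(theta, heads, tails))

theta_hat = grid_mle(52, 48)
print(f"grid maximum: {theta_hat:.4f}")
print(f"closed form:  {52 / (52 + 48):.4f}")
```

The grid search is deliberately naive; it is here only to confirm that the maximum sits at heads / (heads + tails).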
Bayesian thought
The Bayesian school holds that, since the number of experiments will never be infinite, we should not give a single definite value. In the above example only 100 experiments have been carried out, yet the frequentist school concludes that the probability of heads is 0.52, which seems unreasonable: 0.52 is consistent with the law of large numbers, but 100 trials is obviously far from infinity. (In fact, the frequentist school introduces confidence levels. It does not claim that 0.52 is exactly the correct value, only that the estimate is correct with a certain level of confidence.)
Therefore, Bayes put forward Bayes' formula. Under a limited number of observations, the probability of event A should itself obey a probability distribution. In the above example, a Bayesian might say that the probability of heads is 0.5 with probability about 0.8, but that with probability 0.2 it could instead be 0.2 or 0.8 (these numbers are rough, don't take them seriously). Bayes' formula is as follows:

$$p(\theta \mid X) = \frac{p(X \mid \theta)\,p(\theta)}{\int p(X \mid \theta')\,p(\theta')\,d\theta'}$$

In the discrete case, Bayes' formula is written as

$$P(\theta_i \mid X) = \frac{P(X \mid \theta_i)\,P(\theta_i)}{\sum_j P(X \mid \theta_j)\,P(\theta_j)}$$

Here θ represents the target probability (the event probability we want to obtain), p(θ) represents the probability distribution of the target probability before the experiment (the prior distribution), p(θ | X) represents the probability distribution of the target probability after the experiment (the posterior distribution), and p(X | θ) represents the probability of the experimental results X given θ (calculated using the likelihood function). Again, the Bayesian approach calculates a probability distribution over the probability itself.
The posterior distribution after the nth experiment serves as the prior distribution for experiment n + 1. The prior distribution before the first experiment can be set according to experience. If there is no experience to refer to, we may as well assume that the prior over the unknown probability θ is uniform:

$$p(\theta) = 1, \quad 0 \le \theta \le 1$$
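The rule that each posterior becomes the next prior can be sketched on a discrete grid of candidate probabilities. The grid size, the order of the tosses, and the helper names `normalize` and `update` are assumptions of this illustration; the counts (52 heads, 48 tails) and the uniform starting prior come from the example.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def update(prior, grid, is_heads):
    """One Bayes step: multiply the prior by the likelihood of a single toss."""
    likelihood = [theta if is_heads else 1.0 - theta for theta in grid]
    return normalize([p * l for p, l in zip(prior, likelihood)])

# Candidate values of theta = P(heads); the grid size is an arbitrary choice.
GRID = [(i + 0.5) / 1000 for i in range(1000)]
uniform_prior = normalize([1.0] * len(GRID))

# Feed the 100 tosses one at a time: each posterior becomes the next prior.
tosses = [True] * 52 + [False] * 48
belief = uniform_prior
for toss in tosses:
    belief = update(belief, GRID, toss)

# Processing all tosses in a single step gives the same posterior.
batch = normalize([p * t**52 * (1 - t)**48 for p, t in zip(uniform_prior, GRID)])

posterior_mean = sum(t * b for t, b in zip(GRID, belief))
max_gap = max(abs(a - b) for a, b in zip(belief, batch))
print(f"posterior mean after 100 tosses: {posterior_mean:.4f}")
print(f"largest sequential vs. batch difference: {max_gap:.2e}")
```

Because the tosses are independent, the order in which they are fed in does not matter, and updating one toss at a time agrees with a single update on all 100 tosses up to floating-point rounding.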
In fact, when there are enough repeated experiments, the initial prior probability has little effect on the final results.
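A rough way to see the prior washing out is to compare posterior means under two different priors as the data grows. In this sketch both priors, the larger sample (5200 heads in 10000 tosses), and all function names are hypothetical choices for illustration. The computation is done in log space so that the large exponents do not underflow.

```python
import math

# Candidate values of theta = P(heads); midpoints avoid log(0) at the endpoints.
GRID = [(i + 0.5) / 2000 for i in range(2000)]

def flat_log_prior(theta):
    """Uniform prior: constant log-density."""
    return 0.0

def skewed_log_prior(theta):
    """Hypothetical prior proportional to theta^8 (1 - theta)^2, which favors heads."""
    return 8 * math.log(theta) + 2 * math.log(1.0 - theta)

def posterior_mean(log_prior, heads, tails):
    """Posterior mean of theta on GRID, computed in log space to avoid underflow."""
    log_post = [
        log_prior(t) + heads * math.log(t) + tails * math.log(1.0 - t)
        for t in GRID
    ]
    peak = max(log_post)  # subtract the peak before exponentiating
    weights = [math.exp(lp - peak) for lp in log_post]
    return sum(t * w for t, w in zip(GRID, weights)) / sum(weights)

for heads, tails in ((52, 48), (5200, 4800)):
    m_flat = posterior_mean(flat_log_prior, heads, tails)
    m_skew = posterior_mean(skewed_log_prior, heads, tails)
    print(f"{heads + tails:>6} tosses: flat prior {m_flat:.4f}, skewed prior {m_skew:.4f}")
```

With 100 tosses the two priors give visibly different posterior means; with 10000 tosses at the same ratio of heads the difference nearly disappears.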
Here we can see another essential feature of Bayesian thought. The probability distribution over the probability shifts as experiments accumulate, and as the number of experiments increases it slowly converges, finally agreeing with the law of large numbers.
The advantages of Bayesian thought
1. For a class of independent repeated random events, suppose the likelihood function has two extreme points (local maxima) with values such as 99 and 100. The maximum likelihood method keeps only the probability value at the larger maximum, 100. With the Bayesian approach, we can take both extreme points, 99 and 100, into account at the same time.
In practical applications, the probability of event A may not be constant (the experiment is difficult to repeat independently, or the probability of event A is itself random). For example, consider the probability of a person getting sick: it is high in childhood, low in middle age, and high in old age, or high in winter and low in summer. The frequentist view that probability is a fixed attribute of event A does not apply in these cases. Strictly speaking, you can never guarantee that the probability of event A is fixed in every scenario.
2. The maximum likelihood method used by the frequentist school yields only a single value, the maximum likelihood estimate of the probability. Once the posterior distribution has been obtained through Bayes' formula, however, we can process it in various ways, such as taking its expectation (mean), its median, or its maximum (mode).
3. In the following sections, we will also see conjugate distributions based on Bayes' formula, which make the posterior distribution very convenient to calculate. This is another major advantage of the Bayesian approach.
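To illustrate point 2, the sketch below builds a grid posterior for the coin example (52 heads, 48 tails) and reads off its mean, median, and mode. The uniform prior, the grid resolution, and the function names are assumptions of this example.

```python
def grid_posterior(heads, tails, grid):
    """Posterior over `grid` for a uniform prior (an assumption of this sketch)."""
    weights = [t**heads * (1.0 - t)**tails for t in grid]
    total = sum(weights)
    return [w / total for w in weights]

def summarize(grid, probs):
    """Return the posterior mean, median, and mode on the grid."""
    mean = sum(t * p for t, p in zip(grid, probs))
    # Median: first grid point at which the cumulative probability reaches 0.5.
    cumulative = 0.0
    median = grid[-1]
    for t, p in zip(grid, probs):
        cumulative += p
        if cumulative >= 0.5:
            median = t
            break
    # Mode: grid point with the highest posterior probability.
    mode = grid[max(range(len(probs)), key=lambda i: probs[i])]
    return mean, median, mode

GRID = [i / 1000 for i in range(1, 1000)]  # candidate values of P(heads)
posterior = grid_posterior(52, 48, GRID)
post_mean, post_median, post_mode = summarize(GRID, posterior)
print(f"mean = {post_mean:.4f}, median = {post_median:.4f}, mode = {post_mode:.4f}")
```

Under a uniform prior the posterior mode coincides with the maximum likelihood estimate, while the mean and median are pulled slightly toward 0.5.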