Understanding of the likelihood function

My understanding of the likelihood function, the prior probability (prior), and the posterior probability (posterior) in Bayesian inference has never been very good. Today I seem to have reached a new understanding, so I am writing it down.
While reading a paper, I came across this sentence:

"The likelihood function of the parameters θ = {w, α, β} given the observations D can be factored as ..."

Previously I only paid attention to the formula itself, so I skimmed right past this. Re-reading the description that precedes the formula, I found it unsettling the more I thought about it.
Two questions came to mind: why is the likelihood function written in the form of a conditional probability? And if what is given is clearly D, why does it turn into "given θ" in the formula that follows?
I searched around on Baidu; first, here is Wikipedia's explanation:
https://zh.wikipedia.org/wiki/%E4%BC%BC%E7%84%B6%E5%87%BD%E6%95%B0
Now let me describe my own understanding, borrowing the coin example from Wikipedia.
The probability we usually talk about refers to predicting the chance of an upcoming event once the parameters are given. In the coin example, we know that a fair coin comes up heads or tails with probability 0.5 each, and we want to predict the probability of tossing the coin twice and getting heads both times:
Here H stands for Head, meaning the coin lands heads up.
P(HH | pH = 0.5) = 0.5 * 0.5 = 0.25
This notation is actually a bit misleading: pH here acts as a parameter rather than a random variable, so strictly speaking this is not a conditional probability; a more rigorous way to write it would be P(HH; pH = 0.5).
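As a minimal sketch of this "forward" computation (assuming two independent tosses of the same coin; the function name prob_two_heads is just something I made up for illustration):

```python
def prob_two_heads(p_h):
    """P(HH; pH): probability of two heads for a fixed parameter pH."""
    # pH is treated as a fixed parameter, not a random variable.
    return p_h * p_h

print(prob_two_heads(0.5))  # 0.25
```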
The likelihood goes in exactly the opposite direction: the quantity we care about is no longer the probability that an event will occur; instead, knowing that a certain event has already occurred, we want to know what the parameter should be.
Now suppose we have tossed the coin twice and know that both tosses came up heads. How likely is it that this coin's probability of landing heads is 0.5? And how likely is it that its probability of landing heads is 0.6?
This "probability that the probability of heads is 0.5" is what the likelihood function expresses. It can be understood as how plausible a particular conjecture about the parameter (pH = 0.5) is; written as a (conditional) probability it is
L(pH = 0.5 | HH) = P(HH | pH = 0.5) = P(HH; pH = 0.5)  (alternative notation)
Why can it be written this way? I think of it as follows:
The likelihood function itself is also a probability, so we can write L(pH = 0.5 | HH) as P(pH = 0.5 | HH). By Bayes' formula, P(pH = 0.5 | HH) = P(pH = 0.5, HH) / P(HH). Since HH is an event that has already occurred, naturally P(HH) = 1, so:
P(pH = 0.5 | HH) = P(pH = 0.5, HH) = P(HH; pH = 0.5)
The right-hand side is the calculation we are already familiar with: given that the probability of heads is 0.5, the probability of getting heads on both of two tosses, i.e. 0.5 * 0.5 = 0.25.
So we can safely conclude:
L(pH = 0.5 | HH) = P(HH | pH = 0.5) = 0.25
This 0.25 means that, given that two heads were observed, the likelihood of pH = 0.5 is 0.25.
Let's compute another one:
L(pH = 0.6 | HH) = P(HH | pH = 0.6) = 0.36
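A small sketch of this evaluation (assuming independent tosses; likelihood_hh is a hypothetical helper name): the data HH stays fixed while we plug in different candidate values of pH.

```python
def likelihood_hh(p_h):
    """L(pH | HH) = P(HH; pH) = pH * pH for two observed heads."""
    return p_h * p_h

print(likelihood_hh(0.5))  # 0.25
print(likelihood_hh(0.6))  # 0.36
```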
Plotting the likelihood function as pH ranges from 0 to 1 gives a curve like this:

(Figure from Wikipedia: the likelihood curve of pH after observing HH.)
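A minimal sketch that reproduces such a plot (assuming numpy and matplotlib are available, and using L(pH | HH) = pH^2 for the two observed heads):

```python
import numpy as np
import matplotlib.pyplot as plt

# Evaluate L(pH | HH) = pH^2 on a grid of candidate parameter values in [0, 1].
p = np.linspace(0.0, 1.0, 200)
likelihood = p ** 2

plt.plot(p, likelihood)
plt.xlabel("pH (probability of heads)")
plt.ylabel("L(pH | HH)")
plt.title("Likelihood of pH after observing two heads")
plt.show()
```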
We can see that the likelihood is largest at pH = 1, i.e. L(pH = 1 | HH) = 1.
With this, maximum likelihood estimation becomes easy to understand.
Maximum likelihood estimation means finding, given the observed data, the parameter value that maximizes the likelihood.
It is then not hard to see why, in the data mining field, many parameter estimation methods ultimately reduce to maximizing a likelihood.
Back to the coin example: given that HH was observed, pH = 1 is the most reasonable estimate (though not necessarily the true value, because the amount of data is far too small), as the sketch below illustrates.
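Here is a sketch of this maximization as a simple grid search over candidate pH values (the 7-heads-out-of-10 case is just an invented illustration of how more data moves the estimate away from the extremes):

```python
import numpy as np

def coin_likelihood(p_h, n_heads, n_tosses):
    """Likelihood of pH for independent tosses: pH^heads * (1 - pH)^tails."""
    return p_h ** n_heads * (1.0 - p_h) ** (n_tosses - n_heads)

candidates = np.linspace(0.0, 1.0, 1001)

# Observing HH (2 heads out of 2 tosses): the maximum sits at pH = 1.
mle_hh = candidates[np.argmax(coin_likelihood(candidates, 2, 2))]
print(mle_hh)  # 1.0

# With more data, e.g. 7 heads out of 10 tosses, the maximum moves to about 0.7.
mle_more = candidates[np.argmax(coin_likelihood(candidates, 7, 10))]
print(mle_more)  # approximately 0.7
```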
That's as far as my understanding goes for now.