Difference between p(x | theta) and p(x; theta)
http://blog.csdn.net/pipisorry/article/details/42715245
Two notations are in use for maximum likelihood estimation.
From: Gregor Heinrich, Parameter estimation for text analysis
From: http://blog.csdn.net/pipisorry/article/details/42649657
There are two explanations for why both notations appear.
p(x | theta) does not always denote a conditional probability. When it does not, p(x | theta) is equivalent to p(x; theta).
In general:
A vertical bar denotes a conditional probability, so in p(x | theta) the quantity theta is a random variable being conditioned on;
A semicolon, as in p(x; theta), marks theta as a parameter to be estimated (it is fixed, but currently unknown). Read it as p(x), with "; theta" appended only to record that the distribution has a parameter theta. p(x; theta) is the probability that the random variable X takes the value x. In Bayesian terminology this is sometimes called the prior probability of X = x.
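The semicolon reading can be illustrated with a minimal sketch. It assumes a Bernoulli model, which the original post does not specify; the data and the names `bernoulli_pmf`, `log_likelihood`, and `theta_mle` are hypothetical. Here theta is an ordinary function argument, a fixed number with no distribution of its own, and maximum likelihood estimation searches for the value that makes the observed data most probable.

```python
import math

def bernoulli_pmf(x, theta):
    """p(x; theta): probability that X = x, for a fixed parameter theta."""
    return theta if x == 1 else 1.0 - theta

def log_likelihood(data, theta):
    """Sum of log p(x; theta) over independent observations."""
    return sum(math.log(bernoulli_pmf(x, theta)) for x in data)

# Hypothetical observations (1 = success, 0 = failure).
data = [1, 0, 1, 1, 0, 1, 1, 1]

# Closed-form maximum likelihood estimate for a Bernoulli parameter.
theta_mle = sum(data) / len(data)

# Cross-check with a coarse grid search over candidate theta values.
grid = [i / 100 for i in range(1, 100)]
theta_grid = max(grid, key=lambda t: log_likelihood(data, t))

print(theta_mle, theta_grid)
```

Nothing in this code assigns a probability to theta itself, which is the point of writing p(x; theta) rather than p(x | theta).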
Explanation of p(y | x; theta)
From: the notation used in Andrew Ng's machine learning lecture notes
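The mixed notation p(y | x; theta) combines both symbols: y is conditioned on the random variable x, while theta is a fixed parameter of the distribution. As a rough sketch, assume a logistic regression model; the function name `p_y_given_x` and the input values are hypothetical and only serve to show the two roles.

```python
import math

def p_y_given_x(y, x, theta):
    """p(y | x; theta) for a binary label y under a logistic model.

    x is the observed input being conditioned on;
    theta is a fixed parameter vector, not a random variable.
    """
    z = sum(t * xi for t, xi in zip(theta, x))
    h = 1.0 / (1.0 + math.exp(-z))  # model's probability that y = 1
    return h if y == 1 else 1.0 - h

# Hypothetical input and parameter values.
x = [1.0, 2.0]
theta = [0.5, -0.25]

p1 = p_y_given_x(1, x, theta)
p0 = p_y_given_x(0, x, theta)
print(p1, p0)
```

For any fixed x and theta the two probabilities sum to 1, since the function defines a distribution over y only.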
How the frequentist and Bayesian schools differ on the two notations
The frequentist school treats the parameter as a fixed value: it has one true value in the real world, even though that value is unknown.
The Bayesian school treats the parameter as a random variable, so each possible value of it carries a certain probability.
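The contrast between the two schools can be sketched on the same data. The example assumes a Bernoulli model with a Beta prior on theta, a common textbook pairing that the original post does not mention; the data, the uniform Beta(1, 1) prior, and the name `beta_posterior` are hypothetical.

```python
def beta_posterior(data, alpha=1.0, beta=1.0):
    """Update a Beta(alpha, beta) prior on theta with Bernoulli observations.

    Returns the parameters of the posterior p(theta | data), which is
    again a Beta distribution because the Beta prior is conjugate.
    """
    successes = sum(data)
    failures = len(data) - successes
    return alpha + successes, beta + failures

# Hypothetical observations (1 = success, 0 = failure).
data = [1, 0, 1, 1, 0, 1, 1, 1]

# Frequentist view: theta is one fixed unknown number, estimated by MLE.
theta_mle = sum(data) / len(data)

# Bayesian view: theta is a random variable with a full posterior distribution.
a_post, b_post = beta_posterior(data)
posterior_mean = a_post / (a_post + b_post)

print(theta_mle, (a_post, b_post), posterior_mean)
```

The frequentist result is a single point estimate, whereas the Bayesian result is a distribution over theta, of which the posterior mean is only one summary.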