Chapter 1 Summary: MLE (Maximum-Likelihood Estimate) and Bayesian Approach
Christopher M. Bishop, PRML (Pattern Recognition and Machine Learning), Chapter 1: Introduction
1. Notation and Logical Relations
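- Training data: $N$ input values $x_1, \dots, x_N$ and their corresponding target values $t_1, \dots, t_N$. For simplicity, these are written as $\mathbf{x} = (x_1, \dots, x_N)^{\mathrm{T}}$ and $\mathbf{t} = (t_1, \dots, t_N)^{\mathrm{T}}$.
- Goal of prediction: to be able to make predictions for the target variable $t$ given some new value $x$ of the input variable.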
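- Assumption on the predictive distribution over $t$: we shall assume that, given the value of $x$, the corresponding value of $t$ has a Gaussian distribution with a mean equal to the value $y(x, \mathbf{w})$ of the polynomial curve given by (1.1), $y(x, \mathbf{w}) = \sum_{j=0}^{M} w_j x^j$. Thus we have
$$p(t \mid x, \mathbf{w}, \beta) = \mathcal{N}\left(t \mid y(x, \mathbf{w}), \beta^{-1}\right) \tag{1.60}$$
where $\beta$ is the precision (inverse variance) of the distribution.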
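- Likelihood function of the i.i.d. training data:
$$p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}\left(t_n \mid y(x_n, \mathbf{w}), \beta^{-1}\right) \tag{1.61}$$
and its logarithm
$$\ln p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta) = -\frac{\beta}{2} \sum_{n=1}^{N} \{y(x_n, \mathbf{w}) - t_n\}^2 + \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi) \tag{1.62}$$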
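- MLE of the parameters $\mathbf{w}$ and $\beta$: maximizing the log likelihood with respect to $\mathbf{w}$ is equivalent to minimizing the sum-of-squares error $\frac{1}{2} \sum_{n=1}^{N} \{y(x_n, \mathbf{w}) - t_n\}^2$, which gives $\mathbf{w}_{ML}$; maximizing with respect to $\beta$ then gives
$$\frac{1}{\beta_{ML}} = \frac{1}{N} \sum_{n=1}^{N} \{y(x_n, \mathbf{w}_{ML}) - t_n\}^2 \tag{1.63}$$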
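- ML plug-in prediction for new values of $x$: substituting the maximum likelihood parameters into (1.60) gives
$$p(t \mid x, \mathbf{w}_{ML}, \beta_{ML}) = \mathcal{N}\left(t \mid y(x, \mathbf{w}_{ML}), \beta_{ML}^{-1}\right) \tag{1.64}$$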
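- Prior distribution over $\mathbf{w}$: for simplicity, let us consider a Gaussian distribution of the form
$$p(\mathbf{w} \mid \alpha) = \mathcal{N}\left(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I}\right) = \left(\frac{\alpha}{2\pi}\right)^{(M+1)/2} \exp\left\{-\frac{\alpha}{2} \mathbf{w}^{\mathrm{T}} \mathbf{w}\right\} \tag{1.65}$$
where
- the hyperparameter $\alpha$ is the precision of the distribution,
- $M + 1$ is the total number of elements in the vector $\mathbf{w}$ for an $M$th-order polynomial.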
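- Posterior distribution for $\mathbf{w}$: using Bayes' theorem,
$$p(\mathbf{w} \mid \mathbf{x}, \mathbf{t}, \alpha, \beta) \propto p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta)\, p(\mathbf{w} \mid \alpha) \tag{1.66}$$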
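- MAP (maximum a posteriori): a step towards a more Bayesian approach, although MAP is still a point estimate. Taking the negative logarithm of the posterior and dropping terms that do not depend on $\mathbf{w}$, we find that the maximum of the posterior is given by the minimum of
$$\frac{\beta}{2} \sum_{n=1}^{N} \{y(x_n, \mathbf{w}) - t_n\}^2 + \frac{\alpha}{2} \mathbf{w}^{\mathrm{T}} \mathbf{w} \tag{1.67}$$
which is the regularized sum-of-squares error with regularization parameter $\lambda = \alpha / \beta$.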
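The maximum likelihood and MAP steps in the list above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not code from the book: the helper names (`poly`, `phi`, `solve`, `log_likelihood`, `fit`, `beta_ml`), the toy data (noisy samples of $\sin(2\pi x)$) and the values of $M$, $\alpha$ and $\beta$ are hypothetical choices. Both estimates come from the same normal equations $(\Phi^{\mathrm{T}}\Phi + \lambda\mathbf{I})\mathbf{w} = \Phi^{\mathrm{T}}\mathbf{t}$, where $\Phi$ is the design matrix whose $n$th row is $(1, x_n, \dots, x_n^M)$, with $\lambda = 0$ for maximum likelihood and $\lambda = \alpha/\beta$ for MAP.

```python
import math
import random


def poly(x, w):
    """Polynomial curve y(x, w) = sum_j w_j * x**j (PRML eq. 1.1)."""
    return sum(wj * x**j for j, wj in enumerate(w))


def phi(x, M):
    """Feature vector (1, x, x**2, ..., x**M): one row of the design matrix."""
    return [x**i for i in range(M + 1)]


def solve(A, b):
    """Solve the linear system A u = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    u = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * u[c] for c in range(r + 1, n))
        u[r] = (aug[r][n] - s) / aug[r][r]
    return u


def log_likelihood(w, beta, xs, ts):
    """ln p(t | x, w, beta) for i.i.d. Gaussian noise (PRML eq. 1.62)."""
    N = len(xs)
    sq = sum((poly(x, w) - t) ** 2 for x, t in zip(xs, ts))
    return -0.5 * beta * sq + 0.5 * N * math.log(beta) - 0.5 * N * math.log(2 * math.pi)


def fit(xs, ts, M, lam=0.0):
    """Minimize 1/2 * sum_n (y(x_n, w) - t_n)**2 + lam/2 * w^T w via the normal equations.

    lam = 0 gives w_ML; lam = alpha / beta gives w_MAP, the minimizer of PRML eq. 1.67.
    """
    feats = [phi(x, M) for x in xs]
    P = M + 1
    A = [[sum(f[i] * f[j] for f in feats) + (lam if i == j else 0.0) for j in range(P)]
         for i in range(P)]
    b = [sum(f[i] * t for f, t in zip(feats, ts)) for i in range(P)]
    return solve(A, b)


def beta_ml(w, xs, ts):
    """Noise precision from 1/beta_ML = mean squared residual (PRML eq. 1.63)."""
    return len(xs) / sum((poly(x, w) - t) ** 2 for x, t in zip(xs, ts))


# Toy data and hyperparameters: illustrative assumptions, not values taken from the text.
random.seed(0)
xs = [n / 9 for n in range(10)]
ts = [math.sin(2 * math.pi * x) + random.gauss(0.0, 0.3) for x in xs]
M, alpha, beta = 3, 5e-3, 11.1

w_ml = fit(xs, ts, M)
b_ml = beta_ml(w_ml, xs, ts)
w_map = fit(xs, ts, M, lam=alpha / beta)

# Plug-in predictive distribution (1.64) at x = 0.5 is N(t | y(x, w_ML), 1/beta_ML)
print("ML plug-in mean:", poly(0.5, w_ml), "variance:", 1.0 / b_ml)
print("MAP curve at x = 0.5:", poly(0.5, w_map))
```

Solving the normal equations with a small hand-written elimination keeps the sketch dependency-free; with NumPy, `numpy.linalg.solve` or `numpy.linalg.lstsq` would be the usual choice.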
Although we have included a prior distribution $p(\mathbf{w} \mid \alpha)$, we are so far still making a point estimate of $\mathbf{w}$, so this does not yet amount to a Bayesian treatment. In a fully Bayesian approach, we should consistently apply the sum and product rules of probability, which requires, as we shall see shortly, that we integrate over all values of $\mathbf{w}$. Such marginalizations lie at the heart of Bayesian methods for pattern recognition.
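The marginalization described above can be written out step by step. As a short sketch, with $\mathbf{x}$ and $\mathbf{t}$ denoting the training inputs and targets and with $\alpha$ and $\beta$ suppressed from the notation, applying the sum rule and then the product rule to the joint distribution of the new target $t$ and the parameters $\mathbf{w}$ gives

$$
\begin{aligned}
p(t \mid x, \mathbf{x}, \mathbf{t})
&= \int p(t, \mathbf{w} \mid x, \mathbf{x}, \mathbf{t})\,\mathrm{d}\mathbf{w} && \text{(sum rule)} \\
&= \int p(t \mid x, \mathbf{w}, \mathbf{x}, \mathbf{t})\,p(\mathbf{w} \mid x, \mathbf{x}, \mathbf{t})\,\mathrm{d}\mathbf{w} && \text{(product rule)} \\
&= \int p(t \mid x, \mathbf{w})\,p(\mathbf{w} \mid \mathbf{x}, \mathbf{t})\,\mathrm{d}\mathbf{w}
\end{aligned}
$$

The last line rests on two modelling assumptions: given $\mathbf{w}$, the new target depends only on its own input $x$ and not on the training data, and a new input $x$ observed without its target carries no information about $\mathbf{w}$.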
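- Fully Bayesian approach:
- Here we shall assume that the parameters $\alpha$ and $\beta$ are fixed and known in advance (in later chapters we shall discuss how such parameters can be inferred from data in a Bayesian setting).
- A Bayesian treatment simply corresponds to a consistent application of the sum and product rules of probability, which allow the predictive distribution to be written in the form
$$p(t \mid x, \mathbf{x}, \mathbf{t}) = \int p(t \mid x, \mathbf{w})\, p(\mathbf{w} \mid \mathbf{x}, \mathbf{t})\, \mathrm{d}\mathbf{w} \tag{1.68}$$
where $p(t \mid x, \mathbf{w})$ is given by (1.60) and the dependence on $\alpha$ and $\beta$ has been omitted to simplify the notation.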
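- Result of the integration in (1.68):
- (1.66): the posterior distribution $p(\mathbf{w} \mid \mathbf{x}, \mathbf{t})$, found by normalizing the right-hand side of (1.66), is a Gaussian and can be evaluated analytically.
- (1.68): the integration can also be performed analytically, with the result that the predictive distribution is given by a Gaussian of the form
$$p(t \mid x, \mathbf{x}, \mathbf{t}) = \mathcal{N}\left(t \mid m(x), s^2(x)\right) \tag{1.69}$$
where the mean and variance are given by
$$m(x) = \beta\, \boldsymbol{\phi}(x)^{\mathrm{T}} \mathbf{S} \sum_{n=1}^{N} \boldsymbol{\phi}(x_n)\, t_n \tag{1.70}$$
$$s^2(x) = \beta^{-1} + \boldsymbol{\phi}(x)^{\mathrm{T}} \mathbf{S}\, \boldsymbol{\phi}(x) \tag{1.71}$$
Here the matrix $\mathbf{S}$ is given by
$$\mathbf{S}^{-1} = \alpha \mathbf{I} + \beta \sum_{n=1}^{N} \boldsymbol{\phi}(x_n)\, \boldsymbol{\phi}(x_n)^{\mathrm{T}} \tag{1.72}$$
where $\mathbf{I}$ is the unit matrix, and we have defined the vector $\boldsymbol{\phi}(x)$ with elements $\phi_i(x) = x^i$ for $i = 0, \dots, M$.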
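The predictive mean and variance of the fully Bayesian approach can also be computed directly from PRML (1.70) to (1.72). The sketch below is a plain-Python illustration under stated assumptions, not code from the book: the helper names (`phi`, `solve`, `predictive`), the toy data (noisy samples of $\sin(2\pi x)$) and the values of $M$, $\alpha$ and $\beta$ are hypothetical choices. Instead of inverting $\mathbf{S}^{-1}$ explicitly, it solves linear systems with it. It also relies on a fact from PRML Section 3.3: the posterior over $\mathbf{w}$ is Gaussian with mean $\mathbf{m}_N = \beta \mathbf{S} \sum_n \boldsymbol{\phi}(x_n) t_n$, so that $m(x) = \boldsymbol{\phi}(x)^{\mathrm{T}} \mathbf{m}_N$, and this posterior mean coincides with the MAP solution of the regularized least-squares problem with $\lambda = \alpha/\beta$.

```python
import math
import random


def phi(x, M):
    """Polynomial feature vector phi(x) with elements phi_i(x) = x**i, i = 0..M."""
    return [x**i for i in range(M + 1)]


def solve(A, b):
    """Solve the linear system A u = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    u = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * u[c] for c in range(r + 1, n))
        u[r] = (aug[r][n] - s) / aug[r][r]
    return u


def predictive(x, xs, ts, M, alpha, beta):
    """Mean m(x) and variance s^2(x) of the Gaussian predictive distribution (PRML eq. 1.69)."""
    P = M + 1
    feats = [phi(xn, M) for xn in xs]
    # S^{-1} = alpha * I + beta * sum_n phi(x_n) phi(x_n)^T  (PRML eq. 1.72)
    S_inv = [[(alpha if i == j else 0.0) + beta * sum(f[i] * f[j] for f in feats)
              for j in range(P)] for i in range(P)]
    # b = sum_n phi(x_n) t_n
    b = [sum(f[i] * t for f, t in zip(feats, ts)) for i in range(P)]
    # Posterior mean m_N = beta * S * b, with S applied via a linear solve
    m_N = [beta * u for u in solve(S_inv, b)]
    px = phi(x, M)
    mean = sum(p * m for p, m in zip(px, m_N))  # PRML eq. 1.70
    S_px = solve(S_inv, px)
    var = 1.0 / beta + sum(p * s for p, s in zip(px, S_px))  # PRML eq. 1.71
    return mean, var


# Toy data and hyperparameters: illustrative assumptions, not values taken from the text.
random.seed(0)
xs = [n / 9 for n in range(10)]
ts = [math.sin(2 * math.pi * x) + random.gauss(0.0, 0.3) for x in xs]
M, alpha, beta = 3, 5e-3, 11.1

for x in (0.25, 0.5, 1.5):
    m, s2 = predictive(x, xs, ts, M, alpha, beta)
    print(f"x={x}: m(x)={m:.3f}, s(x)={math.sqrt(s2):.3f}")
```

Because $\boldsymbol{\phi}(x)^{\mathrm{T}} \mathbf{S} \boldsymbol{\phi}(x) > 0$, the predictive variance always exceeds the noise variance $\beta^{-1}$; one would expect $s(x)$ to be noticeably larger at $x = 1.5$, outside the range of the training inputs, than at the points inside it.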
2. Flowchart: the relations between all of the equations and notions above:
CCJ PRML Study Note: Chapter 1 Summary: MLE (Maximum-Likelihood Estimate) and Bayesian Approach