by Yunduan Cui
This is my own PRML study note, which is still being updated.
Chapter 2: Probability Distributions
This chapter introduces the probability distribution models used throughout the book, which form the basis of the later chapters. Given a finite observation set \(\{x_{1}, x_{2}, \dots, x_{N}\}\), we want to model the probability distribution \(p(x)\) that generated it. This problem is known as density estimation (density estimation).
Main content
1. The Bernoulli, binomial, and multinomial distributions for discrete random variables
2. The Gaussian distribution for continuous random variables
3. Parameter estimation for the Gaussian distribution: frequentist vs. Bayesian approaches
4. Conjugate priors, and a unified view of the probability distributions
5. Parametric vs. nonparametric methods
2.1 Binary Variables
- Bernoulli distribution
For a binary random variable \(x \in \{0, 1\}\), the Bernoulli distribution is defined as:
\(\mathrm{Bern}(x|\mu) = \mu^{x}(1-\mu)^{1-x}\)
where \(\mu\) is the parameter controlling the distribution, satisfying:
\(p(x=1|\mu) = \mu\)
The expectation and variance of the Bernoulli distribution satisfy:
\(\mathbb{E}[x] = \mu\)
\(\mathrm{var}[x] = \mu(1-\mu)\)
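As a quick numerical check, here is a minimal Python sketch (the helper name `bern` is my own) that evaluates the Bernoulli pmf and verifies the two moments above:

```python
# Bern(x|mu) = mu^x * (1-mu)^(1-x) for x in {0, 1}
def bern(x, mu):
    return mu**x * (1 - mu)**(1 - x)

mu = 0.3
mean = sum(x * bern(x, mu) for x in (0, 1))             # E[x] = mu
var = sum((x - mean)**2 * bern(x, mu) for x in (0, 1))  # var[x] = mu(1-mu)
assert abs(mean - mu) < 1e-12
assert abs(var - mu * (1 - mu)) < 1e-12
```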
When there is an observation set \(\mathcal{D} = \{x_{1}, x_{2}, \dots, x_{N}\}\) and the observations are assumed to be independent of each other, we obtain a likelihood function (likelihood function) of \(\mu\):
\(p(\mathcal{D}|\mu) = \displaystyle{\prod_{n=1}^{N}} p(x_{n}|\mu) = \displaystyle{\prod_{n=1}^{N}} \mu^{x_{n}}(1-\mu)^{1-x_{n}}\)
This product form is inconvenient to maximize directly, so we take the logarithm of \(p(\mathcal{D}|\mu)\), which turns the product into a sum:
\(\ln p(\mathcal{D}|\mu) = \displaystyle{\sum_{n=1}^{N}} \ln p(x_{n}|\mu) = \displaystyle{\sum_{n=1}^{N}} \{x_{n}\ln\mu + (1-x_{n})\ln(1-\mu)\}\)
Setting the derivative with respect to \(\mu\) to zero gives \(\mu_{ML} = \frac{1}{N}\displaystyle{\sum_{n=1}^{N}} x_{n}\). This is the maximum likelihood estimate of the Bernoulli parameter on the observation set, and it is equivalent to minimizing the empirical risk.
Maximum likelihood estimation also has a flaw: if the observation set is too small, overfitting occurs very easily (for example, if a coin is tossed three times and lands heads up every time, the maximum likelihood estimate judges the probability of heads to be \(100\%\), which is obviously not correct). We can avoid this by introducing a prior over \(\mu\), which turns the estimate into a maximum a posteriori estimate and corresponds to structural risk minimization; see the beta distribution below and the sketch after this paragraph.
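To make this concrete, here is a minimal Python sketch (the data and the helper name `mu_ml` are my own, for illustration): the maximum likelihood estimate is just the sample mean, so with three heads in three tosses it jumps straight to \(1.0\):

```python
import numpy as np

# mu_ML = (1/N) * sum(x_n): the sample mean of the binary observations.
def mu_ml(observations):
    return np.mean(observations)

rng = np.random.default_rng(0)
big_sample = rng.binomial(1, 0.5, size=10_000)  # many tosses of a fair coin
print(mu_ml(big_sample))    # close to the true mu = 0.5

three_heads = [1, 1, 1]     # three tosses, all heads
print(mu_ml(three_heads))   # exactly 1.0 -- the overfitted estimate
```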
- Binomial distribution
Given the observation set \(\mathcal{D}\) of the Bernoulli distribution, if all we know is that the number of observations with \(x=1\) is \(m\), we can derive the binomial distribution:
\(\mathrm{Bin}(m|N,\mu) = \binom{N}{m}\mu^{m}(1-\mu)^{N-m} = \frac{N!}{(N-m)!\,m!}\mu^{m}(1-\mu)^{N-m}\)
This is the probability that an event occurs exactly \(m\) times in \(N\) trials. The expectation and variance of the binomial distribution satisfy:
\(\mathbb{E}[m] = \displaystyle{\sum_{m=0}^{N}} m\, \mathrm{Bin}(m|N,\mu) = N\mu\)
\(\mathrm{var}[m] = \displaystyle{\sum_{m=0}^{N}} (m - \mathbb{E}[m])^{2}\, \mathrm{Bin}(m|N,\mu) = N\mu(1-\mu)\)
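The same kind of numerical check works for the binomial moments; a short sketch (the helper name `binom_pmf` is my own):

```python
from math import comb

# Bin(m|N, mu) = C(N, m) * mu^m * (1-mu)^(N-m)
def binom_pmf(m, N, mu):
    return comb(N, m) * mu**m * (1 - mu)**(N - m)

N, mu = 10, 0.3
mean = sum(m * binom_pmf(m, N, mu) for m in range(N + 1))             # N*mu
var = sum((m - mean)**2 * binom_pmf(m, N, mu) for m in range(N + 1))  # N*mu*(1-mu)
assert abs(mean - N * mu) < 1e-9
assert abs(var - N * mu * (1 - mu)) < 1e-9
```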
- Beta distribution
This section considers how to introduce prior information into the binomial model, introducing the concept of a conjugate prior (conjugacy).
The beta distribution is introduced as the prior probability distribution; it is controlled by two hyperparameters \(a, b\):
\(\mathrm{Beta}(\mu|a,b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\mu^{a-1}(1-\mu)^{b-1}\)
where \(\Gamma(x) \equiv \int_{0}^{\infty} u^{x-1} e^{-u}\, du\)
The coefficient guarantees that the beta distribution is normalized: \(\int_{0}^{1}\mathrm{Beta}(\mu|a,b)\, d\mu = 1\). The expectation and variance of the beta distribution satisfy:
\(\mathbb{E}[\mu] = \frac{a}{a+b}\)
\(\mathrm{var}[\mu] = \frac{ab}{(a+b)^{2}(a+b+1)}\)
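Finally, a sketch of how the beta prior fixes the three-heads problem above. That a \(\mathrm{Beta}(a,b)\) prior combined with \(m\) heads in \(N\) tosses gives a \(\mathrm{Beta}(a+m,\, b+N-m)\) posterior is the standard conjugacy result (developed in the rest of the chapter); the hyperparameter values below are chosen only for illustration:

```python
from math import gamma

# Beta(mu|a,b) with the Gamma-function normalizer defined above.
def beta_pdf(mu, a, b):
    return gamma(a + b) / (gamma(a) * gamma(b)) * mu**(a - 1) * (1 - mu)**(b - 1)

a, b = 2.0, 2.0                      # prior hyperparameters (illustrative)
m, N = 3, 3                          # three tosses, all heads
a_post, b_post = a + m, b + (N - m)  # conjugate posterior: Beta(5, 2)

print(a / (a + b))                           # prior mean E[mu] = 0.5
print(a_post / (a_post + b_post))            # posterior mean 5/7 ~ 0.714, not 1.0
print((a_post - 1) / (a_post + b_post - 2))  # posterior mode (MAP) = 0.8

# crude Riemann check of the normalization: integral of Beta(mu|a,b) over [0,1] is 1
print(sum(beta_pdf(i / 1000, a_post, b_post) for i in range(1, 1000)) / 1000)
```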
(To be continued.)
Pattern Recognition and Machine Learning (PRML) notes (1)