Statistical model
- Regression: $y_i=f_{\theta} (x_i) +\epsilon_i,\ E (\epsilon) =0$
1. Assume $\epsilon\sim N (0,\sigma^2) $, so $y \sim N (f_{\theta} (x), \sigma^2) $
2. Maximum likelihood estimation $\rightarrow$ least squares:
$L (\theta) =-\frac{n}{2}\log (2\pi)-n\log\sigma-\frac{1}{2\sigma^2}\sum_i\left (y_i-f_{\theta} (x_i) \right) ^2$
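A minimal numerical sketch of this equivalence (synthetic data and a linear $f_\theta(x)=ax+b$ are assumptions for illustration): since $\sigma$ only shifts $L(\theta)$ by terms constant in $\theta$, maximizing the Gaussian log-likelihood over $\theta$ picks out the same parameters as minimizing the sum of squared residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, 100)  # assumed model: f_theta(x) = a*x + b, Gaussian noise

def log_likelihood(theta, sigma=0.1):
    # L(theta) = -(n/2) log(2 pi) - n log(sigma) - (1/(2 sigma^2)) * sum (y_i - f_theta(x_i))^2
    a, b = theta
    n = len(x)
    resid = y - (a * x + b)
    return -n / 2 * np.log(2 * np.pi) - n * np.log(sigma) - np.sum(resid**2) / (2 * sigma**2)

# Least-squares solution via the normal equations
X = np.column_stack([x, np.ones_like(x)])
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Coarse grid search for the maximum-likelihood theta: the likelihood
# peaks at (up to grid resolution) the least-squares fit.
grid = [(a, b) for a in np.linspace(1, 3, 81) for b in np.linspace(0, 2, 81)]
theta_ml = max(grid, key=log_likelihood)
print(theta_ls, theta_ml)  # the two estimates agree up to the grid spacing
```

The grid search is deliberately crude; the point is only that the likelihood surface and the squared-error surface share the same optimizer in $\theta$.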
- Classification: $p _{\theta} (g_i=k| x=x_i), k=1,\cdots,K$
Using maximum likelihood estimation here is equivalent to minimizing the cross entropy (equivalently, the KL divergence).
For a single data point $ (x,g=k) $, the true class distribution assigns probability 1 to category $g=k$ and 0 to the remaining categories.
- $L (\theta) =\log p (g=k|x) $ needs to be maximized
- $CE (p,q) =-\sum_x p (x) \log q (x) $
For this example, $CE=-\sum_i p (g=i) \log p (g=i|x) =-\log p (g=k|x) $ needs to be minimized
- $KL (p\|q) =\sum_x p (x) \log\frac{p (x)}{q (x)}$
For this example, $KL=\sum_i p (g=i) \log\frac{p (g=i)}{p (g=i|x)}=\log\frac{1}{p (g=k|x)}=-\log p (g=k|x) $ needs to be minimized
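A short sketch of this three-way equivalence (the model probabilities `q` and the class index are made-up values): with a one-hot target $p$, both $CE(p,q)$ and $KL(p\|q)$ collapse to $-\log p(g=k|x)$, the negative log-likelihood of the true class.

```python
import numpy as np

q = np.array([0.1, 0.7, 0.2])  # assumed model outputs p(g=i | x)
k = 1                          # true class of this data point
p = np.zeros_like(q)
p[k] = 1.0                     # one-hot "true" distribution

ce = -np.sum(p * np.log(q))                          # cross entropy CE(p, q)
kl = np.sum(p[p > 0] * np.log(p[p > 0] / q[p > 0]))  # KL(p||q); 0-mass terms contribute 0
nll = -np.log(q[k])                                  # negative log-likelihood of class k

print(ce, kl, nll)  # all three coincide
```

The masking in the KL line just implements the convention $0 \log 0 = 0$, which is what makes the sum reduce to the single $g=k$ term.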
2.6. Statistical Models, Supervised Learning and Function Approximation