Statistical model

- Regression: $y_i = f_{\theta}(x_i) + \epsilon_i$, $E(\epsilon) = 0$

**1. Assume $\epsilon \sim N(0, \sigma^2)$. 2. Maximum likelihood estimation then reduces to least squares.**

$y_i \mid x_i \sim N(f_{\theta}(x_i), \sigma^2)$

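$L(\theta) = -\frac{n}{2}\log(2\pi) - n\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - f_{\theta}(x_i)\right)^2$

Only the last term depends on $\theta$, so maximizing $L(\theta)$ is equivalent to minimizing the residual sum of squares $\sum_i \left(y_i - f_{\theta}(x_i)\right)^2$.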
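A minimal NumPy sketch of the equivalence above, assuming a hypothetical linear model $f_{\theta}(x) = \theta_0 + \theta_1 x$, synthetic data, and a known $\sigma$. It fits $\theta$ by least squares and checks that perturbing the fit lowers the Gaussian log-likelihood $L(\theta)$.

```python
import math
import numpy as np

def gaussian_log_likelihood(theta, x, y, sigma):
    """L(theta) for y_i ~ N(f_theta(x_i), sigma^2), using the hypothetical
    linear model f_theta(x) = theta[0] + theta[1] * x."""
    n = len(y)
    residuals = y - (theta[0] + theta[1] * x)
    return (-n / 2 * math.log(2 * math.pi)
            - n * math.log(sigma)
            - np.sum(residuals ** 2) / (2 * sigma ** 2))

# Synthetic data (assumption): true theta = (1, 2), noise sd 0.3
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
y = 1.0 + 2.0 * x + rng.normal(0, 0.3, size=50)

# Least-squares fit: minimize the residual sum of squares
X = np.column_stack([np.ones_like(x), x])
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# With sigma fixed, any perturbation of theta_ls increases the RSS,
# and therefore strictly lowers the log-likelihood.
sigma = 0.3
best = gaussian_log_likelihood(theta_ls, x, y, sigma)
for delta in ([0.1, 0.0], [0.0, -0.1], [0.05, 0.05]):
    assert gaussian_log_likelihood(theta_ls + np.array(delta), x, y, sigma) < best
```

The value of $\sigma$ does not change which $\theta$ wins; it only rescales the last term, which is why the least-squares solution is the MLE for any fixed $\sigma$.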

- Classification: $p_{\theta}(g_i = k \mid x = x_i)$, $k = 1, \dots, K$

**Here, maximum likelihood estimation is equivalent to minimizing the cross-entropy, and also to minimizing the KL divergence.**

For a single data point $(x, g=k)$, the target distribution $P$ puts probability 1 on its own class $g=k$ and 0 on every other class (a one-hot distribution).

- $L(\theta) = \log p(g=k \mid x)$ needs to be maximized.

- $CE(P, Q) = -\sum_x p(x) \log q(x)$

For this example: $CE = -\sum_i P(g=i) \log p(g=i \mid x) = -\log p(g=k \mid x)$, which needs to be minimized.

- $KL(P, Q) = \sum_x p(x) \log\frac{p(x)}{q(x)}$

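For this example: $KL = \sum_i P(g=i) \log\frac{P(g=i)}{p(g=i \mid x)} = \log\frac{1}{p(g=k \mid x)} = -\log p(g=k \mid x)$, which needs to be minimized. Terms with $P(g=i) = 0$ vanish under the convention $0 \log 0 = 0$. More generally, $KL(P, Q) = CE(P, Q) - H(P)$ with $H(P) = -\sum_x p(x)\log p(x)$, and a one-hot $P$ has $H(P) = 0$, so all three objectives coincide.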
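A small numerical check of this equivalence, as a sketch under stated assumptions: the model output $p_{\theta}(g=i \mid x)$ is taken to be a softmax over hypothetical logits for $K = 4$ classes, and the observed class is index 2 (0-based). Cross-entropy, KL divergence, and the negative log-likelihood should all give the same number.

```python
import numpy as np

def cross_entropy(p, q):
    # CE(P, Q) = -sum_x p(x) log q(x); terms with p(x) = 0 contribute 0
    mask = p > 0
    return -np.sum(p[mask] * np.log(q[mask]))

def kl_divergence(p, q):
    # KL(P, Q) = sum_x p(x) log(p(x) / q(x)), using the convention 0 log 0 = 0
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

# Hypothetical model output p_theta(g = i | x): softmax of made-up logits
logits = np.array([1.2, -0.3, 2.5, 0.1])
q = np.exp(logits - logits.max())
q /= q.sum()

k = 2                        # observed class g = k (0-based index here)
p = np.zeros_like(q)
p[k] = 1.0                   # one-hot target distribution P

neg_log_lik = -np.log(q[k])  # -L(theta) = -log p(g = k | x)
print(cross_entropy(p, q), kl_divergence(p, q), neg_log_lik)
```

Masking out the zero-probability classes implements the $0 \log 0 = 0$ convention directly, so the KL computation never divides by or takes the log of zero.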

2.6. Statistical Models, Supervised Learning and Function Approximation