Chapter III: Generalized Linear Models (GLM)


Generalized Linear Models

We gave examples of regression and classification in the preceding chapter. In the regression example, $y \mid x;\theta \sim N(\mu,\sigma^{2})$; in the classification example, $y \mid x;\theta \sim \mathrm{Bernoulli}(\phi)$.

Both are special cases of the generalized linear model, which is built on the exponential family of distributions. The canonical form of an exponential-family distribution is:

$p(y;\eta) = b(y)\exp(\eta^{T}T(y) - a(\eta))$

$\eta$ is the natural parameter and $T(y)$ is the sufficient statistic; in most cases $T(y) = y$. The function $a(\eta)$ is the log-partition function, which normalizes the distribution. Fixing $T$, $a$, and $b$ defines a family of distributions parameterized by $\eta$.

For the Bernoulli distribution with mean $\phi$, we have:

$p(y=1;\phi) = \phi; \qquad p(y=0;\phi) = 1-\phi$

$p(y;\phi) = \phi^{y}(1-\phi)^{1-y}$

$p(y;\phi) = \exp\left(y\log\phi + (1-y)\log(1-\phi)\right)$

$p(y;\phi) = \exp\left(\left(\log\frac{\phi}{1-\phi}\right)y + \log(1-\phi)\right)$

Matching this with the canonical form gives:

$\eta = \log\frac{\phi}{1-\phi}$, which inverts to $\phi = \frac{1}{1+e^{-\eta}}$

$T(y) = y$

$a(\eta) = -\log(1-\phi) = \log(1+e^{\eta})$

$b(y) = 1$
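
As a quick sanity check, here is a minimal NumPy sketch (our own, not part of the original notes) verifying that the exponential-family form above reproduces the standard Bernoulli pmf:

```python
import numpy as np

def bernoulli_pmf(y, phi):
    """Standard Bernoulli pmf: phi^y * (1 - phi)^(1 - y)."""
    return phi**y * (1 - phi)**(1 - y)

def bernoulli_exp_family(y, phi):
    """The same pmf written as b(y) * exp(eta * T(y) - a(eta))."""
    eta = np.log(phi / (1 - phi))   # natural parameter
    a = np.log(1 + np.exp(eta))     # log-partition function a(eta)
    b = 1.0                         # b(y) = 1
    return b * np.exp(eta * y - a)

phi = 0.3
for y in (0, 1):
    assert np.isclose(bernoulli_pmf(y, phi), bernoulli_exp_family(y, phi))
```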

For the Gaussian distribution (taking $\sigma^{2} = 1$; the variance does not affect the final GLM hypothesis), we have:

$p(y;\mu) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}(y-\mu)^{2}\right)$

$p(y;\mu) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}y^{2}\right) \cdot \exp\left(\mu y - \frac{1}{2}\mu^{2}\right)$

Matching terms gives:

$\eta = \mu$

$T(y) = y$

$a(\eta) = \frac{\mu^{2}}{2} = \frac{\eta^{2}}{2}$

$b(y) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}y^{2}\right)$
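
The same kind of check works for the Gaussian; again a hypothetical sketch, assuming $\sigma^{2} = 1$ as in the derivation above:

```python
import numpy as np

def gaussian_pdf(y, mu):
    """N(mu, 1) density, with sigma^2 = 1 as in the derivation."""
    return np.exp(-0.5 * (y - mu)**2) / np.sqrt(2 * np.pi)

def gaussian_exp_family(y, mu):
    """The same density as b(y) * exp(eta * T(y) - a(eta)) with eta = mu."""
    eta = mu
    a = eta**2 / 2                                 # a(eta) = eta^2 / 2
    b = np.exp(-0.5 * y**2) / np.sqrt(2 * np.pi)   # b(y)
    return b * np.exp(eta * y - a)

for y, mu in [(0.0, 1.0), (2.5, -0.7)]:
    assert np.isclose(gaussian_pdf(y, mu), gaussian_exp_family(y, mu))
```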

Constructing a GLM

1. $y \mid x;\theta \sim \mathrm{ExponentialFamily}(\eta)$.

2. Given $x$, our goal is to predict $T(y)$. In most cases $T(y) = y$, so we choose the hypothesis $h(x) = E\left[y \mid x\right]$.

3. The natural parameter $\eta$ depends linearly on the input: $\eta = \theta^{T}x$.

Ordinary least squares

Ordinary least squares is a special case of the GLM: $y$ is continuous, and the conditional distribution of $y$ given $x$ is the Gaussian $N(\mu,\sigma^{2})$, so the exponential-family distribution here is the Gaussian. As shown above, writing the Gaussian in exponential-family form gives $\mu = \eta$. Therefore:

$h_{\theta}(x) = E\left[y \mid x; \theta\right] = \mu = \eta = \theta^{T}x$
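
To make this concrete, a small illustrative sketch (toy data and variable names are our own) that fits $\theta$ with NumPy's least-squares solver and then evaluates the GLM hypothesis $h_{\theta}(x) = \theta^{T}x$:

```python
import numpy as np

# Toy data: 50 examples with an intercept column x_0 = 1.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=50)

# Least-squares fit of theta (equivalently, maximum likelihood
# under the Gaussian assumption).
theta, *_ = np.linalg.lstsq(X, y, rcond=None)

def h(x, theta):
    """GLM hypothesis for the Gaussian case: h(x) = E[y|x] = theta^T x."""
    return theta @ x

print(h(np.array([1.0, 3.0]), theta))  # prediction at x_1 = 3
```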

Logistic regression

In logistic regression, $y$ takes only the values 0 and 1, so the Bernoulli distribution is the natural exponential-family distribution to use, and from the derivation above $\phi = \frac{1}{1+e^{-\eta}}$. Further, since $y \mid x;\theta \sim \mathrm{Bernoulli}(\phi)$, we have $E\left[y \mid x;\theta\right] = \phi$, which gives:

$h_{\theta}(x) = E\left[y \mid x; \theta\right]$

$h_{\theta}(x) = \phi$

$h_{\theta}(x) = \frac{1}{1+e^{-\eta}}$

$h_{\theta}(x) = \frac{1}{1+e^{-\theta^{T}x}}$
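
A minimal sketch of the resulting hypothesis (names are illustrative):

```python
import numpy as np

def sigmoid(z):
    """Canonical response function of the Bernoulli GLM."""
    return 1.0 / (1.0 + np.exp(-z))

def h(x, theta):
    """h_theta(x) = P(y = 1 | x; theta) = sigmoid(theta^T x)."""
    return sigmoid(theta @ x)

theta = np.array([-1.0, 2.0])  # intercept weight and one feature weight
x = np.array([1.0, 0.8])       # x_0 = 1 is the intercept term
print(h(x, theta))             # a probability in (0, 1)
```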

Softmax regression

In logistic regression, $y$ takes only two discrete values. Now consider the case where $y$ takes multiple values, $y \in \{1,\dots,k\}$.

To parameterize a multinomial over $k$ possible outcomes, we could use $k$ parameters $\phi_{1},\dots,\phi_{k}$ for the probability of each outcome. However, these parameters are redundant, because they must sum to 1. So we only need to parameterize $k-1$ of them: $\phi_{i} = p(y=i;\phi)$ and $p(y=k;\phi) = 1-\sum_{i=1}^{k-1}\phi_{i}$. For convenience we write $\phi_{k} = 1-\sum_{i=1}^{k-1}\phi_{i}$, but remember that it is not a parameter; it is determined by the other $k-1$ parameters.

To express the multinomial as an exponential-family distribution, define $T(y) \in \mathbb{R}^{k-1}$ as follows:

$T(1) = \begin{bmatrix} 1\\ 0\\ 0\\ \vdots \\ 0 \end{bmatrix}$

$T(2) = \begin{bmatrix} 0\\ 1\\ 0\\ \vdots \\ 0 \end{bmatrix}$

$T(k-1) = \begin{bmatrix} 0\\ 0\\ 0\\ \vdots \\ 1 \end{bmatrix}$

$T(k) = \begin{bmatrix} 0\\ 0\\ 0\\ \vdots \\ 0 \end{bmatrix}$

Unlike before, $T(y)$ is not equal to $y$: here $T(y)$ is a $(k-1)$-dimensional vector, not a real number. We write $(T(y))_{i}$ for the $i$-th element of $T(y)$.

Next define the indicator function $1\{\cdot\}$: its value is 1 when the argument is true and 0 otherwise. For example, $1\{2=3\} = 0$ and $1\{3=5-2\} = 1$.

Therefore $(T(y))_{i} = 1\{y=i\}$, and further $E\left[(T(y))_{i}\right] = p(y=i) = \phi_{i}$.
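
In code, $T(y)$ is just a one-hot-style encoding in which class $k$ maps to the zero vector; a hypothetical sketch:

```python
import numpy as np

def T(y, k):
    """T(y) in R^{k-1}: (T(y))_i = 1{y == i}; T(k) is the zero vector."""
    t = np.zeros(k - 1)
    if y < k:             # classes are numbered 1..k
        t[y - 1] = 1.0
    return t

k = 4
print(T(2, k))  # [0. 1. 0.]
print(T(4, k))  # [0. 0. 0.]  -- T(k) is all zeros
```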

It follows that the multinomial also belongs to the exponential family:

$p(y;\phi) = \phi_{1}^{1\{y=1\}} \phi_{2}^{1\{y=2\}} \cdots \phi_{k}^{1\{y=k\}}$

$p(y;\phi) = \phi_{1}^{1\{y=1\}} \phi_{2}^{1\{y=2\}} \cdots \phi_{k}^{1-\sum_{i=1}^{k-1}(T(y))_{i}}$

$p(y;\phi) = \phi_{1}^{(T(y))_{1}} \phi_{2}^{(T(y))_{2}} \cdots \phi_{k}^{1-\sum_{i=1}^{k-1}(T(y))_{i}}$

$p(y;\phi) = \exp\left((T(y))_{1}\log(\phi_{1}) + (T(y))_{2}\log(\phi_{2}) + \cdots + \left(1-\sum_{i=1}^{k-1}(T(y))_{i}\right)\log(\phi_{k})\right)$

$p(y;\phi) = \exp\left((T(y))_{1}\log(\phi_{1}/\phi_{k}) + (T(y))_{2}\log(\phi_{2}/\phi_{k}) + \cdots + (T(y))_{k-1}\log(\phi_{k-1}/\phi_{k}) + \log(\phi_{k})\right)$

$p(y;\phi) = b(y)\exp(\eta^{T}T(y) - a(\eta))$

where

$\eta = \begin{bmatrix} \log(\phi_{1}/\phi_{k})\\ \log(\phi_{2}/\phi_{k})\\ \vdots \\ \log(\phi_{k-1}/\phi_{k}) \end{bmatrix}$

$a(\eta) = -\log(\phi_{k})$

$b(y) = 1$

Therefore we have the following link function:

$\eta_{i} = \log\frac{\phi_{i}}{\phi_{k}}$

For convenience, we define:

$\eta_{k} = \log\frac{\phi_{k}}{\phi_{k}} = 0$

Exponentiating and rearranging, and then summing over $i$ (using $\sum_{i=1}^{k}\phi_{i} = 1$), we get:

$e^{\eta_{i}} = \frac{\phi_{i}}{\phi_{k}}$

$\phi_{k}e^{\eta_{i}} = \phi_{i}$

$\phi_{k}\sum_{i=1}^{k}e^{\eta_{i}} = 1$

So we get the following response function:

$\phi_{i}= \frac{e^{\eta_{i}}}{\sum_{j=1}^{k}e^{\eta_{j}}}$

This mapping from $\eta$ to $\phi$ is called the softmax function.
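
A direct NumPy translation of the response function (the max-subtraction is a standard numerical-stability trick, not part of the derivation):

```python
import numpy as np

def softmax(eta):
    """Map natural parameters eta in R^{k-1} to probabilities phi in R^k."""
    eta = np.append(eta, 0.0)    # the convention eta_k = 0
    z = np.exp(eta - eta.max())  # subtract the max for numerical stability
    return z / z.sum()

phi = softmax(np.array([1.0, 2.0, -0.5]))  # k = 4 classes
print(phi, phi.sum())                      # probabilities that sum to 1
```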

Let $\eta_{i} = \theta_{i}^{T}x$ for $i = 1,2,\dots,k-1$, where $\theta_{1},\dots,\theta_{k-1} \in \mathbb{R}^{n+1}$; consistent with $\eta_{k} = 0$, we also define $\theta_{k} = 0$.

Therefore the conditional distribution is:

$p(y=i \mid x;\theta) = \phi_{i}$

$p(y=i \mid x;\theta) = \frac{e^{\eta_{i}}}{\sum_{j=1}^{k}e^{\eta_{j}}}$

$p(y=i \mid x;\theta) = \frac{e^{\theta_{i}^{T}x}}{\sum_{j=1}^{k}e^{\theta_{j}^{T}x}}$

Loss function: as with the other GLMs, the parameters are fit by maximum likelihood. Given a training set of $m$ examples $\{(x^{(i)}, y^{(i)})\}$, the log-likelihood is:

$\ell(\theta) = \sum_{i=1}^{m}\log p(y^{(i)} \mid x^{(i)};\theta) = \sum_{i=1}^{m}\log\prod_{l=1}^{k}\left(\frac{e^{\theta_{l}^{T}x^{(i)}}}{\sum_{j=1}^{k}e^{\theta_{j}^{T}x^{(i)}}}\right)^{1\{y^{(i)}=l\}}$

The maximum likelihood estimate of $\theta$ maximizes $\ell(\theta)$, for example by gradient ascent or Newton's method.
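
A small illustrative sketch of this log-likelihood (array shapes and names are our own; $\theta_{k} = 0$ is handled by appending a zero column):

```python
import numpy as np

def log_likelihood(Theta, X, y, k):
    """Softmax-regression log-likelihood.

    Theta: (k-1, n+1) parameter matrix (theta_k is fixed at 0);
    X: (m, n+1) design matrix with an intercept column;
    y: labels in 1..k.
    """
    eta = X @ Theta.T                               # (m, k-1)
    eta = np.column_stack([eta, np.zeros(len(X))])  # append eta_k = 0
    eta = eta - eta.max(axis=1, keepdims=True)      # numerical stability
    log_phi = eta - np.log(np.exp(eta).sum(axis=1, keepdims=True))
    return log_phi[np.arange(len(X)), y - 1].sum()

# Tiny example: m = 3 examples, n = 1 feature, k = 3 classes.
X = np.array([[1.0, 0.2], [1.0, -1.0], [1.0, 0.9]])
y = np.array([1, 3, 2])
Theta = np.zeros((2, 2))
print(log_likelihood(Theta, X, y, 3))  # = 3 * log(1/3) at Theta = 0
```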
