The exponential family of distributions
We first need to introduce the exponential family. It is the set of all distributions whose probability density (or mass) function can be written in the following form:
\(\begin{aligned} p(y;\eta) = b(y)\exp\left(\eta^T T(y) - A(\eta)\right) \end{aligned}\)
Many common distributions (such as the Gaussian, Poisson, binomial, and gamma distributions) belong to the exponential family. The family has many useful properties; see Chapter 3.3 of the book Generalized Linear Models (2nd ed.).
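As a small illustration of the form above, the sketch below checks numerically that the Poisson distribution fits it, with \(\eta=\log\lambda\), \(T(y)=y\), \(A(\eta)=e^\eta\) and \(b(y)=1/y!\). The function names and the rate 2.5 are chosen only for this example.

```python
import math

def poisson_pmf(y, lam):
    """Poisson probability mass function in its usual form."""
    return lam ** y * math.exp(-lam) / math.factorial(y)

def poisson_exp_family(y, lam):
    """The same pmf written as b(y) * exp(eta * T(y) - A(eta))."""
    eta = math.log(lam)            # natural parameter
    t = y                          # sufficient statistic T(y) = y
    a = math.exp(eta)              # A(eta) = e^eta, which equals lam
    b = 1.0 / math.factorial(y)    # b(y) = 1 / y!
    return b * math.exp(eta * t - a)

# The two forms should agree for every count y.
for y in range(6):
    assert math.isclose(poisson_pmf(y, 2.5), poisson_exp_family(y, 2.5))
```

The same kind of check works for any member of the family once \(b\), \(T\), \(\eta\) and \(A\) have been identified.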
Assumptions behind the generalized linear model
The generalized linear model is mainly based on the following assumptions:
1. The distribution of \(y|x;\theta\) belongs to the exponential family.
2. The quantity to predict is \(T(y)\), so the model is \(h_\theta(x)=E(T(y)|x)\).
3. The model is linear, i.e. \(\eta=\theta^T x\).
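The three assumptions can be sketched in code as "compute a linear predictor, then map it to the expected value of \(T(y)\)". This is only a minimal sketch: `glm_predict`, `identity` and `sigmoid` are names made up for this example, the two response functions are the ones derived later in this text for the Gaussian and Bernoulli cases, and \(\sigma^2=1\) is assumed in the Gaussian case.

```python
import math

def linear_predictor(theta, x):
    """Assumption 3: eta = theta^T x."""
    return sum(t * xi for t, xi in zip(theta, x))

def glm_predict(theta, x, response):
    """Assumption 2: h(x) = E[T(y) | x], computed as response(eta)."""
    return response(linear_predictor(theta, x))

def identity(eta):
    # Gaussian case, assuming sigma^2 = 1 for simplicity
    return eta

def sigmoid(eta):
    # Bernoulli case
    return 1.0 / (1.0 + math.exp(-eta))

theta = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 0.5]   # the leading 1.0 acts as an intercept feature
print(glm_predict(theta, x, identity))   # linear regression style output
print(glm_predict(theta, x, sigmoid))    # logistic regression style output
```

Only the response function changes between models; the choice of distribution in assumption 1 is what determines it.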
Deriving the linear regression and logistic regression models
In linear regression, we assume that \(y|x;\theta\) follows the Gaussian distribution \(N(\mu,\sigma^2)\). Written in exponential family form, this is:
\(\begin{aligned} p(y|x;\theta) &= \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right) \\ &= \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{y^2}{2\sigma^2}}\exp\left(\frac{\mu y}{\sigma^2}-\frac{\mu^2}{2\sigma^2}\right) \end{aligned}\)
Note that \(\eta\) and \(T(y)\) can be chosen in more than one way to satisfy the form above. By the second assumption, however, we want to predict \(y\), so we take \(T(y)=y\), which gives
\(\begin{aligned} b(y) = \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{y^2}{2\sigma^2}} \end{aligned}\)
\(\begin{aligned} \eta = \frac{\mu}{\sigma^2} \end{aligned}\)
\(\begin{aligned} A(\eta) = \frac{\mu^2}{2\sigma^2} \end{aligned}\)
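The decomposition can be checked numerically. The sketch below compares the usual Gaussian density with \(b(y)\exp(\eta y - A(\eta))\), using \(b(y)\) with \(y^2\) in the exponent, which is what expanding the square \((y-\mu)^2\) gives. The function names and test values are made up for this example.

```python
import math

def gaussian_pdf(y, mu, sigma):
    """Gaussian density in its usual form."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def gaussian_exp_family(y, mu, sigma):
    """The same density written as b(y) * exp(eta * T(y) - A(eta)) with T(y) = y."""
    eta = mu / sigma ** 2                       # natural parameter
    a = mu ** 2 / (2 * sigma ** 2)              # A(eta)
    b = math.exp(-y ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
    return b * math.exp(eta * y - a)

for y in (-1.0, 0.0, 0.3, 2.0):
    assert math.isclose(gaussian_pdf(y, 0.7, 1.5), gaussian_exp_family(y, 0.7, 1.5))

# The mean is recovered from the natural parameter as mu = sigma^2 * eta.
mu, sigma = 0.7, 1.5
eta = mu / sigma ** 2
assert math.isclose(sigma ** 2 * eta, mu)
```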
Thus:
\(\begin{aligned} h_\theta(x) = E(y|x;\theta) = \mu = \sigma^2\eta = \sigma^2\theta^T x \end{aligned}\)
Here \(\sigma^2\) is a constant, so it can be absorbed into the parameters by setting \(\theta'=\sigma^2\theta\), and the expression above can be written as:
\(\begin{aligned} h_\theta(x) = {\theta'}^{T}x \end{aligned}\)
This is where the linear model used in linear regression comes from. Similarly, for logistic regression, \(y|x;\theta\) follows a Bernoulli distribution with parameter \(\phi\), so
\(\begin{aligned} p(y|x;\theta) &= \phi^y(1-\phi)^{1-y} \\ &= \exp\left(y\log\phi + (1-y)\log(1-\phi)\right) \\ &= \exp\left(y\log\frac{\phi}{1-\phi} + \log(1-\phi)\right) \end{aligned}\)
This gives
\(\begin{aligned} T(y) = y \end{aligned}\)
\(\begin{aligned} b(y) = 1 \end{aligned}\)
\(\begin{aligned} \eta = \log\frac{\phi}{1-\phi} \end{aligned}\)
\(\begin{aligned} A(\eta) = -\log(1-\phi) \end{aligned}\)
Solving \(\eta=\log\frac{\phi}{1-\phi}\) for \(\phi\) gives:
\(\begin{aligned} \phi = \frac{1}{1+e^{-\eta}} \end{aligned}\)
Therefore:
\(\begin{aligned} h_\theta(x) = E(y|x;\theta) = \phi = \frac{1}{1+e^{-\eta}} = \frac{1}{1+e^{-\theta^T x}} \end{aligned}\)
This is the model used for logistic regression.
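The Bernoulli derivation can be checked the same way. The sketch below verifies that \(b(y)\exp(\eta y - A(\eta))\) reproduces \(\phi^y(1-\phi)^{1-y}\), and that the sigmoid inverts the log-odds \(\eta=\log\frac{\phi}{1-\phi}\). The helper names are chosen for this example only.

```python
import math

def bernoulli_pmf(y, phi):
    """Bernoulli probability mass function, y in {0, 1}."""
    return phi ** y * (1 - phi) ** (1 - y)

def logit(phi):
    """Natural parameter eta = log(phi / (1 - phi))."""
    return math.log(phi / (1 - phi))

def sigmoid(eta):
    """Inverse of logit: phi = 1 / (1 + e^(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def bernoulli_exp_family(y, phi):
    """The same pmf written as b(y) * exp(eta * T(y) - A(eta)) with b(y) = 1."""
    eta = logit(phi)
    a = -math.log(1 - phi)   # A(eta) = -log(1 - phi)
    return math.exp(eta * y - a)

for phi in (0.1, 0.5, 0.9):
    for y in (0, 1):
        assert math.isclose(bernoulli_pmf(y, phi), bernoulli_exp_family(y, phi))
    assert math.isclose(sigmoid(logit(phi)), phi)
```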
In the same way, we can write down a corresponding regression model for other distributions. Once the linear regression or logistic regression model is given, its parameters can be estimated by maximum likelihood, for example with gradient descent.
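As a minimal sketch of the estimation step, the code below fits the logistic regression model by gradient ascent on the log-likelihood (equivalently, gradient descent on its negative). The gradient used is \(\sum_i (y_i - h_\theta(x_i))x_i\). The toy data, learning rate and step count are assumptions made for this example, not values from the text.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(theta, x):
    """h_theta(x) = sigmoid(theta^T x)."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def log_likelihood(theta, xs, ys):
    """Bernoulli log-likelihood of the data under the model."""
    return sum(y * math.log(predict(theta, x)) + (1 - y) * math.log(1 - predict(theta, x))
               for x, y in zip(xs, ys))

def fit_logistic(xs, ys, lr=0.1, steps=2000):
    """Gradient ascent on the average log-likelihood."""
    theta = [0.0] * len(xs[0])
    n = len(xs)
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for x, y in zip(xs, ys):
            err = y - predict(theta, x)          # y_i - h_theta(x_i)
            for j, xj in enumerate(x):
                grad[j] += err * xj
        theta = [t + lr * g / n for t, g in zip(theta, grad)]
    return theta

# Toy data (hypothetical): an intercept feature 1.0 plus one input.
# The labels overlap, so the classes are not perfectly separable.
XS = [[1.0, float(v)] for v in (-3, -2, -1, 0, 1, 2, 3)]
YS = [0, 0, 1, 0, 1, 1, 1]

theta = fit_logistic(XS, YS)
print(theta, log_likelihood(theta, XS, YS))
```

For linear regression the same loop applies with `predict` replaced by the linear hypothesis, although in that case the maximum likelihood solution can also be computed in closed form.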
Open questions
1. Of the three assumptions used to build a GLM, I do not understand what role the first one plays in constructing the model. If the distribution does not belong to the exponential family, is it possible to construct other forms of linear models? Readers who understand this are welcome to share their insight.
2. Assumption 3 is the model's linearity assumption. It also explains why logistic regression can only handle linearly separable cases: its decision boundary is linear in \(x\).
Stanford Machine Learning: Implementation and Analysis, Part 4 (Generalized Linear Models)