In the linear regression problem we assume that y|x;θ follows a Gaussian distribution, and in the classification problem we assume that y|x;θ follows a Bernoulli distribution. Both are examples of generalized linear models (GLMs): models in which the response variable is predicted through a linear function of the input features, combined with a distribution from the exponential family. Many familiar models are GLMs, including the traditional linear regression model, the maximum entropy model, logistic regression, and softmax regression.
The Exponential Family (the exponential family)
Before constructing generalized linear models, we first look at the exponential family of distributions. The prototype of an exponential-family distribution is as follows:

p(y; η) = b(y) exp(ηᵀT(y) − a(η))

If a distribution can be expressed in the form above, then it belongs to the exponential family. First we define the symbols in this form:
η: the natural parameter of the distribution, also called the canonical parameter
T(y): the sufficient statistic; usually T(y) = y
a(η): the log partition function; e^(−a(η)) is essentially a normalization constant that ensures the distribution sums (or integrates) to 1
b(y): the base measure
Once T, a, and b are fixed, this defines a family of distributions parameterized by η; varying η yields the different members of that exponential family.
Proving that the Bernoulli and Gaussian distributions are in the exponential family
Consider the Bernoulli distribution with mean φ, written Bernoulli(φ), where y ∈ {0, 1}, so that p(y = 1; φ) = φ and p(y = 0; φ) = 1 − φ. Its probability mass function can be rewritten as

p(y; φ) = φ^y (1 − φ)^(1−y)
        = exp(y log φ + (1 − y) log(1 − φ))
        = exp( log(φ/(1 − φ)) · y + log(1 − φ) )

Comparing with the exponential-family form gives

η = log(φ/(1 − φ)),  T(y) = y,  a(η) = −log(1 − φ) = log(1 + e^η),  b(y) = 1

Solving η = log(φ/(1 − φ)) for φ gives φ = 1/(1 + e^(−η)); notice that this is exactly the sigmoid function. This shows that, with these choices of T, a, and b, the Bernoulli distribution can be written in exponential-family form; that is, the Bernoulli distribution is a member of the exponential family.
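As a quick numerical check, the two forms of the Bernoulli pmf can be compared directly. This is a minimal sketch in Python; the function names are illustrative, not from any particular library:

```python
import math

def bernoulli_pmf(y, phi):
    """Standard Bernoulli pmf: p(y; phi) = phi^y * (1 - phi)^(1 - y)."""
    return phi ** y * (1 - phi) ** (1 - y)

def bernoulli_exp_family(y, phi):
    """Same pmf written in exponential-family form b(y) * exp(eta * T(y) - a(eta))."""
    eta = math.log(phi / (1 - phi))   # natural parameter
    a = math.log(1 + math.exp(eta))   # log partition function, a(eta) = -log(1 - phi)
    b = 1.0                           # base measure
    return b * math.exp(eta * y - a)

def sigmoid(eta):
    """Inverting eta = log(phi / (1 - phi)) recovers phi = 1 / (1 + e^(-eta))."""
    return 1 / (1 + math.exp(-eta))
```

Evaluating both functions at the same (y, φ) returns identical probabilities, and feeding η back through the sigmoid recovers φ.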
In the same vein, consider the Gaussian distribution N(μ, σ²). Because the variance of the Gaussian distribution does not affect the hypothesis function, we set σ² = 1 for simplicity. Then:

p(y; μ) = (1/√(2π)) exp(−(y − μ)²/2)
        = (1/√(2π)) exp(−y²/2) · exp(μy − μ²/2)

Comparing with the exponential-family form gives

η = μ,  T(y) = y,  a(η) = μ²/2 = η²/2,  b(y) = (1/√(2π)) exp(−y²/2)

So the Gaussian distribution is also a member of the exponential family.
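The same identity can be checked numerically for the Gaussian case. An illustrative sketch, with σ² fixed to 1 as above:

```python
import math

def gaussian_pdf(y, mu):
    """N(mu, 1) density in its usual form."""
    return (1 / math.sqrt(2 * math.pi)) * math.exp(-(y - mu) ** 2 / 2)

def gaussian_exp_family(y, mu):
    """Same density as b(y) * exp(eta * T(y) - a(eta)) with eta = mu."""
    eta = mu
    b = (1 / math.sqrt(2 * math.pi)) * math.exp(-y ** 2 / 2)  # base measure b(y)
    a = eta ** 2 / 2                                          # log partition function
    return b * math.exp(eta * y - a)
```

Expanding the exponent, b(y)·exp(ηy − η²/2) reproduces exp(−(y − μ)²/2)/√(2π) term by term.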
Constructing Generalized Linear Models (constructing GLMs)
How do we construct a generalized linear model from an exponential-family distribution? To build a GLM, we rely on the following three assumptions:

- Given the features x and parameters θ, the conditional distribution of y follows an exponential-family distribution: y | x; θ ~ ExponentialFamily(η).
- The hypothesis predicts the expected value of T(y) given x; since usually T(y) = y, this means h(x) = E[y | x].
- The natural parameter η and the inputs x are linearly related: η = θᵀx (if η is a vector, then η_i = θ_iᵀx).
Constructing the Least-Squares Model
Recall that in linear regression the cost function was obtained by the method of least squares. We now construct the least-squares model from the generalized linear model. In linear regression we assume that y | x; θ follows a Gaussian distribution N(μ, σ²), and from the derivation above we know that μ = η. So, according to the three assumptions:

h_θ(x) = E[y | x; θ]
       = μ
       = η
       = θᵀx

Explanation: the first equality follows from assumption 2, h(x) = E[y | x; θ]; the second from the fact that the expectation of the Gaussian N(μ, σ²) is μ; the third from μ = η, which we derived earlier with σ² = 1; and the fourth from assumption 3. At this point the least-squares model is built; this is the origin of the linear hypothesis used in linear regression. The remaining task is to solve for θ using gradient descent or Newton's method.
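As a sketch of that final step, here is batch gradient descent on the least-squares cost. This is illustrative code: the function name, learning rate, and iteration count are my own choices, and X is assumed to already include an intercept column of ones:

```python
import numpy as np

def fit_least_squares(X, y, lr=0.5, n_iters=2000):
    """Minimize J(theta) = (1/2m) * sum_i (theta^T x_i - y_i)^2 by batch gradient descent."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y) / len(y)  # gradient of J with respect to theta
        theta -= lr * grad
    return theta

# Noiseless toy data generated from y = 2 + 3x; the fit should recover (2, 3).
X = np.column_stack([np.ones(50), np.linspace(0, 1, 50)])
y = 2 + 3 * X[:, 1]
theta = fit_least_squares(X, y)
```

On this noiseless data the iterates converge to the exact parameters (2, 3).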
Building Logistic Regression
Logistic regression is used to solve binary classification problems, whose target variable takes discrete values; from the statistics above we know that the Bernoulli distribution is the natural choice for constructing the logistic regression model. In the earlier derivation we obtained η = log(φ/(1 − φ)), and solving for φ gives φ = 1/(1 + e^(−η)). According to the three assumptions, we have:

h_θ(x) = E[y | x; θ] = φ = 1/(1 + e^(−η)) = 1/(1 + e^(−θᵀx))

The construction is complete; this is exactly the hypothesis used in logistic regression.
Building Softmax Regression
Now consider a multi-class classification problem in which the response variable y takes one of k values, y ∈ {1, 2, ..., k}. First we must show that the multinomial distribution is also in the exponential family. The output of the multi-class model is the probability that a sample belongs to each of the k classes, which we denote φ1, ..., φk. These satisfy φ1 + ... + φk = 1, so one of the parameters is redundant; we therefore parameterize with φ1, ..., φ(k−1) and set φk = 1 − Σ_{i=1}^{k−1} φi.
Define T(y) ∈ R^(k−1) as follows:

T(1) = (1, 0, ..., 0)ᵀ, T(2) = (0, 1, ..., 0)ᵀ, ..., T(k−1) = (0, 0, ..., 1)ᵀ, T(k) = (0, 0, ..., 0)ᵀ

Note that here T(y) is not equal to y: it is a (k−1)-dimensional vector rather than a real number. We write (T(y))_i for the i-th element of the vector T(y).

We also introduce the indicator notation 1{·}: the expression equals 1 if the statement inside the braces is true and 0 otherwise; for example, 1{2 = 3} = 0 and 1{3 = 5 − 2} = 1.
The relationship between T(y) and y can then be expressed as (T(y))_i = 1{y = i}, and the expectation satisfies E[(T(y))_i] = p(y = i) = φi. So we have:

p(y; φ) = φ1^(1{y=1}) φ2^(1{y=2}) ··· φk^(1{y=k})
        = φ1^((T(y))_1) φ2^((T(y))_2) ··· φk^(1 − Σ_{i=1}^{k−1} (T(y))_i)
        = exp( (T(y))_1 log(φ1/φk) + (T(y))_2 log(φ2/φk) + ··· + log φk )
        = b(y) exp(ηᵀT(y) − a(η))

where

η = (log(φ1/φk), ..., log(φ(k−1)/φk))ᵀ,  a(η) = −log φk,  b(y) = 1
So the multinomial distribution can also be written in exponential-family form; that is, the multinomial distribution is also a member of the exponential family, and we can therefore use a generalized linear model to fit it.
From the expression for η we get ηi = log(φi/φk). This expresses ηi in terms of φi; to convert it into an expression for φi in terms of ηi, we define ηk = log(φk/φk) = 0 for convenience, so that e^(ηi) = φi/φk, i.e. φi = φk e^(ηi). Summing over i and using Σ φi = 1 gives φk = 1/Σ_{j=1}^{k} e^(ηj), and therefore:

φi = e^(ηi) / Σ_{j=1}^{k} e^(ηj)

This function mapping the η's to the φ's is called the softmax function.
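The softmax function is short to implement; the example below also checks the relationship φi = e^(ηi)/Σj e^(ηj) with ηi = log(φi/φk) and ηk = 0. This is illustrative code, not a library API:

```python
import numpy as np

def softmax(eta):
    """phi_i = exp(eta_i) / sum_j exp(eta_j). Subtracting max(eta) avoids overflow
    and does not change the result, since the shift cancels in the ratio."""
    e = np.exp(eta - np.max(eta))
    return e / e.sum()

# Recover phi = (0.2, 0.3, 0.5) from eta_i = log(phi_i / phi_k); the last entry
# of eta is log(phi_k / phi_k) = 0, matching the convention eta_k = 0.
phi = np.array([0.2, 0.3, 0.5])
eta = np.log(phi / phi[-1])
```

Applying `softmax(eta)` returns the original probabilities, which sum to 1.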
Let us now build the model using the generalized linear model assumptions.
According to assumption 3, ηi = θiᵀx (for i = 1, ..., k−1), where θ1, ..., θ(k−1) ∈ R^(n+1). Here we also define θk = 0, so that ηk = θkᵀx = 0. The distribution of y given x in the model is therefore:

p(y = i | x; θ) = φi = e^(θiᵀx) / Σ_{j=1}^{k} e^(θjᵀx)
This model, applied to multi-class classification, is called softmax regression; it is the generalization of logistic regression.
For the hypothesis function, assumption 2 gives:

h_θ(x) = E[T(y) | x; θ] = (φ1, ..., φ(k−1))ᵀ = ( e^(θ1ᵀx)/Σ_j e^(θjᵀx), ..., e^(θ(k−1)ᵀx)/Σ_j e^(θjᵀx) )ᵀ
The final step in solving the objective function is to fit the parameters. Maximum likelihood estimation gives the log-likelihood

ℓ(θ) = Σ_{i=1}^{m} log p(y^(i) | x^(i); θ)

Maximizing this likelihood to find the optimal parameters θ can be done, as described earlier, with gradient ascent or Newton's method.
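Putting everything together, here is a sketch of softmax regression trained by gradient ascent on this log-likelihood. The helper names, toy data, and hyperparameters are my own assumptions; labels are taken in {0, ..., k−1} for zero-based indexing, and θk is fixed at zero as above:

```python
import numpy as np

def fit_softmax_regression(X, y, k, lr=0.5, n_iters=3000):
    """Gradient ascent on l(Theta) = sum_i log p(y_i | x_i; Theta).
    X: (m, n) design matrix; y: integer labels in {0, ..., k-1};
    Theta: (k-1, n) parameter matrix, with theta_k fixed at zero."""
    m, n = X.shape
    Theta = np.zeros((k - 1, n))
    Y = np.eye(k)[y][:, :k - 1]                           # rows are T(y)
    for _ in range(n_iters):
        eta = np.hstack([X @ Theta.T, np.zeros((m, 1))])  # eta_k = 0 for every sample
        e = np.exp(eta - eta.max(axis=1, keepdims=True))
        P = (e / e.sum(axis=1, keepdims=True))[:, :k - 1]
        Theta += lr * (Y - P).T @ X / m                   # gradient of the log-likelihood
    return Theta

# Toy 1-D data with three classes ordered along the x-axis (intercept included).
X = np.column_stack([np.ones(9), [-2, -2, -2, 0, 0, 0, 2, 2, 2]])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
Theta = fit_softmax_regression(X, y, k=3)
eta = np.hstack([X @ Theta.T, np.zeros((9, 1))])
preds = eta.argmax(axis=1)  # class with the largest probability
```

Because the log-likelihood is concave, gradient ascent has no local optima to get stuck in; on this separable toy data the trained model classifies every training example correctly.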