Reprint: please cite the source: http://www.cnblogs.com/BYRans/
The previous articles introduced an example of regression and an example of classification. In the linear regression model we assumed that

$$y \mid x; \theta \sim \mathcal{N}(\mu, \sigma^2)$$

and in the classification problem we assumed that

$$y \mid x; \theta \sim \mathrm{Bernoulli}(\phi)$$

Both are examples of generalized linear models. Before understanding generalized linear models, we first need to understand the exponential family of distributions.
Exponential family of distributions (the exponential family)
If a distribution can be written in the following form, then it belongs to the exponential family:

$$p(y; \eta) = b(y)\, \exp\!\big(\eta^T T(y) - a(\eta)\big)$$
In this formula, y is the random variable; b(y) is called the base measure; η is called the natural parameter of the distribution (also known as the canonical parameter); T(y) is called the sufficient statistic, and usually T(y) = y; a(η) is called the log partition function. Essentially, e^{−a(η)} acts as a normalization constant, ensuring that the distribution p(y; η) sums or integrates to 1.
Once T(y), a(η), and b(y) are fixed, they define a family of distributions parameterized by η; as we vary η, we obtain the different distributions within this family.
The Bernoulli distribution belongs to the exponential family. A Bernoulli distribution with mean φ, written Bernoulli(φ), is a two-valued distribution over y ∈ {0, 1}, with p(y = 1; φ) = φ and p(y = 0; φ) = 1 − φ. As we vary φ, we obtain Bernoulli distributions with different means. Rewriting the Bernoulli distribution in exponential-family form proceeds as follows:

$$
p(y; \phi) = \phi^{y}(1-\phi)^{1-y}
= \exp\!\big(y\log\phi + (1-y)\log(1-\phi)\big)
= \exp\!\Big(\log\!\Big(\frac{\phi}{1-\phi}\Big)\, y + \log(1-\phi)\Big)
$$

where

$$
\eta = \log\frac{\phi}{1-\phi} \ \Big(\text{equivalently } \phi = \frac{1}{1+e^{-\eta}}\Big),\qquad
T(y) = y,\qquad
a(\eta) = -\log(1-\phi) = \log(1+e^{\eta}),\qquad
b(y) = 1
$$
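These identifications can be checked numerically: the sketch below (standard library only; the function names are mine, not from the notes) compares the ordinary Bernoulli pmf against the exponential-family form with η = log(φ/(1−φ)) and a(η) = log(1+e^η).

```python
import math

def bernoulli_pmf(y, phi):
    # Standard Bernoulli pmf: p(y; phi) = phi^y * (1 - phi)^(1 - y)
    return phi ** y * (1 - phi) ** (1 - y)

def bernoulli_expfam(y, phi):
    # Exponential-family form: b(y) = 1, T(y) = y,
    # eta = log(phi / (1 - phi)), a(eta) = log(1 + e^eta)
    eta = math.log(phi / (1 - phi))
    a = math.log(1 + math.exp(eta))
    return math.exp(eta * y - a)

# The two forms agree for every phi and y
for phi in (0.2, 0.5, 0.9):
    for y in (0, 1):
        assert abs(bernoulli_pmf(y, phi) - bernoulli_expfam(y, phi)) < 1e-12
```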
Another example is the Gaussian distribution, which also belongs to the exponential family. The linear regression model can be derived from the Gaussian distribution (the derivation will be explained with the EM algorithm). From the hypothesis function of the linear model we can see that the variance of the Gaussian distribution is independent of the hypothesis, so for ease of calculation we set σ² = 1. Rewriting the Gaussian distribution in exponential-family form proceeds as follows:

$$
p(y; \mu) = \frac{1}{\sqrt{2\pi}}\exp\!\Big(-\frac{(y-\mu)^2}{2}\Big)
= \frac{1}{\sqrt{2\pi}}\exp\!\Big(-\frac{y^2}{2}\Big)\exp\!\Big(\mu y - \frac{\mu^2}{2}\Big)
$$

where

$$
\eta = \mu,\qquad
T(y) = y,\qquad
a(\eta) = \frac{\mu^2}{2} = \frac{\eta^2}{2},\qquad
b(y) = \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}
$$
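The Gaussian case can be checked the same way: a minimal sketch (function names are mine) comparing the N(μ, 1) density against the exponential-family form with η = μ and a(η) = η²/2.

```python
import math

def gaussian_pdf(y, mu):
    # Standard N(mu, 1) density (variance fixed to 1 as in the text)
    return math.exp(-(y - mu) ** 2 / 2) / math.sqrt(2 * math.pi)

def gaussian_expfam(y, mu):
    # Exponential-family form: b(y) = exp(-y^2/2)/sqrt(2*pi),
    # eta = mu, T(y) = y, a(eta) = eta^2 / 2
    b = math.exp(-y ** 2 / 2) / math.sqrt(2 * math.pi)
    eta = mu
    a = eta ** 2 / 2
    return b * math.exp(eta * y - a)

# The two forms agree for any y and mu
for mu in (-1.0, 0.0, 2.5):
    for y in (-2.0, 0.0, 1.5):
        assert abs(gaussian_pdf(y, mu) - gaussian_expfam(y, mu)) < 1e-12
```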
Many other distributions also belong to the exponential family, for example: the Bernoulli, Gaussian, multinomial, Poisson, gamma, exponential, beta, Dirichlet, and Wishart distributions.
Building a generalized linear model (constructing GLMs)
In classification and regression problems, we predict y by building a model in terms of x. Such problems can be solved with generalized linear models (GLMs). A generalized linear model is built on three assumptions; equivalently, these are three design decisions that guide the construction:
- y | x; θ ∼ ExponentialFamily(η). That is, given the input x and parameters θ, the distribution of y is some exponential-family distribution with parameter η.
- Given x, our goal is to predict the expected value of T(y). In most cases T(y) = y, so what we actually want is h(x) = E[y | x]. (In logistic regression the expected value is φ, so the hypothesis is h(x) = φ; in linear regression the expected value of the Gaussian is μ, so the hypothesis is h(x) = μ.)
- The natural parameter η and the input x are linearly related: η = θᵀx (and if η is a vector, ηᵢ = θᵢᵀx).
Suppose we have a prediction problem: predict the number of shoppers in a store in any given hour, based on features such as store promotions, recent advertising, the weather, and the day of the week.

By probability theory, given x, y follows a Poisson distribution. The Poisson distribution belongs to the exponential family, so we can construct a generalized linear model as the predictive model using the three assumptions above.
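As an illustration of the recipe, here is a minimal Poisson GLM sketch on made-up data (the feature values, learning rate, and counts are my assumptions, not from the notes). With the Poisson's canonical link, the hypothesis is E[y | x] = e^{θᵀx}, and the log-likelihood gradient is Σᵢ (yᵢ − e^{θᵀxᵢ}) xᵢ.

```python
import math

# Hypothetical toy data: bias term + a promotion indicator per hour,
# y = observed number of shoppers. Values are invented for illustration.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [3, 8, 4, 9]

theta = [0.0, 0.0]
lr = 0.01
for _ in range(5000):
    # Gradient ascent on the Poisson log-likelihood:
    # grad_j = sum_i (y_i - exp(theta^T x_i)) * x_ij
    grad = [0.0, 0.0]
    for xi, yi in zip(X, y):
        pred = math.exp(sum(t * v for t, v in zip(theta, xi)))
        for j in range(len(theta)):
            grad[j] += (yi - pred) * xi[j]
    theta = [t + lr * g for t, g in zip(theta, grad)]

# Predicted hourly rate with and without the promotion feature;
# these converge to the sample means of each group (3.5 and 8.5)
rate_no_promo = math.exp(theta[0])
rate_promo = math.exp(theta[0] + theta[1])
```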
Constructing the least-squares model with GLMs
In linear regression, the optimization objective (the loss function) is obtained by least squares, and the least-squares model can be constructed with a generalized linear model. The three assumptions become:
- The target variable y in least squares is a continuous value, and we assume that the distribution of y given x is Gaussian; that is, the ExponentialFamily(η) of assumption 1 is the Gaussian distribution, y | x; θ ∼ N(μ, σ²).
- For the Gaussian distribution, E[y | x; θ] = μ, so the hypothesis is h(x) = E[y | x; θ].
- Assume η = θᵀx.
The derivation proceeds as follows:

$$
h_\theta(x) = E[y \mid x; \theta] = \mu = \eta = \theta^T x
$$

The first step follows from assumption 2; the second from y | x; θ ∼ N(μ, σ²), since the expected value of the Gaussian is μ; the third from assumption 1 (for the Gaussian, η = μ); and the fourth from assumption 3.
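Since the GLM derivation yields the linear hypothesis h(x) = θᵀx, fitting it by batch gradient descent on the least-squares loss looks as follows; this is a minimal sketch on toy data where y = 2x + 1 exactly (data and learning rate are my choices for illustration).

```python
# Toy data: y = 2*x + 1 exactly, so gradient descent should recover theta
X = [[1.0, x] for x in (0.0, 1.0, 2.0, 3.0)]  # bias term + one feature
y = [1.0, 3.0, 5.0, 7.0]

theta = [0.0, 0.0]
lr = 0.05
for _ in range(10000):
    # Batch gradient descent on the least-squares loss,
    # with hypothesis h(x) = theta^T x as derived from the GLM
    grad = [0.0, 0.0]
    for xi, yi in zip(X, y):
        err = sum(t * v for t, v in zip(theta, xi)) - yi
        for j in range(len(theta)):
            grad[j] += err * xi[j]
    theta = [t - lr * g for t, g in zip(theta, grad)]

# theta converges to [1.0, 2.0]: intercept 1, slope 2
```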
We have now constructed the least-squares model from a generalized linear model; the next task is to solve for θ using gradient descent or Newton's method. For gradient descent and Newton's method, please refer to the previous notes.
Constructing logistic regression with GLMs
Logistic regression solves binary classification problems, where the target y is a discrete variable taking one of two values. By statistics, the Bernoulli distribution is the natural choice for modeling a binary classification problem.

From the exponential-family form of the Bernoulli distribution we already know that η = log(φ/(1−φ)), and therefore φ = 1/(1+e^{−η}).
The three assumptions for building the generalized linear model become:
- y | x; θ ∼ Bernoulli(φ);
- h(x) = E[y | x; θ], and for the Bernoulli distribution E[y | x; θ] = φ;
- η = θᵀx.
The derivation proceeds as follows:

$$
h_\theta(x) = E[y \mid x; \theta] = \phi = \frac{1}{1+e^{-\eta}} = \frac{1}{1+e^{-\theta^T x}}
$$
As with the least-squares model, the remaining work is done by gradient descent or Newton's method.

Note the result of the derivation above, and recall that in logistic regression we chose the sigmoid function g(z) = 1/(1+e^{−z}). The reason for choosing this g(z) as the sigmoid function in logistic regression is supported by exactly this theory: the generalized linear model.
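The resulting hypothesis can be fit by gradient ascent on the Bernoulli log-likelihood, whose gradient is Σᵢ (yᵢ − h(xᵢ)) xᵢ. A minimal sketch on made-up separable data (the feature values and learning rate are my assumptions):

```python
import math

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z}), the form the GLM derivation yields
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy data: label 1 iff the feature exceeds 2.5
X = [[1.0, x] for x in (0.0, 1.0, 2.0, 3.0, 4.0, 5.0)]
y = [0, 0, 0, 1, 1, 1]

theta = [0.0, 0.0]
lr = 0.1
for _ in range(2000):
    # Gradient ascent on the log-likelihood: grad = sum_i (y_i - h(x_i)) x_i
    grad = [0.0, 0.0]
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        for j in range(len(theta)):
            grad[j] += (yi - h) * xi[j]
    theta = [t + lr * g for t, g in zip(theta, grad)]

# After training, h(x) < 0.5 for small x and > 0.5 for large x
```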
Generalized Linear Models — Andrew Ng Machine Learning Open Course Notes 1.6