Both logistic regression and linear regression are generalized linear models. Let us explain why this is the case.
1. Exponential family distribution
The exponential family of distributions is not the same thing as the exponential distribution. Many distributions in probability and statistics can be expressed in exponential-family form, for example the Gaussian distribution, the Bernoulli distribution, the multinomial distribution, and the Poisson distribution. The general expression of an exponential family distribution is as follows:
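A common way to write this general form, using the symbols described just below, is:
$$ p(y;\eta) = b(y)\exp\big(\eta^{T}T(y) - a(\eta)\big). $$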
Here η is the natural parameter, T(y) is the sufficient statistic, and exp(−a(η)) plays the role of a normalizing factor. Once T, a, and b are fixed, a family of distributions is determined, and the parameter η picks out a specific member of that exponential family.
Many of the familiar probability distributions in statistics are specific members of the exponential family. Here we work through the Bernoulli distribution and the Gaussian distribution, and from them derive the expressions for logistic regression and linear regression.
1) Bernoulli distribution
We now write the Bernoulli distribution in the form of an exponential family distribution.
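Written out explicitly, with φ = P(y = 1), this rewriting is:
$$ p(y;\phi) = \phi^{y}(1-\phi)^{1-y} = \exp\big(y\log\phi + (1-y)\log(1-\phi)\big) = \exp\Big(\Big(\log\tfrac{\phi}{1-\phi}\Big)y + \log(1-\phi)\Big). $$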
With the Bernoulli distribution written in exponential-family form, we can match each term against the general expression, which gives the following correspondence.
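One consistent way to read off the terms is:
$$ \eta = \log\frac{\phi}{1-\phi}, \qquad T(y) = y, \qquad a(\eta) = -\log(1-\phi) = \log\big(1+e^{\eta}\big), \qquad b(y) = 1. $$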
From the expression for η we can solve for φ, and the result has exactly the form of the sigmoid function.
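Specifically, inverting η = log(φ/(1−φ)) yields:
$$ \phi = \frac{1}{1+e^{-\eta}}. $$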
2) Gaussian distribution
We now represent the Gaussian distribution in the form of an exponential family distribution.
Here we assume that the variance is 1 to simplify the formula, which is convenient for the derivation. Splitting each term of the exponential family expression then gives the correspondence below.
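With σ² fixed at 1, one way to carry out this split is:
$$ p(y;\mu) = \frac{1}{\sqrt{2\pi}}\exp\Big(-\frac{(y-\mu)^{2}}{2}\Big) = \frac{1}{\sqrt{2\pi}}\exp\Big(-\frac{y^{2}}{2}\Big)\cdot\exp\Big(\mu y - \frac{\mu^{2}}{2}\Big), $$
so that η = μ, T(y) = y, a(η) = μ²/2 = η²/2, and b(y) = (1/√(2π)) exp(−y²/2).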
2. Generalized linear model
Whether we are doing classification or regression, we are modeling a functional relationship between a random variable y and the input x. Before deriving a generalized linear model, we need to make three assumptions:
1) p(y|x;θ) follows an exponential family distribution.
2) Given x, our aim is to predict the conditional expectation of T(y). In general T(y) = y, which means we want the hypothesis to satisfy h(x) = E[y|x].
3) The natural parameter η and the input x are linearly related: η = θᵀx.
With these three assumptions in place, we can begin the derivations; a model derived in this way is called a generalized linear model.
1) Least squares (linear regression)
Assume p(y|x;θ) ~ N(μ, σ²), where μ may depend on x. Then we have:
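Using assumption 2), assumption 3), and the Gaussian result above (the natural parameter equals the mean), a sketch of the derivation is:
$$ h_{\theta}(x) = E[y\mid x;\theta] = \mu = \eta = \theta^{T}x, $$
which is exactly the least-squares (linear regression) hypothesis.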
Because the output follows a Gaussian distribution, its expectation is μ; combining this with the three assumptions above, we can derive the expression for linear regression. Therefore, the response variable of a linear regression model is assumed to follow a Gaussian (normal) distribution.
2) Logistic regression (LR)
Logistic regression addresses a binary classification problem with y ∈ {0, 1}. For such a problem we assume p(y|x;θ) ~ Bernoulli(φ), i.e., the response variable follows a Bernoulli distribution. Then we have:
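Combining E[y|x;θ] = φ with the sigmoid relation derived earlier and the linearity assumption η = θᵀx, the derivation reads:
$$ h_{\theta}(x) = E[y\mid x;\theta] = \phi = \frac{1}{1+e^{-\eta}} = \frac{1}{1+e^{-\theta^{T}x}}. $$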
From this we can see how the expression for logistic regression is derived, and why the sigmoid function is used to handle the nonlinearity.
3. Logistic regression
Logistic regression evolved from linear regression. It is in fact a model for binary classification problems with output y ∈ {0, 1}; to satisfy such an output, we introduce the sigmoid function, which squashes the output value into the range (0, 1). The sigmoid function is expressed as follows:
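In the usual notation, the sigmoid function and the hypothesis built from it are:
$$ g(z) = \frac{1}{1+e^{-z}}, \qquad h_{\theta}(x) = g(\theta^{T}x) = \frac{1}{1+e^{-\theta^{T}x}}. $$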
Because logistic regression deals with a binary classification problem and the output follows a Bernoulli distribution, the result can be expressed in the form of probabilities, which we can write as follows.
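Interpreting h_θ(x) as the probability that y = 1, the two cases are:
$$ P(y=1\mid x;\theta) = h_{\theta}(x), \qquad P(y=0\mid x;\theta) = 1 - h_{\theta}(x). $$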
To facilitate the subsequent analysis, we merge the piecewise expression into a single formula.
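The two cases combine into:
$$ p(y\mid x;\theta) = \big(h_{\theta}(x)\big)^{y}\big(1-h_{\theta}(x)\big)^{1-y}. $$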
For a given training set, the observed samples are events that have already happened; in probabilistic terms, what has happened should be the outcome with the highest probability (low-probability events are unlikely to occur). We can therefore use maximum likelihood to estimate the model parameters, writing down the joint probability of all samples.
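Writing m for the number of training samples (a notational choice here), the likelihood is:
$$ L(\theta) = \prod_{i=1}^{m} p\big(y^{(i)}\mid x^{(i)};\theta\big) = \prod_{i=1}^{m}\big(h_{\theta}(x^{(i)})\big)^{y^{(i)}}\big(1-h_{\theta}(x^{(i)})\big)^{1-y^{(i)}}. $$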
To facilitate the calculation, we convert the likelihood function into the log-likelihood function.
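Taking the logarithm turns the product into a sum:
$$ \ell(\theta) = \log L(\theta) = \sum_{i=1}^{m}\Big[y^{(i)}\log h_{\theta}(x^{(i)}) + \big(1-y^{(i)}\big)\log\big(1-h_{\theta}(x^{(i)})\big)\Big]. $$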
The function above is to be maximized, whereas a loss function is conventionally minimized, so we can turn the maximization into a minimization by taking the negative log-likelihood as the loss.
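One common choice (the 1/m averaging factor is a convention, not essential) is:
$$ J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\log h_{\theta}(x^{(i)}) + \big(1-y^{(i)}\big)\log\big(1-h_{\theta}(x^{(i)})\big)\Big]. $$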
Because the loss function J(θ) is relatively complex, solving for the parameters with the normal equation is very difficult, so we introduce gradient descent (the negative gradient direction is the direction in which the loss function decreases fastest) and use gradient descent to minimize the loss function.
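As a minimal sketch of this procedure (not code from the original article), batch gradient descent on the loss above could look like the following, assuming NumPy, a design matrix X of shape (m, n) that includes an intercept column, and labels y in {0, 1}:

```python
import numpy as np

def sigmoid(z):
    # Sigmoid function g(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def logistic_regression_gd(X, y, lr=0.1, n_iters=1000):
    """Fit logistic regression by minimizing J(theta) with batch gradient descent.

    X : (m, n) feature matrix (include a column of ones for the intercept)
    y : (m,) labels in {0, 1}
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        h = sigmoid(X @ theta)        # h_theta(x) for every sample
        grad = X.T @ (h - y) / m      # gradient of the averaged negative log-likelihood
        theta -= lr * grad            # step along the negative gradient direction
    return theta

# Usage sketch on synthetic data (lr and n_iters are illustrative values)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
X_b = np.hstack([np.ones((200, 1)), X])   # prepend an intercept column
print("learned parameters:", logistic_regression_gd(X_b, y))
```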