Stanford CS229 Machine Learning Course Notes II: Generalized Linear Models (GLM) and Logistic Regression

Source: Internet
Author: User

I had long heard of logistic regression. Dr. Wu, for example, mentions in "The Beauty of Mathematics" that Google uses logistic regression to predict the click-through of search ads. Because I have always been interested in personalized advertising, I searched Google frantically for material on logistic regression, but not a single web page explained clearly what it actually is. Fortunately, the third lecture of CS229 introduces logistic regression and the fourth introduces the generalized linear model, and putting the two together finally gave me some understanding of logistic regression. Contrary to the order of the course, I think we should understand the generalized linear model first and then look at logistic regression; skipping that step may be why the web pages on logistic regression always left me in a fog.

Generalized linear model (GLM)

This section covers the definition and assumptions of the generalized linear model. It takes some patience to read, but it is what makes logistic regression understandable.

1. The exponential family

The generalized linear model is built around the exponential family, so it needs an introduction. In Andrew Ng's words, "although not all, most of the distributions we have seen belong to the exponential family", for example the Bernoulli, Gaussian, multinomial, Poisson, gamma, exponential, and Dirichlet distributions. A distribution belongs to the exponential family if it can be written in the following form:

$$p(y; \eta) = b(y) \exp\left(\eta^T T(y) - a(\eta)\right)$$

η is called the natural parameter; it is the only parameter of the exponential family.
T(y) is called the sufficient statistic; in many cases T(y) = y.
a(η) is called the log partition function.
A fixed choice of the functions T, a, and b determines one family of distributions, parameterized by η.
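Next, let's see why the normal (Gaussian) distribution belongs to the exponential family. The normal distribution has two parameters, the mean μ and the standard deviation σ. In linear regression we care about the mean, and the standard deviation does not affect the learning of the model or the choice of the parameter θ, so here σ is set to 1 to simplify the calculation:

$$p(y; \mu) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}(y - \mu)^2\right) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}y^2\right) \exp\left(\mu y - \frac{1}{2}\mu^2\right)$$

Matching terms with the exponential family form gives η = μ, T(y) = y, a(η) = μ²/2 = η²/2, and b(y) = (1/√(2π)) exp(−y²/2).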
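As a quick numerical sanity check, the sketch below compares the N(μ, 1) density with the exponential family form b(y) exp(η T(y) − a(η)). It assumes the standard decomposition η = μ, T(y) = y, a(η) = η²/2, and b(y) = exp(−y²/2)/√(2π); the function names and test points are illustrative and not taken from the course notes.

```python
import math

def gaussian_pdf(y, mu):
    """Density of N(mu, 1) written in the usual form."""
    return math.exp(-0.5 * (y - mu) ** 2) / math.sqrt(2 * math.pi)

def gaussian_exp_family(y, eta):
    """The same density written as b(y) * exp(eta * T(y) - a(eta))."""
    b = math.exp(-0.5 * y ** 2) / math.sqrt(2 * math.pi)  # b(y)
    T = y                                                  # sufficient statistic T(y)
    a = 0.5 * eta ** 2                                     # log partition function a(eta)
    return b * math.exp(eta * T - a)

# Arbitrary test points; with sigma = 1 the natural parameter eta equals mu.
points = [(y, mu) for y in (-1.5, 0.0, 2.0) for mu in (-1.0, 0.3, 2.5)]
max_gap = max(abs(gaussian_pdf(y, mu) - gaussian_exp_family(y, mu))
              for y, mu in points)
print(max_gap)  # expected to be at floating-point rounding level
```

If the two forms agree at every test point, the decomposition is at least consistent with the density; it is a check, not a proof.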

2. Three assumptions that form a generalized linear model
    • P(y | x; θ) ∼ ExponentialFamily(η): given the input variable x, the conditional distribution of the output variable y belongs to the exponential family.
    • For a given input variable x, the goal of learning is to predict the expected value of T(y), and T(y) is often simply y.
    • The natural parameter η and the input variable x are related linearly: η = θ^T x.

These three assumptions specify how to get from input variables to output variables through a probabilistic model. For example, the conditional distribution in linear regression is a normal distribution, which belongs to the exponential family (see the likelihood function part of linear regression in Note I). Our goal is to predict the expectation of T(y); from the calculation above, T(y) = y, and the expectation of y is the parameter μ of the normal distribution. The same calculation gives μ = η, and by assumption η = θ^T x. Linear regression is therefore a special case of the generalized linear model, and its model is:

$$h_\theta(x) = E[y \mid x; \theta] = \mu = \eta = \theta^T x$$

Logistic regression

1. Model

Logistic regression solves classification problems, specifically binary classification. For example: will the user click on an ad link? Will the user return? Will a tossed coin land heads up? From the point of view of probability, this immediately suggests using the Bernoulli distribution to estimate the probability that the event occurs. The logistic regression model can be constructed from the perspective of a generalized linear model:
1.1 The Bernoulli distribution belongs to the exponential family (the parameter φ is the probability that y = 1, that is, the probability that the event occurs):

$$p(y; \phi) = \phi^y (1 - \phi)^{1 - y} = \exp\left(y \log\frac{\phi}{1 - \phi} + \log(1 - \phi)\right)$$

so η = log(φ/(1−φ)), T(y) = y, a(η) = −log(1−φ) = log(1 + e^η), and b(y) = 1.
1.2 The goal of learning is to predict the expected value of T(y). For the Bernoulli distribution T(y) = y, and the expectation of a Bernoulli variable is its parameter: E(y) = φ.
1.3 Solving η = log(φ/(1−φ)) for φ gives φ = 1/(1 + e^(−η)). This is the logistic function, which is where logistic regression gets its name. Substituting η = θ^T x, we finally obtain the logistic regression model:

$$h_\theta(x) = E[y \mid x; \theta] = \phi = \frac{1}{1 + e^{-\eta}} = \frac{1}{1 + e^{-\theta^T x}}$$

Because the Bernoulli parameter φ is both the expectation of the distribution and the probability that the event occurs, the logistic regression model outputs the probability that the binary output variable equals 1 for a given combination of input variables. For example: given that the user is visiting for the first time (input variable 1) and the ad link uses hot copy (input variable 2), what is the probability that the ad link is clicked (output variable)? At this point it should be clear why the logistic function has the shape it does and why logistic regression works.
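The model can be sketched in a few lines of Python. This is a minimal illustration, assuming the logistic function φ = 1/(1 + e^(−η)) with η = θ^T x; the weights in `theta` and the two binary ad features are made-up values for the example, not fitted parameters.

```python
import math

def sigmoid(eta):
    """Logistic function phi = 1 / (1 + e^(-eta)), written to avoid overflow."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

def logit(phi):
    """Natural parameter of the Bernoulli distribution: eta = log(phi / (1 - phi))."""
    return math.log(phi / (1.0 - phi))

def predict_proba(theta, x):
    """h_theta(x) = sigmoid(theta^T x): probability that y = 1 given x."""
    eta = sum(t * v for t, v in zip(theta, x))
    return sigmoid(eta)

# Hypothetical weights: intercept, first-time visitor, hot ad copy.
theta = [-2.0, 0.8, 1.5]
x = [1.0, 1.0, 1.0]  # constant 1 for the intercept, both features present
p_click = predict_proba(theta, x)
print(p_click)
```

Because `logit` and `sigmoid` are inverses of each other, `logit(sigmoid(eta))` recovers `eta` up to rounding, which mirrors step 1.3.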

2. Strategy

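The strategy used by logistic regression is to maximize the log-likelihood function. For m training examples, its likelihood function and log-likelihood function are:

$$L(\theta) = \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta) = \prod_{i=1}^{m} h_\theta(x^{(i)})^{y^{(i)}} \left(1 - h_\theta(x^{(i)})\right)^{1 - y^{(i)}}$$

$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$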
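To make the objective concrete, here is a small sketch that evaluates the log-likelihood, assuming the Bernoulli form ℓ(θ) = Σ [y log h_θ(x) + (1 − y) log(1 − h_θ(x))] with h_θ(x) the logistic function of θ^T x. The four-point dataset is invented for illustration; the first feature is a constant 1 acting as an intercept.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_likelihood(theta, X, y):
    """Sum over examples of y*log(h) + (1 - y)*log(1 - h)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, x_i)))
        total += y_i * math.log(h) + (1 - y_i) * math.log(1.0 - h)
    return total

# Toy data (hypothetical): [intercept, feature], label.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]

ll_zero = log_likelihood([0.0, 0.0], X, y)    # every prediction is 0.5
ll_better = log_likelihood([0.0, 1.0], X, y)  # slope matches the labels
print(ll_zero, ll_better)
```

With θ = 0 every example contributes log 0.5, so a parameter vector that separates the labels better should score higher; that difference is what the algorithms in the next section climb.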

3. Algorithms

3.1 Gradient ascent
Just as gradient descent finds a minimum, gradient ascent finds a maximum. First, recall the logistic function used in the calculation and its derivative:

$$g(z) = \frac{1}{1 + e^{-z}}, \qquad g'(z) = g(z)\left(1 - g(z)\right)$$


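Based on this, differentiating the log-likelihood function for a single training example (x, y) gives the gradient:

$$\frac{\partial}{\partial \theta_j} \ell(\theta) = \left(y - h_\theta(x)\right) x_j$$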
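One way to gain confidence in this gradient is to compare it with a finite-difference approximation. The sketch below assumes the single-example result ∂ℓ/∂θ_j = (y − h_θ(x)) x_j and checks it against central differences; the values of `theta`, `x`, and `y` are arbitrary test inputs.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_lik_one(theta, x, y):
    """Log-likelihood of a single example (x, y)."""
    h = sigmoid(sum(t * v for t, v in zip(theta, x)))
    return y * math.log(h) + (1 - y) * math.log(1.0 - h)

def analytic_grad(theta, x, y):
    """Closed-form gradient: (y - h_theta(x)) * x_j for each j."""
    h = sigmoid(sum(t * v for t, v in zip(theta, x)))
    return [(y - h) * v for v in x]

def numeric_grad(theta, x, y, eps=1e-6):
    """Central-difference approximation of the same gradient."""
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += eps
        down[j] -= eps
        grad.append((log_lik_one(up, x, y) - log_lik_one(down, x, y)) / (2 * eps))
    return grad

theta = [0.2, -0.4, 0.7]   # arbitrary test point
x = [1.0, 2.0, -1.5]
y = 1
max_diff = max(abs(a - n) for a, n in
               zip(analytic_grad(theta, x, y), numeric_grad(theta, x, y)))
print(max_diff)
```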

This derivative has the same form as the one in linear regression, but note that the model h_θ(x) is different. The update rule for stochastic gradient ascent, with learning rate α, is therefore:

$$\theta_j := \theta_j + \alpha \left(y^{(i)} - h_\theta(x^{(i)})\right) x_j^{(i)}$$
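A minimal sketch of stochastic gradient ascent follows, assuming the per-example update θ_j := θ_j + α (y − h_θ(x)) x_j. The learning rate, epoch count, random seed, and the six-point dataset are illustrative choices, not values from the course; the data is deliberately not perfectly separable so that a finite maximum exists.

```python
import math
import random

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_likelihood(theta, X, y):
    """Log-likelihood of the whole dataset, used here only to monitor progress."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, x_i)))
        total += y_i * math.log(h) + (1 - y_i) * math.log(1.0 - h)
    return total

def stochastic_gradient_ascent(X, y, alpha=0.05, epochs=500, seed=0):
    """Update theta after every single example, visiting examples in random order."""
    rng = random.Random(seed)
    theta = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            h = sigmoid(sum(t * v for t, v in zip(theta, X[i])))
            error = y[i] - h  # the "+" sign below makes this ascent, not descent
            theta = [t + alpha * error * v for t, v in zip(theta, X[i])]
    return theta

# Toy data (hypothetical): [intercept, feature], label.
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0], [1.0, 2.5], [1.0, 3.0]]
y = [0, 0, 1, 0, 1, 1]
theta_hat = stochastic_gradient_ascent(X, y)
print(theta_hat, log_likelihood(theta_hat, X, y))
```

With a constant learning rate the iterates keep jittering around the maximum rather than settling exactly on it; decaying α over time is a common remedy.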

3.2 Newton's method
From calculus we know that an extremum occurs where the derivative is 0, so another way to maximize the log-likelihood function is to find the point where its derivative is 0. Newton's method finds a zero of a function f by iterating θ := θ − f(θ)/f′(θ); applying it to f = ℓ′ finds a point where the derivative of the log-likelihood is 0.

When θ is a single parameter, the update rule of Newton's method is:

$$\theta := \theta - \frac{\ell'(\theta)}{\ell''(\theta)}$$

When θ is a vector of several parameters, the update rule is:

$$\theta := \theta - H^{-1} \nabla_\theta \ell(\theta), \qquad H_{ij} = \frac{\partial^2 \ell(\theta)}{\partial \theta_i \, \partial \theta_j}$$

where H is the Hessian matrix of the log-likelihood.

Newton's method usually converges faster than batch gradient descent and needs far fewer iterations to get close to the optimum. However, when the model has many parameters (n), computing and inverting the n × n Hessian matrix makes each iteration expensive, which can make the method slower overall. When the number of parameters is not large, Newton's method is usually much faster than gradient descent.
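The multi-parameter update θ := θ − H⁻¹ ∇ℓ(θ) can be sketched for logistic regression as follows. This assumes the gradient Σ (y − h) x and the Hessian −Σ h(1 − h) x xᵀ of the log-likelihood, and it is restricted to two parameters (intercept and one feature) so that the 2×2 inverse can be written by hand using only the standard library. The dataset and iteration count are illustrative.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def gradient_and_hessian(theta, X, y):
    """Gradient and Hessian of the log-likelihood for a two-parameter model."""
    g = [0.0, 0.0]
    H = [[0.0, 0.0], [0.0, 0.0]]
    for x_i, y_i in zip(X, y):
        h = sigmoid(theta[0] * x_i[0] + theta[1] * x_i[1])
        w = h * (1.0 - h)
        for j in range(2):
            g[j] += (y_i - h) * x_i[j]
            for k in range(2):
                H[j][k] -= w * x_i[j] * x_i[k]
    return g, H

def newton(X, y, iterations=10):
    """Repeat theta := theta - H^{-1} g, starting from theta = 0."""
    theta = [0.0, 0.0]
    for _ in range(iterations):
        g, H = gradient_and_hessian(theta, X, y)
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        # step = H^{-1} g, using the closed-form inverse of a 2x2 matrix
        step0 = (H[1][1] * g[0] - H[0][1] * g[1]) / det
        step1 = (H[0][0] * g[1] - H[1][0] * g[0]) / det
        theta = [theta[0] - step0, theta[1] - step1]
    return theta

# Toy data (hypothetical), not perfectly separable so a finite maximum exists.
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0], [1.0, 2.5], [1.0, 3.0]]
y = [0, 0, 1, 0, 1, 1]
theta_hat = newton(X, y)
final_gradient, _ = gradient_and_hessian(theta_hat, X, y)
print(theta_hat, final_gradient)
```

On a problem this small, a handful of Newton iterations is typically enough to drive the gradient to rounding level. For larger n one would solve the linear system H d = g with a numerical library rather than invert H explicitly.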

Summary
      1. Many mainstream probability distributions belong to the exponential family.
      2. The three assumptions that make up the generalized linear model are the bridge to building a model, so they are worth remembering.
      3. The logistic regression model is a probabilistic model based on the Bernoulli distribution: it gives the probability that the binary output variable takes one of its two values, given a combination of input variables. It is therefore well suited to predicting ad clicks.
      4. Gradient descent and gradient ascent differ only in the sign (+/-) of the update. In addition, Newton's method can find the maximum or minimum of the objective by finding a point where the derivative is 0.
