Classification and Logistic Regression

The classification problem is similar to linear regression, except that in classification the value $y$ we want to predict takes on only a small number of discrete values. For now we focus on binary classification, in which $y$ can take only the values 0 and 1. For example, suppose we are building a spam classifier: $x$ is the set of features of an email, and $y = 1$ indicates the email is spam while $y = 0$ indicates a normal email. Accordingly, 0 is called the negative class and 1 the positive class.

Logistic regression

Consider, for instance, classifying whether a tumor is malignant. We might initially try to solve this with linear regression, using a hypothesis such as:

$$h_\theta(x) = \theta^T x$$

Linear regression predicts continuous values, while in a classification problem the prediction can only be 0 or 1, so we could pick a threshold (say 0.5): predict 1 when $h_\theta(x)$ is greater than the threshold, and 0 otherwise. At first glance, the $h_\theta(x)$ above seems to handle the problem well.

However, it is not appropriate to use the linear regression model for this problem, because the predicted value can fall outside the $[0, 1]$ range. We therefore introduce a new model, logistic regression, whose output always lies between 0 and 1:

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$

Here $g(z) = \frac{1}{1 + e^{-z}}$ is called the logistic function or sigmoid function; its graph is an S-shaped curve rising from 0 to 1.

From the graph we can see that $g(z) \to 1$ as $z \to \infty$, and $g(z) \to 0$ as $z \to -\infty$. As usual, we keep the convention $x_0 = 1$, so that $\theta^T x = \theta_0 + \sum_{j=1}^{n} \theta_j x_j$.
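
As a quick illustration (a minimal sketch of my own, not part of the original text), here is the sigmoid and its limiting behavior in Python:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# g(z) -> 0 as z -> -inf, g(0) = 0.5, g(z) -> 1 as z -> +inf
print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # ~[4.5e-05, 0.5, 0.99995]
```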

Before we get to the main point, let's first derive a useful property of the logistic function:

$$g'(z) = \frac{d}{dz}\frac{1}{1 + e^{-z}} = \frac{e^{-z}}{(1 + e^{-z})^2} = \frac{1}{1 + e^{-z}}\left(1 - \frac{1}{1 + e^{-z}}\right) = g(z)(1 - g(z))$$
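
A quick numerical check of this identity (again a sketch of my own, not from the original text):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Compare the analytic derivative g(z)(1 - g(z)) with a central finite difference.
z, eps = 0.7, 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
analytic = sigmoid(z) * (1 - sigmoid(z))
print(abs(numeric - analytic) < 1e-9)  # True
```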

Now back to the main point: given the logistic regression model, how do we fit $\theta$?

Assume:

$$P(y = 1 \mid x; \theta) = h_\theta(x)$$
$$P(y = 0 \mid x; \theta) = 1 - h_\theta(x)$$

Here $h_\theta(x)$ is, for a given input, the estimated probability that the output equals 1 under the chosen parameters.

Combining the two formulas above gives: $p(y \mid x; \theta) = (h_\theta(x))^y (1 - h_\theta(x))^{1 - y}$
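
As a sanity check (a toy sketch of my own), the combined formula reduces to the two separate cases for $y \in \{0, 1\}$:

```python
def p(y, h):
    """p(y | x; theta) = h^y * (1 - h)^(1 - y), for y in {0, 1}."""
    return h**y * (1 - h)**(1 - y)

h = 0.8                  # suppose h_theta(x) = 0.8 for some example
print(p(1, h), p(0, h))  # 0.8 0.2 -- matches the two separate cases
```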

Gradient ascent

In linear regression, our approach was to construct the likelihood function, compute the maximum likelihood estimate, and from that derive the update rule for $\theta$. In logistic regression we use the same approach; since in the end we are maximizing the likelihood, the algorithm used is gradient ascent.

Assuming the training examples are independent of one another, the likelihood function is:

$$L(\theta) = \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta) = \prod_{i=1}^{m} \left(h_\theta(x^{(i)})\right)^{y^{(i)}} \left(1 - h_\theta(x^{(i)})\right)^{1 - y^{(i)}}$$

Taking the logarithm of the likelihood gives:

$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$
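
Transcribed directly into code (a minimal sketch; the vectorized form assumes a design matrix X whose rows are the $x^{(i)}$, including the intercept column):

```python
import numpy as np

def log_likelihood(theta, X, y):
    """l(theta) = sum_i [ y_i*log(h_i) + (1 - y_i)*log(1 - h_i) ]."""
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))  # h_theta(x) for every example
    return np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
```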

Now all we need to do is maximize the log-likelihood, and for that we use gradient ascent. Written in vector form, the update rule is:

$$\theta := \theta + \alpha \nabla_\theta \ell(\theta)$$

Note: because we are maximizing the likelihood, the update uses a plus sign here rather than the minus sign of gradient descent.

As an example, let's derive the gradient ascent rule for a single training example $(x, y)$:

$$\frac{\partial}{\partial \theta_j} \ell(\theta) = \left( y \frac{1}{g(\theta^T x)} - (1 - y) \frac{1}{1 - g(\theta^T x)} \right) \frac{\partial}{\partial \theta_j} g(\theta^T x) = \left( y (1 - g(\theta^T x)) - (1 - y)\, g(\theta^T x) \right) x_j = (y - h_\theta(x))\, x_j$$

The second step above applies the property $g'(z) = g(z)(1 - g(z))$ that we derived earlier. This gives us the update rule:

$$\theta_j := \theta_j + \alpha \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}$$
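
To make the rule concrete, here is a minimal batch gradient ascent sketch on synthetic data (the toy data and variable names are my own, not from the original text):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical synthetic data: m examples, n features, plus intercept x_0 = 1.
rng = np.random.default_rng(0)
m, n = 200, 2
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])
true_theta = np.array([0.5, 2.0, -1.0])
y = (rng.uniform(size=m) < sigmoid(X @ true_theta)).astype(float)

theta, alpha = np.zeros(n + 1), 0.1
for _ in range(2000):
    grad = X.T @ (y - sigmoid(X @ theta))  # gradient of l(theta)
    theta += alpha * grad / m              # ascent: note the plus sign
print(theta)  # roughly recovers true_theta
```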

Notice that this update rule looks identical to the LMS update rule; but these are two completely different algorithms, because here $h_\theta(x^{(i)})$ is a nonlinear function of $\theta^T x^{(i)}$.

This is not just a coincidence; the deeper reason is explained by the theory of generalized linear models (GLMs).

Earlier we maximized $\ell(\theta)$ using gradient ascent. Here we introduce another method for maximizing $\ell(\theta)$: Newton's method.

Newton's method

Suppose we have some function $f$ and we want to find a $\theta$ such that $f(\theta) = 0$; note that here $\theta \in \mathbb{R}$ is a real number. Newton's method uses the update rule:

$$\theta := \theta - \frac{f(\theta)}{f'(\theta)}$$

Newton's method proceeds as follows:

At the current guess, we take the tangent line to $f$ (given by the derivative at that point); the point where the tangent crosses the x-axis becomes the next guess. Repeating this iteration, the guesses approach a point where $f(\theta) = 0$.
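
A one-dimensional sketch (a toy example of my own, finding the root of $f(\theta) = \theta^2 - 2$):

```python
# Newton's method: theta := theta - f(theta) / f'(theta)
f = lambda t: t**2 - 2.0
fprime = lambda t: 2.0 * t

theta = 1.0
for _ in range(6):
    theta -= f(theta) / fprime(theta)
print(theta)  # converges to sqrt(2) = 1.41421356...
```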

So Newton's method gives us a way to solve $f(\theta) = 0$. How, then, do we use it to maximize $\ell(\theta)$?

Following the same line of thought: $\ell(\theta)$ is maximized where its derivative $\ell'(\theta) = 0$, so letting $f(\theta) = \ell'(\theta)$, the update becomes:

$$\theta := \theta - \frac{\ell'(\theta)}{\ell''(\theta)}$$

In logistic regression, $\theta$ is a vector, so Newton's method in this setting is expressed as:

$$\theta := \theta - H^{-1} \nabla_\theta \ell(\theta)$$

Here $\nabla_\theta \ell(\theta)$ is the vector of partial derivatives of $\ell(\theta)$ with respect to the $\theta_i$'s, and $H$ is the Hessian matrix, an $n \times n$ matrix (where $n$ is the number of features) with entries $H_{ij} = \frac{\partial^2 \ell(\theta)}{\partial \theta_i \, \partial \theta_j}$.

Newton's method converges faster than batch gradient ascent: only a few iterations are needed to get very close to the optimum. However, when $n$ is large, each iteration is expensive, since it requires computing and inverting the $n \times n$ Hessian.
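
A sketch of Newton's method for logistic regression (my own minimal illustration; it reuses the hypothetical X, y, n, and sigmoid from the gradient ascent example above):

```python
theta = np.zeros(n + 1)
for _ in range(10):  # Newton typically needs very few iterations
    h = sigmoid(X @ theta)
    grad = X.T @ (y - h)               # gradient of l(theta)
    H = -(X.T * (h * (1 - h))) @ X     # Hessian: -X^T diag(h(1-h)) X
    theta -= np.linalg.solve(H, grad)  # theta := theta - H^{-1} grad
print(theta)  # matches the gradient ascent solution in far fewer steps
```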

Finally, let's briefly mention the perceptron algorithm.

The perceptron learning algorithm

Consider modifying logistic regression to force it to output values that are exactly 0 or 1. To do so, we change $g$ to be the threshold function:

$$g(z) = \begin{cases} 1 & \text{if } z \geq 0 \\ 0 & \text{if } z < 0 \end{cases}$$

If we then let $h_\theta(x) = g(\theta^T x)$ with this modified $g$, we get the update rule:

$$\theta_j := \theta_j + \alpha \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}$$

This is the perceptron learning algorithm.
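
A minimal perceptron sketch (a toy illustration of my own, with hypothetical linearly separable data):

```python
import numpy as np

def threshold(z):
    return (z >= 0).astype(float)  # g outputs exactly 0 or 1

# Hypothetical linearly separable data with intercept term x_0 = 1.
rng = np.random.default_rng(1)
X = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 2))])
y = threshold(X @ np.array([0.0, 1.0, -1.0]))

theta, alpha = np.zeros(3), 0.5
for _ in range(100):               # sweep through the examples one at a time
    for x_i, y_i in zip(X, y):
        theta += alpha * (y_i - threshold(x_i @ theta)) * x_i
print(theta)  # separates the data; direction similar to [0, 1, -1]
```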
