Logistic Regression -- Python Implementation

Source: Internet
Author: User

1. Overview

Logistic regression is one of the most commonly used machine learning methods in industry for estimating the likelihood of an event.

In the classic book "The Beauty of Mathematics," logistic regression appears in ad click prediction: the system estimates the probability that a user will click each ad, places the ads most likely to be clicked where the user will see them, and collects money whenever the user clicks. That is one reason our screens are now awash with ads. Similar problems include estimating the likelihood that a user buys a product, or that a patient suffers from a certain disease. The world is largely random (man-made deterministic systems aside, and even those can produce noise or erroneous results, though the probability of such errors may be negligibly small), so the occurrence of almost any event can be expressed as a probability or as odds. "Odds" refers to the ratio of the probability that an event occurs to the probability that it does not.

Logistic regression can be used for regression, but it is mainly used for classification, especially binary classification.

2. Basic theory

2.1 Logistic regression and the sigmoid function

Regression: suppose we have some data points and we fit a straight line to them (called the best-fit line); this fitting process is called regression. The idea behind using logistic regression for classification is to build a regression formula for the classification boundary from the existing data. The word "regression" here comes from "best fit": it means finding the best-fit parameters, using an optimization algorithm.

The sigmoid function is calculated by the following formula:

    σ(z) = 1 / (1 + e^(-z))

Here z = w0x0 + w1x1 + w2x2 + ... + wnxn, i.e. z = wᵀx, where w is the vector of best-fit parameters (coefficients) we are looking for and x is the classifier's input feature vector.

When z is 0, the value of the sigmoid function is 0.5; as z increases, the sigmoid value approaches 1, and as z decreases, it approaches 0. If the horizontal axis is scaled out far enough, the sigmoid function looks much like a step function.

To implement a logistic regression classifier, we multiply each feature by a regression coefficient, add up all the results, and substitute the sum into the sigmoid function, which yields a value between 0 and 1. Any data point whose value is greater than 0.5 is assigned to class 1, and anything less than 0.5 to class 0. Therefore, logistic regression can also be viewed as a probability estimate.
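The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the article's own code; the function names `sigmoid` and `classify` are chosen here for clarity.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid function: maps any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def classify(x, w):
    """Classify a feature vector x using coefficients w.

    Computes sigmoid(w . x) and thresholds at 0.5:
    returns 1 if the probability exceeds 0.5, else 0.
    """
    prob = sigmoid(np.dot(w, x))
    return 1 if prob > 0.5 else 0
```

Note that `sigmoid(0)` is exactly 0.5, matching the decision threshold described above.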

2.2 Optimization theory

Given the above, our question now becomes: what are the optimal regression coefficients?

z = w0x0 + w1x1 + w2x2 + ... + wnxn, i.e. z = wᵀx

The vector x is the classifier's input data, and the vector w holds the best-fit parameters (coefficients) we want to find so that the classifier is as accurate as possible. To find these best parameters, we need some knowledge of optimization theory.

The following first describes the gradient ascent optimization method and how to use it to obtain the best parameters for a data set. Next, we show how to draw the decision boundary produced by gradient ascent, which visualizes its classification performance. Finally, we will look at stochastic gradient ascent and how to modify it to achieve better results.

2.2.1 Gradient ascent method

The idea of gradient ascent is that, to find the maximum of a function, the best way is to move along the direction of the function's gradient. The gradient of a function f(x, y) is given by:

    ∇f(x, y) = (∂f/∂x, ∂f/∂y)

The gradient means moving ∂f/∂x along the x direction and ∂f/∂y along the y direction; the function f(x, y) must be defined and differentiable at the point being evaluated.


Note: the gradient ascent algorithm re-evaluates the direction of movement at every point it reaches. Starting from P0, the gradient at that point is computed and we move to the next point, P1, along that gradient. At P1 the gradient is recalculated, and we move along the new gradient direction to P2. This iteration loops until a stopping condition is met. Throughout the iteration, the gradient operator always ensures that we move in the best direction.
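The iteration described above can be sketched on a simple example function. The function f(x, y) = -(x - 2)² - (y - 3)², its gradient (-2(x - 2), -2(y - 3)), and the starting point are all hypothetical choices for illustration; the function's maximum is at (2, 3), so the iterates should approach that point.

```python
def gradient_ascent_demo(alpha=0.1, n_iter=200):
    """Gradient ascent on the toy function f(x, y) = -(x - 2)^2 - (y - 3)^2.

    alpha is the step size; at each point the gradient is recomputed
    and the current point moves a small step in that direction.
    """
    x, y = 0.0, 0.0  # starting point P0
    for _ in range(n_iter):
        # Gradient of f, re-evaluated at the current point
        grad_x = -2.0 * (x - 2.0)
        grad_y = -2.0 * (y - 3.0)
        # Move along the gradient direction, scaled by the step size
        x += alpha * grad_x
        y += alpha * grad_y
    return x, y
```

After enough iterations the point converges to the maximizer (2, 3), since each step shrinks the distance to it by a constant factor.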

As you can see, the gradient operator always points in the direction in which the function value grows fastest. This gives only the direction of movement, not its magnitude. The magnitude is called the step size, denoted α. In vector notation, the gradient ascent update rule is:

    w := w + α ∇w f(w)
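Applying this update rule to logistic regression gives a training loop. The sketch below is an assumed implementation, not the article's original code: it uses the standard log-likelihood gradient Xᵀ(y - sigmoid(Xw)), and the function name `train_logreg` and default hyperparameters are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid function: maps any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, alpha=0.01, n_iter=500):
    """Fit logistic regression coefficients by gradient ascent.

    X: (m, n) feature matrix (include a column of ones for the intercept).
    y: (m,) labels in {0, 1}.
    alpha: step size; n_iter: number of gradient ascent iterations.
    """
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(n_iter):
        # Gradient of the log-likelihood: X^T (y - sigmoid(Xw))
        error = y - sigmoid(X @ w)
        w = w + alpha * (X.T @ error)  # w := w + alpha * gradient
    return w
```

On a small linearly separable data set, the learned coefficients push the predicted probabilities toward 0 for class-0 points and toward 1 for class-1 points.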
