Logistic regression LR

Source: Internet
Author: User

Logistic regression is an algorithm many people are familiar with, and it is one I know well myself: my graduation thesis project used it. It may not be as complex or sophisticated as random forests, SVMs, neural networks, GBDT, and other classification algorithms, but do not underestimate it, because it has several advantages those algorithms cannot match. First, logistic regression is a mature algorithm and its predictions are fairly accurate. Second, the model's coefficients are easy to understand and explain; it is not a black box, which is why, in banking, it is said that about 80% of predictive models use logistic regression. Third, the output is a probability value, so it can serve as a ranking model. Fourth, training is fast. Of course, it also has shortcomings: it is not well suited to targets with many classes. Let me introduce the model below.

First, an introduction to logistic regression (LR)

The first thing to know is that logistic regression applies when your target variable is categorical, and it is used primarily for binary classification problems. For example, a doctor wants to judge whether a tumor is malignant or benign from its size x1, length x2, species x3, and so on; here the target variable y is categorical (0 for benign, 1 for malignant). We would like to make predictions as in linear regression, with a linear relationship between the columns of X and y, but because y is categorical its value can only be 0 or 1 (or 0, 1, 2, and so on); it cannot range from negative infinity to positive infinity. How do we solve this? By introducing the sigmoid function. This function has very nice properties: its input can range from negative infinity to positive infinity, its output always lies in [0, 1], and when the input is 0 the output is 0.5, so the output can be read as a probability. The point where the output equals 0.5 is the decision boundary: when deciding whether a tumor is benign or malignant, we are really looking for the boundary that separates the two classes of samples.

Through the sigmoid function, we can embed the linear form we like: when the value of θᵀx is greater than 0, h(x) gives a probability greater than 0.5, indicating the sample belongs to the class; when θᵀx is less than 0, h(x) is less than 0.5, indicating it does not. This gives the logistic regression model we see, specifically:

h(x) = g(θᵀx) = 1 / (1 + e^(−θᵀx))

where θ is a vector of coefficients.

Second, estimating logistic regression (minimizing the loss function)

The loss function is one of the most common concepts in machine learning: it measures prediction error, for example the mean squared error [(model estimate − actual value)² / n], so minimizing the loss function yields the optimal parameters. (The least-squares estimate in linear regression is obtained this way.) But for logistic regression this squared-error loss is non-convex, so the global minimum cannot be reliably found. It is therefore necessary to use another method and convert the problem to maximum likelihood; the detailed derivation of the solution can be found in the blog http://blog.csdn.net/suipingsp/article/details/41822313. Another good way to understand the resulting loss: if y = 1 and you dare to give h(x) a small probability such as 0.01, the loss becomes very large.
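That last intuition can be made concrete with the per-sample log loss (cross-entropy); this is a small sketch, with the helper name `log_loss` my own:

```python
import math

def log_loss(y, h):
    """Per-sample logistic loss: -[y*log(h) + (1-y)*log(1-h)]."""
    return -(y * math.log(h) + (1 - y) * math.log(1 - h))

# A confident, correct prediction costs almost nothing...
print(log_loss(1, 0.99))   # ≈ 0.01
# ...but daring to give h(x) = 0.01 when y = 1 is punished hard.
print(log_loss(1, 0.01))   # ≈ 4.61
```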

With this formulation the loss function becomes convex, and θ is solved by gradient descent to obtain the minimum. A regularization term is added at this point to solve the overfitting problem. (The overfitting problem: if our model has many features, it becomes very complex and fits the original data very well, but loses generality and predicts poorly on new data. A teacher once gave a very good analogy for this: a student who only memorizes past exam questions without grasping the underlying rules will not necessarily do well on the college entrance exam.) How do we solve it? By constraining the parameter θ: the loss function gains a restriction on θ, so that if the θ values are too numerous or too large, a penalty is incurred. This is L2 regularization.
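The penalty idea can be sketched in a few lines. This is an illustrative objective only; the function name, the hypothetical data-fit loss value, and the choice to skip the bias term are my assumptions, not from the original:

```python
import numpy as np

def regularized_loss(theta, base_loss, lam):
    """L2-regularized objective: data loss + (lam/2) * ||theta||^2.
    The bias term theta[0] is conventionally not penalized."""
    return base_loss + (lam / 2.0) * np.sum(theta[1:] ** 2)

theta = np.array([0.5, 3.0, -4.0])   # large weights
base = 0.2                            # hypothetical data-fit loss
print(regularized_loss(theta, base, lam=0.0))   # 0.2: no penalty
print(regularized_loss(theta, base, lam=0.1))   # 0.2 + 0.05 * 25 = 1.45
```

Larger weights and a larger λ mean a larger penalty, which pushes the optimizer toward simpler models.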

The update formula is iterated until convergence is reached: the loss function J(θ) decreases in each iteration, and iteration stops when a step decreases it by less than some small value (for example, less than 0.001), or when another stopping condition is met (such as the number of iterations reaching a specified value or the algorithm reaching an allowable error range). The process of converting the update to vector form can also be found in the blog above: http://blog.csdn.net/suipingsp/article/details/41822313
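Putting the pieces together, a minimal batch gradient descent for logistic regression with both stopping conditions looks roughly like this (toy data and the function name `fit_logistic` are my own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, tol=1e-6, max_iter=10000):
    """Batch gradient descent; stop when the loss improves by less
    than tol, or after max_iter iterations."""
    theta = np.zeros(X.shape[1])
    prev_loss = np.inf
    for _ in range(max_iter):
        h = sigmoid(X @ theta)
        # log loss, with a tiny epsilon to avoid log(0)
        loss = -np.mean(y * np.log(h + 1e-12) + (1 - y) * np.log(1 - h + 1e-12))
        if prev_loss - loss < tol:          # convergence: step too small
            break
        prev_loss = loss
        theta -= lr * X.T @ (h - y) / len(y)  # gradient step
    return theta

# Toy, linearly separable data: class is 1 when the feature is positive.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])  # bias + feature
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = fit_logistic(X, y)
preds = (sigmoid(X @ theta) > 0.5).astype(float)
print(preds)   # [0. 0. 1. 1.]
```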

Third, LR Application Experience

For continuous variables, pay attention to scaling: standardize so that features share one scale unit. LR is sensitive to the sample distribution, so pay attention to class balance (the Y = 1 class cannot be too small). When the sample size is sufficient, use down-sampling; when it is insufficient, use over-sampling.
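A small sketch of both points with sklearn, on synthetic imbalanced data of my own making; `class_weight="balanced"` is shown here as an alternative to resampling, not as the article's prescription:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical imbalanced data (10% positives) with wildly different scales.
X_pos = np.c_[rng.normal(2, 1, 20), rng.normal(2000, 500, 20)]
X_neg = np.c_[rng.normal(0, 1, 180), rng.normal(0, 500, 180)]
X = np.vstack([X_pos, X_neg])
y = np.r_[np.ones(20), np.zeros(180)]

# 1) Standardize continuous features onto one scale unit.
X_scaled = StandardScaler().fit_transform(X)
# 2) Reweight classes rather than discarding data (an alternative to
#    down-/over-sampling when the positive class is scarce).
clf = LogisticRegression(class_weight="balanced").fit(X_scaled, y)
acc = clf.score(X_scaled, y)
print(round(acc, 3))
```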

Feature processing is very important for LR, including combining features to introduce personalized factors (as FM and FFM do), paying attention to feature frequencies, and using clustering and hashing. LR is not afraid of a large number of features; GBDT is more so.

LR and FM handle sparse, high-dimensional features without stress. GBDT finds appropriate split points for continuous values, and XGBoost can also handle categorical features without one-hot encoding; flatly expanded high-dimensional sparse features are not good for it.
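For reference, the one-hot expansion mentioned above can be sketched in plain Python (the helper name `one_hot` is my own):

```python
def one_hot(values):
    """Expand a categorical column into 0/1 indicator columns,
    one column per distinct category (sorted for determinism)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

# A 1-column categorical feature becomes a 3-column sparse 0/1 matrix;
# LR handles this fine, while trees prefer the original compact column.
species = ["A", "C", "A", "B"]
print(one_hot(species))
# [[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```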

For algorithm tuning: choose an appropriate regularization type and regularization coefficient, convergence threshold ε, and number of iteration rounds; adjust the loss function by giving different weights; use bagging or other forms of model fusion; and select the optimization algorithm ('newton-cg', 'lbfgs', 'liblinear'). The LR in sklearn is actually a wrapper around liblinear.
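These knobs map onto sklearn's `LogisticRegression` parameters; here is a sketch on synthetic data of my own (note sklearn exposes the regularization coefficient inversely, as `C`):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# C is the inverse regularization coefficient, tol the convergence
# threshold, max_iter the iteration cap; all three solvers support L2.
for solver in ("newton-cg", "lbfgs", "liblinear"):
    clf = LogisticRegression(solver=solver, penalty="l2", C=1.0,
                             tol=1e-4, max_iter=200).fit(X, y)
    print(solver, round(clf.score(X, y), 3))
```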

The evaluation of the model mainly uses the ROC curve.
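A small sketch of ROC-based evaluation with sklearn, again on synthetic data of my own choosing; the probability outputs are what make LR usable as a ranking model:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve

X, y = make_classification(n_samples=500, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

clf = LogisticRegression(max_iter=500).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]   # probability outputs, usable for ranking
auc = roc_auc_score(y_te, scores)        # area under the ROC curve
fpr, tpr, _ = roc_curve(y_te, scores)    # the curve itself: FPR vs TPR
print(round(auc, 3))
```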
