Several variants of logistic regression

Last Update:2015-12-13 Source: Internet

Author: User


Original: http://blog.xlvector.net/2014-02/different-logistic-regression/

In recent years, advertising systems have become important to many companies. Targeted advertising is a key technology in these systems, click-through rate (CTR) estimation is a key part of targeted advertising, and logistic regression is the machine learning algorithm most commonly used for CTR estimation. This article therefore describes logistic regression (hereinafter LR).

The problem LR solves

LR is mainly used to solve binary (two-class) classification problems. Typical examples include:

Will a user click an ad when they see it?
Is a person male or female?
Does an image contain a human face?
Will a person who borrows money repay it?

Binary classification is a basic problem in machine learning, and every classification algorithm can at least solve the binary case, for example:

Decision tree, Random forest, GBDT
SVM (support vector machine)
Gaussian processes
Neural networks

So why is LR chosen for CTR estimation? Mainly because:

The data is large, and LR has very low computational cost for both training and prediction.
There are many features, and after feature transformation the problem is essentially linear, so a linear classifier can solve it.
LR predicts not only which class a sample belongs to, but also the probability of each class.
The LR model is simple enough that its predictions can be explained.
The LR model is simple, which makes parallelization relatively easy.
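The probability output and the explainability mentioned above can be illustrated with a minimal sketch. The weights, bias, and feature values below are made-up numbers for illustration only, not values from any real CTR model.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, bias, features):
    """Return P(y = 1 | x) for a linear LR model."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)

# Hypothetical model and sample (illustrative values only).
weights = [1.2, -0.7, 0.3]
bias = -2.0
features = [1.0, 0.0, 1.0]

p_click = predict_proba(weights, bias, features)

# Each feature's contribution to the score is just weight * value,
# which is what makes a prediction easy to explain.
contributions = [w * x for w, x in zip(weights, features)]
print(p_click, contributions)
```

Prediction costs one dot product and one sigmoid, which is why LR stays cheap even with very large data.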

Different types of LR

Since LR was introduced, academic improvements to it have focused on two main aspects:

Which regularization to use: L2 regularization came first, and L1 regularization has become more common recently.
Which optimization algorithm to use, that is, how to converge to the optimal solution in the shortest time.

Regularization

Regularization is an important technique in machine learning whose main purpose is to prevent a model from overfitting. The most commonly used forms are L1 and L2:

L2 regularization assumes that the prior distribution of each feature weight is a Gaussian distribution centered at 0.
L1 regularization assumes that the prior distribution of each feature weight is a Laplace distribution centered at 0.
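As a hedged illustration of this correspondence, the two regularizers can be read as negative log-priors added to the log-loss. The label convention y_i in {-1, +1} and the symbols sigma, b, lambda_1, and lambda_2 are assumptions introduced here, not notation from the original article.

```latex
% Regularized LR objective (labels y_i \in \{-1,+1\} are an assumed convention)
\min_{w}\; \sum_{i=1}^{N} \log\bigl(1 + e^{-y_i\, w^{\top} x_i}\bigr) + R(w)

% Gaussian prior centered at 0  ->  L2 penalty
p(w_j) \propto \exp\!\Bigl(-\frac{w_j^{2}}{2\sigma^{2}}\Bigr)
\;\Longrightarrow\;
R(w) = -\sum_{j} \log p(w_j)
     = \frac{\lambda_2}{2}\,\lVert w \rVert_2^{2} + \text{const},
\qquad \lambda_2 = \frac{1}{\sigma^{2}}

% Laplace prior centered at 0  ->  L1 penalty
p(w_j) \propto \exp\!\Bigl(-\frac{\lvert w_j \rvert}{b}\Bigr)
\;\Longrightarrow\;
R(w) = -\sum_{j} \log p(w_j)
     = \lambda_1\,\lVert w \rVert_1 + \text{const},
\qquad \lambda_1 = \frac{1}{b}
```

A narrower prior (smaller sigma or b) corresponds to a larger penalty coefficient, that is, stronger regularization.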

L1 regularization has one advantage over L2 regularization: after optimizing a loss function with an L1 penalty, most feature weights are exactly 0. This sparsity can significantly reduce the memory footprint of online prediction and increase prediction speed, because:

Online prediction mainly computes the dot product of the sample's feature vector x and the model's feature weight vector w.
The vector w is generally stored in a hash map, and a feature with weight 0 does not need to be stored, because a feature missing from the hash map is treated as having weight 0.
So L1 regularization reduces the memory used by w, and as w shrinks, computing the dot product of w and x also becomes faster.
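The storage argument above can be sketched with a Python dict standing in for the hash map. The feature names and weights are hypothetical, chosen only to show that dropping zero-weight entries does not change the score.

```python
def sparse_score(weights, active_features):
    """Dot product of a sparse sample with a sparse weight map.

    weights: dict mapping feature name -> weight (missing means 0)
    active_features: dict mapping feature name -> feature value
    """
    return sum(value * weights.get(name, 0.0)
               for name, value in active_features.items())

def prune_zero_weights(weights):
    """Keep only features whose weight is nonzero."""
    return {name: w for name, w in weights.items() if w != 0.0}

# Hypothetical model after L1-regularized training: half the weights are 0.
full_model = {"ad=123": 0.8, "user_city=SF": 0.0,
              "hour=20": -0.3, "query=shoes": 0.0}
model = prune_zero_weights(full_model)

sample = {"ad=123": 1.0, "user_city=SF": 1.0, "hour=20": 1.0}
score = sparse_score(model, sample)
print(len(full_model), len(model), score)
```

The pruned map holds fewer entries, and each lookup that misses simply contributes 0 to the score.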

Optimization method

The loss function of L2-regularized LR is convex and differentiable, so it can be optimized with steepest descent (the gradient method). There are three common variants of the gradient method:

Batch
Mini-batch
SGD (stochastic gradient descent)
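The three variants differ only in how many samples feed each gradient step. The sketch below makes that explicit with a single batch_size parameter; the toy data, learning rate, and L2 strength are assumptions chosen for illustration.

```python
import math
import random

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def gradient(w, batch, l2):
    """Mean log-loss gradient over the batch plus the L2 term."""
    g = [l2 * wj for wj in w]
    for x, y in batch:
        err = sigmoid(sum(wj * xj for wj, xj in zip(w, x))) - y
        for j, xj in enumerate(x):
            g[j] += err * xj / len(batch)
    return g

def train(data, batch_size, epochs=200, lr=0.5, l2=0.01, seed=0):
    """batch_size == len(data): batch; 1: SGD; in between: mini-batch."""
    rng = random.Random(seed)
    data = list(data)
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            g = gradient(w, batch, l2)
            w = [wj - lr * gj for wj, gj in zip(w, g)]
    return w

# Toy data: x[0] is a constant bias input, label is 1 when x[1] > 0.
data = [([1.0, -2.0], 0), ([1.0, -1.0], 0),
        ([1.0, 1.0], 1), ([1.0, 2.0], 1)]

w_batch = train(data, batch_size=len(data))
w_mini = train(data, batch_size=2)
w_sgd = train(data, batch_size=1)
print(w_batch, w_mini, w_sgd)
```

Smaller batches take more, noisier steps per pass over the data; larger batches take fewer, more accurate ones.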

These three methods were the earliest optimization methods proposed. Beyond the gradient method, Newton and quasi-Newton methods can achieve superlinear convergence, so the conjugate gradient method and L-BFGS are also used to optimize LR. L-BFGS applies to L2 regularization; for L1 regularization, Microsoft proposed the OWL-QN algorithm (http://blog.csdn.net/qm1004/article/details/18083637).

The gradient method and the quasi-Newton method are both frequentist optimization methods: in effect, they compute maximum likelihood estimates using different optimization algorithms. The Bayesian school has in turn proposed Bayesian optimization algorithms:

Ad Predictor: This is an algorithm proposed by Microsoft Research. See the paper "Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft's Bing Search Engine".

Ad Predictor has several attractive properties:

It needs only a single scan of the data set to converge to the optimal solution, instead of iterating over the data set repeatedly like the gradient method or the quasi-Newton method.
It predicts not only the probability that a sample is positive, but also the confidence of that probability estimate.

Ad Predictor is good, but it is based on L2 regularization, which is unsatisfying. In 2013 Google published a paper (Ad Click Prediction: a View from the Trenches) introducing FTRL-Proximal, an LR optimization algorithm based on L1 regularization that also has the two advantages of Ad Predictor listed above.
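To make the online, L1-based nature of FTRL-Proximal concrete, here is a minimal sketch of a per-coordinate update in the form commonly given for that algorithm. The class name, hyperparameter defaults, and toy click stream are assumptions for illustration; this is not a reproduction of Google's production system.

```python
import math

class FTRLProximal:
    """Per-coordinate FTRL-Proximal for LR with L1 and L2 terms (sketch)."""

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = {}  # per-feature accumulated adjusted gradients
        self.n = {}  # per-feature accumulated squared gradients

    def weight(self, i):
        """Closed-form weight; exactly 0 while |z_i| <= l1 (sparsity)."""
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        n = self.n.get(i, 0.0)
        return -(z - sign * self.l1) / (
            (self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def predict(self, x):
        score = sum(v * self.weight(i) for i, v in x.items())
        score = max(min(score, 35.0), -35.0)  # guard against overflow
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, x, y):
        """One online step on a sparse sample x (dict) with label y."""
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v
            n = self.n.get(i, 0.0)
            sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / self.alpha
            self.z[i] = self.z.get(i, 0.0) + g - sigma * self.weight(i)
            self.n[i] = n + g * g
        return p

# Toy stream, seen once: one ad is always clicked, the other never.
model = FTRLProximal()
for _ in range(100):
    model.update({"bias": 1.0, "ad=good": 1.0}, 1)
    model.update({"bias": 1.0, "ad=bad": 1.0}, 0)

p_good = model.predict({"bias": 1.0, "ad=good": 1.0})
p_bad = model.predict({"bias": 1.0, "ad=bad": 1.0})
print(p_good, p_bad, model.weight("bias"))
```

The model is updated one sample at a time in a single pass, and a weight leaves zero only once its accumulated gradient exceeds the L1 threshold, which is where the sparsity comes from.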

Parallelization

There are two kinds of algorithm parallelization:

Lossless parallelization: the algorithm is naturally parallel, so parallelization only speeds up the computation and produces the same result as serial execution.
Lossy parallelization: the algorithm is not naturally parallel and must be approximated to run in parallel, so the parallel result is similar to, but not identical to, the serial result.

Among the algorithms mentioned earlier, batch-based algorithms (batch GD, L-BFGS, OWL-QN) can be parallelized losslessly. SGD-based algorithms (Ad Predictor, FTRL-Proximal) can only be parallelized lossily.
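Batch methods parallelize losslessly because the full gradient is a sum over samples, so partial sums computed on separate shards can simply be added. The sketch below shows this with a thread pool standing in for worker machines; the data, weights, and shard count are illustrative assumptions.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_sum(w, shard):
    """Sum of per-sample log-loss gradients over one shard."""
    g = [0.0] * len(w)
    for x, y in shard:
        err = sigmoid(sum(wj * xj for wj, xj in zip(w, x))) - y
        for j, xj in enumerate(x):
            g[j] += err * xj
    return g

def parallel_gradient(w, data, num_shards=2):
    """Compute partial gradients per shard, then add them up."""
    shards = [data[k::num_shards] for k in range(num_shards)]
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        partials = list(pool.map(lambda s: gradient_sum(w, s), shards))
    return [sum(column) for column in zip(*partials)]

data = [([1.0, -2.0], 0), ([1.0, -1.0], 0), ([1.0, 0.5], 1),
        ([1.0, 1.0], 1), ([1.0, 2.0], 1), ([1.0, -0.5], 0)]
w = [0.1, -0.2]

serial = gradient_sum(w, data)
parallel = parallel_gradient(w, data, num_shards=3)
print(serial, parallel)
```

The two results agree up to floating-point rounding from the different summation order. An SGD-style method has no such decomposition, because each step depends on the weights produced by the previous step.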


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
