Summary:
1. Algorithm overview
2. Algorithm derivation
3. Algorithm features, advantages, and disadvantages
4. Precautions
5. Implementation and specific examples
6. Applicable occasions
Content:
1. Algorithm overview
The most basic LR classifier is suited to binary classification (class 0, class 1). It takes the linear combination of the sample features, θ^T x = Σ_j θ_j x_j, as its independent variable and uses the logistic function to map that value into (0,1),
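where the logistic function (sigmoid function) is:
$$ g(z) = \frac{1}{1 + e^{-z}} $$
The function graph is an S-shaped curve: g(z) approaches 0 as z goes to negative infinity, equals 0.5 at z = 0, and approaches 1 as z goes to positive infinity.
This gives the model function of LR as:
$$ h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}} $$
where θ is the parameter vector to be determined.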
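As a minimal sketch of the model function, assuming NumPy and assuming that the first column of X is a constant 1 for the intercept (the names `sigmoid` and `predict_proba` are illustrative, not from the original article):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z)), applied element-wise."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(theta, X):
    """Model function h_theta(x) = g(theta^T x), evaluated for every row of X."""
    return sigmoid(X @ theta)

# Three samples; column 0 is the assumed intercept term x_0 = 1.
theta = np.array([0.0, 1.0])
X = np.array([[1.0, -2.0],
              [1.0,  0.0],
              [1.0,  2.0]])
probs = predict_proba(theta, X)   # values in (0, 1), increasing with the feature
labels = (probs >= 0.5).astype(int)  # threshold at 0.5 to get class 0 / class 1
```

For very large negative inputs `np.exp(-z)` can overflow and emit a warning; a production implementation would typically use a numerically stable variant.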
2. Algorithm derivation
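With h_θ(x) the LR model function, the likelihood function over the m training samples (x^{(i)}, y^{(i)}) is:
$$ L(\theta) = \prod_{i=1}^{m} h_\theta(x^{(i)})^{y^{(i)}} \left(1 - h_\theta(x^{(i)})\right)^{1 - y^{(i)}} $$
Taking the logarithm of the above function:
$$ l(\theta) = \log L(\theta) = \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] $$
Apply the following transformation, so that maximizing l(θ) becomes minimizing J(θ):
$$ J(\theta) = -\frac{1}{m} l(\theta) $$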
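The log-likelihood and its negated average (the usual log-loss form of the cost) can be sketched in NumPy as follows. The function names and the toy data are assumptions made for illustration only:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    """l(theta) = sum_i [ y_i * log(h_i) + (1 - y_i) * log(1 - h_i) ]."""
    h = sigmoid(X @ theta)
    return np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

def cost(theta, X, y):
    """J(theta) = -(1/m) * l(theta); minimizing J maximizes the likelihood."""
    return -log_likelihood(theta, X, y) / len(y)

# Toy data (illustrative); column 0 is the assumed intercept term.
X = np.array([[1.0, 0.5],
              [1.0, -1.5],
              [1.0, 2.0]])
y = np.array([1.0, 0.0, 1.0])

# With theta = 0 every prediction is 0.5, so the cost is log(2).
cost_at_zero = cost(np.zeros(2), X, y)
```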
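The minimum of J(θ), the negated and averaged log-likelihood, is obtained by gradient descent. The initial value of θ can be all 1.0, and the update process is (j indexes the features of a sample, n in total):
$$ \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) $$
Derivation, using g'(z) = g(z)(1 - g(z)):
$$ \frac{\partial}{\partial \theta_j} J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ \frac{y^{(i)}}{h_\theta(x^{(i)})} - \frac{1 - y^{(i)}}{1 - h_\theta(x^{(i)})} \right] h_\theta(x^{(i)}) \left(1 - h_\theta(x^{(i)})\right) x_j^{(i)} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} $$
Therefore, the update process of θ (which can have an initial value of all 1.0) can be written as:
$$ \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} $$
(i denotes the i-th sample, j the j-th feature of the sample; α is the step size)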
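One way to sanity-check the derivative used in the update is to compare it with a finite-difference approximation of the cost. The sketch below assumes the averaged log-loss cost and the gradient (1/m) Σ (h(x_i) − y_i) x_ij; the helper names and the toy data are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    """Analytic gradient, written with explicit loops over samples i and features j."""
    m, n_params = X.shape
    grad = np.zeros(n_params)
    for j in range(n_params):
        for i in range(m):
            grad[j] += (sigmoid(X[i] @ theta) - y[i]) * X[i, j]
    return grad / m

def numerical_gradient(theta, X, y, eps=1e-6):
    """Central finite differences of the cost, one coordinate at a time."""
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (cost(theta + step, X, y) - cost(theta - step, X, y)) / (2 * eps)
    return grad

X = np.array([[1.0, 0.5], [1.0, -1.5], [1.0, 2.0], [1.0, -0.3]])
y = np.array([1.0, 0.0, 1.0, 0.0])
theta = np.ones(2)                      # initial value of all 1.0, as in the text

analytic = gradient(theta, X, y)
numeric = numerical_gradient(theta, X, y)
theta_next = theta - 0.1 * analytic     # one update with step size 0.1
```

If the two gradients agree closely, the hand-derived formula is consistent with the cost function, and a single small step should reduce the cost.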
Note: The loss function of LR can be regarded as the logarithmic loss; equivalently, the derivation above is a process of estimating the parameters by the maximum log-likelihood method.
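Solution in matrix form (vectorization):
The training data are arranged in matrix form as follows; each row of X is a training sample, and each column holds the values of one feature (with x_0^{(i)} = 1 for the intercept term):
$$ X = \begin{bmatrix} x_0^{(1)} & x_1^{(1)} & \cdots & x_n^{(1)} \\ \vdots & \vdots & & \vdots \\ x_0^{(m)} & x_1^{(m)} & \cdots & x_n^{(m)} \end{bmatrix}, \quad y = \begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(m)} \end{bmatrix}, \quad \theta = \begin{bmatrix} \theta_0 \\ \vdots \\ \theta_n \end{bmatrix} $$
$$ A = X\theta, \quad E = h_\theta(X) - y = g(A) - y $$
The argument A of g(A) is a column vector, so the g function must be implemented to accept a column vector and return a column vector. h_θ(x) - y can then be evaluated as g(A) - y.
The θ update process can be changed to:
$$ \theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} E^{(i)} x_j^{(i)}, \quad \text{that is,} \quad \theta := \theta - \frac{\alpha}{m} X^T E $$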
In summary, the vectorized θ update consists of the following steps:
(1) Compute A = X*θ (matrix multiplication; X is an (m,n+1) matrix, θ is an (n+1,1) vector, A is an (m,1) vector)
(2) Compute E = g(A) - y (E and y are (m,1) vectors)
(3) Compute θ := θ - (α/m) * X^T * E (α is the step size)
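The three steps above can be sketched as a NumPy training loop. This is an illustration under stated assumptions, not the author's code: the update is taken to be θ := θ − (α/m)·XᵀE, the function name `fit_logistic`, the step size, the iteration count, and the synthetic data are all chosen for the example.

```python
import numpy as np

def sigmoid(z):
    # Accepts a column vector and returns a column vector of the same shape.
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.1, n_iters=5000):
    """Batch gradient descent for LR in vectorized form."""
    m, n_params = X.shape
    theta = np.ones((n_params, 1))               # initial value of all 1.0
    for _ in range(n_iters):
        A = X @ theta                            # step (1): shape (m, 1)
        E = sigmoid(A) - y                       # step (2): shape (m, 1)
        theta = theta - alpha / m * (X.T @ E)    # step (3): shape (n+1, 1)
    return theta

# Synthetic one-feature data (assumed): label 1 when the feature is positive.
rng = np.random.default_rng(0)
x1 = rng.normal(size=(200, 1))
X = np.hstack([np.ones((200, 1)), x1])           # column of ones for the intercept
y = (x1 > 0).astype(float)

theta = fit_logistic(X, y)
predicted = sigmoid(X @ theta) >= 0.5
accuracy = np.mean(predicted == (y == 1))
```

Keeping y, A, and E as (m,1) columns mirrors the dimensions listed in the steps and avoids accidental broadcasting between (m,) and (m,1) arrays.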
3. Algorithm features, advantages, and disadvantages
Suitable data types for the LR classifier: numeric and nominal data.
It can be used for probability prediction as well as for classification.
Its advantages are low computational cost and ease of understanding and implementation; its disadvantage is that it is prone to underfitting, so the classification accuracy may not be high.
The features do not need to satisfy the conditional independence assumption (in contrast to Naive Bayes), but the contribution of each feature is calculated independently (in contrast to decision trees).
4. Precautions
Choice of the step size α: if the value is too small, convergence is slow; if the value is too large, the iterative process is not guaranteed to converge (it can overshoot the minimum).
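A small experiment can make the slow-convergence case concrete. The sketch below runs the same number of gradient-descent iterations with two step sizes on synthetic data; the data, the two values 0.001 and 0.5, and the helper names are assumptions for illustration. Larger values can be added to the tuple to observe what happens when the step is too big.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    h = np.clip(sigmoid(X @ theta), 1e-12, 1 - 1e-12)  # guard against log(0)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def run_gradient_descent(X, y, alpha, n_iters=100):
    """Return the cost reached after n_iters updates with step size alpha."""
    theta = np.ones(X.shape[1])
    for _ in range(n_iters):
        theta = theta - alpha / len(y) * (X.T @ (sigmoid(X @ theta) - y))
    return cost(theta, X, y)

# Synthetic, noisy one-feature data (assumed for the example).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = (x + 0.5 * rng.normal(size=200) > 0).astype(float)
X = np.column_stack([np.ones(200), x])

initial_cost = cost(np.ones(2), X, y)
final_costs = {alpha: run_gradient_descent(X, y, alpha) for alpha in (0.001, 0.5)}
```

After the same 100 iterations, the very small step has barely moved away from the initial cost, while the moderate step has made far more progress.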
Normalization: when training data with multidimensional features is solved by a gradient method, the feature values must be scaled so that all features lie in a similar range; otherwise the computation may converge slowly or not at all.
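One common way to scale features is standardization (zero mean, unit variance per column). The sketch below is one possible choice, not necessarily the scaling the author used; the function name `standardize` and the sample values are hypothetical.

```python
import numpy as np

def standardize(X):
    """Scale each column to zero mean and unit standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0          # leave constant columns unchanged instead of dividing by 0
    return (X - mean) / std, mean, std

# Two features on very different scales (illustrative values).
X_raw = np.array([[1000.0, 0.1],
                  [2000.0, 0.2],
                  [3000.0, 0.4]])
X_scaled, mean, std = standardize(X_raw)

# New samples must be scaled with the statistics of the training data.
x_new_scaled = (np.array([[1500.0, 0.3]]) - mean) / std
```

The intercept column of ones, if used, should be appended after scaling so that it is not turned into zeros.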
Choice of optimization method: L-BFGS, which converges quickly (I do not fully understand this one yet).
Regularization: L1 regularization can select features and remove the effects of collinearity; used in the loss function, it avoids overfitting and at the same time outputs a sparse model.
(from http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression)
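To illustrate how an L1 penalty yields a sparse model, here is a sketch that uses proximal gradient descent (gradient step followed by soft-thresholding). This technique is swapped in for illustration; it is not the solver used by scikit-learn, and the penalty weight `lam`, the step size, and the synthetic data are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink toward zero, clipping small values to 0."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fit_l1_logistic(X, y, lam=0.2, alpha=0.5, n_iters=2000):
    """Minimize log-loss + lam * sum(|theta_j|, j >= 1) by proximal gradient descent."""
    m, n_params = X.shape
    theta = np.zeros(n_params)
    for _ in range(n_iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / m
        theta = theta - alpha * grad
        # The intercept theta[0] is left unpenalized.
        theta[1:] = soft_threshold(theta[1:], alpha * lam)
    return theta

# Synthetic data (assumed): only the first of five features carries signal.
rng = np.random.default_rng(0)
m = 400
features = rng.normal(size=(m, 5))
y = (features[:, 0] + 0.3 * rng.normal(size=m) > 0).astype(float)
X = np.hstack([np.ones((m, 1)), features])

theta = fit_l1_logistic(X, y)
selected = np.flatnonzero(theta[1:] != 0)   # indices of features kept by the penalty
```

Features whose coefficients are driven exactly to zero are effectively removed from the model, which is the feature-selection effect described above.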
5. Implementation and specific examples
Main uses of logistic regression:
Finding risk factors: for example, identifying the risk factors of a disease.
Prediction: based on the model, predict the probability that a disease or a certain condition occurs for different values of the independent variables.
- CTR prediction: http://www.flickering.cn/uncategorized/2014/10/conversion rate estimate-2 logistic regression technology/?utm_source=tuicool&utm_medium =referral
- Example of feature selection using LR with an L1 regularization term: https://github.com/Tongzhenguo/Python-Project/blob/master/learntoscikit/ lrforfeatureselect.py
- An example of risk control at a bank: http://www.weixinla.com/document/44745246.html
6. Applicable occasions
Support for large-scale data: supported, and distributed implementations exist
Feature dimension: can be very high
Whether there is an Online algorithm: available (refer to from)
Feature processing: numeric data is supported directly; categorical features require 0-1 encoding
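The 0-1 encoding of a categorical feature is commonly done as one-hot encoding: one column per category, with a 1 in the column of the sample's category. A minimal sketch, with a hypothetical helper `one_hot` and made-up category values:

```python
import numpy as np

def one_hot(values):
    """Encode a list of category labels as a 0-1 matrix, one column per category."""
    categories = sorted(set(values))
    index = {category: k for k, category in enumerate(categories)}
    encoded = np.zeros((len(values), len(categories)))
    for row, value in enumerate(values):
        encoded[row, index[value]] = 1.0
    return encoded, categories

colors = ["red", "green", "red", "blue"]
encoded, categories = one_hot(colors)
# categories is ["blue", "green", "red"]; each row of encoded has exactly one 1.
```

When an intercept term is used, one of the columns is often dropped to avoid perfectly collinear features.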
Logistic regression (LR) Summary review