Logistic regression is a generalized linear model. Its dependent variable can be binary or multi-class, but the binary case is the more common one.
First, the definition of logistic regression
Suppose there are independent variables x1, x2, ...; denote the probability that Y takes the value 1 as p = P(Y = 1 | X), so the probability that it takes 0 is 1 - p.
The ratio of these two probabilities, p/(1 - p), is called the odds of the event, and the natural logarithm of the odds is the logit transformation: logit(p) = ln(p/(1 - p)).
Let logit(p) = ln(p/(1 - p)) = z; then p = 1/(1 + e^(-z)) is the logistic function.
The expression 1/(1 + e^(-z)) is the sigmoid function, where z = β0 + β1x1 + β2x2 + β3x3 + ...
The point of the sigmoid function is to compress the value computed for each sample point into the interval between 0 and 1, damping large swings in the data and making it easy to assign a class label to each sample point (the classification criterion is whether the sigmoid output is greater than 0.5).
Note: the dependent variable is a categorical variable, and the linear combination of the independent variables is mapped into the (0, 1) interval.
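The mapping just described can be sketched in a few lines of Python (a minimal illustration, not code from the original text):

```python
import math

def sigmoid(z):
    """Map any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Large negative z gives a value near 0, large positive z a value near 1,
# and z = 0 gives exactly 0.5 -- the classification threshold.
print(sigmoid(-6))  # close to 0
print(sigmoid(0))   # exactly 0.5
print(sigmoid(6))   # close to 1
```

Whatever the magnitude of z, the output stays strictly between 0 and 1, which is what makes the "greater than 0.5" rule usable as a class label.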
Second, the idea of the algorithm
The logistic regression classification algorithm is an application of linear regression to a classification scenario.
In this scenario, the goal is to obtain a class label for each sample, rather than a regression line.
1) Algorithm goal
Compute the vertical distance from the y value of each point to the fitted line; if the
distance > 0, assign the point to class A;
distance < 0, assign the point to class B.
2) How to get the fitted line
It can only be assumed, because a straight line (or polynomial) can be expressed as
y(fit) = β0 + β1x1 + β2x2 + β3x3 + ...
where the β are the parameters to be determined,
and the x are the feature values of each dimension of the data.
So the problem above becomes, for each sample: y(x) - y(fit) > 0 ? A : B
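The ternary decision above can be sketched as follows (the coefficients and sample points here are illustrative assumptions, not values from the text):

```python
def fitted_value(x, beta):
    """y(fit) = beta[0] + beta[1]*x1 + beta[2]*x2 + ... for one sample x."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

def classify(x, y, beta):
    """Assign class 'A' if the point lies above the fitted line, else 'B'."""
    return 'A' if y - fitted_value(x, beta) > 0 else 'B'

# Illustrative line y = 1 + 2*x1, with one point above it and one below it.
print(classify([1.0], 4.0, [1.0, 2.0]))  # 4 > 3, so 'A'
print(classify([1.0], 2.0, [1.0, 2.0]))  # 2 < 3, so 'B'
```

The sign of the vertical distance y(x) - y(fit) is all that matters for the label, which is exactly the inequality stated above.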
3) How to solve for an optimal set of β parameters
Basic idea: substitute the "prior data" (samples whose labels are already known) and solve for the parameters by working backwards.
But solving for the parameters directly from inequalities is extremely difficult.
A common solution is to convert the inequality problem:
- compress the difference y(x) - y(fit) of each sample into the 0~1 interval;
- then substitute a large number of sample feature values to obtain a series of outputs;
- compare the outputs with the known classes of the samples, and adjust the parameters of the fitted line according to those comparisons; this approximates the optimal parameters of the fitted line.
This transforms the problem into a typical mathematical problem of finding an approximate solution.
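A single correction step of the scheme above might look like this sketch (the learning rate alpha is an assumed hyperparameter, not something specified in the text):

```python
import math

def sigmoid(z):
    """Compress a real value into the 0~1 interval."""
    return 1.0 / (1.0 + math.exp(-z))

def correct_once(beta, x, label, alpha=0.1):
    """Compare the compressed output with the 0/1 label for one sample
    and nudge each beta in proportion to the difference."""
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    error = label - sigmoid(z)       # difference, measured in the 0~1 interval
    beta[0] += alpha * error         # intercept acts on a constant input of 1
    for i, xi in enumerate(x):
        beta[i + 1] += alpha * error * xi
    return beta

# One sample with label 1: starting from all-zero betas, the output is 0.5,
# the error is 0.5, and every beta is nudged upward by alpha * error.
print(correct_once([0.0, 0.0], [1.0], 1))
```

Repeating this step over many labeled samples is what "adjusting the parameters according to the comparison" means in practice.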
Third, the implementation of the algorithm
Mathematical representation of the algorithmic idea:
Let the feature values of the dataset be x1, x2, x3, ...
Find their regression coefficients βi.
Let z = β0 + β1x1 + β2x2 + β3x3 + ...; substituting z into the sigmoid function and thresholding the result yields the class label.
The question is how to obtain a proper set of parameters βi.
An analytic solution is difficult to obtain, but an iterative method can conveniently approach the optimal solution.
Simply put: we repeatedly substitute sample feature values, compare the computed results with the samples' actual labels, correct the parameters according to the differences, then substitute new sample values and compute again, repeating until no further correction is needed or a preset iteration count is reached.
Note: this process is implemented with the gradient ascent method, which finds the maximum of the log-likelihood function.
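Putting the pieces together, a minimal gradient-ascent training loop might look like this (the toy dataset, learning rate, and iteration count are illustrative assumptions):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, alpha=0.1, iterations=500):
    """Gradient ascent on the log-likelihood: beta += alpha * error * x."""
    n_features = len(samples[0])
    beta = [0.0] * (n_features + 1)  # beta[0] is the intercept
    for _ in range(iterations):
        for x, label in zip(samples, labels):
            z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
            error = label - sigmoid(z)   # compare output with actual label
            beta[0] += alpha * error     # correct parameters by the difference
            for i, xi in enumerate(x):
                beta[i + 1] += alpha * error * xi
    return beta

def predict(x, beta):
    """Classify by whether the sigmoid output exceeds 0.5."""
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return 1 if sigmoid(z) > 0.5 else 0

# Toy 1-D dataset: points below 2 belong to class 0, points above 2 to class 1.
xs = [[0.5], [1.0], [1.5], [2.5], [3.0], [3.5]]
ys = [0, 0, 0, 1, 1, 1]
beta = train(xs, ys)
print(predict([1.0], beta), predict([3.0], beta))  # prints: 0 1
```

Each pass through the data is one round of "substitute, compare, correct"; the loop stops after the preset iteration count, as described above.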