1. Basic principles

Logistic regression and linear regression
The principles of logistic regression and linear regression are similar; in my own understanding, they can be described roughly as the following process:
(1) Find a suitable prediction function (called the hypothesis in Andrew Ng's open course), generally written as the h function; this is the function we need to find in order to predict the outcome for input data. This step is critical: you need some understanding or analysis of the data, so that you know or can guess the approximate form of the prediction function, e.g. a linear or nonlinear function.
(2) Construct a cost function (loss function) that measures the deviation between the predicted output h and the training-data class y, either as the difference between the two (h − y) or in some other form. Considering the "loss" over all the training data, sum or average the cost and write it as the function J(θ), which measures the total deviation between the predicted values and the actual classes over the whole training set; it is called the risk function, or the expected loss.
(3) Clearly, the smaller the value of J(θ), the more accurate the prediction function (that is, the more accurate the h function), so this step is to find the minimum of J(θ). There are different methods for minimizing a function; when logistic regression is implemented, gradient descent is the usual choice.

Classification problems and the sigmoid function
\sigma(z) = \frac{1}{1+e^{-z}}
Viewed on a large scale, the sigmoid function looks much like a step function (specifically, the Heaviside step function):
When the argument is 0, the function value is 0.5.
As the argument tends to positive infinity, the function value approaches 1.
As the argument tends to negative infinity, the function value approaches 0.
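The three properties above are easy to verify numerically. A minimal sketch in Python with NumPy (the function name `sigmoid` is my own choice):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid function: sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))     # exactly 0.5 at z = 0
print(sigmoid(100.0))   # very close to 1 as z grows large
print(sigmoid(-100.0))  # very close to 0 as z goes to negative infinity
```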
(To implement the logistic regression classifier, we can multiply each feature by a regression coefficient and then add up all the results;) this sum is fed into the sigmoid function, which yields a value in the range 0~1. Any value greater than 0.5 is classified as class 1, and any value less than 0.5 as class 0. Logistic regression can therefore also be viewed as a probability estimate.
The part in parentheses will be discussed later. In short, the classifier uses the sigmoid function, and logistic regression computes the best-fit regression coefficients.

The parameters (coefficients) of a linear regression
The result is a linear combination of several attribute (feature) values: z = w_0 x_0 + w_1 x_1 + \dots + w_n x_n
Written in vector form:
z = w^T x \tag{1}
where the vector x is the classifier's input data and the vector w is the best-fit parameter vector we are trying to find.
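The pipeline described so far (weighted sum z = w^T x, squash through the sigmoid, threshold at 0.5) can be sketched as follows; the weights and input values here are hypothetical, chosen only for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def classify(w, x):
    """Weighted sum z = w^T x, squashed through the sigmoid, thresholded at 0.5."""
    z = np.dot(w, x)   # z = w0*x0 + w1*x1 + ... + wn*xn
    p = sigmoid(z)     # value in (0, 1), read as the probability of class 1
    return 1 if p > 0.5 else 0

# Hypothetical weights and input; x0 = 1 plays the role of the intercept term
w = np.array([-1.0, 2.0, 0.5])
x = np.array([1.0, 0.8, 0.4])
print(classify(w, x))  # z = -1 + 1.6 + 0.2 = 0.8 > 0, so sigmoid(z) > 0.5: class 1
```

Note that thresholding the sigmoid output at 0.5 is equivalent to checking the sign of z, since sigmoid(z) > 0.5 exactly when z > 0.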
Combining the two previous subsections, the prediction function for logistic regression is:
h_\theta(x) = \frac{1}{1+e^{-\theta^T x}}, where \theta is the estimate of the w from the previous section.

The cost function
Loss function: measures the deviation between the predicted output h and the training-data class y, either as the difference between the two (h − y) or in some other form.
The most common form of the loss function is (h^{(i)} - y^{(i)}),
where the superscript (i) denotes the i-th sample; it is not an exponent.
A common form of the risk function is:
J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2
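This risk function, the average squared deviation between the predictions h_θ(x^{(i)}) and the labels y^{(i)} over all m training samples, can be sketched directly. The toy training set below is made up for illustration; each row of X is one sample, with a leading 1 as the intercept feature x_0:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_J(theta, X, y):
    """Average squared deviation between h_theta(x^(i)) and y^(i) over m samples."""
    m = len(y)
    h = sigmoid(X.dot(theta))       # predictions for all m samples at once
    return np.sum((h - y) ** 2) / m

# Toy training set: 3 samples, 2 features each (first column is the intercept x0)
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0])
theta = np.array([-2.0, 1.5])

print(cost_J(theta, X, y))
```

A smaller value of J(θ) means the predictions agree better with the labels, which is exactly why the training procedure (e.g. gradient descent, as in step 3 above) searches for the θ that minimizes it.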