1. Logistic regression is used for classification, not regression
The goal is to assign a label, 0 or 1, to each instance in the test data.
2. An example
Suppose we have a dataset with a single feature, the tumor size (tumorsize). We want to determine whether each tumor is benign (0) or malignant (1).
The data is assumed as follows:
We can use a linear function h(x) to divide this space, with benign on one side and malignant on the other:
$$h(x) = \omega^T x = \omega_0 x_0 + \omega_1 x_1, \quad x_0 = 1,$$
where x_1 is the tumor size.
The coefficient ω can be obtained through gradient descent.
Because of the influence of a single outlier at the far right, the fitted line is likely to be skewed, so that many instances in the training set end up on the wrong side of it. Using a linear model for classification is therefore not always a good idea.
Furthermore, the value of h(x) can be greater than 1 or less than 0, whereas we want 0 <= h(x) <= 1.
Therefore, we introduce the sigmoid function.
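Both problems can be reproduced with a few numbers. The sketch below is only an illustration under stated assumptions: the tumor sizes and labels are made up, the linear model is fitted to a squared-error loss (the text does not name the loss), and the closed-form least-squares solution is used in place of gradient descent, since both reach the same minimizer for this loss. The helper names fit_line, h and count_errors are hypothetical.

```python
# Hypothetical data, not from the article: tumor sizes and labels (0 = benign, 1 = malignant).
sizes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_line(xs, ys):
    """Closed-form least-squares fit of h(x) = w0 + w1 * x (assumed squared-error loss)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w1 = cov / var
    w0 = mean_y - w1 * mean_x
    return w0, w1


def h(x, w):
    """Linear hypothesis with x0 = 1."""
    return w[0] + w[1] * x


def count_errors(w, xs, ys):
    """Count instances whose thresholded prediction (h(x) > 0.5) disagrees with the label."""
    return sum(int(h(x, w) > 0.5) != y for x, y in zip(xs, ys))


# Fit without the outlier.
w_clean = fit_line(sizes, labels)
errors_clean = count_errors(w_clean, sizes, labels)

# Add one very large malignant tumor at the far right and fit again.
sizes_out = sizes + [40.0]
labels_out = labels + [1]
w_out = fit_line(sizes_out, labels_out)
errors_out = count_errors(w_out, sizes_out, labels_out)

print("errors without outlier:", errors_clean)
print("errors with outlier:   ", errors_out)
print("h(1.0) without outlier:", h(1.0, w_clean))   # falls below 0
print("h(40.0) with outlier:  ", h(40.0, w_out))    # rises above 1
```

On this toy data the single outlier flattens the line enough to push a malignant instance onto the benign side, and the raw outputs leave the interval [0, 1] at both ends.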
3. Sigmoid Function
The sigmoid function, sig(x) = 1 / (1 + e^(-x)), is defined on the entire real line, and its range is (0, 1). As x -> +infinity, sig(x) -> 1; as x -> -infinity, sig(x) -> 0.
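A minimal sketch of the sigmoid function in Python, assuming the standard definition sig(x) = 1 / (1 + e^(-x)). The two branches are an implementation choice made here so that math.exp never overflows for inputs of large magnitude. In floating point the result can round to exactly 0.0 or 1.0 for extreme inputs, even though the mathematical range is the open interval (0, 1).

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, evaluated in a way that avoids overflow in math.exp."""
    if x >= 0:
        # exp(-x) is at most 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(x) is below 1; this form equals 1 / (1 + exp(-x)).
    e = math.exp(x)
    return e / (1.0 + e)


for value in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(value, sigmoid(value))
```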
Let
$$h'(x) = \mathrm{sig}(h(x)) = \frac{1}{1 + e^{-\omega^T x}}.$$
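In fact, the output of this function can be viewed as P(y = 1 | x, ω). If the labels are coded as y = -1 and y = 1 instead of 0 and 1:
$$P(y = 1 \mid x, \omega) = \frac{1}{1 + e^{-\omega^T x}}, \qquad P(y = -1 \mid x, \omega) = 1 - P(y = 1 \mid x, \omega) = \frac{1}{1 + e^{\omega^T x}}$$
That is:
$$P(y \mid x, \omega) = \frac{1}{1 + e^{-y\,\omega^T x}}$$
Plotted against ω^T x, the graph of the former is the sigmoid curve, and the graph of the latter is its mirror image.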
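To see why the two cases collapse into a single expression, here is a short derivation. It assumes the labels are coded as y in {-1, +1} and uses sig(z) = 1 / (1 + e^(-z)); the shorthand z = ω^T x is introduced here for convenience only.

```latex
\begin{aligned}
P(y = 1 \mid x, \omega) &= \mathrm{sig}(z) = \frac{1}{1 + e^{-z}}, \\
P(y = -1 \mid x, \omega) &= 1 - \frac{1}{1 + e^{-z}} = \frac{e^{-z}}{1 + e^{-z}} = \frac{1}{1 + e^{z}} = \mathrm{sig}(-z), \\
P(y \mid x, \omega) &= \mathrm{sig}(y z) = \frac{1}{1 + e^{-y\,\omega^T x}}.
\end{aligned}
```

Since sig(-z) is sig(z) reflected about z = 0, the two probability curves are mirror images of each other.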
This gives a new hypothesis h'(x) whose output lies in (0, 1). When h'(x) > 0.5, we classify the tumor as malignant (1); when h'(x) < 0.5, we classify it as benign (0). When h'(x) = 0.5, the label is chosen at random.
4. Decoding Algorithm
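For logistic regression without regularization, we can use the gradient descent algorithm to minimize the negative log-likelihood. With labels y_i in {-1, +1} and N training instances:
$$NLL(\omega) = \sum_{i=1}^{N} \log\left(1 + e^{-y_i\,\omega^T x_i}\right)$$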
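Placing a prior probability on ω adds regularization to this objective. Assuming ω follows a zero-mean normal distribution N(0, σ² I), the negative log-posterior is the negative log-likelihood plus an L2 penalty:
$$J(\omega) = \sum_{i=1}^{N} \log\left(1 + e^{-y_i\,\omega^T x_i}\right) + \frac{\lambda}{2} \lVert \omega \rVert^2, \qquad \lambda = \frac{1}{\sigma^2}$$
The purpose of regularization is to prevent overfitting.
Taking this expression as the objective to minimize, use gradient descent to obtain the minimizer ω'. This is the final parameter vector of the trained model.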
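The training step can be sketched in plain Python as follows. This is an illustration under assumptions, not the article's original code: labels are coded as -1 (benign) and +1 (malignant), the objective is the sum of log(1 + exp(-y * ω^T x)) plus an L2 penalty (λ/2)·||ω||² applied to every component of ω including ω0, and the data, the step size lr, the number of steps and the value of lam are all made up for the example.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def objective(w, xs, ys, lam):
    """Sum of log(1 + exp(-y * (w0 + w1 * x))) plus (lam / 2) * ||w||^2."""
    total = 0.5 * lam * (w[0] ** 2 + w[1] ** 2)
    for x, y in zip(xs, ys):
        m = y * (w[0] + w[1] * x)
        # Stable evaluation of log(1 + exp(-m)).
        total += max(-m, 0.0) + math.log1p(math.exp(-abs(m)))
    return total


def gradient(w, xs, ys, lam):
    """Gradient of the objective with respect to (w0, w1)."""
    g0, g1 = lam * w[0], lam * w[1]
    for x, y in zip(xs, ys):
        # d/dw log(1 + exp(-y * w.x)) = -y * sig(-y * w.x) * x, with x0 = 1.
        coeff = -y * sigmoid(-y * (w[0] + w[1] * x))
        g0 += coeff
        g1 += coeff * x
    return g0, g1


def train(xs, ys, lam=0.01, lr=0.01, steps=20000):
    """Plain batch gradient descent starting from w = (0, 0)."""
    w = [0.0, 0.0]
    for _ in range(steps):
        g0, g1 = gradient(w, xs, ys, lam)
        w[0] -= lr * g0
        w[1] -= lr * g1
    return w


# Hypothetical training data: tumor sizes with labels -1 (benign) and +1 (malignant).
sizes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
labels = [-1, -1, -1, -1, 1, 1, 1, 1]

w_trained = train(sizes, labels)
print("trained parameters:", w_trained)
print("objective at start:", objective([0.0, 0.0], sizes, labels, 0.01))
print("objective at end:  ", objective(w_trained, sizes, labels, 0.01))
```

A fixed step size is the simplest choice for a sketch; it has to be small enough for the objective to decrease at every step, which is why lr is kept at 0.01 for features of this scale.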
For new data x', we compute f(x') = h'(x'; ω'). If f(x') > 0.5, the prediction is malignant; if f(x') < 0.5, the prediction is benign. The function value is also the estimated probability that the tumor is malignant.
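The prediction rule can be sketched as below. The parameter values in w_trained are hypothetical placeholders rather than the result of an actual training run, and the function names predict_proba and predict are chosen here for illustration. A tie at exactly 0.5 is broken at random, following the rule stated for the hypothesis earlier.

```python
import math
import random


def sigmoid(z):
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, w):
    """Estimated probability that a tumor of size x is malignant: sig(w0 + w1 * x)."""
    return sigmoid(w[0] + w[1] * x)


def predict(x, w):
    """Return 1 (malignant) or 0 (benign); a tie at exactly 0.5 is decided at random."""
    p = predict_proba(x, w)
    if p > 0.5:
        return 1
    if p < 0.5:
        return 0
    return random.randint(0, 1)


# Hypothetical trained parameters (w0, w1), for illustration only.
w_trained = (-9.0, 2.0)

for size in (2.0, 4.0, 7.0):
    print(size, predict_proba(size, w_trained), predict(size, w_trained))
```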