1 Sigmoid function
2 Maximum likelihood estimate (MLE) and loss function
3 Gradient descent
4 Another form of the loss function and its gradient
1.1 sigmoid function
Since the result of binary classification is 1 or 0, it closely resembles the mathematical step function; however, the step function has an abrupt jump at x = 0, and that discontinuity is difficult to handle mathematically. Therefore, the sigmoid function is generally used to fit it:
g(z) = \frac{1}{1+e^{-z}} \tag{1}
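As a quick sketch (NumPy-based; the function name is illustrative), the sigmoid can be implemented and checked at a few points:

```python
import numpy as np

def sigmoid(z):
    """Smooth stand-in for the step function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Unlike the step function, there is no abrupt jump at z = 0:
print(sigmoid(0.0))    # exactly 0.5 at the midpoint
print(sigmoid(10.0))   # saturates toward 1
print(sigmoid(-10.0))  # saturates toward 0
```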
Specifically applied to the logistic regression algorithm:
z = \omega_0 + \omega_1 x_1 + \omega_2 x_2 + \cdots + \omega_n x_n = \sum_{i=0}^{n} \omega_i x_i = \mathbf{\omega}^{T}\mathbf{x} \tag{2}
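A minimal sketch of how equation (2) is evaluated in code, assuming NumPy and purely illustrative coefficient values; a constant x_0 = 1 is prepended so that \omega_0 acts as the intercept:

```python
import numpy as np

def predict_proba(omega, x):
    """Compute z = omega^T x (with x_0 = 1 prepended), then apply the
    sigmoid to turn z into a probability for class 1."""
    x_aug = np.concatenate(([1.0], x))  # x_0 = 1, so omega_0 is the intercept
    z = omega @ x_aug                   # z = sum_{i=0}^{n} omega_i * x_i
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients: omega_0 (intercept) plus two attribute weights.
omega = np.array([-1.0, 2.0, 0.5])
x = np.array([0.5, 1.0])
print(predict_proba(omega, x))  # sigmoid(-1 + 2*0.5 + 0.5*1) = sigmoid(0.5)
```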
Here x_i represents the value of a sample attribute (for us, the tag IP), and \omega_i represents the coefficient corresponding to that attribute (that is, what the algorithm needs to compute). Note that x_0 and \omega_0 are also introduced into the formula above, where x_0 is the constant 1. The problem then becomes: given training samples with known attributes x and final classification results y (1 or 0), how to obtain the coefficients \omega_i so that the loss is minimal.

1.2 Maximum likelihood estimation (MLE) and the loss function
In machine learning theory, a loss function is used to measure the degree of inconsistency between the model's predicted value f(x) and the true value y. It is a non-negative real-valued function, and the smaller the loss, the better the model fits (subject to problems such as overfitting). The loss function is the core of the empirical risk function and also an important part of the structural risk function. The structural risk function of a model consists of an empirical risk term and a regularization term, and can usually be expressed as follows:
\omega^* = \arg\min_{\omega} \frac{1}{N}\sum_{i=1}^{N} L\big(y_i, f(x_i; \omega)\big) + \lambda\,\Phi(\omega)
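As a concrete sketch of the structural risk described above (the choice of log loss for L and the squared L2 norm for the regularizer \Phi is an assumption made here for illustration, not something fixed by the text):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def structural_risk(omega, X, y, lam):
    """Empirical risk (mean log loss of the logistic model) plus a
    regularization term lam * ||omega||^2 (one common choice of Phi)."""
    p = sigmoid(X @ omega)  # predicted P(y = 1) for each sample
    empirical = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    return empirical + lam * np.sum(omega ** 2)

# Tiny illustrative data: the first column is the constant x_0 = 1.
X = np.array([[1.0, 0.5],
              [1.0, -1.0]])
y = np.array([1.0, 0.0])
omega = np.array([0.3, -0.2])
print(structural_risk(omega, X, y, lam=0.1))
```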