Classification and logistic regression
Next we discuss the classification problem. It is much like the regression problem, except that the values $y$ we want to predict take on only a small number of discrete values. For now we focus on the binary classification problem, in which $y$ can take only the values 0 and 1.
Logistic regression
Construct the hypothesis function $h_{\theta}(x)$:
$h_{\theta}(x) = g(\theta^{T}x) = \frac{1}{1+e^{-\theta^{T}x}}$
where
$g(z) = \frac{1}{1+e^{-z}}$
$g'(z) = g(z)(1-g(z))$
(Figure: graph of $g(z)$, the sigmoid function.)
(Figure: graph of $g'(z)$.)
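As a minimal sketch of these definitions, here is the sigmoid, its derivative, and the hypothesis in Python. NumPy and the function names are assumptions for illustration; the notes do not specify an implementation.

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z}), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """g'(z) = g(z) * (1 - g(z))."""
    gz = sigmoid(z)
    return gz * (1.0 - gz)

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x) for a single feature vector x."""
    return sigmoid(np.dot(theta, x))
```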
Assume:
$P(y=1 \mid x; \theta) = h_{\theta}(x)$
$P(y=0 \mid x; \theta) = 1 - h_{\theta}(x)$
These two cases can be written more compactly as:
$p(y \mid x; \theta) = (h_{\theta}(x))^{y} (1-h_{\theta}(x))^{1-y}$
where $y$ takes the value 0 or 1.
Given $m$ training examples, the likelihood of the parameters can be written as:
$L(\theta) = p(\vec{y} \mid X; \theta)$
$L(\theta) = \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta)$
$L(\theta) = \prod_{i=1}^{m} (h_{\theta}(x^{(i)}))^{y^{(i)}} (1-h_{\theta}(x^{(i)}))^{1-y^{(i)}}$
To make the maximization easier, we first take the logarithm:
$\ell(\theta) = \log L(\theta)$
$\ell(\theta) = \sum_{i=1}^{m} y^{(i)} \log h(x^{(i)}) + (1-y^{(i)}) \log(1-h(x^{(i)}))$
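A sketch of this log-likelihood in code, under the assumed shapes that $X$ is an $(m, n)$ design matrix and $y$ an $(m,)$ vector of 0/1 labels:

```python
import numpy as np

def log_likelihood(theta, X, y):
    """l(theta) = sum_i y^(i) log h(x^(i)) + (1 - y^(i)) log(1 - h(x^(i)))."""
    h = 1.0 / (1.0 + np.exp(-X @ theta))  # h_theta(x^(i)) for every example
    return np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
```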
For a single training example $(x, y)$, the gradient of the log-likelihood with respect to $\theta_{j}$ is:
$\frac{\partial}{\partial \theta_{j}} \ell(\theta) = (y - h_{\theta}(x)) x_{j}$
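The step from the log-likelihood to this gradient uses the identity $g'(z) = g(z)(1-g(z))$ from above. As a sketch of the chain-rule derivation for one example $(x, y)$:
$\frac{\partial}{\partial \theta_{j}} \ell(\theta) = \left(\frac{y}{g(\theta^{T}x)} - \frac{1-y}{1-g(\theta^{T}x)}\right) \frac{\partial}{\partial \theta_{j}} g(\theta^{T}x)$
$= \left(\frac{y}{g(\theta^{T}x)} - \frac{1-y}{1-g(\theta^{T}x)}\right) g(\theta^{T}x)(1-g(\theta^{T}x)) x_{j}$
$= \left(y(1-g(\theta^{T}x)) - (1-y)g(\theta^{T}x)\right) x_{j} = (y - h_{\theta}(x)) x_{j}$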
Therefore, since we are maximizing the log-likelihood, the stochastic gradient ascent update rule is:
$\theta_{j} := \theta_{j} + \alpha (y^{(i)} - h_{\theta}(x^{(i)})) x_{j}^{(i)}$
This update looks identical in form to the LMS update rule; when we study generalized linear models (GLMs), we will see why.
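Putting the pieces together, here is a minimal sketch of a stochastic gradient ascent training loop for logistic regression. The function name, `learning_rate`, and `n_epochs` are illustrative assumptions, not from the notes; $X$ is assumed to be $(m, n)$ and $y$ an $(m,)$ vector of 0/1 labels.

```python
import numpy as np

def sgd_logistic(X, y, learning_rate=0.1, n_epochs=100):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_epochs):
        for i in range(m):
            # h_theta(x^(i)) = g(theta^T x^(i))
            h_i = 1.0 / (1.0 + np.exp(-np.dot(theta, X[i])))
            # theta_j := theta_j + alpha * (y^(i) - h_theta(x^(i))) * x_j^(i)
            theta += learning_rate * (y[i] - h_i) * X[i]
    return theta
```

Each inner-loop step updates $\theta$ using a single example, which is exactly the update rule above applied one example at a time.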