Margins: Intuition
In logistic regression, the probability $p(y=1 \mid x;\theta)$ is modeled by $h_{\theta}(x) = g(\theta^{T}x)$. When $h_{\theta}(x) \geq 0.5$, or equivalently when $\theta^{T}x \geq 0$, the prediction for input $x$ is 1. The larger $\theta^{T}x$ is, i.e., when $\theta^{T}x \gg 0$, the more confident we are that the prediction 1 is correct.
For example, a point A far from the decision boundary can be confidently predicted as 1, while a point C too close to the decision boundary could have its predicted label flipped by a small change in the boundary. Therefore, we want to find a decision boundary that lets us make confident, correct predictions on the training samples.
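As a quick numeric illustration (the values of $\theta^{T}x$ below are made up), the further $\theta^{T}x$ is from 0, the closer the modeled probability is to 0 or 1:

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values of theta^T x: far on the positive side
# (like point A), and barely positive (like point C near the boundary).
p_far = sigmoid(6.0)    # confident prediction of 1
p_near = sigmoid(0.2)   # predicted 1, but with little confidence
print(round(p_far, 3), round(p_near, 3))  # -> 0.998 0.55
```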
Notation
To discuss SVMs more conveniently, we introduce new notation. For the label $y$ and features $x$ in a binary classification problem, we let $y \in \{-1, 1\}$ denote the class label. Unlike the earlier linear classifier parameterized by the vector $\theta$, we use parameters $w, b$, and the classifier is as follows:
$h_{w,b}(x) = g(w^{T}x + b)$
The $b$ in the formula corresponds to the earlier $\theta_{0}$, and $w$ corresponds to $[\theta_{1}, \dots, \theta_{n}]^{T}$. We define $g(z) = 1$ when $z \geq 0$ and $g(z) = -1$ when $z < 0$ (note the output is $-1$, not 0, to match the labels $y \in \{-1, 1\}$).
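A minimal sketch of this classifier in Python (the parameter values $w, b$ are made up for illustration):

```python
def g(z):
    # SVM sign convention: g(z) = 1 if z >= 0, else -1
    return 1 if z >= 0 else -1

def h(w, b, x):
    # h_{w,b}(x) = g(w^T x + b)
    return g(sum(wi * xi for wi, xi in zip(w, x)) + b)

w, b = [2.0, -1.0], 0.5          # hypothetical parameters
print(h(w, b, [1.0, 1.0]))       # w^T x + b = 1.5  -> 1
print(h(w, b, [-1.0, 1.0]))      # w^T x + b = -2.5 -> -1
```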
Functional and geometric margins
Define the functional margin of $(w,b)$ with respect to a training sample $(x^{(i)}, y^{(i)})$:
$\hat{\gamma}^{(i)} = y^{(i)}(w^{T}x^{(i)} + b)$
If $y^{(i)} = 1$, then for the functional margin to be large, we need $w^{T}x^{(i)} + b$ to be a large positive number; conversely, if $y^{(i)} = -1$, we need it to be a large negative number.
For the $g$ given above, if we change $w$ to $2w$ and $b$ to $2b$, then $g(w^{T}x + b)$ becomes $g(2w^{T}x + 2b)$, which does not alter $h_{w,b}(x)$ at all, because the prediction depends only on the sign of $w^{T}x + b$, not its magnitude. So we can make the functional margin arbitrarily large without changing anything meaningful.
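A small numeric sketch of this point (with made-up parameters): scaling $(w,b)$ to $(2w,2b)$ leaves the prediction unchanged but doubles the functional margin.

```python
def g(z):
    return 1 if z >= 0 else -1

def predict(w, b, x):
    return g(sum(wi * xi for wi, xi in zip(w, x)) + b)

def functional_margin(w, b, x, y):
    # hat{gamma}^{(i)} = y^{(i)} (w^T x^{(i)} + b)
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

w, b = [2.0, -1.0], 0.5   # hypothetical parameters
x, y = [1.0, 1.0], 1

# Prediction is invariant under (w, b) -> (2w, 2b)...
assert predict(w, b, x) == predict([2 * wi for wi in w], 2 * b, x)
# ...but the functional margin doubles.
print(functional_margin(w, b, x, y))                         # -> 1.5
print(functional_margin([2 * wi for wi in w], 2 * b, x, y))  # -> 3.0
```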
Intuitively, it may therefore make sense to impose a normalization condition such as $\left\| w \right\|_{2} = 1$, i.e., to replace $(w,b)$ with $(w/\left\| w \right\|_{2}, b/\left\| w \right\|_{2})$.
Given a training set $S = \{(x^{(i)}, y^{(i)});\ i = 1, \dots, m\}$, define the functional margin of $(w,b)$ with respect to $S$ as the smallest functional margin over the individual samples (my personal understanding: roughly the analogue of the minimum distance from the sample points to the decision boundary):
$\hat{\gamma} = \underset{i=1,\dots,m}{\min}\ \hat{\gamma}^{(i)}$
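The minimum above can be computed directly; here is a sketch on a made-up toy training set with hypothetical parameters:

```python
def functional_margin(w, b, x, y):
    # hat{gamma}^{(i)} = y^{(i)} (w^T x^{(i)} + b)
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical toy training set S = [(x, y), ...]
S = [([1.0, 1.0], 1), ([2.0, 0.0], 1), ([-1.0, -1.0], -1)]
w, b = [2.0, -1.0], 0.5

# hat{gamma}: the smallest per-sample functional margin
gamma_hat = min(functional_margin(w, b, x, y) for x, y in S)
print(gamma_hat)  # -> 0.5 (attained by the third sample)
```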
Next, we discuss the geometric margin:
In the figure, $(w,b)$ corresponds to the decision boundary, and $w$ is orthogonal to the separating hyperplane. Point A represents the input $x^{(i)}$ of a sample with label $y^{(i)} = 1$; its distance to the decision boundary, $\gamma^{(i)}$, is given by the line segment AB.
So how do we calculate $\gamma^{(i)}$? First, $w/\|w\|$ is the unit vector pointing in the same direction as $w$. Since point A represents $x^{(i)}$, point B can be expressed as $x^{(i)} - \gamma^{(i)} \cdot w/\|w\|$.
Every point $x$ on the decision boundary satisfies $w^{T}x + b = 0$, so:
$w^{T}\left(x^{(i)} - \gamma^{(i)}\frac{w}{\|w\|}\right) + b = 0$
Solving for $\gamma^{(i)}$:
$w^{T}\left(x^{(i)} - \gamma^{(i)}\frac{w}{\|w\|}\right) + b = 0$

$w^{T}x^{(i)} + b = w^{T}\gamma^{(i)}\frac{w}{\|w\|}$

Because $\gamma^{(i)}$ is a scalar and $w$ is a column vector, $\gamma^{(i)}$ can be pulled out front:

$w^{T}x^{(i)} + b = \gamma^{(i)}\, w^{T}\frac{w}{\|w\|}$

$w^{T}x^{(i)} + b = \gamma^{(i)} \|w\|$

and therefore

$\gamma^{(i)} = \frac{w^{T}x^{(i)} + b}{\|w\|} = \left(\frac{w}{\|w\|}\right)^{T}x^{(i)} + \frac{b}{\|w\|}$
This result was derived for a point on the $y = 1$ side of the boundary; the more general formula, valid for samples of either class, is:

$\gamma^{(i)} = y^{(i)}\left(\left(\frac{w}{\|w\|}\right)^{T}x^{(i)} + \frac{b}{\|w\|}\right)$
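The derivation can be checked numerically (all parameter values below are made up): the geometric margin computed this way places point B $= x^{(i)} - \gamma^{(i)}\, w/\|w\|$ exactly on the hyperplane.

```python
import math

def geometric_margin(w, b, x, y):
    # gamma^{(i)} = y^{(i)} * ((w/||w||)^T x^{(i)} + b/||w||)
    norm = math.sqrt(sum(wi * wi for wi in w))
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

w, b = [3.0, 4.0], -5.0   # hypothetical parameters, ||w|| = 5
x, y = [2.0, 2.0], 1

gamma = geometric_margin(w, b, x, y)
print(gamma)  # -> 1.8

# Point B = x - gamma * w/||w|| should satisfy w^T B + b = 0
norm = math.sqrt(sum(wi * wi for wi in w))
B = [xi - gamma * wi / norm for xi, wi in zip(x, w)]
print(round(sum(wi * Bi for wi, Bi in zip(w, B)) + b, 10))  # -> 0.0
```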
The optimal margin classifier
Chapter 5: Support Vector Machines