Two very common nonlinearities: rectified linear units (ReLUs) and leaky ReLUs.
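For concreteness, these are the standard definitions (the parameter α is the usual leak slope, not notation from this note):

$$\mathrm{ReLU}(x) = \max(x,\, 0), \qquad \mathrm{leakyReLU}_\alpha(x) = \max(x,\, \alpha x), \quad 0 < \alpha < 1.$$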
For binary classification we use the binary hinge loss; for multiclass classification we can define a multiclass hinge loss.
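As a reference, one standard form of each (the multiclass version shown is the Crammer–Singer variant; the note does not specify which variant is meant): for a label $y \in \{-1, +1\}$ and score $\hat y \in \mathbb{R}$,

$$\ell(y, \hat y) = \max(0,\, 1 - y\hat y),$$

and for class scores $f(x) \in \mathbb{R}^K$ with true class $y$,

$$\ell(y, f(x)) = \max\Big(0,\, 1 + \max_{j \neq y} f_j(x) - f_y(x)\Big).$$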
Let Ω denote the parameter space of the network, and let L(ω) denote the loss.
Since the network uses ReLU nonlinearities and the hinge loss is itself piecewise linear, L(ω) is piecewise multilinear in the parameters. The parameter space decomposes into finitely many open cells Ω_u together with a boundary set N; inside each cell (where the activation pattern of every ReLU is fixed) the loss L(ω) is smooth, while on the boundary it is non-differentiable.
Now restrict the loss to a single cell Ω_u, where it takes a multilinear form. A multilinear function is harmonic, so by the strong maximum principle it attains no extremum in the interior of the cell unless it is constant there. In other words, the loss L(ω) of a ReLU network with hinge loss has no differentiable local extrema except at points where the loss is locally constant.
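The harmonicity step is a one-line computation: inside a cell, each parameter ω_i enters the multilinear form at most to the first power, so every pure second partial derivative vanishes and the Laplacian is zero:

$$\frac{\partial^2 L}{\partial \omega_i^2} = 0 \quad \text{for all } i, \qquad \Delta L = \sum_i \frac{\partial^2 L}{\partial \omega_i^2} = 0.$$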
So far, we can see that local extrema fall into two types:
Type I (flat): the local extremum lies inside a cell, and the loss is locally constant.
Type II (sharp): the local extremum lies on the boundary N.
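A toy one-dimensional picture of the two types (my illustration, not an example from the paper):

$$L_{\mathrm{I}}(\omega) = \max(0,\, 1 - \omega) \quad \text{is constant } (= 0) \text{ on the cell } \omega > 1: \text{ a flat minimum};$$

$$L_{\mathrm{II}}(\omega) = |\omega| + 1 \quad \text{has a sharp, non-differentiable minimum at } \omega = 0 \text{ with } L = 1 > 0.$$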
Main Result 1. At a Type II local extremum, L(ω) > 0.
In other words, if zero loss is attainable, then every Type II extremum is sub-optimal.
Now consider a more general situation: fully connected networks with leaky ReLU nonlinearities. Then we have the following result.
Main Result 2. At a Type I local extremum, L(ω) = 0; at a Type II local extremum, L(ω) > 0.
Hence, when zero loss is attainable, flat local minima are optimal and sharp local minima are sub-optimal; when zero loss is not attainable, all local extrema are sharp.
To be continued ...
Reference: Laurent and von Brecht, "The Multilinear Structure of ReLU Networks".