Focal Loss: "Focal Loss for Dense Object Detection". Paper link: https://arxiv.org/abs/1708.02002
As is well known, current object detection algorithms fall into two main categories: two-stage detectors and one-stage detectors. Two-stage detectors include R-CNN, Fast R-CNN, Faster R-CNN, and R-FCN; one-stage detectors include SSD. The former achieve high accuracy but slow detection speed, while the latter are faster but less accurate.
For two-stage detectors, proposals are usually generated by an RPN, and R-CNN then classifies the proposals and regresses the bounding boxes. One advantage of this design is that it facilitates feature alignment between the samples and the model, which makes classification and bounding box regression easier. In addition, the imbalance between positive and negative samples is handled explicitly in both the RPN and R-CNN: the RPN directly limits the positive-to-negative ratio to 1:1 (for a fixed rpn_batch_size, when there are not enough positives the batch is filled with negatives), while R-CNN either limits the ratio to 1:3 or uses OHEM.
For one-stage detectors, feature alignment between the samples and the model can only be achieved through the receptive field, and predictions are made directly by regression, so there is a severe positive/negative sample imbalance (roughly 1:1000). Negative samples are so numerous that they occupy the majority of the loss, and most of them are easy to classify, which pulls the training of the model away from the desired direction. The author argues that this severe data imbalance is the main cause of the lower accuracy of one-stage detectors, and proposes focal loss to solve the problem.
The data imbalance problem can be addressed by manually controlling the positive/negative sample ratio or by OHEM, but both methods are rather crude: such a "one-size-fits-all" approach may cause some hard examples to be ignored. The author therefore proposes a new loss function, focal loss, which does not ignore any samples while at the same time making the model's training focus more on hard examples. The principle of focal loss is explained briefly below.
Focal loss is an improvement on the standard cross entropy loss. Taking binary classification as an example, and writing pt for the model's estimated probability of the ground-truth class (pt = p if y = 1, and pt = 1 − p otherwise), the standard cross entropy loss is CE(pt) = −log(pt).
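As a quick sketch of the pt convention and the standard cross entropy above (binary case only; the helper names are my own):

```python
import math

def p_t(p, y):
    # p: predicted probability of the positive class, y: label in {0, 1};
    # pt is the model's estimated probability of the ground-truth class
    return p if y == 1 else 1.0 - p

def cross_entropy(p, y):
    # standard binary cross entropy: CE(pt) = -log(pt)
    return -math.log(p_t(p, y))

# even a well-classified sample (pt = 0.9) contributes non-trivial loss,
# which is why a flood of easy negatives can dominate training
print(cross_entropy(0.9, 1))  # ≈ 0.105
print(cross_entropy(0.6, 0))  # ≈ 0.916
```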
To address class imbalance, the loss contribution of the different classes can be controlled by adding a weighting factor αt (αt = α for the positive class and 1 − α for the negative class), giving the balanced cross entropy loss CE(pt) = −αt·log(pt).
But balanced cross entropy still has no way to focus training on hard examples. In fact, the larger the predicted probability pt of the correct class, the more easily classified the sample is. The final focal loss therefore adds a modulating factor (1 − pt)^γ that down-weights easy examples: FL(pt) = −αt·(1 − pt)^γ·log(pt).
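A minimal sketch of the focal loss formula above (binary case; the function name is my own, and the defaults α = 0.25, γ = 2 are the values the paper recommends):

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    # binary focal loss: FL(pt) = -alpha_t * (1 - pt)**gamma * log(pt)
    pt = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - pt) ** gamma * math.log(pt)

# with alpha = 1 and gamma = 0 this reduces to plain cross entropy;
# gamma = 2 shrinks the loss of an easy sample (pt = 0.9) by a factor
# of (1 - 0.9)**2 = 0.01, while a hard sample (pt = 0.1) keeps
# (1 - 0.1)**2 = 0.81 of its loss
easy = focal_loss(0.9, 1, alpha=1.0)
hard = focal_loss(0.1, 1, alpha=1.0)
```

The modulating factor (1 − pt)^γ is what distinguishes focal loss from simple class weighting: it reweights each sample by its difficulty, not merely by its class.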
Focal loss thus has two hyperparameters, αt and γ; the loss under different settings is shown in Figure 1. From Figure 4 we can see that varying γ has little effect on the cumulative loss distribution of positive (foreground) samples, but a very large effect on that of negative (background) samples: at γ = 2, nearly 99% of background samples contribute only a very small loss.
Next, to validate focal loss, the author proposes a new one-stage detector architecture, RetinaNet, which uses a ResNet + FPN backbone with anchors covering 15 scales, as shown in Figure 3.
Table 1 gives some experimental results for RetinaNet and focal loss. Adding α class balancing increases AP by 0.9, and adding the γ modulating factor brings AP up to 37.8. Compared with OHEM, focal loss improves AP by 3.2. As can be seen from Table 2, with longer training and the adoption of scale jitter, AP eventually reaches 39.1.
That concludes the analysis of the principle and experimental results of focal loss; now let us look at its backward gradient propagation. First, the gradient of the softmax activation is ∂pi/∂zj = pi·(δij − pj), where zj are the logits and δij equals 1 when i = j and 0 otherwise.
With the softmax gradient in hand, the chain rule gives the backward gradient of focal loss (omitting αt) as ∂FL/∂zj = [γ·(1 − pt)^(γ−1)·log(pt) − (1 − pt)^γ/pt]·pt·(δtj − pj), where t is the index of the ground-truth class.
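The chain-rule gradient above can be checked numerically. A sketch (function names are my own; αt is omitted, as in the derivation):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift by max for numerical stability
    return e / e.sum()

def focal_loss_softmax(z, t, gamma=2.0):
    # FL = -(1 - pt)**gamma * log(pt), with pt = softmax(z)[t]
    pt = softmax(z)[t]
    return -(1.0 - pt) ** gamma * np.log(pt)

def focal_loss_grad(z, t, gamma=2.0):
    # chain rule: dFL/dz_j = dFL/dpt * dpt/dz_j,
    # using the softmax derivative dpt/dz_j = pt * (delta_tj - p_j)
    p = softmax(z)
    pt = p[t]
    dFL_dpt = gamma * (1.0 - pt) ** (gamma - 1) * np.log(pt) - (1.0 - pt) ** gamma / pt
    delta = np.zeros_like(p)
    delta[t] = 1.0
    return dFL_dpt * pt * (delta - p)

# central-difference check of the analytic gradient
z, t, eps = np.array([0.5, -1.2, 2.0]), 0, 1e-6
numeric = np.array([
    (focal_loss_softmax(z + eps * e, t) - focal_loss_softmax(z - eps * e, t)) / (2 * eps)
    for e in np.eye(len(z))
])
assert np.allclose(numeric, focal_loss_grad(z, t), atol=1e-5)
```

Note that at γ = 0 the factor in brackets becomes −1/pt, and the expression collapses to the familiar softmax cross entropy gradient p − y.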
Conclusion: focal loss is mainly used to solve the data imbalance problem and can be regarded as an extension of the OHEM algorithm. The authors apply focal loss to a one-stage detector, but in practice this way of resolving data imbalance is also effective for two-stage detectors.
Related links:
Focal Loss
Focal Loss paper reading notes
A step-by-step worked example of the softmax function and its derivation
How to evaluate Kaiming He's "Focal Loss for Dense Object Detection"