Talk about Focal Loss and its backpropagation

Focal loss: Focal Loss for Dense Object Detection. Paper link: https://arxiv.org/abs/1708.02002

As we all know, current object detection algorithms fall into two main categories: two-stage detectors and one-stage detectors. Two-stage detectors mainly include R-CNN, Fast R-CNN, Faster R-CNN and R-FCN, while one-stage detectors include networks such as SSD. The former are more accurate but slower; the latter are faster but less accurate.

For two-stage detectors, proposals are usually generated by an RPN, and R-CNN then performs classification and bounding-box regression on those proposals. One advantage of this design is that it facilitates feature alignment between the samples and the model, which makes classification and bounding-box regression easier. In addition, both stages handle the imbalance between positive and negative samples: the RPN directly limits the positive-to-negative ratio to 1:1 (for a fixed rpn_batch_size, if there are not enough positives, the batch is filled with negatives), while R-CNN directly limits the positive-to-negative ratio to 1:3 or uses OHEM.
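
As a rough illustration of this kind of ratio-based sampling, here is a minimal NumPy sketch; the helper name `sample_minibatch` and its defaults are hypothetical, not the actual Faster R-CNN code:

```python
import numpy as np

def sample_minibatch(labels, batch_size=256, positive_fraction=0.5):
    """Hypothetical sketch of ratio-based sampling: cap the positives at
    batch_size * positive_fraction, then fill the rest of the minibatch
    with random negatives (0.5 gives RPN's 1:1, 0.25 gives R-CNN's 1:3)."""
    pos_idx = np.flatnonzero(labels == 1)
    neg_idx = np.flatnonzero(labels == 0)
    num_pos = min(len(pos_idx), int(batch_size * positive_fraction))
    num_neg = min(len(neg_idx), batch_size - num_pos)  # fill with negatives
    keep_pos = np.random.choice(pos_idx, num_pos, replace=False)
    keep_neg = np.random.choice(neg_idx, num_neg, replace=False)
    return np.concatenate([keep_pos, keep_neg])

labels = np.random.binomial(1, 0.05, size=5000)   # ~5% positives
idx = sample_minibatch(labels)
print(len(idx), labels[idx].sum())                # 256 sampled, <= 128 positive
```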

For one-stage detectors, feature alignment between the samples and the model can only be realized through the receptive field, and predictions are made directly by regression, so there is a severe imbalance between positive and negative samples (roughly 1:1000). The proportion of negative samples is so high that they dominate the loss, and since most of them are easy to classify, model training is pushed in an unhelpful direction. The author considers this severe data imbalance to be the main cause of the low accuracy of one-stage detectors, and proposes focal loss to solve the problem.

Manually controlling the positive-to-negative ratio or using OHEM can mitigate the data imbalance, but both methods are rather crude, and such a "one-size-fits-all" approach may cause some hard examples to be ignored. The author therefore proposes a new loss function, focal loss, which ignores no samples and at the same time makes model training focus more on hard examples. The principle of focal loss is briefly explained below.

Focal loss is an improvement on the standard cross-entropy loss. Taking binary classification as an example, the standard cross-entropy loss function is

$$\mathrm{CE}(p, y) = \begin{cases} -\log(p) & \text{if } y = 1 \\ -\log(1 - p) & \text{otherwise} \end{cases}$$

Writing $p_t = p$ when $y = 1$ and $p_t = 1 - p$ otherwise, this becomes $\mathrm{CE}(p_t) = -\log(p_t)$.
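
As a quick illustration, a minimal NumPy sketch of this loss (the argument names `p` and `y` follow the formula above):

```python
import numpy as np

def binary_cross_entropy(p, y):
    """Standard binary cross-entropy, CE(p_t) = -log(p_t)."""
    p_t = np.where(y == 1, p, 1.0 - p)
    return -np.log(p_t)
```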

To deal with class imbalance, the loss contribution of each class can be controlled by adding a weighting factor $\alpha_t$ (by convention $\alpha_t = \alpha$ for the positive class and $1 - \alpha$ for the negative class). The improved, balanced cross-entropy loss is then

$$\mathrm{CE}(p_t) = -\alpha_t \log(p_t)$$
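
The same sketch extended with the class weight $\alpha_t$, assuming the convention just described:

```python
import numpy as np

def balanced_cross_entropy(p, y, alpha=0.25):
    """Alpha-balanced cross-entropy, CE(p_t) = -alpha_t * log(p_t)."""
    p_t = np.where(y == 1, p, 1.0 - p)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * np.log(p_t)
```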

However, balanced cross-entropy loss still has no way to focus training on hard examples. In fact, the larger the predicted probability of a sample's correct class, the easier that sample is to classify, so easy examples can be down-weighted by a modulating factor $(1 - p_t)^\gamma$. The final focal loss is

$$\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^\gamma \log(p_t)$$
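
Putting the pieces together, a minimal NumPy sketch of focal loss; $\gamma = 2$ and $\alpha = 0.25$ are the defaults recommended in the paper:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).
    gamma=0 reduces it to the alpha-balanced cross-entropy above."""
    p_t = np.where(y == 1, p, 1.0 - p)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# An easy example (p_t = 0.9) is down-weighted by 1/(1 - 0.9)^2 = 100x
print(focal_loss(np.array([0.9]), np.array([1])))           # ~0.00026
print(focal_loss(np.array([0.9]), np.array([1]), gamma=0))  # ~0.026
```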

Focal loss therefore has two hyperparameters, $\alpha_t$ and $\gamma$; Figure 1 of the paper plots the loss for different settings. From Figure 4 we can see that changing $\gamma$ has little effect on the cumulative loss distribution of positive (foreground) samples, but a very large effect on negative (background) samples: with $\gamma = 2$, the loss of nearly 99% of background samples becomes very small.

Next, to verify focal loss, the author proposes a new one-stage detector architecture, RetinaNet, which uses a ResNet-FPN backbone with anchors covering 15 scales, as shown in Figure 3.
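
For concreteness, the 15 anchor scales can be enumerated as follows; this is a sketch based on the anchor settings described in the paper (base sizes 32-512 on pyramid levels P3-P7, three octave scales per level):

```python
# 5 pyramid levels (P3..P7, base sizes 32..512) x 3 octave scales = 15 scales
scales = [2 ** (level + 2) * 2 ** (octave / 3)
          for level in range(3, 8)    # P3 .. P7
          for octave in range(3)]     # {2^0, 2^(1/3), 2^(2/3)}
print(len(scales), min(scales), round(max(scales)))  # 15 32.0 813
```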


Table 1 gives some experimental results for RetinaNet and focal loss: adding $\alpha$ class balancing increases AP by 0.9, and adding the $\gamma$ modulating factor raises AP to 37.8; compared with OHEM, focal loss improves AP by 3.2. As can be seen from Table 2, with longer training and scale jitter, AP eventually reaches 39.1.



The principle and experimental results of focal loss are now covered, so let us look at its backpropagation. First, the gradient of the softmax activation is given: for $p_i = e^{z_i} / \sum_k e^{z_k}$,

$$\frac{\partial p_i}{\partial z_j} = p_i \,(\delta_{ij} - p_j) = \begin{cases} p_i (1 - p_i) & i = j \\ -p_i \, p_j & i \neq j \end{cases}$$
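
This Jacobian can be verified numerically; a self-contained sketch using central finite differences:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift by max for numerical stability
    return e / e.sum()

z = np.array([1.0, 2.0, 0.5])
p = softmax(z)
jac = np.diag(p) - np.outer(p, p)   # dp_i/dz_j = p_i * (delta_ij - p_j)

# Central finite differences; column j approximates dp/dz_j
eps = 1e-6
num = np.stack([(softmax(z + eps * np.eye(3)[j]) - softmax(z - eps * np.eye(3)[j])) / (2 * eps)
                for j in range(3)], axis=1)
print(np.allclose(jac, num, atol=1e-8))  # True
```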

With the softmax gradient in hand, applying the chain rule to $\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^\gamma \log(p_t)$, where $t$ is the true class, gives the backpropagation formula of focal loss with respect to the logits $z_j$:

$$\frac{\partial\, \mathrm{FL}}{\partial z_j} = \alpha_t (1 - p_t)^{\gamma - 1} \left( \gamma\, p_t \log(p_t) + p_t - 1 \right) (\delta_{jt} - p_j)$$

As a sanity check, setting $\gamma = 0$ and $\alpha_t = 1$ recovers the familiar softmax cross-entropy gradient $\partial\, \mathrm{CE} / \partial z_j = p_j - \delta_{jt}$.
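
The formula can likewise be checked against finite differences; a self-contained sketch (the helper `focal_loss_grad` is our own name for the expression above):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def focal_loss_grad(z, t, alpha_t=0.25, gamma=2.0):
    """Analytic dFL/dz for true class t, per the chain-rule formula above."""
    p = softmax(z)
    p_t = p[t]
    return (alpha_t * (1 - p_t) ** (gamma - 1)
            * (gamma * p_t * np.log(p_t) + p_t - 1)
            * (np.eye(len(z))[t] - p))

# Central finite-difference check of the formula
z, t, eps = np.array([0.3, 1.2, -0.7]), 1, 1e-6
fl = lambda v: -0.25 * (1 - softmax(v)[t]) ** 2.0 * np.log(softmax(v)[t])
num = np.array([(fl(z + eps * np.eye(3)[j]) - fl(z - eps * np.eye(3)[j])) / (2 * eps)
                for j in range(3)])
print(np.allclose(focal_loss_grad(z, t), num, atol=1e-8))  # True
```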

Conclusion: focal loss is mainly used to solve the data-imbalance problem and can be regarded as an extension of the OHEM algorithm. The authors apply focal loss to a one-stage detector, but in practice this way of resolving data imbalance is also effective for two-stage detectors.

