Weakly Supervised Deep Detection Networks, Hakan Bilen, Andrea Vedaldi
https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Bilen_Weakly_Supervised_Deep_CVPR_2016_paper.pdf
Highlight
- Weakly supervised detection is cast as proposal ranking. Comparing proposals against each other for every category yields a relatively accurate ranking, which is consistent with how the evaluation metric is computed at test time.
Related work
The MIL strategy results in a non-convex optimization problem. In practice, solvers tend to get stuck in local optima, so the quality of the solution depends strongly on the initialization.
- Developing various initialization strategies [19, 5, 32, 4]:
  - [Propose] a self-paced learning strategy.
  - [5] initialize object locations based on objectness scores.
  - [4] propose a multi-fold split of the training data to escape local optima.
- Regularizing the optimization problem [31, 1]:
  - [+] apply Nesterov's smoothing technique to the latent SVM formulation.
  - [1] propose a smoothed version of MIL that softly labels object instances instead of choosing the highest-scoring ones.
- Another line of work on WSD is based on identifying similarities between image parts:
  - [Propose a] discriminative graph-based algorithm that selects a subset of windows such that each window is connected to its nearest neighbors in positive images.
  - [Extend] this method to discover multiple co-occurring part configurations.
  - [approx] propose an iterative technique that applies latent semantic clustering via probabilistic latent semantic analysis (pLSA).
  - [2] propose a formulation that jointly learns a discriminative model and enforces the similarity of the selected object regions via a discriminative convex clustering algorithm.
Method
The method is simple and easy to follow. It consists of three main parts:
- Feature extraction: the convolutional features and the region proposals are fed into a spatial pyramid pooling (SPP) layer, which extracts a feature vector for each region; these vectors then pass through two FC layers.
- Classification stream: an FC layer followed by a softmax over classes computes the class scores of each region.
- Detection stream: an FC layer also followed by a softmax, but unlike the classification stream, the normalization runs over all regions rather than over classes. By comparing regions against each other, this stream finds the region that carries the most information about each class.
- The score of region r for class c is the product of the outputs of the two streams.
- The image-level score of class c is the sum of the class-c scores over all regions.
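The two-stream scoring above can be sketched in plain Python. This is a minimal illustration, not the authors' code: it assumes the two FC heads have already produced C x R logit matrices (one row per class, one column per region), and the helper name `two_stream_scores` is hypothetical.

```python
import math
from typing import List, Tuple

# Scores are stored as C x R matrices: one row per class, one column per region.
Matrix = List[List[float]]


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def two_stream_scores(cls_logits: Matrix, det_logits: Matrix) -> Tuple[Matrix, List[float]]:
    """Combine classification and detection streams (hypothetical helper).

    cls_logits, det_logits: C x R outputs of the two FC heads (assumed layout).
    Returns (region_scores, image_scores).
    """
    num_classes = len(cls_logits)
    num_regions = len(cls_logits[0])

    # Classification stream: softmax over classes, separately for each region (column).
    cls_prob = [[0.0] * num_regions for _ in range(num_classes)]
    for r in range(num_regions):
        column = softmax([cls_logits[c][r] for c in range(num_classes)])
        for c in range(num_classes):
            cls_prob[c][r] = column[c]

    # Detection stream: softmax over regions, separately for each class (row),
    # so regions compete with each other for every class.
    det_prob = [softmax(row) for row in det_logits]

    # Region score for (class c, region r): element-wise product of the two streams.
    region_scores = [
        [cls_prob[c][r] * det_prob[c][r] for r in range(num_regions)]
        for c in range(num_classes)
    ]

    # Image-level class score: sum over regions. Each det_prob row sums to 1 and
    # every cls_prob entry is below 1, so the result stays in (0, 1).
    image_scores = [sum(row) for row in region_scores]
    return region_scores, image_scores


if __name__ == "__main__":
    # Toy example with made-up logits: 2 classes, 3 regions.
    cls_logits = [[2.0, 0.0, -1.0], [0.0, 1.0, 0.5]]
    det_logits = [[3.0, 0.0, 0.0], [0.0, 0.0, 2.0]]
    regions, image = two_stream_scores(cls_logits, det_logits)
    print(image)
```

Sorting the entries of `regions[c]` gives the per-class ranking of proposals used for detection, while `image` is what gets compared against the image-level labels during training. Because the detection softmax sums to 1 over regions, the image-level score behaves like a probability without any extra squashing.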
The training loss is as follows:
The last term is a regularization term (I changed it slightly based on my understanding, as the notation in the paper seems a bit off). Its purpose is to enforce smoothness of the solution by reducing the feature distance between the highest-scoring region and the regions around it (i.e., proposals close to the correct one should also get high scores).
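Since the loss is only described in words here, the following is a rough sketch of its two ingredients, not a reproduction of the paper's exact formula. It assumes (1) the data term is a binary log loss on the image-level class scores with labels in {-1, +1}, and (2) the regularizer is a squared L2 distance between the feature vector of the highest-scoring region and those of regions overlapping it above an IoU threshold; the threshold 0.6, the unweighted distance, and the helper names are all illustrative assumptions.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def image_level_loss(image_scores: Sequence[float], labels: Sequence[int], eps: float = 1e-12) -> float:
    """Binary log loss on image-level scores in (0, 1), labels in {-1, +1} (assumed form).

    -log(y * (phi - 1/2) + 1/2) equals -log(phi) for y = +1
    and -log(1 - phi) for y = -1.
    """
    loss = 0.0
    for phi, y in zip(image_scores, labels):
        loss -= math.log(max(y * (phi - 0.5) + 0.5, eps))
    return loss


def spatial_regulariser(
    region_scores: Sequence[float],
    features: Sequence[Sequence[float]],
    boxes: Sequence[Box],
    iou_threshold: float = 0.6,  # assumed overlap threshold
) -> float:
    """Smoothness term for one positive class of one image (hypothetical helper).

    Pulls the features of regions that overlap the top-scoring region
    towards the features of that region (squared L2 distance, unweighted).
    """
    top = max(range(len(region_scores)), key=lambda r: region_scores[r])
    penalty = 0.0
    for r, box in enumerate(boxes):
        if r != top and iou(box, boxes[top]) > iou_threshold:
            dist2 = sum((f_top - f_r) ** 2 for f_top, f_r in zip(features[top], features[r]))
            penalty += 0.5 * dist2
    return penalty
```

Note that the regularizer is anchored on a single highest-scoring region per class, which is exactly the one-instance-per-class limitation discussed under Disadvantages.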
Experimental results
The paper reports four models that differ in the base network: S (VGG-F), M (VGG-M-1024), L (VGG-VD16), and Ens. (an ensemble of the first three).
- Ablation:
  - Object proposals:
    - Baseline mAP with Selective Search: S 31.1%, M 30.9%, L 24.3%, Ens. 33.3%
    - Edge Boxes: +0 to 1.2%
    - Edge Boxes + Edge Box score: +1.8 to 5.9%
  - Spatial regulariser (compared with Edge Boxes + Edge Box score): mAP +1.2 to 4.4%
- VOC2007
  - mAP on test: S +2.9%, M +3.3%, L +3.2%, Ens. +7.7% compared with [approx] + context
  - CorLoc on trainval: S +5.7%, M +7.6%, L +5%, Ens. +9.5% compared with [36]
  - Classification AP on test: S +7.9% compared with VGG-F, M +6.5% compared with VGG-M-1024, L +0.4% compared with VGG-VD16, Ens. -0.3% compared with VGG-VD16
- VOC2010
  - mAP on test: +8.8% compared with [4]
  - CorLoc on trainval: +4.5% compared with [4]
Disadvantages
One obvious drawback of this paper is that it only considers a single instance of each class per image (the regulariser constrains only the highest-scoring box and the boxes around it), which is also reflected in the failure cases shown in the paper.
[CVPR 2016] Weakly Supervised Deep Detection Networks paper notes