Highlight
- A good title gives the reader a reason to start reading
- The localization approach, global max-pooling over sliding-window scores, is worth borrowing
Method
The paper aims to design a weakly supervised classification network; note that its primary goal is to improve classification. Because the paper dates from 2015, the method is relatively simple.
The authors make the following three modifications to a classification network.
- Treat the fully connected layers as convolutions, which allows us to deal with nearly arbitrary-sized images as input.
- The aim is to apply the network to bigger images in a sliding-window manner, thus extending its output to n x m x K, where n and m denote the number of sliding-window positions in the x- and y-directions in the image, respectively.
- 3 x h x w -> convs -> K x m x n (K: number of classes)
- Explicitly search for the highest-scoring object position in the image by adding a single global max-pooling layer at the output.
- K x m x n -> K x 1 x 1
- The max-pooling operation hypothesizes that the object is located at the position with the maximum score
- Use a cost function that can explicitly model multiple objects present in the image.
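The first two modifications can be illustrated with a minimal pure-Python sketch. It assumes the simplest case, where a fully connected layer becomes a 1x1 convolution applied at every position of a C x n x m feature map; the function names and the toy numbers are hypothetical and not taken from the paper.

```python
def fc_as_conv1x1(features, weights, biases):
    """Apply a fully connected layer at every spatial position (a 1x1 convolution).

    features: C x n x m nested lists; weights: K x C; biases: length K.
    Returns a K x n x m score map.
    """
    num_channels = len(features)
    n, m = len(features[0]), len(features[0][0])
    return [
        [
            [
                biases[k]
                + sum(weights[k][c] * features[c][i][j] for c in range(num_channels))
                for j in range(m)
            ]
            for i in range(n)
        ]
        for k in range(len(weights))
    ]


def global_max_pool(score_map):
    """Reduce K x n x m to K scores, remembering the argmax position per class."""
    scores, locations = [], []
    for class_map in score_map:
        best_value, best_pos = max(
            ((v, (i, j)) for i, row in enumerate(class_map) for j, v in enumerate(row)),
            key=lambda item: item[0],
        )
        scores.append(best_value)
        locations.append(best_pos)
    return scores, locations


# Toy input (hypothetical): C=2 feature channels on a 2x3 grid, K=2 classes.
features = [
    [[0.0, 1.0, 0.0], [0.0, 0.0, 2.0]],
    [[3.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
]
weights = [[1.0, 0.0], [0.0, 1.0]]  # class 0 reads channel 0, class 1 reads channel 1
biases = [0.0, -0.5]

score_map = fc_as_conv1x1(features, weights, biases)  # K x n x m
scores, locations = global_max_pool(score_map)        # K x 1 x 1, plus positions
print(scores)     # [2.0, 2.5]
print(locations)  # [(1, 2), (0, 0)]
```

The pooled scores feed the classification loss, while the argmax positions serve as the hypothesized object locations.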
Because an image may contain objects of several classes, the usual single-label multi-class classification loss does not apply. The authors therefore treat the task as multiple binary classification problems, one per class, and define the loss function and the classification score accordingly.
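The formulas themselves are missing from these notes. As a hedged sketch, one common way to write "one binary problem per class" is a sum of logistic losses over the K pooled scores, with labels y_k in {-1, +1}; this form and the function names below are assumptions and should be checked against the paper.

```python
import math


def multilabel_logistic_loss(scores, labels):
    """Sum of K binary logistic losses: sum_k log(1 + exp(-y_k * f_k)).

    scores: per-class scores f_k (e.g. after global max-pooling).
    labels: per-class labels y_k, +1 if the class is present, -1 otherwise.
    Note: math.exp can overflow for very large -y_k * f_k; fine for a sketch.
    """
    return sum(math.log1p(math.exp(-y * f)) for f, y in zip(scores, labels))


def class_probability(score):
    """Map a raw class score to a presence probability with a sigmoid."""
    return 1.0 / (1.0 + math.exp(-score))


# An image containing class 0 but not class 1, with uninformative scores.
loss = multilabel_logistic_loss([0.0, 0.0], [+1, -1])
print(loss)  # 2 * log(2), since each class contributes log(1 + e^0)
```

Each class contributes independently, so any number of classes can be present in the same image.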
Training
Multi-scale Test
Experiment
Classification
- mAP on VOC test: +3.1% compared with [56]
- mAP on VOC test: +7.6% compared with K x 1 x 1 output and single-scale training
- mAP on VOC: +2.6% compared with R-CNN
- mAP on COCO: 62.8%
Localisation
- Metric: if the maximal response across scales falls within the ground-truth bounding box of an object of the same class, within a pixel tolerance, we label the predicted location as correct. If not, we count the response as a false positive (it hits the background), and we also increment the false negative count (no object was found).
- Metric on VOC val: -0.3% compared with R-CNN
- mAP on COCO: 41.2%
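The localization metric above can be sketched as a point-in-box test. This is only an illustration under assumptions: the box format (x_min, y_min, x_max, y_max), the function names, and the tolerance values in the example are hypothetical, and the ranking of predictions by confidence needed for an actual average-precision figure is omitted.

```python
def point_in_box(point, box, tolerance=0):
    """True if point (x, y) lies inside box, enlarged by `tolerance` pixels."""
    x, y = point
    x_min, y_min, x_max, y_max = box
    return (
        x_min - tolerance <= x <= x_max + tolerance
        and y_min - tolerance <= y <= y_max + tolerance
    )


def evaluate_prediction(point, gt_boxes, tolerance):
    """Score one predicted location for one class.

    gt_boxes: ground-truth boxes of the same class in this image.
    Returns (true_positives, false_positives, false_negatives).
    """
    if any(point_in_box(point, box, tolerance) for box in gt_boxes):
        return (1, 0, 0)
    # A miss hits the background (false positive) and finds no object (false negative).
    return (0, 1, 1)


gt = [(10, 10, 45, 45)]
print(evaluate_prediction((50, 40), gt, tolerance=5))  # (1, 0, 0)
print(evaluate_prediction((50, 40), gt, tolerance=0))  # (0, 1, 1)
```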
Disadvantages
- The localization evaluation metric is not a standard, widely accepted one
- Would replacing max pooling with average pooling work better when there are multiple instances?
[CVPR 2015] Is object localization for free? - Weakly-supervised learning with convolutional neural networks: paper notes