Object detection - PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection

Source: Internet
Author: User

This paper addresses several problems in object detection and, by combining recent technical advances, achieves good results.

We obtained solid results on well-known object detection benchmarks: 81.8% mAP (mean average precision) on VOC2007 and 82.5% mAP on VOC2012 (2nd place) while taking only 750ms/image on Intel i7-6700K CPU with a single core and 46ms/image on NVIDIA Titan X GPU. Theoretically, our network requires only 12.3% of the computational cost compared to ResNet-101, the winner on VOC2012.

The overall detection framework is: CNN feature extraction + region proposal + RoI classification.
We mainly optimize feature extraction, because the region proposal part is comparatively cheap and the classification part can be compressed effectively through SVD. Our design principle is "less channels with more layers". The network uses concatenated ReLU (C.ReLU), Inception, and HyperNet, and is trained with batch normalization, residual connections, and learning rate scheduling based on plateau detection.
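To illustrate why SVD compresses the classification part, here is a minimal sketch of the weight count before and after a rank-limited factorization. A fully connected layer with an n_out x n_in weight matrix can be approximated by two smaller layers of rank t, which need t * (n_in + n_out) weights instead of n_in * n_out. The layer sizes and the rank below are illustrative assumptions, not values taken from this text.

```python
def fc_params(n_in: int, n_out: int) -> int:
    """Number of weights in a fully connected layer (bias ignored)."""
    return n_in * n_out


def svd_compressed_params(n_in: int, n_out: int, rank: int) -> int:
    """Weights after replacing W (n_out x n_in) with two layers:
    n_in -> rank, then rank -> n_out."""
    return rank * (n_in + n_out)


# Illustrative sizes only (assumed for this example).
n_in, n_out, rank = 4096, 4096, 512

full = fc_params(n_in, n_out)
compressed = svd_compressed_params(n_in, n_out, rank)
ratio = compressed / full

print(f"full: {full}, compressed: {compressed}, ratio: {ratio:.2f}")
```

The smaller the chosen rank, the larger the saving, at the cost of a coarser approximation of the original weights.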

2 Details on Network
2.1 C.ReLU: Earlier building blocks in feature generation

C.ReLU is mainly used in the first several convolution layers. It reduces the number of output channels by half, then doubles it again by simply concatenating the same outputs with their negation, which leads to a 2x speed-up of the early stage without losing accuracy.
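The concatenation step can be sketched in plain Python as follows. This is a simplified illustration, assuming each channel is a flat list of activations; the function names are hypothetical, and any learned scaling or shifting around the concatenation is left out.

```python
def relu(x: float) -> float:
    """Standard rectified linear unit."""
    return max(0.0, x)


def crelu(channels: list[list[float]]) -> list[list[float]]:
    """Concatenate the convolution outputs with their negation, then apply ReLU.

    `channels` holds C channels; the result holds 2 * C channels, so the
    convolution itself only has to compute half of the final channel count.
    """
    negated = [[-v for v in ch] for ch in channels]
    return [[relu(v) for v in ch] for ch in channels + negated]


conv_out = [[1.0, -2.0], [0.5, 3.0]]  # 2 channels from the convolution
activated = crelu(conv_out)           # 4 channels after C.ReLU
print(len(conv_out), "->", len(activated))
```

The negated copy keeps the information that a plain ReLU would discard for negative responses, which is why the channel count of the convolution can be halved.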

2.2 Inception: Remaining building blocks in feature generation

Inception works well for both small and large objects, mainly because it combines convolution kernels of different sizes and therefore covers different receptive fields.
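The effect of kernel size on the receptive field can be checked with a small calculation. The sketch below assumes stride-1 convolutions, and the three branch layouts are hypothetical examples of an Inception-style block, not the exact configuration of this network.

```python
def receptive_field(kernel_sizes: list[int]) -> int:
    """Receptive field of a chain of stride-1 convolutions.

    Each k x k layer widens the receptive field by k - 1 pixels.
    """
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf


# Hypothetical branches of one Inception-style block (assumed layout).
branches = {
    "1x1": [1],
    "1x1 -> 3x3": [1, 3],
    "1x1 -> 3x3 -> 3x3": [1, 3, 3],
}

fields = {name: receptive_field(ks) for name, ks in branches.items()}
for name, rf in fields.items():
    print(f"{name}: receptive field {rf}x{rf}")
```

Branches with a small receptive field keep fine detail for small objects, while the deeper branches see enough context for large ones; concatenating them gives the next layer both views.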

2.3 HyperNet: Concatenation of multi-scale intermediate outputs

HyperNet mainly combines convolutional feature layers of different scales, which makes multi-scale object detection possible.

2.4 Deep Network Training

Here we add residual connections between the Inception layers, add a batch normalization layer before every ReLU activation layer, and control the learning rate dynamically based on plateau detection.
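A plateau-based learning rate schedule can be sketched as below. This is a simplified stand-in, not the authors' exact rule: it tracks the best loss seen so far and lowers the learning rate after `patience` steps without improvement. The class name, parameters, and loss values are all assumptions made for illustration.

```python
class PlateauLR:
    """Lower the learning rate when the loss stops improving."""

    def __init__(self, lr: float, factor: float = 0.1,
                 patience: int = 2, min_delta: float = 0.0) -> None:
        self.lr = lr
        self.factor = factor        # multiplier applied on a plateau
        self.patience = patience    # non-improving steps tolerated
        self.min_delta = min_delta  # minimum drop that counts as progress
        self.best = float("inf")
        self.bad_steps = 0

    def step(self, loss: float) -> float:
        """Record one loss value and return the learning rate to use next."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.bad_steps = 0
        else:
            self.bad_steps += 1
            if self.bad_steps >= self.patience:
                self.lr *= self.factor
                self.bad_steps = 0
        return self.lr


scheduler = PlateauLR(lr=1.0, factor=0.5, patience=2)
losses = [1.0, 0.9, 0.9, 0.9, 0.8]  # made-up loss curve
history = [scheduler.step(loss) for loss in losses]
print(history)
```

In practice the loss is usually smoothed (for example with a moving average) before the comparison, so that noise between iterations does not trigger a premature decay.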

3 Faster R-CNN with our feature extraction network
Three intermediate outputs, from conv3_4 (with down-scaling), conv4_4, and conv5_4 (with up-scaling), are combined into the 512-channel multi-scale output features, which are used as the input of the Faster R-CNN model.
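The multi-scale combination can be sketched on tiny feature maps. The example assumes 2x2 max pooling for down-scaling and nearest-neighbour repetition for up-scaling, which are simplifications chosen here for brevity; the helper names are hypothetical, and the final projection to 512 channels is omitted.

```python
def downscale2x(fmap: list[list[float]]) -> list[list[float]]:
    """2x2 max pooling on one channel (height and width must be even)."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]), 2)]
            for i in range(0, len(fmap), 2)]


def upscale2x(fmap: list[list[float]]) -> list[list[float]]:
    """Nearest-neighbour up-scaling on one channel."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def concat_channels(*feature_maps: list) -> list:
    """Stack channel lists; every channel must share the same height and width."""
    shapes = {(len(ch), len(ch[0])) for fm in feature_maps for ch in fm}
    if len(shapes) != 1:
        raise ValueError(f"mismatched spatial sizes: {shapes}")
    return [ch for fm in feature_maps for ch in fm]


# One channel per layer, at three scales (toy values).
conv3_4 = [[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]]
conv4_4 = [[[1, 2], [3, 4]]]
conv5_4 = [[[7]]]

fused = concat_channels(
    [downscale2x(ch) for ch in conv3_4],  # 4x4 -> 2x2
    conv4_4,                              # already 2x2
    [upscale2x(ch) for ch in conv5_4],    # 1x1 -> 2x2
)
print(len(fused), "channels of size", len(fused[0]), "x", len(fused[0][0]))
```

All three outputs are brought to the resolution of conv4_4 before concatenation, so the detector sees fine and coarse features at every spatial position.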

4 Experimental results
